Cut your LLM bill by 7–50%, depending on your workload.

Conversation history: median 35.5%, max 39.7%. Long instructions: median 28.0%, max 32.9%. Mixed production traffic: 7–10%. Blind-tested across 150 prompts × 4 providers (a RAG, Retrieval-Augmented Generation, cohort in progress brings the total to 200 on completion). Ranges are published by workload type, not averaged into a single marketing number.

Your monthly LLM spend
$ / month
Gross savings
$2,175 / month
$26,100 cut from your annual LLM bill.
Pro tier — you keep 70%
$1,523 / mo
$18,270 / year
Enterprise — you keep 80%
$1,740 / mo
$20,880 / year
Default shown is the Conservative cohort (dense structured traffic, 8.7% median); pick a specific provider or switch to Blended above to see median or higher-savings projections. The projected savings table further below uses the blended four-cohort median (33.8%, RAG-inclusive; RAG quality judging is in progress), while the calculator default is deliberately lower for a cautious estimate.

Based on 150 prompts × 4 providers across GPT-4o, Claude, Grok, and Gemini (the RAG cohort in progress brings the total to 200 on completion). Savings are consistent and repeatable for the same prompt under equivalent configuration. Grok token counts in our benchmark runs used a documented approximation; see methodology for per-model details. Validated range: 7.0% to 49.8% by workload. Under our current pricing model, Ordica only profits when you save. See pricing →
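A minimal sketch of the calculator arithmetic, assuming the on-page default spend is $25,000/month (that figure is inferred from the displayed numbers, not stated: 8.7% of $25,000 is the $2,175 shown):

```python
# Hypothetical reconstruction of the calculator math; the $25,000/mo
# default spend is inferred, not stated on the page.
monthly_spend = 25_000
conservative_rate = 0.087               # dense structured traffic, median

gross_monthly = monthly_spend * conservative_rate    # ~ $2,175 / month
annual_cut = gross_monthly * 12                      # ~ $26,100 / year

pro_keep = gross_monthly * 0.70         # ~ $1,522.50, displayed as $1,523/mo
enterprise_keep = gross_monthly * 0.80  # ~ $1,740 / mo
```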
# Before
from openai import OpenAI

# After — one import changes everything
from ordica import OpenAI
VERIFIED SAVINGS BY WORKLOAD
Your savings depend on what you send us.
Here is the honest range.

Conversation history

median 35.5%, max 39.7%

Multi-turn chat histories carry the most redundancy. When we see a conversation that has grown past 2K tokens, the compression gain is the highest of any workload class we measure.

Instruction-heavy prompts

median 28.0%, max 32.9%

Long system prompts, detailed role definitions, and few-shot exemplars compress well. Typical result for production AI applications with heavy prompt scaffolding.

Mixed structured traffic

7–10%

When your prompts are already dense — short JSON payloads, minimal instructions, tight function-calling — savings are modest. We publish this. Your worst case is still a positive number.

RAG retrieval context

median 49.8%, max 55.3%

⚠ Savings only. Blind quality judging for the RAG cohort is in progress — equivalence scores pending.

Retrieved document blocks show the highest savings of any workload class we measure. Savings validated on 50 prompts × 3 non-Gemini providers.

Token savings measured across 150 unique prompts (one deterministic savings number per prompt) plus 600 blind quality judgments across 3 completed cohorts; the RAG cohort in progress will bring the totals to 200 / 800 / 1,000. Response quality was blind-judged on a 1–5 scale by an independent judge model: mean equivalence is 4.38 on OpenAI, 3.96 on Anthropic, 3.78 on Gemini, and 3.76 on Grok. Full methodology →

How it works
Three steps. Sixty seconds.
1

Point

Set one environment variable
OPENAI_BASE_URL or ANTHROPIC_BASE_URL
= https://api.ordica.ai

2

Authenticate

Keep your existing provider API key
We forward it to your chosen provider
and don't retain it after the request

3

Save

Compressed when safe, passed through when not.
Quality tracked against blind-judged equivalence scores; fewer tokens whenever we can deliver them safely.

xAI users: the OpenAI SDK already works — same env var, your Grok API key. Gemini users: pass http_options={"base_url":"https://api.ordica.ai/gemini"} to the google-genai client, or contact us for deeper integration support.
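The setup above as environment configuration, a sketch using the base URL quoted in this section and the providers' standard key variable names:

```shell
# OpenAI SDK (also covers xAI Grok, which works through the OpenAI SDK):
export OPENAI_BASE_URL="https://api.ordica.ai"

# Anthropic SDK:
export ANTHROPIC_BASE_URL="https://api.ordica.ai"

# Keep your existing provider key unchanged; it is forwarded per request
# and not retained, e.g.:
# export OPENAI_API_KEY="sk-..."
```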

Privacy
Prompts pass through.
Nothing sticks.

Your prompts transit the Ordica proxy in-memory, get compressed, and are forwarded to the AI provider you chose. We do not store them, log them, read them, or train on them. The only data retained is anonymous counts: tokens sent, tokens saved, and which provider you used.

Billing runs on those counts alone. Prompt content is not persisted beyond the compression step — our audit logs contain metadata only: token counts, timestamps, billing meters. Retention and sub-processor details are enumerated in our Data Processing Agreement.

Your prompt → Ordica proxy (compressed) → Your AI provider
Token counts → Ordica telemetry (anonymous)
Message content → our storage: never retained
Compatible
Works with the AI you use

Optimized for each provider. Switch between them freely.

GPT-4o
Claude
Gemini
Grok
Validated
Measured, not promised

Every message is a real-world test. A separate judge model scores compressed vs. original responses without knowing which is which.

GPT-4o
4.38 / 5.0 mean equivalence
Instruction cohort 4.78 · Mixed cohort 4.62

Claude
3.96 / 5.0 mean equivalence
Instruction cohort 4.10 · Mixed cohort 4.10

Gemini
3.78 / 5.0 mean equivalence
Instruction cohort 3.94 · Mixed cohort 4.34

Grok
3.76 / 5.0 mean equivalence
Instruction cohort 4.06 · Mixed cohort 3.58

⚠ A single judge model (Claude Sonnet 4.5) scored all four target families. Same-family bias could inflate one family's scores relative to the others. Read the per-provider gaps with that caveat in mind — see methodology: judge bias honesty. Cross-judge validation is on the roadmap.

150 prompts × 4 providers (RAG pending)
4 providers validated
3.97 mean equivalence / 5.0
Methodology published

Blind-tested across 150 prompts × 4 providers (RAG cohort in progress brings total to 200): a separate AI judged compressed vs. original responses without knowing which was which. Equivalence is scored on a 1–5 scale, where 5 means indistinguishable and 4 means "substantively equivalent — a customer would accept either response." Mean equivalence above 3.5 on this scale indicates the compressed response reliably passes blind substitution.

Scores cluster in the 3.5–4.4 range across providers because compression is probabilistic, not lossless; we publish the real distribution rather than round up. Per-provider means are averaged across the three quality-judged cohorts (instruction, history, mixed); the RAG cohort has validated savings with blind quality judging in progress.

Savings are consistent and repeatable for the same prompt under equivalent configuration. Grok token counts in our benchmark runs used a documented approximation; see methodology for per-model details.
All tests run on current production models. No cherry-picking. Full methodology and the per-prompt judge output: analyze/methodology.html.

Estimated annual savings ⚠ Figures include RAG cohort — blind quality judging for that cohort is still in progress. Three-cohort validated median available as the conservative reference.

Savings are consistent and repeatable for the same prompt under equivalent configuration. Grok token counts in our benchmark runs used a documented approximation — see methodology for per-model details. Figures below use the benchmark median (33.8% — RAG-inclusive, quality judging for the RAG cohort in progress) across four cohorts. The three-cohort validated median (excluding RAG) is a more conservative reference, and the calculator above defaults to the single-cohort Conservative figure (8.7%) for a lower-bound estimate — toggle the Blended view to apply this 33.8% median to your spend. Actual savings depend on workload mix — RAG-heavy prompts save more, mixed-structure prompts save less.

RAG cohort quality judging in progress — the median includes RAG savings, but blind equivalence scoring for that cohort is still pending.

Figures below are projected annual savings at each blended rate.
Monthly API spend | Conservative (p25, 14.6%) | Median (33.8%) | Optimistic (p75, 47.5%)
$1,000 /mo | $1,752 | $4,056 | $5,700
$5,000 /mo | $8,760 | $20,280 | $28,500
$25,000 /mo | $43,800 | $101,400 | $142,500
$100,000 /mo | $175,200 | $405,600 | $570,000
Savings rate: 14.6% (blended p25) | 33.8% (median) | 47.5% (blended p75)
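The table rows above are straight multiplication; a minimal sketch, with the rates taken from the column headers:

```python
def annual_savings(monthly_spend: float, rate: float) -> float:
    """Projected annual gross savings at a given blended savings rate."""
    return monthly_spend * rate * 12

# Column rates from the table: p25 14.6%, median 33.8%, p75 47.5%.
rates = (0.146, 0.338, 0.475)
first_row = [annual_savings(1_000, r) for r in rates]  # ~ [1752, 4056, 5700]
```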
Validated
Savings by workload type

Results across 150 unique prompts (one deterministic savings number per prompt, 4 providers verifying each) with 600 blind-judged quality validations across 3 completed cohorts. RAG cohort in progress will bring totals to 200 prompts / 800 quality judgments.

Conversation history: 31.3–39.7% (median 35.5%). Instruction-heavy workloads: 21.9–32.9% (median 28.0%). RAG retrieval context: 42.8–55.3% (median 49.8%) — savings validated, blind quality judging in progress. Dense structured traffic: 7.0–10.3% (median 8.7%). Savings depend on your workload mix and are consistent and repeatable for the same prompt under equivalent configuration. For Grok, our benchmark runs used a documented approximation — see methodology for per-model details. Quality preservation varies by provider.

⚠ RAG quality disclosure: The RAG cohort savings figures above are validated on 50 prompts × 3 non-Gemini providers. Blind-judged quality equivalence scoring for the RAG cohort is still in progress. Quality parity with the other three cohorts has not yet been independently confirmed.

Pricing
You only pay when we save you money

Your billing dashboard shows how many tokens you sent, how many tokens we compressed them to, and what that saved you in dollars. Your bill is 30% of that dollar savings on Pro, 20% on Enterprise, zero on Free. No compression, no fee.
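A minimal sketch of the fee split described above (tier percentages are from this page; the function name is illustrative):

```python
def ordica_fee(dollars_saved: float, tier: str) -> float:
    """Fee as a share of measured dollar savings; no savings, no fee."""
    rates = {"free": 0.00, "pro": 0.30, "enterprise": 0.20}
    return dollars_saved * rates[tier]

saved = 100.0
you_keep_pro = saved - ordica_fee(saved, "pro")          # ~ 70.0
you_keep_ent = saved - ordica_fee(saved, "enterprise")   # ~ 80.0
```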

Free
$0
Real savings. No credit card. No catch.

  • Drop-in proxy endpoint — point your existing SDK at api.ordica.ai
  • GPT-4o, Claude, Gemini, Grok
  • Conversation-history workloads: 35.5% median (p25 32.0%, p75 39.3%)
  • Up to 10K requests/month
  • Savings dashboard
  • Community support
  • Best-effort availability — see Terms
What you get: Point your existing OpenAI or Anthropic SDK at Ordica with one environment variable. No new package, no code rewrite. Token savings measured by provider tokenizers across 150 unique prompts (one deterministic savings number per prompt, 4 providers verifying each). Response-quality preservation separately blind-judged on a 1–5 scale by a separate AI — see full methodology.
Enterprise
20% of your measured savings
Custom-tuned for your prompts. Maximum savings.

  • Everything in Pro
  • Custom optimization profile
  • On-prem / air-gapped SDK deployment (requires secure deployment attestation — contact sales)
  • Direct access to the founder
  • Custom SLA under enterprise contract
  • Unlimited requests
  • Annual commitment, terms negotiated per contract
  • On-prem billing uses signed usage attestations from your own telemetry
Why 20% not 30%: At enterprise scale, your usage helps us improve optimization for all customers. The lower rate is earned, not negotiated. On-prem SDK deployment is offered under enterprise contract only; secure deployment attestation is required to protect the compression IP on customer-controlled infrastructure. When Ordica cannot observe traffic directly, billing runs off signed usage attestations your team generates from your own telemetry — contract defines the exact format and cadence.
Automatic monthly renewal — please read.
By starting a paid plan, you authorize Ordica LLC to charge your payment method on a recurring monthly basis until you cancel. Your subscription automatically renews at the then-current price at the end of each monthly term. You may cancel at any time from your account dashboard or by emailing billing@ordica.ai; cancellation takes effect at the end of the current billing period and stops further charges. Refund and cancellation terms are set out in our Terms.
Questions
Frequently asked

Do you store or read my prompts?
No. Your messages pass through the proxy to reach the AI provider. Our audit and billing logs contain metadata only — token counts, timestamps, provider, and billing meters. Prompt and response content is not persisted beyond the compression step. Retention windows, sub-processors, and data-subject rights are enumerated in our Data Processing Agreement.

Does compression change what my prompt means?
No. Ordica reduces token count without changing what your prompt asks the AI to do. The compressed version produces equivalent responses — validated blind across 150 prompts × 4 providers (RAG cohort in progress) with a separate AI judging equivalence on a 1–5 scale: mean equivalence 4.38 on OpenAI, 3.96 on Anthropic, 3.78 on Gemini, 3.76 on Grok. If the engine cannot compress safely, your prompt goes through untouched — you never get a worse response. Per-provider quality scores are published on the methodology page.

Does the proxy add latency?
The compression adds a few milliseconds — you won't notice it. Whether you're using GPT-4o, Claude, Gemini, or Grok, the provider's response time is what you feel, and that's unchanged. In some cases, shorter prompts actually get faster responses because the AI has less to process.

Does compression break provider-side prompt caching?
Ordica's compression is deterministic on our tested model corpus — the same input produces byte-identical output across repeated runs under equivalent configuration, so provider-side cache keys stay stable. In our verified test harness, a compressed prompt produced a full read-hit on the second call (Anthropic cache_read_input_tokens equal to cache_creation_input_tokens), confirming no interference on that path. Results on other prompt shapes are expected to hold but are measured per workload rather than universally asserted. If you're already using provider-side caching, you keep your cache hits.

For customers whose prompts already include cache_control markers: Ordica leaves those markers untouched. Your existing caching setup works exactly as before.
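As an illustration of that pass-through, here is a request body in the Anthropic prompt-caching shape (the cache_control block layout follows Anthropic's documented format; the model name and text are placeholders). Ordica forwards the marker unchanged, so the provider computes its cache key on the same block you marked:

```python
# Anthropic-style request body with an explicit cache breakpoint.
# Ordica leaves the cache_control marker untouched.
payload = {
    "model": "claude-sonnet-4-5",   # placeholder model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "Long, stable system instructions shared across calls...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "User turn goes here."}],
}
```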

What happens if compression fails?
The system is fail-safe by design. If compression cannot be applied confidently, your prompt passes through untouched — you just don't save tokens on that message. Under normal operation the compressed prompt reaches the model and responses should match the original within the equivalence bounds we publish; if the engine can't compress safely, you get the uncompressed path rather than a degraded one. Worst case is zero savings, not worse quality.

How do I know my bill is accurate?
Billing runs on token counts, not estimates. For every request, we record the token count of the prompt you sent and the token count of the prompt we forwarded to the provider — both measured against the provider's tokenizer reference. The difference is your savings; your bill is a fixed percentage of the dollar value of that difference. Your Stripe receipt shows the metered total for the billing period, and the savings analytics in your account show the per-request counts that produced it. Anything we cannot measure, we do not charge for: if compression is skipped on a request (pass-through), that request contributes zero to your bill. Enterprise customers can request a full billing audit export covering any billing period.
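The metering described above, sketched with hypothetical function names; a pass-through request contributes zero to the bill:

```python
def tokens_saved(tokens_sent: int, tokens_forwarded: int) -> int:
    """Per-request savings; a pass-through request contributes zero."""
    return max(tokens_sent - tokens_forwarded, 0)

def period_bill(requests, price_per_token: float, fee_rate: float = 0.30) -> float:
    """Bill = fee_rate x dollar value of measured token savings."""
    saved = sum(tokens_saved(sent, fwd) for sent, fwd in requests)
    return saved * price_per_token * fee_rate

# Two compressed requests and one pass-through:
reqs = [(1_000, 650), (2_000, 1_300), (500, 500)]
# total tokens saved = 350 + 700 + 0 = 1050
```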

Which provider should I choose?
GPT-4o is a strong all-rounder — great for general questions, writing, and brainstorming. Claude is known for natural writing and careful, thoughtful responses. Gemini has a large context window and strong reasoning on structured problems. Grok is fast and conversational with less filtering. Try each and see which fits your workload.

Is the Free tier really free?
Yes. The Free tier gives you the drop-in proxy endpoint — point your existing OpenAI or Anthropic SDK at Ordica with one environment variable — 10,000 requests per month, and a savings dashboard. No credit card required. You get the same compression engine as paid tiers. When you're ready for higher limits and the full optimization profile, Pro and Enterprise are there.

How does pricing work?
We charge a percentage of your measured savings — 30% on Pro, 20% on Enterprise. If compression saves you $100, you keep $70 (Pro) or $80 (Enterprise). If it saves you nothing, you pay nothing. Your dashboard shows every dollar in real time. No flat fees, no minimums, no surprises.

What happens to my API key?
Your provider API key transits the Ordica proxy in-memory. We use it to authenticate with the provider you chose — OpenAI, Anthropic, xAI, or Google — on your behalf for that single request, then discard it when the request completes. We do not retain it, log it, store it in a database, or share it. The key is used only to reach the provider endpoint you designated; it never leaves that path.

Honest limitation: while your request is in flight, the key is held in process memory on our proxy layer — that's unavoidable for any middleware that forwards authenticated traffic. Ordica is not a secrets vault or KMS and does not claim to be. You hold the key in your code or secret manager the same way you do today, and you can rotate it anytime through your provider's dashboard if you ever want to cut access.

Does routing my key through Ordica violate provider Terms?
Routing your own API key through a middleware layer is a standard pattern — every major provider (OpenAI, Anthropic, xAI, Google) builds an API ecosystem around SDK wrappers, gateways, observability proxies, and caching layers. Ordica is structured to fit cleanly within that allowed pattern: you remain the authorized account holder, your requests go only to providers you designate, and we do not resell, pool, rent, or share API access across customers. We also do not retain prompt or response content after each request completes.

On that basis Ordica keeps you on the permitted side of each provider's current Terms. What providers actually prohibit (multi-tenant key pooling, API-key resale, unauthorized account sharing) is not what we do. Provider Terms vary by vendor and evolve over time — review your current provider's Terms to confirm compatibility with your specific workload. For enterprise documentation — DPAs, subprocessor list, data-flow diagrams — contact us.

Ready to compress without compromise?

We're onboarding customers in small batches. Request access and we'll get you set up.

Aurora Ordica Support
Email
First name
Question or message