Public cohort data → Full methodology → Audit log →
Legal: Terms of Service · Privacy Policy · Data Processing Agreement (request a signed copy at legal@ordica.ai)
Conversation history: 33% median. Long instructions: 28% median. RAG retrieval: 32–38% depending on provider (median ~35%). Mixed production traffic: 7–10%. Blind-tested across 200 prompts × 4 providers — ranges published by workload type, not averaged into a single marketing number.
33% median
Multi-turn chat histories carry heavy redundancy. Once a conversation grows past 2K tokens, compression gains run among the highest of any workload class we measure.
28% median
Long system prompts, detailed role definitions, and few-shot exemplars compress well. Typical result for production AI applications with heavy prompt scaffolding.
7–10%
When your prompts are already dense — short JSON payloads, minimal instructions, tight function-calling — savings are modest. We publish this. When compression won't help your prompt, we pass it through at 0% — you're never charged a fee on a pass-through.
32–38% depending on provider (median ~35%)
Blind-judged equivalence 4.35 / 5.0 across 175 validations
Retrieved document blocks show the highest savings of any workload class we measure. Savings validated on 50 prompts × 4 providers; quality validated on 175 prompts (OpenAI 50, Claude 50, Grok 50, Gemini 25 — see methodology for Gemini sample note).
Token savings measured across 200 unique prompts (one deterministic savings number per prompt) plus 775 blind quality judgments across all four cohorts: the instruction, history, and mixed cohorts each ran the full 50 prompts per provider, while the RAG cohort ran 175 judgments (Gemini at n=25 due to a since-resolved API parser issue). Response quality was blind-judged on a 1–5 scale by a separate AI; per-provider means are 4.38 on OpenAI, 3.96 on Claude, 3.78 on Gemini, 3.76 on Grok. Three independent judges (GPT-4o, Gemini, and Grok-3) cross-validated the same corpus — aggregate corrected mean: 3.88 / 5.0 with 74% exact agreement between the two most consistent judges. RAG-cohort means: OpenAI 4.22, Claude 4.22, Gemini 4.40, Grok 4.60 — cross-provider 4.35 / 5.0 on 175 RAG validations. Full methodology →
Set one environment variable
OPENAI_BASE_URL or ANTHROPIC_BASE_URL
= https://api.ordica.ai
Keep your existing provider API key
We forward it to your chosen provider
and don't retain it after the request
Compressed when safe, passed through when not.
Quality tracked against blind-judged equivalence scores. Fewer tokens when we can; your original prompt when we can't.
xAI users: the OpenAI SDK already works — same env var, your Grok API key. Gemini users: pass http_options={"base_url":"https://api.ordica.ai/gemini"} to the google-genai client, or contact us for deeper integration support.
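A minimal sketch of the drop-in setup, assuming the endpoints above. The model name and prompt text are placeholders; substitute whatever you already call:

```python
import os
from openai import OpenAI
from google import genai

# Route OpenAI-compatible traffic (OpenAI, or xAI/Grok via the same SDK)
# through the Ordica proxy. The SDK reads OPENAI_BASE_URL and your existing
# OPENAI_API_KEY from the environment.
os.environ["OPENAI_BASE_URL"] = "https://api.ordica.ai"

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model
    messages=[{"role": "user", "content": "Summarize this support thread: ..."}],
)
print(resp.choices[0].message.content)

# Gemini variant, per the note above (reads your GEMINI_API_KEY from the environment):
gemini = genai.Client(http_options={"base_url": "https://api.ordica.ai/gemini"})
```

The Anthropic SDK works the same way via ANTHROPIC_BASE_URL.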
Your prompts transit the Ordica proxy in-memory, get compressed, and are forwarded to the AI provider you chose. We do not store them, log them, read them, or train on them. The only data retained is anonymous counts: tokens sent, tokens saved, and which provider you used.
Billing runs on those counts alone. Prompt content is not persisted beyond the compression step — our audit logs contain metadata only: token counts, timestamps, billing meters. Retention and sub-processor details are enumerated in our Data Processing Agreement.
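For illustration, a metadata-only audit record has roughly the shape sketched below. The field names here are hypothetical, not our actual schema; the Data Processing Agreement enumerates the real fields:

```python
from typing import TypedDict

class AuditRecord(TypedDict):
    """Illustrative shape of a metadata-only audit record (hypothetical field names)."""
    timestamp: str         # ISO-8601 request time
    provider: str          # "openai" | "anthropic" | "xai" | "google"
    tokens_sent: int       # tokens in the original prompt
    tokens_forwarded: int  # tokens after compression; equals tokens_sent on pass-through
    # Note what is absent: no prompt text, no response text, no API key.
```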
Optimized for each provider. Switch between them freely.
Every message is a real-world test. A separate judge model scores compressed vs. original responses without knowing which is which.
✓ Cross-validated by three independent judges (GPT-4o, Gemini, and Grok-3) across all 200 prompts. Inter-judge agreement: 74% exact match, 99% within one point. Cross-judge corrected mean: 3.88 / 5.0. Full results and per-provider breakdown at methodology: cross-judge validation.
Blind-tested across 200 prompts × 4 providers: a separate AI judged compressed vs. original responses without knowing which was which. Equivalence is scored on a 1–5 scale, where 5 means indistinguishable and 4 means "substantively equivalent — a customer would accept either response." Mean equivalence above 3.5 on this scale indicates the compressed response reliably passes blind substitution. Scores cluster in the 3.5–4.6 range across providers because compression is probabilistic, not lossless — we publish the real distribution rather than round up. All four cohorts (instruction, history, mixed, RAG) are now quality-judged; the RAG cohort shows a cross-provider mean of 4.35 / 5.0 across 175 validations (Gemini ran at n=25 due to a parser issue, since resolved). Savings are consistent and repeatable for the same prompt under equivalent configuration. Grok token counts in our benchmark runs used a documented approximation — see methodology for per-model details.
All tests run on current production models. No cherry-picking. Full methodology and the per-prompt judge output: analyze/methodology.html.
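As a sketch of the blind protocol — the judge model, prompt wording, and scale anchors below are illustrative; the exact harness is in the methodology:

```python
import random
from openai import OpenAI

judge = OpenAI()  # a judge model separate from the one that produced the responses

def judge_equivalence(prompt: str, original_resp: str, compressed_resp: str) -> int:
    """Return a 1-5 equivalence score. Presentation order is randomized so the
    judge never knows which response came from the compressed prompt."""
    pair = [original_resp, compressed_resp]
    random.shuffle(pair)
    verdict = judge.chat.completions.create(
        model="gpt-4o",  # illustrative judge choice
        messages=[{
            "role": "user",
            "content": (
                f"Prompt:\n{prompt}\n\nResponse A:\n{pair[0]}\n\n"
                f"Response B:\n{pair[1]}\n\n"
                "Rate how interchangeable the two responses are, 1-5 "
                "(5 = indistinguishable, 4 = substantively equivalent). "
                "Answer with the number only."
            ),
        }],
    )
    return int(verdict.choices[0].message.content.strip())
```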
Figures below are estimated annual savings and use the four-cohort blended median (30%). The calculator above defaults to the Conservative figure (10%, p25) for a lower-bound estimate — toggle the Blended view to apply this 30% median to your spend. Actual savings depend on workload mix — RAG-heavy prompts save more, mixed-structure prompts save less.
| Monthly API spend | Conservative (10%, p25) | Median (30%, blended) | Optimistic (40%, blended p75) |
|---|---|---|---|
| $1,000/mo | $1,200/yr | $3,600/yr | $4,800/yr |
| $5,000/mo | $6,000/yr | $18,000/yr | $24,000/yr |
| $25,000/mo | $30,000/yr | $90,000/yr | $120,000/yr |
| $100,000/mo | $120,000/yr | $360,000/yr | $480,000/yr |
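The table's arithmetic, for checking against your own spend (rates as above):

```python
def annual_savings(monthly_spend: float, savings_rate: float) -> float:
    """Estimated annual savings = monthly spend x 12 x savings rate."""
    return monthly_spend * 12 * savings_rate

annual_savings(5_000, 0.30)   # 18_000.0 -> the $18,000/yr Median cell
annual_savings(25_000, 0.10)  # 30_000.0 -> the $30,000/yr Conservative cell
```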
Results across 200 unique prompts (one deterministic savings number per prompt, 4 providers verifying each) with 775 blind-judged quality validations across all four cohorts. Gemini RAG validations ran at n=25 instead of 50 because of a parser issue, since resolved — the other three providers ran the full 50 RAG validations.
Conversation history: 33% median. Instruction-heavy workloads: 28% median. RAG retrieval context: 32–38% depending on provider (median ~35%) — blind equivalence 4.35 / 5.0 on 175 validations. Dense structured traffic: 7–10%. Savings depend on your workload mix and are consistent and repeatable for the same prompt under equivalent configuration. For Grok, our benchmark runs used a documented approximation — see methodology for per-model details. Quality preservation varies by provider.
The 7–38% range above spans all measured production workload classes. Document-processing prompts — where a single prompt carries a full document — compress further: 44–81% depending on document type, versus 7–38% for mixed workloads. These figures come from a separate domain-specific benchmark: 200 samples per domain, public-domain sources, reproducible at seed=42.
81% mean token reduction
Measured on public SEC EDGAR filings — annual reports, 10-K exhibits, and quarterly disclosures. p25: 76% · p50: 84% · p75: 88%. Financial filings carry dense repetitive structure — boilerplate disclosures, repeated header blocks, and standardized exhibit language — which is why reduction rates here run significantly higher than general production traffic.
44% mean token reduction
Measured on Federal Register rules and proposed rulemakings. p25: 45% · p50: 46% · p75: 48% · p95: 51%. The p25–p95 central distribution spans only 6 points — most documents compress in a tight band. The arithmetic mean (44%) falls slightly below p25 because a small number of long regulatory documents with lower compressibility pull the average down; the central distribution is the better planning estimate for most workloads.
Reproducible benchmark · seed=42 · 200 samples per domain · public-domain sources (SEC EDGAR, Federal Register). These numbers come from document-processing workloads. General LLM traffic will produce different results. We don't know yours yet. Run the trial. Domain coverage is expanding — financial and regulatory are the first two measured; additional document types are in progress. Methodology →
Your billing dashboard shows how many tokens you sent, how many tokens we compressed them to, and what that saved you in dollars. Your bill is 30% of that dollar savings on Pro, 20% on Enterprise, zero on Free. No compression, no fee.
No. Your messages pass through the proxy to reach the AI provider. Our audit and billing logs contain metadata only — token counts, timestamps, provider, and billing meters. Prompt and response content is not persisted beyond the compression step. Retention windows, sub-processors, and data-subject rights are enumerated in our Data Processing Agreement.
Sometimes. We ran 200 prompts across 4 providers — 775 blind evaluations total (Gemini's RAG cohort ran at n=25 rather than 50). Compressed outputs scored 3.88/5.0 on average. We recommend running your own evaluation before committing. Benchmark protocol →
The compression adds a few milliseconds. The provider's response time dominates. Shorter prompts can produce faster provider responses — fewer tokens to process.
Ordica's compression is deterministic on our tested corpus — same input, byte-identical output. Cache keys stay stable. In our test harness, a compressed prompt produced a full read-hit on the second call: the second call's Anthropic cache_read_input_tokens matched the first call's cache_creation_input_tokens. Validate your own prompt shapes. cache_control markers are left untouched. Provider-side cache hits are unaffected.
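To validate cache stability on your own prompt shapes, a check like this sketch works. The model id and system text are placeholders, and the cached block must exceed Anthropic's minimum cacheable length:

```python
import os
from anthropic import Anthropic

os.environ["ANTHROPIC_BASE_URL"] = "https://api.ordica.ai"
client = Anthropic()  # uses your existing ANTHROPIC_API_KEY

def call():
    return client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model id
        max_tokens=128,
        system=[{
            "type": "text",
            "text": "You are a support agent. " + "Policy clause. " * 600,
            "cache_control": {"type": "ephemeral"},  # passed through untouched
        }],
        messages=[{"role": "user", "content": "Summarize the refund policy."}],
    )

first, second = call(), call()
# Full read-hit: the second call reads exactly what the first call wrote.
assert second.usage.cache_read_input_tokens == first.usage.cache_creation_input_tokens
```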
If compression confidence is insufficient, the prompt passes through unchanged. You lose the savings on that request. You do not get a degraded response. Worst case is zero savings, not worse quality.
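Conceptually, the gate works like this sketch — the compressor and threshold here are toy stand-ins, not the production engine:

```python
CONFIDENCE_FLOOR = 0.9  # illustrative threshold

def compress(prompt: str) -> tuple[str, float]:
    """Stand-in compressor: collapse whitespace runs and report a toy confidence."""
    compressed = " ".join(prompt.split())
    confidence = 0.95 if len(compressed) < len(prompt) else 0.0
    return compressed, confidence

def forward(prompt: str) -> str:
    compressed, confidence = compress(prompt)
    if confidence < CONFIDENCE_FLOOR:
        return prompt      # pass-through: zero savings, zero fee, quality unchanged
    return compressed      # safe to compress: billed as a share of the savings
```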
Billing runs on token counts. For every request: tokens you sent, tokens we forwarded. The difference is your savings. Your bill is a fixed percentage of that. Stripe receipts show the metered total. Your account shows per-request counts. Skipped compression contributes zero. Enterprise: full audit export on request.
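A worked sketch of that arithmetic — the per-token price is illustrative; your provider's rates apply:

```python
def request_fee(tokens_sent: int, tokens_forwarded: int,
                price_per_token: float, fee_rate: float = 0.30) -> float:
    """Fee = fee_rate x dollar savings. Pass-throughs save nothing and cost nothing."""
    dollar_savings = (tokens_sent - tokens_forwarded) * price_per_token
    return fee_rate * dollar_savings

# A 10,000-token prompt forwarded as 7,000 tokens at $2.50 per 1M input tokens:
fee = request_fee(10_000, 7_000, 2.50 / 1_000_000)
print(f"${fee:.6f}")  # $0.002250 on Pro (30%); you keep the other 70% of the savings
```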
GPT-4o and Claude handle general workloads. Gemini has the largest native context window. Grok is fast on short prompts. Run the analyzer. It will tell you which one scores best on your specific prompt type.
Yes. The Free tier gives you the drop-in proxy endpoint — point your existing OpenAI or Anthropic SDK at Ordica with one environment variable — 10,000 requests per month, and a savings dashboard. No credit card required. You get the same compression engine as paid tiers.
We charge a percentage of your measured savings — 30% on Pro, 20% on Enterprise. If compression saves you $100, you keep $70 (Pro) or $80 (Enterprise). If it saves you nothing, you pay nothing. Your dashboard shows every dollar in real time. No flat fees, no minimums.
Your provider API key transits the Ordica proxy in-memory. We use it to authenticate with the provider you chose — OpenAI, Anthropic, xAI, or Google — on your behalf for that single request, then discard it when the request completes. We do not retain it, log it, store it in a database, or share it. The key is used only to reach the provider endpoint you designated; it never leaves that path.
While your request is in flight, the key is held in process memory on our proxy layer. Ordica is not a secrets vault or KMS. You hold the key in your code or secret manager, and you can rotate it through your provider's dashboard to revoke access.
Routing your own API key through a middleware layer is a standard pattern — every major provider (OpenAI, Anthropic, xAI, Google) builds an API ecosystem around SDK wrappers, gateways, observability proxies, and caching layers. Ordica fits within that pattern: you remain the authorized account holder, your requests go only to providers you designate, and we do not resell, pool, rent, or share API access across customers. We do not retain prompt or response content after each request completes.
Ordica fits within what each provider permits. What providers prohibit — multi-tenant key pooling, API-key resale, unauthorized account sharing — is not what we do. Provider Terms change. Review yours. For enterprise documentation — DPAs, subprocessor list, data-flow diagrams — contact us.
We route to any OpenAI-compatible endpoint. If one provider changes pricing or access, traffic moves to another. Your integration does not change. We do not depend on the continued goodwill of any single provider.
We do not cache responses. We do not match prompts. Every request goes to the model and comes back fresh. We reduce what you send — we do not decide what counts as equivalent. There are no stored answers to return incorrectly.
Every request produces two token counts: what you sent, and what we forwarded. The difference is your savings. Both are logged. You can audit them.
Free tier is open globally. Pro and Enterprise require a US billing address.