Conversation history: median 35.5%, max 39.7%. Long instructions: median 28.0%, max 32.9%. Mixed production traffic: 7–10%. Blind-tested across 200 prompts × 4 providers — ranges published by workload type, not averaged into a single marketing number.
median 35.5%, max 39.7%
Multi-turn chat histories carry the most redundancy. Once a conversation grows past 2K tokens, its compression gain is the highest of any workload class we measure.
median 28.0%, max 32.9%
Long system prompts, detailed role definitions, and few-shot exemplars compress well. Typical result for production AI applications with heavy prompt scaffolding.
7–10%
When your prompts are already dense — short JSON payloads, minimal instructions, tight function-calling — savings are modest. We publish this. Your worst case is still a positive number.
benchmark in progress
We are finishing a dedicated RAG benchmark now. We will publish numbers when we have them, not before.
Token savings measured across 200 prompts × 4 providers (600 savings measurements across three complete cohorts; the RAG cohort is still in progress). Response quality was separately blind-judged by an independent AI on a 1–5 scale; mean equivalence exceeds 4.0/5.0 on OpenAI and Anthropic and 3.7/5.0 on Gemini and Grok. Full methodology →
pip install ordica
Set your Ordica API key
Change one line of code
from ordica import OpenAI
Every API call is compressed
Same responses, fewer tokens
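The four steps above, as a minimal sketch. It assumes Ordica's client mirrors the OpenAI SDK's `chat.completions` interface and that the API key lives in an `ORDICA_API_KEY` environment variable (the exact variable name is an assumption); the model name and messages are placeholders:

```python
# Step 1: pip install ordica
# Step 2: export ORDICA_API_KEY=...   (exact variable name is an assumption)
# Step 3: the one-line change: swap the import, keep the rest of your code.
from ordica import OpenAI  # was: from openai import OpenAI

client = OpenAI()

# Step 4: every call below is compressed in transit before it reaches
# the provider; responses come back unchanged.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize our refund policy in two lines."},
    ],
)
print(response.choices[0].message.content)
```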
Ordica compresses prompts before they reach the AI provider. Messages are processed in transit; they are never stored or retained.
The only signal that crosses the wire is anonymous telemetry — token counts and savings percentages. This isn't a policy. It's the architecture.
Optimized for each provider. Switch between them freely.
Every message is a real-world test. An independent AI judges compressed vs. original responses without knowing which is which.
Blind-tested across 200 prompts × 4 providers: an independent AI judged compressed vs. original responses without knowing which was which. Equivalence = how closely the compressed response matches the original, on a 1–5 scale. Per-provider means shown are averaged across the three complete cohorts (instruction, history, mixed); the RAG cohort is still being benchmarked. Savings are deterministic per prompt when measured with each provider's native tokenizer; Grok uses OpenAI's cl100k_base as a documented approximation. Quality preservation varies by provider.
All tests run on current production models. No cherry-picking. Full methodology: analyze/methodology.html.
Savings are deterministic per prompt when measured with each provider's native tokenizer; Grok uses OpenAI's cl100k_base as a documented approximation. The table below shows estimated annual savings at the blended p25 (9.6%), median (28.0%), and p75 (32.4%) rates. Actual savings depend on your workload mix.
| Monthly API spend | Conservative annual savings (blended p25, 9.6%) | Median annual savings (28.0%) | Optimistic annual savings (blended p75, 32.4%) |
|---|---|---|---|
| $1,000 /mo | $1,152 /yr | $3,360 /yr | $3,888 /yr |
| $5,000 /mo | $5,760 /yr | $16,800 /yr | $19,440 /yr |
| $25,000 /mo | $28,800 /yr | $84,000 /yr | $97,200 /yr |
| $100,000 /mo | $115,200 /yr | $336,000 /yr | $388,800 /yr |
Results from 200 prompts × 4 providers: 600 savings measurements across three complete cohorts, plus 800 quality validations blind-judged by an independent AI.
Conversation history: 31.3–39.7% (median 35.5%, p75 39.3%). Instruction-heavy workloads: 21.9–32.9% (median 28.0%, p75 29.8%). Dense structured traffic: 7.0–10.3% (median 8.7%). Savings depend on your workload mix and are deterministic per prompt when measured with each provider's native tokenizer; for Grok, which does not publish a tokenizer, we use OpenAI's cl100k_base as a documented approximation — results are near-identical across providers but not bit-identical. Quality preservation varies by provider.
Data-gap disclosure: the RAG-pruning cohort is still being benchmarked and is not represented in these figures. RAG workloads are expected to compress well but are not yet measured; we will publish the numbers when the benchmark closes.
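The table's arithmetic is simple enough to check yourself. A small sketch using the published benchmark rates; the function name is illustrative, and your actual rate depends on workload mix:

```python
# Estimated annual savings from monthly API spend, using the published
# benchmark rates (blended p25 / median / blended p75).
RATES = {"p25": 0.096, "median": 0.280, "p75": 0.324}

def annual_savings(monthly_spend: float, rate: float) -> float:
    """Dollars saved per year at a given compression savings rate."""
    return monthly_spend * 12 * rate

for label, rate in RATES.items():
    # Matches the $5,000/mo row of the table above.
    print(f"$5,000/mo at {rate:.1%}: ${annual_savings(5_000, rate):,.0f}/yr")
```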
Your billing dashboard shows how many tokens you sent, how many tokens we compressed them to, and what that saved you in dollars. Your bill is 30% of that dollar savings on Pro, 20% on Enterprise, zero on Free. No compression, no fee.
No. Your messages pass through the proxy to reach the AI provider, but we never store, log, or read them. The only data we keep is anonymous counts — how many tokens were sent, how many were saved, and which provider you used. There is no database column for message content. It doesn't exist in our system.
The engine operates on redundant structure, not on the semantic payload. Static boilerplate, repeated preamble, and syntactically equivalent phrasing are candidates for compression; unique instructions, structured schema fields, and the operative content of the request are preserved. A confidence-gated fail-safe passes the prompt through uncompressed when the engine cannot apply its transforms safely. Validated blind across 200 prompts × 4 providers with an independent AI judge scoring response equivalence on a 1–5 scale: mean equivalence ≥4.0 on OpenAI and Anthropic, ≥3.7 on Gemini and Grok. Per-provider means are published on the methodology page.
The compression adds a few milliseconds — you won't notice it. Whether you're using ChatGPT, Claude, Gemini, or Grok, the provider's response time is what you feel, and that's unchanged. In some cases, shorter prompts actually get faster responses because the AI has less to process.
No. Ordica's compression is deterministic — the same input produces byte-identical output every time — which is the exact property Anthropic's cache_control and OpenAI's automatic prefix cache rely on. We verified end-to-end against Claude Sonnet 4.5: a prompt sent through Ordica's compression pipeline shows cache_creation_input_tokens > 0 on the first call and cache_read_input_tokens > 0 on the second, confirming the cache hit. If you're already using provider-side caching, Ordica's compression does not interfere with it — you keep your cache hits.
OpenAI's automatic prefix cache relies on the same byte-identical property, so we expect the same result — but we haven't run the empirical OpenAI test yet. If you need that result before adopting, ask us and we'll run it. Cost: under a dollar.
For customers whose prompts already include cache_control markers: Ordica is designed to leave those markers untouched, verified in code review. An empirical test of that pass-through path is coming.
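The Anthropic verification described above can be reproduced with a short script. This is an illustrative sketch, not our test harness: the model id, the cacheable system prompt, and how the client is routed through Ordica are all assumptions here; the `cache_control` block and the `cache_creation_input_tokens` / `cache_read_input_tokens` usage fields are standard Anthropic API features.

```python
import anthropic

# Illustrative sketch: a byte-identical prompt should write Anthropic's
# prompt cache on the first call and read it on the second. Routing
# through Ordica and the model id are assumptions for this example.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def call(system_text):
    return client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=64,
        system=[{
            "type": "text",
            "text": system_text,
            "cache_control": {"type": "ephemeral"},  # mark the prefix cacheable
        }],
        messages=[{"role": "user", "content": "Ping."}],
    )

# The cacheable prefix must exceed the model's minimum (1,024 tokens on Sonnet).
long_system = "You are a meticulous support agent for Acme Co. " * 200

first = call(long_system)
second = call(long_system)

print(first.usage.cache_creation_input_tokens)  # expect > 0: cache written
print(second.usage.cache_read_input_tokens)     # expect > 0: cache hit
```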
The system is fail-safe. If compression can't be applied confidently, your prompt goes through untouched — you just don't save tokens on that message. You'll never get a broken response because of compression. Worst case is zero savings, not worse quality.
ChatGPT (GPT-4o) is a strong all-rounder — great for general questions, writing, and brainstorming. Claude is known for natural writing and careful, thoughtful responses. Gemini has the deepest reasoning and largest context window. Grok is fast and conversational with less filtering. Try all four and see which one clicks for you.
Yes. The Free tier gives you SDK access, 10,000 requests per month, and a savings dashboard — no credit card required. You get the same compression technology as paid tiers. When you're ready for higher limits and advanced optimization, Pro and Enterprise are there.
We charge a percentage of your measured savings — 30% on Pro, 20% on Enterprise. If compression saves you $100, you keep $70 (Pro) or $80 (Enterprise). If it saves you nothing, you pay nothing. Your dashboard shows every dollar in real time. No flat fees, no minimums, no surprises.
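The fee math above, as a tiny sketch. The share rates come straight from the copy (30% Pro, 20% Enterprise, zero on Free); the function names are illustrative:

```python
# Ordica's pricing: a percentage of measured dollar savings, by tier.
FEE_SHARE = {"free": 0.0, "pro": 0.30, "enterprise": 0.20}

def monthly_fee(dollar_savings: float, tier: str) -> float:
    """Fee owed for a month, given measured savings and plan tier."""
    return dollar_savings * FEE_SHARE[tier]

def net_kept(dollar_savings: float, tier: str) -> float:
    """What the customer keeps after Ordica's share."""
    return dollar_savings - monthly_fee(dollar_savings, tier)

# $100 of measured savings: keep $70 on Pro, $80 on Enterprise, $100 on Free.
# $0 of measured savings: fee is $0 on every tier.
```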
We're onboarding customers in small batches. Request access and we'll get you set up.