Cut your LLM bill by 7–40%, depending on your workload.

Conversation history: median 35.5%, max 39.7%. Long instructions: median 28.0%, max 32.9%. Mixed production traffic: 7–10%. Blind-tested across 200 prompts × 4 providers — ranges published by workload type, not averaged into a single marketing number.

Your monthly LLM spend ($/month):
Gross savings: $6,625/month ($79,500 cut from your annual LLM bill)
Pro tier (you keep 70%): $4,638/mo · $55,650/year
Enterprise (you keep 80%): $5,300/mo · $63,600/year
Based on 200 prompts × 4 providers (GPT-4o, Claude, Grok, and Gemini). Savings are deterministic per prompt when measured with each provider's native tokenizer; Grok uses OpenAI's cl100k_base as a documented approximation — see methodology. Typical range 7.0% to 39.7% depending on workload. Ordica only profits when you save. See pricing →
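The calculator above is simple arithmetic; here is a minimal sketch of that math in Python. The 70%/80% keep rates come from the pricing tiers below, and the $6,625 gross figure is the example shown in the calculator; the function itself is illustrative, not part of the SDK.

```python
def savings_breakdown(gross_monthly_savings: float, keep_rate: float):
    """Return (monthly, annual) net savings for a tier's keep rate."""
    monthly = gross_monthly_savings * keep_rate
    return monthly, monthly * 12

# Example figures from the calculator: $6,625/month gross savings.
pro_monthly, pro_annual = savings_breakdown(6625, 0.70)  # Pro: keep 70%
ent_monthly, ent_annual = savings_breakdown(6625, 0.80)  # Enterprise: keep 80%
# Pro ≈ $4,637.50/mo and $55,650/yr; Enterprise ≈ $5,300/mo and $63,600/yr
```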
# Before
from openai import OpenAI

# After — one import changes everything
from ordica import OpenAI
VERIFIED SAVINGS BY WORKLOAD
Your savings depend on what you send us.
Here is the honest range.

Conversation history

median 35.5%, max 39.7%

Multi-turn chat histories carry the most redundancy. When we see a conversation that has grown past 2K tokens, the compression gain is the highest of any workload class we measure.

Instruction-heavy prompts

median 28.0%, max 32.9%

Long system prompts, detailed role definitions, and few-shot exemplars compress well. Typical result for production AI applications with heavy prompt scaffolding.

Mixed structured traffic

7–10%

When your prompts are already dense — short JSON payloads, minimal instructions, tight function-calling — savings are modest. We publish this. Your worst case is still a positive number.

RAG retrieval context

benchmark in progress

We are finishing a dedicated RAG benchmark now. We will publish numbers when we have them, not before.

Token savings measured across 200 prompts × 4 providers (600 savings measurements across three complete cohorts; the RAG cohort is still in progress). Response quality was separately blind-judged by an independent AI on a 1–5 scale; mean equivalence is 4.38/5.0 on OpenAI, 3.96 on Anthropic, 3.78 on Gemini, and 3.76 on Grok. Full methodology →

How it works
Three steps. Sixty seconds.
1

Install

pip install ordica
Set your Ordica API key

2

Import

Change one line of code
from ordica import OpenAI

3

Save

Every API call is compressed
Same responses, fewer tokens

Privacy
Your data is never stored

Ordica compresses prompts before they reach the AI provider. Messages are processed in transit and never stored. Your data is never retained.

The only signal that crosses the wire is anonymous telemetry — token counts and savings percentages. This isn't a policy. It's the architecture.

Your prompt → Ordica proxy (compressed) → AI provider
Token counts → Ordica telemetry (anonymized)
Message content → never retained in our storage
Compatible
Works with the AI you use

Optimized for each provider. Switch between them freely.

ChatGPT
Claude
Gemini
Grok
Validated
Measured, not promised

Every message is a real-world test. An independent AI judges compressed vs. original responses without knowing which is which.

ChatGPT: 4.38 / 5.0 mean equivalence (instruction cohort 4.78, mixed cohort 4.62)
Claude: 3.96 / 5.0 mean equivalence (instruction cohort 4.10, mixed cohort 4.10)
Gemini: 3.78 / 5.0 mean equivalence (instruction cohort 3.94, mixed cohort 4.34)
Grok: 3.76 / 5.0 mean equivalence (instruction cohort 4.06, mixed cohort 3.58)
200 prompts × 4 providers validated · 3.97/5.0 mean equivalence · patent pending

Blind-tested across 200 prompts × 4 providers: an independent AI judged compressed vs. original responses without knowing which was which. Equivalence = how closely the compressed response matches the original, on a 1–5 scale. Per-provider means shown are averaged across the three complete cohorts (instruction, history, mixed); the RAG cohort is still being benchmarked. Savings are deterministic per prompt when measured with each provider's native tokenizer; Grok uses OpenAI's cl100k_base as a documented approximation. Quality preservation varies by provider.
All tests run on current production models. No cherry-picking. Full methodology: analyze/methodology.html.

Projected annual savings

Savings are deterministic per prompt when measured with each provider's native tokenizer; Grok uses OpenAI's cl100k_base as a documented approximation. Each column applies one blended savings rate: conservative (p25, 9.6%), median (28.0%), and optimistic (p75, 32.4%). Actual savings depend on workload mix.

Monthly API spend   Conservative (blended p25, 9.6%)   Median (28.0%)   Optimistic (blended p75, 32.4%)
$1,000/mo           $1,152/yr                          $3,360/yr        $3,888/yr
$5,000/mo           $5,760/yr                          $16,800/yr       $19,440/yr
$25,000/mo          $28,800/yr                         $84,000/yr       $97,200/yr
$100,000/mo         $115,200/yr                        $336,000/yr      $388,800/yr
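Every cell in the table reduces to one formula: annual savings = monthly spend × 12 × savings rate. A minimal sketch of that formula (the rates are the blended p25/median/p75 figures above; the function is illustrative, not Ordica's billing code):

```python
def annual_savings(monthly_spend: float, savings_rate: float) -> float:
    """Projected annual savings in dollars."""
    return monthly_spend * 12 * savings_rate

# Reproducing the $25,000/mo row of the table:
# conservative: annual_savings(25_000, 0.096)  ≈ $28,800
# median:       annual_savings(25_000, 0.280)  ≈ $84,000
# optimistic:   annual_savings(25_000, 0.324)  ≈ $97,200
```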
Validated
Savings by workload type

Results across 200 prompts × 4 providers: 600 savings measurements, plus 800 quality validations blind-judged by an independent AI.

Conversation history: 31.3–39.7% (median 35.5%, p75 39.3%). Instruction-heavy workloads: 21.9–32.9% (median 28.0%, p75 29.8%). Dense structured traffic: 7.0–10.3% (median 8.7%). Savings depend on your workload mix and are deterministic per prompt when measured with each provider's native tokenizer; for Grok, which does not publish a tokenizer, we use OpenAI's cl100k_base as a documented approximation — results are near-identical across providers but not bit-identical. Quality preservation varies by provider.

Data-gap disclosure: the RAG-pruning cohort is still being benchmarked and is not represented in these figures. RAG workloads are expected to compress well but are not yet measured; we will publish the numbers when the benchmark closes.
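Because savings depend on your workload mix, a blended estimate can be sketched from the published medians. The mix fractions are yours to supply; the medians come from the benchmark figures above, and the helper itself is illustrative, not part of the SDK.

```python
# Published benchmark medians by workload class (RAG pending, so excluded).
MEDIAN_SAVINGS = {
    "conversation_history": 0.355,
    "instruction_heavy": 0.280,
    "mixed_structured": 0.087,
}

def blended_savings_rate(mix: dict) -> float:
    """mix maps workload class -> fraction of token spend (fractions sum to 1)."""
    return sum(frac * MEDIAN_SAVINGS[cls] for cls, frac in mix.items())

# e.g. 60% chat history, 30% instruction-heavy, 10% dense structured traffic:
rate = blended_savings_rate(
    {"conversation_history": 0.6, "instruction_heavy": 0.3, "mixed_structured": 0.1}
)
# rate ≈ 0.306, i.e. roughly a 30.6% blended savings rate
```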

Pricing
You only pay when we save you money

Your billing dashboard shows how many tokens you sent, how many tokens we compressed them to, and what that saved you in dollars. Your bill is 30% of that dollar savings on Pro, 20% on Enterprise, zero on Free. No compression, no fee.
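The dashboard arithmetic above can be sketched as follows. The per-token price and the dict field names here are illustrative assumptions, not Ordica's actual schema; only the 30%/20%/0% fee rates come from the pricing copy.

```python
def monthly_bill(tokens_sent: int, tokens_after: int,
                 price_per_1k_tokens: float, fee_rate: float) -> dict:
    """fee_rate: 0.30 on Pro, 0.20 on Enterprise, 0.0 on Free."""
    saved_tokens = tokens_sent - tokens_after
    dollar_savings = saved_tokens / 1000 * price_per_1k_tokens
    return {
        "saved_tokens": saved_tokens,
        "dollar_savings": dollar_savings,
        "your_bill": dollar_savings * fee_rate,
        "you_keep": dollar_savings * (1 - fee_rate),
    }

# 10M input tokens compressed to 7.2M at an assumed $0.0025/1K price, on Pro:
bill = monthly_bill(10_000_000, 7_200_000, 0.0025, 0.30)
# dollar_savings ≈ $7.00, your_bill ≈ $2.10, you_keep ≈ $4.90
```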

Free
$0
Real savings. No credit card. No catch.

  • SDK access (pip install ordica)
  • GPT-4o, Claude, Gemini, Grok
  • Conversation-history workloads: 35.5% median (p25 32.0%, p75 39.3%)
  • Up to 10K requests/month
  • Savings dashboard
  • Community support
What you get: Drop-in compression. Token savings measured by provider tokenizers across 200 prompts × 4 providers (600 savings measurements across three complete cohorts; RAG cohort in progress). Response-quality preservation separately blind-judged by an independent AI on a 1–5 scale — see full methodology.
Enterprise
20% of your measured savings
Custom-tuned for your prompts. Maximum savings.

  • Everything in Pro
  • Custom optimization profile
  • Air-gapped deployment
  • Direct access to the founder
  • Contracted SLA with service credits
  • Unlimited requests
Why 20% not 30%: At enterprise scale, your usage helps us improve optimization for all customers. The lower rate is earned, not negotiated.
Automatic monthly renewal — please read.
By starting a paid plan, you authorize Ordica LLC to charge your payment method on a recurring monthly basis until you cancel. Your subscription automatically renews at the then-current price at the end of each monthly term. You may cancel at any time from your account dashboard or by emailing billing@ordica.ai; cancellation takes effect at the end of the current billing period and stops further charges. Refund and cancellation terms are set out in our Terms.
Questions
Frequently asked

Do you store or read my messages?

No. Your messages pass through the proxy to reach the AI provider, but we never store, log, or read them. The only data we keep is anonymous counts — how many tokens were sent, how many were saved, and which provider you used. There is no database column for message content. It doesn't exist in our system.

What does the compression actually remove?

The engine operates on redundant structure, not on the semantic payload. Static boilerplate, repeated preamble, and syntactically equivalent phrasing are candidates for compression; unique instructions, structured schema fields, and the operative content of the request are preserved. A confidence-gated fail-safe passes the prompt through uncompressed when the engine cannot apply its transforms safely. Validated blind across 200 prompts × 4 providers with an independent AI judge scoring response equivalence on a 1–5 scale: mean equivalence is 4.38/5.0 on OpenAI, 3.96 on Anthropic, 3.78 on Gemini, and 3.76 on Grok. Per-provider means are published on the methodology page.
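The confidence-gated fail-safe can be pictured like this. This is a sketch of the behavior described above: `safe_compress`, the `compress` callable, and the threshold value are hypothetical names for illustration, not the actual engine.

```python
def safe_compress(prompt: str, compress, confidence_threshold: float = 0.9) -> str:
    """Apply compression only when the engine is confident; else pass through."""
    compressed, confidence = compress(prompt)
    if confidence >= confidence_threshold:
        return compressed
    return prompt  # fail-safe: worst case is zero savings, never worse quality
```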

Does Ordica add latency?

The compression adds a few milliseconds — you won't notice it. Whether you're using ChatGPT, Claude, Gemini, or Grok, the provider's response time is what you feel, and that's unchanged. In some cases, shorter prompts actually get faster responses because the AI has less to process.

Does compression break provider-side prompt caching?

No. Ordica's compression is deterministic — the same input produces byte-identical output every time — which is the exact property Anthropic's cache_control and OpenAI's automatic prefix cache rely on. We verified end-to-end against Claude Sonnet 4.5: a prompt sent through Ordica's compression pipeline shows cache_creation_input_tokens > 0 on the first call and cache_read_input_tokens > 0 on the second, confirming the cache hit. If you're already using provider-side caching, Ordica's compression does not interfere with it — you keep your cache hits.

OpenAI's automatic prefix cache relies on the same byte-identical property, so we expect the same result — but we haven't run the empirical OpenAI test yet. If you need that result before adopting, ask us and we'll run it. Cost: under a dollar.

For customers whose prompts already include cache_control markers: Ordica is designed to leave those markers untouched, verified in code review. An empirical test of that pass-through path is coming.
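The two-call verification described above can be checked mechanically from the usage block a response carries. `cache_creation_input_tokens` and `cache_read_input_tokens` are Anthropic's documented usage field names; the helper itself is just an illustration, not part of the SDK.

```python
def cache_status(usage: dict) -> str:
    """Classify caching from the usage fields of an Anthropic Messages response."""
    if usage.get("cache_read_input_tokens", 0) > 0:
        return "cache hit"       # expected on the second identical call
    if usage.get("cache_creation_input_tokens", 0) > 0:
        return "cache created"   # expected on the first call
    return "no caching"
```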

What happens if a prompt can't be compressed safely?

The system is fail-safe. If compression can't be applied confidently, your prompt goes through untouched — you just don't save tokens on that message. You'll never get a broken response because of compression. Worst case is zero savings, not worse quality.

Which AI provider should I use?

ChatGPT (GPT-4o) is a strong all-rounder — great for general questions, writing, and brainstorming. Claude is known for natural writing and careful, thoughtful responses. Gemini has the deepest reasoning and largest context window. Grok is fast and conversational with less filtering. Try all four and see which one clicks for you.

Is there a free plan?

Yes. The Free tier gives you SDK access, 10,000 requests per month, and a savings dashboard — no credit card required. You get the same compression technology as paid tiers. When you're ready for higher limits and advanced optimization, Pro and Enterprise are there.

How does pricing work?

We charge a percentage of your measured savings — 30% on Pro, 20% on Enterprise. If compression saves you $100, you keep $70 (Pro) or $80 (Enterprise). If it saves you nothing, you pay nothing. Your dashboard shows every dollar in real time. No flat fees, no minimums, no surprises.

Ready to compress without compromise?

We're onboarding customers in small batches. Request access and we'll get you set up.
