7% to 50%, depending on what you send. Paste a prompt and find out. Benchmark →
Token counts are from the vendor's tokenizer. Savings estimates come from cohort matching against a blind-judged benchmark — the compression engine never runs on your input here. Your prompt content is not logged. Full audit policy →
Legal: Terms of Service · Privacy Policy · Data Processing Agreement (request a signed copy at legal@ordica.ai)
RAG retrieval: 42.8–55.3% (median 49.8%). Conversation history: median 35.5%. Instruction-heavy prompts: median 28%. Dense structured traffic: 7–10%. Full breakdown →
Set one environment variable
OPENAI_BASE_URL or ANTHROPIC_BASE_URL
= https://api.ordica.ai
Keep your existing provider API key
We forward it to your chosen provider
and don't retain it after the request
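A minimal sketch of the setup above, assuming the official OpenAI Python SDK (the commented-out client call and model name are illustrative, not part of the proxy API):

```python
import os

# Point the SDK at the proxy before constructing a client; the official
# OpenAI and Anthropic SDKs read these variables at construction time.
os.environ["OPENAI_BASE_URL"] = "https://api.ordica.ai"
os.environ["ANTHROPIC_BASE_URL"] = "https://api.ordica.ai"

# Your existing provider key stays as-is; it is forwarded to the
# provider and not retained after the request.
# from openai import OpenAI
# client = OpenAI()  # picks up OPENAI_API_KEY and OPENAI_BASE_URL
# client.chat.completions.create(model="gpt-4o-mini", ...)

print(os.environ["OPENAI_BASE_URL"])
```

No other code changes are required; every request the client makes now routes through the proxy.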
Compressed when savings confidence clears the threshold.
Passed through unchanged when it does not. You do not get a worse response. You get no savings on that request.
xAI users: the OpenAI SDK already works — same env var, your Grok API key. Gemini users: pass http_options={"base_url":"https://api.ordica.ai/gemini"} to the google-genai client, or contact us for deeper integration support.
Your prompts move through the proxy in memory, get compressed, and go to the provider you designated. We do not store them, log them, read them, or train on them. The only thing we retain is counts: tokens in, tokens out, which provider you used.
Billing runs on those counts alone. Prompt content is not persisted beyond the compression step — our audit logs contain metadata only: token counts, timestamps, billing meters. Retention and sub-processor details are enumerated in our Data Processing Agreement.
Same proxy endpoint. Same API key. Different cost line on your bill.
Every message is a real-world test. A separate judge model scores compressed vs. original responses without knowing which is which.
Blind-judged 3.88/5.0 mean across 775 quality evaluations — 200 prompts × 4 providers, single judge blind to which response was compressed. Per-provider breakdown →
Document-processing workloads compress further: financial filings at 81% mean reduction, regulatory documents at 44%. Domain benchmarks →
Your billing dashboard shows how many tokens you sent, how many tokens we compressed them to, and what that saved you in dollars. Your bill is 30% of that dollar savings on Pro, 20% on Enterprise, zero on Free. No compression, no fee.
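A hypothetical worked example of the Pro fee — all figures here (token counts, the $5-per-million input price) are illustrative assumptions, not real provider pricing:

```python
# Hypothetical Pro-tier billing example. Token counts and the
# $5-per-million input price are illustrative, not a quote.
tokens_sent = 1_000_000
tokens_after_compression = 600_000     # 40% reduction on this traffic
price_per_million_input = 5.00         # assumed provider input price, USD

dollar_savings = (tokens_sent - tokens_after_compression) / 1e6 * price_per_million_input
pro_fee = 0.30 * dollar_savings        # Pro tier: 30% of dollar savings
net_savings = dollar_savings - pro_fee # what stays in your pocket

print(f"saved ${dollar_savings:.2f}, fee ${pro_fee:.2f}, net ${net_savings:.2f}")
```

On the Free tier the fee term is zero; on Enterprise the multiplier is 0.20. If nothing compresses, `dollar_savings` is zero and so is the fee.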
Sometimes. We ran 200 prompts across 4 providers — 775 blind quality judgments total. Compressed outputs scored 3.88/5.0 on average. We recommend running your own evaluation before committing. Benchmark protocol →
The compression adds a few milliseconds. The provider's response time dominates. Shorter prompts can produce faster provider responses — fewer tokens to process.
Ordica's compression is deterministic on our tested corpus — same input, byte-identical output. Cache keys stay stable. In our test harness, a compressed prompt produced a full read-hit on the second call (Anthropic cache_read_input_tokens equal to cache_creation_input_tokens). Validate your own prompt shapes. cache_control markers are left untouched. Provider-side cache hits are unaffected.
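To illustrate where the untouched marker sits, here is a sketch of an Anthropic-style request body with a prompt-caching block (the model name and document text are placeholders):

```python
# Sketch of an Anthropic-style request body with a prompt-caching marker.
# DOCUMENT_TEXT stands in for your long, stable context (e.g. retrieved
# documents) — the part that benefits from provider-side caching.
DOCUMENT_TEXT = "…long retrieved context…"

request_body = {
    "model": "claude-sonnet-4",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": DOCUMENT_TEXT,
            # The proxy compresses the text but leaves this marker in
            # place; because compression is deterministic, the compressed
            # prefix is byte-identical across calls and the provider's
            # cache key stays stable.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Answer from the context."}],
}

print(request_body["system"][0]["cache_control"])
```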
If compression confidence is insufficient, the prompt passes through unchanged. You lose the savings on that request. You do not get a degraded response. Worst case is zero savings, not worse quality.
No. Your messages pass through the proxy to reach the AI provider. Our audit and billing logs contain metadata only — token counts, timestamps, provider, and billing meters. Prompt and response content is not persisted beyond the compression step. Retention windows, sub-processors, and data-subject rights are enumerated in our Data Processing Agreement.
Free tier: open globally. Pro and Enterprise: US billing address required.