Smartflowis an enterprise-grade AI proxy, policy engine, and agent gateway.
It sits in front of any LLM provider — OpenAI, Anthropic, Azure, Google, local models — and adds
compliance enforcement, semantic caching, MCP tool orchestration, virtual key budgeting,
intelligent load balancing, and full observability, all without changing your existing API calls.
Core Proxy & Routing
Universal LLM Proxy
Single endpoint for every major LLM provider. Drop-in replacement for OpenAI's
/v1/chat/completions— no client-side changes required.
- OpenAI, Anthropic, Azure, Google, Mistral, Cohere, local GGUF/ONNX
- Streaming (SSE) and synchronous responses
- Per-request model overrides via headers
- Transparent passthrough — existing SDKs work unchanged
Feature Parity
Load Balancing & Fallback Chains
Distribute traffic across provider endpoints with automatic failover.
Named fallback chains stored in Redis — model-level or provider-level granularity.
- Strategies: Round Robin, Weighted, Least Connections, Random, Priority
- Retry with exponential backoff for 429 / 5xx
- Non-retryable 4xx errors bypass retry, move to next target immediately
- Chain CRUD via
/api/routing/fallback-chains
Feature Parity
Advanced Retry Logic
Virtual Key Management
Issue sk-sf-{48-hex} tokens to users and applications
with hard budget caps, rate limits, and automatic period resets.
- Budget periods: Daily, Weekly, Monthly, Lifetime
- Pre-request budget check — 429 with budget headers before any spend
- Post-response spend recording per actual cost
- TPM / RPM rate limit enforcement
- Full CRUD at
/api/enterprise/vkeys
Feature Parity
Key Vault & Provider Key Store
Centrally store and retrieve provider API keys. The proxy never exposes raw keys
to clients — all credentials stay server-side in Redis.
- Intercepts and stores keys from inbound requests
- Per-provider resolution at routing time
- Supports environment variable, Redis, and vault-style references
Feature Parity
Intelligent Caching
Smartflow advantage: LiteLLM supports exact-match caching and semantic caching via an
external Qdrant vector store. Smartflow's entire caching stack is native— no external
vector database required. Semantic matching, adaptive TTL, per-request control, and cost tracking
are built directly into the proxy.
4-Phase MetaCache
Four sequential lookup phases before any token is sent to a provider. The first hit wins:
intent fingerprint → near-miss → SHA-256 exact key → VectorLite BERT KNN.
Responses carryx-smartflow-cache-hit and
x-smartflow-cache-key for client-side correlation.
- Phase 4 — VectorLite BERT KNN:
sentence-transformers/all-MiniLM-L6-v2, 384-dim, cosine similarity ≥ 0.90
- Local BERT inference — no external Qdrant, Weaviate, or Pinecone
- Adaptive TTL based on query volatility
- Per-request opt-out via
x-smartflow-cache: skip
- Cost saved estimated per hit, reported in dashboards
Unique: Native BERT Semantic Cache
MCP Tool-Call Cache
Separately caches MCP tool responses. Identical tool calls with identical parameters
return instantly without re-invoking the MCP server.
- Per-server, per-tool hit statistics
- Selective flush: entire server or single tool
- Cache ping, delete, and alias endpoints
- Cache statistics visible in dashboard
Unique
MCP Gateway
Smartflow advantage:LiteLLM's MCP support is server-list plus tool routing.
Smartflow adds an enterprise control plane: AD-group access control, approval workflows,
OAuth PKCE consent, per-server auth header forwarding, semantic tool search,
and performance-based routing — none of which exist in LiteLLM.
MCP Server Registry
Central registry for all MCP servers. Supports HTTP, SSE, and STDIO transports.
Configuration persisted in Redis and manageable via API or dashboard.
- HTTP, SSE, and STDIO transports
- Auth: API key, OAuth client_credentials, Basic, mTLS, PKCE
- Server aliases for stable routing regardless of URL changes
- Health monitoring with cost and latency tracking
- OpenAPI spec auto-generation per server
Feature Parity
+ PKCE, mTLS, Aliases
Access Control & Approval Workflow
Enterprise-grade gates on every MCP tool call. AD/LDAP group membership drives
allow/deny decisions. Sensitive tools require explicit admin approval before use.
- Per-server and per-tool AD group allow/deny lists
- Catalog of approval-required tool requests
- Approve / deny workflow via dashboard or API
- Every tool call logged with user, server, result, and cost
Unique
Semantic Tool Search
NLP search across every tool on every registered MCP server.
Ask "find a tool that reads files" and get back ranked results
— no need to know which server hosts the tool.
- Embeddings indexed per tool: name + description + parameter names
- Cosine similarity ranking across all servers simultaneously
GET /api/mcp/tools/search?q=...&k=5
POST /api/mcp/tools/reindex triggers full re-index
- Optional server filter for scoped search
New
Unique
OAuth PKCE Browser Consent
User-facing interactive OAuth for MCP servers that require individual user consent
(GitHub, Google Workspace, Slack) — not just machine-to-machine credentials.
- PKCE code_verifier + SHA256 challenge — RFC 7636 compliant
GET /api/mcp/auth/initiate→ browser redirect URL
GET /api/mcp/auth/callback→ token exchange + storage
- User-scoped tokens in Redis, independent per (user, server)
- Pending sessions expire after 10 minutes (configurable)
New
Unique
Per-Server Auth Header Forwarding
Pass server-specific credentials on individual requests without storing them.
Clients send x-mcp-{alias}-* headers;
Smartflow strips them before forwarding to the end LLM.
- Headers scoped by server alias — no credential leakage across servers
- Supports any arbitrary header name per server
- Works across HTTP, SSE, and STDIO transports
New
Unique
A2A Agent Gateway
Smartflow leads here:LiteLLM has a basic A2A prototype.
Smartflow implements the full Google A2A open protocol — Agent Cards, task lifecycle,
SSE streaming, Redis-backed task history, and cross-agent trace headers.
Any A2A-compatible client (LangGraph, Vertex AI, Azure AI Foundry, Bedrock AgentCore)
can connect to a Smartflow agent out of the box.
Agent Registry & Agent Cards
Register named agents in Redis. Each agent has a model, system prompt, optional
MCP tool access, and a machine-readable Agent Card that advertises its capabilities.
- Agent profiles stored in Redis — instant updates, no redeploy
GET /.well-known/agent.json— gateway card listing all agents
GET /a2a/{id}/.well-known/agent.json— per-agent card
- Skills, auth schemes, streaming capability all declared in card
New
A2A Protocol
Task Lifecycle Management
Full A2A task state machine: submitted → working → completed / failed / canceled.
Every task is persisted in Redis with full message history and artifact outputs.
tasks/send— synchronous execution with full Task response
tasks/sendSubscribe— SSE stream of status update events
tasks/get— retrieve task + history + artifacts by ID
tasks/cancel— cancel in-flight tasks
- Task history trimming via
history_length parameter
- 24-hour TTL with per-agent task index in Redis
New
Cross-Agent Tracing
Trace requests across multiple agents using a shared trace ID.
Smartflow propagates and stores the trace context with every task.
X-A2A-Trace-Id header forwarded through the call chain
- Trace ID stored in task metadata for correlation
- Interoperable with LangGraph, Vertex AI, Azure AI Foundry, Bedrock
New
A2A Protocol
A2A Admin API
Full management surface for the agent gateway — create, inspect, and remove
agents without touching config files or restarting the proxy.
GET/POST /api/a2a/agents— list and register agents
GET/DELETE /api/a2a/agents/{id}— inspect or remove
GET /api/a2a/agents/{id}/tasks— recent task history
GET /api/a2a/tasks/{id}— inspect any task by ID
New
Policy & Compliance Engine
Smartflow leads here: LiteLLM has no equivalent. Smartflow's policy engine
classifies applications, enforces usage policies, detects compliance violations in real time,
and archives structured logs to MongoDB and TimescaleDB for audit and BI.
Policy Engine (Maestro)
Real-time policy evaluation on every request and response. Policies are defined,
versioned, and stored in TimescaleDB. The Maestro dashboard provides a unified
policy management interface.
- Input and output content scanning
- Application classification by request pattern
- Policy CRUD at
/api/policy/*
- Violation alerting and audit trail
Unique
Compliance API
Dedicated compliance microservice (compliance_api_server)
for regulated industries. Runs independently for high-availability compliance checking.
- Regulatory framework mapping (GDPR, HIPAA, financial)
- ML-based violation detection
- Structured compliance reports via
/api/compliance/*
Unique
VAS Logging & Analytics
Every request generates a structured Value-Added Service log with cost, latency,
model, user, policy result, virtual key token, and compliance verdict.
- TimescaleDB time-series for metrics and trends
- MongoDB archival of full request/response logs
- Cost breakdown by user, team, model, and provider
- Sustainability metrics (energy / carbon per token)
Unique
Observability & Telemetry
Real-Time Dashboards
Browser-based dashboards served directly from the proxy. No external monitoring
stack required for core operational visibility.
- Cost and usage by model, provider, user
- Cache hit rate, cost savings, per-phase breakdown (Phase 1–4)
- MCP tool call volume, latency, and error rates
- Policy violation trends and compliance score
- Sustainability metrics
Unique
Telemetry API
Programmatic access to all platform telemetry via structured REST endpoints.
Feed dashboards, alert systems, or BI tools directly.
/api/telemetry/*— time-windowed metrics
/api/insights/*— aggregated cost and usage insights
- Export compatible with Grafana, Prometheus, and custom integrations
Feature Parity
+ Compliance & Sustainability
Feature Comparison — Smartflow vs LiteLLM
| Feature |
Smartflow |
LiteLLM |
Notes |
| Proxy & Routing |
| OpenAI-compatible endpoint | | | Full parity |
| 100+ LLM providers | | | Full parity |
| Streaming (SSE) | | | Full parity |
| Load balancing strategies | | | Both support multiple strategies |
| Fallback chains with retry logic | Advanced | | Smartflow: retryable vs non-retryable error classification, exponential backoff |
| Virtual key budgets | | | Full parity |
| Caching |
| Exact-match cache | | | Full parity |
| Semantic cache | Native | ◑ Requires Qdrant | Smartflow: no external vector DB required |
| 4-phase cache (fingerprint + VectorLite BERT KNN) | | | Smartflow only |
| Adaptive TTL | | | Smartflow only |
| Per-request cache control header | | | Smartflow: x-smartflow-cache: skip |
| MCP tool-call cache | | | Smartflow only |
| MCP Gateway |
| MCP server registry | | | Full parity |
| HTTP + SSE transports | | | Full parity |
| STDIO transport | | | Full parity |
| AD/LDAP group access control | | | Smartflow only |
| Tool approval workflow | | | Smartflow only |
| Semantic tool search (NLP) | | | Smartflow only |
| OAuth PKCE interactive flow | | | Smartflow only |
| Per-server auth header forwarding | | | Smartflow only |
| OpenAPI generation per server | | | Smartflow only |
| Agent Gateway (A2A) |
| A2A protocol (Google standard) | Full | ◑ Partial | Smartflow: Agent Cards, task lifecycle, SSE, Redis task store |
| Agent Card auto-generation | | | Smartflow only |
| SSE task event streaming | | | Smartflow only |
| Redis-backed task history | | | Smartflow only |
| Cross-agent trace propagation | | | X-A2A-Trace-Id — Smartflow only |
| Policy & Compliance |
| Real-time policy engine | | | Smartflow only |
| Compliance microservice | | | Smartflow only |
| VAS audit logging (MongoDB + TimescaleDB) | | | Smartflow only |
| Sustainability metrics | | | Smartflow only |
| Enterprise Auth |
| SAML / Kerberos / AD integration | | | Smartflow only |
| mTLS for upstream connections | | | Smartflow only |
| Multi-arch Docker (AMD64 + ARM64) | | | Full parity |
Supported
Not supported
◑ Partial
Advanced = Smartflow's implementation goes further
New added in current release
Unique only in Smartflow
Feature Parity equivalent to LiteLLM