Version 1.5

Smartflow Platform Overview

The enterprise AI gateway that speaks LLM, MCP, and A2A — a unified control plane with policy enforcement, a semantic cache that learns, and observability that tells you exactly what happened and what it cost.

Production Ready AMD64 + ARM64 A2A Protocol MCP Gateway Policy Engine
100+
LLM Providers
3-tier
Semantic Cache
A2A
Agent Protocol
MCP
Tool Gateway
PKCE
OAuth Flows
NLP
Tool Search

Smartflowis an enterprise-grade AI proxy, policy engine, and agent gateway. It sits in front of any LLM provider — OpenAI, Anthropic, Azure, Google, local models — and adds compliance enforcement, semantic caching, MCP tool orchestration, virtual key budgeting, intelligent load balancing, and full observability, all without changing your existing API calls.

Core Proxy & Routing
Universal LLM Proxy
Single endpoint for every major LLM provider. Drop-in replacement for OpenAI's /v1/chat/completions— no client-side changes required.
  • OpenAI, Anthropic, Azure, Google, Mistral, Cohere, local GGUF/ONNX
  • Streaming (SSE) and synchronous responses
  • Per-request model overrides via headers
  • Transparent passthrough — existing SDKs work unchanged
Feature Parity
Load Balancing & Fallback Chains
Distribute traffic across provider endpoints with automatic failover. Named fallback chains stored in Redis — model-level or provider-level granularity.
  • Strategies: Round Robin, Weighted, Least Connections, Random, Priority
  • Retry with exponential backoff for 429 / 5xx
  • Non-retryable 4xx errors bypass retry, move to next target immediately
  • Chain CRUD via /api/routing/fallback-chains
Feature Parity Advanced Retry Logic
Virtual Key Management
Issue sk-sf-{48-hex} tokens to users and applications with hard budget caps, rate limits, and automatic period resets.
  • Budget periods: Daily, Weekly, Monthly, Lifetime
  • Pre-request budget check — 429 with budget headers before any spend
  • Post-response spend recording per actual cost
  • TPM / RPM rate limit enforcement
  • Full CRUD at /api/enterprise/vkeys
Feature Parity
Key Vault & Provider Key Store
Centrally store and retrieve provider API keys. The proxy never exposes raw keys to clients — all credentials stay server-side in Redis.
  • Intercepts and stores keys from inbound requests
  • Per-provider resolution at routing time
  • Supports environment variable, Redis, and vault-style references
Feature Parity
Intelligent Caching

Smartflow advantage: LiteLLM supports exact-match caching and semantic caching via an external Qdrant vector store. Smartflow's entire caching stack is native— no external vector database required. Semantic matching, adaptive TTL, per-request control, and cost tracking are built directly into the proxy.

4-Phase MetaCache
Four sequential lookup phases before any token is sent to a provider. The first hit wins: intent fingerprint → near-miss → SHA-256 exact key → VectorLite BERT KNN. Responses carryx-smartflow-cache-hit and x-smartflow-cache-key for client-side correlation.
  • Phase 4 — VectorLite BERT KNN: sentence-transformers/all-MiniLM-L6-v2, 384-dim, cosine similarity ≥ 0.90
  • Local BERT inference — no external Qdrant, Weaviate, or Pinecone
  • Adaptive TTL based on query volatility
  • Per-request opt-out via x-smartflow-cache: skip
  • Cost saved estimated per hit, reported in dashboards
Unique: Native BERT Semantic Cache
MCP Tool-Call Cache
Separately caches MCP tool responses. Identical tool calls with identical parameters return instantly without re-invoking the MCP server.
  • Per-server, per-tool hit statistics
  • Selective flush: entire server or single tool
  • Cache ping, delete, and alias endpoints
  • Cache statistics visible in dashboard
Unique
MCP Gateway

Smartflow advantage:LiteLLM's MCP support is server-list plus tool routing. Smartflow adds an enterprise control plane: AD-group access control, approval workflows, OAuth PKCE consent, per-server auth header forwarding, semantic tool search, and performance-based routing — none of which exist in LiteLLM.

MCP Server Registry
Central registry for all MCP servers. Supports HTTP, SSE, and STDIO transports. Configuration persisted in Redis and manageable via API or dashboard.
  • HTTP, SSE, and STDIO transports
  • Auth: API key, OAuth client_credentials, Basic, mTLS, PKCE
  • Server aliases for stable routing regardless of URL changes
  • Health monitoring with cost and latency tracking
  • OpenAPI spec auto-generation per server
Feature Parity + PKCE, mTLS, Aliases
Access Control & Approval Workflow
Enterprise-grade gates on every MCP tool call. AD/LDAP group membership drives allow/deny decisions. Sensitive tools require explicit admin approval before use.
  • Per-server and per-tool AD group allow/deny lists
  • Catalog of approval-required tool requests
  • Approve / deny workflow via dashboard or API
  • Every tool call logged with user, server, result, and cost
Unique
Semantic Tool Search
NLP search across every tool on every registered MCP server. Ask "find a tool that reads files" and get back ranked results — no need to know which server hosts the tool.
  • Embeddings indexed per tool: name + description + parameter names
  • Cosine similarity ranking across all servers simultaneously
  • GET /api/mcp/tools/search?q=...&k=5
  • POST /api/mcp/tools/reindex triggers full re-index
  • Optional server filter for scoped search
New Unique
OAuth PKCE Browser Consent
User-facing interactive OAuth for MCP servers that require individual user consent (GitHub, Google Workspace, Slack) — not just machine-to-machine credentials.
  • PKCE code_verifier + SHA256 challenge — RFC 7636 compliant
  • GET /api/mcp/auth/initiate→ browser redirect URL
  • GET /api/mcp/auth/callback→ token exchange + storage
  • User-scoped tokens in Redis, independent per (user, server)
  • Pending sessions expire after 10 minutes (configurable)
New Unique
Per-Server Auth Header Forwarding
Pass server-specific credentials on individual requests without storing them. Clients send x-mcp-{alias}-* headers; Smartflow strips them before forwarding to the end LLM.
  • Headers scoped by server alias — no credential leakage across servers
  • Supports any arbitrary header name per server
  • Works across HTTP, SSE, and STDIO transports
New Unique
A2A Agent Gateway

Smartflow leads here:LiteLLM has a basic A2A prototype. Smartflow implements the full Google A2A open protocol — Agent Cards, task lifecycle, SSE streaming, Redis-backed task history, and cross-agent trace headers. Any A2A-compatible client (LangGraph, Vertex AI, Azure AI Foundry, Bedrock AgentCore) can connect to a Smartflow agent out of the box.

Agent Registry & Agent Cards
Register named agents in Redis. Each agent has a model, system prompt, optional MCP tool access, and a machine-readable Agent Card that advertises its capabilities.
  • Agent profiles stored in Redis — instant updates, no redeploy
  • GET /.well-known/agent.json— gateway card listing all agents
  • GET /a2a/{id}/.well-known/agent.json— per-agent card
  • Skills, auth schemes, streaming capability all declared in card
New A2A Protocol
Task Lifecycle Management
Full A2A task state machine: submitted → working → completed / failed / canceled. Every task is persisted in Redis with full message history and artifact outputs.
  • tasks/send— synchronous execution with full Task response
  • tasks/sendSubscribe— SSE stream of status update events
  • tasks/get— retrieve task + history + artifacts by ID
  • tasks/cancel— cancel in-flight tasks
  • Task history trimming via history_length parameter
  • 24-hour TTL with per-agent task index in Redis
New
Cross-Agent Tracing
Trace requests across multiple agents using a shared trace ID. Smartflow propagates and stores the trace context with every task.
  • X-A2A-Trace-Id header forwarded through the call chain
  • Trace ID stored in task metadata for correlation
  • Interoperable with LangGraph, Vertex AI, Azure AI Foundry, Bedrock
New A2A Protocol
A2A Admin API
Full management surface for the agent gateway — create, inspect, and remove agents without touching config files or restarting the proxy.
  • GET/POST /api/a2a/agents— list and register agents
  • GET/DELETE /api/a2a/agents/{id}— inspect or remove
  • GET /api/a2a/agents/{id}/tasks— recent task history
  • GET /api/a2a/tasks/{id}— inspect any task by ID
New
Policy & Compliance Engine

Smartflow leads here: LiteLLM has no equivalent. Smartflow's policy engine classifies applications, enforces usage policies, detects compliance violations in real time, and archives structured logs to MongoDB and TimescaleDB for audit and BI.

Policy Engine (Maestro)
Real-time policy evaluation on every request and response. Policies are defined, versioned, and stored in TimescaleDB. The Maestro dashboard provides a unified policy management interface.
  • Input and output content scanning
  • Application classification by request pattern
  • Policy CRUD at /api/policy/*
  • Violation alerting and audit trail
Unique
Compliance API
Dedicated compliance microservice (compliance_api_server) for regulated industries. Runs independently for high-availability compliance checking.
  • Regulatory framework mapping (GDPR, HIPAA, financial)
  • ML-based violation detection
  • Structured compliance reports via /api/compliance/*
Unique
VAS Logging & Analytics
Every request generates a structured Value-Added Service log with cost, latency, model, user, policy result, virtual key token, and compliance verdict.
  • TimescaleDB time-series for metrics and trends
  • MongoDB archival of full request/response logs
  • Cost breakdown by user, team, model, and provider
  • Sustainability metrics (energy / carbon per token)
Unique
Observability & Telemetry
Real-Time Dashboards
Browser-based dashboards served directly from the proxy. No external monitoring stack required for core operational visibility.
  • Cost and usage by model, provider, user
  • Cache hit rate, cost savings, per-phase breakdown (Phase 1–4)
  • MCP tool call volume, latency, and error rates
  • Policy violation trends and compliance score
  • Sustainability metrics
Unique
Telemetry API
Programmatic access to all platform telemetry via structured REST endpoints. Feed dashboards, alert systems, or BI tools directly.
  • /api/telemetry/*— time-windowed metrics
  • /api/insights/*— aggregated cost and usage insights
  • Export compatible with Grafana, Prometheus, and custom integrations
Feature Parity + Compliance & Sustainability
Feature Comparison — Smartflow vs LiteLLM
Feature Smartflow LiteLLM Notes
Proxy & Routing
OpenAI-compatible endpointFull parity
100+ LLM providersFull parity
Streaming (SSE)Full parity
Load balancing strategiesBoth support multiple strategies
Fallback chains with retry logic AdvancedSmartflow: retryable vs non-retryable error classification, exponential backoff
Virtual key budgetsFull parity
Caching
Exact-match cacheFull parity
Semantic cache Native Requires QdrantSmartflow: no external vector DB required
4-phase cache (fingerprint + VectorLite BERT KNN)Smartflow only
Adaptive TTLSmartflow only
Per-request cache control headerSmartflow: x-smartflow-cache: skip
MCP tool-call cacheSmartflow only
MCP Gateway
MCP server registryFull parity
HTTP + SSE transportsFull parity
STDIO transportFull parity
AD/LDAP group access controlSmartflow only
Tool approval workflowSmartflow only
Semantic tool search (NLP)Smartflow only
OAuth PKCE interactive flowSmartflow only
Per-server auth header forwardingSmartflow only
OpenAPI generation per serverSmartflow only
Agent Gateway (A2A)
A2A protocol (Google standard) Full PartialSmartflow: Agent Cards, task lifecycle, SSE, Redis task store
Agent Card auto-generationSmartflow only
SSE task event streamingSmartflow only
Redis-backed task historySmartflow only
Cross-agent trace propagationX-A2A-Trace-Id — Smartflow only
Policy & Compliance
Real-time policy engineSmartflow only
Compliance microserviceSmartflow only
VAS audit logging (MongoDB + TimescaleDB)Smartflow only
Sustainability metricsSmartflow only
Enterprise Auth
SAML / Kerberos / AD integrationSmartflow only
mTLS for upstream connectionsSmartflow only
Multi-arch Docker (AMD64 + ARM64)Full parity
Supported Not supported Partial Advanced = Smartflow's implementation goes further New added in current release Unique only in Smartflow Feature Parity equivalent to LiteLLM