Smartflow — Enterprise AI Platform Overview

100+

LLM Providers

3-tier

Semantic Cache

A2A

Agent Protocol

MCP

Tool Gateway

PKCE

OAuth Flows

NLP

Tool Search

Smartflowis an enterprise-grade AI proxy, policy engine, and agent gateway. It sits in front of any LLM provider — OpenAI, Anthropic, Azure, Google, local models — and adds compliance enforcement, semantic caching, MCP tool orchestration, virtual key budgeting, intelligent load balancing, and full observability, all without changing your existing API calls.

Core Proxy & Routing

Universal LLM Proxy

Single endpoint for every major LLM provider. Drop-in replacement for OpenAI's /v1/chat/completions— no client-side changes required.

OpenAI, Anthropic, Azure, Google, Mistral, Cohere, local GGUF/ONNX
Streaming (SSE) and synchronous responses
Per-request model overrides via headers
Transparent passthrough — existing SDKs work unchanged

Feature Parity

Load Balancing & Fallback Chains

Distribute traffic across provider endpoints with automatic failover. Named fallback chains stored in Redis — model-level or provider-level granularity.

Strategies: Round Robin, Weighted, Least Connections, Random, Priority
Retry with exponential backoff for 429 / 5xx
Non-retryable 4xx errors bypass retry, move to next target immediately
Chain CRUD via /api/routing/fallback-chains

Feature Parity Advanced Retry Logic

Virtual Key Management

Issue sk-sf-{48-hex} tokens to users and applications with hard budget caps, rate limits, and automatic period resets.

Budget periods: Daily, Weekly, Monthly, Lifetime
Pre-request budget check — 429 with budget headers before any spend
Post-response spend recording per actual cost
TPM / RPM rate limit enforcement
Full CRUD at /api/enterprise/vkeys

Feature Parity

Key Vault & Provider Key Store

Centrally store and retrieve provider API keys. The proxy never exposes raw keys to clients — all credentials stay server-side in Redis.

Intercepts and stores keys from inbound requests
Per-provider resolution at routing time
Supports environment variable, Redis, and vault-style references

Feature Parity

Intelligent Caching

Smartflow advantage: LiteLLM supports exact-match caching and semantic caching via an external Qdrant vector store. Smartflow's entire caching stack is native— no external vector database required. Semantic matching, adaptive TTL, per-request control, and cost tracking are built directly into the proxy.

4-Phase MetaCache

Four sequential lookup phases before any token is sent to a provider. The first hit wins: intent fingerprint → near-miss → SHA-256 exact key → VectorLite BERT KNN. Responses carryx-smartflow-cache-hit and x-smartflow-cache-key for client-side correlation.

Phase 4 — VectorLite BERT KNN: sentence-transformers/all-MiniLM-L6-v2, 384-dim, cosine similarity ≥ 0.90
Local BERT inference — no external Qdrant, Weaviate, or Pinecone
Adaptive TTL based on query volatility
Per-request opt-out via x-smartflow-cache: skip
Cost saved estimated per hit, reported in dashboards

Unique: Native BERT Semantic Cache

MCP Tool-Call Cache

Separately caches MCP tool responses. Identical tool calls with identical parameters return instantly without re-invoking the MCP server.

Per-server, per-tool hit statistics
Selective flush: entire server or single tool
Cache ping, delete, and alias endpoints
Cache statistics visible in dashboard

Unique

MCP Gateway

Smartflow advantage:LiteLLM's MCP support is server-list plus tool routing. Smartflow adds an enterprise control plane: AD-group access control, approval workflows, OAuth PKCE consent, per-server auth header forwarding, semantic tool search, and performance-based routing — none of which exist in LiteLLM.

MCP Server Registry

Central registry for all MCP servers. Supports HTTP, SSE, and STDIO transports. Configuration persisted in Redis and manageable via API or dashboard.

HTTP, SSE, and STDIO transports
Auth: API key, OAuth client_credentials, Basic, mTLS, PKCE
Server aliases for stable routing regardless of URL changes
Health monitoring with cost and latency tracking
OpenAPI spec auto-generation per server

Feature Parity + PKCE, mTLS, Aliases

Access Control & Approval Workflow

Enterprise-grade gates on every MCP tool call. AD/LDAP group membership drives allow/deny decisions. Sensitive tools require explicit admin approval before use.

Per-server and per-tool AD group allow/deny lists
Catalog of approval-required tool requests
Approve / deny workflow via dashboard or API
Every tool call logged with user, server, result, and cost

Unique

Semantic Tool Search

NLP search across every tool on every registered MCP server. Ask "find a tool that reads files" and get back ranked results — no need to know which server hosts the tool.

Embeddings indexed per tool: name + description + parameter names
Cosine similarity ranking across all servers simultaneously
GET /api/mcp/tools/search?q=...&k=5
POST /api/mcp/tools/reindex triggers full re-index
Optional server filter for scoped search

New Unique

OAuth PKCE Browser Consent

User-facing interactive OAuth for MCP servers that require individual user consent (GitHub, Google Workspace, Slack) — not just machine-to-machine credentials.

PKCE code_verifier + SHA256 challenge — RFC 7636 compliant
GET /api/mcp/auth/initiate→ browser redirect URL
GET /api/mcp/auth/callback→ token exchange + storage
User-scoped tokens in Redis, independent per (user, server)
Pending sessions expire after 10 minutes (configurable)

New Unique

Per-Server Auth Header Forwarding

Pass server-specific credentials on individual requests without storing them. Clients send x-mcp-{alias}-* headers; Smartflow strips them before forwarding to the end LLM.

Headers scoped by server alias — no credential leakage across servers
Supports any arbitrary header name per server
Works across HTTP, SSE, and STDIO transports

New Unique

A2A Agent Gateway

Smartflow leads here:LiteLLM has a basic A2A prototype. Smartflow implements the full Google A2A open protocol — Agent Cards, task lifecycle, SSE streaming, Redis-backed task history, and cross-agent trace headers. Any A2A-compatible client (LangGraph, Vertex AI, Azure AI Foundry, Bedrock AgentCore) can connect to a Smartflow agent out of the box.

Agent Registry & Agent Cards

Register named agents in Redis. Each agent has a model, system prompt, optional MCP tool access, and a machine-readable Agent Card that advertises its capabilities.

Agent profiles stored in Redis — instant updates, no redeploy
GET /.well-known/agent.json— gateway card listing all agents
GET /a2a/{id}/.well-known/agent.json— per-agent card
Skills, auth schemes, streaming capability all declared in card

New A2A Protocol

Task Lifecycle Management

Full A2A task state machine: submitted → working → completed / failed / canceled. Every task is persisted in Redis with full message history and artifact outputs.

tasks/send— synchronous execution with full Task response
tasks/sendSubscribe— SSE stream of status update events
tasks/get— retrieve task + history + artifacts by ID
tasks/cancel— cancel in-flight tasks
Task history trimming via history_length parameter
24-hour TTL with per-agent task index in Redis

New

Cross-Agent Tracing

Trace requests across multiple agents using a shared trace ID. Smartflow propagates and stores the trace context with every task.

X-A2A-Trace-Id header forwarded through the call chain
Trace ID stored in task metadata for correlation
Interoperable with LangGraph, Vertex AI, Azure AI Foundry, Bedrock

New A2A Protocol

A2A Admin API

Full management surface for the agent gateway — create, inspect, and remove agents without touching config files or restarting the proxy.

GET/POST /api/a2a/agents— list and register agents
GET/DELETE /api/a2a/agents/{id}— inspect or remove
GET /api/a2a/agents/{id}/tasks— recent task history
GET /api/a2a/tasks/{id}— inspect any task by ID

New

Policy & Compliance Engine

Smartflow leads here: LiteLLM has no equivalent. Smartflow's policy engine classifies applications, enforces usage policies, detects compliance violations in real time, and archives structured logs to MongoDB and TimescaleDB for audit and BI.

Policy Engine (Maestro)

Real-time policy evaluation on every request and response. Policies are defined, versioned, and stored in TimescaleDB. The Maestro dashboard provides a unified policy management interface.

Input and output content scanning
Application classification by request pattern
Policy CRUD at /api/policy/*
Violation alerting and audit trail

Unique

Compliance API

Dedicated compliance microservice (compliance_api_server) for regulated industries. Runs independently for high-availability compliance checking.

Regulatory framework mapping (GDPR, HIPAA, financial)
ML-based violation detection
Structured compliance reports via /api/compliance/*

Unique

VAS Logging & Analytics

Every request generates a structured Value-Added Service log with cost, latency, model, user, policy result, virtual key token, and compliance verdict.

TimescaleDB time-series for metrics and trends
MongoDB archival of full request/response logs
Cost breakdown by user, team, model, and provider
Sustainability metrics (energy / carbon per token)

Unique

Observability & Telemetry

Real-Time Dashboards

Browser-based dashboards served directly from the proxy. No external monitoring stack required for core operational visibility.

Cost and usage by model, provider, user
Cache hit rate, cost savings, per-phase breakdown (Phase 1–4)
MCP tool call volume, latency, and error rates
Policy violation trends and compliance score
Sustainability metrics

Unique

Telemetry API

Programmatic access to all platform telemetry via structured REST endpoints. Feed dashboards, alert systems, or BI tools directly.

/api/telemetry/*— time-windowed metrics
/api/insights/*— aggregated cost and usage insights
Export compatible with Grafana, Prometheus, and custom integrations

Feature Parity + Compliance & Sustainability

Feature Comparison — Smartflow vs LiteLLM

Feature	Smartflow	LiteLLM	Notes
Proxy & Routing
OpenAI-compatible endpoint			Full parity
100+ LLM providers			Full parity
Streaming (SSE)			Full parity
Load balancing strategies			Both support multiple strategies
Fallback chains with retry logic	Advanced		Smartflow: retryable vs non-retryable error classification, exponential backoff
Virtual key budgets			Full parity
Caching
Exact-match cache			Full parity
Semantic cache	Native	◑ Requires Qdrant	Smartflow: no external vector DB required
4-phase cache (fingerprint + VectorLite BERT KNN)			Smartflow only
Adaptive TTL			Smartflow only
Per-request cache control header			Smartflow: `x-smartflow-cache: skip`
MCP tool-call cache			Smartflow only
MCP Gateway
MCP server registry			Full parity
HTTP + SSE transports			Full parity
STDIO transport			Full parity
AD/LDAP group access control			Smartflow only
Tool approval workflow			Smartflow only
Semantic tool search (NLP)			Smartflow only
OAuth PKCE interactive flow			Smartflow only
Per-server auth header forwarding			Smartflow only
OpenAPI generation per server			Smartflow only
Agent Gateway (A2A)
A2A protocol (Google standard)	Full	◑ Partial	Smartflow: Agent Cards, task lifecycle, SSE, Redis task store
Agent Card auto-generation			Smartflow only
SSE task event streaming			Smartflow only
Redis-backed task history			Smartflow only
Cross-agent trace propagation			X-A2A-Trace-Id — Smartflow only
Policy & Compliance
Real-time policy engine			Smartflow only
Compliance microservice			Smartflow only
VAS audit logging (MongoDB + TimescaleDB)			Smartflow only
Sustainability metrics			Smartflow only
Enterprise Auth
SAML / Kerberos / AD integration			Smartflow only
mTLS for upstream connections			Smartflow only
Multi-arch Docker (AMD64 + ARM64)			Full parity

Supported Not supported ◑ Partial Advanced = Smartflow's implementation goes further New added in current release Unique only in Smartflow Feature Parity equivalent to LiteLLM

Smartflow Platform Overview