Scenario 4 of 5

Policy-Based Hybrid LLM Routing

Intelligent Routing Between Local & Cloud LLMs Based on Policy, Cost, and Performance

Single Application (Any App Type)
AI Assistant
Chat Interface
Dev Tools
Analytics App
ALL AI QUERIES
SMARTFLOW INTELLIGENT ROUTER
SMARTFLOW
Query Analysis: Complexity, PII, Cost Target
Policy Check: User Permissions, Compliance
Cost Optimise: Budget vs. Quality Trade-off
Performance: Latency Requirements, Health
Dynamic Route: Real-time Best Provider Selection
Standardise: Normalise Params Across Models
LOW COST
Simple Queries · High Volume
PII / Privacy Required
HIGH CAPABILITY
Complex Reasoning · Premium
Multi-Modal Content
LOCAL / ON-PREM MODELS
FAST
Llama 3.2 (Local)
$0/token · 50ms
CHEAP
Mistral 7B (Hosted)
$0.0001/token · 80ms
PRIVATE
Custom Fine-tuned
$0/token · Air-gapped
CLOUD PREMIUM MODELS
SMART
GPT-4 Turbo
$0.01/token · 200ms
QUALITY
Claude Sonnet 4.6
$0.015/token · 180ms
VISION
Gemini Ultra
$0.008/token · 250ms
Routes to LOCAL when:
Simple Q&A or fact lookup
High-volume repetitive queries
PII or sensitive data present
Budget constraints active
Air-gapped requirements
Low latency critical (<100ms)
User role = internal / free tier
Routes to CLOUD when:
Complex reasoning required
Multi-modal content (images, video)
High accuracy critical
Latest model features needed
User role = premium / enterprise
Local model confidence too low
Document analysis & generation
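The routing rules above can be sketched as a simple decision function. This is a hypothetical illustration only — the field names and thresholds are assumptions, not SmartFlow's actual policy engine:

```python
from dataclasses import dataclass

# Hypothetical query descriptor; field names are illustrative.
@dataclass
class Query:
    complexity: float        # 0.0 (simple lookup) .. 1.0 (deep reasoning)
    contains_pii: bool
    multimodal: bool         # images, video, etc.
    latency_budget_ms: int
    user_tier: str           # "free", "internal", "premium", "enterprise"

def route(q: Query) -> str:
    """Return 'local' or 'cloud' following the policy rules above."""
    # Hard constraints first: PII must never leave the network,
    # and sub-100ms latency budgets rule out cloud round-trips.
    if q.contains_pii or q.latency_budget_ms < 100:
        return "local"
    # Multi-modal content and premium/enterprise users go to cloud models.
    if q.multimodal or q.user_tier in ("premium", "enterprise"):
        return "cloud"
    # Otherwise escalate only when complexity demands it
    # (0.7 is an assumed threshold).
    return "cloud" if q.complexity > 0.7 else "local"
```

For example, a simple free-tier lookup (`route(Query(0.2, False, False, 500, "free"))`) stays local, while the same query with PII present is forced local even for a premium user.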
Key Capabilities
Where Smartflow Sits
Single gateway analysing every query and intelligently routing to the optimal LLM (local or cloud) based on real-time policies and context.
Dynamic Routing
Automatically selects the best model per query: start with local Llama and escalate to GPT-4 only when complexity requires it. A cache check happens first.
70–90% Cost Reduction
80%+ of queries handled by free or cheap local models. Premium cloud LLMs reserved for genuinely complex tasks.
Policy Enforcement
Route PII queries to on-prem models only. Enforce per-user budgets. Automatically block sensitive queries from leaving the corporate network.
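A PII gate of this kind can be sketched as a pre-routing check. The regex patterns below are illustrative assumptions — a production gateway would use a dedicated PII-detection service, not regexes alone:

```python
import re

# Illustrative PII patterns (assumptions, not SmartFlow's detectors).
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # US SSN format
    re.compile(r"\b\d{16}\b"),                  # bare 16-digit card number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),     # email address
]

def enforce_policy(prompt: str, provider: str) -> str:
    """Force any prompt containing PII onto an on-prem provider."""
    if any(p.search(prompt) for p in PII_PATTERNS):
        return "on-prem"  # never leaves the corporate network
    return provider
```

A prompt such as "Email jane@corp.com the report" would be redirected to "on-prem" regardless of which provider the router originally chose.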
Fallback & Failover
If the local model is unavailable or its confidence is low, automatically fail over to the cloud. Multi-tier routing ensures 99.9% uptime.
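Multi-tier failover might look like the sketch below: try providers in order (cheapest first), escalating when a call fails or the model reports low confidence. The confidence floor and provider interface are assumptions for illustration:

```python
CONFIDENCE_FLOOR = 0.6  # illustrative threshold, not a SmartFlow default

def call_with_failover(prompt, providers):
    """providers: list of callables, each returning (answer, confidence).

    Tries each tier in order; raises if every tier fails or is
    below the confidence floor.
    """
    last_error = None
    for provider in providers:
        try:
            answer, confidence = provider(prompt)
        except Exception as exc:     # provider down, timeout, etc.
            last_error = exc
            continue
        if confidence >= CONFIDENCE_FLOOR:
            return answer
        # Low confidence: fall through to the next (more capable) tier.
    if last_error is not None:
        raise last_error
    raise RuntimeError("all providers returned low-confidence answers")
```

Passing `[local_llama, cloud_gpt4]` gives the "start local, escalate to cloud" behaviour described above.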
Cache Before Routing
Check semantic cache first. If cached, return instantly. Only route uncached queries to local/cloud models — saving tokens on both paths.
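A minimal semantic-cache sketch is shown below. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, and the similarity threshold is an assumption:

```python
import math

def embed(text: str) -> dict:
    """Toy bag-of-words vector; a real cache would use an embedding model."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, prompt: str):
        """Return a cached answer if a near-duplicate prompt was seen."""
        qv = embed(prompt)
        for ev, answer in self.entries:
            if cosine(qv, ev) >= self.threshold:
                return answer
        return None  # cache miss: route to local/cloud as usual

    def put(self, prompt: str, answer: str):
        self.entries.append((embed(prompt), answer))
```

On a hit, no tokens are spent on either path; only misses proceed to the router.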
Cost Attribution
Track per-user, per-department costs. See exactly how much each team saves by routing to local vs. cloud models.
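Per-user and per-department attribution can be sketched as a simple cost ledger. The per-token rates below mirror the figures in the diagram above and are assumptions, not a published price list:

```python
from collections import defaultdict

# Assumed $ per token, echoing the diagram's example rates.
RATES = {"local": 0.0, "cloud": 0.01}

class CostLedger:
    """Accumulates spend per user and per department per request."""

    def __init__(self):
        self.by_user = defaultdict(float)
        self.by_department = defaultdict(float)

    def record(self, user: str, department: str,
               route: str, tokens: int) -> float:
        cost = RATES[route] * tokens
        self.by_user[user] += cost
        self.by_department[department] += cost
        return cost
```

Recording a 5,000-token local query costs $0, while 2,000 tokens through a $0.01/token cloud model books $20 against that user and their department — making the local-vs-cloud saving directly visible.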
Patent-Pending Standardisation
Proprietary normalisation layer preserves temperature, top-p, and model-specific tunings. Switch seamlessly between Llama and GPT-4 without breaking response consistency.
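A toy version of such a normalisation layer might translate one canonical parameter set into each target's naming conventions. The provider names and key mappings below are entirely hypothetical — the patent-pending layer itself is not public:

```python
# Hypothetical canonical-to-provider parameter name mappings.
# Key names are illustrative assumptions, not any real API's schema.
PARAM_MAP = {
    "cloud_model": {"temperature": "temperature", "top_p": "top_p",
                    "max_tokens": "max_tokens"},
    "local_model": {"temperature": "temp", "top_p": "top_p",
                    "max_tokens": "n_predict"},
}

def normalise(provider: str, params: dict) -> dict:
    """Translate canonical params into a provider-specific payload,
    dropping anything the target model does not understand."""
    mapping = PARAM_MAP[provider]
    return {mapping[k]: v for k, v in params.items() if k in mapping}

canonical = {"temperature": 0.2, "top_p": 0.9, "max_tokens": 256}
# normalise("local_model", canonical)
#   -> {"temp": 0.2, "top_p": 0.9, "n_predict": 256}
```

Because callers always pass the canonical form, switching a query between Llama and GPT-4 requires no change on the application side.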