What is Agentic AI?
Agentic AI refers to systems where AI agents autonomously plan, use tools, and iterate to accomplish tasks. Unlike single-prompt workflows, agentic systems:
- Classify user intent and route accordingly
- Plan which tools or knowledge sources to use
- Execute tools (APIs, databases, MCP servers)
- Retrieve from external knowledge (RAG)
- Synthesize a final response from gathered context
- Validate output and retry when needed
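The steps above can be sketched as a minimal plan–execute–validate loop. Every function here (classify_intent, plan_tools, run_tool, synthesize, validate) is a hypothetical stand-in for what would be an LLM or tool call in a real system:

```python
def classify_intent(query: str) -> str:
    # Hypothetical: an LLM would classify here; a keyword match stands in.
    return "DATA_LOOKUP" if "lab" in query.lower() else "CLINICAL_REASONING"

def plan_tools(intent: str) -> list[str]:
    # Hypothetical planner: map intent to a tool list.
    return ["ehr_lookup"] if intent == "DATA_LOOKUP" else []

def run_tool(name: str, query: str) -> str:
    # Stand-in for an API / database / MCP call.
    return f"[{name}] results for: {query}"

def synthesize(query: str, context: list[str]) -> str:
    # Stand-in for the generation LLM.
    return f"Answer to '{query}' using {len(context)} context chunks"

def validate(answer: str) -> bool:
    # Stand-in for the output guard.
    return len(answer) > 0

def run_agent(query: str, max_retries: int = 2) -> str:
    intent = classify_intent(query)
    context = [run_tool(t, query) for t in plan_tools(intent)]
    for _ in range(max_retries + 1):
        answer = synthesize(query, context)
        if validate(answer):          # retry synthesis when validation fails
            return answer
    raise RuntimeError("output validation failed after retries")
```

The retry loop wraps only synthesis and validation, matching the last bullet: tool execution is not repeated on a failed output check.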
Architecture Principles
┌─────────────────────────────────────────────────────────────────┐
│ AGENTIC AI SYSTEM │
├─────────────────────────────────────────────────────────────────┤
│ Input Guard → Intent & Planning → Tool Execution │
│ ↓ ↓ ↓ │
│ Validation → Route Selection RAG / MCP / APIs │
│ ↓ ↓ ↓ │
│ Context Merge ← Rerank & Compress ← Tool Results │
│ ↓ │
│ Synthesizer → Output Guard → Response │
└─────────────────────────────────────────────────────────────────┘
| Principle | Implementation |
|---|---|
| Guardrails | Input validation, output validation, prompt-injection checks |
| Orchestration | LangGraph state machine with conditional edges |
| Tool use | MCP (Model Context Protocol) for external tools |
| RAG | Optional retrieval based on query intent |
| Context management | MMR reranking, compression when context exceeds limits |
| Multi-model | Specialized LLMs per stage (planning vs. synthesis) |
| Streaming | Phase events + token-by-token final output |
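The guardrails row can be illustrated with a small input-guard sketch; the length limit and injection patterns below are illustrative placeholders, not the real guard:

```python
import re

MAX_QUERY_LEN = 2000  # illustrative limit, not the real threshold

# Hypothetical prompt-injection patterns for demonstration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"system prompt", re.I),
]

def input_guard(query: str) -> str:
    """Validate a user query before it enters the pipeline."""
    query = query.strip()
    if not query:
        raise ValueError("empty query")
    if len(query) > MAX_QUERY_LEN:
        raise ValueError("query too long")
    if any(p.search(query) for p in INJECTION_PATTERNS):
        raise ValueError("blocked: possible prompt injection")
    return query
```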
Pipeline Flow
START
│
▼
┌─────────────────┐
│ input_guard │ Validate query (length, safety)
└────────┬────────┘
│
▼
┌─────────────────┐
│ intent_classifier│ LLM → intent (DATA_LOOKUP, CLINICAL_REASONING, etc.)
└────────┬────────┘
│
▼
┌─────────────────┐
│ planner_agent │ LLM → selected_tools, use_rag
└────────┬────────┘
│
▼
┌─────────────────┐
│ researcher_agent│ Execute MCP tools
└────────┬────────┘
│
▼
┌─────────────────┐
│ mmr_reranker │ (conditional) Embed + MMR when context large
└────────┬────────┘
│
▼
┌─────────────────┐
│context_compressor│ (conditional) LLM summarize when still large
└────────┬────────┘
│
▼
┌─────────────────┐
│ merge_node │ RAG (if use_rag) + merge tool + RAG context
└────────┬────────┘
│
▼
┌─────────────────┐
│ synthesizer │ LLM → final response with source attribution
└────────┬────────┘
│
▼
┌─────────────────┐
│ output_guard │ Validate; retry synthesizer on failure
└────────┬────────┘
│
▼
END
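The conditional routing in this flow can be sketched in plain Python; a real build would use LangGraph's StateGraph with conditional edges. Node names mirror the diagram, each node mutates shared state and returns the next node, and the reranking threshold is illustrative (query_rewriter, rag_retriever, and context_compressor are omitted for brevity):

```python
def input_guard(state):
    if not state["query"].strip():
        raise ValueError("empty query")
    return "intent_classifier"

def intent_classifier(state):
    state["intent"] = "DATA_LOOKUP"          # an LLM call in the real system
    return "planner_agent"

def planner_agent(state):
    state["selected_tools"] = ["ehr_lookup"] # an LLM call in the real system
    state["use_rag"] = False
    return "researcher_agent"

def researcher_agent(state):
    state["tool_results"] = [f"chunk-{i}" for i in range(6)]  # MCP tools here
    # Conditional edge: rerank only when the context is large.
    return "mmr_reranker" if len(state["tool_results"]) > 4 else "merge_node"

def mmr_reranker(state):
    state["tool_results"] = state["tool_results"][:3]  # stand-in for MMR
    return "merge_node"

def merge_node(state):
    state["merged_context"] = " ".join(state["tool_results"])
    return "synthesizer"

def synthesizer(state):
    state["final_output"] = f"Answer from: {state['merged_context']}"
    return "output_guard"

def output_guard(state):
    # Conditional edge: retry synthesis when validation fails.
    return "END" if state["final_output"] else "synthesizer"

NODES = {f.__name__: f for f in [
    input_guard, intent_classifier, planner_agent, researcher_agent,
    mmr_reranker, merge_node, synthesizer, output_guard]}

def run_graph(query):
    state, node = {"query": query}, "input_guard"
    while node != "END":
        node = NODES[node](state)
    return state
```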
Agent Roles
1. Planner Agent
- Role: Decide which tools to call and whether RAG is needed
- Input: User query, intent
- Output: selected_tools, use_rag
- Pattern: LLM with structured JSON output
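The structured-JSON-output pattern can be sketched as follows; call_llm and the prompt are hypothetical stand-ins for the real LLM client:

```python
import json

# Illustrative planner prompt; the real one would include tool descriptions.
PLANNER_PROMPT = (
    'Given the user query and intent, reply with JSON only: '
    '{"selected_tools": [...], "use_rag": true/false}'
)

def call_llm(prompt: str, query: str, intent: str) -> str:
    # Canned response standing in for a model call.
    return '{"selected_tools": ["ehr_lookup"], "use_rag": false}'

def plan(query: str, intent: str) -> dict:
    raw = call_llm(PLANNER_PROMPT, query, intent)
    parsed = json.loads(raw)
    # Defensive defaults in case the model omits a field.
    return {
        "selected_tools": parsed.get("selected_tools", []),
        "use_rag": bool(parsed.get("use_rag", False)),
    }
```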
2. Researcher Agent
- Role: Execute tools (MCP, APIs, databases)
- Input: selected_tools
- Output: tool_results
- Pattern: Tool invocation with optional context (e.g. patient_id, ehr_id)
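A minimal sketch of that invocation pattern, assuming a hypothetical tool registry (a real system would dispatch to MCP servers, APIs, or databases):

```python
def ehr_lookup(query, patient_id=None, ehr_id=None):
    # Stand-in for a real EHR tool; scopes results when IDs are provided.
    scope = f"patient={patient_id}, ehr={ehr_id}" if patient_id else "no patient context"
    return f"EHR results for '{query}' ({scope})"

TOOLS = {"ehr_lookup": ehr_lookup}  # hypothetical registry

def research(selected_tools, query, **context):
    """Run each planned tool, passing optional context through."""
    results = {}
    for name in selected_tools:
        tool = TOOLS.get(name)
        if tool is None:
            results[name] = "error: unknown tool"  # fail soft, keep going
            continue
        results[name] = tool(query, **context)
    return results
```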
3. Synthesizer Agent
- Role: Generate final answer from merged context
- Input: merged_context, user query
- Output: llm_response, final_output
- Pattern: Dedicated LLM for generation (e.g. a domain-specific model)
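A sketch of the synthesis pattern, assuming a hypothetical call_domain_llm client and an illustrative prompt that asks for source attribution:

```python
SYNTH_PROMPT = (
    "Answer the question using ONLY the context below. "
    "Cite the source of each fact.\n\nContext:\n{context}\n\nQuestion: {query}"
)

def call_domain_llm(prompt: str) -> str:
    # Canned answer standing in for the domain-specific model.
    return "Third nerve palsy is ... [source: neuro-notes.md]"

def synthesize(query: str, merged_context: str) -> str:
    prompt = SYNTH_PROMPT.format(context=merged_context, query=query)
    return call_domain_llm(prompt)
```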
Supporting Nodes
| Node | Purpose |
|---|---|
| input_guard | Validate input, block unsafe patterns |
| intent_classifier | Route by intent (lookup vs. reasoning vs. drug info) |
| query_rewriter | Rewrite query for RAG retrieval |
| rag_retriever | Vector search over knowledge base |
| mmr_reranker | Rerank large tool results via MMR |
| context_compressor | Summarize when context exceeds limit |
| context_merger | Merge tool results + RAG context |
| output_guard | Validate output, trigger retry |
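The mmr_reranker node can be sketched in pure Python. MMR (maximal marginal relevance) greedily picks the document that maximizes λ·relevance − (1−λ)·redundancy against what is already selected; real embeddings would come from an embedding model, so 2-D toy vectors stand in here:

```python
import math

def cos(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr(query_vec, doc_vecs, k=2, lambda_=0.7):
    """Return indices of k documents selected by maximal marginal relevance."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cos(query_vec, doc_vecs[i])
            # Redundancy = similarity to the closest already-selected doc.
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_ * relevance - (1 - lambda_) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates; a diversity-heavy lambda skips doc 1.
docs = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0)]
print(mmr((1.0, 0.0), docs, k=2, lambda_=0.3))  # → [0, 2]
```

Lower lambda_ favors diversity (useful for trimming near-duplicate tool results); higher values favor raw relevance to the query.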
Streaming Architecture
Streaming runs in two phases:
- Phase 1: Run the graph until merged_context is available, emitting phase events.
- Phase 2: Stream directly from the final LLM API (bypassing orchestration) for reliable token-by-token delivery.
SSE event sequence:
classified → planned → retrieved → synthesizing → token... → done
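That event sequence can be sketched as a plain generator emitting SSE-formatted frames; a real server would wrap this in a FastAPI streaming response, and the phase payloads and token chunks here are illustrative:

```python
def sse(event, data=""):
    """Format one Server-Sent Events frame."""
    return f"event: {event}\ndata: {data}\n\n"

def chat_stream(query):
    # Phase 1: run the graph up to merged_context, emitting phase events.
    yield sse("classified", "DATA_LOOKUP")
    yield sse("planned", '["ehr_lookup"]')
    yield sse("retrieved", "3 chunks")
    yield sse("synthesizing")
    # Phase 2: stream tokens straight from the final LLM API.
    for token in ["Third ", "nerve ", "palsy ", "..."]:
        yield sse("token", token)
    yield sse("done")
```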
LLM & Embedding Usage
| Stage | Model | When |
|---|---|---|
| Intent | General LLM | Always |
| Planning | General LLM | Always |
| Query rewrite | General LLM | When RAG needed |
| Context compression | General LLM | When context > threshold |
| Synthesis | Domain LLM | Always |
| RAG retrieval | Embeddings | When RAG needed |
| MMR reranking | Embeddings | When tool results > threshold |
Implementation: EHR Copilot
src/
├── main.py # CLI entry
├── api.py # FastAPI app
├── graph.py # LangGraph definition
├── state.py # Shared state schema
├── config.py # LLMs, embeddings, env
├── agents/
│ ├── planner_agent.py
│ ├── researcher_agent.py
│ └── synthesizer_agent.py
└── nodes/
├── input_guard.py
├── intent_classifier.py
├── query_rewriter.py
├── rag_retriever.py
├── mmr_reranker.py
├── context_compressor.py
├── context_merger.py
└── output_guard.py
Entry points:
- CLI: python src/main.py "query"
- API: POST /chat, POST /chat/stream, GET /health
Request Format
{
"query": "What is third nerve palsy?",
"patient_id": "patient-123",
"ehr_id": "ehr-456"
}
patient_id and ehr_id are optional and passed to tools when provided.
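A small client-side sketch of that optionality, using the field names from the example above — the IDs are added to the body only when provided:

```python
def build_request(query, patient_id=None, ehr_id=None):
    """Build a /chat request body; optional IDs are included only when set."""
    body = {"query": query}
    if patient_id:
        body["patient_id"] = patient_id
    if ehr_id:
        body["ehr_id"] = ehr_id
    return body

build_request("What is third nerve palsy?")   # query only
build_request("Latest labs?", patient_id="patient-123", ehr_id="ehr-456")
```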
Summary
This agentic AI architecture provides:
- Input/output guards for safety
- Intent classification for routing
- Planner–researcher–synthesizer agent chain
- Optional RAG via planner decision
- Context management for large tool results
- Streaming with phase events and token-by-token output
- Multi-model use (general vs. domain-specific LLMs)
The EHR Copilot is one implementation of this pattern, adaptable to other domains by changing tools, intents, and models.