What is Agentic AI?
Agentic AI refers to systems where AI agents autonomously plan, use tools, and iterate to accomplish tasks. Unlike single-prompt workflows, agentic systems:
- Classify user intent and route accordingly
- Plan which tools or knowledge sources to use
- Execute tools (APIs, databases, MCP servers)
- Retrieve from external knowledge (RAG)
- Synthesize a final response from gathered context
- Validate output and retry when needed
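The steps above can be sketched as a minimal plan–execute–validate loop. Every function here (classify_intent, plan_tools, run_tool, synthesize, validate) is a hypothetical stand-in for what would be an LLM or tool call in a real system:

```python
def classify_intent(query: str) -> str:
    # Hypothetical: an LLM would classify here; a keyword match stands in.
    return "DATA_LOOKUP" if "lab" in query.lower() else "CLINICAL_REASONING"

def plan_tools(intent: str) -> list[str]:
    # Hypothetical planner: map intent to a tool list.
    return ["ehr_lookup"] if intent == "DATA_LOOKUP" else []

def run_tool(name: str, query: str) -> str:
    # Stand-in for an API / database / MCP call.
    return f"[{name}] results for: {query}"

def synthesize(query: str, context: list[str]) -> str:
    # Stand-in for the generation LLM.
    return f"Answer to '{query}' using {len(context)} context chunks"

def validate(answer: str) -> bool:
    # Stand-in for the output guard.
    return len(answer) > 0

def run_agent(query: str, max_retries: int = 2) -> str:
    intent = classify_intent(query)
    context = [run_tool(t, query) for t in plan_tools(intent)]
    for _ in range(max_retries + 1):
        answer = synthesize(query, context)
        if validate(answer):          # retry synthesis when validation fails
            return answer
    raise RuntimeError("output validation failed after retries")
```

The retry loop wraps only synthesis and validation, matching the last bullet: tool execution is not repeated on a failed output check.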
Architecture Principles
┌─────────────────────────────────────────────────────────────────┐
│ AGENTIC AI SYSTEM │
├─────────────────────────────────────────────────────────────────┤
│ Input Guard → Intent & Planning → Tool Execution │
│ ↓ ↓ ↓ │
│ Validation → Route Selection RAG / MCP / APIs │
│ ↓ ↓ ↓ │
│ Context Merge ← Rerank & Compress ← Tool Results │
│ ↓ │
│ Synthesizer → Output Guard → Response │
└─────────────────────────────────────────────────────────────────┘
| Principle | Implementation |
|---|---|
| Guardrails | Input validation, output validation, prompt-injection checks |
| Orchestration | LangGraph state machine with conditional edges |
| Tool use | MCP (Model Context Protocol) for external tools |
| RAG | Optional retrieval based on query intent |
| Context management | MMR reranking, compression when context exceeds limits |
| Multi-model | Specialized LLMs per stage (planning vs. synthesis) |
| Streaming | Phase events + token-by-token final output |
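The guardrails row can be illustrated with a small input-guard sketch; the length limit and injection patterns below are illustrative placeholders, not the real guard:

```python
import re

MAX_QUERY_LEN = 2000  # illustrative limit, not the real threshold

# Hypothetical prompt-injection patterns for demonstration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"system prompt", re.I),
]

def input_guard(query: str) -> str:
    """Validate a user query before it enters the pipeline."""
    query = query.strip()
    if not query:
        raise ValueError("empty query")
    if len(query) > MAX_QUERY_LEN:
        raise ValueError("query too long")
    if any(p.search(query) for p in INJECTION_PATTERNS):
        raise ValueError("blocked: possible prompt injection")
    return query
```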
Pipeline Flow
START
│
▼
┌─────────────────┐
│ input_guard │ Validate query (length, safety)
└────────┬────────┘
│
▼
┌─────────────────┐
│ intent_classifier│ LLM → intent (DATA_LOOKUP, CLINICAL_REASONING, etc.)
└────────┬────────┘
│
▼
┌─────────────────┐
│ planner_agent │ LLM → selected_tools, use_rag
└────────┬────────┘
│
▼
┌─────────────────┐
│ researcher_agent│ Execute MCP tools
└────────┬────────┘
│
▼
┌─────────────────┐
│ mmr_reranker │ (conditional) Embed + MMR when context large
└────────┬────────┘
│
▼
┌─────────────────┐
│context_compressor│ (conditional) LLM summarize when still large
└────────┬────────┘
│
▼
┌─────────────────┐
│ merge_node │ RAG (if use_rag) + merge tool + RAG context
└────────┬────────┘
│
▼
┌─────────────────┐
│ synthesizer │ LLM → final response with source attribution
└────────┬────────┘
│
▼
┌─────────────────┐
│ output_guard │ Validate; retry synthesizer on failure
└────────┬────────┘
│
▼
END
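The conditional routing in this flow can be sketched in plain Python; a real build would use LangGraph's StateGraph with conditional edges. Node names mirror the diagram, each node mutates shared state and returns the next node, and the reranking threshold is illustrative (query_rewriter, rag_retriever, and context_compressor are omitted for brevity):

```python
def input_guard(state):
    if not state["query"].strip():
        raise ValueError("empty query")
    return "intent_classifier"

def intent_classifier(state):
    state["intent"] = "DATA_LOOKUP"          # an LLM call in the real system
    return "planner_agent"

def planner_agent(state):
    state["selected_tools"] = ["ehr_lookup"] # an LLM call in the real system
    state["use_rag"] = False
    return "researcher_agent"

def researcher_agent(state):
    state["tool_results"] = [f"chunk-{i}" for i in range(6)]  # MCP tools here
    # Conditional edge: rerank only when the context is large.
    return "mmr_reranker" if len(state["tool_results"]) > 4 else "merge_node"

def mmr_reranker(state):
    state["tool_results"] = state["tool_results"][:3]  # stand-in for MMR
    return "merge_node"

def merge_node(state):
    state["merged_context"] = " ".join(state["tool_results"])
    return "synthesizer"

def synthesizer(state):
    state["final_output"] = f"Answer from: {state['merged_context']}"
    return "output_guard"

def output_guard(state):
    # Conditional edge: retry synthesis when validation fails.
    return "END" if state["final_output"] else "synthesizer"

NODES = {f.__name__: f for f in [
    input_guard, intent_classifier, planner_agent, researcher_agent,
    mmr_reranker, merge_node, synthesizer, output_guard]}

def run_graph(query):
    state, node = {"query": query}, "input_guard"
    while node != "END":
        node = NODES[node](state)
    return state
```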
Agent Roles
1. Planner Agent
- Role: Decide which tools to call and whether RAG is needed
- Input: User query, intent
- Output: selected_tools, use_rag
- Pattern: LLM with structured JSON output
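The structured-JSON-output pattern can be sketched as follows; call_llm and the prompt are hypothetical stand-ins for the real LLM client:

```python
import json

# Illustrative planner prompt; the real one would include tool descriptions.
PLANNER_PROMPT = (
    'Given the user query and intent, reply with JSON only: '
    '{"selected_tools": [...], "use_rag": true/false}'
)

def call_llm(prompt: str, query: str, intent: str) -> str:
    # Canned response standing in for a model call.
    return '{"selected_tools": ["ehr_lookup"], "use_rag": false}'

def plan(query: str, intent: str) -> dict:
    raw = call_llm(PLANNER_PROMPT, query, intent)
    parsed = json.loads(raw)
    # Defensive defaults in case the model omits a field.
    return {
        "selected_tools": parsed.get("selected_tools", []),
        "use_rag": bool(parsed.get("use_rag", False)),
    }
```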
2. Researcher Agent
- Role: Execute tools (MCP, APIs, databases)
- Input: selected_tools
- Output: tool_results
- Pattern: Tool invocation with optional context (e.g. patient_id, ehr_id)
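A minimal sketch of that invocation pattern, assuming a hypothetical tool registry (a real system would dispatch to MCP servers, APIs, or databases):

```python
def ehr_lookup(query, patient_id=None, ehr_id=None):
    # Stand-in for a real EHR tool; scopes results when IDs are provided.
    scope = f"patient={patient_id}, ehr={ehr_id}" if patient_id else "no patient context"
    return f"EHR results for '{query}' ({scope})"

TOOLS = {"ehr_lookup": ehr_lookup}  # hypothetical registry

def research(selected_tools, query, **context):
    """Run each planned tool, passing optional context through."""
    results = {}
    for name in selected_tools:
        tool = TOOLS.get(name)
        if tool is None:
            results[name] = "error: unknown tool"  # fail soft, keep going
            continue
        results[name] = tool(query, **context)
    return results
```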
3. Synthesizer Agent
- Role: Generate final answer from merged context
- Input: merged_context, user query
- Output: llm_response, final_output
- Pattern: Dedicated LLM for generation (e.g. a domain-specific model)
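A sketch of the synthesis pattern, assuming a hypothetical call_domain_llm client and an illustrative prompt that asks for source attribution:

```python
SYNTH_PROMPT = (
    "Answer the question using ONLY the context below. "
    "Cite the source of each fact.\n\nContext:\n{context}\n\nQuestion: {query}"
)

def call_domain_llm(prompt: str) -> str:
    # Canned answer standing in for the domain-specific model.
    return "Third nerve palsy is ... [source: neuro-notes.md]"

def synthesize(query: str, merged_context: str) -> str:
    prompt = SYNTH_PROMPT.format(context=merged_context, query=query)
    return call_domain_llm(prompt)
```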
Supporting Nodes
| Node | Purpose |
|---|---|
| input_guard | Validate input, block unsafe patterns |
| intent_classifier | Route by intent (lookup vs. reasoning vs. drug info) |
| query_rewriter | Rewrite query for RAG retrieval |
| rag_retriever | Vector search over knowledge base |
| mmr_reranker | Rerank large tool results via MMR |
| context_compressor | Summarize when context exceeds limit |
| context_merger | Merge tool results + RAG context |
| output_guard | Validate output, trigger retry |
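The mmr_reranker node can be sketched in pure Python. MMR (maximal marginal relevance) greedily picks the document that maximizes λ·relevance − (1−λ)·redundancy against what is already selected; real embeddings would come from an embedding model, so 2-D toy vectors stand in here:

```python
import math

def cos(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr(query_vec, doc_vecs, k=2, lambda_=0.7):
    """Return indices of k documents selected by maximal marginal relevance."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cos(query_vec, doc_vecs[i])
            # Redundancy = similarity to the closest already-selected doc.
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_ * relevance - (1 - lambda_) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates; a diversity-heavy lambda skips doc 1.
docs = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0)]
print(mmr((1.0, 0.0), docs, k=2, lambda_=0.3))  # → [0, 2]
```

Lower lambda_ favors diversity (useful for trimming near-duplicate tool results); higher values favor raw relevance to the query.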
Streaming Architecture
Streaming runs in two phases:
- Phase 1: Run the graph until merged_context is available, emitting phase events.
- Phase 2: Stream directly from the final LLM API (bypassing orchestration) for reliable token-by-token delivery.
SSE event sequence:
classified → planned → retrieved → synthesizing → token... → done
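That event sequence can be sketched as a plain generator emitting SSE-formatted frames; a real server would wrap this in a FastAPI streaming response, and the phase payloads and token chunks here are illustrative:

```python
def sse(event, data=""):
    """Format one Server-Sent Events frame."""
    return f"event: {event}\ndata: {data}\n\n"

def chat_stream(query):
    # Phase 1: run the graph up to merged_context, emitting phase events.
    yield sse("classified", "DATA_LOOKUP")
    yield sse("planned", '["ehr_lookup"]')
    yield sse("retrieved", "3 chunks")
    yield sse("synthesizing")
    # Phase 2: stream tokens straight from the final LLM API.
    for token in ["Third ", "nerve ", "palsy ", "..."]:
        yield sse("token", token)
    yield sse("done")
```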
LLM & Embedding Usage
| Stage | Model | When |
|---|---|---|
| Intent | General LLM | Always |
| Planning | General LLM | Always |
| Query rewrite | General LLM | When RAG needed |
| Context compression | General LLM | When context > threshold |
| Synthesis | Domain LLM | Always |
| RAG retrieval | Embeddings | When RAG needed |
| MMR reranking | Embeddings | When tool results > threshold |
Implementation: EHR Copilot
src/
├── main.py # CLI entry
├── api.py # FastAPI app
├── graph.py # LangGraph definition
├── state.py # Shared state schema
├── config.py # LLMs, embeddings, env
├── agents/
│ ├── planner_agent.py
│ ├── researcher_agent.py
│ └── synthesizer_agent.py
└── nodes/
├── input_guard.py
├── intent_classifier.py
├── query_rewriter.py
├── rag_retriever.py
├── mmr_reranker.py
├── context_compressor.py
├── context_merger.py
└── output_guard.py
Entry points:
- CLI: python src/main.py "query"
- API: POST /chat, POST /chat/stream, GET /health
Request Format
{
"query": "What is third nerve palsy?",
"patient_id": "patient-123",
"ehr_id": "ehr-456"
}
patient_id and ehr_id are optional and passed to tools when provided.
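A small client-side sketch of that optionality, using the field names from the example above — the IDs are added to the body only when provided:

```python
def build_request(query, patient_id=None, ehr_id=None):
    """Build a /chat request body; optional IDs are included only when set."""
    body = {"query": query}
    if patient_id:
        body["patient_id"] = patient_id
    if ehr_id:
        body["ehr_id"] = ehr_id
    return body

build_request("What is third nerve palsy?")   # query only
build_request("Latest labs?", patient_id="patient-123", ehr_id="ehr-456")
```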
Summary
This agentic AI architecture provides:
- Input/output guards for safety
- Intent classification for routing
- Planner–researcher–synthesizer agent chain
- Optional RAG via planner decision
- Context management for large tool results
- Streaming with phase events and token-by-token output
- Multi-model use (general vs. domain-specific LLMs)
The EHR Copilot is one implementation of this pattern, adaptable to other domains by changing tools, intents, and models.