LlamaBot — High-Level Design
Created: 2026-03-28
Last updated: 2026-03-28 — Decision 3: sync routing uses ToolBot.__call__; async routing uses AsyncToolBot.__call__ (acompletion). See AgentBot LLD.
Problem Statement
Researchers and developers need a Pythonic, composable way to call many LLM providers (via LiteLLM), experiment in notebooks, and ship small apps (CLI, FastAPI, HTMX) without ad-hoc glue code for every project.
Goals
- Unified bot API — Common patterns for completion, tools, RAG, structure, and agents with consistent message and config handling.
- Provider breadth — Support models exposed through LiteLLM with sensible defaults and overrides.
- Composable components — Messages, docstores, tools, memory, and observability as optional building blocks.
- Clear reference patterns — Documented reference implementations (including PocketFlow-based agents) that users can copy or replace.
Non-Goals
- Being a full application framework — LlamaBot is a library; apps bring routing, auth, and deployment.
- Mandating one agent architecture — Multiple orchestration styles are valid; the library ships examples and reference graphs, not a single “blessed” agent product.
Target Users
- Notebook experimenters — Quick
SimpleBot/ToolBot/AgentBotusage with marimo or Jupyter-style workflows. - App builders — FastAPI + HTMX demos, CLI tools, and bots with streaming and SSE.
- Maintainers — Contributors extending bots, components, and docs.
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ LlamaBot library │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ │
│ │ Bots │ │Components│ │ CLI / │ │ Recorder │ │
│ │ (Simple, │ │ messages, │ │ helpers │ │ spans, │ │
│ │ Tool, │ │ docstore, │ │ │ │ sqlite │ │
│ │ Query, │ │ tools, │ │ │ │ │ │
│ │ Agent, │ │ memory │ │ │ │ │ │
│ │ …) │ │ │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └────────────┘ │
│ │ │ LiteLLM │
│ └──────────────┴──────────────────────► API │
└─────────────────────────────────────────────────────────────┘
Agent / orchestration: AgentBot is a reference implementation of one PocketFlow graph (decide → tool → loopback). Other graphs are encouraged; see the AgentBot feature LLD.
Key Design Decisions
Decision 1: LiteLLM as the single provider surface
Choice: Route completions through LiteLLM.
Rationale: One integration point for many providers and options; aligns with LlamaBot’s “all models LiteLLM supports” goal.
Decision 2: PocketFlow for agent graph orchestration
Choice: Use PocketFlow Flow / AsyncFlow for the default AgentBot topology.
Rationale: Explicit graph, routing labels, and loopback without a custom scheduler; users can fork or replace DecideNode / edges.
Decision 3: Tool routing via ToolBot / AsyncToolBot inside DecideNode
Choice: The sync graph uses ToolBot.__call__; the async graph uses AsyncToolBot.__call__ with acompletion, both single-turn tool-calling.
Rationale: Reuses the same message and schema plumbing as ToolBot; async entrypoint matches other Async* bots.
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Confusion between “AgentBot” and “the only agent” | Document AgentBot as reference graph; see designs/agentbot LLD and EARS. |
| Sync/async drift | Parallel Async* bots and async completion paths where supported; document in LLDs. |