# SimpleBot and AsyncSimpleBot — Low-Level Design

Created: 2026-03-28
Last updated: 2026-03-28
HLD Link: ../../high-level-design.md

## Requirements (EARS)

- simplebot-EARS.md — `SimpleBot`, `AsyncSimpleBot`, shared completion helpers.
## Overview

`SimpleBot` is the base completion stack for most LlamaBot bots: LiteLLM (`completion` / `acompletion`) behind the shared message model, optional recorder spans, and SQLite logging of turns. Module-level helpers (`completion_kwargs_for_messages`, `make_response`, `make_async_response`, `stream_chunks`, `async_stream_chunks`, `stream_tokens_for_messages`, `extract_tool_calls`, `extract_content`) live in `llamabot.bot.simplebot` and are reused by `StructuredBot`, `ToolBot`, `QueryBot`, and others.
## Classes

| Class | Module | Base |
|---|---|---|
| `SimpleBot` | `llamabot.bot.simplebot` | — |
| `AsyncSimpleBot` | `llamabot.bot.simplebot` | `SimpleBot` |
## Shared completion pipeline

LiteLLM calls are built by `completion_kwargs_for_messages`: `model`, `messages` (role/content from `BaseMessage.model_dump`), `temperature`, `stream`, `completion_kwargs`, optional `api_key`, `mock_response`, and—when a `tools` attribute exists on the bot—`tools` and `tool_choice`.

- Sync responses: `make_response` → `litellm.completion`.
- Async responses: `make_async_response` → `litellm.acompletion`.
- Streaming assembly: `stream_chunks` (sync) and `async_stream_chunks` (async) consume LiteLLM streams when `stream_target` is not `"none"`; `stream_tokens_for_messages` yields text deltas for async streaming.

`extract_tool_calls` / `extract_content` normalize a `ModelResponse` into `AIMessage` fields and support JSON-in-content tool calls (e.g. some Ollama-style outputs) when `message.tool_calls` is absent.
## Message composition

`compose_messages_for_human_messages` builds:

- `SystemMessage` (from `system_prompt`).
- Optional memory messages via `memory.retrieve(...)` when `memory` is set (type `AbstractDocumentStore` in the constructor; used for chat history or RAG-style retrieval depending on the implementation).
- User messages after `to_basemessage(...)`.
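The ordering above can be sketched with plain dicts standing in for llamabot's message classes; `compose_messages` and `DemoMemory` are illustrative names, and the real `memory.retrieve(...)` signature may differ.

```python
def compose_messages(system_prompt: str, user_texts: list[str], memory=None) -> list[dict]:
    """Sketch of compose_messages_for_human_messages' ordering:
    system prompt first, retrieved memory next, user messages last."""
    messages = [{"role": "system", "content": system_prompt}]
    if memory is not None:
        # memory.retrieve(...) supplies chat history or RAG context
        messages.extend(memory.retrieve(user_texts))
    messages.extend({"role": "user", "content": text} for text in user_texts)
    return messages

class DemoMemory:
    """Illustrative stand-in for an AbstractDocumentStore."""
    def retrieve(self, queries: list[str]) -> list[dict]:
        return [{"role": "user", "content": "earlier turn"}]

msgs = compose_messages("You are helpful.", ["hello"], memory=DemoMemory())
```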
## Call semantics

- `SimpleBot.__call__`: creates a `Span` (child of the current span when present), records metadata, calls `make_response` + `stream_chunks`, builds an `AIMessage` with content and tool calls, logs via `sqlite_log`, and appends to `memory` when configured (the last processed user message plus the assistant message).
- `AsyncSimpleBot.__call__`: uses token streaming via `stream_tokens_for_messages` and a `finalize` callback to assemble the final `AIMessage`, span fields, logging, and memory. Raises `RuntimeError` if no assistant message was assembled.
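The async assembly loop can be sketched as below, assuming only what the design states: tokens are streamed, a `finalize` callback builds the final message, and `RuntimeError` is raised when nothing was assembled. `stream_tokens` and `async_call` are stand-ins, not the real `AsyncSimpleBot` internals (span recording, logging, and memory appends are omitted).

```python
import asyncio
from typing import AsyncIterator

async def stream_tokens(deltas: list[str]) -> AsyncIterator[str]:
    """Stand-in for stream_tokens_for_messages: yields text deltas."""
    for delta in deltas:
        yield delta

async def async_call(deltas: list[str]) -> dict:
    """Sketch of AsyncSimpleBot.__call__'s assembly loop: accumulate
    streamed tokens, hand the joined text to a finalize callback, and
    raise RuntimeError when no assistant message was assembled."""
    final: dict | None = None

    def finalize(content: str) -> None:
        nonlocal final
        final = {"role": "assistant", "content": content}

    parts: list[str] = []
    async for token in stream_tokens(deltas):
        parts.append(token)
    if parts:
        finalize("".join(parts))
    if final is None:
        raise RuntimeError("no assistant message was assembled")
    return final

message = asyncio.run(async_call(["Hel", "lo", "!"]))
```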
## Configuration

- `stream_target`: `stdout`, `panel`, `api`, or `none` (invalid values raise).
- `o1-preview` / `o1-mini`: the system prompt is coerced to a `HumanMessage`, `temperature` is set to `1.0`, and `stream_target` is forced to `none`.
- `json_mode`: when `True`, `completion_kwargs_for_messages` requires `pydantic_model` on the bot (used by `StructuredBot`).
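The validation and coercion rules above can be sketched as a single function; `apply_config_rules` and its return shape are illustrative assumptions, not the actual llamabot API.

```python
VALID_STREAM_TARGETS = {"stdout", "panel", "api", "none"}

def apply_config_rules(model_name: str, temperature: float, stream_target: str):
    """Sketch of SimpleBot's configuration rules: reject invalid
    stream_target values, then apply the o1-family coercions."""
    if stream_target not in VALID_STREAM_TARGETS:
        raise ValueError(f"invalid stream_target: {stream_target!r}")
    coerce_system_to_human = False
    if model_name in ("o1-preview", "o1-mini"):
        # o1 models: system prompt becomes a HumanMessage, temperature is
        # pinned to 1.0, and streaming is disabled
        coerce_system_to_human = True
        temperature = 1.0
        stream_target = "none"
    return coerce_system_to_human, temperature, stream_target

coerced, temp, target = apply_config_rules("o1-mini", 0.0, "stdout")
```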
## Traceability (intent → code)

| EARS ID prefix | Code |
|---|---|
| `CORE-SIMPLE-*` | `llamabot/bot/simplebot.py` |
## Related Documents

- High-Level Design
- StructuredBot LLD — Pydantic validation loop; subclasses `SimpleBot`.
- ToolBot LLD — tool selection; subclasses `SimpleBot`.
- QueryBot LLD — RAG path (`compose_rag_messages`); does not use `SimpleBot.__call__` message composition.
- simplebot-EARS