QueryBot and AsyncQueryBot — Low-Level Design
Created: 2026-03-28
Last updated: 2026-03-28
HLD Link: ../../high-level-design.md
Requirements (EARS)
- querybot-EARS.md — RAG composition, retrieval spans, logging, async behavior.
Overview
`QueryBot` answers a user query with retrieval-augmented generation (RAG). It pulls chunks from a required `AbstractDocumentStore` (`docstore`), optionally pulls additional chunks from a second store passed as `memory`, and then calls LiteLLM through the same `completion_kwargs_for_messages` path that `SimpleBot` uses. It bypasses `SimpleBot.compose_messages_for_human_messages` and the default “single user turn + optional memory retrieve” flow in `SimpleBot.__call__`, and replaces that call path with `compose_rag_messages` and `QueryBot.__call__`.
Inheritance and modules
| Class | Module | Base |
|---|---|---|
| `QueryBot` | `llamabot.bot.querybot` | `SimpleBot` |
| `AsyncQueryBot` | `llamabot.bot.querybot` | `QueryBot` |
Data and dependencies
| Field | Type | Role |
|---|---|---|
| `docstore` | `AbstractDocumentStore` | Primary retrieval corpus; required. |
| `memory` | `Optional[AbstractDocumentStore]` | Optional second retrieval source. When set, it also receives the assistant's response text via `append` after each successful call. |
`QueryBot` calls `SimpleBot.__init__` without forwarding `memory`. After `super().__init__()` returns, `QueryBot` assigns `self.memory` itself, so the attribute name that `SimpleBot` uses for its memory now holds the optional second document store.
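The constructor wiring can be sketched with stand-in classes. `SimpleBotSketch`, `ListStore`, and `QueryBotSketch` are hypothetical names, not the llamabot classes; the sketch only illustrates "don't forward `memory`, then assign it after `super().__init__()`".

```python
from typing import List, Optional


class SimpleBotSketch:
    """Hypothetical stand-in for SimpleBot: it owns a `memory` attribute."""

    def __init__(self, system_prompt: str, memory: Optional[object] = None):
        self.system_prompt = system_prompt
        self.memory = memory  # SimpleBot-style memory slot


class ListStore:
    """Hypothetical stand-in for an AbstractDocumentStore."""

    def __init__(self, chunks: Optional[List[str]] = None):
        self.chunks = list(chunks or [])

    def append(self, text: str) -> None:
        self.chunks.append(text)


class QueryBotSketch(SimpleBotSketch):
    def __init__(
        self,
        system_prompt: str,
        docstore: ListStore,
        memory: Optional[ListStore] = None,
    ):
        # memory is NOT forwarded, so the parent sets self.memory = None ...
        super().__init__(system_prompt)
        self.docstore = docstore  # required primary corpus
        # ... and the subclass then reuses the same attribute name for the
        # optional second document store.
        self.memory = memory
```

In this sketch the parent constructor only ever sees `memory=None`; the subclass alone decides what `self.memory` ends up holding.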
RAG message composition
`compose_rag_messages(query, n_results, outer_span)`:
- Resolves `query_content` from `str`, `HumanMessage`, or `BaseMessage`.
- Starts `messages` with `system_prompt`.
- Under a nested span `retrieval`: calls `docstore.retrieve(query_content, n_results)`; each chunk becomes `RetrievedMessage(content=chunk)`; the span records document counts.
- If `memory` is set: under span `memory_retrieval`, calls `memory.retrieve(query_content, n_results)` with the same pattern; the span records counts.
- Appends `HumanMessage(content=query_content)`.
- Sets `query`/`temperature` metadata on the outer span.
- Returns `(messages, processed_messages)`, where `processed_messages = to_basemessage(messages)` is what the LiteLLM call receives and `messages` is the list used for `sqlite_log` (see below).
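The composition steps above can be sketched as follows. All classes here (`Span`, `ListStore`, `BotStub`, the message dataclasses) and the helper `to_basemessage_sketch` are hypothetical stand-ins; the real llamabot span API, metadata keys, and message roles may differ. In particular, the `"system"` role on `RetrievedMessage` is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class BaseMessage:
    role: str
    content: str


@dataclass
class SystemMessage(BaseMessage):
    role: str = "system"
    content: str = ""


@dataclass
class RetrievedMessage(BaseMessage):
    role: str = "system"  # assumed role; the real class may differ
    content: str = ""


@dataclass
class HumanMessage(BaseMessage):
    role: str = "user"
    content: str = ""


def to_basemessage_sketch(messages: List[BaseMessage]) -> List[BaseMessage]:
    """Hypothetical stand-in for to_basemessage: plain BaseMessage copies."""
    return [BaseMessage(role=m.role, content=m.content) for m in messages]


class Span:
    """Hypothetical tracing span: a named node with metadata and children."""

    def __init__(self, name: str):
        self.name = name
        self.metadata: Dict[str, object] = {}
        self.children: List["Span"] = []

    def span(self, name: str) -> "Span":
        child = Span(name)
        self.children.append(child)
        return child

    def __enter__(self) -> "Span":
        return self

    def __exit__(self, *exc_info) -> bool:
        return False  # never swallow exceptions


class ListStore:
    """Hypothetical document store that returns its first n chunks."""

    def __init__(self, chunks: List[str]):
        self.chunks = list(chunks)

    def retrieve(self, query: str, n_results: int) -> List[str]:
        return self.chunks[:n_results]


@dataclass
class BotStub:
    system_prompt: str
    docstore: ListStore
    memory: Optional[ListStore] = None
    temperature: float = 0.0


def compose_rag_messages(
    bot: BotStub, query: Union[str, BaseMessage], n_results: int, outer_span: Span
) -> Tuple[List[BaseMessage], List[BaseMessage]]:
    # Resolve the query text (HumanMessage is a BaseMessage subclass here).
    if isinstance(query, str):
        query_content = query
    elif isinstance(query, BaseMessage):
        query_content = query.content
    else:
        raise TypeError(f"unsupported query type: {type(query).__name__}")

    messages: List[BaseMessage] = [SystemMessage(content=bot.system_prompt)]

    # Primary retrieval under its own nested span.
    with outer_span.span("retrieval") as span:
        chunks = bot.docstore.retrieve(query_content, n_results)
        messages.extend(RetrievedMessage(content=c) for c in chunks)
        span.metadata["num_documents"] = len(chunks)  # hypothetical key

    # Optional second store, same pattern.
    if bot.memory is not None:
        with outer_span.span("memory_retrieval") as span:
            mem_chunks = bot.memory.retrieve(query_content, n_results)
            messages.extend(RetrievedMessage(content=c) for c in mem_chunks)
            span.metadata["num_memories"] = len(mem_chunks)  # hypothetical key

    messages.append(HumanMessage(content=query_content))
    outer_span.metadata["query"] = query_content
    outer_span.metadata["temperature"] = bot.temperature
    return messages, to_basemessage_sketch(messages)
```

Returning both lists keeps the typed messages available for logging while the converted copy goes to the model call.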
Completion and logging
- `QueryBot.__call__`: opens an outer `Span` (same pattern as the other bots), calls `compose_rag_messages`, then `make_response(self, processed_messages, stream)` and `stream_chunks`, and finally `extract_tool_calls`/`extract_content` to build an `AIMessage`. `sqlite_log(self, messages + [response_message])` uses the `messages` list from before `to_basemessage` (system + retrieved + human). If `memory` is set, only the assistant string is stored, via `memory.append(response_message.content)`.
- `AsyncQueryBot.__call__`: runs `super().__call__` in `asyncio.to_thread` with the same `query` and `n_results`.
- `AsyncQueryBot.stream_async`: uses the same spans and `compose_rag_messages`, then calls `stream_tokens_for_messages(self, processed_messages, finalize=...)` with a `finalize` callback that logs and appends to `memory` as the sync path does.
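The completion paths above can be sketched end to end under heavy simplification. Everything here is hypothetical: `complete_stub` replaces `make_response` + `stream_chunks`, `sqlite_log_stub` and the `LOG` list replace `sqlite_log`, `stream_tokens_stub` replaces `stream_tokens_for_messages`, plain strings stand in for message objects, and spans are omitted. Only `asyncio.to_thread` is a real library call.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, List, Optional, Tuple


@dataclass
class AIMessage:
    content: str
    role: str = "assistant"


class ListStore:
    """Hypothetical document store: returns its first n chunks, accepts appends."""

    def __init__(self, chunks: Optional[List[str]] = None):
        self.chunks = list(chunks or [])

    def retrieve(self, query: str, n_results: int) -> List[str]:
        return self.chunks[:n_results]

    def append(self, text: str) -> None:
        self.chunks.append(text)


LOG: List[list] = []  # hypothetical stand-in for the sqlite message log


def sqlite_log_stub(bot: object, messages: list) -> None:
    LOG.append(list(messages))


async def stream_tokens_stub(
    tokens: List[str], finalize: Callable[[str], None]
) -> AsyncIterator[str]:
    """Hypothetical stand-in for stream_tokens_for_messages."""
    for token in tokens:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token
    finalize("".join(tokens))  # runs only after the stream is fully consumed


class QueryBotSketch:
    def __init__(self, docstore: ListStore, memory: Optional[ListStore] = None):
        self.docstore = docstore
        self.memory = memory

    def compose_rag_messages(self, query: str, n_results: int) -> Tuple[list, list]:
        retrieved = self.docstore.retrieve(query, n_results)
        if self.memory is not None:
            retrieved += self.memory.retrieve(query, n_results)
        messages = ["<system prompt>"] + retrieved + [query]
        return messages, list(messages)  # 2nd item stands in for to_basemessage(...)

    def complete_stub(self, processed: list) -> str:
        # Hypothetical model call; reports how many retrieved chunks it saw.
        return f"answer (chunks={len(processed) - 2})"

    def _log_and_remember(self, messages: list, response: AIMessage) -> None:
        # Log the pre-conversion messages plus the response ...
        sqlite_log_stub(self, messages + [response])
        # ... and append only the assistant text to memory, if configured.
        if self.memory is not None:
            self.memory.append(response.content)

    def __call__(self, query: str, n_results: int = 10) -> AIMessage:
        messages, processed = self.compose_rag_messages(query, n_results)
        response = AIMessage(content=self.complete_stub(processed))
        self._log_and_remember(messages, response)
        return response


class AsyncQueryBotSketch(QueryBotSketch):
    async def __call__(self, query: str, n_results: int = 10) -> AIMessage:
        # Run the blocking sync path in a worker thread.
        return await asyncio.to_thread(super().__call__, query, n_results)

    async def stream_async(self, query: str, n_results: int = 10) -> AsyncIterator[str]:
        messages, processed = self.compose_rag_messages(query, n_results)

        def finalize(text: str) -> None:
            self._log_and_remember(messages, AIMessage(content=text))

        # processed would be sent to the model; fixed tokens stand in here.
        tokens = ["streamed ", "answer"]
        async for token in stream_tokens_stub(tokens, finalize):
            yield token
```

Sharing one logging-and-memory helper between the sync call and the streaming `finalize` callback keeps the two paths from drifting apart. Note that in this sketch `finalize` never runs if the consumer stops iterating early.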
Traceability (intent → code)
| EARS ID | Code |
|---|---|
| `QRY-RAG-*` | `llamabot/bot/querybot.py` |
Related Documents
- High-Level Design
- SimpleBot LLD: shared LiteLLM helpers (`make_response`, `stream_chunks`, `stream_tokens_for_messages`, `completion_kwargs_for_messages`).
- QueryBot EARS