Chat Memory

Note: This module provides intelligent conversation memory with both linear and graph-based threading capabilities.

The chat memory system allows bots to maintain context across conversation turns, enabling more coherent and contextual responses. It supports two main modes:

  • Linear Memory: Fast, simple memory that stores messages in chronological order
  • Graph Memory: Intelligent threading that connects related conversation topics using LLM-based analysis

Quick Start

Basic Linear Memory

import llamabot as lmb

# Create a bot with simple linear memory (fast, no LLM calls)
memory = lmb.ChatMemory()  # Default linear mode
bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    model_name="gpt-4o-mini",
    memory=memory
)

# Chat loop automatically uses memory
response1 = bot("Hello! How are you?")
response2 = bot("What did I just ask you?")  # Bot can reference previous conversation

Intelligent Graph Memory

import llamabot as lmb

# Create a bot with intelligent threading (uses LLM for smart connections)
memory = lmb.ChatMemory.threaded(model="gpt-4o-mini")
bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    model_name="gpt-4o-mini",
    memory=memory
)

# Bot can now handle conversation threading intelligently
response1 = bot("Let's talk about Python programming.")
response2 = bot("What are the benefits of using Python?")  # Continues Python thread
response3 = bot("Now let's discuss machine learning.")  # Starts new thread
response4 = bot("What libraries should I use for ML in Python?")  # Connects back to Python thread

Core Components

ChatMemory

The main class that provides unified memory functionality.

class ChatMemory:
    def __init__(self,
                 node_selector: Optional[NodeSelector] = None,
                 summarizer: Optional[Summarizer] = None,
                 context_depth: int = 5):
        """Initialize chat memory with configuration.

        :param node_selector: Strategy for selecting parent nodes (None = LinearNodeSelector)
        :param summarizer: Optional summarization strategy (None = no summarization)
        :param context_depth: Default depth for context retrieval
        """

Factory Methods:

  • ChatMemory() - Creates linear memory (fast, no LLM calls)
  • ChatMemory.threaded(model="gpt-4o-mini") - Creates graph memory with LLM-based threading

Node Selectors

LinearNodeSelector

  • Always selects the most recent assistant message as parent
  • Creates linear conversation flow
  • No LLM calls required
  • Used by default in linear mode

LLMNodeSelector

  • Uses LLM to intelligently select which assistant message to branch from
  • Considers message content and conversation context
  • Supports retry logic with feedback
  • Used in graph mode
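
A minimal sketch of how the two strategies are selected in practice, using only the constructor and factory arguments shown elsewhere on this page:

import llamabot as lmb

# Linear selection (default): each new turn attaches to the most recent
# assistant message, and no LLM call is made to pick a parent.
linear_memory = lmb.ChatMemory()

# LLM-based selection: the named model decides which earlier assistant
# message a new turn should branch from.
graph_memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
)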

Summarizers

LLMSummarizer

  • Generates message summaries for better threading
  • Optional component that can be disabled for performance
  • Uses LLM to create concise summaries of message content
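
Because summarization adds an LLM call per stored message, it is a deliberate opt-in. A sketch of toggling it, using the same constructor arguments as in Advanced Configuration below:

import llamabot as lmb

# Summaries enabled: better threading signals, one extra LLM call per message.
memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
    summarizer=lmb.LLMSummarizer(model="gpt-4o-mini"),
)

# Summaries disabled (summarizer=None is the default): lower latency.
fast_memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
    summarizer=None,
)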

Usage Patterns

Memory in Chat Loop

The standard pattern for using memory in a bot:

def __call__(self, *human_messages):
    # 1. Process incoming messages
    processed_messages = to_basemessage(human_messages)

    # 2. RETRIEVAL: Get relevant context from memory
    memory_messages = []
    if self.memory:
        memory_messages = self.memory.retrieve(
            query=f"From our conversation history, give me the most relevant information to the query, {[p.content for p in processed_messages]}",
            n_results=10,
            context_depth=5
        )

    # 3. Build message list with context
    messages = [self.system_prompt] + memory_messages + processed_messages

    # 4. Generate response (the model call that produces `content` and
    #    `tool_calls` is omitted here for brevity)
    response_message = AIMessage(content=content, tool_calls=tool_calls)

    # 5. STORAGE: Save conversation turn to memory
    if self.memory:
        self.memory.append(processed_messages[-1], response_message)

    return response_message

Memory Operations

Storage

# Add conversation turn to memory
memory.append(human_message, assistant_message)

Retrieval

# Get relevant context
context_messages = memory.retrieve(
    query="What did we discuss about Python?",
    n_results=5,
    context_depth=3
)

Reset

# Clear all stored messages
memory.reset()

Visualization

# Export conversation graph as Mermaid diagram
mermaid_diagram = memory.to_mermaid()
print(mermaid_diagram)
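
Putting these operations together in one pass (a sketch that relies only on the calls shown above; the bot call handles append automatically, as in the chat-loop pattern):

import llamabot as lmb

memory = lmb.ChatMemory.threaded(model="gpt-4o-mini")
bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    model_name="gpt-4o-mini",
    memory=memory,
)

bot("Let's talk about Python programming.")      # each turn is appended to memory
bot("What are the benefits of using Python?")

# Pull related context back out of memory directly.
context = memory.retrieve(
    query="What did we discuss about Python?",
    n_results=5,
    context_depth=3,
)
print(f"{len(context)} context messages retrieved")

# Inspect the conversation structure, then start fresh.
print(memory.to_mermaid())
memory.reset()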

When to Use Each Memory Type

| Use Case | Memory Type | Why |
| --- | --- | --- |
| Simple Q&A | lmb.ChatMemory() | Fast, no LLM calls needed |
| Multi-topic conversations | lmb.ChatMemory.threaded() | Smart threading connects related topics |
| Performance critical | lmb.ChatMemory() | No additional LLM latency |
| Complex discussions | lmb.ChatMemory.threaded() | Maintains conversation context across topics |
| Real-time chat | lmb.ChatMemory() | Immediate responses |
| Research/analysis | lmb.ChatMemory.threaded() | Can reference earlier parts of conversation |

Advanced Configuration

Custom Memory Setup

import llamabot as lmb

# Custom memory configuration (advanced users)
memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
    summarizer=lmb.LLMSummarizer(model="gpt-4o-mini"),  # Optional for better threading
    context_depth=10  # How far back to look for context
)

bot = lmb.SimpleBot(
    system_prompt="You are a coding assistant.",
    model_name="gpt-4o-mini",
    memory=memory
)

Memory with Different Bot Types

import llamabot as lmb

# Linear memory for simple conversations (fast)
linear_memory = lmb.ChatMemory()  # Default linear
simple_bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    memory=linear_memory
)

# Graph memory for complex conversations with threading (smart)
graph_memory = lmb.ChatMemory.threaded(model="gpt-4o-mini")
query_bot = lmb.QueryBot(
    system_prompt="You are a helpful assistant.",
    memory=graph_memory
)

# Custom memory for specific needs (advanced)
custom_memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
    summarizer=None  # No summarization for performance
)
structured_bot = lmb.StructuredBot(
    system_prompt="You are a helpful assistant.",
    pydantic_model=SomeModel,  # SomeModel is a placeholder for your own Pydantic model
    memory=custom_memory
)

Architecture

The chat memory system uses a modular architecture:

llamabot/components/chat_memory/
├── __init__.py        # Exports main classes and functions
├── memory.py          # Main ChatMemory class
├── retrieval.py       # Retrieval functions
├── storage.py         # Storage functions
├── visualization.py   # Visualization functions
└── selectors.py       # Node selection strategies
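
For direct use of the component module rather than the top-level lmb namespace, imports along the following lines should work; the exact names re-exported by __init__.py are an assumption based on the comment above:

# Assumed re-exports; adjust to whatever __init__.py actually exposes.
from llamabot.components.chat_memory import ChatMemory, LLMNodeSelector, LLMSummarizer

memory = ChatMemory(node_selector=LLMNodeSelector(model="gpt-4o-mini"))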

Data Model

ConversationNode

@dataclass
class ConversationNode:
    id: int  # Auto-incremented based on number of nodes in graph
    message: BaseMessage  # Single message (not conversation turn)
    summary: Optional[MessageSummary] = None
    parent_id: Optional[int] = None
    timestamp: datetime = field(default_factory=datetime.now)

MessageSummary

class MessageSummary(BaseModel):
    title: str = Field(..., description="Title of the message")
    summary: str = Field(..., description="Summary of the message. Two sentences max.")
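
To make the data model concrete, here is a rough sketch of a three-node chain; the import paths for ConversationNode and the message classes are assumptions, and in normal use ids and parent links are assigned by ChatMemory rather than by hand:

# Assumed import paths, for illustration only.
from llamabot.components.chat_memory import ConversationNode
from llamabot.components.messages import AIMessage, HumanMessage

# Each node wraps a single message and points at its parent via parent_id.
n1 = ConversationNode(id=1, message=HumanMessage(content="Let's talk about Python."))
n2 = ConversationNode(id=2, message=AIMessage(content="Sure, what would you like to know?"), parent_id=1)
n3 = ConversationNode(id=3, message=HumanMessage(content="What are its main benefits?"), parent_id=2)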

Threading Model

The graph memory stores the conversation as a single tree: every node is connected, and each message links back to its parent node:

```