Chat Memory

Note: This module provides intelligent conversation memory with both linear and graph-based threading capabilities.

The chat memory system allows bots to maintain context across conversation turns, enabling more coherent and contextual responses. It supports two main modes:

  • Linear Memory: Fast, simple memory that stores messages in chronological order
  • Graph Memory: Intelligent threading that connects related conversation topics using LLM-based analysis

Quick Start

Basic Linear Memory

import llamabot as lmb

# Create a bot with simple linear memory (fast, no LLM calls)
memory = lmb.ChatMemory()  # Default linear mode
bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    model_name="gpt-4o-mini",
    memory=memory
)

# Chat loop automatically uses memory
response1 = bot("Hello! How are you?")
response2 = bot("What did I just ask you?")  # Bot can reference previous conversation

Intelligent Graph Memory

import llamabot as lmb

# Create a bot with intelligent threading (uses LLM for smart connections)
memory = lmb.ChatMemory.threaded(model="gpt-4o-mini")
bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    model_name="gpt-4o-mini",
    memory=memory
)

# Bot can now handle conversation threading intelligently
response1 = bot("Let's talk about Python programming.")
response2 = bot("What are the benefits of using Python?")  # Continues Python thread
response3 = bot("Now let's discuss machine learning.")  # Starts new thread
response4 = bot("What libraries should I use for ML in Python?")  # Connects back to Python thread

Core Components

ChatMemory

The main class that provides unified memory functionality.

class ChatMemory:
    def __init__(self,
                 node_selector: Optional[NodeSelector] = None,
                 summarizer: Optional[Summarizer] = None,
                 context_depth: int = 5):
        """Initialize chat memory with configuration.

        :param node_selector: Strategy for selecting parent nodes (None = LinearNodeSelector)
        :param summarizer: Optional summarization strategy (None = no summarization)
        :param context_depth: Default depth for context retrieval
        """

Factory Methods:

  • ChatMemory() - Creates linear memory (fast, no LLM calls)
  • ChatMemory.threaded(model="gpt-4o-mini") - Creates graph memory with LLM-based threading

Node Selectors

LinearNodeSelector

  • Always selects the most recent assistant message as parent
  • Creates linear conversation flow
  • No LLM calls required
  • Used by default in linear mode

LLMNodeSelector

  • Uses LLM to intelligently select which assistant message to branch from
  • Considers message content and conversation context
  • Supports retry logic with feedback
  • Used in graph mode
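
A minimal sketch of how the two strategies are selected in practice, using only the constructor and factory arguments shown elsewhere on this page:

import llamabot as lmb

# Linear selection (default): each new turn attaches to the most recent
# assistant message, and no LLM call is made to pick a parent.
linear_memory = lmb.ChatMemory()

# LLM-based selection: the named model decides which earlier assistant
# message a new turn should branch from.
graph_memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
)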

Summarizers

LLMSummarizer

  • Generates message summaries for better threading
  • Optional component that can be disabled for performance
  • Uses LLM to create concise summaries of message content
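
Because summarization adds an LLM call per stored message, it is a deliberate opt-in. A sketch of toggling it, using the same constructor arguments as in Advanced Configuration below:

import llamabot as lmb

# Summaries enabled: better threading signals, one extra LLM call per message.
memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
    summarizer=lmb.LLMSummarizer(model="gpt-4o-mini"),
)

# Summaries disabled (summarizer=None is the default): lower latency.
fast_memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
    summarizer=None,
)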

Usage Patterns

Memory in Chat Loop

The standard pattern for using memory in a bot:

def __call__(self, *human_messages):
    # 1. Process incoming messages
    processed_messages = to_basemessage(human_messages)

    # 2. RETRIEVAL: Get relevant context from memory
    memory_messages = []
    if self.memory:
        memory_messages = self.memory.retrieve(
            query=f"From our conversation history, give me the most relevant information to the query, {[p.content for p in processed_messages]}",
            n_results=10,
            context_depth=5
        )

    # 3. Build message list with context
    messages = [self.system_prompt] + memory_messages + processed_messages

    # 4. Generate response (the model call that produces `content` and
    #    `tool_calls` is omitted here for brevity)
    response_message = AIMessage(content=content, tool_calls=tool_calls)

    # 5. STORAGE: Save conversation turn to memory
    if self.memory:
        self.memory.append(processed_messages[-1], response_message)

    return response_message

Memory Operations

Storage

# Add conversation turn to memory
memory.append(human_message, assistant_message)

Retrieval

# Get relevant context
context_messages = memory.retrieve(
    query="What did we discuss about Python?",
    n_results=5,
    context_depth=3
)

Reset

# Clear all stored messages
memory.reset()

Visualization

# Export conversation graph as Mermaid diagram
mermaid_diagram = memory.to_mermaid()
print(mermaid_diagram)
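
Putting these operations together in one pass (a sketch that relies only on the calls shown above; the bot call handles append automatically, as in the chat-loop pattern):

import llamabot as lmb

memory = lmb.ChatMemory.threaded(model="gpt-4o-mini")
bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    model_name="gpt-4o-mini",
    memory=memory,
)

bot("Let's talk about Python programming.")      # each turn is appended to memory
bot("What are the benefits of using Python?")

# Pull related context back out of memory directly.
context = memory.retrieve(
    query="What did we discuss about Python?",
    n_results=5,
    context_depth=3,
)
print(f"{len(context)} context messages retrieved")

# Inspect the conversation structure, then start fresh.
print(memory.to_mermaid())
memory.reset()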

When to Use Each Memory Type

| Use Case | Memory Type | Why |
| --- | --- | --- |
| Simple Q&A | lmb.ChatMemory() | Fast, no LLM calls needed |
| Multi-topic conversations | lmb.ChatMemory.threaded() | Smart threading connects related topics |
| Performance critical | lmb.ChatMemory() | No additional LLM latency |
| Complex discussions | lmb.ChatMemory.threaded() | Maintains conversation context across topics |
| Real-time chat | lmb.ChatMemory() | Immediate responses |
| Research/analysis | lmb.ChatMemory.threaded() | Can reference earlier parts of conversation |

Advanced Configuration

Custom Memory Setup

import llamabot as lmb

# Custom memory configuration (advanced users)
memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
    summarizer=lmb.LLMSummarizer(model="gpt-4o-mini"),  # Optional for better threading
    context_depth=10  # How far back to look for context
)

bot = lmb.SimpleBot(
    system_prompt="You are a coding assistant.",
    model_name="gpt-4o-mini",
    memory=memory
)

Memory with Different Bot Types

import llamabot as lmb

# Linear memory for simple conversations (fast)
linear_memory = lmb.ChatMemory()  # Default linear
simple_bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    memory=linear_memory
)

# Graph memory for complex conversations with threading (smart)
graph_memory = lmb.ChatMemory.threaded(model="gpt-4o-mini")
query_bot = lmb.QueryBot(
    system_prompt="You are a helpful assistant.",
    memory=graph_memory
)

# Custom memory for specific needs (advanced)
custom_memory = lmb.ChatMemory(
    node_selector=lmb.LLMNodeSelector(model="gpt-4o-mini"),
    summarizer=None  # No summarization for performance
)
structured_bot = lmb.StructuredBot(
    system_prompt="You are a helpful assistant.",
    pydantic_model=SomeModel,  # SomeModel is a placeholder for your own Pydantic model
    memory=custom_memory
)

Architecture

The chat memory system uses a modular architecture:

llamabot/components/chat_memory/
├── __init__.py        # Exports main classes and functions
├── memory.py          # Main ChatMemory class
├── retrieval.py       # Retrieval functions
├── storage.py         # Storage functions
├── visualization.py   # Visualization functions
└── selectors.py       # Node selection strategies
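
For direct use of the component module rather than the top-level lmb namespace, imports along the following lines should work; the exact names re-exported by __init__.py are an assumption based on the comment above:

# Assumed re-exports; adjust to whatever __init__.py actually exposes.
from llamabot.components.chat_memory import ChatMemory, LLMNodeSelector, LLMSummarizer

memory = ChatMemory(node_selector=LLMNodeSelector(model="gpt-4o-mini"))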

Data Model

ConversationNode

@dataclass
class ConversationNode:
    id: int  # Auto-incremented based on number of nodes in graph
    message: BaseMessage  # Single message (not conversation turn)
    summary: Optional[MessageSummary] = None
    parent_id: Optional[int] = None
    timestamp: datetime = field(default_factory=datetime.now)

MessageSummary

class MessageSummary(BaseModel):
    title: str = Field(..., description="Title of the message")
    summary: str = Field(..., description="Summary of the message. Two sentences max.")
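
To make the data model concrete, here is a rough sketch of a three-node chain; the import paths for ConversationNode and the message classes are assumptions, and in normal use ids and parent links are assigned by ChatMemory rather than by hand:

# Assumed import paths, for illustration only.
from llamabot.components.chat_memory import ConversationNode
from llamabot.components.messages import AIMessage, HumanMessage

# Each node wraps a single message and points at its parent via parent_id.
n1 = ConversationNode(id=1, message=HumanMessage(content="Let's talk about Python."))
n2 = ConversationNode(id=2, message=AIMessage(content="Sure, what would you like to know?"), parent_id=1)
n3 = ConversationNode(id=3, message=HumanMessage(content="What are its main benefits?"), parent_id=2)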

Threading Model

The graph memory stores the conversation as a single tree: every node is connected, and each message links back to its parent node:

```