Build a Data Analysis Chatbot
To run this notebook, click on the molab shield above or run the following command at the terminal:
uvx marimo edit --sandbox --mcp --no-token --watch https://github.com/ericmjl/llamabot/blob/main/docs/how-to/data-analysis-agentbot.py
import marimo as mo
How to Build a Data Analysis Chatbot with AgentBot
Learn how to build a chatbot that executes code for data analysis using AgentBot. Unlike ToolBot which handles single-turn function calls, AgentBot can orchestrate multi-step workflows and make decisions about which tools to use.
Prerequisites
Before you begin, ensure you have:
- Ollama installed and running locally: Visit ollama.ai to install
- Required Ollama model: Run
ollama pull deepseek-r1:32b(or another model that supports tool calling) - Python 3.10+ with llamabot, pandas, and numpy installed
- Sample data to analyze (or we'll create some in this guide)
All llamabot models in this guide use the ollama_chat/ prefix for local execution.
Goal
By the end of this guide, you'll have built a data analysis chatbot that:
- Executes Python code to analyze data
- Makes multi-step decisions about which analyses to perform
- Returns DataFrames and visualizations
- Provides observability through spans and workflow visualization
import pandas as pd
import numpy as np
import llamabot as lmb
from llamabot.bot.agentbot import AgentBot
from llamabot.components.tools import tool
Step 1: Create Sample Data
Let's create some sample data to analyze. In a real scenario, you'd load your own data.
# Create sample sales data
np.random.seed(42)
dates = pd.date_range("2024-01-01", periods=100, freq="D")
sales_data = pd.DataFrame(
{
"date": dates,
"product": np.random.choice(["Widget A", "Widget B", "Widget C"], 100),
"sales": np.random.randint(10, 100, 100),
"revenue": np.random.uniform(100, 1000, 100),
"region": np.random.choice(["North", "South", "East", "West"], 100),
}
)
sales_data.head()
Step 2: Create Data Analysis Tools
We'll create tools that the agent can use to analyze data. Each tool is decorated with @tool
to make it agent-callable.
@tool
def calculate_statistics(
dataframe_name: str, column: str, _globals_dict: dict = None
) -> str:
"""Calculate basic statistics (mean, median, std) for a column in a DataFrame.
:param dataframe_name: Name of the DataFrame variable in globals
:param column: Name of the column to analyze
:param _globals_dict: Internal parameter - automatically injected by AgentBot
:return: String summary of statistics
"""
if _globals_dict is None or dataframe_name not in _globals_dict:
return f"DataFrame '{dataframe_name}' not found in workspace."
df = _globals_dict[dataframe_name]
if column not in df.columns:
return f"Column '{column}' not found in DataFrame."
stats = {
"mean": df[column].mean(),
"median": df[column].median(),
"std": df[column].std(),
"min": df[column].min(),
"max": df[column].max(),
}
return f"Statistics for {column}:\n" + "\n".join(
f" {k}: {v:.2f}" for k, v in stats.items()
)
@tool
def group_by_analysis(
dataframe_name: str,
group_by: str,
aggregate_column: str,
_globals_dict: dict = None,
) -> str:
"""Group DataFrame by a column and aggregate another column.
:param dataframe_name: Name of the DataFrame variable in globals
:param group_by: Column name to group by
:param aggregate_column: Column name to aggregate
:param _globals_dict: Internal parameter - automatically injected by AgentBot
:return: String representation of grouped results
"""
if _globals_dict is None or dataframe_name not in _globals_dict:
return f"DataFrame '{dataframe_name}' not found in workspace."
df = _globals_dict[dataframe_name]
if group_by not in df.columns or aggregate_column not in df.columns:
return "One or more columns not found in DataFrame."
grouped = (
df.groupby(group_by)[aggregate_column]
.agg(["sum", "mean", "count"])
.round(2)
)
return f"Grouped analysis by {group_by}:\n{grouped.to_string()}"
@tool
def execute_custom_code(code: str, _globals_dict: dict = None) -> str:
"""Execute custom Python code for data analysis.
This tool allows the agent to execute arbitrary Python code for complex analyses.
Use this when standard tools aren't sufficient.
:param code: Python code to execute (must be safe and data-focused)
:param _globals_dict: Internal parameter - automatically injected by AgentBot
:return: String representation of the result
"""
if _globals_dict is None:
return "No workspace available."
try:
# Execute code with access to globals (including DataFrames)
exec(code, _globals_dict)
return "Code executed successfully. Check workspace for results."
except Exception as e:
return f"Error executing code: {str(e)}"
Step 3: Create the AgentBot
AgentBot orchestrates multiple tools and makes decisions about which ones to use. It uses a graph-based workflow where tools can loop back to the decision node.
# Create AgentBot with our data analysis tools
analysis_agent = AgentBot(
tools=[calculate_statistics, group_by_analysis, execute_custom_code],
system_prompt="""You are a data analysis assistant. You help users analyze data by:
1. Understanding what analysis they want
2. Selecting the appropriate tool(s) to use
3. Executing multi-step analyses when needed
4. Returning clear, informative results
Available tools:
- calculate_statistics: Get basic stats for a column
- group_by_analysis: Group and aggregate data
- execute_custom_code: Run custom Python code for complex analyses
Always use return_object_to_user() to return DataFrames or results to the user.
""",
model_name="ollama_chat/deepseek-r1:32b",
)
Step 4: Visualize the Agent Workflow
AgentBot automatically generates a mermaid diagram showing the workflow graph. Blue nodes are tools that loop back to the decision node, green nodes are terminal tools.
# Display the agent to see the workflow graph
analysis_agent
The mermaid diagram shows:
- Decision node: Where the agent decides which tool to use
- Tool nodes (blue): Tools that can loop back for multi-step workflows
- Terminal nodes (green): Tools like
respond_to_userthat end the workflow
This visualization helps you understand how the agent orchestrates tools.
Step 5: Use the Agent for Data Analysis
Now let's use the agent to analyze our data. The agent will decide which tools to use and can perform multi-step analyses.
# Use the agent to analyze data
# The agent will decide which tools to use based on the query
result = analysis_agent(
"Calculate the mean and standard deviation of sales, then group by region and show total revenue per region.",
globals(),
)
print(result)
Step 6: View Observability with Spans
AgentBot creates spans that track the entire workflow, including decision-making and tool execution.
# Display spans to see the agent's decision-making process
analysis_agent.spans
The spans show:
- agentbot_call: The main agent call with query and max_iterations
- iterations: How many tool calls were made
- result: The final result
- Nested spans: Each tool execution creates its own span
This observability helps you understand:
- Which tools the agent chose to use
- How many steps were needed
- What decisions were made at each step
Step 7: Create an Interactive Chat Interface
Let's create a Marimo chat interface so users can interact with the agent naturally.
def chat_turn(messages, config):
"""Handle a chat turn with the data analysis agent."""
user_message = messages[-1].content
# Make sure sales_data is available in globals
globals_dict = {"sales_data": sales_data}
# Call the agent
result = analysis_agent(user_message, globals_dict)
return result
# Create chat interface with example prompts
example_prompts = [
"What's the average sales by product?",
"Show me total revenue per region",
"Calculate statistics for the sales column",
]
chat = mo.ui.chat(chat_turn, max_height=600, prompts=example_prompts)
mo.vstack(
[
mo.md("### Data Analysis Agent"),
mo.md(
"Ask questions about the sales data. The agent will decide which tools to use."
),
chat,
]
)
Summary
You've built a data analysis chatbot with AgentBot that:
- Executes code for data analysis
- Makes multi-step decisions about which tools to use
- Orchestrates complex workflows automatically
- Provides workflow visualization through mermaid diagrams
- Tracks observability through spans
- Offers an interactive chat interface
Key Takeaways:
- AgentBot orchestrates multiple tools in a graph-based workflow
- Tools decorated with
@toolbecome agent-callable - Display the agent to see the workflow graph visualization
- Spans track decision-making and tool execution
- AgentBot can handle multi-step workflows automatically
- Use
globals_dictto share data between tool calls - Terminal tools (like
respond_to_user) end the workflow