Process Receipts with LLM Agents
To run this notebook, click on the molab shield above or run the following command at the terminal:
uvx marimo edit --sandbox --mcp --no-token --watch https://github.com/ericmjl/llamabot/blob/main/docs/how-to/receipt-processing.py
import marimo as mo
How to Process Receipts with LLM Agents
Learn how to extract structured data from receipt PDFs and images using a two-step OCR and structuring pattern with llamabot's SimpleBot and StructuredBot.
Prerequisites
Before you begin, ensure you have:
- Ollama installed and running locally: Visit ollama.ai to install
- Required Ollama models:
  - `ollama pull deepseek-ocr` (for OCR text extraction)
  - `ollama pull gemma3n:latest` (for structured output, or another model that supports structured outputs)
- Python 3.10+ with llamabot and pdf2image installed
- A receipt PDF or image to process (or use the example provided)
All llamabot models in this guide use the `ollama/` or `ollama_chat/` prefix for local execution.
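To verify your local setup before diving in, you can run a quick smoke test like the sketch below. It assumes Ollama is running and the `gemma3n:latest` model has been pulled; `check_bot` is a throwaway name used only for illustration.
# Quick smoke test: confirm llamabot can reach your local Ollama server.
# Assumes Ollama is running and gemma3n:latest has been pulled.
import llamabot as lmb

check_bot = lmb.SimpleBot(
    system_prompt="You are a helpful assistant.",
    model_name="ollama_chat/gemma3n:latest",
    stream_target="none",
)
check_bot("Reply with the single word: ready")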
Goal
By the end of this guide, you'll have built a receipt processing system that:
- Converts receipt PDFs to images
- Extracts text using vision models (OCR)
- Structures the extracted data into a validated Pydantic model
- Provides observability through spans
from pathlib import Path
import tempfile
from pdf2image import convert_from_path
from pydantic import BaseModel, Field
import llamabot as lmb
from llamabot import get_current_span, span
from llamabot.bot.structuredbot import StructuredBot
from llamabot.components.messages import user
from llamabot.prompt_manager import prompt
Step 1: Define Your Receipt Data Schema
First, define the Pydantic model that represents the structured receipt data. This schema must be defined before building the extraction agent.
class ReceiptData(BaseModel):
"""Receipt data schema - must be defined BEFORE building extraction agent."""
vendor: str = Field(..., description="The name of the vendor/merchant")
date: str = Field(..., description="The transaction date in YYYY-MM-DD format")
amount: float = Field(
..., description="The total amount as a number (without currency symbols)"
)
category: str = Field(
...,
description="Business category (e.g., 'Office Supplies', 'Travel', 'Meals', 'Software', 'Equipment')",
)
description: str = Field(
..., description="Brief description of what was purchased"
)
Step 2: Create the Two-Step Processing Bots
We use a two-step pattern because vision models like DeepSeek-OCR excel at OCR but don't necessarily support structured outputs. The solution:
- OCR Step (SimpleBot): Extract text from images using vision models
- Structuring Step (StructuredBot): Convert unstructured text to validated Pydantic models
@prompt("system")
def receipt_extraction_sysprompt() -> str:
"""You are an expert at extracting financial information from receipt and invoice documents.
Extract the following information accurately:
- vendor: The name of the vendor/merchant
- date: The transaction date in YYYY-MM-DD format
- amount: The total amount as a number (without currency symbols)
- category: Business category (e.g., "Office Supplies", "Travel", "Meals", "Software", "Equipment")
- description: Brief description of what was purchased
If any field is unclear or missing, use your best judgment based on the context.
For dates, convert any format to YYYY-MM-DD. For amounts, extract only the numerical value.
"""
# Step 1: OCR extraction with DeepSeek-OCR (SimpleBot)
# DeepSeek-OCR doesn't support structured outputs, so we use SimpleBot
ocr_bot = lmb.SimpleBot(
system_prompt="Extract all text from receipts accurately. "
"Preserve the structure and include all numbers, dates, and vendor names.",
model_name="ollama/deepseek-ocr",
stream_target="none",
)
# Step 2: Structure the data (using a model that supports structured outputs)
receipt_structuring_bot = StructuredBot(
system_prompt=receipt_extraction_sysprompt(),
pydantic_model=ReceiptData,
model_name="ollama_chat/gemma3n:latest",
stream_target="none",
)
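You can exercise the structuring step in isolation before involving OCR at all: pass `receipt_structuring_bot` some plain receipt-like text and confirm that a validated `ReceiptData` instance comes back. The sample text below is made up for illustration.
# Test the structuring bot on synthetic OCR output (illustrative text).
sample_text = """ACME CAFE
123 Main Street
2024-03-15
2x Sandwich    $18.00
1x Coffee       $4.50
TOTAL          $22.50"""
sample_receipt = receipt_structuring_bot(sample_text)
print(type(sample_receipt).__name__)  # ReceiptData
print(sample_receipt.vendor, sample_receipt.amount)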
Step 3: Create PDF to Image Converter with Spans
Let's create a function that converts PDFs to images, using spans for observability.
@span
def convert_pdf_to_images(file_path: str) -> list[str]:
"""Convert PDF to list of image paths."""
s = get_current_span()
s["file_path"] = file_path
file_extension = Path(file_path).suffix.lower()
s["file_extension"] = file_extension
if file_extension == ".pdf":
images = convert_from_path(file_path, dpi=200)
image_paths = []
for i, image in enumerate(images):
with tempfile.NamedTemporaryFile(
delete=False, suffix=f"_page_{i + 1}.png"
) as temp_img:
image.save(temp_img.name, "PNG")
image_paths.append(temp_img.name)
s["page_count"] = len(image_paths)
s["conversion_success"] = True
return image_paths
elif file_extension in [".png", ".jpg", ".jpeg"]:
s["page_count"] = 1
s["conversion_success"] = True
return [file_path]
else:
s["conversion_success"] = False
raise ValueError(f"Unsupported file type: {file_extension}")
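Because image files pass straight through and unsupported extensions raise an error, you can call the converter uniformly on any input. The paths below are illustrative:
# Images pass through untouched; PDFs are rasterized page by page.
convert_pdf_to_images("./receipt_photo.png")  # -> ["./receipt_photo.png"]
# convert_pdf_to_images("./receipt.docx")     # raises ValueError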
Step 4: Process a Receipt
Now let's process a receipt through the complete workflow:
- Convert PDF to images
- Extract text with OCR
- Structure the data
# Example: Process a receipt
# Replace with your own receipt file path
receipt_path = "./receipt_lunch.pdf" # Or use your own: "/path/to/your/receipt.pdf"
# Step 1: Convert PDF to images
image_paths = convert_pdf_to_images(receipt_path)
print(f"Converted to {len(image_paths)} image(s)")
# Step 2: Extract text with OCR
ocr_texts = []
for image_path in image_paths:
ocr_response = ocr_bot(
user("Extract all text from this receipt image.", image_path)
)
ocr_texts.append(ocr_response.content)
print(f"Extracted text from {len(ocr_texts)} page(s)")
# Step 3: Structure the extracted text
combined_ocr_text = "\n\n--- Page Break ---\n\n".join(ocr_texts)
receipt_data = receipt_structuring_bot(combined_ocr_text)
print(f"Structured data: {receipt_data.model_dump_json(indent=2)}")
receipt_data
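Because `receipt_data` is a validated `ReceiptData` instance rather than a raw string, you get typed field access for free:
# Typed access to the validated fields
print(f"Vendor:   {receipt_data.vendor}")
print(f"Date:     {receipt_data.date}")
print(f"Amount:   ${receipt_data.amount:.2f}")
print(f"Category: {receipt_data.category}")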
Step 5: View Observability with Spans
Both bots automatically create spans for observability. Let's see what information is tracked.
# Display spans from both bots
print("OCR Bot Spans:")
ocr_bot.spans
print("\n\nReceipt Structuring Bot Spans:")
receipt_structuring_bot.spans
The spans show:
- OCR Bot: query, model, input_message_count, duration_ms
- Structuring Bot: query, model, validation_attempts, validation_success, schema_fields, duration_ms
You can also see nested spans from the convert_pdf_to_images function showing:
- file_path, file_extension, page_count, conversion_success
This observability helps you debug issues and understand the workflow execution.
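If you want to inspect spans programmatically rather than just rendering them, the dict-style writes used earlier (`s["key"] = value`) suggest a sketch like the one below. The exact shape of the span objects may vary between llamabot versions, so treat this as an assumption to verify against your rendered output.
# Hedged sketch: assumes bot.spans is an iterable of span objects
# whose repr shows their recorded attributes.
for recorded_span in receipt_structuring_bot.spans:
    print(recorded_span)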
Step 6: Create a Complete Receipt Processing Function
Let's combine everything into a single function that can be used as a tool.
from llamabot.components.tools import tool
@tool
def process_receipt(file_path: str, _globals_dict: dict | None = None) -> str:
"""Process a receipt PDF or image and extract structured data.
This tool demonstrates that agents can have read access to the local file system.
Simply provide a file path and the tool will read it from disk.
:param file_path: Path to the receipt file (PDF, PNG, JPG, or JPEG)
:param _globals_dict: Internal parameter - automatically injected by AgentBot
:return: JSON string of extracted receipt data
"""
# Access current span to add attributes
s = get_current_span()
s["file_path"] = file_path
# Verify the file exists
if not Path(file_path).exists():
raise FileNotFoundError(f"Receipt file not found: {file_path}")
# PDF to image conversion
image_paths = convert_pdf_to_images(file_path)
s["page_count"] = len(image_paths)
if len(image_paths) == 1:
prompt_text = "Extract all text from this receipt image."
else:
prompt_text = (
f"Extract all text from this {len(image_paths)}-page receipt document."
)
# Step 1: OCR extraction - extract text from images
ocr_texts = []
for image_path in image_paths:
ocr_response = ocr_bot(user(prompt_text, image_path))
ocr_texts.append(ocr_response.content)
s.log("ocr_completed", pages=len(image_paths))
# Combine OCR results from all pages
combined_ocr_text = "\n\n--- Page Break ---\n\n".join(ocr_texts)
# Step 2: Structure the extracted text according to ReceiptData schema
result = receipt_structuring_bot(combined_ocr_text)
s.log("structuring_completed")
s["vendor"] = result.vendor
s["amount"] = result.amount
# Store ReceiptData object in globals for returning to user
if _globals_dict is not None:
_globals_dict["receipt_data"] = result
return result.model_dump_json()
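To hand this tool to an agent, a sketch like the following should be close; the `tools` parameter name and invocation style are assumptions here, so check your llamabot version's AgentBot documentation for the exact signature.
# Hedged sketch: wiring process_receipt into an AgentBot.
# The `tools=` parameter name is an assumption; verify against the
# AgentBot API in your installed llamabot version.
receipt_agent = lmb.AgentBot(
    system_prompt="You process receipts. When given a file path, "
    "call the process_receipt tool and report the extracted fields.",
    tools=[process_receipt],
    model_name="ollama_chat/gemma3n:latest",
)
# receipt_agent("Process the receipt at ./receipt_lunch.pdf")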
Summary
You've built a receipt processing system that:
- Uses a two-step OCR + structuring pattern
- Leverages vision models for text extraction
- Validates output with Pydantic schemas
- Provides observability through spans
- Can be used as a tool in agent workflows
Key Takeaways:
- Define your Pydantic schema first
- Use SimpleBot for vision/OCR tasks
- Use StructuredBot for validated structured outputs
- Use the `@span` decorator and `get_current_span()` for manual observability
- Spans automatically track bot operations
- The `@tool` decorator makes functions agent-callable