StructuredBot Tutorial

Welcome to the StructuredBot tutorial! In this tutorial, we will learn how to use the StructuredBot class to get validated, structured outputs from LLMs using Pydantic models.

What is StructuredBot?

StructuredBot is designed for scenarios where you need structured outputs from LLMs that conform to a schema. Unlike SimpleBot, StructuredBot:

Enforces Pydantic schema validation on all responses
Automatically retries when the LLM produces invalid output
Returns validated Pydantic objects instead of raw text
Provides clear error messages when validation fails

This makes StructuredBot well suited to data extraction, API response parsing, form processing, and any scenario where you need reliable structured data.

Prerequisites

Before you begin, ensure you have the following:

Basic knowledge of Python programming
Familiarity with Pydantic models
Access to a Python environment with the necessary libraries installed
An API key for your LLM provider (the examples use OpenAI's gpt-4o)

Installation

First, ensure you have the llamabot library installed:

pip install llamabot

Basic Usage

Step 1: Define Your Pydantic Model

Start by creating a Pydantic model that defines the structure you want:

from pydantic import BaseModel
from typing import List, Optional
from datetime import datetime

class Person(BaseModel):
    name: str
    age: int
    email: Optional[str] = None
    hobbies: List[str] = []
    created_at: datetime
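
Before involving an LLM, it helps to see what validating against this model means. The sketch below passes Person a hand-written JSON string of the kind an LLM might return, using Pydantic v2's model_validate_json. The data is made up for illustration, and no llamabot call is made.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int
    email: Optional[str] = None
    hobbies: List[str] = []
    created_at: datetime


# Hand-written JSON standing in for an LLM response (illustrative data)
raw = '{"name": "John Smith", "age": 25, "hobbies": ["hiking"], "created_at": "2024-01-15T10:30:00"}'

person = Person.model_validate_json(raw)
print(person.age + 1)          # 26: age is a real int
print(person.created_at.year)  # 2024: the ISO string became a datetime
print(person.email)            # None: the optional field fell back to its default

# A non-numeric age is rejected instead of being passed through silently
try:
    Person.model_validate_json(
        '{"name": "Jane Doe", "age": "unknown", "created_at": "2024-01-15T10:30:00"}'
    )
except ValidationError as e:
    print(e.error_count())  # 1
```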

Step 2: Create a StructuredBot

import llamabot as lmb
from datetime import datetime

# Create a StructuredBot with your Pydantic model
bot = lmb.StructuredBot(
    system_prompt="Extract person information from the given text. Always include a created_at timestamp.",
    pydantic_model=Person,
    model_name="gpt-4o"
)
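
StructuredBot works from the Pydantic model you pass as pydantic_model. To see which fields that model requires and what types it expects, you can inspect the JSON Schema that Pydantic v2 derives from it. This snippet uses Pydantic only. It does not show how llamabot passes this information to the LLM.

```python
import json
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int
    email: Optional[str] = None
    hobbies: List[str] = []
    created_at: datetime


# JSON Schema generated by Pydantic v2 from the model definition
schema = Person.model_json_schema()
print(json.dumps(schema, indent=2))

# Fields without defaults are the ones that must always be supplied
print(schema["required"])  # ['name', 'age', 'created_at']
```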

Step 3: Use the Bot

# The bot will return a validated Person object
person = bot("John Smith is 25 years old and enjoys hiking, photography, and cooking. His email is john@example.com.")

print(person.name)        # "John Smith"
print(person.age)         # 25
print(person.email)       # "john@example.com"
print(person.hobbies)     # ["hiking", "photography", "cooking"]
print(person.created_at)  # datetime object

Advanced Features

Validation and Retry Logic

StructuredBot handles validation failures automatically by sending the error back to the LLM and asking for a corrected response:

# If the LLM produces invalid output, StructuredBot will:
# 1. Show the validation error to the LLM
# 2. Ask it to fix the output
# 3. Retry up to the maximum number of attempts
# 4. Raise an error if all attempts fail

from pydantic import ValidationError

try:
    person = bot("Invalid input that might confuse the model")
except ValidationError as e:
    print(f"Validation failed after retries: {e}")
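
The steps in the comments above can be sketched in plain Python. This is a minimal illustration of the validate-and-retry pattern under stated assumptions. It is not llamabot's implementation. Here, fake_llm is a hypothetical stand-in for a model that gets the age wrong until it receives feedback, and structured_call and max_attempts are illustrative names.

```python
from typing import Callable, List, Optional, Type

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


def fake_llm(messages: List[str]) -> str:
    """Hypothetical LLM stand-in: wrong at first, corrected once it gets feedback."""
    if len(messages) > 1:  # a validation error has been sent back
        return '{"name": "John Smith", "age": 25}'
    return '{"name": "John Smith", "age": "twenty-five"}'


def structured_call(
    llm: Callable[[List[str]], str],
    model: Type[BaseModel],
    prompt: str,
    max_attempts: int = 3,
) -> BaseModel:
    """Validate the LLM's JSON against `model`, feeding errors back on failure."""
    messages = [prompt]
    last_error: Optional[ValidationError] = None
    for _ in range(max_attempts):
        raw = llm(messages)
        try:
            return model.model_validate_json(raw)
        except ValidationError as e:
            # Steps 1-2: show the validation error to the LLM and ask for a fix
            last_error = e
            messages.append(f"Your output failed validation:\n{e}\nPlease fix it.")
    # Step 4: every attempt failed, so surface the last error
    raise last_error


person = structured_call(fake_llm, Person, "John Smith is 25 years old.")
print(person)  # name='John Smith' age=25
```

The error message goes back as ordinary conversation text, so the model sees exactly which field failed and why. That is what makes a retry more likely to succeed than simply re-sending the original prompt.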

Custom Validation Rules

You can add custom validation to your Pydantic models:

from pydantic import BaseModel, field_validator
from typing import List

class Product(BaseModel):
    name: str
    price: float
    category: str
    tags: List[str]

    @field_validator('price')
    @classmethod
    def price_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError('Price must be positive')
        return v

    @field_validator('category')
    @classmethod
    def category_must_be_valid(cls, v):
        valid_categories = ['electronics', 'clothing', 'books', 'home']
        if v.lower() not in valid_categories:
            raise ValueError(f'Category must be one of: {valid_categories}')
        return v.lower()

# Create bot with custom validation
bot = lmb.StructuredBot(
    system_prompt="Extract product information from text.",
    pydantic_model=Product,
    model_name="gpt-4o"
)
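
Because these validators are part of the model, an LLM response that breaks them counts as a validation failure, just like a wrong type. You can exercise them locally without calling an LLM. The product data below is made up, and the model is repeated with Pydantic v2's field_validator so the snippet runs on its own.

```python
from typing import List

from pydantic import BaseModel, ValidationError, field_validator


class Product(BaseModel):
    name: str
    price: float
    category: str
    tags: List[str]

    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, v: float) -> float:
        if v <= 0:
            raise ValueError("Price must be positive")
        return v

    @field_validator("category")
    @classmethod
    def category_must_be_valid(cls, v: str) -> str:
        valid_categories = ["electronics", "clothing", "books", "home"]
        if v.lower() not in valid_categories:
            raise ValueError(f"Category must be one of: {valid_categories}")
        return v.lower()


# The category validator normalizes the value to lowercase
ok = Product(name="Headphones", price=59.99, category="Electronics", tags=["audio"])
print(ok.category)  # electronics

# A negative price is rejected with the validator's message
try:
    Product(name="Mug", price=-5, category="home", tags=[])
except ValidationError as e:
    print(e.errors()[0]["msg"])  # Value error, Price must be positive
```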

Complex Nested Models

StructuredBot works with complex nested structures:

from pydantic import BaseModel
from typing import List, Optional

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Company(BaseModel):
    name: str
    industry: str
    address: Address
    employees: int

class Employee(BaseModel):
    name: str
    position: str
    salary: float
    company: Company
    skills: List[str]

# Create bot for complex nested data
bot = lmb.StructuredBot(
    system_prompt="Extract employee information including company details.",
    pydantic_model=Employee,
    model_name="gpt-4o"
)

employee = bot("""
Sarah Johnson works as a Senior Software Engineer at TechCorp,
a technology company in the software industry.
She earns $95,000 per year and has skills in Python, JavaScript, and React.
The company is located at 123 Tech Street, San Francisco, CA 94105
and has 150 employees.
""")

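The result is a tree of Pydantic objects, so you reach nested values through plain attribute access. The sketch below validates a hand-written dict that mirrors the Sarah Johnson text, with no LLM involved. It then reads a nested field and uses model_dump to convert the tree back to plain dicts, for example before storing it as JSON.

```python
from typing import List

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class Company(BaseModel):
    name: str
    industry: str
    address: Address
    employees: int


class Employee(BaseModel):
    name: str
    position: str
    salary: float
    company: Company
    skills: List[str]


# Hand-written data mirroring the Sarah Johnson text (no LLM call)
data = {
    "name": "Sarah Johnson",
    "position": "Senior Software Engineer",
    "salary": 95000,
    "company": {
        "name": "TechCorp",
        "industry": "Software",
        "address": {
            "street": "123 Tech Street",
            "city": "San Francisco",
            "state": "CA",
            "zip_code": "94105",
        },
        "employees": 150,
    },
    "skills": ["Python", "JavaScript", "React"],
}

employee = Employee.model_validate(data)
print(employee.company.address.city)  # San Francisco

# model_dump turns the whole tree back into plain dicts and lists
print(employee.model_dump()["company"]["employees"])  # 150
```
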
Configuration Options

Retry Behavior

bot = lmb.StructuredBot(
    system_prompt="Extract data from text.",
    pydantic_model=YourModel,       # replace with your own Pydantic model class
    model_name="gpt-4o",
    allow_failed_validation=False,  # Default: False (do not accept output that still fails validation)
    max_retries=3,                  # Default: 3 retries
    temperature=0.1                 # Lower temperature for more consistent outputs
)

Streaming

StructuredBot supports streaming for real-time feedback:

bot = lmb.StructuredBot(
    system_prompt="Extract data from text.",
    pydantic_model=YourModel,
    stream_target="stdout"  # Stream to console
)

Common Use Cases

1. Data Extraction from Documents

class Invoice(BaseModel):
    invoice_number: str
    date: datetime
    total_amount: float
    vendor: str
    line_items: List[dict]

bot = lmb.StructuredBot(
    system_prompt="Extract invoice information from the document.",
    pydantic_model=Invoice,
    model_name="gpt-4o"
)

invoice = bot(invoice_document_text)
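
Typing line_items as List[dict] lets any dictionary through. That is the kind of vague field the Best Practices section warns against. A nested model makes each line item checkable. LineItem and its fields are hypothetical, so adapt them to the invoices you process. The data is made up and no LLM is called.

```python
from datetime import datetime
from typing import List

from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    # Hypothetical line-item fields; adapt them to your invoices
    description: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    invoice_number: str
    date: datetime
    total_amount: float
    vendor: str
    line_items: List[LineItem]  # instead of List[dict]


invoice = Invoice.model_validate(
    {
        "invoice_number": "INV-001",
        "date": "2024-03-01T00:00:00",
        "total_amount": 120.0,
        "vendor": "Acme Supplies",
        "line_items": [
            {"description": "Paper", "quantity": 10, "unit_price": 5.0},
            {"description": "Toner", "quantity": 1, "unit_price": 70.0},
        ],
    }
)
print(invoice.line_items[1].description)  # Toner

# A line item missing its quantity and price is now rejected
try:
    LineItem.model_validate({"description": "Paper"})
except ValidationError as e:
    print(e.error_count())  # 2
```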

2. API Response Processing

class APIResponse(BaseModel):
    status: str
    data: dict
    error_message: Optional[str] = None
    timestamp: datetime

bot = lmb.StructuredBot(
    system_prompt="Parse API response and extract structured data.",
    pydantic_model=APIResponse,
    model_name="gpt-4o"
)

response = bot(api_response_text)
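
Since timestamp is typed as datetime, an ISO 8601 string in the output is parsed into a real datetime object. If the string specifies a timezone, the object keeps it. When error_message is absent, it falls back to None. Here is a quick local check with made-up values:

```python
from datetime import datetime, timedelta
from typing import Optional

from pydantic import BaseModel


class APIResponse(BaseModel):
    status: str
    data: dict
    error_message: Optional[str] = None
    timestamp: datetime


# Illustrative values; the trailing "Z" marks the timestamp as UTC
response = APIResponse.model_validate(
    {"status": "ok", "data": {"items": 3}, "timestamp": "2024-01-15T10:30:00Z"}
)
print(response.error_message)                          # None
print(response.timestamp.utcoffset() == timedelta(0))  # True
```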

3. Form Data Validation

class ContactForm(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    message: str
    urgency: str  # "low", "medium", "high"

bot = lmb.StructuredBot(
    system_prompt="Extract and validate contact form information.",
    pydantic_model=ContactForm,
    model_name="gpt-4o"
)

form_data = bot(user_submitted_text)
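
The comment on urgency documents the allowed values but does not enforce them, so any string would pass. Annotating the field with typing.Literal makes Pydantic reject everything else. An answer such as "urgent" then fails validation like any other type error. Here is a sketch with made-up form values:

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class ContactForm(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    message: str
    urgency: Literal["low", "medium", "high"]  # enforced, not just documented


form = ContactForm(
    name="Ana", email="ana@example.com", message="Please call me back.", urgency="high"
)
print(form.urgency)  # high

# A value outside the allowed set fails validation
try:
    ContactForm(name="Ben", email="ben@example.com", message="Hi", urgency="urgent")
except ValidationError as e:
    print(e.errors()[0]["type"])  # literal_error
```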

Best Practices

1. Design Clear Schemas

from typing import Any, List
from pydantic import BaseModel

# Good: Clear, specific fields
class UserProfile(BaseModel):
    full_name: str
    email: str
    age: int
    interests: List[str]

# Avoid: Vague or overly complex schemas
class BadProfile(BaseModel):
    info: dict  # Too vague
    data: Any   # Too flexible
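
The difference shows up as soon as data is validated. The vague model accepts an empty payload, while the specific one reports exactly which field is missing. The sample values are made up, and the imports include Any, which BadProfile needs.

```python
from typing import Any, List

from pydantic import BaseModel, ValidationError


class UserProfile(BaseModel):
    full_name: str
    email: str
    age: int
    interests: List[str]


class BadProfile(BaseModel):
    info: dict
    data: Any


# The vague schema accepts a payload with no useful content at all
empty = BadProfile(info={}, data=None)

# The specific schema names exactly which field is missing
try:
    UserProfile(full_name="Ada Lovelace", age=36, interests=["mathematics"])
except ValidationError as e:
    print([err["loc"] for err in e.errors()])  # [('email',)]
```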

2. Use Appropriate Types

from typing import Optional, List
from datetime import datetime
from pydantic import BaseModel

class Event(BaseModel):
    title: str
    start_time: datetime
    duration_minutes: int
    attendees: List[str]
    is_online: bool
    location: Optional[str] = None
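
Precise types also let Pydantic coerce loosely formatted values into the types your code expects, such as numbers or booleans written as strings. The sketch relies on Pydantic's default (lax) validation mode, and the event data is made up.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel


class Event(BaseModel):
    title: str
    start_time: datetime
    duration_minutes: int
    attendees: List[str]
    is_online: bool
    location: Optional[str] = None


# Strings where numbers, booleans and datetimes are expected
event = Event.model_validate(
    {
        "title": "Team sync",
        "start_time": "2024-05-06T09:00:00",
        "duration_minutes": "30",
        "attendees": ["Ana", "Ben"],
        "is_online": "true",
    }
)
print(event.duration_minutes + 15)      # 45: coerced to int
print(event.is_online, event.location)  # True None
```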

3. Add Helpful Validation

from pydantic import BaseModel, field_validator

class Product(BaseModel):
    name: str
    price: float
    category: str

    @field_validator('price')
    @classmethod
    def price_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError('Price must be positive')
        return v
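
For simple numeric bounds, Pydantic's Field constraints are a declarative alternative to a custom validator. This variant uses Field(gt=0) in place of price_must_be_positive. Pydantic enforces the constraint during validation and also records it in the model's JSON Schema.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str
    # Declarative constraint: rejects zero and negative prices, like the validator
    price: float = Field(gt=0, description="Unit price; must be positive")
    category: str


print(Product(name="Lamp", price=25.0, category="home").price)  # 25.0

try:
    Product(name="Lamp", price=0, category="home")
except ValidationError as e:
    print(e.errors()[0]["type"])  # greater_than

# The constraint is also recorded in the JSON Schema
print(Product.model_json_schema()["properties"]["price"]["exclusiveMinimum"])  # 0
```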

4. Handle Edge Cases

# Use Optional fields for data that might not be present
class Article(BaseModel):
    title: str
    content: str
    author: Optional[str] = None
    publish_date: Optional[datetime] = None
    tags: List[str] = []
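
With those defaults, a source that only provides a title and body still validates. The missing fields come back as None or an empty list. Here is a small check with made-up content:

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel


class Article(BaseModel):
    title: str
    content: str
    author: Optional[str] = None
    publish_date: Optional[datetime] = None
    tags: List[str] = []


# Only the required fields are present, as when a source omits metadata
article = Article(title="Release notes", content="Version 2.0 is out.")
print(article.author, article.publish_date, article.tags)  # None None []

# exclude_none drops the fields that were never filled in
print(article.model_dump(exclude_none=True))
```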

Troubleshooting

Common Issues

Validation Errors: Check your Pydantic model for type mismatches
Retry Failures: Ensure your system prompt is clear about the expected format
Complex Nested Data: Start with simpler models and gradually add complexity
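
When a validation error does surface, ValidationError.errors() in Pydantic v2 lists each failing field with its location and an error code. That is often quicker to act on than the full message. The sketch triggers a type mismatch by hand and does not involve llamabot.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


errors = []
try:
    Person.model_validate({"name": "John Smith", "age": "twenty-five"})
except ValidationError as e:
    errors = e.errors()

for err in errors:
    # loc: path to the failing field; type: error code; msg: readable reason
    print(err["loc"], err["type"], err["msg"])
```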

Debug Mode

import llamabot as lmb

# Enable debug mode to see validation attempts
lmb.set_debug_mode(True)

bot = lmb.StructuredBot(
    system_prompt="Extract data from text.",
    pydantic_model=YourModel,
    model_name="gpt-4o"
)

Comparison with SimpleBot

Feature	SimpleBot	StructuredBot
Output Type	Raw text	Validated Pydantic objects
Validation	None	Automatic Pydantic validation
Retry Logic	None	Automatic retry on validation failure
Type Safety	No	Yes (Pydantic models)
Use Case	General conversation	Structured data extraction

Conclusion

StructuredBot provides a practical way to get reliable, validated structured outputs from LLMs. It combines Pydantic models with automatic validation and retry logic. By default, your application either receives data that matches your model or gets an explicit error when the LLM cannot produce it.

Key takeaways:

Use StructuredBot when you need schema-validated structured outputs
Design clear, well-validated Pydantic models
Leverage automatic retry logic for robust data extraction
Combine with appropriate system prompts for best results

For more advanced usage patterns and examples, check out the other bot tutorials in the LlamaBot documentation.