
PyData Boston/Cambridge Talk @ Moderna: What makes an agent?

written by Eric J. Ma on 2025-01-31 | tags: large language models python llamabot pydantic structuredbot agentbot talks meetups


In this blog post, I explore the concept of 'what makes an agent' by discussing various implementations of LlamaBot, a Python package for LLM exploration. I dissect the differences between SimpleBots, StructuredBots, and AgentBots, highlighting their capabilities and limitations in terms of agency and decision-making. Through examples and an engaging audience discussion, I aim to provoke thought on the definition and design of agents. Can we truly define an agent, or is it, like the Turing Test, a concept that evolves with our understanding and technological advancements?

This is a written and edited version of a talk I gave at PyData Boston/Cambridge on 29 Jan 2025, hosted at Moderna's new office buildings for the first time ever. This talk will likely leave us with more questions than answers about "what makes an agent." But I hope it also leaves us with more focused questions and frameworks for thinking about "what makes an agent".

It is based loosely on my blog post. The original notebook that I presented is available within the LlamaBot repo.

import llamabot as lmb

Let's start with LLM bots

This is a helpful framing to anchor our discussion of agents.

LlamaBot is a Python package that I made to pedagogically explore the landscape of LLMs as the field evolves. It is intentionally designed to lag one or two steps behind the latest innovations and to incorporate only what turns out to be robust. LlamaBot implements a variety of LLM bots, the simplest being the SimpleBot.

SimpleBot makes clear the basic anatomy of an LLM API call:

1️⃣ A system prompt: maintained over all calls.

2️⃣ A model name: specifying which model to chat with.

3️⃣ A user's prompt: your input into the LLM's internals.

4️⃣ The response: returned by autoregressive next-token prediction.

In SimpleBot, that looks like this:

flight_bot = lmb.SimpleBot(
    system_prompt="""You are a helpful travel assistant.
    You provide information about flights between cities.""",
    model_name="ollama_chat/llama3.2:3b",
)

# Simple interaction
response = flight_bot("Find me the prices for flights from San Francisco to Tokyo.")

And the response looks like this:

I'd be happy to help you find flight prices from San Francisco (SFO) to Tokyo.

Please note that flight prices can vary depending on several factors such as the time of year, demand, airline, and availability. I'll provide you with some general price ranges for flights from SFO to Tokyo, but keep in mind that these may not be exact prices for your specific travel dates.

Here are some approximate price ranges for flights from San Francisco (SFO) to Tokyo:

**Economy Class:**

* Low-cost carriers like ANA, Japan Airlines, and All Nippon Airways: $500-$800 USD
* Mid-range airlines like United Airlines, Delta Air Lines, and American Airlines: $700-$1,200 USD
* Premium economy or business class: $1,200-$2,500 USD

**Premium Economy Class:**

* Low-cost carriers: $900-$1,400 USD
* Mid-range airlines: $1,200-$1,800 USD
* Premium economy or business class: $1,800-$3,000 USD

**Business Class:**

* Low-cost carriers: Not available
* Mid-range airlines: $2,500-$4,000 USD
* Premium business class: $4,000-$6,000 USD

To get a more accurate estimate of flight prices from SFO to Tokyo, I recommend checking with the following airlines or online travel agencies:

1. Expedia
2. Kayak
3. Skyscanner
4. Google Flights
5. ANA (All Nippon Airways)
6. Japan Airlines

Please enter your specific travel dates and preferred airline to get a more accurate quote.

Together with the audience, we evaluated that response. Is this an agent?

Audience commentary:

  • Not an agent: didn't take action apart from introspection of weights.
  • Not an agent: no external APIs called.
  • Yes: agent doesn't mean it has to have a tool, anything that gives usable information is an agent.
  • Yes (to an extent): Follows up with a question/prompt back to the human.
  • No: agent should make decisions. This LLM call did not make any decisions. Decision-making outsourced to user.

My thoughts are as follows: SimpleBots are heavily reliant on what's in the training data; they just call the LLM directly, with no memory of historical chat information. Given the range of regular human experiences in North America, I would venture that it is rare for us to go outside of any base LLM's training data, unless we hit very specialized and niche topics not covered on the internet. (Try thinking of some!) It's hard to argue that this has the properties of agency we would associate with an agent.
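
To make that statelessness concrete, here is a minimal sketch that reuses the flight_bot defined above. Each call is an independent LLM API request, so a follow-up question has no access to the earlier exchange.

# Each SimpleBot call is independent; no chat history is carried over.
followup = flight_bot("Which of the options you just listed is the cheapest?")
# The bot cannot see the earlier response, so it has to answer from scratch
# (it will typically ask for the missing context or make a guess).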

Agents that give us structure?

Let's try a different view of agents. What if we said, "Agents give you information structured the way you wanted it?" Let's start with that as a working definition.

LlamaBot implements structured generation. To use it, you need to provide a Pydantic model, effectively a form that you want the bot to fill, and pair it with a StructuredBot. StructuredBots have the same basic requirements as SimpleBots, which are a system prompt and model name, but also need to have a Pydantic model passed in. When called, it will return an instance of that Pydantic model.

from pydantic import BaseModel, Field

class FlightInfo(BaseModel):
    origin: str
    destination: str
    duration: float = Field(..., description="Duration in hours")
    price: float = Field(..., description="Price in USD")
    recommended_arrival: float = Field(
        ..., description="Recommended time to arrive at airport before flight, in hours",
    )

flight_bot_structured = lmb.StructuredBot(
    system_prompt="""You are a helpful travel assistant that provides
    structured information about flights.""",
    model_name="ollama_chat/llama3.2:3b",
    pydantic_model=FlightInfo,
)

# Now we get structured data
result = flight_bot_structured(
    "Find me the prices for flights from San Francisco to Tokyo."
)
print()
print("##########")
print(
    f"The price of the flight is ${result.price}. We recommend that you arrive at the airport {result.recommended_arrival} hours before the flight."
)

Its response looks like this:

{ "origin": "San Francisco", "destination": "Tokyo", "duration": 11, "price": 550, "recommended_arrival": 12 }
##########
The price of the flight is $550.0. We recommend that you arrive at the airport 12.0 hours before the flight.

Audience discussion: Is this an agent?

I specifically asked, where does this fall short of an agent's definition?

  • No: still like RAG. Doesn't read the situation, is this a good time to buy? Still a simple answer.
  • No: an agent would independently decide what to look at and where to look.
  • Yes: Based on the working definition, yes, and the nature of decision-making means the LLM is doing rapid-fire retrieval from memorized data. Recommended arrival time is an added recommendation that was not part of the original query intent.
  • No: Is the definition even correct? Why isn't a SQL query an agent, under that definition?
  • No: within this context, the latent query intent might be to buy tickets, but this bot doesn't help me do it.

In summary, this bot still misses the crucial elements of (a) autonomy/agency, and (b) intent guessing.

Structured generation gets us midway

It affords us more control over what we are asking an LLM to generate. The response is constrained.
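
As a small illustration of what that constraint buys us (a sketch, reusing the result from the StructuredBot call above): downstream code can treat the response as ordinary typed data rather than parsing free text.

# result is a validated FlightInfo instance, not a string to be parsed.
assert isinstance(result, FlightInfo)
total_for_two = 2 * result.price  # plain float arithmetic on a typed field
print(f"Two tickets would cost roughly ${total_for_two:.2f}.")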

But it seems like it may still lack a crucial element of an "agent". I'm glad the audience shredded this definition! 🤗

What if we said, "An agent needed to interact externally"?

Here is my attempt, within LlamaBot, to try to nail down what an agent actually is. This flight agent-like bot helps me plan out trips, using one tool, the search_internet tool.

from llamabot.bot.agentbot import search_internet

travel_agent = lmb.AgentBot(
    system_prompt="""You are a travel planning assistant. Help users plan their trips
    by searching flights, hotels, and checking weather.""",
    functions=[search_internet],
    model_name="gpt-4o",
)

# Now we can handle complex queries
travel_response = travel_agent(
    "What's the flight time from San Francisco to Tokyo?"
)

print("\n##########")
print(travel_response.content)

Its response looks like this:

{"tool_name":"search_internet","tool_args":[{"name":"search_term","value":"flight time from San Francisco to Tokyo"}],"use_cached_results":[]}
search_internet({'search_term': 'flight time from San Francisco to Tokyo'}) -> 3394908c, {'https://www.flightsfrom.com/SFO-HND': '�(�\x00 ~������\x17\x1e�\x02�.IwѕZ\x0c�\x1cI\\S\x1c\x1d9��d���7aZ��Dٚݝ\x14�R~ ...(truncated)...'}
{"tool_name":"agent_finish","tool_args":[{"name":"message","value":"The flight time from San Francisco to Tokyo is approximately 10 hours and 48 minutes to 11 hours and 10 minutes for a non-stop flight. This duration can vary slightly depending on the specific flight path and weather conditions."}],"use_cached_results":[]}
agent_finish({'message': 'The flight time from San Francisco to Tokyo is approximately 10 hours and 48 minutes to 11 hours and 10 minutes for a non-stop flight. This duration can vary slightly depending on the specific flight path and weather conditions.'}) -> a106bcaa, The flight time from San Francisco to Tokyo is approximately 10 hours and 48 minutes to 11 hours and 10 minutes for a non-stop flight. This duration can vary slightly depending on the specific flight path and weather conditions.

##########
The flight time from San Francisco to Tokyo is approximately 10 hours and 48 minutes to 11 hours and 10 minutes for a non-stop flight. This duration can vary slightly depending on the specific flight path and weather conditions.

Looks like this could be an agent, but let's see what the audience said.

Audience discussion: is this really an agent?

I asked the audience to disregard the fact that I called this an AgentBot. Just because I said so doesn't make it so!

Where are the holes in this definition of an agent?

  • Getting closer: an agent will do research; in this example, it used a search tool and processed the text to give a response. The task could have been clearer for the end user.
  • Yes: it has introduced a new error mode, a new way to be wrong, and now we need someone to blame for it.
  • New component here: this bot has agency; it decides "do I call this function or not?"
  • Principal-Agent problem: agent does something that may not be what the Principal wanted it to do.

The audience also had more questions raised:

  • Q: Is part of the definition of an agent the fact that it is going to interact with a human?
  • Q: In this implementation of search, is the LLM always going to do the function call?

At this point, I knew we had more focused questions from the audience and that my first goal was accomplished: getting the audience to think more critically about the definition of an agent.


Design patterns: should this be an agent?

We discussed what an agent is. Now, let's assume that we know what an agent is. (This is an assumption!) If so, how should we design agents? And even then, should a given piece of functionality be an agent at all?

I presented to the audience a minimally complex example: a restaurant bill calculator. It's got a few key characteristics:

  1. The calculation is computable (and hence easily verifiable).
  2. There is sufficient variation in the kinds of questions we can ask.
  3. We can easily implement multiple designs.

AgentBot implementation

Let's start with an implementation based on AgentBot.

We have two tools, which are nothing more than Python functions decorated with the @tool decorator: calculate_total_with_tip and split_bill.

# Define the tools
@lmb.tool
def calculate_total_with_tip(bill_amount: float, tip_rate: float) -> float:
    if tip_rate < 0 or tip_rate > 1.0:
        raise ValueError("Tip rate must be between 0 and 1.0")
    return bill_amount * (1 + tip_rate)

@lmb.tool
def split_bill(total_amount: float, num_people: int) -> float:
    return total_amount / num_people

Then, we create the AgentBot:

# Create the bot
bot = lmb.AgentBot(
    system_prompt=lmb.system(
        "You are my assistant with respect to restaurant bills."
    ),
    functions=[calculate_total_with_tip, split_bill],
    model_name="gpt-4o",
)

Now, let's try a few calculations, the first one being just calculating the total with tip:

# Calculate total with tip
calculate_total_only_prompt = (
    "My dinner was $2300 without tips. Calculate my total with an 18% tip."
)
resp = bot(calculate_total_only_prompt)

We get this output:

{"tool_name":"calculate_total_with_tip","tool_args":[{"name":"bill_amount","value":2300},{"name":"tip_rate","value":0.18}],"use_cached_results":[]}
calculate_total_with_tip({'bill_amount': 2300, 'tip_rate': 0.18}) -> bdc66afd, 2714.0
{"tool_name":"agent_finish","tool_args":[{"name":"message","value":"The total amount for your dinner, including an 18% tip, is $2714.00."}],"use_cached_results":[]}
agent_finish({'message': 'The total amount for your dinner, including an 18% tip, is $2714.00.'}) -> 73ec736f, The total amount for your dinner, including an 18% tip, is $2714.00.
The total amount for your dinner, including an 18% tip, is $2714.00.

This looks right! And if we try another:

# Split the bill

split_bill_only_prompt = "My dinner was $2300 in total, I added an 18% gratuity, split the bill between 20 people."

resp2 = bot(split_bill_only_prompt)

We get this:

{"tool_name":"calculate_total_with_tip","tool_args":[{"name":"bill_amount","value":2300},{"name":"tip_rate","value":0.18}],"use_cached_results":[]}
calculate_total_with_tip({'bill_amount': 2300, 'tip_rate': 0.18}) -> bdc66afd, 2714.0
{"tool_name":"split_bill","tool_args":[{"name":"total_amount","value":null},{"name":"num_people","value":20}],"use_cached_results":[{"arg_name":"total_amount","hash_key":"bdc66afd"}]}
split_bill({'num_people': 20, 'total_amount': 2714.0}) -> 2a75a8f8, 135.7
{"tool_name":"agent_finish","tool_args":[{"name":"message","value":"The total bill including an 18% gratuity is $2714. When split between 20 people, each person needs to pay $135.70."}],"use_cached_results":[]}
agent_finish({'message': 'The total bill including an 18% gratuity is $2714. When split between 20 people, each person needs to pay $135.70.'}) -> f92ef2d0, The total bill including an 18% gratuity is $2714. When split between 20 people, each person needs to pay $135.70.
The total bill including an 18% gratuity is $2714. When split between 20 people, each person needs to pay $135.70.

Couldn't it have been a Python function?

Should this have been an agent? After all, we very well could have done this instead:

def calculate_bill(
    total_amount: float, tip_percentage: int, num_people: int
) -> float:
    return (total_amount * (100 + tip_percentage) / 100) / num_people

calculate_bill(2300, 18, 20) # gives us 135.7

Is there a way to make that Python function more flexible?

But this is a very restrictive implementation. What if we didn't have any division to do?

def calculate_bill_v2(
    total_amount: float, tip_percentage: int = 0, num_people: int = 1
) -> float:
    if tip_percentage:
        total_amount = total_amount * (100 + tip_percentage) / 100
    if num_people:
        return total_amount / num_people
    return total_amount

calculate_bill_v2(2714, 0, 20) # also gives us 135.7

Some commentary from me: it's the same functionality, but with more flexibility. The scope of inputs is more variable, taking almost any combination you want, but not natural language. And this point, by the way, seems to be what distinguishes an LLM agent from a Python program.
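
To make that concrete, here are a few calls the more flexible version handles, using the numbers from the worked example above:

calculate_bill_v2(2300, 18)      # total with tip only -> 2714.0
calculate_bill_v2(2714, 0, 20)   # split an already-tipped total -> 135.7
calculate_bill_v2(2300, 18, 20)  # tip and split in one call -> 135.7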

But couldn't reasoning models do this?

What if we did this instead?

r1_bot = lmb.SimpleBot(
    "You are a smart bill calculator.", model_name="ollama_chat/deepseek-r1:latest"
)
r1_bot(
    "My dinner was $2300 in total, with 18% tip. Split the bill between 20 people. Respond with only the number."
)

We get this:

<think>
First, I need to calculate the total amount of the dinner including an 18% tip.

The original cost is \$2300. To find the tip, I'll multiply this by 18%.

\$2300 × 0.18 = \$414

Adding the tip to the original cost gives:

\$2300 + \$414 = \$2714

Next, I need to split this total amount equally among 20 people.

So, I'll divide \$2714 by 20.

\$2714 ÷ 20 = \$135.70

Therefore, each person should contribute \$135.70.
</think>

Sure! Let's break down the calculation step by step:

**Step 1: Calculate the Total with Tip**

- **Original Cost:** \$2,300
- **Tip Percentage:** 18%

\[
\text{Tip} = 2,\!300 \times 0.18 = 414
\]

\[
\text{Total with Tip} = 2,\!300 + 414 = 2,\!714
\]

This didn't involve any tools that the agent had to interact with, so it seems like problems that only involve reasoning don't necessitate building an agent.

What if we didn't have to provide any task-specific tools?

Or what if we asked the agent to write and execute its own code? (This follows Hugging Face's definition of an agent.)

from llamabot.bot.agentbot import write_and_execute_script

# Create the bot
autonomous_code_bot = lmb.AgentBot(
    system_prompt=lmb.system(
        "You are my assistant with respect to restaurant bills."
    ),
    functions=[write_and_execute_script],
    model_name="gpt-4o",
)

# Split the bill
autonomous_code_bot(split_bill_only_prompt)

And with this implementation, we get:

{"tool_name":"write_and_execute_script","tool_args":[{"name":"code","value":"# Calculate the total bill including gratuity and then split it among 20 people\n\ntotal_bill = 2300\n\ngratitude_percentage = 18\n\n# Calculate the total amount including gratuity\ntotal_with_gratuity = total_bill + (total_bill * gratitude_percentage / 100)\n\n# Split the total amount among 20 people\namount_per_person = total_with_gratuity / 20\n\nprint(amount_per_person)"}],"use_cached_results":[]}---SCRIPT EXECUTION RESULT---
135.7
---SCRIPT EXECUTION RESULT---

write_and_execute_script({'code': '# Calculate the total bill including gratuity and then split it among 20 people\n\ntotal_bill = 2300\n\ngratitude_percentage = 18\n\n# Calculate the total amount including gratuity\ntotal_with_gratuity = total_bill + (total_bill * gratitude_percentage / 100)\n\n# Split the total amount among 20 people\namount_per_person = total_with_gratuity / 20\n\nprint(amount_per_person)'}) -> 18b90e4d, 135.7
{"tool_name":"agent_finish","tool_args":[{"name":"message","value":"The total dinner bill was $2300, and with an 18% gratuity added, the total becomes $2714. When this total is split between 20 people, each person needs to pay $135.70."}],"use_cached_results":[]}
agent_finish({'message': 'The total dinner bill was $2300, and with an 18% gratuity added, the total becomes $2714. When this total is split between 20 people, each person needs to pay $135.70.'}) -> 0cfe8442, The total dinner bill was $2300, and with an 18% gratuity added, the total becomes $2714. When this total is split between 20 people, each person needs to pay $135.70.

In all cases above, our bots are able to answer the question. But they all approached the question in different ways. It seems like the problems in which we would want an agent aren't the kinds that can be solved by SimpleBots (of various models) alone.

Dissection

If we think carefully about the distinction between a Python function, its stochastic variant, and an LLM agent, we might make the following observations:

Functions

  1. Functions are written to accomplish a goal.
  2. Functions have an input signature, a body, and a return.
  3. Function inputs are constrained to the types that are accepted; they cannot be natural language.
  4. Function program flow is deterministic.

Stochastic Functions

  1. Stochastic functions have non-deterministic flow control, resulting in a distribution of possible outputs.

LLM Agents

  1. Are non-deterministic in flow control.
  2. Rely on structured outputs internally.
  3. Allow for natural language inputs.
  4. Nonetheless accomplish a goal.
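
A schematic way to see the contrast, reusing the bill-splitting example from above (a sketch, not a prescribed pattern):

# A plain function: typed inputs, deterministic flow, no interpretation needed.
calculate_bill_v2(2714, 0, 20)  # -> 135.7

# An LLM agent: natural-language input; the bot decides which tools to call,
# and in what order, before returning an answer.
bot("Our dinner came to $2714 with tip; split it between the 20 of us.")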

What should be an agent?

Anthropic has guidance.

When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense.

When more complexity is warranted, workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale. For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough.

In other words, can you build it with a regular Python program first? If so, maybe just start there.

And from my own blog:

A roadmap for designing agent applications

I've found that the most effective way to design agent applications is to progressively relax constraints on inputs and execution order.

Start with a deterministic program

  • Design your application as you would with regular API calls
  • Define clear input/output specifications
  • Implement core functionality with standard programming patterns

Relax input constraints

  • Accept natural language input and convert it to structured parameters for function calls
  • Enable autonomous function calling based on natural language understanding

Relax execution order constraints

  • Only necessary when natural language inputs are varied enough to require different execution paths
  • Allow flexible ordering of operations when needed
  • Enable dynamic selection of which functions to call
  • Maintain boundaries around available actions while allowing flexibility in their use

This progressive relaxation approach helps us transition from traditional deterministic programming to an agent-driven paradigm where execution order is non-deterministic and inputs are natural language.
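
As a sketch of what the second step (relaxing input constraints) can look like with LlamaBot, here is one hypothetical way to put a natural-language front end on the plain calculate_bill_v2 function from earlier. The BillQuery model and bill_parser bot are illustrative names I'm introducing here, not part of LlamaBot itself.

from pydantic import BaseModel, Field

class BillQuery(BaseModel):
    total_amount: float = Field(..., description="Bill amount in USD before tip")
    tip_percentage: int = Field(0, description="Tip as a whole-number percentage")
    num_people: int = Field(1, description="Number of people splitting the bill")

# StructuredBot converts the natural-language request into typed parameters...
bill_parser = lmb.StructuredBot(
    system_prompt="Extract the bill details from the user's message.",
    model_name="gpt-4o",
    pydantic_model=BillQuery,
)

query = bill_parser(
    "My dinner was $2300 in total, I added an 18% gratuity, split between 20 people."
)
# ...and the deterministic Python function does the actual computation.
calculate_bill_v2(query.total_amount, query.tip_percentage, query.num_people)
# should give roughly 135.7, assuming the parse succeeds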

More perspectives

Finally, I concluded with a few more perspectives for others to consider.

Conclusion

In this talk, we went through two main questions:

  1. What makes an agent?
  2. What should be an agent?

I am pretty confident that I left the audience with more questions than answers. But I also am confident that those questions were more focused and specific than they were before the talk. I hope that's the same for you, the reader of this blog!


Cite this blog post:
@article{
    ericmjl-2025-pydata-agent,
    author = {Eric J. Ma},
    title = {PyData Boston/Cambridge Talk @ Moderna: What makes an agent?},
    year = {2025},
    month = {01},
    day = {31},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2025/1/31/pydata-bostoncambridge-talk-moderna-what-makes-an-agent},
}
  
