written by Eric J. Ma on 2023-03-29 | tags: llms large language models gpt gpt4 openai langchain llama_index
Large Language Models (LLMs) are having a moment now! We can interact with them programmatically in three ways: OpenAI's official API, LangChain's abstractions, and LlamaIndex. How do we choose among the three? I'd like to use a minimally complex example to showcase how we might make this decision.
I blog.
(Ok, well, that's quite the understatement, given that you're reading it right now.)
As you probably can tell from my blog's design, I need to summarize the blog post for the blog landing page. Because I usually share the post on LinkedIn, I also need to generate a LinkedIn post advertising the blog post. I've heard that emojis are great when sharing posts on LinkedIn, but I can only sometimes remember the names of appropriate emojis. Some automation help would be great here. Finally, if I can make a better title than I initially thought of, that would also constitute an improvement.
I've decided that I want a tool that can summarize my blog posts, generate appropriately emojified LinkedIn posts for me to advertise those posts, and provide a catchy and engaging title without being clickbait. These are tedious to write! Wouldn't it be great to have a tool that can generate all three? That's the use case for LLMs!
Here's the list of requirements I have to build the tool I want.
Firstly, I should be able to provide the text of a blog post as input and get back:

- a summary for the blog landing page,
- an emojified LinkedIn post advertising the blog post, and
- a catchy but non-clickbait title.
Secondly, the tool should minimize token usage. Tokens equal money spent on this task, so saving on token usage increases the leverage I get when trading money for time saved.
Finally, because I'm still writing some code to implement this tool, I'd like to use a package that would provide the most easily maintained code. Of course, that means a subjective judgment of how simple the abstractions are.
To that end, I will show how to implement this writing tool using three APIs currently available: the official OpenAI Python API, the LangChain API, and the LlamaIndex API. This exercise lets us see which ones are most suitable for this particular use case. We will be using GPT-4 everywhere, as its quality is known to be superior to GPT-3.5. Finally, I will focus on blog post summary generation and LinkedIn post generation when comparing the APIs before choosing one framework to implement the complete program.
The blog post I'll use for implementing this is my most recent one on the Arc browser. I will be passing in the Lektor raw source without the title. The raw source is available on my website repository.
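For completeness, here is a minimal sketch of how one might load that raw source into `blog_text`; the file path is hypothetical and depends on how the repository is laid out, and stripping the `title:` field this way is just one simple option:

```python
from pathlib import Path

# Hypothetical path to the Lektor source of the Arc browser post;
# adjust to wherever the contents.lr file lives in your checkout.
post_path = Path("content/blog/2023/03/arc-browser/contents.lr")
raw_source = post_path.read_text()

# Drop the title field so that only the body gets summarized.
blog_text = "\n".join(
    line for line in raw_source.splitlines() if not line.startswith("title:")
)
```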
According to tiktoken, this text uses 1436 tokens to encode:
```python
import tiktoken

blog_text = ...  # taken from my raw source.
encoder = tiktoken.get_encoding("cl100k_base")
tokens = encoder.encode(blog_text)
len(tokens)
```
1436
This is a constant throughout the post.
The prompts that I will use to generate the desired text are as follows. If the three prompts are too long to remember, you can focus on the one for summarization as an anchoring example.
```python
summarization_prompt = f"""You are a blog post summarization bot.
You take a blog post and write a summary of it.
The summary is intended to hook a reader into the blog post and entice them to read it.
You should add in emojis where appropriate.

My previous summaries sounded like this:

1. I finally figured out how to programmatically create Google Docs using Python. Along the way, I figured out service accounts, special HTML tags, and how to set multi-line environment variables. Does that tickle your brain?

2. Here is my personal experience creating an app for translating Ark Channel devotionals using OpenAI's GPT-3. In here, I write about the challenges I faced and the lessons I learned! I hope it is informative for you!

3. I discuss two Twitter threads that outline potential business ideas that could be built on top of ChatGPT3, a chatbot-based language model. What's the tl;dr? As mankind has done over and over, we build machines to solve mundane and repetitive tasks, and ChatGPT3 and other generative models are no exception!

The summary you generate should match the tone and style of the previously-generated ones.
"""
```
```python
linkedin_prompt = """You are a LinkedIn post generator bot.
You take a blog post and write a LinkedIn post.
The post is intended to hook a reader into reading the blog post.
The LinkedIn post should be written with one line per sentence.
Each sentence should begin with an emoji appropriate to that sentence.
The post should be written in professional English and in first-person tone.
"""
```
```python
title_prompt = """You are a blog post title generator.
You take a blog post and write a title for it.
The title should be short, ideally less than 100 characters.
It should be catchy but not click-baity, free from emojis, and should intrigue readers, making them want to read the summary and the post itself.
Ensure that the title accurately captures the contents of the post.
"""
```
Using OpenAI's API requires we set the OpenAI API key before doing anything; this is also true for using LangChain and LlamaIndex. As always, to adhere to reasonable security practices when developing locally, we should store the API key as an environment variable in a `.env` file listed in the `.gitignore` of the repository, and then load the API key in our Jupyter notebooks.
Here is how we implement it. Firstly, the `.env` file:
```bash
# .env
export OPENAI_API_KEY="..."
```
And then, at the top of our Jupyter notebook or Python script:
```python
from dotenv import load_dotenv
import openai
import os

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
```
Let's start off using the OpenAI Python API. To use GPT-4, we need to provide a chat history of messages that looks something like this:
```python
human_message = {"role": "user", "content": f"Here is my blog post source: {blog_text}"}
messages = [
    {"role": "system", "content": summarization_prompt},
    human_message,
]
```
Then, we pass the message history into the OpenAI chat completion class:
```python
result = openai.ChatCompletion.create(messages=messages, model="gpt-4", temperature=0.0)
```
Once the API call has returned, we can see what gets returned:
```python
summary = result["choices"][0]["message"]["content"]
print(summary)
```
I recently got my hands on the new [Arc browser](https://arc.net/) and I'm loving it! 🤩 Arc reimagines the browser as a workspace, helping us manage our chaotic tabs and multiple projects. Some cool features include tabs that expire ⏳, spaces for grouping tabs 🗂️, rapid switching between tabbed and full-screen mode 🖥️, side-by-side view for multitasking 📑, and automatic developer mode for locally hosted sites 🔧. Arc's focus on UI design is a game-changer for productivity and focus! Check out my first impressions and see if Arc could be your new favorite browser! 🚀
The total number of tokens used here is:
```python
(
    len(encoder.encode(summarization_prompt))
    + len(encoder.encode(human_message["content"]))
    + len(encoder.encode(summary))
)
```
1870
Now, let's make the LinkedIn post.
```python
linkedin_system_message = {"role": "system", "content": linkedin_prompt}
messages = [linkedin_system_message, human_message]
result = openai.ChatCompletion.create(messages=messages, model="gpt-4", temperature=0.0)
linkedin_post = result["choices"][0]["message"]["content"]
print(linkedin_post)
```
🚀 Just tried the new [Arc browser](https://arc.net/) for 24 hours! 🧠 It's designed to fit the modern multitasker's brain. ⏳ Love the tabs that expire feature - goodbye clutter! 🗂️ Spaces for grouping tabs - perfect for juggling multiple projects. 🔄 Rapid switching between tabbed and full-screen mode for better focus. 📑 Side-by-side view for efficient multitasking. 👨‍💻 Automatic developer mode for locally hosted sites - a developer's dream! 🌟 Overall, Arc is a game-changer for productivity and focus. 📖 Read my full experience in the blog post [here](<blog_post_link>).
The total number of tokens used here is:
```python
(
    len(encoder.encode(linkedin_prompt))
    + len(encoder.encode(human_message["content"]))
    + len(encoder.encode(linkedin_post))
)
```
1669
Tabulating the total cost of tokens, we have 3c per 1,000 tokens for prompts and 6c per 1,000 tokens for generated texts. We can do the math easily here:
```python
# Cost
prompt_encoding_lengths = (
    len(encoder.encode(linkedin_prompt))
    + len(encoder.encode(human_message["content"]))
    + len(encoder.encode(summarization_prompt))
    + len(encoder.encode(human_message["content"]))
)
generated_encoding_lengths = (
    len(encoder.encode(linkedin_post))
    + len(encoder.encode(summary))
)
cost = (0.03 * prompt_encoding_lengths + 0.06 * generated_encoding_lengths) / 1000
print(cost)
```
0.11657999999999999
Or about 12c to perform this operation.
How much ROI do we get here?
Excluding benefits, equity, and more, a new Ph.D. grad data scientist is paid about $150,000 (give or take) per year in the biomedical industry in 2023. Assuming about 250 days of work per year at an average of 8 hours per day, we're talking about an hourly rate of $75/hr at that salary. If it takes that person 10 minutes to cook up a summary and LinkedIn post (which is about how long I take, excluding figuring out what emojis to put in, because that's, for me, bloody time-consuming), then we're looking at $12.50 worth of time put into crafting that post. Looking at just the cold hard numbers, that's a roughly 100X cost improvement from using GPT-4 as a writing aid.
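For transparency, here is that back-of-the-envelope arithmetic as a quick sketch; the salary, workday, and writing-time figures are the assumptions stated above:

```python
# Back-of-the-envelope ROI calculation using the assumptions above.
annual_salary = 150_000        # USD per year, new Ph.D.-level data scientist
hours_per_year = 250 * 8       # ~250 workdays at 8 hours each
hourly_rate = annual_salary / hours_per_year   # ~$75/hr

human_cost = hourly_rate * (10 / 60)   # ~10 minutes of writing time, ~$12.50
llm_cost = 0.12                        # approximate GPT-4 cost computed earlier

print(f"Human: ${human_cost:.2f}, GPT-4: ${llm_cost:.2f}, ratio: {human_cost / llm_cost:.0f}X")
```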
Now, let's do the same thing, except with the LangChain API. Here, we'll be using LangChain's Chain API.
First off, summarization:
```python
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

system_message_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate(
        template=summarization_prompt,
        input_variables=[],
    ),
)

chat = ChatOpenAI(model_name="gpt-4", temperature=0.0)

human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="Here is my post:\n\n {blog_post}?",
        input_variables=["blog_post"],
    )
)

chat_prompt_template = ChatPromptTemplate.from_messages(
    [system_message_prompt, human_message_prompt]
)
summary_chain = LLMChain(llm=chat, prompt=chat_prompt_template)
summary = summary_chain.run(blog_post=blog_text)
print(summary)
```
I recently tried out the Arc browser 🌐, which reimagines the browser as a workspace to boost productivity! 🚀 After 24 hours of use, I'm loving its features: tabs that expire ⏳, spaces for organizing tabs 🗂️, rapid switching between tabbed and full-screen mode 🖥️, side-by-side view for multitasking 📑, and automatic developer mode for locally hosted sites 💻. Arc's focus on UI design helps us stay focused and organized in our digital lives. Curious to know more? Dive into my first impressions! 👀
According to tiktoken:
```python
len(encoder.encode(summary))
```
The generated text was 191 tokens.
```python
len(encoder.encode(system_message_prompt.format_messages()[0].content))
```
The system prompt was 237 tokens.
```python
len(encoder.encode(human_message_prompt.format_messages(blog_post=blog_text)[0].content))
```
The human message prompt was 1442 tokens.
We'll keep those numbers in mind when doing the final cost accounting.
Now, let's make the LinkedIn post.
```python
linkedin_system_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate(
        template=linkedin_prompt,
        input_variables=[],
    ),
)
linkedin_blog_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="Here is the blog post:\n\n{blog_post}",
        input_variables=["blog_post"],
    )
)
linkedin_chat_prompt_template = ChatPromptTemplate.from_messages(
    [linkedin_system_prompt, linkedin_blog_prompt]
)
linkedin_chain = LLMChain(llm=chat, prompt=linkedin_chat_prompt_template)
linkedin_post = linkedin_chain.run(blog_post=blog_text)
print(linkedin_post)
```
🚀 Just tried the new [Arc browser](https://arc.net/) and I'm loving it! 🧠 📌 Arc's unique features like tabs that expire and spaces for different projects help me stay focused and organized. 🙌 👉 Check out my blog post for a detailed review and first impressions of this game-changing browser: [Arc Browser: First Impressions](<blog post link>) 🔗 👨‍💻 Are you ready to revolutionize your browsing experience? #ArcBrowser #Productivity #Tools
Doing an accounting of the tokens once again:
```python
len(encoder.encode(linkedin_post))
```
The generated LinkedIn post used 151 generated tokens.
```python
len(encoder.encode(linkedin_blog_prompt.format_messages(blog_post=blog_text)[0].content))
```
The blog post prompt itself used 1441 prompt tokens.
```python
len(encoder.encode(linkedin_system_prompt.format_messages()[0].content))
```
And the system prompt used 71 tokens.
As usual, let's do the accounting of tokens.
```python
# Cost Accounting
generated_encodings_length = (
    len(encoder.encode(linkedin_post)) + len(encoder.encode(summary))
)
prompt_lengths = (
    len(encoder.encode(human_message_prompt.format_messages(blog_post=blog_text)[0].content))
    + len(encoder.encode(system_message_prompt.format_messages()[0].content))
    + len(encoder.encode(linkedin_blog_prompt.format_messages(blog_post=blog_text)[0].content))
    + len(encoder.encode(linkedin_system_prompt.format_messages()[0].content))
)
cost = 0.03 * prompt_lengths + 0.06 * generated_encodings_length
print(cost / 1000)
```
0.11648999999999998
As we can see, this also costs about 12c for the entire exercise.
The way LlamaIndex works is different from the previous two frameworks. With OpenAI's and LangChain's APIs, we stuffed the entire document into the prompt for each task. With LlamaIndex, we can embed and store the text beforehand and then query it with a prompt for our LLM. Here is how it looks:
To do this, we use the GPTSimpleVectorIndex provided by LlamaIndex:
```python
from llama_index import Document, GPTSimpleVectorIndex, LLMPredictor

# Index documents using GPT-4.
llm_predictor = LLMPredictor(llm=chat)
documents = [Document(blog_text)]
index = GPTSimpleVectorIndex(documents=documents, llm_predictor=llm_predictor)
```
Thanks to the built-in token accounting capabilities of LlamaIndex, we can see that building the index cost us 1637 tokens:
```text
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 1637 tokens
```
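Those counts come out of Python's standard logging machinery rather than a return value, so if you don't see them in your notebook, here is a minimal sketch to surface them, assuming LlamaIndex's token counter logs at the INFO level (as it did in the version I used):

```python
import logging
import sys

# LlamaIndex emits its token counts through the standard logging module;
# routing INFO-level records to stdout makes them visible in a notebook.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
```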
Let's start by creating a summary of the blog post.
```python
summary_response = index.query(summarization_prompt)
print(summary_response.response)
```
Discover the Arc browser, a game-changer that reimagines the browser as a workspace 🚀! With features like tabs that expire ⏳, spaces for grouping tabs 📁, rapid switching between modes ⚡, side-by-side view 📑, and automatic developer mode for locally hosted sites 🔧, Arc is designed to boost your productivity and focus 📈. Dive into my 24-hour experience with this innovative browser and see how it fits my brain 🧠!
Not bad! The response looks pretty good, though I might edit it a bit further.
The token usage here is:
```text
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 2026 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 257 tokens
```
Now, let's try generating a LinkedIn post the same way.
```python
linkedin_response = index.query(linkedin_prompt)
print(linkedin_response.response)
```
🚀 Just tried the new Arc browser for 24 hours! 🌐 It reimagines the browser as a workspace, perfect for multitaskers. ⏰ Love the tabs that expire feature, keeping my workspace clutter-free. 📁 Spaces help me group tabs by context, boosting focus and productivity. 🖥️ Rapid switching between tabbed and full-screen mode is a game-changer. 👩‍💻 Developer mode for locally hosted sites makes web development a breeze. 📖 Check out my detailed review on my blog and see how Arc can transform your browsing experience!
Not bad, it followed my instructions as well.
Here's the token usage:
```text
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 1874 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 78 tokens
```
LlamaIndex did not break down prompt tokens vs. generated-text tokens for me, but it did split out embedding tokens. In calculating the cost, we will make the following assumptions: generated tokens are those in the returned responses (as counted by tiktoken) and are billed at 6c per 1,000; the remaining LLM tokens are prompt tokens billed at 3c per 1,000; and embedding tokens are billed at $0.0004 per 1,000 (the ada-002 embedding rate).
Crunching the numbers, we get:
```python
embedding_tokens = 1637 + 257 + 78
linkedin_llm_tokens = 1874
linkedin_generated_tokens = len(encoder.encode(linkedin_response.response))
summary_llm_tokens = 2026
summary_generated_tokens = len(encoder.encode(summary_response.response))

llm_tokens_cost = (
    0.06 * (linkedin_generated_tokens + summary_generated_tokens)
    + 0.03 * (linkedin_llm_tokens - linkedin_generated_tokens)
    + 0.03 * (summary_llm_tokens - summary_generated_tokens)
) / 1000
embedding_tokens_cost = embedding_tokens * 0.0004 / 1000
print(f"Cost: {llm_tokens_cost + embedding_tokens_cost}")
```
Cost: 0.1243588
Also about 12c for this exercise.
Here are the generated texts for summarization side-by-side:
I recently got my hands on the new [Arc browser](https://arc.net/) and I'm loving it! 🤩 Arc reimagines the browser as a workspace, helping us manage our chaotic tabs and multiple projects. Some cool features include tabs that expire ⏳, spaces for grouping tabs 🗂️, rapid switching between tabbed and full-screen mode 🖥️, side-by-side view for multitasking 📑, and automatic developer mode for locally hosted sites 🔧. Arc's focus on UI design is a game-changer for productivity and focus! Check out my first impressions and see if Arc could be your new favorite browser! 🚀
I recently tried out the Arc browser 🌐, which reimagines the browser as a workspace to boost productivity! 🚀 After 24 hours of use, I'm loving its features: tabs that expire ⏳, spaces for organizing tabs 🗂️, rapid switching between tabbed and full-screen mode 🖥️, side-by-side view for multitasking 📑, and automatic developer mode for locally hosted sites 💻. Arc's focus on UI design helps us stay focused and organized in our digital lives. Curious to know more? Dive into my first impressions! 👀
Discover the Arc browser, a game-changer that reimagines the browser as a workspace 🚀! With features like tabs that expire ⏳, spaces for grouping tabs 📁, rapid switching between modes ⚡, side-by-side view 📑, and automatic developer mode for locally hosted sites 🔧, Arc is designed to boost your productivity and focus 📈. Dive into my 24-hour experience with this innovative browser and see how it fits my brain 🧠!
Of the three, LlamaIndex's style is the furthest from my usual style, though, in some instances, I might choose the style generated by LlamaIndex to change things up on my blog.
🚀 Just tried the new [Arc browser](https://arc.net/) for 24 hours! 🧠 It's designed to fit the modern multitasker's brain. ⏳ Love the tabs that expire feature - goodbye clutter! 🗂️ Spaces for grouping tabs - perfect for juggling multiple projects. 🔄 Rapid switching between tabbed and full-screen mode for better focus. 📑 Side-by-side view for efficient multitasking. 👨‍💻 Automatic developer mode for locally hosted sites - a developer's dream! 🌟 Overall, Arc is a game-changer for productivity and focus. 📖 Read my full experience in the blog post [here](<blog_post_link>).
🚀 Just tried the new [Arc browser](https://arc.net/) and I'm loving it! 🧠 📌 Arc's unique features like tabs that expire and spaces for different projects help me stay focused and organized. 🙌 👉 Check out my blog post for a detailed review and first impressions of this game-changing browser: [Arc Browser: First Impressions](<blog post link>) 🔗 👨‍💻 Are you ready to revolutionize your browsing experience? #ArcBrowser #Productivity #Tools
🚀 Just tried the new Arc browser for 24 hours! 🌐 It reimagines the browser as a workspace, perfect for multitaskers. ⏰ Love the tabs that expire feature, keeping my workspace clutter-free. 📁 Spaces help me group tabs by context, boosting focus and productivity. 🖥️ Rapid switching between tabbed and full-screen mode is a game-changer. 👩‍💻 Developer mode for locally hosted sites makes web development a breeze. 📖 Check out my detailed review on my blog and see how Arc can transform your browsing experience!
Of the three, OpenAI's is furthest from how I usually write my LinkedIn posts, but in fairness, I didn't provide the model with examples to imitate. If I had to choose, I would pick LlamaIndex's generated post.
For this simple use case, which would I go with? LlamaIndex, OpenAI's official API, or LangChain?
As a lazy programmer, I would go with LlamaIndex. That's because I needed the fewest lines of code to reach my desired endpoints. The API is easy to remember as well - you only need to remember the pattern:

- `Document` to wrap the text that we imported,
- `GPTSimpleVectorIndex` (or, more generically, `<some>Index`), and
- `LLMPredictor` to wrap around LangChain's LLMs.

Then we call `<Index>.query(prompt)` to get a response.
The biggest reason LlamaIndex is best suited to this use case is that querying a single document is a relatively simple task: we only need to load the document in a query-able fashion. Furthermore, as you can see from my code above, using LlamaIndex resulted in the least boilerplate code.
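To make that concrete, here is a minimal sketch of how the complete tool might look with LlamaIndex, reusing the three prompts defined above and the same API calls shown earlier; the function name and return structure are my own for illustration, not part of the library:

```python
from langchain.chat_models import ChatOpenAI
from llama_index import Document, GPTSimpleVectorIndex, LLMPredictor


def generate_post_assets(blog_text: str) -> dict:
    """Generate a landing-page summary, LinkedIn post, and title for one blog post."""
    # Build an index over just this one post, using GPT-4 as the predictor.
    llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-4", temperature=0.0))
    index = GPTSimpleVectorIndex(documents=[Document(blog_text)], llm_predictor=llm_predictor)

    # Query the same index once per prompt, including the title prompt
    # that we defined above but haven't exercised yet.
    return {
        "summary": index.query(summarization_prompt).response,
        "linkedin": index.query(linkedin_prompt).response,
        "title": index.query(title_prompt).response,
    }
```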
That said, is LlamaIndex the right choice for every application? Probably not. As of March 2023, LangChain is geared towards much more complex (and autonomous) LLM use cases, so the abstractions its creators have built may be better suited to designing larger LLM applications than to simple ones like the one we just built. And OpenAI's API is officially supported, which means it will likely have an official "blessing" in the long run.
I have a penchant/bias for more tightly-controlled LLM applications, which means forgoing a chat interface in favour of a single-text-in-single-text-out interface. Indeed, while I see the usefulness of chat-based interfaces for exploration, I don't think they will become the dominant UI for embedding LLMs in applications. Rather, I predict that we'll see UI/UX researchers deciding, on a per-application basis, whether to use a free-flowing chat UI, a more tightly controlled ad-lib-style interface, or even crazier interfaces! The factor that should matter most here is the ROI gained in human time.
Additionally, I predict that we will see data science teams use LLMs to bang out dozens of little utility apps much faster than previously possible. These utility apps will likely serve niche but often-repeated use cases, and may serve as the basis of larger applications that get developed later. For example, we'll probably see apps that let users ad-lib a templated prompt, LLMs playing the role of data engineer (e.g. structuring data from unstructured text), and domain-specific idea generators prompted by domain experts or domain needs (as we saw here).
The short answer is that we'll see more customization to local contexts, with the highest-ROI use cases prioritized.
There are some caveats here to what I've done that I'd like to point out.
Firstly, I've used the most straightforward implementations, stuffing the most text into the prompt. Why? Partly because I'm still exploring the libraries and their APIs, and partly because so much development has happened so quickly. Others will be able to figure out more efficient ways of implementing what we did here with their favourite framework.
Secondly, this particular task is quite simple. LangChain, on the other hand, can generate much more complex programs, as we can see from its documentation. Critiquing LangChain's heavy abstractions would be unfair, as it's very likely designed for more complex autonomous applications.
One application built three ways with three different libraries. We have seen how the outputs can differ even with the same prompts and how the developer experience can vary between the three. That said, I caution that these are early days. Libraries evolve. Their developers can introduce more overhead or reduce it. It's probably too early to "pick a winner"; as far as I can tell, it's less a competition and more a collaboration between these library developers. However, we can keep abreast of the latest developments and keep experimenting with the libraries on the side, finding out what works and what doesn't and adapting along the way.
The near-term value proposition of GPT as a tool lies in its ability to automate low-risk, high-effort writing. We saw through cost calculations that LLMs can represent a 100X ROI in time saved. For us, as data scientists, that should be heartening! Where in your day-to-day work can you find a similar 100X ROI? Let me know in the comments!
```bibtex
@article{ericmjl-2023-a-2023,
    author = {Eric J. Ma},
    title = {A Developer-First Guide to LLM APIs (March 2023)},
    year = {2023},
    month = {03},
    day = {29},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2023/3/29/a-developer-first-guide-to-llm-apis-march-2023},
}
```
I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.
If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!
Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!