written by Eric J. Ma on 2023-10-22 | tags: python large language models llms gitbot zotero local llms ollama langchain openai gpt-4 prompt engineering llamabot
In this blog post, I explore the integration of local Large Language Models (LLMs) with my LlamaBot project using Ollama. I discuss how Ollama simplifies the setup of local LLMs and demonstrate how to use Ollama models with LlamaBot. I also share a quick demo with Zotero chat using Ollama models. While OpenAI's GPT-4 remains the benchmark, local models offer cost-free alternatives. Curious to read more?
If you've been following the LlamaBot project, you know it's my pet Pythonic project to interface with Large Language Models (LLMs). I've had fun building some cool stuff, like GitBot, a chatbot for your Zotero library, and even a blogging assistant (more on that later, promise!).
However, there's been one area I've shied away from until now: using local LLMs. The setup could be daunting, but I finally found a way to simplify it with Ollama.
A community member posted an issue on GitHub:
"Hi @ericmjl, thank you for making this! I'd sent a PR but it's a little beyond me but I was wondering if there is a simple way to use local models such as Ollama?"
And I thought, why not? Can it work? Let's find out.
Ollama made setting up local LLMs a breeze. I was pleasantly surprised at the smooth installation process; you simply need to follow the instructions on their main page.
Following that, there are two ways to access Ollama models. The first is to chat with a model interactively at the terminal:

ollama run <model name>

The second is to start the Ollama server, which exposes your local models over an API that programs like LlamaBot can talk to:

ollama serve
One thing to note: Ollama pulls in models on the fly. They range from 3GB to 16GB, so you may need to be patient while they download.
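If you want to double-check that the server is reachable before pointing LlamaBot at it, a quick sanity check from Python looks something like this (a minimal sketch, assuming Ollama is serving on its default port, 11434):

import requests

# List the models that Ollama has pulled locally.
# Assumes `ollama serve` is running on its default port (11434).
response = requests.get("http://localhost:11434/api/tags")
response.raise_for_status()
for model in response.json()["models"]:
    print(model["name"])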
My earlier decision to use LangChain paid off, even with all of the frustrations I had trying to track a fast-evolving Python package. LangChain's architecture made it straightforward to write a model dispatcher.
# Import paths reflect the LangChain version current at the time of writing.
from langchain.callbacks.base import BaseCallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chat_models import ChatOllama, ChatOpenAI


def create_model(
    model_name,
    temperature=0.0,
    streaming=True,
    verbose=True,
):
    """Dispatch and create the right model.

    This is necessary to validate b/c LangChain doesn't do the validation for us.

    :param model_name: The name of the model to use.
    :param temperature: The model temperature to use.
    :param streaming: (LangChain config) Whether to stream the output to stdout.
    :param verbose: (LangChain config) Whether to print debug messages.
    :return: The model.
    """
    # ollama_model_keywords is a list, defined elsewhere in LlamaBot, of model
    # families that Ollama serves (e.g. "llama2", "mistral").
    ModelClass = ChatOpenAI
    if model_name.split(":")[0] in ollama_model_keywords:
        ModelClass = ChatOllama

    return ModelClass(
        model_name=model_name,
        temperature=temperature,
        streaming=streaming,
        verbose=verbose,
        callback_manager=BaseCallbackManager(
            handlers=[StreamingStdOutCallbackHandler()] if streaming else []
        ),
    )
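To illustrate how the dispatcher routes model names, here's a hypothetical usage sketch; the model names are just examples, and which ones go to ChatOllama depends on what's listed in ollama_model_keywords:

# Hypothetical usage of create_model above.
# "llama2:13b" routes to ChatOllama, assuming "llama2" appears in
# ollama_model_keywords; "gpt-4" falls through to ChatOpenAI
# (which also assumes an OpenAI API key is set in the environment).
local_model = create_model("llama2:13b")
openai_model = create_model("gpt-4", temperature=0.7)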
I made some tweaks in SimpleBot, ChatBot, and QueryBot to ensure that they work with Ollama models.
So, how exactly do we use Ollama models with LlamaBot?
Firstly, pull the model you want and start the Ollama server:

ollama pull <ollama model name>
ollama serve
Secondly, in your Jupyter notebook, initialize the bot:
bot = SimpleBot(..., model_name=<ollama model name>)
And that's it!
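To make that concrete, here's a minimal sketch of a session, assuming you've pulled the mistral model with ollama pull mistral and that ollama serve is running; the system prompt and question are just examples:

from llamabot import SimpleBot

# A minimal sketch: SimpleBot pointed at a locally served Ollama model.
# Assumes `ollama pull mistral` has been run and `ollama serve` is up.
bot = SimpleBot(
    "You are a helpful assistant.",
    model_name="mistral",
)
bot("In one sentence, what is LlamaBot?")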
To give you a taste, I've already enabled Zotero chat to use Ollama models. Try it out:
llamabot zotero chat --model-name <ollama model name>
While OpenAI's GPT-4 still sets the benchmark in speed and response quality, local models offer the freedom of being cost-free. This opens up new avenues for fine-tuning and prompt engineering!
Thanks for reading, and happy coding!
@article{ericmjl-2023-how-llamabot,
    author = {Eric J. Ma},
    title = {How to run Ollama with LlamaBot},
    year = {2023},
    month = {10},
    day = {22},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2023/10/22/how-to-run-ollama-with-llamabot},
}