Using LlamaBot with Groq

Groq is a very fast inference API for LLMs.

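from llamabot import SimpleBot
from dotenv import load_dotenv

# Load API keys (e.g. GROQ_API_KEY, OPENAI_API_KEY) from a local .env file.
load_dotenv()

# The "groq/" prefix routes the request to Groq through LiteLLM.
groq_bot = SimpleBot(
    system_prompt="You are a cheerful llama.",
    model_name="groq/mixtral-8x7b-32768",
)

# Time a single call (one run, one loop) with the IPython magic.
%timeit -r1 -n1 groq_bot("What's up?")

# Same system prompt, but served by OpenAI's GPT-4 Turbo.
openai_bot = SimpleBot(
    system_prompt="You are a cheerful llama.", model_name="gpt-4-turbo"
)

%timeit -r1 -n1 openai_bot("What's up?")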

In my own testing, Groq's Mixtral implementation is roughly 3-4x faster than OpenAI's GPT-4 Turbo model.
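
Outside a notebook, where `%timeit` is not available, the same comparison can be sketched with `time.perf_counter`. The helper names `time_call` and `speedup` below are hypothetical and not part of LlamaBot, and the two stand-in functions only simulate latency with `time.sleep`; in practice you would pass `groq_bot` and `openai_bot` instead.

```python
import time
from typing import Any, Callable


def time_call(fn: Callable[..., Any], *args: Any, repeats: int = 1) -> float:
    """Return the best wall-clock time (seconds) over `repeats` calls of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast call was than the slow one."""
    return slow_seconds / fast_seconds


# Hypothetical stand-ins for real bots: they just sleep to simulate latency.
def fake_fast_bot(prompt: str) -> str:
    time.sleep(0.01)
    return "fast reply"


def fake_slow_bot(prompt: str) -> str:
    time.sleep(0.04)
    return "slow reply"


if __name__ == "__main__":
    fast = time_call(fake_fast_bot, "What's up?")
    slow = time_call(fake_slow_bot, "What's up?")
    print(f"fast: {fast:.3f}s, slow: {slow:.3f}s, speedup: {speedup(slow, fast):.1f}x")
```

A single run, as with `-r1 -n1`, is noisy because network latency and response length vary between calls, so raising `repeats` gives a steadier number at the cost of more API calls.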

As of 2024-07-20, with LiteLLM 1.35.38 (the version that LlamaBot is currently pinned to), Groq does not support streaming with JSON mode via LiteLLM. A GitHub issue has been filed about this.