
Mixtral-8x7b-Instruct works on an old GTX1080!

written by Eric J. Ma on 2024-03-10 | tags: mixtral 8x7b-instruct old gpu linux tower 4-bit quantized llama bot keyword generator protein engineering machine learning older commodity hardware


Today, on a whim, I decided to try running the Mixtral 8x7b-Instruct model (via Ollama) on my old Linux GPU tower. Specifically, I am using the 4-bit quantized model. To my surprise, it works!

As always, LlamaBot makes this relatively easy. To start, on my GPU server, I ran:

ollama pull mixtral:8x7b-instruct-v0.1-q4_0
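One networking detail: the bot runs on my laptop while the model lives on the tower, so the Ollama server has to be reachable over the network (Ollama binds to localhost by default; setting OLLAMA_HOST=0.0.0.0 before starting the server changes that). Here's a quick sanity check from the laptop side, as a sketch assuming the OLLAMA_SERVER environment variable holds the tower's hostname, as in the code below:

import os

import requests

# Ollama's HTTP server answers a bare GET on its root; a 200 here
# means the tower is reachable on port 11434.
resp = requests.get(f"http://{os.getenv('OLLAMA_SERVER')}:11434", timeout=5)
print(resp.status_code, resp.text)  # expect: 200 Ollama is running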

Then, in a Jupyter notebook on my MacBook Air:

keywords_sysprompt = """"Generate keywords for the document provided to you.
Please return JSON of format:

    {'keywords': ['keyword', 'keyword', 'keyword',...]}.


Keywords should be one or two words, separated by a space.
Return only keywords, nothing else.
Do not add your own commentary.
"""

keyword_generator_ollama = SimpleBot(
    model_name="ollama/mixtral:8x7b-instruct-v0.1-q4_0",  # Specifying Ollama via the model_name argument is necessary!s
    system_prompt=keywords_sysprompt,
    stream_target="stdout",  # this is the default!
    api_base=f"http://{os.getenv('OLLAMA_SERVER')}:11434",
    # json_mode=True,
    # format="json",
)

# `document` holds the full text of the paper I wanted keywords for.
response = keyword_generator_ollama(document)

Generating keywords for a paper on protein engineering and machine learning, I got the following:

{'keywords': ['machine learning', 'functional protein design', 'AI', 'protein sequence', 'structure data', 'core data modalities', 'enzymes', 'antibodies', 'vaccines', 'nanomachines', 'large-scale assays', 'robust benchmarks', 'multimodal foundation models', 'enhanced sampling strategies', 'laboratory automation', 'protein fitness landscape', 'rational design methods', 'directed evolution', 'combinatorial libraries', 'biophysics-based models', 'DNA sequencing', 'algorithmic advances', 'computing advances', 'machine learning-based design methods']}
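Note that the model returned a Python-style dict with single quotes rather than strict JSON (I left json_mode commented out above), so json.loads would reject it. Here's a minimal parsing sketch, assuming the generated text is available on the response's content attribute (adjust to your llamabot version):

import ast

# `response` is the message object returned by SimpleBot above;
# `.content` is assumed to hold the raw generated text.
# The output uses single quotes, so parse it as a Python literal
# rather than as strict JSON.
parsed = ast.literal_eval(response.content)
keywords = parsed["keywords"]
print(len(keywords), "keywords, e.g.:", keywords[:3])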

This wasn't too bad at all; the output feels similar to what GPT-4 would provide, which matches my general observation that Mixtral-8x7b's quality is on par with GPT-4's. Generation is qualitatively much slower than with mistral-7b (I haven't measured tokens per second yet), but it does work.
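For whenever I get around to measuring it, Ollama's HTTP API makes tokens per second easy to compute: a non-streaming call to /api/generate reports eval_count (tokens generated) and eval_duration (in nanoseconds) in its response. A rough sketch, again assuming OLLAMA_SERVER points at the tower:

import os

import requests

# Ask the server for a short completion without streaming; the final
# JSON payload includes token counts and timings.
resp = requests.post(
    f"http://{os.getenv('OLLAMA_SERVER')}:11434/api/generate",
    json={
        "model": "mixtral:8x7b-instruct-v0.1-q4_0",
        "prompt": "Briefly describe protein engineering.",
        "stream": False,
    },
    timeout=600,
).json()

# eval_duration is reported in nanoseconds.
tokens_per_second = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tokens_per_second:.1f} tokens/second")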

As I mentioned in my previous post, running LLMs on my old GPU tower has helped me breathe new life into it. Running Mixtral-8x7b was another hardware challenge I was eager to attempt, and I'm glad to have more evidence that LLMs can run on older commodity hardware!


I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.

If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!

Finally, I do free 30-minute GenAI strategy calls for organizations seeking guidance on how to best leverage this technology. Consider booking a call on Calendly if you're interested!