written by Eric J. Ma on 2023-11-26 | tags: continuous integration python micromamba llamabot conda yaml mambaforge caching
In this blog post, I experimented with speeding up LlamaBot's CI system by switching from Miniconda to micromamba. The results were impressive, with more consistent timings and a significant reduction in build and test times. The primary advantage was the built-in, turnkey caching of the entire environment. This change made a noticeable difference, especially when testing against bleeding-edge packages. Could micromamba be the solution to your CI delays? Read on to find out!
Nobody likes waiting around for continuous integration (CI) pipelines to finish. I'm sure many of you can relate to the frustration of slow build times. In my recent post, How to choose a (conda) distribution of Python, I touched upon Python distributions. However, a comment on LinkedIn by Wade Rosko, referencing Hugo Shi's post about speeding up a Saturn Cloud CI job with micromamba, got me thinking. They achieved a 40X speedup! This was something I had to try.
I decided to use LlamaBot's CI system as a test case. The original setup used Miniconda with the following YAML configuration:
```yaml
- name: Setup miniconda
  if: matrix.environment-type == 'miniconda'
  uses: conda-incubator/setup-miniconda@v2
  with:
    auto-update-conda: true
    miniforge-variant: Mambaforge
    channels: conda-forge
    activate-environment: llamabot
    environment-file: environment.yml
    use-mamba: true
    python-version: ${{ matrix.python-version }}
```
In my new approach, I switched to using micromamba with the following YAML:
```yaml
- uses: mamba-org/setup-micromamba@v1
  if: matrix.environment-type == 'miniconda'
  with:
    micromamba-version: '1.4.5-0'
    environment-file: environment.yml
    init-shell: bash
    cache-environment: true
    cache-environment-key: environment-${{ steps.date.outputs.date }}
    cache-downloads-key: downloads-${{ steps.date.outputs.date }}
    post-cleanup: 'all'
```
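The cache keys above reference `steps.date.outputs.date`, which comes from a separate step that isn't shown in the snippet. A minimal sketch of such a step, assuming a runner with bash available, might look like this:

```yaml
# Hypothetical step that exposes today's date as an output,
# so the cache keys roll over once per day.
- name: Get current date
  id: date
  run: echo "date=$(date +%Y-%m-%d)" >> "$GITHUB_OUTPUT"
```

Because the key changes daily, the environment cache is rebuilt at most once per day, while every subsequent run that day hits the cache.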
The rest of the YAML file remained unchanged.
I meticulously recorded the timings, and you can find the full record here.
| Configuration | Run 1 | Run 2 | Run 3 | Run 4 |
|---|---|---|---|---|
| Old YAML | 2m 4s | 3m 16s | 2m 3s | 6m 34s |
| New YAML | 1m 1s | 1m 19s | 2m 9s | N/A |
The timings on my latest PRs with the new setup are noticeably more consistent. Build and test times dropped to around 1 minute on average, the final environment is identical to before, and all tests pass.
The primary win here comes from the built-in, turnkey caching of the entire environment, a stark contrast to the more complicated caching recipes needed with the Mambaforge-based setup. Keying the cache on the date is a practical compromise: the environment is rebuilt at most once a day, which matters because I often test against bleeding-edge packages.
Switching to micromamba for LlamaBot's CI system was a rewarding experience. It's a straightforward and effective way to reduce CI times significantly. If you're dealing with similar CI delays, consider giving micromamba a try. It could be just the solution you're looking for.
```bibtex
@article{ericmjl-2023-speeding-study,
  author = {Eric J. Ma},
  title = {Speeding up CI Pipelines with Micromamba: A LlamaBot Case Study},
  year = {2023},
  month = {11},
  day = {26},
  howpublished = {\url{https://ericmjl.github.io}},
  journal = {Eric J. Ma's Blog},
  url = {https://ericmjl.github.io/blog/2023/11/26/speeding-up-ci-pipelines-with-micromamba-a-llamabot-case-study},
}
```
I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.
If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!
Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!