Eric J Ma's Website

How to choose a (conda) distribution of Python

written by Eric J. Ma on 2023-10-07 | tags: conda anaconda miniforge python distribution data science pip tooling python


Note: This is an excerpt from my Data Science Bootstrap Notes, which is freely available online here. If you find the notes useful and wish to support my work, please consider either purchasing a digital copy on LeanPub or sending coffee money via GitHub Sponsor or Patreon.

If you're a conda user, you may have heard of the Anaconda distribution of Python. In this set of notes, however, I've also referenced the Miniforge distribution of Python. What's the difference here? How do you pick which one to use? To answer those questions, we must first understand what is a distribution of Python.

Python distributions

Python can get distributed to users in many ways. You can download it directly from the official Python Software Foundation's (PSF) website. Or you can install it onto your system using the official Anaconda installer, through Homebrew, or through your official Linux package manager. Each way of installing Python can be thought of as a distribution of Python. Each distribution of Python differs ever so slightly. Official Python from the PSF comes with just the standard library. Anaconda, however, ships with the standard library and many other packages that are relevant for data science.

What is common across all Python distributions, however, is that it will ship with a Python executable that, at the end of installation, should be discoverable on your PATH environment variable.

Most commonly, there will be a Python package installer that ships with the distribution as well. This can be pip, the official tool for installing Python packages, or it could be conda, which was developed by the company Anaconda.

As such, the anatomy of a distribution is essentially nothing more than:

  • A Python interpreter that can be discovered on your PATH,
  • A Python package manager, and
  • Any other default Python packages that the distributor thinks you might want

With that aside, let's look at three distributions of Python that are relevant to this set of notes.

Anaconda Python

The Anaconda distribution of Python is the official distribution from Anaconda. It ships with a modern version of Python, both pip and conda package managers, and a whole slew of default data science packages (pandas, numpy, scikit-learn, scipy, matplotlib, for example). With the Anaconda distribution, conda is configured such that packages are installed from the anaconda repository of packages, hosted by Anaconda itself. Its default installation location is ~/anaconda or ~/anaconda3.

Miniconda Python

The Miniconda Python distribution also comes from Anaconda. It looks like Anaconda except it ships with fewer packages in the base environment. You wouldn't, for example, find pandas installed for you. This was mostly intended to keep the base environment small for use within Docker containers.

Its default installation location is ~/miniconda or ~/miniconda3.

Miniforge Python

This distribution of Python comes from the open-source developer team behind conda-forge. Miniforge looks like Miniconda, but instead of configuring conda to pull packages from the anaconda repository, conda packages are instead pulled from the conda-forge repository of packages by default. This has the advantage of being able to pull more bleeding-edge versions of packages that you may use. Additionally, Miniforge Python ships with mamba as well.

Summary Table

Here's a summary table of these features.

Attribute Anaconda Python Miniconda Python Miniforge Python
Origin Official distribution from Anaconda Comes from Anaconda From open-source developer team behind conda-forge
Version of Python Modern Similar to Anaconda Similar to Miniconda
Package Managers pip and conda Similar to Anaconda pip, conda and mamba
Default Data Science Packages pandas, numpy, scikit-learn, scipy, matplotlib Fewer packages (e.g., pandas not pre-installed) Similar to Miniconda
Conda Configuration Pulls from anaconda repository Pulls from anaconda repository Pulls from conda-forge repository by default
Primary Use Case General-purpose with lots of pre-installed data science packages Keeping base environment small, e.g., for Docker containers Access to bleeding-edge versions of packages.

Tip: All of the distributions can be installed into the ~/anaconda directory if you desire consistent behaviour regardless of the installation source. All three installers provide the -p flag when executing it, thus allowing us to specify the prefix directory in which to install. We would thus do something like:

bash Miniforge3.sh -b -p "${HOME}/anaconda"

Which to use?

Depends on your persona! If you're an indie hacker type, I would strongly recommend the Miniforge Python as it is lightweight and fast to get set up with and fully open source. On the other hand, if you're more inclined to want enterprise support, vetting of packages, and wish to support a company that backs so much of the Python open source world, then I would recommend reaching out to Anaconda and talking with their sales reps.


Cite this blog post:
@article{
    ericmjl-2023-how-python,
    author = {Eric J. Ma},
    title = {How to choose a (conda) distribution of Python},
    year = {2023},
    month = {10},
    day = {07},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2023/10/7/how-to-choose-a-conda-distribution-of-python},
}
  

I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.

If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!

Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!