Data Science Bootstrap

Prioritize conda to install packages

Why should you use conda for packages

As a matter of practical advice, I usually prefer conda-installed packages over pip-installed packages. Here are the reasons why.

Firstly, conda packages have their versions and dependencies tracked properly, so the conda dependency solver (or its drop-in replacement, mamba) can pick out a compatible set of packages.

Secondly, on occasion one might need to use packages that come from multiple languages. There have been projects I worked on that used Python calling out to R packages. Conda was designed to handle multiple programming languages in the same environment, and will help you pull down packages written in different languages, along with all of their dependencies.

Thirdly, as the suite of packages available on conda-forge grows, and as the conda-forge developers build more tooling to automatically mirror language-specific packages on conda-forge, it becomes progressively easier to rely primarily on the conda package manager. This idea relates to the notion of specifying single sources of truth for categories of stuff.

How to search for conda-installable versions of packages

To rely on conda-installable packages, you specify your environment using environment.yml files. These are used by the conda package manager to download the desired packages, their dependencies, and their appropriate versions onto your machine.

When you want to search for a package, before you assume it's available on PyPI, search for it on Anaconda.org. You can do this by either running:

conda search package_name

or by going to the Anaconda.org website and searching for the package that you're interested in.

Also, be sure to check the "Installation" instructions in the package's GitHub repository for anything that suggests you could install the package from conda-forge.
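As a rough sketch of a more targeted search, the helper below restricts the search to the conda-forge channel. The function name search_conda_forge is hypothetical, and the package name numpy and the version bound in the usage comments are placeholders chosen only for illustration.

```shell
# Hypothetical helper: search only the conda-forge channel for a package spec.
search_conda_forge() {
    # --override-channels ignores the channels configured in .condarc,
    # so only conda-forge is searched.
    conda search --override-channels --channel conda-forge "$1"
}

# Example usage (requires conda on your PATH):
#   search_conda_forge numpy
#   search_conda_forge "numpy>=1.20"
```

conda search exits with a non-zero status when nothing matches, so the helper can also be used in a shell conditional.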

Once you've found it, add the package to your environment.yml file under the dependencies section.
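A minimal sketch of that step, assuming a package called scikit-learn was found on conda-forge. The environment name, the Python version, and the package list are placeholders, and a scratch directory stands in for your project directory.

```shell
# A scratch directory stands in for your project directory here.
project_dir="$(mktemp -d)"

# environment.yml with the newly found package (scikit-learn, as an example)
# listed under the dependencies section.
cat > "$project_dir/environment.yml" <<'EOF'
name: some_env_name
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pandas
  - scikit-learn
EOF

# Then, from the project directory and with conda on your PATH,
# apply the file to the already-created environment:
#   conda env update --file environment.yml
```

The update command assumes the environment named in the file was created earlier with conda env create.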

If you can't find a conda-installable version of the package, then consider using pip. (see: Use pip only when you cannot find packages on conda)

Configure your conda installation

Why you would want to configure your conda installation

Configuring some things with conda can help lubricate your interactions with the conda package manager. It will save you keystrokes at the terminal, primarily, thus saving you time. The place to do this configuration is in the .condarc file, which the conda package manager searches for by default in your user's home directory.

The condarc docs are your best bet for the full configuration, but I have some favourites that I'm more than happy to share below.

How to configure your condarc

Firstly, you create a file in your home directory called .condarc. Then edit it to have the following contents:

channels:
  - conda-forge
  - defaults

auto_update_conda: True

always_yes: True

The whys

auto_update_conda saves me from having to update conda manually all the time.
always_yes automatically answers y to the conda installation and update prompts.
Listing conda-forge above the defaults channel allows me to type conda install some_package rather than conda install -c conda-forge some_package each time I want to install a package, as conda prioritizes channels according to their order under the channels section.
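If you would rather not edit the file by hand, the same settings can be written with the conda config command. The sketch below wraps those calls in a helper; the function name configure_condarc and its optional path argument are hypothetical conveniences, not part of conda.

```shell
# Hypothetical helper: write the settings discussed above into a condarc file.
# Pass a path to write somewhere other than ~/.condarc.
configure_condarc() {
    condarc_file="${1:-$HOME/.condarc}"
    # --add puts the value at the top of the channels list, so add the
    # lower-priority channel first and conda-forge last.
    conda config --file "$condarc_file" --add channels defaults &&
    conda config --file "$condarc_file" --add channels conda-forge &&
    conda config --file "$condarc_file" --set auto_update_conda true &&
    conda config --file "$condarc_file" --set always_yes true
}

# Example usage (requires conda on your PATH):
#   configure_condarc
#   conda config --show channels
```

Running conda config --show channels afterwards is a quick way to confirm that conda-forge is listed first.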

About channel priorities

If you prefer, you can set the channel priorities in a different order and/or expand the list. For example, bioinformatics users may want to add in the bioconda channel, while R users may want to add in the r channel. Users who prefer stability may want to prioritize defaults ahead of conda-forge.

What this affects is how conda will look for packages when you execute the conda install command. However, it doesn't affect the channel priority in your per-project environment.yml file (see: Create one conda environment per project).

Use pip only when you cannot find packages on conda

When you can use pip

If you can't find a package on conda (see Prioritize conda to install packages), then pip can serve as a viable alternative for adding packages to your environment.

How to use pip with conda environments

In your environment.yml file, add a pip section under dependencies:

name: some_env_name
channels:
- conda-forge
dependencies:
- python=3.8
- pandas
- scipy
- numpy
- ...
- pip
- pip:
  - some_pip_package==2.1

Some things to note here.

Firstly, the pip section uses the same syntax for setting versions as requirements.txt: it uses == rather than the single = that conda uses. This is because its contents are dumped to a temporary text file that gets parsed by pip itself.

Secondly, keep monitoring for when the package shows up on conda-forge, as that will help you retain the advantages of installing packages with a single package manager.
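One way to do that monitoring is sketched below. It relies on conda list marking pip-installed packages with pypi in the channel column. Both function names are hypothetical, and the sketch assumes a package has the same name on PyPI and on conda-forge, which is not always the case.

```shell
# Hypothetical helper: read `conda list` output on stdin and print the names
# of packages whose channel column reads "pypi" (installed through pip).
filter_pip_installed() {
    awk '$1 != "#" && $NF == "pypi" { print $1 }'
}

# Hypothetical helper: report which pip-installed packages in the active
# environment can now be found on conda-forge.
check_conda_forge() {
    conda list | filter_pip_installed | while read -r pkg; do
        # conda search exits non-zero when no matching package is found.
        if conda search --override-channels --channel conda-forge "$pkg" >/dev/null 2>&1; then
            echo "$pkg is now on conda-forge"
        fi
    done
}

# Example usage (requires conda on your PATH and an activated environment):
#   check_conda_forge
```

Any package it reports is a candidate for moving out of the pip section and into the regular dependencies list.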