Configure your conda installation
Configuring a few conda settings will smooth your interactions with the conda package manager. Primarily, it saves you keystrokes at the terminal, and thus time. The place to do this configuration is the .condarc file, which the conda package manager searches for by default in your user's home directory.
The condarc docs are your best bet for the full configuration, but I have some favourites that I'm more than happy to share below.
First, create a file in your home directory called .condarc. Then edit it to have the following contents:
channels:
  - conda-forge
  - defaults
auto_update_conda: True
always_yes: True
auto_update_conda saves me from having to update conda all the time, while always_yes lets me always answer y to the conda installation and update prompts. Listing conda-forge above the defaults channel makes it the default, which allows me to type conda install some_package rather than conda install -c conda-forge some_package each time I want to install a package, as conda will prioritize channels according to their order under the channels section.
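If you would rather script this step than edit the file by hand, here is a minimal Python sketch that writes the same settings to a .condarc file. The names write_condarc and CONDARC_TEXT are my own, not part of conda, and the function refuses to overwrite an existing file unless you ask it to.

```python
from pathlib import Path

# The same settings discussed above, as the text of a .condarc file.
CONDARC_TEXT = """\
channels:
  - conda-forge
  - defaults
auto_update_conda: True
always_yes: True
"""


def write_condarc(directory, overwrite=False):
    """Write CONDARC_TEXT to <directory>/.condarc and return the file path.

    This is a hypothetical helper for illustration, not a conda API.
    """
    target = Path(directory) / ".condarc"
    if target.exists() and not overwrite:
        raise FileExistsError(
            f"{target} already exists; pass overwrite=True to replace it"
        )
    target.write_text(CONDARC_TEXT)
    return target
```

Calling write_condarc(Path.home()) would place the file where conda looks for it by default. Try it on a scratch directory first if you already have a .condarc you care about.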
If you prefer, you can set the channel priorities in a different order and/or expand the list. For example, bioinformatics users may want to add in the bioconda channel, while R users may want to add in the r channel. Users who prefer stability may want to prioritize defaults ahead of conda-forge.
What this affects is how conda will look for packages when you execute the conda install command. However, it doesn't affect the channel priority in your per-project environment.yml file (see: Create one conda environment per project).
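To make the idea of channel ordering concrete, here is a deliberately simplified Python model of it. It is only a sketch: the real conda solver also weighs versions, builds and other settings, and both pick_channel and the index dictionary are made up for illustration.

```python
def pick_channel(package, channels, index):
    """Return the first channel, in priority order, that offers `package`.

    `channels` is the ordered list from the channels section.
    `index` is a hypothetical mapping of channel name -> set of package names;
    it stands in for the package listings conda would actually download.
    """
    for channel in channels:
        if package in index.get(channel, set()):
            return channel
    return None


# Made-up listings, just to show the effect of ordering.
example_index = {
    "conda-forge": {"numpy", "mknotebooks"},
    "defaults": {"numpy", "anaconda-navigator"},
}
forge_first = pick_channel("numpy", ["conda-forge", "defaults"], example_index)
defaults_first = pick_channel("numpy", ["defaults", "conda-forge"], example_index)
```

With conda-forge listed first, numpy resolves to conda-forge. Swap the order and the same package resolves to defaults.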
Create one conda environment per project
If you have multiple projects that you work on, but you install all project dependencies into a shared environment, then I guarantee you that at some point, you will run into dependency conflicts as you try to upgrade/update packages to try out new things.
"So what?" you might ask. Well, you'll end up breaking your code! Take this word of advice from someone who has had to deal with the consequences of having his code not working in one project even as code in another does. And finding out one day before an important presentation, right when you need to put in new versions of figures that were made before. The horror!
You will want to ensure that you have an isolated conda environment for each project to keep your projects insulated from one another.
Here is a baseline that you can copy and modify at any time.
name: project-name-goes-here  ## CHANGE THIS TO YOUR ACTUAL PROJECT
channels:  ## Add any other channels below if necessary
  - conda-forge
dependencies:  ## Prioritize conda packages
  - python=3.9
  - jupyter
  - conda
  - mamba
  - ipython
  - ipykernel
  - numpy
  - matplotlib
  - scipy
  - pandas
  - pip
  - pre-commit
  - black
  - nbstripout
  - mypy
  - flake8
  - pycodestyle
  - pydocstyle
  - pytest
  - pytest-cov
  - pytest-xdist
  - pip:  ## Add in pip packages if necessary
      - mkdocs
      - mkdocs-material
      - mkdocstrings
      - mknotebooks
If a package exists in both conda-forge and pip and you rely primarily on conda, then I recommend prioritizing the conda package over the pip package. The advantage here is that conda's dependency solver can grab the latest compatible version without worrying about pip clobbering other dependencies. (h/t my reviewer Simon, who pointed out that newer versions of pip have a dependency solver. Staying consistent is preferable as far as possible, though mixing-and-matching is alright if you know what you're doing.)
This baseline helps me bootstrap conda environments. The packages that are in there each serve a purpose. You can read more about them on the page: Install code checking tools to help write better code.
Initially, I only specify the version of Python I want, and allow the conda package manager to solve the environment.
However, there may come a time when a new package version brings a new capability. That is when you may wish to pin that particular package to at least that version. (See below for the syntax needed to pin a version.) Conversely, a new package version may break compatibility; in this case, you will want to pin the package to a maximum version.
It's not always obvious, though, so be sure to use version control
If you wish, you can also pin versions to a minimum, maximum, or specific one, using version modifiers.
For conda, they are >, >=, =, <= and <. (You should be able to grok what is what!)
For pip, they are >, >=, ==, <= and <. (Note: for pip, it is double equals == and not single equals =.)
So when do you use each of the modifiers?
Use == sparingly while in development: you will be stuck with a particular version and will find it difficult to update other packages together.
Use <= and < to prevent conda and pip from upgrading a package beyond a certain version. This can be helpful if new versions of packages you rely on have breaking API changes.
Use >= and > to prevent conda and pip from installing a package below a certain version. This is helpful if you've come to depend on API changes that broke compatibility with older versions.
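If the modifiers feel abstract, the following Python sketch shows what each one accepts or rejects. It is a simplified model that I wrote for illustration: it only handles plain dotted numeric versions such as 1.4.2, the function names are hypothetical, and the real conda and pip matchers handle many more cases. The single = is treated the way conda treats it in python=3.9, that is, as a match on the leading version components.

```python
import re

# One operator followed by a plain dotted numeric version, e.g. ">=1.4".
SPEC_PATTERN = re.compile(r"^(>=|<=|==|>|<|=)\s*(\d+(?:\.\d+)*)$")


def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2)."""
    return tuple(int(part) for part in text.split("."))


def pad(left, right):
    """Zero-pad two version tuples to the same length, so 1.4 equals 1.4.0."""
    width = max(len(left), len(right))
    return (
        left + (0,) * (width - len(left)),
        right + (0,) * (width - len(right)),
    )


def satisfies(version, spec):
    """Return True if `version` is allowed by `spec` (simplified model)."""
    match = SPEC_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"unsupported version spec: {spec!r}")
    operator, wanted_text = match.groups()
    have, wanted = parse_version(version), parse_version(wanted_text)
    if operator == "=":
        # conda-style single equals: only the leading components must match.
        return have[: len(wanted)] == wanted
    have, wanted = pad(have, wanted)
    if operator == "==":
        return have == wanted
    if operator == ">=":
        return have >= wanted
    if operator == "<=":
        return have <= wanted
    if operator == ">":
        return have > wanted
    return have < wanted  # the only operator left is "<"
```

For example, satisfies("3.9.7", "=3.9") is true while satisfies("3.10.1", "=3.9") is false, and satisfies("2.0", "<2") is false because 2.0 is not below the ceiling.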
Upgrading and/or installing packages should be done on an as-needed basis. I have found two ways to upgrade packages:
The principled way to do an upgrade is to first pin the version inside environment.yml, and then use the following command to update the environment:
conda env update -f environment.yml
The hacky way to do the upgrade is to directly pip install the package, and then add it (or modify its version) in the environment.yml file. Do this only if you know what you're doing!
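For the principled path, the pinning step is just a text edit of environment.yml. Here is a small Python sketch of that edit, assuming the simple one-package-per-line layout used in the baseline file. The function pin_dependency is hypothetical and works on the file's text rather than parsing YAML properly, so treat it as an illustration and not a robust tool.

```python
import re


def pin_dependency(environment_yaml, package, spec):
    """Return the environment.yml text with `package` pinned to `spec`.

    Lines such as '  - pandas' or '  - pandas=1.2' become '  - pandas>=1.3'
    when called with package='pandas' and spec='>=1.3'. Trailing comments
    are kept. Raises KeyError if the package is not listed.
    """
    pattern = re.compile(rf"^(\s*-\s+){re.escape(package)}\s*([<>=!].*)?$")
    lines = environment_yaml.splitlines()
    changed = False
    for i, line in enumerate(lines):
        # Split off any trailing comment so it does not confuse the match.
        body, hash_mark, comment = line.partition("#")
        match = pattern.match(body.rstrip())
        if match is None:
            continue
        new_line = f"{match.group(1)}{package}{spec}"
        if hash_mark:
            new_line += "  #" + comment
        lines[i] = new_line
        changed = True
    if not changed:
        raise KeyError(f"{package} is not listed in the environment file")
    return "\n".join(lines) + "\n"
```

After writing the returned text back to environment.yml, you would run the conda env update command shown above to apply the change.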
When you practice "one project gets one environment", making each environment's Python interpreter available to Jupyter becomes crucial. If your project's environment does not show up in Jupyter's list of kernels, you'll need to register it. To do so, ensure that the environment has the ipykernel package installed. (If not, install it by hand and add it to the environment.yml file.) Then, run the following command:
# assuming you have already activated your environment,
# replace $ENVIRONMENT_NAME with your environment's name.
python -m ipykernel install --user --name $ENVIRONMENT_NAME
Now, it will show up as a "kernel" for executing Python code in your Jupyter notebooks. (See: Configure Jupyter and Jupyter Lab for more information on how to configure it.)
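If you create environments often, you may want to wrap the registration step in a script. The sketch below builds the same ipykernel command from Python. The function names are my own, and it assumes you run it with the environment's own interpreter, so that sys.executable points at the Python you want Jupyter to use.

```python
import subprocess
import sys


def kernel_install_command(environment_name, python=sys.executable):
    """Build the ipykernel registration command as an argument list.

    `python` defaults to the interpreter running this script, which should be
    the one inside the activated environment.
    """
    if not environment_name:
        raise ValueError("environment_name must not be empty")
    return [
        python, "-m", "ipykernel", "install",
        "--user", "--name", environment_name,
    ]


def register_kernel(environment_name):
    """Run the registration command; raises CalledProcessError on failure."""
    subprocess.run(kernel_install_command(environment_name), check=True)
```

Calling register_kernel("my-project") from inside the activated environment should have the same effect as typing the command by hand. Here "my-project" is a placeholder for your environment's name.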
Now, how should you name your conda environment? See the page: Sanely name things consistently!
Prioritize conda to install packages
As a matter of practical advice, I usually prefer conda-installed packages over pip-installed packages. Here are the reasons why.
Firstly, conda packages have their versions and dependencies tracked properly, and so the conda dependency solver (or its drop-in replacement mamba) can be used to pick out the right set of packages.
Secondly, on occasion one might need to use packages that come from multiple languages. There have been projects I worked on that used Python calling out to R packages. Conda was designed to handle multiple programming languages in the same environment, and will help you pull down packages used in multiple languages, and all of their dependencies.
Thirdly, as the suite of packages that become available in conda-forge increases, and as the conda-forge developers increase the amount of tooling to automatically mirror language-specific packages on conda-forge, it becomes progressively easier to rely primarily on the conda package manager. This idea relates to the notion of specifying single sources of truth for categories of stuff.
To do so, you specify your environment using environment.yml files. These are used by the conda package manager to download the desired packages, their dependencies, and their appropriate versions onto your machine.
When you want to search for a package, before you assume it's available on PyPI, search for it on Anaconda.org. You can do this either by running:
conda search package_name
or by going to the Anaconda.org website and searching for the package that you're interested in.
Also, be sure to check the "Installation" instructions in the package's GitHub repository for anything that suggests that you could install the package from a conda channel such as conda-forge. Once you've found it, add the package to your environment.yml file under the dependencies section. If you can't find a conda-installable version of the package, then consider using pip. (See: Use pip only when you cannot find packages on conda)
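The decision rule above can be written down in a few lines of Python. This is only a sketch: available_on_conda is a set you would fill in by hand after searching, and the two helper functions are hypothetical names of mine. The rendered text follows the same layout as the baseline environment.yml, with pip packages nested under a pip entry.

```python
def split_dependencies(wanted, available_on_conda):
    """Split `wanted` into (conda_packages, pip_packages), preserving order.

    `available_on_conda` is a hand-maintained set of package names that you
    found through conda search or on Anaconda.org.
    """
    conda_packages = [name for name in wanted if name in available_on_conda]
    pip_packages = [name for name in wanted if name not in available_on_conda]
    return conda_packages, pip_packages


def render_dependencies(conda_packages, pip_packages):
    """Render the dependencies section of an environment.yml file as text."""
    lines = ["dependencies:"]
    lines += [f"  - {name}" for name in conda_packages]
    if pip_packages:
        # pip itself must be installed by conda before it can be used.
        if "pip" not in conda_packages:
            lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"      - {name}" for name in pip_packages]
    return "\n".join(lines) + "\n"
```

Anything you found on conda lands in the main list, and only the leftovers fall through to pip.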
Install Anaconda on your machine
Anaconda is a way to get Python installed on your system.
One of the neat but oftentimes confusing things about Python is that you can have multiple Python executables living around on your system. Anaconda makes it easy for you to:
Why is this a good thing? Primarily because you might have individual projects that need different versions of Python and different versions of packages that are built for Python. Also, default Python installations, such as the ones shipped with older versions of macOS, tend to be versions behind the latest, which is to the detriment of your projects. Some built-in apps in an operating system may depend on that old version of Python (such as iPhoto), which means that if you mess up the installation, you might break those built-in apps. Hence, you will want a tool that lets you easily create isolated Python environments.
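To see for yourself how many Python executables are living on your system, you can scan the directories on your PATH. The following Python sketch does that with the standard library only. The function name find_pythons is my own, and it only looks for the file names you pass in, so it will miss interpreters installed under other names or outside PATH.

```python
import os
from pathlib import Path


def find_pythons(path_value=None, names=("python", "python3")):
    """Return executables named in `names` found on PATH, in PATH order.

    `path_value` defaults to the PATH environment variable. The first entry
    in the result is the one your shell would run for that name.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file() and os.access(candidate, os.X_OK):
                found.append(candidate)
    return found
```

Printing the result of find_pythons() on a typical machine usually shows more than one entry, which is exactly the situation isolated environments are meant to tame.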
The Anaconda Python distribution fulfills the following key needs:
Installing Anaconda on your local machine thus helps you get easy access to Python, Jupyter (see: Use Jupyter as an experimentation playground), and other tools for modelling and analysis.
If you're on macOS: I'm assuming you have installed homebrew (see: Install homebrew on your machine) and wget. Then, install Miniconda, which will be a lighter-weight installer, using the following command:
cd ~
wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O anaconda.sh
This will send you to your home directory, and then download the Miniconda bash script installer from Anaconda's download page.
If you're on Linux: Make sure you have wget available on your system. Then:
cd ~
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O anaconda.sh
This will download the Miniconda installer for Linux operating systems into your home directory.
If you don't have wget: You can head over to the Miniconda docs and download the bash installer to whatever location you want (the home directory is a convenient place). Rename it to anaconda.sh to stay compatible with the instructions below.
Now, install Anaconda:
bash anaconda.sh -b -p $HOME/anaconda/
This will install the Anaconda distribution of Python onto your system inside your home directory. You can now install packages at will, without needing sudo privileges!
Configure your machine
After getting access to your development machine, you'll want to configure it and take full control over how it works. Backing the following steps are a core set of ideas:
Head over to the following pages to see how you can get things going.
Bootstrap your base conda environment
In a pinch, you might want to muck around on your system with some quick-and-dirty experiment. Having a suite of packages inside your base environment can be handy. It's like having a scratch environment available.
I would recommend bootstrapping your base anaconda environment with some basic data science packages.
conda activate base
conda install -c conda-forge \
    scipy numpy pandas matplotlib \
    jupyter jupyterlab \
    scikit-learn ipython ipykernel \
    ipywidgets mamba
Doing so gives you an environment where you can quickly prototype new things without necessarily going through the overhead of creating an entirely new project (and with it, a full conda environment). Of course, the alternative is to set up a scratch environment in which you install packages on-the-fly.
mamba can be helpful if you want a faster drop-in replacement for conda. (See: Use Mamba as a faster drop-in replacement for conda for more information.)
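Once the base environment is bootstrapped, a quick way to confirm that the packages are importable is to ask Python directly. The sketch below uses importlib from the standard library. The function missing_packages and the IMPORT_NAMES table are my own additions: the table is needed because some packages are imported under a different name than the one you install, and it is deliberately incomplete.

```python
from importlib.util import find_spec

# Install name -> import name, for packages where the two differ.
# This table is an illustrative assumption; extend it as needed.
IMPORT_NAMES = {
    "scikit-learn": "sklearn",
    "ipython": "IPython",
}


def missing_packages(packages):
    """Return the packages from `packages` that cannot be imported here."""
    missing = []
    for package in packages:
        module_name = IMPORT_NAMES.get(package, package)
        if find_spec(module_name) is None:
            missing.append(package)
    return missing
```

Running missing_packages(["scipy", "numpy", "pandas", "matplotlib", "scikit-learn"]) with the base environment's Python should return an empty list if the install went through. Command-line tools such as mamba are better checked from the terminal, since they may not be importable as Python modules.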