Bootstrap a scratch conda environment

A scratch environment is your playground

In a pinch, you might want to muck around on your system with some quick-and-dirty experiment. Having a suite of packages inside a scratch environment can be handy. Your scratch environment can be your base environment if you'd like, but I would strongly recommend creating a separate scratch environment instead.

How to bootstrap a scratch environment

I would recommend that you bootstrap a scratch conda environment with some basic data science packages.

mamba activate base
mamba install -c conda-forge \
    scipy numpy pandas matplotlib \
    jupyter jupyterlab \
    scikit-learn ipython ipykernel \
    ipywidgets mamba

(Replace mamba with conda if you don't have mamba installed on your system.)

Doing so gives you an environment where you can quickly prototype new things without necessarily going through the overhead of creating an entirely new project (and with it, a full conda environment).
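If you would rather keep the scratch environment separate from base, as recommended at the top of this page, a minimal sketch could look like the following. The environment name `scratch`, the variable `CONDA_TOOL`, and the helper `create_scratch_env` are placeholders of my own choosing rather than conda conventions. The package list mirrors the one above, minus mamba itself, which I assume lives in base.

```shell
# Hypothetical environment name; pick whatever you like.
SCRATCH_ENV="scratch"

# Prefer mamba when it is on the PATH, otherwise fall back to conda.
if command -v mamba >/dev/null 2>&1; then
    CONDA_TOOL="mamba"
else
    CONDA_TOOL="conda"
fi

# Create the separate scratch environment.
# Nothing is installed until you call this function.
create_scratch_env() {
    "$CONDA_TOOL" create -n "$SCRATCH_ENV" -c conda-forge \
        scipy numpy pandas matplotlib \
        jupyter jupyterlab \
        scikit-learn ipython ipykernel \
        ipywidgets
}
```

After running create_scratch_env, conda activate scratch should drop you into the new environment, leaving base untouched.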

Installing mamba can be helpful if you want a faster drop-in replacement for conda (see: Use Mamba as a faster drop-in replacement for conda).

Configure your machine

After getting access to your development machine, you'll want to configure it and take full control over how it works. The following steps are backed by a core set of ideas:

  • Give yourself a rich set of commonly needed tooling right from the beginning, but without unnecessary bloat.
  • Standardize your compute environment for maximal portability from computer to computer.
  • Build up automation to get you up and running as fast as possible.
  • Have full control over your system, such that you know as much about your configuration as possible.
  • Squeeze as much productivity out of your UI as possible.

Head over to the following pages to see how you can get things going.

Initial setup

Getting Anaconda Python installed

Master the shell

Further configuration

Advanced Stuff

Configure your conda installation

Why you would want to configure your conda installation

Configuring conda can smooth your interactions with the conda package manager. Primarily, it saves you keystrokes at the terminal, and thus time. The place to do this configuration is the .condarc file, which the conda package manager searches for by default in your user's home directory.

The condarc docs are your best bet for the full configuration, but I have some favourites that I'm more than happy to share below.

How to configure your condarc

First, create a file in your home directory called .condarc. Then edit it to have the following contents:

channels:
  - conda-forge
  - defaults

auto_update_conda: True

always_yes: True

The whys

  • auto_update_conda saves me from having to update conda manually all the time.
  • always_yes lets me always answer y to the conda installation and update prompts.
  • Setting conda-forge as the default channel above the defaults channel allows me to type conda install some_package rather than conda install -c conda-forge some_package each time I want to install a package, as conda will prioritize channels according to their order under the channels section.
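If you would rather script this step than edit the file by hand, here is one possible sketch. The function name `write_condarc` is my own invention, and the refusal to overwrite an existing file is a design choice I am assuming you would want, not something conda requires.

```shell
# Write the settings shown above to a .condarc file.
# The target path is an argument so you can try it on a throwaway file first;
# with no argument it defaults to ~/.condarc.
write_condarc() {
    target="${1:-$HOME/.condarc}"
    # Refuse to clobber an existing configuration.
    if [ -e "$target" ]; then
        echo "$target already exists; leaving it untouched." >&2
        return 1
    fi
    cat > "$target" <<'EOF'
channels:
  - conda-forge
  - defaults

auto_update_conda: True

always_yes: True
EOF
}
```

Calling write_condarc with no arguments targets your home directory. Afterwards, conda config --show channels should list conda-forge ahead of defaults.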

About channel priorities

If you prefer, you can set the channel priorities in a different order and/or expand the list. For example, bioinformatics users may want to add in the bioconda channel, while R users may want to add in the r channel. Users who prefer stability may want to prioritize defaults ahead of conda-forge.

What this affects is how conda will look for packages when you execute the conda install command. However, it doesn't affect the channel priority in your per-project environment.yml file (see: Create one conda environment per project).
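The channel list can also be adjusted from the terminal instead of editing .condarc directly. The wrappers below are a sketch with hypothetical names (`prepend_channel`, `append_channel`, `show_channels`). The underlying conda config flags are real, but I am assuming conda is already on your PATH.

```shell
# Put a channel at the top of the list in .condarc (highest priority).
prepend_channel() {
    conda config --prepend channels "$1"
}

# Put a channel at the bottom of the list (lowest priority).
append_channel() {
    conda config --append channels "$1"
}

# Print the configured channels in priority order.
show_channels() {
    conda config --show channels
}
```

For instance, append_channel bioconda would add bioconda below the existing channels, while prepend_channel defaults should move defaults to the top for those who prefer it ahead of conda-forge.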

Other conda-related pages to look at

Install Anaconda on your machine

What is anaconda

Anaconda is a way to get Python installed on your system.

One of the neat but oftentimes confusing things about Python is that you can have multiple Python executables lying around on your system. Anaconda makes it easy for you to:

  1. Obtain Python
  2. Manage different Python versions in isolated environments using a consistent interface
  3. Install packages into these environments

Why use anaconda (or one of its variants)?

Why is this a good thing? Primarily because you might have individual projects that need different versions of Python and different versions of packages that are built for Python. Also, default Python installations, such as the ones shipped with older versions of macOS, tend to be versions behind the latest, which is to the detriment of your projects. Some built-in apps in an operating system may depend on that old version of Python (such as iPhoto), which means if you mess up the installation, you might break those built-in apps. Hence, you will want a tool that lets you easily create isolated Python environments.

The Anaconda Python distribution fulfills the following key needs:

  1. You'll be able to create isolated environments on a per-project basis. (see: Follow the rule of one-to-one in managing your projects)
  2. You'll be able to install packages into those isolated environments, and evolve them over time. (see: Create one conda environment per project)

Installing Anaconda on your local machine thus helps you get easy access to Python, Jupyter (see: Use Jupyter as an experimentation playground), and other tools for modelling and analysis.

How to get anaconda?

To install the Miniforge variant of Anaconda, which is lighter-weight than the full Anaconda distribution, use the following commands:

cd ~
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" -O anaconda.sh

This will send you to your home directory, and then download the Miniforge bash script installer from the Miniforge releases page on GitHub, saving it as anaconda.sh.
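Not every machine has wget available, so here is a hedged sketch of the same download with a curl fallback. The function names `miniforge_url` and `download_miniforge` are hypothetical; the URL is the one from the command above.

```shell
# Build the installer URL for the current operating system and CPU architecture.
miniforge_url() {
    echo "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
}

# Download the installer using whichever tool is available.
# The destination defaults to ~/anaconda.sh, matching the command above.
download_miniforge() {
    dest="${1:-$HOME/anaconda.sh}"
    if command -v wget >/dev/null 2>&1; then
        wget "$(miniforge_url)" -O "$dest"
    elif command -v curl >/dev/null 2>&1; then
        # -f fails on HTTP errors; -L follows the redirect to the release asset.
        curl -fL "$(miniforge_url)" -o "$dest"
    else
        echo "Neither wget nor curl was found." >&2
        return 1
    fi
}
```

Running miniforge_url on its own prints the URL, which is a cheap way to check that your platform name looks sensible before downloading anything.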

Now, install Anaconda:

bash anaconda.sh -b -p $HOME/anaconda/

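This will install Miniforge (the Anaconda variant downloaded above) into the anaconda directory inside your home directory. The -b flag runs the installer in batch mode without prompts, and -p sets the installation prefix. You can now install packages at will, without needing sudo privileges!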
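As far as I know, batch mode does not touch your shell start-up files, so conda may not be on your PATH in a fresh terminal. One way to handle that, sketched below, is to call conda init from the new installation. The helper name `init_conda` is hypothetical, and defaulting to bash is an assumption; pass a different shell name as the second argument if you use another shell.

```shell
# Hook a conda installation into your shell start-up file.
# The default prefix matches the -p argument used above.
init_conda() {
    prefix="${1:-$HOME/anaconda}"
    conda_bin="$prefix/bin/conda"
    if [ ! -x "$conda_bin" ]; then
        echo "No conda executable at $conda_bin" >&2
        return 1
    fi
    # "bash" is an assumed default; conda init also accepts other shells such as zsh.
    "$conda_bin" init "${2:-bash}"
}
```

After running init_conda, open a new terminal and check that the conda command is found.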

Next steps

Level-up your conda skills

Use Mamba as a faster drop-in replacement for conda

What is mamba

Mamba is a project originally developed by the QuantStack team. They went in and solved some of the annoyances with the conda package manager, specifically the problem of how long it takes to solve an environment specification.

How do you get mamba

Mamba is available on conda-forge, and recent Miniforge installers bundle it in the base environment. Follow the instructions on the mamba repo to install it.

Alias mamba to conda

If you have muscle memory and want to make the switch from conda to mamba as easy as possible, you can use a shell alias inside your sourced .aliases file:

alias conda="mamba"

See the page Create shell command aliases for your commonly used commands for more information on shell aliases.
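If you share one .aliases file across several machines, some of which may not have mamba, a guarded version of the alias is one option. This is only a sketch of the idea:

```shell
# Only alias conda to mamba when mamba is actually installed,
# so the same .aliases file stays safe on machines without it.
if command -v mamba >/dev/null 2>&1; then
    alias conda="mamba"
fi
```

On machines without mamba, the conda command keeps working as before.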