Take full control of your shell environment variables
If you're not sure what environment variables are, I have an essay on them that you can reference. Mastering environment variables is crucial for data scientists!
Your shell environment, whether it is zsh or bash or fish or something else, is supremely important. It determines the runtime environment, which in turn determines which Python you're using, whether you have proxies set correctly, and more. Rather than leave this to chance, I would recommend instead gaining full control over your environment variables.
The simplest way is to set them explicitly in your shell initialization script. For bash shells, that's either `.bashrc` or `.bash_profile`; for the Z shell, it'll be the `.zshrc` file. In there, step by step, set the environment variables that you need system-wide.
For example, explicitly set your `PATH` environment variable, with explainer comments that tell future you why you ordered the `PATH` entries in a certain way:
```shell
# Start with an explicit minimal PATH
export PATH=/bin:/usr/bin:/usr/local/bin
# Add in my custom binaries that I want available across projects
export PATH=$HOME/bin:$PATH
# Add in anaconda installation path
export PATH=$HOME/anaconda/bin:$PATH
# Add more stuff below...
```
If you want your shell initialization script to be cleaner, you can refactor it out into a second bash script called `env_vars.sh`, which lives either inside your home directory or your dotfiles repository (see: Leverage dotfiles to get your machine configured quickly). Then, source the `env_vars.sh` script from the shell initialization script:
```shell
source ~/env_vars.sh
```
Other things, like the Anaconda installer, may offer to modify your shell initializer script for you. If so, be sure to keep this in the back of your mind. At the end of your shell initializer script, you can echo the final state of your environment variables to help you debug.
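As a minimal sketch of that debugging trick, the last lines of the script could look like the following (`DEBUG_SHELL_ENV` is a variable name I'm making up here, not a standard one):

```shell
# At the very end of the shell initialization script:
# when DEBUG_SHELL_ENV is set, print the final PATH and the
# full environment, sorted, to help debug ordering issues.
if [ -n "$DEBUG_SHELL_ENV" ]; then
    echo "PATH is: $PATH"
    env | sort
fi
```

Leaving it guarded behind a variable means your shell stays quiet by default, and you can flip the switch with `DEBUG_SHELL_ENV=1 zsh -l` when something looks off.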
Environment variables that need to be set on a per-project basis are handled slightly differently. See Create runtime environment variable configuration files for each of your projects.
Create runtime environment variable configuration files for each of your projects
When you work on your projects, one assumption you will usually have is that your development environment will look like your project's runtime environment with all of its environment variables. The runtime environment is usually your "production" setting: a web app or API, a model in a pipeline, or a software package that gets distributed. (For more on environment variables, see: Take full control of your shell environment variables)
Here, I'm assuming that you follow the practice described in Use pyprojroot to define relative paths to the project root.
To configure environment variables for your project, a recommended practice is to create a `.env` file in your project's root directory, which stores your environment variables as such:
```shell
export ENV_VAR_1="some_value"
export DATABASE_CONNECTION_STRING="some_database_connection_string"
export ENV_VAR_3="some_other_value"
```
We use the `export` syntax here because we can run the command `source .env` in our shells and have the environment variables defined in there applied to our environment.
Now, if you're working on a Python project, make sure you have the package `python-dotenv` (GitHub repo here) installed in the conda environment. Then, in your Python `.py` source files:
```python
import os

from dotenv import load_dotenv
from pyprojroot import here

dotenv_path = here() / ".env"
load_dotenv(dotenv_path=dotenv_path)  # load the .env file in your project directory root

# Now, get the environment variable.
DATABASE_CONNECTION_STRING = os.getenv("DATABASE_CONNECTION_STRING")
```
In this way, your runtime environment variables get loaded into the runtime environment, and become available to all child processes started from within the shell (e.g. Jupyter Lab, or Python, etc.).
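To convince yourself of this inheritance, here is a small self-contained sketch (the connection string value is made up) showing that a child Python process sees a variable exported in the parent's environment:

```python
import os
import subprocess
import sys

# Simulate a variable exported in the parent shell
# (in practice, `source .env` is what puts it there).
env = dict(os.environ, DATABASE_CONNECTION_STRING="sqlite:///demo.db")

# A child process launched from that environment inherits the variable.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DATABASE_CONNECTION_STRING'])"],
    env=env,
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # sqlite:///demo.db
```

The same mechanism is what makes the variable visible inside Jupyter Lab when you launch it from that shell.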
Your `.env` file might contain sensitive secrets, so you should always ensure that your `.gitignore` file contains `.env` in it.
See also: Set up an awesome default gitignore for your projects
Use Jupyter as an experimentation playground
I use Jupyter notebooks in the following ways.
Firstly, I use them as a prototyping environment. They are wonderful, because I can hold the state of a program in memory and interactively modify it until I get what I need out of the program. (This especially saves on time spent re-computing things.)
Secondly, I use Jupyter as an authoring environment for interactive computational teaching material. For example, I structured Network Analysis Made Simple as a series of Jupyter notebooks.
Finally, on occasion, I use Jupyter with `ipywidgets` and Voila to build out dashboards and interactive applications for my colleagues.
Get Jupyter installed in each of your environments by including it in your `environment.yml` file. (see: Create one conda environment per project)
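A sketch of what the relevant `environment.yml` entries might look like (the environment name and Python version here are placeholders of my own, not a prescription):

```yaml
name: my_project
channels:
  - conda-forge
dependencies:
  - python=3.11
  - jupyterlab
  - ipykernel
```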
Doing so is based on advice I received at SciPy 2016, in which one of the Jupyter developers strongly advised against "global" installations of Jupyter, to avoid package conflicts.
To get Jupyter to recognize the Python interpreter defined by your conda environment (see: Create one conda environment per project), you need to make sure you have `ipykernel` installed inside your environment. Then, use the following commands:
```shell
export ENV_NAME="put_your_environment_name_here"
conda activate $ENV_NAME
python -m ipykernel install --user --name $ENV_NAME
```
Newcomers to Anaconda are usually spoonfed the GUI, but I am a proponent of launching Jupyter from the terminal because doing so makes us fully aware of our environment, including the environment variables. (see the related: Create runtime environment variable configuration files for each of your projects and Take full control of your shell environment variables)
To launch Jupyter:

```shell
jupyter lab
```

In shell terms:

```shell
cd /path/to/project/directory
conda activate $ENV_NAME
jupyter lab
```
Create shell command aliases for your commonly used commands
Shell aliases can save you keystrokes, which save time. That time saved is compound interest over long time horizons!
Shell aliases are easy to create. In your shell initializer script, use the following syntax; as an example, here `ls` is aliased to `exa` with configuration flags at the end:

```shell
alias ls="exa --long"
```
Now, typing `ls` at the shell will instead execute `exa`! (To know what `exa` is, see Install a suite of really cool utilities on your machine using homebrew.)
In order for these shell aliases to take effect each time you open up your shell, you should ensure that they get sourced in your shell initialization script (see: Take full control of your shell environment variables for more information). You have one of two options:

1. Place the aliases directly in your `.zshrc` or `.bashrc` (or analogous) file, or
2. Place them in a separate file such as `~/.aliases`, which you source inside your shell initialization script file (i.e. `.zshrc`/`.bashrc`/etc.)

I recommend the second option, as doing so means you'll be putting into practice the philosophy of having clear categories of things in one place.
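As a concrete sketch of that second option (the example alias is illustrative, and the demo creates the file itself so it's self-contained), your shell initialization script would guard-source the aliases file like this:

```shell
# Keep aliases in a dedicated file, e.g. ~/.aliases.
ALIASES_FILE="$HOME/.aliases"

# For this demo: create an aliases file with one example alias
# if none exists yet.
[ -f "$ALIASES_FILE" ] || printf 'alias ll="ls -la"\n' > "$ALIASES_FILE"

# The lines you would put in .zshrc/.bashrc: source the file
# only if it exists, so a fresh machine doesn't error out.
if [ -f "$ALIASES_FILE" ]; then
    . "$ALIASES_FILE"
fi
```

The existence guard matters: on a brand-new machine where the file hasn't been deployed yet, an unconditional `source` would throw an error on every shell startup.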
In my dotfiles repository, I have a `.shell_aliases` directory which contains a full suite of aliases that I have installed.
Other people showcase their shell alias collections online, and those can serve as inspiration for your personal collection.
And finally, to top it off, Twitter user @ctrlshifti suggests aliasing `please` to `sudo` for a pleasant experience at the terminal:

```shell
alias please="sudo"

# Now you type:
# please apt-get update
# please apt-get upgrade
# etc...
```
Use docker containers for system-level packages
If conda environments are such a great environment isolation tool, why would we need Docker?
That's because sometimes, your project might have an unavoidable dependency on system-level packages. I have seen some projects that use spatial mapping tooling require system-level packages, and others that depend on audio processing require packages that can only be obtained outside of `conda`. In these cases, yes, installing them locally on your machine can be handy (see Install homebrew on your machine), but if you're also interested in building an app, then you'll need them packaged up inside a Docker container.
What is a Docker container? The best way to think about it is as a fully-fledged operating system completely insulated from its host (i.e. your computer). It has no knowledge of your runtime environment variables (see: Create runtime environment variable configuration files for each of your projects and Take full control of your shell environment variables). It's like having a completely clean operating system, without the cost of buying new hardware.
I'm assuming you've already obtained Docker on your system. (see: Install Docker on your machine).
The core thing you need to know how to write is a `Dockerfile`. This file specifies exactly how a Docker container is to be built. The easiest way to think about `Dockerfile` syntax is that it's almost bash, with a bit of additional syntax. The Docker docs give an extremely thorough tutorial. For those who are more hands-on, I recommend pair coding with a more experienced individual who is willing to teach you the ropes, building a Docker container when it becomes relevant to your problem.
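To make the "almost bash" point concrete, here is an illustrative sketch of a `Dockerfile` (the base image, system package, and file names are assumptions of mine, not a prescription for any particular project):

```dockerfile
# Start from a slim Debian-based Python image.
FROM python:3.11-slim

# Install a system-level package (here, an audio library)
# that conda/pip alone cannot provide.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

# Copy the project in and install its Python dependencies.
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt

# Default command when the container runs.
CMD ["python", "app.py"]
```

Notice how the `RUN` lines are literally shell commands; the Docker-specific keywords (`FROM`, `COPY`, `WORKDIR`, `CMD`) are the "bit of additional syntax".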
Leverage dotfiles to get your machine configured quickly
Your dotfiles control the baseline of your computing environment. Creating a dotfiles repository lets you version control it, back it up on a hosted version control site (like GitHub or Bitbucket), and quickly deploy it to a new system.
It's really up to you, but you want to make sure that you capture all of the `.some_file_extension` files stored in your home directory that are also important for your shell runtime environment. For example, you might want to include your `.zshrc` or your `.bashrc` files, i.e. the shell initialization scripts.
You might also want to refactor out some pieces from the `.zshrc` and put them into separate files that get sourced inside it. For example, I have two: one for the `PATH` environment variable named `.path` (see: Take full control of your shell environment variables), and one for aliases named `.aliases` (see: Create shell command aliases for your commonly used commands). I source these files in the `.zshrc` file, so everything defined in `.path` and `.aliases` is available to me.
You can also create an `install.sh` script that, when executed at the shell, symlinks or copies all the files from the dotfiles directory into the home directory. (I usually opt to symlink because I can apply updates more easily.) The `install.sh` script can be as simple as:
```shell
cp .zshrc $HOME/.zshrc
cp .path $HOME/.path
cp .aliases $HOME/.aliases
```
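Since I prefer symlinking, here is a sketch of the symlink-based variant (the file list is illustrative, and I'm assuming the script is run from inside the dotfiles directory):

```shell
# Symlink each dotfile from the dotfiles directory (assumed to be
# the current directory) into $HOME. The -sf flags overwrite any
# stale link, so re-running the script is safe.
for file in .zshrc .path .aliases; do
    ln -sf "$PWD/$file" "$HOME/$file"
done
```

With symlinks, editing a file in the repository immediately updates what the shell reads, so there is no separate "deploy" step after the first run.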
Everything outlined above forms the basis of your bootstrap for a new computer, which I alluded to in Automate the bootstrapping of your new computer.
If you want to see a few examples of dotfiles in action, check out the following repositories and pages:
- The official "dotfiles" GitHub pages
- My own dotfiles: ericmjl/dotfiles, which are inspired by mathiasbynens/dotfiles
Configure your machine
After getting access to your development machine, you'll want to configure it and take full control over how it works. Backing the following steps is a core set of ideas.
Head over to the following pages to see how you can get things going.