Data Science Bootstrap

Create shell command aliases for your commonly used commands

Why create shell aliases

Shell aliases save you keystrokes, which saves time. That saved time compounds like interest over long time horizons!

How do I create aliases?

Shell aliases are easy to create. In your shell initializer script, use the following syntax. In this example, ls is aliased to exa, with configuration flags at the end:

alias ls="exa --long"

Now, typing ls at the shell will instead execute exa! (To learn what exa is, see Install a suite of really cool utilities on your machine using homebrew.)
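As a minimal sketch of how you might define an alias, inspect it, and bypass it, consider the following. The alias name ll is hypothetical and chosen only for illustration; it uses plain ls so that the example does not depend on exa being installed.

```shell
# Non-interactive bash scripts ignore aliases unless this option is set;
# interactive shells expand aliases by default. Other shells skip this line.
shopt -s expand_aliases 2>/dev/null || true

# Hypothetical alias: `ll` is an illustrative name, not from the original text.
alias ll="ls -l -h"

# `alias NAME` prints the current definition of that alias.
alias ll

# A leading backslash bypasses any alias and runs the original command.
\ls > /dev/null
```

Typing the alias line at the prompt only affects the current session; placing it in your shell initialization script makes it available in every new shell.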

Where do I store these aliases?

In order for these shell aliases to take effect each time you open up your shell, you should ensure that they get sourced in your shell initialization script (see: Take full control of your shell environment variables for more information). You have two options:

These aliases can be declared in your .zshrc or .bashrc (or analogous) file, or
They can be declared in ~/.aliases, which you source inside your shell initialization script file (i.e. .zshrc/.bashrc/etc.)

I recommend the second option, as doing so puts into practice the philosophy of keeping clear categories of things in one place.
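The second option could be sketched as follows, for lines placed inside .zshrc or .bashrc. The helper name source_if_exists is hypothetical; the guard simply avoids an error on machines where ~/.aliases has not been created yet.

```shell
# Hypothetical helper: source a file only if it exists.
source_if_exists() {
    if [ -f "$1" ]; then
        # `.` is the portable spelling of `source`.
        . "$1"
    fi
}

# Load the aliases file from the home directory, if present.
source_if_exists "$HOME/.aliases"
```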

What are some aliases that could be useful?

In my dotfiles repository, I have a .shell_aliases directory which contains a full suite of aliases that I have installed.

Other external links that showcase shell aliases that could serve as inspiration for your personal collection include:

And finally, to top it off, Twitter user @ctrlshifti suggests aliasing please to sudo for a pleasant experience at the terminal:

alias please="sudo"

# Now you type:
# please apt-get update
# please apt-get upgrade
# etc...

Leverage dotfiles to get your machine configured quickly

Why create a dotfiles repository

Your dotfiles control the baseline of your computing environment. Creating a dotfiles repository lets you version control it, back it up on a hosted version control site (like GitHub or Bitbucket), and quickly deploy it to a new system.

How do you structure a dotfiles repository

It's really up to you, but you want to make sure that you capture all of the dotfiles (files whose names start with a dot, such as .zshrc) stored in your home directory that are important for your shell runtime environment.

For example, you might want to include your .zshrc or your .bashrc files, i.e. the shell initialization scripts.

You might also want to refactor out some pieces from the .zshrc and put them into separate files that get sourced inside those files. For example, I have two, one for the PATH environment variable named .path (see: Take full control of your shell environment variables) and one for aliases named .aliases (see: Create shell command aliases for your commonly used commands). You can source these files in the .zshrc file, so I have everything defined in .path and .aliases available to me.

You can also create an install.sh script that, when executed at the shell, symlinks all the files from the dotfiles directory into the home directory or copies them. (I usually opt to symlink because I can apply updates more easily.) A copying version of the install.sh script can be as simple as:

cp .zshrc "$HOME/.zshrc"
cp .path "$HOME/.path"
cp .aliases "$HOME/.aliases"
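For the symlinking approach, a minimal sketch might look like the following. The function name link_dotfiles and its two arguments are hypothetical and not taken from any actual install script; the file list mirrors the three files in the copy example.

```shell
# Hypothetical helper: symlink selected dotfiles from a source directory
# into a target directory (normally the dotfiles repo and $HOME).
link_dotfiles() {
    src_dir="$1"
    dst_dir="$2"
    for name in .zshrc .path .aliases; do
        if [ -e "$src_dir/$name" ]; then
            # -s makes a symbolic link; -f replaces an existing file or link.
            ln -sf "$src_dir/$name" "$dst_dir/$name"
        else
            echo "skipping $name: not found in $src_dir"
        fi
    done
}

# Example call from the root of the dotfiles repository (left commented out
# so that pasting this sketch does not touch your home directory):
# link_dotfiles "$PWD" "$HOME"
```

The source directory should be an absolute path (as $PWD is), so that the links still resolve when they are read from the home directory.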

Everything outlined above forms the basis of your bootstrap for a new computer, which I alluded to in Automate the bootstrapping of your new computer.

If you want to see a few examples of dotfiles in action, check out the following repositories and pages:

From the official "dotfiles" GitHub pages:

My own dotfiles: ericmjl/dotfiles which are inspired by mathiasbynens/dotfiles

Take full control of your shell environment variables

Why control your environment variables

If you're not sure what environment variables are, I have an essay on them that you can reference. Mastering environment variables is crucial for data scientists!

Your shell environment, whether it is zsh or bash or fish or something else, is supremely important. It determines the runtime environment, which in turn determines which Python you're using, whether you have proxies set correctly, and more. Rather than leave this to chance, I would recommend instead gaining full control over your environment variables.

How do I control my environment variables

The simplest way is to set them explicitly in your shell initialization script. For bash shells, it's either .bashrc or .bash_profile. For the Z shell, it'll be the .zshrc file. In there, step by step, set the environment variables that you need system-wide.

For example, explicitly set your PATH environment variable with explainers that tell future you why you ordered the PATH in a certain way.

# Start with an explicit minimal PATH
export PATH=/bin:/usr/bin:/usr/local/bin

# Add in my custom binaries that I want available across projects
export PATH=$HOME/bin:$PATH

# Add in anaconda installation path
export PATH=$HOME/anaconda/bin:$PATH

# Add more stuff below...
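If your initialization script gets sourced more than once, plain export PATH=...:$PATH lines will add the same directory repeatedly. One possible way to guard against that is sketched below; the helper name path_prepend is hypothetical.

```shell
# Hypothetical helper: prepend a directory to PATH only if it exists
# and is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;    # already present: leave PATH unchanged
    esac
    if [ -d "$1" ]; then
        PATH="$1:$PATH"
    fi
}

# Same intent as the custom-binaries line above, but safe to run repeatedly.
path_prepend "$HOME/bin"
export PATH
```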

If you want your shell initialization script to be cleaner, you can refactor the environment variable settings out into a second bash script called env_vars.sh, which lives either inside your home directory or your dotfiles repository (see: Leverage dotfiles to get your machine configured quickly). Then, source the env_vars.sh script from the shell initialization script:

source ~/env_vars.sh

Other tools, like the Anaconda installer, may offer to modify your shell initializer script for you. If so, keep in mind that their changes can affect the environment variables you set yourself. At the end of your shell initializer script, you can echo the final state of environment variables to help you debug.
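A minimal sketch of such a debugging step is shown below. The helper name split_path is hypothetical; it prints one PATH entry per line so that the lookup order is easy to read.

```shell
# Hypothetical debugging helper: print each entry of a PATH-style string
# on its own line. Defaults to the current PATH when no argument is given.
split_path() {
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n'
}

# Show the final PATH, in lookup order.
split_path

# Show which python executable that PATH resolves to, if any.
command -v python || echo "python not found on PATH"
```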

Environment variables that need to be set on a per-project basis are handled slightly differently. See Create runtime environment variable configuration files for each of your projects.

Use Mamba as a faster drop-in replacement for conda

What is mamba

Mamba is a project originally developed by the QuantStack team. They went in and solved some of the annoyances with the conda package manager, specifically the problem of how long it takes to solve an environment specification.

How do you get mamba

Mamba is available on conda-forge and PyPI. Follow the instructions on the mamba repo to install it.

Alias mamba to conda

If you have muscle memory and want to make the switch from conda to mamba as easy as possible, you can use a shell alias inside your sourced .aliases file:

alias conda="mamba"

See the page Create shell command aliases for your commonly used commands for more information on shell aliases.
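If the same .aliases file is shared across machines where mamba may not be installed, a guarded variant could look like the sketch below. This is one possible approach, not a requirement of mamba itself.

```shell
# Only redirect conda to mamba when a mamba executable is actually available.
if command -v mamba >/dev/null 2>&1; then
    alias conda="mamba"
fi
```

On machines without mamba, conda keeps working as usual because the alias is never defined.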

Use bash tricks to help save keystrokes and time

There are some bash tricks that can be incredibly helpful. Here's a collection of those that I have encountered.

Use `||` for fallback commands

An example:

source my_env/bin/activate || conda activate my_env || source activate my_env

Where did this come up? In my continuous integration pipelines, I try to maintain the same syntax between pipelines (e.g. GitHub Actions and Azure Pipelines). However, as of 2020, Azure Pipelines doesn't play well with conda activate, and requires that I use source activate. As such, in order to use the same bash scripts that need to activate an environment, I used the bash || syntax to create a fallback command for the conda activate command. If the conda activate command fails, the source activate command will be executed.

The commands are executed in order from left to right. One neat thing is that the chain stops as soon as one of the commands succeeds and returns an exit code of 0, which by shell convention signifies "success". If all three fail, there will be a non-zero exit code, which, depending on how your system is configured, should terminate further execution.
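The exit code behaviour can be illustrated with a small sketch. Here the built-in commands false and true stand in for real commands that fail or succeed, and the variable names chain_one and chain_two are hypothetical.

```shell
# `false` always fails and `true` always succeeds.
if false || false || true; then
    chain_one="succeeded"      # reached: the third command succeeded
else
    chain_one="failed"
fi

if false || false || false; then
    chain_two="succeeded"
else
    chain_two="failed"         # reached: every command in the chain failed
fi

echo "first chain: $chain_one, second chain: $chain_two"
```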

Other tricks described in the bootstrap

Create shell command aliases for your commonly used commands

Install a suite of really cool utilities on your machine using homebrew

What utilities are recommended?

gcc

Install gcc if you want to have the GNU C compiler available on your Mac at the same time as the clang C compiler.

C compilers come in handy for some numerical computing packages, which multiple data science languages (Julia, Python, R) depend on.

mobile-shell

If you've ever been disconnected from SSH because of a flaky internet connection, mosh can be your saviour. Check out the tool's homepage.

tmux

This is a tool for multiplexing your shell sessions -- uber handy if you want to persist a shell session on a remote server even after you disconnect. If you're of the type who has a habit of creating new shell sessions for every project, then tmux might be able to help you get things under control. Check out the tool's homepage.

tree

The tree command line tool allows you to see the file tree at the terminal. If you pair it with exa, you will have an upgraded file tree experience. See its homepage.

exa

exa is next-level ls (which is used to list files in a directory). The website describes it as "a modern replacement for ls". See the homepage. If you alias ls to exa, it's next-level convenience! (see Create shell command aliases for your commonly used commands)

ripgrep

ripgrep provides a command line tool rg, which recursively scans down the file tree from the current directory for files that contain the text you want to search for. Its GitHub repo should reveal all of its secrets.

diff-so-fancy

This gives you a tool for viewing differences between files, aka "diffs". Check out its GitHub repo for more information. You can also configure git to use diff-so-fancy to render diffs at the terminal. (see: Install and configure git on your machine)

bat

bat is a next-level version of cat, the utility for viewing text files in the terminal. Check out the GitHub repository for what you get. You can alias cat to bat, and in that way, not need to deviate from muscle memory to use bat.

fd

fd gives you a faster replacement for the shell tool find, which you can use to find files by name. Check out the GitHub repository to learn more about it.

fzf

On recommendation from my colleague Arkadij Kummer, grab fzf to have extremely fast fuzzy text search on the filesystem. Check out the project's GitHub repository!

croc

Use croc as a tool to move data from one machine to another easily in a secure fashion. (I have used this in lieu of commercial utilities that cost tens of thousands of dollars in license fees.) Check out the project's GitHub repository!

Install these really cool utilities

Now that you've read about these utilities' reasons for existence, go ahead and install them!

brew install \
	git gcc tmux wget mobile-shell \
	diff-so-fancy ripgrep bat fd fzf croc
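After installing, you may want to confirm that the commands are actually on your PATH. The sketch below shows one way to do that; the helper name check_tools is hypothetical.

```shell
# Hypothetical helper: report which of the given commands are on the PATH.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Note: some formulae install a binary with a different name
# (mobile-shell provides mosh, ripgrep provides rg).
check_tools git gcc tmux wget mosh diff-so-fancy rg bat fd fzf croc
```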
