Data Science Bootstrap

Automate the bootstrapping of your new computer

Why automate your configuration

By automating your shell's configuration, you save time whenever you get access to a new computer. That is the primary value proposition of automation! No more spending 2-3 hours setting things up. Instead, simply type ./install.sh at the terminal!

How to create a bootstrap script

The way I recommend doing this is to create a dotfiles repository (see: Leverage dotfiles to get your machine configured quickly). Place every file needed for shell initialization inside it: primarily the .zshrc or .bashrc/.bash_profile files, plus any other files on which you depend.

Then, create the main script install.sh, which you execute from within the dotfiles repository, and have it perform all of the necessary actions to place the right files in the right place. (Or symlink them from the dotfiles repository to the correct places.)
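
As a rough illustration, the symlinking approach might look like the sketch below. The function name link_dotfiles, the file list, and the .bak backup convention are all assumptions made for this example; your own install.sh can be organized however you like.

```shell
#!/bin/sh
# install.sh: a minimal sketch, not a complete installer.

# link_dotfiles REPO_DIR TARGET_DIR
# Symlinks each listed file from REPO_DIR into TARGET_DIR.
# The body runs in a subshell, so its variables stay local.
link_dotfiles() (
    repo_dir=$1
    target_dir=$2
    # Hypothetical file list; edit it to match your own repository.
    for name in .zshrc .bashrc .bash_profile .aliases; do
        src="$repo_dir/$name"
        dest="$target_dir/$name"
        # Skip files that the repository does not contain.
        [ -e "$src" ] || continue
        # Back up a real file that is already in place.
        if [ -e "$dest" ] && [ ! -L "$dest" ]; then
            mv "$dest" "$dest.bak"
        fi
        # -s makes a symbolic link; -f replaces an existing link.
        ln -sf "$src" "$dest"
        echo "linked $dest -> $src"
    done
)
```

To apply it for real, you could end install.sh with the line link_dotfiles "$PWD" "$HOME" and run ./install.sh from within the dotfiles repository. Trying it against a scratch directory first is a cheap way to check that it does what you expect.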

Keep in mind, there's no "perfect" setup except for the one that matches your brain. (We are, after all, talking about setting up your own computer.) Sophistication is also not a prerequisite. All you need is to guarantee that your setup ends up working the way you'd like. If you want, you can use my dotfiles as a starting point, but I would strongly suggest that you customize them to your own needs!

Create shell command aliases for your commonly used commands

Why create shell aliases

Shell aliases can save you keystrokes, which saves time. That saved time compounds like interest over long time horizons!

How do I create aliases?

Shell aliases are easy to create. In your shell initializer script, use the following syntax. As an example, here is ls aliased to exa, with configuration flags at the end:

alias ls="exa --long"

Now, typing ls at the shell will instead execute exa --long! (To learn what exa is, see Install a suite of really cool utilities on your machine using homebrew.)
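
One wrinkle: on a machine where exa is not installed yet, this alias would leave ls broken. A possible guard, offered here as a suggestion rather than a requirement, defines the alias only when the exa command can be found:

```shell
# Alias ls to exa only when exa is actually installed,
# so that ls keeps working on machines that lack it.
if command -v exa >/dev/null 2>&1; then
    alias ls="exa --long"
fi
```

The command -v check is a standard shell builtin, so the same pattern should work for any other alias that depends on an optional tool.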

Where do I store these aliases?

For these shell aliases to take effect each time you open up your shell, you should ensure that they get sourced in your shell initialization script (see: Take full control of your shell environment variables for more information). You have two options:

These aliases can be declared in your .zshrc or .bashrc (or analogous) file, or
They can be declared in ~/.aliases, which you source inside your shell initialization script file (i.e. .zshrc/.bashrc/etc.)

I recommend the second option, because it puts into practice the philosophy of keeping each category of things in one clear place.
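
As a minimal sketch of the second option, the lines below could go into your shell initialization script. The helper name load_if_present is made up for this example; the existence check simply avoids an error on a machine where ~/.aliases has not been created yet.

```shell
# load_if_present FILE: source FILE when it exists, and do nothing otherwise.
# (Hypothetical helper name; any name works.)
load_if_present() {
    if [ -f "$1" ]; then
        # "." is the portable spelling of "source".
        . "$1"
    fi
}

# Pull in the aliases kept in their own file.
load_if_present "$HOME/.aliases"
```

The same helper can be reused for any other file you split out of your initialization script.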

What are some aliases that could be useful?

In my dotfiles repository, I have a .shell_aliases directory which contains a full suite of aliases that I have installed.

Other external links that showcase shell aliases that could serve as inspiration for your personal collection include:

And finally, to top it off, Twitter user @ctrlshifti suggests aliasing please to sudo for a pleasant experience at the terminal:

alias please="sudo"

# Now you type:
# please apt-get update
# please apt-get upgrade
# etc...

Configure your machine

After getting access to your development machine, you'll want to configure it and take full control over how it works. A core set of ideas backs the following steps:

Give yourself a rich set of commonly needed tools right from the beginning, but without unnecessary bloat.
Standardize your compute environment for maximal portability from computer to computer.
Build up automation to get you up and running as fast as possible.
Have full control over your system, such that you know as much about your configuration as possible.
Squeeze as much productivity out of your UI as possible.

Head over to the following pages to see how you can get things going.

Initial setup

Install homebrew on your machine (for macOS users)
Install a suite of really cool utilities on your machine using homebrew

Getting Anaconda Python installed

Master the shell

Further configuration

Advanced Stuff

Automate the bootstrapping of your new computer

Take full control of your shell environment variables

Why control your environment variables

If you're not sure what environment variables are, I have an essay on them that you can reference. Mastering environment variables is crucial for data scientists!

Your shell environment, whether it is zsh or bash or fish or something else, is supremely important. It determines the runtime environment, which in turn determines which Python you're using, whether you have proxies set correctly, and more. Rather than leave this to chance, I would recommend instead gaining full control over your environment variables.

How do I control my environment variables

The simplest way is to set them explicitly in your shell initialization script. For bash shells, it's either .bashrc or .bash_profile. For the Z shell, it'll be the .zshrc file. In there, step by step, set the environment variables that you need system-wide.

For example, explicitly set your PATH environment variable, with comments that tell your future self why you ordered the PATH in a certain way.

# Start with an explicit minimal PATH
export PATH=/bin:/usr/bin:/usr/local/bin

# Add in my custom binaries that I want available across projects
export PATH=$HOME/bin:$PATH

# Add in anaconda installation path
export PATH=$HOME/anaconda/bin:$PATH

# Add more stuff below...

If you want your shell initialization script to be cleaner, you can refactor it out into a second bash script called env_vars.sh, which lives either inside your home directory or your dotfiles repository (see: Leverage dotfiles to get your machine configured quickly). Then, source the env_vars.sh script from the shell initialization script:

source ~/env_vars.sh

Other tools, like the Anaconda installer, may offer to modify your shell initializer script for you. If so, keep in mind that those changes sit alongside the variables you set yourself. At the end of your shell initializer script, you can echo the final state of your environment variables to help you debug.
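
One possible way to do that final echo is sketched below. The function name print_env_summary is hypothetical, and which variables you print is up to you. The check on $- is an extra precaution, on the assumption that you only want this output in interactive sessions; text printed by an initialization script in non-interactive sessions can confuse tools such as scp.

```shell
# print_env_summary: show the settings that most often cause surprises.
# (Hypothetical helper name; extend it with whatever you need to inspect.)
print_env_summary() {
    echo "PATH entries, in lookup order:"
    # Split PATH on ":" so that each directory lands on its own line.
    printf '%s\n' "$PATH" | tr ':' '\n'
    echo "python resolves to: $(command -v python || echo 'not found')"
}

# $- contains "i" only when the shell is interactive.
case $- in
    *i*) print_env_summary ;;
esac
```

Reading the PATH entries top to bottom shows the order in which the shell looks for commands, which makes it easier to spot an installer that has quietly put its own directory first.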

Environment variables that need to be set on a per-project basis are handled slightly differently. See Create runtime environment variable configuration files for each of your projects.

Why create a dotfiles repository

How do you structure a dotfiles repository
