Configure your shell
As a data scientist, odds are that you are going to be working in a terminal shell. It will absolutely pay off to invest time in making sure your shell is configured for your maximal productivity.
Install Starship
The default or vanilla prompt that ships with most machines is highly uninformative. Yours may look like any one of the following:
%
#
$
This really doesn't tell you much! And yet, the potential of all of that blank space on your terminal screen is immense! At a glance, it's super helpful to know things like:
- Where am I in the file tree?
- What is my current git working branch?
- Which environment is currently activated?
In principle, these are all displayable at your shell prompt, putting this information at your fingertips. Starship gives you an easy way to make this available and is installable by a one-line shell command:
curl -sS https://starship.rs/install.sh | sh
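The installer places the starship binary on your machine; the prompt itself is switched on by an init line in your shell initialization script. Here is a minimal sketch of a guarded version of that line, assuming you use zsh or bash (the variable name current_shell is my own, chosen for illustration):

```shell
# Work out which shell is running this file (only zsh and bash are handled here).
if [ -n "${ZSH_VERSION:-}" ]; then
    current_shell="zsh"
elif [ -n "${BASH_VERSION:-}" ]; then
    current_shell="bash"
else
    current_shell=""
fi

# Initialize Starship only if the binary is actually on PATH,
# so a missing install never breaks shell startup.
if [ -n "$current_shell" ] && command -v starship >/dev/null 2>&1; then
    eval "$(starship init "$current_shell")"
fi
```

Placed near the end of ~/.zshrc or ~/.bashrc, this does nothing on machines where Starship is absent, which is handy if you share one config across several machines.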
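Once Starship is installed and its init line (for zsh, eval "$(starship init zsh)" in ~/.zshrc) is in your shell initialization script, you'll get a prompt that looks something like mine: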
data-science-bootstrap-notes on second-edition-rewrite [$!?] is 📦 v1.0.0 via 🐍 v3.12.4
❯
At a glance, it tells me that:
- the Python on my PATH is version 3.12.4 (🐍 v3.12.4),
- that this book is currently at version 1.0 (📦 v1.0.0),
- and that I'm on the branch second-edition-rewrite
- with uncommitted changes ([$!?])
- on this book's cloned repository (data-science-bootstrap-notes).
Configure environment variables
If you're not sure what environment variables are, I have an essay on them that you can reference. Mastering environment variables is crucial for data scientists!
Your shell environment, whether it is zsh or bash or fish or something else, is supremely important. It determines the runtime environment, which in turn determines which Python you're using, whether you have proxies set correctly, and more. Rather than leave this to chance, I would recommend instead gaining full control over your environment variables.
The simplest way is to set them explicitly in your shell initialization script. For bash shells, it's either .bashrc or .bash_profile. For the Z shell, it'll be the .zshrc file. In there, step by step, set the environment variables that you need system-wide.
For example, explicitly set your PATH environment variable with explainers that tell future you why you ordered the PATH in a certain way.
# Start with an explicit minimal PATH
export PATH=/bin:/usr/bin:/usr/local/bin
# Add in my custom binaries that I want available across projects
export PATH=$HOME/bin:$PATH
# Add in the pixi installation path
export PATH=$HOME/.pixi/bin:$PATH
# Add more stuff below to your heart's content...
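Because initialization scripts often get sourced more than once, the same directory can end up in PATH several times. One way to avoid that is a small helper that prepends a directory only when it is not already listed. This is a sketch under my own naming; path_prepend is a hypothetical helper, not a shell builtin:

```shell
# path_prepend DIR: put DIR at the front of PATH unless PATH already contains it.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # otherwise prepend
    esac
}

path_prepend "$HOME/bin"
path_prepend "$HOME/bin"   # second call is a no-op
export PATH
```

Wrapping PATH in colons before matching means the check compares whole entries, so /usr/bin does not falsely match /usr/bin/extras.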
If you want your shell initialization script to be cleaner, you can refactor these settings out into a second bash script called env_vars.sh, which lives either inside your home directory or your dotfiles repository [1]. Then, source the env_vars.sh script from the shell initialization script:
source ~/env_vars.sh
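If env_vars.sh might be missing on some machines, sourcing it unconditionally prints an error every time a shell starts. A defensive variant, sketched here with a hypothetical helper named source_if_exists, checks for the file first:

```shell
# source_if_exists FILE: source FILE only if it is a regular file.
source_if_exists() {
    if [ -f "$1" ]; then
        . "$1"
    fi
}

source_if_exists "$HOME/env_vars.sh"
```

The dot command (.) is the portable spelling of source, so this works the same in bash and zsh.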
Other programs, such as the pixi installer, may offer to modify your shell initialization script for you. If so, keep this in the back of your mind when debugging. You can always echo the final state of environment variables to help you debug:
env
And the most important one to look out for is the PATH variable:
data-science-bootstrap-notes on second-edition-rewrite [$!] is 📦 v1.0.0 via 🐍 v3.12.4 on ☁️ ericmajinglong@gmail.com
❯ echo $PATH | tr ':' '\n'
/opt/homebrew/bin
/Users/ericmjl/bin
/Applications/quarto/bin
/opt/homebrew/bin
/opt/homebrew/sbin
/usr/local/bin
/System/Cryptexes/App/usr/bin
/usr/bin
/bin
/usr/sbin
/sbin
This will give you a list of all the directories in your PATH variable, in order of priority.
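Order matters because the shell runs the first match it finds. To see every candidate for a command name rather than only the winner, you can walk PATH yourself. A minimal sketch, with all_matches as a hypothetical helper name (bash and zsh also offer type -a for a similar listing):

```shell
# all_matches CMD: list every executable file named CMD on PATH,
# highest priority first.
all_matches() {
    printf '%s\n' "$PATH" | tr ':' '\n' | while IFS= read -r dir; do
        if [ -n "$dir" ] && [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
        fi
    done
}

all_matches sh
```

If a command such as python shows up in more than one directory, the first line printed is the one your shell will actually run.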
Global vs. Project-Specific Environment Variables
Your shell environment variables are considered "global" environment variables that are set across every terminal session that you're logged into. On the other hand, there are "project-specific" environment variables which are set on a per-project basis. These should be set within a project-specific .env file that lives within a code repository.
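Many tools read a .env file on their own, but sometimes you want its contents in your interactive shell as well. One way to do that, assuming the file holds plain shell-compatible KEY=value lines, is to source it with auto-export turned on. The helper name load_dotenv is hypothetical:

```shell
# load_dotenv FILE: export every KEY=value assignment found in FILE.
# Assumes FILE contains only shell-compatible assignments and comments.
load_dotenv() {
    if [ -f "$1" ]; then
        set -a      # auto-export each variable assigned while sourcing
        . "$1"
        set +a      # restore normal behaviour
    fi
}

# Run from the project root; does nothing if there is no .env file.
load_dotenv ./.env
```

Since the file is executed as shell code, only use this on .env files you trust.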
Create shell aliases
Shell aliases can save you keystrokes, which in turn saves you time. That saved time compounds like interest over long time horizons!
Shell aliases are easy to create. In your shell initializer script, use the following syntax. Here, as an example, ls is aliased to exa with configuration flags at the end:
alias ls="exa --long"
Now, typing ls at the shell will instead execute exa! (exa is one of the system-level tools that I recommend installing.)
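One caveat: if exa is not installed on a given machine, this alias makes ls fail there. A guarded form of the same alias, sketched below, only takes effect when the exa command can be found:

```shell
# Alias ls to exa only when exa is installed; otherwise keep the stock ls.
if command -v exa >/dev/null 2>&1; then
    alias ls="exa --long"
fi
```

The same pattern works for any alias that points at a tool you may not have everywhere.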
In order for these shell aliases to take effect each time you open up your shell, you should ensure that they get sourced in your shell initialization script such as ~/.bashrc or ~/.zshrc. You have two options:
- These aliases can be declared in your .zshrc or .bashrc (or analogous) file, or
- They can be declared in ~/.aliases, which you source inside your shell initialization script file (i.e. .zshrc/.bashrc/etc.)
The latter is done using:
# put this line in your ~/.bashrc
source /path/to/.aliases
And the contents of your .aliases file are exactly what I showed above.
Of the two options above, I recommend the second: it puts into practice the philosophy of keeping each category of thing in one clearly defined place.
In my dotfiles repository, I have a .shell_aliases directory which contains a full suite of aliases that I have installed. Some of my most commonly used ones are for git commands, such as:
# either your ~/.bashrc or ~/.aliases file that gets sourced in ~/.bashrc or ~/.zshrc
alias gc="git commit"
alias ga="git add"
alias gs="git status"
alias gk="git checkout"
alias gm="git merge"
alias gpl="git pull"
alias gps="git push"
alias gacp="git add . && git commit && git push"
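If you keep aliases in a directory of files, like the .shell_aliases directory mentioned above, your initialization script needs to source each file in it. The sketch below is one way to do that and is not necessarily how my own dotfiles do it; source_dir is a hypothetical helper name and the directory path is an assumption:

```shell
# source_dir DIR: source every regular, non-hidden file in DIR.
# Note: in zsh, an empty DIR raises "no matches found" unless NULL_GLOB is set.
source_dir() {
    if [ -d "$1" ]; then
        for alias_file in "$1"/*; do
            if [ -f "$alias_file" ]; then
                . "$alias_file"
            fi
        done
    fi
}

# Assumed location; adjust to wherever your alias files live.
source_dir "$HOME/.shell_aliases"
```

Splitting aliases into files such as one for git and one for navigation keeps each file short and easy to scan.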
Other external links that showcase shell aliases that could serve as inspiration for your personal collection include:
- Bash aliases you can't live without
- 10 handy Bash aliases for Linux
- vikaskyadav/awesome-bash-alias
- 30 Handy Bash Shell Aliases For Linux / Unix / MacOS
- A developer's way of using shell aliases
And finally, to top it off, Twitter user @ctrlshifti suggests aliasing please to sudo for a pleasant experience at the terminal:
alias please="sudo"
# Now you type:
# please apt-get update
# please apt-get upgrade
# etc...
Troubleshooting
Starship prompt not showing
If you still see % or $ after installation:
- Check your shell configuration:
    echo $SHELL
  If it shows /bin/zsh, make sure you added the Starship init to ~/.zshrc
- Verify the init line is correct:
    grep -n "starship" ~/.zshrc
  Should show: eval "$(starship init zsh)"
- Restart your terminal completely (don't just source the file)
- Check if Starship is in your PATH:
    which starship
  Should show something like /usr/local/bin/starship or /home/user/.local/bin/starship
Environment variables not working
If your environment variables aren't being set:
- Check if your shell config file is being sourced:
    echo $0
  Should show your shell (e.g., -zsh or -bash)
- Verify your config file exists:
    ls -la ~/.zshrc # or ~/.bashrc
- Test sourcing manually:
    source ~/.zshrc # or ~/.bashrc
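Another common cause is a variable that is assigned in your config file but never exported: your shell can see it, yet programs launched from the shell cannot. The sketch below shows the difference using a made-up variable name, DEMO_VAR:

```shell
unset DEMO_VAR                 # start from a clean slate
DEMO_VAR="hello"               # plain shell variable: not passed to child processes
sh -c 'echo "child sees: ${DEMO_VAR:-<unset>}"'   # prints: child sees: <unset>

export DEMO_VAR                # now part of the environment
sh -c 'echo "child sees: ${DEMO_VAR:-<unset>}"'   # prints: child sees: hello
```

If a tool ignores a variable you are sure you set, check that the line in your config file starts with export.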
PATH issues
If commands aren't found:
- Check your current PATH:
    echo $PATH | tr ':' '\n'
- Verify the directory exists:
    ls -la $HOME/.pixi/bin
- Add to PATH manually to test:
    export PATH=$HOME/.pixi/bin:$PATH
Still not working?
Try the manual installation method from the Starship documentation.
Quick Reference
Essential Commands
# Check your shell
echo $SHELL
# Check your PATH
echo $PATH | tr ':' '\n'
# Reload shell configuration
source ~/.zshrc # or ~/.bashrc
# Check if a command exists
which command_name
# View a specific environment variable (plain env lists them all)
env | grep VARIABLE_NAME
Common Aliases to Add
# Git shortcuts
alias gc="git commit"
alias ga="git add"
alias gs="git status"
alias gk="git checkout"
# Navigation shortcuts
alias ..="cd .."
alias ...="cd ../.."
alias ll="ls -la"
# System shortcuts
alias please="sudo"
alias ports="lsof -i -P -n | grep LISTEN"
[1] Dotfiles are named as such because they begin with the . character. Examples of these are .bashrc, .zshrc, and other shell configuration files. By convention, they serve the role of a configuration file that configures the behaviour of a program.