Install homebrew on your machine

Why install Homebrew?

Your Mac comes with a lot of neat apps, but it's a bit crippled when it comes to shell utilities. (Linux machines can use Homebrew too! Read on to see when you might need it.)

As its tagline claims, Homebrew is the missing package manager for the Mac. From it, you can get shell utilities and apps that don't come pre-installed on your computer, such as wget. Installing these shell utilities can give you a leg up as you strive to gain mastery over your machine. (see: Install a suite of really cool utilities on your machine using homebrew)

How do we install Homebrew?

Follow the instructions on the Homebrew website; essentially, the install is a single bash command. Usually, you would copy and paste it from the Homebrew website, but I've copied it here so you don't have to context-switch:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

It can be executed anywhere, but if you're feeling superstitious, you can always move to your home directory first (cd ~) before executing the command.
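After the installer finishes, brew may not be on your PATH yet, particularly on Apple Silicon Macs and on Linux. As a minimal sketch, assuming the installer's default prefixes (/opt/homebrew, /usr/local and /home/linuxbrew/.linuxbrew), something like the following could locate brew and load it into the current shell. The find_brew helper is a name I made up for illustration; it is not part of Homebrew.

```shell
# find_brew is a hypothetical helper: print the first brew executable found
# in the installer's default prefixes, or nothing if none is present.
find_brew() {
    for prefix in /opt/homebrew /usr/local /home/linuxbrew/.linuxbrew; do
        if [ -x "$prefix/bin/brew" ]; then
            echo "$prefix/bin/brew"
            return 0
        fi
    done
    return 0
}

brew_path="$(find_brew)"
if [ -n "$brew_path" ]; then
    # brew shellenv prints the export statements that put brew on the PATH.
    eval "$("$brew_path" shellenv)"
    brew --version
else
    echo "brew not found in the default prefixes; re-check the installer output"
fi
```

If you installed Homebrew somewhere else (for example, inside your home directory), you would need to add that location to the list of prefixes.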

Once you're done...

If you're planning to install Anaconda (see: Install Anaconda on your machine), then make sure you install wget, as my bootstrap step for installing Anaconda relies on using wget to pull the installer from the internet.

brew install wget

You can also install some other cool utilities using brew! (see: Install a suite of really cool utilities on your machine using homebrew)

What about Linux machines?

Linux machines usually come with their own package manager, such as yum on CentOS and apt on Ubuntu. If you have the necessary privileges to install packages, which usually means having sudo privileges on your machine, then you probably don't need to install Homebrew on Linux.

However, if you do not have sudo privileges on your machine, then you should consider installing Homebrew inside your home directory. This enables you to use brew to install Linux utilities that might not be built into your system. It's a pretty neat hack to have when you're working on a managed system, such as a high performance computing system.

Configure your machine

After getting access to your development machine, you'll want to configure it and take full control over how it works. Underpinning the following steps is a core set of ideas:

  • Give yourself a rich set of commonly needed tooling right from the beginning, without unnecessary bloat.
  • Standardize your compute environment for maximal portability from computer to computer.
  • Build up automation to get you up and running as fast as possible.
  • Have full control over your system, such that you know as much about your configuration as possible.
  • Squeeze as much productivity out of your UI as possible.

Head over to the following pages to see how you can get things going.

Initial setup

Getting Anaconda Python installed

Master the shell

Further configuration

Advanced Stuff

Install and configure git on your machine

Why do we need Git

Git is an extremely important tool! We use it to do what is known as "version control" -- the act of explicitly curating and keeping track of changes that are made to files in a repository of text files. Using Git, you can even restore files to a previous state. It's like having an extremely powerful undo button at the command line.

Knowing Git also gets you access to the world of open source tooling available on hosted version control storage providers, like GitHub, GitLab, and more.

How to install Git

Linux systems usually come with git pre-installed.

On macOS, you can type git at the Terminal, and a pop-up will appear prompting you to install the Xcode command line developer tools, which include git. Accept it, and go about the rest of your day.

Sometimes, the built-in versions of git might be a bit outdated. If you want to install one of the latest versions of git, then you can use Homebrew to install Git. (see: Install homebrew on your machine)

How to configure Git with basic information

You might want to configure git with some basic information.

For example, you might need to configure Git with your username and email address, so that your commits can be attributed to your user accounts on GitHub, GitLab, or Bitbucket. To do this:

git config --global user.name "My name in quotes"
git config --global user.email "myemail@address.com"

This sets your configuration to be "global". However, you can also have "local" (i.e. per-repository) configurations, by changing the --global flag to --local:

# inside a repository, say, your company's project
git config --local user.name "My name in quotes"
git config --local user.email "myemail@company.com"

Doing so is important because you want to ensure that your Git commits are tied to the appropriate email address. Setting the "global" one gives you the convenience of setting a sane default, which you can modify by setting "local", per-repository configuration.
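To see the local-overrides-global behaviour without touching your real configuration, here is a small sketch that runs git against a throwaway home directory. The sgit wrapper, the directory names and the example.com-style addresses are all made up for this illustration.

```shell
# Throwaway directory that stands in for the home directory.
sandbox="$(mktemp -d)"

# sgit is a hypothetical wrapper: run git with its global configuration
# redirected into the sandbox, so the real ~/.gitconfig is never touched.
sgit() {
    HOME="$sandbox" XDG_CONFIG_HOME="$sandbox/.config" GIT_CONFIG_NOSYSTEM=1 git "$@"
}

# Set the "global" default.
sgit config --global user.email "me@personal.example"

# Create a repository and override the email for that repository only.
sgit init --quiet "$sandbox/company-project"
sgit -C "$sandbox/company-project" config --local user.email "me@company.example"

# Inside the repository, the local value wins; the global value is unchanged.
effective_email="$(sgit -C "$sandbox/company-project" config user.email)"
global_email="$(sgit config --global user.email)"
echo "effective: $effective_email"
echo "global:    $global_email"

# Clean up the throwaway directory.
rm -rf "$sandbox"
```

Inside the repository, git reports the company address, while the global setting still holds the personal one.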

How to configure Git with fancy features

If you installed the cool tools from "Install a suite of really cool utilities on your machine using homebrew", then you'll be thrilled to know that you can configure Git to use diff-so-fancy to render diffs!

Follow the instructions in the diff-so-fancy repository. As of 10 December 2020, my favored set of configurations is:

git config --global core.pager "diff-so-fancy | less --tabs=4 -RFX"

git config --global color.ui true

git config --global color.diff-highlight.oldNormal    "red bold"
git config --global color.diff-highlight.oldHighlight "red bold 52"
git config --global color.diff-highlight.newNormal    "green bold"
git config --global color.diff-highlight.newHighlight "green bold 22"

git config --global color.diff.meta       "11"
git config --global color.diff.frag       "magenta bold"
git config --global color.diff.commit     "yellow bold"
git config --global color.diff.old        "red bold"
git config --global color.diff.new        "green bold"
git config --global color.diff.whitespace "red reverse"

Use docker containers for system-level packages

Why you might need to use Docker

If conda environments are such a great environment isolation tool, why would we need Docker?

That's because sometimes, your project might have an unavoidable dependency on system-level packages. I have seen some projects that use spatial mapping tooling require system-level packages. Others that depend on audio processing might require packages that can only be obtained outside of conda. In these cases, yes, installing them locally on your machine can be handy (see Install homebrew on your machine), but if you're also interested in building an app, then you'll need them packaged up inside a Docker container.

What is a Docker container? The best way to anchor your thinking is to picture a fully-fledged operating system that is completely insulated from its host (i.e. your computer). It has no knowledge of your runtime environment variables (see: Create runtime environment variable configuration files for each of your projects and Take full control of your shell environment variables). It's like having a completely clean operating system, without the cost of buying new hardware.

How do we use Docker

I'm assuming you've already obtained Docker on your system. (see: Install Docker on your machine).

The core thing you need to know how to write is a Dockerfile. This file specifies exactly how a Docker container is to be built. The easiest way to think about the Dockerfile syntax is that it's almost bash, with a little additional syntax. The Docker docs give an extremely thorough tutorial. For those who are more hands-on, I recommend pair coding with a more experienced individual who is willing to teach you the ropes, so that you can build a Docker container together when it becomes relevant to your problem.
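To give a feel for the "almost bash" flavour, here is a minimal sketch that writes a small Dockerfile into a throwaway directory. Everything specific in it is an assumption chosen for illustration: the ubuntu:22.04 base image, libsndfile1 as a stand-in for a system-level audio package, and the my-project image name in the comment.

```shell
# Write the Dockerfile into a throwaway build directory.
build_dir="$(mktemp -d)"

cat > "$build_dir/Dockerfile" <<'EOF'
# Start from a base operating system image (assumed here: Ubuntu 22.04).
FROM ubuntu:22.04

# Install a system-level package with the image's own package manager.
# libsndfile1 is only an example of an audio-related system library.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

# Default command when a container is started from this image.
CMD ["bash"]
EOF

echo "Wrote $build_dir/Dockerfile"

# With Docker installed, the image could then be built with:
#   docker build -t my-project "$build_dir"
```

The RUN line is ordinary shell; FROM and CMD are the kind of extra syntax that Docker adds on top.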

Install Anaconda on your machine

What is anaconda

Anaconda is a way to get Python installed on your system.

One of the neat but oftentimes confusing things about Python is that you can have multiple Python executables living around on your system. Anaconda makes it easy for you to:

  1. Obtain Python
  2. Manage different Python versions in isolated environments using a consistent interface
  3. Install packages into these environments

Why use anaconda (or one of its variants)?

Why is this a good thing? Primarily because you might have individual projects that need different versions of Python and different versions of the packages built for Python. Also, default Python installations, such as the ones shipped with older versions of macOS, tend to be several versions behind the latest, which is to the detriment of your projects. Some built-in apps in an operating system may depend on that old version of Python (such as iPhoto), which means that if you mess up the installation, you might break those built-in apps. Hence, you will want a tool that lets you easily create isolated Python environments.

The Anaconda Python distribution fulfills the following key needs:

  1. You'll be able to create isolated environments on a per-project basis. (see: Follow the rule of one-to-one in managing your projects)
  2. You'll be able to install packages into those isolated environments, and evolve them over time. (see: Create one conda environment per project)

Installing Anaconda on your local machine thus helps you get easy access to Python, Jupyter (see: Use Jupyter as an experimentation playground), and other tools for modelling and analysis.

How to get anaconda?

To install the Miniforge variant of Anaconda, which is lighter-weight than the full Anaconda distribution, use the following commands:

cd ~
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" -O anaconda.sh

This will send you to your home directory, and then download the Miniforge bash script installer from the conda-forge Miniforge releases page on GitHub, saving it as anaconda.sh.
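The download URL is assembled on the fly from your operating system name and CPU architecture. If you're curious which installer you will get, a small sketch like this prints the expanded URL without downloading anything (the variable names are mine):

```shell
# $(uname) expands to the operating system name and
# $(uname -m) expands to the CPU architecture.
os_name="$(uname)"
cpu_arch="$(uname -m)"

installer_url="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-${os_name}-${cpu_arch}.sh"
echo "$installer_url"
```

On a 64-bit Intel Linux machine, for example, the file name at the end of the URL becomes Miniforge3-Linux-x86_64.sh.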

Now, install Anaconda:

bash anaconda.sh -b -p "$HOME/anaconda/"

This will install the Anaconda distribution of Python onto your system inside your home directory. You can now install packages at will, without needing sudo privileges!
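As a quick check that the install worked, and as a first taste of isolated environments, here is a hedged sketch. Because -b runs the installer in batch mode, I'm assuming your shell has not been set up for conda yet, so the sketch calls conda by its full path. The environment name my-project, the Python version and the numpy package are placeholders.

```shell
# Location of conda under the -p prefix used by the installer command.
conda_bin="$HOME/anaconda/bin/conda"

if [ -x "$conda_bin" ]; then
    "$conda_bin" --version
    # Create an isolated environment (placeholder name and Python version).
    "$conda_bin" create --yes --name my-project python=3.11
    # Install a package into that environment only (placeholder package).
    "$conda_bin" install --yes --name my-project numpy
else
    echo "conda not found at $conda_bin; run the installer first"
fi
```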

Next steps

Level-up your conda skills

Install a suite of really cool utilities on your machine using homebrew

What utilities are recommended?

gcc

Install gcc if you want to have the GNU C compiler available on your Mac at the same time as the clang C compiler.

C compilers come in handy for some numerical computing packages, which multiple data science languages (Julia, Python, R) depend on.

mobile-shell

If you've ever been disconnected from SSH because of a flaky internet connection, mosh can be your saviour. Check out the tool's homepage.

tmux

This is a tool for multiplexing your shell sessions -- uber handy if you want to persist a shell session on a remote server even after you disconnect. If you're of the type who has a habit of creating new shell sessions for every project, then tmux might be able to help you get things under control. Check out the tool's homepage.

tree

The tree command line tool allows you to see the file tree at the terminal. If you pair it with exa, you will have an upgraded file tree experience. See its homepage.

exa

exa is next-level ls (which is used to list files in a directory). The website describes it as "a modern replacement for ls". See the homepage. If you alias ls to exa, it's next-level convenience! (see Create shell command aliases for your commonly used commands)

ripgrep

ripgrep provides a command line tool, rg, which recursively scans down the file tree from the current directory for files containing the text you are searching for. Its GitHub repo should reveal all of its secrets.
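Here is a small sketch of that recursive search on a throwaway file tree. The directory layout and the TODO search text are invented for the example, and it falls back to plain grep when rg is not installed so that you can compare the two.

```shell
# Build a tiny throwaway file tree to search.
search_dir="$(mktemp -d)"
mkdir -p "$search_dir/src"
echo "TODO: refactor this" > "$search_dir/src/notes.txt"
echo "nothing to see here" > "$search_dir/src/other.txt"

if command -v rg >/dev/null 2>&1; then
    # -l lists only the names of files that contain a match.
    matches="$(rg -l "TODO" "$search_dir")"
else
    # Fallback with plain grep: -r recurses, -l lists matching files.
    matches="$(grep -rl "TODO" "$search_dir")"
fi
echo "$matches"
```

Either way, only the path to notes.txt is printed. Without the -l flag, rg also shows the matching lines themselves.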

diff-so-fancy

This gives you a tool for viewing differences between files, aka "diffs". Check out its GitHub repo for more information. You can also configure git to use diff-so-fancy to render diffs at the terminal. (see: Install and configure git on your machine)

bat

bat is next-level cat, which is a utility for viewing text files in the terminal. Check out the GitHub repository for what you get. You can alias cat to bat, and that way, you won't need to deviate from muscle memory to use bat.
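The aliasing suggested for bat (and for exa) could be sketched like this. These lines would typically live in your shell startup file, such as ~/.bashrc or ~/.zshrc. The alias_if_available helper is a hypothetical name of mine; it only creates the alias when the replacement tool is actually installed, so a machine without bat keeps a working cat.

```shell
# alias_if_available is a hypothetical helper: alias NAME to TARGET only
# when TARGET is installed, so the shell keeps working without it.
alias_if_available() {
    if command -v "$2" >/dev/null 2>&1; then
        alias "$1=$2"
    fi
    return 0
}

alias_if_available cat bat
alias_if_available ls exa
```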

fd

fd gives you a faster replacement for the shell tool find, which you can use to find files by name. Check out the GitHub repository to learn more about it.
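To compare the two tools, here is a small sketch that looks up a file by name in a throwaway directory tree. The file and directory names are invented, and the sketch falls back to find when fd is not installed.

```shell
# Build a tiny throwaway directory tree.
find_dir="$(mktemp -d)"
mkdir -p "$find_dir/project/docs"
touch "$find_dir/project/docs/readme.md" "$find_dir/project/main.py"

if command -v fd >/dev/null 2>&1; then
    # fd takes the pattern first, then the directory to search.
    found="$(fd readme "$find_dir")"
else
    # Equivalent lookup with find, which takes the directory first.
    found="$(find "$find_dir" -name '*readme*')"
fi
echo "$found"
```

Note how fd needs no wildcard characters or -name flag: the pattern is matched anywhere in the file name.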

fzf

On recommendation from my colleague Arkadij Kummer, grab fzf to have extremely fast fuzzy text search on the filesystem. Check out the project's GitHub repository!

croc

Use croc as a tool to move data from one machine to another easily in a secure fashion. (I have used this in lieu of commercial utilities that cost tens of thousands of dollars in license fees.) Check out the project's GitHub repository!

Install these really cool utilities

Now that you've read about these utilities' reason for existence, go ahead and install them!

brew install \
	git gcc tmux wget mobile-shell \
	diff-so-fancy ripgrep bat fd fzf croc
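Once the install finishes, a quick sanity check along these lines could confirm that everything landed on your PATH. The check_tools helper is a hypothetical name of mine. Note that two command names differ from their formula names: mobile-shell provides mosh, and ripgrep provides rg.

```shell
# check_tools is a hypothetical helper: print the name of each command
# that cannot be found on the PATH.
check_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
    return 0
}

missing="$(check_tools git gcc tmux wget mosh diff-so-fancy rg bat fd fzf croc)"
if [ -z "$missing" ]; then
    echo "All utilities are on the PATH."
else
    echo "Not found on the PATH:"
    echo "$missing"
fi
```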
