Install homebrew on your machine
Your Mac comes with a lot of neat apps, but it's a bit crippled when it comes to shell utilities. (Linux machines can use Homebrew too! Read on to see when you might need it.)
As claimed, Homebrew is the missing package manager for the Mac. From it, you can get shell utilities and apps that don't come pre-installed on your computer, such as wget. Installing these shell utilities can give you a leg up as you strive to gain mastery over your machine. (see: Install a suite of really cool utilities on your machine using homebrew)
Follow the instructions on the Homebrew website; essentially, the install is a single bash command. Usually, you would copy/paste it from the Homebrew website, but I've copied it over so you don't have to context-switch:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
It can be executed anywhere, but if you're feeling superstitious, you can always move to your home directory first (cd ~) before executing the command.
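Before running the installer, it can be worth checking whether Homebrew is already on your PATH. The following is a minimal sketch; the helper name have_cmd is a hypothetical one introduced here for illustration.

```shell
# Report whether a command is available on the PATH.
# (have_cmd is a hypothetical helper name, not part of Homebrew.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd brew; then
    echo "Homebrew found at: $(command -v brew)"
else
    echo "Homebrew not found; run the installer command above."
fi
```

The same helper works for any other tool you want to check for before installing it.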
If you're planning to install Anaconda (see: Install Anaconda on your machine), then make sure you install wget, as my bootstrap step for installing Anaconda relies on using wget to pull the installer from the internet.
brew install wget
You can also install some other cool utilities using brew! (see: Install a suite of really cool utilities on your machine using homebrew)
Linux machines usually come with their own package manager, such as yum on CentOS and apt on Ubuntu. If you have the necessary privileges to install packages, which usually means having sudo privileges on your machine, then you probably don't need to install Homebrew on Linux.
However, if you do not have sudo privileges on your machine, then you should consider installing Homebrew inside your home directory. This enables you to use brew to install Linux utilities that might not be built into your system. It's a pretty neat hack to have when you're working on a managed system, such as a high-performance computing system.
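If you're not sure which situation you're in, a rough heuristic is to look at your group memberships. The sketch below assumes that sudo rights are granted through a group named sudo (common on Ubuntu), wheel (common on CentOS), or admin (macOS); your system administrator may have configured things differently, so treat the result as a hint rather than a definitive answer. The function name in_admin_group is hypothetical.

```shell
# Heuristic: are we in a group that conventionally grants sudo rights?
# Assumption: the groups are named sudo, wheel, or admin.
in_admin_group() {
    # $1: space-separated group list (defaults to the current user's groups)
    groups_list="${1:-$(id -Gn)}"
    for g in $groups_list; do
        case "$g" in
            sudo|wheel|admin) return 0 ;;
        esac
    done
    return 1
}

if in_admin_group; then
    echo "You appear to be in an admin group; the system package manager is probably enough."
else
    echo "No admin group found; consider installing Homebrew in your home directory."
fi
```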
Configure your machine
After getting access to your development machine, you'll want to configure it and take full control over how it works. Backing the following steps are a core set of ideas:
Head over to the following pages to see how you can get things going.
Install a suite of really cool utilities on your machine using homebrew
Install gcc if you want to have the GNU C compiler available on your Mac at the same time as the clang C compiler.
C compilers come in handy for some numerical computing packages, which multiple data science languages (Julia, Python, R) depend on.
If you've ever been disconnected from SSH because of a flaky internet connection, mosh can be your saviour. Check out the tool's homepage.
tmux is a tool for multiplexing your shell sessions -- uber handy if you want to persist a shell session on a remote server even after you disconnect. If you're the type who has a habit of creating new shell sessions for every project, then tmux might be able to help you get things under control. Check out the tool's homepage.
The tree command line tool allows you to see the file tree at the terminal. If you pair it with exa, you will have an upgraded file tree experience. See its homepage.
exa is next-level ls (which is used to list files in a directory). According to the website, it is "a modern replacement for ls". See the homepage. If you alias ls to exa, it's next-level convenience! (see Create shell command aliases for your commonly used commands)
ripgrep provides a command line tool, rg, which recursively searches down the file tree from the current directory for files that contain the text you are looking for. Its GitHub repo should reveal all of its secrets.
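To give a feel for the kind of search described here, the following sketch lists the files under a directory that contain a pattern. It uses rg when it is installed and falls back to grep -rl otherwise; the function name search_tree is a hypothetical one for illustration.

```shell
# List files under a directory whose contents match a pattern.
# Uses ripgrep if available, otherwise falls back to recursive grep.
search_tree() {
    # $1: pattern to search for, $2: directory (defaults to the current one)
    if command -v rg >/dev/null 2>&1; then
        rg --files-with-matches "$1" "${2:-.}"
    else
        grep -rl "$1" "${2:-.}"
    fi
}

# Example (not run here): search_tree "TODO" ~/projects
```

Note that rg skips hidden files and anything listed in .gitignore by default, so the two branches can return different results inside a Git repository.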
diff-so-fancy gives you a tool for viewing differences between files, aka "diffs". Check out its GitHub repo for more information. You can also configure git to use diff-so-fancy to render diffs at the terminal. (see: Install and configure git on your machine)
bat is next-level cat, which is a utility for viewing text files in the terminal. Check out the GitHub repository for what you get. You can alias cat to bat, and in that way, not need to deviate from muscle memory to use it.
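The aliasing idea mentioned for exa and bat can be sketched as follows. This is one possible way to do it, not the only one: the helper alias_if_available is a hypothetical name, and it only creates the alias when the replacement tool is actually installed, so your shell keeps working on machines where it is missing.

```shell
# Alias a familiar command to its replacement, but only if the
# replacement is installed. (alias_if_available is a hypothetical helper.)
alias_if_available() {
    # $1: name you type out of habit, $2: tool that should run instead
    if command -v "$2" >/dev/null 2>&1; then
        alias "$1=$2"
    fi
}

alias_if_available ls exa
alias_if_available cat bat
```

Placing these lines in your shell's startup file (for example ~/.bashrc or ~/.zshrc) would make the aliases available in every new session.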
fd gives you a faster replacement for the shell tool find, which you can use to find files by name. Check out the GitHub repository to learn more about it.
On recommendation from my colleague Arkadij Kummer, grab fzf to have extremely fast fuzzy text search on the filesystem. Check out the project's GitHub repository!
croc is a tool for moving data from one machine to another easily and securely. (I have used this in lieu of commercial utilities that cost tens of thousands of dollars in license fees.) Check out the project's GitHub repository!
Now that you've read about these utilities' reason for existence, go ahead and install them!
brew install \
    git gcc tmux wget mobile-shell \
    diff-so-fancy ripgrep bat fd fzf
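After the install finishes, you might want to confirm that everything landed on your PATH. The sketch below prints the name of each tool that cannot be found. The binary names are assumptions based on how Homebrew usually installs these packages (for instance, ripgrep provides rg and mobile-shell provides mosh), and missing_tools is a hypothetical helper name.

```shell
# Print the name of every listed tool that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# No output means every tool was found.
missing_tools git gcc tmux wget mosh diff-so-fancy rg bat fd fzf
```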
Install Anaconda on your machine
Anaconda is a way to get Python installed on your system.
One of the neat but oftentimes confusing things about Python is that you can have multiple Python executables living around on your system. Anaconda makes it easy for you to:
Why is this a good thing? Primarily because you might have individual projects that need different versions of Python and different versions of packages that are built for Python. Also, default Python installations, such as the ones shipped with older versions of macOS, tend to be versions behind the latest, which is to the detriment of your projects. Some built-in apps in an operating system (such as iPhoto) may depend on that old version of Python, which means that if you mess up the installation, you might break those built-in apps. Hence, you will want a tool that lets you easily create isolated Python environments.
The Anaconda Python distribution fulfills the following key needs:
Installing Anaconda on your local machine thus helps you get easy access to Python, Jupyter (see: Use Jupyter as an experimentation playground), and other tools for modelling and analysis.
If you're on macOS: I'm assuming you have installed homebrew (see: Install homebrew on your machine) and wget. Then, install Miniconda, which will be a lighter-weight installer, using the following commands:
cd ~
wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O anaconda.sh
This will send you to your home directory, and then download the Miniconda bash script installer from Anaconda's download page.
If you're on Linux: Make sure you have wget available on your system. Then:
cd ~
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O anaconda.sh
This will download the Miniconda installer for Linux operating systems into your home directory.
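If you set up both macOS and Linux machines, the choice between the two URLs above can be made automatically. Here is a minimal sketch that picks the installer URL from the output of uname -s. It assumes an x86_64 machine, as the commands above do, and miniconda_installer_url is a hypothetical function name.

```shell
# Pick the Miniconda installer URL for the current operating system.
# Assumption: x86_64 hardware, matching the URLs used above.
miniconda_installer_url() {
    # $1: kernel name as reported by `uname -s` (defaults to this machine's)
    case "${1:-$(uname -s)}" in
        Darwin) platform="MacOSX" ;;
        Linux)  platform="Linux" ;;
        *)
            echo "Unsupported platform: ${1:-$(uname -s)}" >&2
            return 1
            ;;
    esac
    echo "https://repo.continuum.io/miniconda/Miniconda3-latest-${platform}-x86_64.sh"
}

# Show what would be downloaded on this machine.
url="$(miniconda_installer_url)" && echo "Installer URL: $url"
```

You could then run wget "$url" -O anaconda.sh from your home directory, exactly as in the commands above.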
If you don't have wget: You can head over to the Miniconda docs and download the bash installer to whatever location you want (the home directory is a convenient place). Rename it to anaconda.sh to stay compatible with the instructions below.
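As an alternative to downloading by hand, curl (which the Homebrew installer already relies on) can usually do the same job as wget. The sketch below is an assumption on my part rather than part of the original bootstrap: fetch is a hypothetical helper that uses wget when it exists and falls back to curl otherwise.

```shell
# Download a URL to a file using whichever downloader is installed.
fetch() {
    # $1: URL to download, $2: output file name
    if command -v wget >/dev/null 2>&1; then
        wget "$1" -O "$2"
    elif command -v curl >/dev/null 2>&1; then
        # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
        curl -fsSL "$1" -o "$2"
    else
        echo "Neither wget nor curl is available." >&2
        return 1
    fi
}

# Example (not run here), using the Linux installer URL from above:
#   fetch https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh anaconda.sh
```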
Now, install Anaconda:
bash anaconda.sh -b -p $HOME/anaconda/
This will install the Anaconda distribution of Python onto your system inside your home directory. You can now install packages at will, without needing sudo privileges!
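Because the installer was run in batch mode (-b), it most likely did not touch your shell startup files, so the new Python may not be on your PATH yet. One way to handle this, sketched below, is to prepend the installation's bin directory yourself. The helper name prepend_to_path is hypothetical, and the directory matches the -p $HOME/anaconda/ prefix used above.

```shell
# Put a directory at the front of PATH, unless it is already in there.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

prepend_to_path "$HOME/anaconda/bin"
export PATH
```

Adding these lines to your shell's startup file would make the change persist across sessions.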
Install and configure git on your machine
Git is an extremely important tool! We use it to do what is known as "version control" -- the act of explicitly curating and keeping track of changes that are made to files in a repository of text files. Using Git, you can even restore files to a previous state. It's like having an extremely powerful undo button at the command line.
Knowing Git also gets you access to the world of open source tooling available on hosted version control storage providers, like GitHub, GitLab, and more.
Linux systems usually come with git.
On macOS, you can type git at the Terminal, and a pop-up will show up that prompts you to install Xcode and the developer tools for macOS. Accept it, and go about the rest of your day.
Sometimes, the built-in versions of git might be a bit outdated. If you want to install one of the latest versions of git, then you can use Homebrew to install Git. (see: Install homebrew on your machine)
You might want to configure git with some basic information.
For example, you might need to configure Git with your username and email address, so that your commits can be attributed to your user accounts on GitHub, GitLab, or Bitbucket. To do this:
git config --global user.name "My name in quotes"
git config --global user.email "email@example.com"
This sets your configuration to be "global". However, you can also have "local" (i.e. per-repository) configurations, by changing the --global flag to --local:
# inside a repository, say, your company's project
git config --local user.name "My name in quotes"
git config --local user.email "firstname.lastname@example.org"
Doing so is important because you want to ensure that your Git commits are tied to the appropriate email address. Setting the "global" one gives you the convenience of setting a sane default, which you can modify by setting "local", per-repository configuration.
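To check which identity Git will actually use in a given repository, you can ask it directly: git config --get returns the value that wins after local settings override global ones. The following is a small sketch; show_git_identity is a hypothetical helper name.

```shell
# Print the name and email Git would attach to a commit made
# from the current directory (local settings override global ones).
show_git_identity() {
    echo "name:  $(git config --get user.name)"
    echo "email: $(git config --get user.email)"
}

# To see which file each setting comes from, you can also run:
#   git config --list --show-origin
```

Running show_git_identity inside your company's project and again inside a personal project is a quick way to confirm that each one is tied to the email address you intended.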
If you installed the cool tools from "Install a suite of really cool utilities on your machine using homebrew", then you'll be thrilled to know that you can configure Git to use diff-so-fancy to render diffs!
Follow the instructions in the diff-so-fancy repository. As of 10 December 2020, my favored set of configurations is:
git config --global core.pager "diff-so-fancy | less --tabs=4 -RFX"
git config --global color.ui true
git config --global color.diff-highlight.oldNormal "red bold"
git config --global color.diff-highlight.oldHighlight "red bold 52"
git config --global color.diff-highlight.newNormal "green bold"
git config --global color.diff-highlight.newHighlight "green bold 22"
git config --global color.diff.meta "11"
git config --global color.diff.frag "magenta bold"
git config --global color.diff.commit "yellow bold"
git config --global color.diff.old "red bold"
git config --global color.diff.new "green bold"
git config --global color.diff.whitespace "red reverse"
Use docker containers for system-level packages
If conda environments are such a great environment isolation tool, why would we need Docker?
That's because sometimes, your project might have an unavoidable dependency on system-level packages. I have seen some projects that use spatial mapping tooling require system-level packages. Others that depend on audio processing might require packages that can only be obtained outside of conda. In these cases, yes, installing them locally on your machine can be handy (see Install homebrew on your machine), but if you're also interested in building an app, then you'll need them packaged up inside a Docker container.
What is a Docker container? The best way to anchor your thinking is to treat it as a fully-fledged operating system completely insulated from its host (i.e. your computer). It has no knowledge of your runtime environment variables (see: Create runtime environment variable configuration files for each of your projects and Take full control of your shell environment variables). It's like having a completely clean operating system, without the cost of buying new hardware.
I'm assuming you've already obtained Docker on your system. (see: Install Docker on your machine).
The core thing you need to know how to write is a Dockerfile. This file specifies exactly how a Docker container is to be built. The easiest way to think about the Dockerfile syntax is that it's almost bash, with a bit of additional syntax. The Docker docs give an extremely thorough tutorial. For those who are more hands-on, I recommend pair coding with a more experienced individual who is willing to teach you the ropes, to build a Docker container when it becomes relevant to your problem.
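To make the "almost bash" point concrete, here is a sketch that writes out a very small Dockerfile from the shell. Everything specific in it is an assumption chosen for illustration: the ubuntu:22.04 base image, the libsndfile1 audio library standing in for "a system-level package", the my-project image name, and the write_dockerfile helper itself.

```shell
# Write a minimal, illustrative Dockerfile into a directory.
write_dockerfile() {
    # $1: directory in which to create the Dockerfile
    cat > "$1/Dockerfile" <<'EOF'
# Start from a base operating system image.
FROM ubuntu:22.04

# Install system-level packages with the image's own package manager.
# libsndfile1 is only an illustrative choice of audio library.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libsndfile1 \
    && rm -rf /var/lib/apt/lists/*
EOF
}

# Usage (assumes Docker is installed and running; not executed here):
#   write_dockerfile .
#   docker build -t my-project .
```

Notice that the RUN instruction is ordinary shell; the Docker-specific syntax is limited to keywords such as FROM and RUN.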