Configure your machine
After getting access to your development machine, you'll want to configure it and take full control over how it works. Backing the following steps are a core set of ideas.
Head over to the following pages to see how you can get things going.
Bootstrap a scratch conda environment
In a pinch, you might want to muck around on your system with some quick-and-dirty experiment. Having a suite of packages inside a scratch environment can be handy. Your scratch environment can be your base environment if you'd like, but I would strongly recommend creating a separate scratch environment instead.
I would recommend that you bootstrap a scratch conda environment with some basic data science packages.
mamba activate base
mamba install -c conda-forge \
scipy numpy pandas matplotlib \
numpy jupyter jupyterlab \
scikit-learn ipython ipykernel \
ipywidgets mamba
(Replace mamba with conda if you don't have mamba installed on your system.)
Doing so gives you an environment where you can quickly prototype new things without necessarily going through the overhead of creating an entirely new project (and with it, a full conda environment).
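If you'd rather keep your base environment pristine (which is what I strongly recommend above), here is a minimal sketch of bootstrapping a dedicated scratch environment instead; the environment name scratch is just an example, so pick whatever name you like:
# Create a separate scratch environment rather than installing into base
mamba create -n scratch -c conda-forge \
    scipy numpy pandas matplotlib \
    jupyter jupyterlab \
    scikit-learn ipython ipykernel \
    ipywidgets
# Activate it whenever you want to prototype something quickly
mamba activate scratch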
Installing mamba can be helpful if you want a faster drop-in replacement for conda. (see: Use Mamba as a faster drop-in replacement for conda for more information.)
Take full control of your shell environment variables
If you're not sure what environment variables are, I have an essay on them that you can reference. Mastering environment variables is crucial for data scientists!
Your shell environment, whether it is zsh or bash or fish or something else, is supremely important. It determines the runtime environment, which in turn determines which Python you're using, whether you have proxies set correctly, and more. Rather than leave this to chance, I would recommend instead gaining full control over your environment variables.
The simplest way is to set them explicitly in your shell initialization script. For bash shells, it's either .bashrc or .bash_profile. For the Z shell, it'll be the .zshrc file. In there, step by step, set the environment variables that you need system-wide.
For example, explicitly set your PATH environment variable with explainers that tell future you why you ordered the PATH in a certain way.
# Start with an explicit minimal PATH
export PATH=/bin:/usr/bin:/usr/local/bin
# Add in my custom binaries that I want available across projects
export PATH=$HOME/bin:$PATH
# Add in anaconda installation path
export PATH=$HOME/anaconda/bin:$PATH
# Add more stuff below...
If you want your shell initialization script to be cleaner, you can refactor it out into a second bash script called env_vars.sh, which lives either inside your home directory or your dotfiles repository (see: Leverage dotfiles to get your machine configured quickly). Then, source the env_vars.sh script from the shell initialization script:
source ~/env_vars.sh
Other tools, like the Anaconda installer, may offer to modify your shell initializer script for you; if they do, keep that in the back of your mind. At the end of your shell initializer script, you can echo the final state of your environment variables to help you debug.
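For example, here is a minimal sketch, assuming PATH is the variable you care most about; place it at the very end of your .zshrc or .bashrc and comment it out once you're happy with your setup:
# Debugging aid: print the final state of PATH whenever a new shell starts
echo "PATH is: $PATH"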
Environment variables that need to be set on a per-project basis are handled slightly differently. See Create runtime environment variable configuration files for each of your projects.
Automate the bootstrapping of your new computer
In automating your shell's configuration, you save yourself time each time you get access to a new computer. That is the primary value proposition of automation! No more spending 2-3 hours setting things up. Instead, simply type ./install.sh at the terminal!
The best way I would recommend doing this is by creating a dotfiles repository (see: Leverage dotfiles to get your machine configured quickly). Place every file needed for shell initialization inside there -- primarily, I mean the .zshrc or .bashrc/.bash_profile files, and any other files on which you depend. Then, create the main script install.sh, which you execute from within the dotfiles repository, and have it perform all of the necessary actions to place the right files in the right place. (Or symlink them from the dotfiles repository to the correct places.)
Keep in mind, there's no "perfect" setup except for the one that matches your brain. (We are, after all, talking about setting up your own computer.) Sophistication is also not a prerequisite. All you need is to guarantee that your setup ends up working the way you'd like. If you want, you can use my dotfiles as a starting point, but I would strongly suggest that you customize them to your own needs!
Create shell command aliases for your commonly used commands
Shell aliases can save you keystrokes, which save time. That time saved is compound interest over long time horizons!
Shell aliases are easy to create. In your shell initializer script, use the following syntax; as an example, here's ls being aliased to exa with configuration flags at the end:
alias ls="exa --long"
Now, typing ls at the shell will instead execute exa! (To know what exa is, see Install a suite of really cool utilities on your machine using homebrew.)
In order for these shell aliases to take effect each time you open up your shell, you should ensure that they get sourced in your shell initialization script (see: Take full control of your shell environment variables for more information). You have one of two options:

1. Put the aliases directly inside your .zshrc or .bashrc (or analogous) file, or
2. Put them in a separate file such as ~/.aliases, which you source inside your shell initialization script file (i.e. .zshrc/.bashrc/etc.)

I recommend the second option as doing so means you'll be putting into practice the philosophy of having clear categories of things in one place.
In my dotfiles repository, I have a .shell_aliases directory which contains a full suite of aliases that I have installed.
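If you take a similar directory-based approach, a minimal sketch for sourcing everything inside it from your shell initialization script (assuming the directory lives at ~/.shell_aliases) might look like this:
# Source every alias file inside the ~/.shell_aliases directory
for alias_file in "$HOME"/.shell_aliases/*; do
    [ -f "$alias_file" ] && source "$alias_file"
done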
Other external links that showcase shell aliases that could serve as inspiration for your personal collection include:
And finally, to top it off, Twitter user @ctrlshifti suggests aliasing please to sudo for a pleasant experience at the terminal:
alias please="sudo"
# Now you type:
# please apt-get update
# please apt-get upgrade
# etc...
Configure your conda installation
Configuring some things with conda can help lubricate your interactions with the conda package manager. It will save you keystrokes at the terminal, primarily, thus saving you time. The place to do this configuration is the .condarc file, which the conda package manager searches for by default in your user's home directory.
The condarc docs are your best bet for the full configuration, but I have some favourites that I'm more than happy to share below.
Firstly, you create a file in your home directory called .condarc. Then edit it to have the following contents:
channels:
- conda-forge
- defaults
auto_update_conda: True
always_yes: True
- auto_update_conda saves me from having to update conda all the time,
- always_yes lets me always answer y to the conda installation and update prompts, and
- listing conda-forge as the default channel above the defaults channel allows me to type conda install some_package rather than conda install -c conda-forge some_package each time I want to install a package, as conda will prioritize channels according to their order under the channels section.

If you prefer, you can set the channel priorities in a different order and/or expand the list. For example, bioinformatics users may want to add in the bioconda channel, while R users may want to add in the r channel. Users who prefer stability may want to prioritize defaults ahead of conda-forge.
What this affects is how conda will look for packages when you execute the conda install command. However, it doesn't affect the channel priority in your per-project environment.yml file (see: Create one conda environment per project).
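For completeness, here is a sketch of a hypothetical environment.yml showing that its channel order is declared independently of your .condarc:
# environment.yml for a hypothetical project; the channel order declared here,
# not your .condarc, governs this environment
name: my_project
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pandas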
Configure Jupyter and Jupyter Lab
Project Jupyter is a cornerstone of the data science ecosystem. When getting set up to work with Jupyter, here are some settings that you may wish to configure if you are running your own Jupyter server (and don't have a managed one provided).
The best way to learn about Jupyter's configuration system is to read the official documentation. However, I will strive to provide an up-to-date overview here.
You can find Jupyter's configuration files in your home directory, under the ~/.jupyter/ directory. They configure Jupyter's server, which is what you launch when you execute jupyter lab (as of Jan 2021, the lab interface is the future of Jupyter).
To configure Jupyter's behaviour, you usually start by generating the configuration file:
jupyter lab --generate-config
Now, there will be a file created at ~/.jupyter/jupyter_lab_config.py. You can edit that file to configure Jupyter's behaviour.
Run the Jupyter server on 0.0.0.0
By convention, Jupyter's server runs on 127.0.0.1, which is aliased by localhost. Usually, this is not a problem if you run the server from your local machine and then access it by a browser on the same device. However, this configuration can get troublesome if you run Jupyter on a remote workstation or server.
Here's why.
When configured to run on 127.0.0.1, the Jupyter server will only accept browser requests originating from the local machine; it will deny browser requests that originate from another device. However, when the Jupyter server runs on 0.0.0.0 instead, it will be able to accept browser requests that originate from outside the server itself (i.e. your laptop).
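If you decide you want this behaviour, a minimal sketch of the change in ~/.jupyter/jupyter_lab_config.py (in the same style as the port example further down) would be:
# Accept browser requests from other machines, not just from localhost
c.ServerApp.ip = "0.0.0.0"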
Are there risks to serving the notebook up on 0.0.0.0? Yes, the primary one is that your notebook server is now accessible to anybody on the same network as the machine that is serving up Jupyter. To mitigate that risk, Jupyter has "token-based authentication" turned on by default; you might also want to consider turning on password protection.
You can turn on password protection by running the following command at the terminal:
jupyter lab password
Jupyter will prompt you for a password and store a hash of the password in the file ~/.jupyter/jupyter_server_config.json. (You don't need to edit this file!)
When you enable password protection, you will be prompted for a password when you access the Jupyter lab interface.
Sometimes you and your colleagues might share the same workstation and run your Jupyter servers on there. Because the default server configuration starts at port 8888, you might end up with the following scenario (primarily if you practice running one Jupyter server per project):
| Notebook port | Owner          |
|:-------------:|:--------------:|
| 8888          | yourself       |
| 8889          | yourself       |
| 8890          | your colleague |
| 8891          | yourself       |
To avoid this scenario, you can agree with your colleague that they can keep their configuration, and you can start your port numbers at 9000 (or 10000). To do so, open up ~/.jupyter/jupyter_lab_config.py and set the config traitlet below:
c.ServerApp.port = 9000
Now, your port numbers will start from port 9000 upwards, helping to keep your port number range and your colleagues' port number range effectively separated.
Get bootstrapped on your data science projects
I'm super glad you made it to my knowledge base on bootstrapping your data science machine - otherwise known as getting set up for success with great organization and sane structure. The content inside here has been battle-tested through real-world experience with colleagues and others skilled in their computing domains, but a bit new to the modern tooling offered to us in the data science world.
This knowledge base exists because I want to encourage more data scientists to adopt sane practices and conventions that promote collaboration and reproducibility in our data work. These are practices that, through years of practice in developing data projects and open source software, I have come to see the importance of.
The most important thing I'm assuming about you, the reader, is that you have experienced the same challenges I encountered when structure and workflow were absent from my work. I wrote down this knowledge base for your benefit. Based on one decade (as of 2023) of continual refinement, you'll learn how to:
Because I'm a Pythonista who uses Jupyter and VSCode, some tips are specific to the language and these tools. However, being a Python programmer isn't a hard requirement. More than the specifics, I hope this knowledge base imparts to you a particular philosophy of how to work. That philosophy should be portable across languages and tooling, though having specific tooling can sometimes help you adhere to the philosophy. To read more about the philosophies behind this knowledge base, check out the page: The philosophies that ground the bootstrap.
As you grow in your knowledge and skillsets, this knowledge base should help you keep an eye out for critical topics you might want to learn.
If you're looking to refine your skillsets, this knowledge graph should give you the base from which you dive deeper into specific tools.
If you're a seasoned data practitioner, this guide should be able to serve you the way it helps me: as a cookbook/recipe guide to remind you of things when you forget them.
The things you'll learn here cover the first steps, starting at configuring your laptop or workstation for data science development up to some practices that help you organize your projects, regardless of where you do your computing.
I have a recommended order below, based on my experience with colleagues and other newcomers to a project:
However, you may wish to follow the guide differently and not read it in the way I prescribed above. That's not a problem! The online version is intentionally structured as a knowledge graph and not a book so that you can explore it on your own terms.
As you go through this content, I would also encourage you to keep in mind: Time will distill the best practices in your context. Don't feel pressured to apply every single thing you see here to your project. Incrementally adopt these practices as they make sense. They're all quite composable with one another.
Not everything written here is applicable to every single project. Indeed, rarely do I use 100% of everything I've written here. Sometimes, my projects end up being more software tool development oriented, and hence I use a lot of the software-oriented ideas. Sometimes my projects are one-off, and so I ease off on the reproducibility aspect. Most of the time, my projects require a lot of exploratory work beyond simple exploratory data analysis, and imposing structure early on can be stifling for the project.
So rather than see this collection of notes as something that we must impose on every project, I would encourage you to be picky and choosy, and use only what helps you, in a just-in-time fashion, to increase your effectiveness in a project. Just-in-time adoption of a practice or a tool is preferable, because doing so eases the pressure to be rigidly complete from the get-go. In my own work, I incorporate a practice into the project just-in-time as I sense the need for it.
Moreover, as my colleague Zachary Barry would say, none of these practices can be mastered overnight. It takes running into walls to appreciate why these practices are important. For an individual who has not yet encountered problems with disorganized code, multiple versions of the same dataset, and other issues I describe here, it is difficult to deeply appreciate why it matters to apply simple and basic software development practices to your data science work. So I would encourage you to use this knowledge base as a reference tool that helps you find out, in a just-in-time fashion, a practice or tool that helps you solve a problem.
If you wish to support the project, there are a few ways:
Firstly, I spent some time linearizing this content based on my experience guiding skilled newcomers to the DS world. That's available as an eBook on LeanPub. If you purchase a copy, you will get instructions to access the repository that houses the book source and automation to bootstrap each of your Python data science projects easily!
Secondly, you can support my data science education work on Patreon! My supporters get early access to the data science content that I make.
Finally, if you have a question regarding the content, please feel free to reach out on LinkedIn. (If I make substantial edits on the basis of your comments or questions, I might reach out to you to offer a free copy of the eBook!)
Install zsh and oh-my-zsh for shell hacks
The default shell on most Linux systems is bash. While bash is nice, in its vanilla state, you can't get a ton of information at a glance. For example, your bash shell might not be configured by default to:
To illustrate, at a vanilla bash shell, you usually are given:
username@hostname $
With some of the zsh themes, with minimal configuration, you get a shell that looks like this:
username@hostname at /path/to/current/dir (yyyy-mm-dd hh:ss) [+] $
The [+] at the end gives us the Git status of a directory that is also a Git repository.
The Z shell, or zsh, as well as other shells like the fish shell, are alternative shells that come pre-configured with themes you can apply; zsh additionally uses a largely Bash-compatible syntax, which makes it especially handy.
Of course, fancier bash prompts are one thing, but each shell comes with its own set of cool tooling to enhance your productivity at the terminal.
If you're on a fresh install of the latest versions of macOS, zsh is already your default shell. You probably want to then install oh-my-zsh. Follow the instructions on the project's webpage. Then, pick one of the themes that you like, and configure your shell that way.
Install homebrew on your machine
Your Mac comes with a lot of neat apps, but it's a bit crippled when it comes to shell utilities. (Linux machines can use Homebrew too! Read on to see when you might need it.)
As claimed, Homebrew is the missing package manager for the Mac. From it, you can get shell utilities and apps that don't come pre-installed on your computer, such as wget. Installing these shell utilities can give you a leg-up as you strive to gain mastery over your machine. (see: Install a suite of really cool utilities on your machine using homebrew)
Follow the instructions on the homebrew website, but essentially, it's a one bash command install. Usually, you would copy/paste it from the homebrew website, but I've copied it over so you don't have to context-switch:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
It can be executed anywhere, but if you're feeling superstitious, you can always move to your home directory first (cd ~) before executing the command.
If you're planning to install Anaconda (see: Install Anaconda on your machine), then make sure you install wget, as my bootstrap step for installing Anaconda relies on using wget to pull the installer from the internet.
brew install wget
You can also install some other cool utilities using brew! (see: Install a suite of really cool utilities on your machine using homebrew)
Linux machines usually come with their own package manager, such as yum on CentOS and apt on Ubuntu. If you have the necessary privileges to install packages, which usually means having sudo privileges on your machine, then you probably don't need to install Homebrew on Linux.
However, if you do not have sudo privileges on your machine, then you should consider installing Homebrew inside your home directory. This enables you to use brew to install Linux utilities that might not be built-in to your system. It's a pretty neat hack to have when you're working on a managed system, such as a high performance computing system.
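If you go down that route, one approach documented by the Homebrew project (the "untar anywhere" install) looks roughly like the sketch below; treat it as a starting point and check the official documentation for the current instructions:
# Unpack Homebrew into a directory you own, e.g. ~/homebrew
mkdir ~/homebrew
curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip-components 1 -C ~/homebrew
# Make brew available in your shell (add this line to your shell initialization script too)
eval "$($HOME/homebrew/bin/brew shellenv)"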
Leverage dotfiles to get your machine configured quickly
Your dotfiles control the baseline of your computing environment. Creating a dotfiles repository lets you version control it, make a backup of it on a hosted version control site (like Github or Bitbucket) and quickly deploy it to a new system.
It's really up to you, but you want to make sure that you capture all of the .some_file_extension files stored in your home directory that are also important for your shell runtime environment.
For example, you might want to include your .zshrc or your .bashrc files, i.e. the shell initialization scripts.
You might also want to refactor out some pieces from the .zshrc and put them into separate files that get sourced inside it. For example, I have two: one for the PATH environment variable named .path (see: Take full control of your shell environment variables) and one for aliases named .aliases (see: Create shell command aliases for your commonly used commands). I source these files in the .zshrc file, so everything defined in .path and .aliases is available to me.
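Concretely, the relevant lines in the .zshrc might look something like this sketch (adjust the paths to wherever you keep the files):
# Pull in PATH configuration and aliases from their own files
source ~/.path
source ~/.aliases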
You can also create an install.sh script that, when executed at the shell, symlinks all the files from the dotfiles directory into the home directory or copies them over. (I usually opt to symlink because I can apply updates more easily.) The install.sh script can be as simple as:
cp .zshrc $HOME/.zshrc
cp .path $HOME/.path
cp .aliases $HOME/.aliases
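If you prefer symlinking, as I usually do, a sketch of the equivalent install.sh (assuming you run it from inside the dotfiles repository) would be:
# Symlink dotfiles into the home directory; -f overwrites existing files or links
ln -sf "$(pwd)/.zshrc" "$HOME/.zshrc"
ln -sf "$(pwd)/.path" "$HOME/.path"
ln -sf "$(pwd)/.aliases" "$HOME/.aliases"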
Everything outlined above forms the basis of your bootstrap for a new computer, which I alluded to in Automate the bootstrapping of your new computer.
If you want to see a few examples of dotfiles in action, check out the following repositories and pages:
From the official "dotfiles" GitHub pages:
My own dotfiles: ericmjl/dotfiles which are inspired by mathiasbynens/dotfiles
Install and configure git on your machine
Git is an extremely important tool! We use it to do what is known as "version control" -- the act of explicitly curating and keeping track of changes that are made to files in a repository of text files. Using Git, you can even restore files to a previous state. It's like having an extremely powerful undo button at the command line.
Knowing Git also gets you access to the world of open source tooling available on hosted version control storage providers, like GitHub, GitLab, and more.
Linux systems usually come with git pre-installed.
On macOS, you can type git at the Terminal, and a pop-up will show up that prompts you to install Xcode and the developer tools for macOS. Accept it, and go about the rest of your day.
Sometimes, the built-in versions of git might be a bit outdated. If you want to install one of the latest versions of git, then you can use Homebrew to install Git. (see: Install homebrew on your machine)
You might want to configure git with some basic information.
For example, you might need to configure Git with your username and email address, so that your commits can be attributed to your user accounts on GitHub, GitLab, or Bitbucket. To do this:
git config --global user.name "My name in quotes"
git config --global user.email "myemail@address.com"
This sets your configuration to be "global". However, you can also have "local" (i.e. per-repository) configurations, by changing the --global flag to --local:
# inside a repository, say, your company's project
git config --local user.name "My name in quotes"
git config --local user.email "myemail@company.com"
Doing so is important because you want to ensure that your Git commits are tied to the appropriate email address. Setting the "global" one gives you the convenience of setting a sane default, which you can modify by setting "local", per-repository configuration.
If you installed the cool tools from "Install a suite of really cool utilities on your machine using homebrew", then you'll be thrilled to know that you can configure Git to use diff-so-fancy to render diffs!
Follow the instructions in the diff-so-fancy repository. As of 10 December 2020, my favored set of configurations is:
git config --global core.pager "diff-so-fancy | less --tabs=4 -RFX"
git config --global color.ui true
git config --global color.diff-highlight.oldNormal "red bold"
git config --global color.diff-highlight.oldHighlight "red bold 52"
git config --global color.diff-highlight.newNormal "green bold"
git config --global color.diff-highlight.newHighlight "green bold 22"
git config --global color.diff.meta "11"
git config --global color.diff.frag "magenta bold"
git config --global color.diff.commit "yellow bold"
git config --global color.diff.old "red bold"
git config --global color.diff.new "green bold"
git config --global color.diff.whitespace "red reverse"
Install Docker on your machine
Docker is used in a myriad of ways, but here are the main reasons I see for a data scientist to want to use Docker:
While conda environments give you everything that you need for the practice of data science on your local (or remote) machine, Docker containers give you the ultimate portability. From a technical standpoint, conda environments package up the data science stack but stop at the things that ship with the operating system, which your project might unavoidably depend on. Docker lets you ship an entire operating system plus anything else you install in it.
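To make the portability point concrete, here is one example of running a fully packaged data science environment, operating system and all, with a single command; the jupyter/scipy-notebook image comes from the Jupyter Docker Stacks project:
# Run a container that ships Python, Jupyter, and the scientific stack;
# port 8888 inside the container is exposed on port 8888 of your machine
docker run -it --rm -p 8888:8888 jupyter/scipy-notebook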
Docker (the company) has a few good reasons why you would want to use Docker (the tool), and you can read about them here. Those reasons are likely also applicable to your own work.
Install the Desktop client first. Don't worry about registering for an account on Docker Hub, though that can be useful later.
If you're on Linux, there are a few guides you can follow, my favourite being the curated guides from DigitalOcean.
If you do a quick Google search for "install docker" and tag on your operating system name, look out first for the Digital Ocean tutorials, which are in my opinion the best maintained.
Install Anaconda on your machine
Anaconda is a way to get a Python interpreter installed on your system.
One of the neat but oftentimes confusing things about Python is that you can have multiple Python executables living around on your system. Anaconda makes it easy for you to:
Why is this a good thing? Primarily because you might have individual projects that need different versions of Python and different versions of packages that are built for Python. Also, default Python installations, such as the ones shipped with older versions of macOS, tend to be versions behind the latest, which is to the detriment of your projects. Some built-in apps in an operating system may depend on that old version of Python (such as iPhoto), which means if you mess up the installation, you might break those built-in apps. Hence, you will want a tool that lets you easily create isolated Python environments.
The Anaconda Python distribution fulfills the following key needs:
Installing Anaconda on your local machine thus helps you get easy access to Python, Jupyter (see: Use Jupyter as an experimentation playground), and other tools for modelling and analysis.
To install the Miniforge variant of Anaconda, which is lighter-weight than the full Anaconda distribution, use the following commands:
cd ~
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" -O anaconda.sh
This will send you to your home directory and then download the Miniforge bash installer from the Miniforge GitHub releases page, saving it as anaconda.sh.
Now, install Anaconda:
bash anaconda.sh -b -p $HOME/anaconda/
This will install the Anaconda distribution of Python onto your system inside your home directory. You can now install packages at will, without needing sudo privileges!
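After the installer finishes, you may need to make conda visible to your shell. Here is a sketch, consistent with the PATH example earlier and assuming the $HOME/anaconda/ install location used above:
# Make the freshly installed conda available on your PATH
export PATH=$HOME/anaconda/bin:$PATH
# Optionally, let conda wire up your shell initialization script for you
conda init zsh   # or: conda init bash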
Install a suite of really cool utilities on your machine using homebrew
Install gcc if you want to have the GNU C compiler available on your Mac at the same time as the clang C compiler.
C compilers come in handy for some numerical computing packages, which multiple data science languages (Julia, Python, R) depend on.
If you've ever been disconnected from SSH because of a flaky internet connection, mosh can be your saviour. Check out the tool's homepage.
This is a tool for multiplexing your shell sessions -- uber handy if you want to persist a shell session on a remote server even after you disconnect. If you're of the type who has a habit of creating new shell sessions for every project, then tmux might be able to help you get things under control. Check out the tool's homepage.
The tree command line tool allows you to see the file tree at the terminal. If you pair it with exa, you will have an upgraded file tree experience. See its homepage.
exa is next-level ls (which is used to list files in a directory). According to the website, it is "a modern replacement for ls". See the homepage. If you alias ls to exa, it's next-level convenience! (see Create shell command aliases for your commonly used commands)
ripgrep provides a command line tool, rg, which recursively scans down the file tree from the current directory for files that contain text that you want to search. Its Github repo should reveal all of its secrets.
This gives you a tool for viewing differences between files, aka "diffs". Check out its Github repo for more information. You can also configure git to use diff-so-fancy to render diffs at the terminal. (see: Install and configure git on your machine)
bat is next-level cat, which is a utility for viewing text files in the terminal. Check out the Github repository for what you get. You can alias cat to bat, and in that way, not need to deviate from muscle memory to use bat.
fd gives you a faster replacement for the shell tool find, which you can use to find files by name. Check out the Github repository to learn more about it.
On recommendation from my colleague Arkadij Kummer, grab fzf to have extremely fast fuzzy text search on the filesystem. Check out the project's GitHub repository!
Use croc as a tool to move data from one machine to another easily and in a secure fashion. (I have used this in lieu of commercial utilities that cost tens of thousands of dollars in license fees.) Check out the project's GitHub repository!
Now that you've read about these utilities' reason for existence, go ahead and install them!
brew install \
git gcc tmux wget mobile-shell \
diff-so-fancy ripgrep bat fd fzf croc