Install Docker on your machine
Docker is used in a myriad of ways, but here are the main reasons I see for a data scientist to use Docker:
While conda environments give you everything that you need for the practice of data science on your local (or remote) machine, Docker containers give you the ultimate portability. From a technical standpoint, conda environments bundle data science packages but stop short of the things that ship with the operating system, which your project might unavoidably depend on. Docker lets you ship an entire operating system plus anything else you install in it.
Docker (the company) has a few good reasons why you would want to use Docker (the tool), and you can read about them here. Those reasons you read about on the Docker website are likely also applicable
Install the Desktop client first. Don't worry about registering for an account on Docker Hub, though that can be useful later.
If you're on Linux, there are a few guides you can follow, my favourite being the curated guides from DigitalOcean.
If you do a quick Google search for "install docker" and tag on your operating system name, look out first for the DigitalOcean tutorials, which are, in my opinion, the best maintained.
Use Docker containers for system-level packages
If conda environments are such a great environment isolation tool, why would we need Docker?
That's because sometimes your project might have an unavoidable dependency on system-level packages. I have seen some projects that use spatial mapping tooling require system-level packages. Others that depend on audio processing might require packages that can only be obtained outside of conda. In these cases, yes, installing them locally on your machine can be handy (see Install homebrew on your machine), but if you're also interested in building an app, then you'll need them packaged up inside a Docker container.
What is a Docker container? The best way to think about it is as a fully-fledged operating system completely insulated from its host (i.e. your computer). Strictly speaking, a container shares the host's kernel rather than running its own, but the mental model holds for day-to-day use. It has no knowledge of your runtime environment variables (see: Create runtime environment variable configuration files for each of your projects and Take full control of your shell environment variables). It's like having a completely clean operating system, without the cost of buying new hardware.
I'm assuming you've already installed Docker on your system (see: Install Docker on your machine).
The core thing you need to know how to write is a Dockerfile. This file specifies exactly how a Docker container is to be built. The easiest way to think about the Dockerfile syntax is that it's almost bash, with a little additional syntax. The Docker docs give an extremely thorough tutorial. For those who are more hands-on, I recommend pair coding with a more experienced individual who is willing to teach you the ropes, and building a Docker container together when it becomes relevant to your problem.
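To make the "almost bash" description concrete, here is a minimal sketch of what a Dockerfile for a conda-based project with one system-level dependency might look like. Everything project-specific in it is an assumption for illustration: the `continuumio/miniconda3` base image, the `libsndfile1` system package (standing in for whatever audio or spatial library your project needs), the `environment.yml` file, the environment name `my-project`, and the entry point `app.py`. Treat it as a starting point rather than a definitive recipe.

```dockerfile
# Start from a base image that already ships with conda.
# (Assumption: the continuumio/miniconda3 image, which is Debian-based.)
FROM continuumio/miniconda3

# Install the system-level packages that conda cannot provide.
# (Assumption: libsndfile1 is a placeholder for your project's real needs.)
RUN apt-get update \
    && apt-get install -y --no-install-recommends libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

# Work inside /app in the container.
WORKDIR /app

# Copy the environment specification first, then build the conda environment,
# so that this step is re-run only when environment.yml changes.
COPY environment.yml .
RUN conda env create -f environment.yml

# Copy the rest of the project into the image.
COPY . .

# Run the app inside the conda environment.
# (Assumption: environment.yml names the environment "my-project",
# and app.py is a hypothetical entry point.)
CMD ["conda", "run", "--no-capture-output", "-n", "my-project", "python", "app.py"]
```

Most lines after the instruction keyword (`RUN`, `COPY`, `CMD`) are ordinary shell commands or file paths, which is why the syntax feels familiar. With a file like this at the root of your project, `docker build -t my-project .` builds the image and `docker run my-project` starts a container from it; the tag `my-project` is again just a placeholder.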
Configure your machine
After getting access to your development machine, you'll want to configure it and take full control over how it works. Backing the following steps is a core set of ideas:
Head over to the following pages to see how you can get things going.