Configure Jupyter and Jupyter Lab
Project Jupyter is a cornerstone of the data science ecosystem. When getting set up to work with Jupyter, here are some settings that you may wish to configure if you are running your own Jupyter server (and don't have a managed one provided).
The best way to learn about Jupyter's configuration system is to read the official documentation. However, I will strive to provide an up-to-date overview here.
You can find Jupyter's configuration files in your home directory, under the directory ~/.jupyter/. They configure Jupyter's server, which is what runs when you execute jupyter lab at the terminal. (As of Jan 2021, the lab interface is the future of Jupyter.)
To configure Jupyter's behaviour, you usually start by generating the configuration file:
jupyter lab --generate-config
This creates a file at ~/.jupyter/jupyter_lab_config.py. You can edit that file to configure Jupyter's behaviour.
By default, Jupyter's server listens on 127.0.0.1, the address that localhost aliases. Usually, this is not a problem if you run the server on your local machine and access it with a browser on the same device. However, this configuration can get troublesome if you run Jupyter on a remote workstation or server.
When configured to listen on 127.0.0.1, the Jupyter server will only accept browser requests originating from the local machine; it will deny browser requests that originate from another device. However, when the Jupyter server listens on 0.0.0.0 instead, it will be able to accept browser requests that originate from outside the server itself (e.g. your laptop).
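As a sketch, the change goes in the generated config file; note that older Jupyter versions use c.NotebookApp rather than c.ServerApp:

```python
## In ~/.jupyter/jupyter_lab_config.py
c.ServerApp.ip = "0.0.0.0"  # accept connections from other devices on the network
```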
Are there risks to serving the notebook up on
0.0.0.0? Yes, the primary one is that your notebook server is now accessible to anybody on the same network as the machine that is serving up Jupyter. To mitigate that risk, Jupyter has "token-based authentication" turned on by default; you might also want to consider turning on password protection.
You can turn on password protection by running the following command at the terminal:
jupyter lab password
Jupyter will prompt you for a password and store a hash of the password in the file
~/.jupyter/jupyter_server_config.json. (You don't need to edit this file!)
When you enable password protection, you will be prompted for a password when you access the Jupyter lab interface.
Sometimes you and your colleagues might share the same workstation and run your Jupyter servers on there. Because the default server configuration starts at port
8888, you might end up in a scenario where you and your colleague are both claiming ports 8888, 8889, and so on, and it becomes unclear whose server lives on which port (primarily if you practice running one Jupyter server per project).
To avoid this scenario, you can agree with your colleague that they can keep their configuration, and you can start your port numbers at 9000 (or 10000). To do so, open up
~/.jupyter/jupyter_lab_config.py and set the config traitlet below:
c.ServerApp.port = 9000
Now, your port numbers will start from port 9000 upwards, helping to keep your port number range and your colleagues' port number range effectively separated.
Configure your machine
After getting access to your development machine, you'll want to configure it and take full control over how it works. A core set of ideas backs the following steps; head over to the following pages to see how you can get things going.
Create one conda environment per project
If you have multiple projects that you work on but install all project dependencies into a shared environment, then I guarantee you that at some point you will run into dependency conflicts as you upgrade or update packages to try out new things.
"So what?" you might ask. Well, you'll end up breaking your code! Take this word of advice from someone who has had to deal with code in one project breaking even as the code in another project kept working -- and who found out one day before an important presentation, with figures still to produce. The horror!
You will want to ensure that you have an isolated conda environment for each project to keep your projects insulated from one another.
Here is a baseline that you can copy and modify at any time.
name: project  ## CHANGE THIS TO YOUR ACTUAL PROJECT
channels:  ## Add any other channels below if necessary
- conda-forge
dependencies:  ## Prioritize conda packages
- python=3.8
- jupyter
- conda
- mamba
- ipython
- ipykernel
- numpy
- matplotlib
- scipy
- pandas
- pip
- pre-commit
- black
- nbstripout
- mypy
- flake8
- pycodestyle
- pydocstyle
- pytest
- pytest-cov
- pytest-xdist
- pip:  ## Add in pip packages if necessary
  - mkdocs
  - mkdocs-material
  - mkdocstrings
  - mknotebooks
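With that baseline saved as environment.yml, creating and activating the environment looks like this (assuming conda is already installed, and using the placeholder name "project" from the file above):

```
conda env create -f environment.yml
conda activate project  # use whatever name you set in environment.yml
```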
If a package exists on both conda and pip and you rely primarily on conda, then I recommend prioritizing the conda package over the pip package. The advantage here is that conda's dependency solver can grab the latest compatible version without you worrying about pip clobbering other dependencies. (h/t my reviewer Simon, who pointed out that newer versions of pip have a dependency solver too. As far as possible, staying consistent with one tool is preferable, though mixing-and-matching is alright if you know what you're doing.)
This baseline helps me bootstrap conda environments. The packages that are in there each serve a purpose. You can read more about them on the page: Install code checking tools to help write better code.
Initially, I only specify the version of Python I want, and allow the conda package manager to solve the environment.
However, there may come a time when a new package version brings a new capability. That is when you may wish to pin the version of that particular package to be at the minimum that version. (See below for the syntax needed to pin a version.) At the same time, the new package version may break compatibility -- in this case, you will want to pin it to a maximum package version.
It's not always obvious which situation you're in, though, so be sure to use version control.
If you wish, you can also pin versions to a minimum, maximum, or specific one, using version modifiers: == pins an exact version, >= sets a minimum, and <= sets a maximum. (Note: for pip, an exact pin is double equals == and not the single equals = that conda uses.)
So when do you use each of the modifiers?
Use == sparingly while in development: you will be stuck with a particular version and will find it difficult to update other packages together with it.
Use <= to prevent conda or pip from upgrading a package beyond a certain version. This can be helpful if new versions of packages you rely on have breaking API changes.
Use >= to prevent conda or pip from installing a package below a certain version. This is helpful if you've come to depend on API changes introduced in newer versions.
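In an environment.yml file, the modifiers look like this (the package versions here are illustrative only, not recommendations):

```
dependencies:
- python=3.8         # conda: single equals pins an exact version
- pandas>=1.2        # at least this version
- matplotlib<=3.3    # at most this version
- pip:
  - mkdocs==1.1      # pip: double equals for an exact pin
```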
Upgrading and/or installing packages should be done on an as-needed basis. I have found two paths to upgrade packages:
The principled way to do an upgrade is to first pin the version inside
environment.yml, and then use the following command to update the environment:
conda env update -f environment.yml
The hacky way to do the upgrade is to directly pip install the package, and then add it (or modify its version) in the environment.yml file. Do this only if you know what you're doing!
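The hacky path can be sketched as follows; the package name and version here are placeholders, not a real recommendation:

```
# Upgrade in place first... ("somepackage" is a placeholder)
pip install --upgrade "somepackage>=2.0"

# ...then record the change in environment.yml so the
# environment stays reproducible:
#   - pip:
#     - somepackage>=2.0
```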
If you practice "one project gets one environment", then ensuring that each environment's Python interpreter is available to Jupyter is crucial. If you find that your project environment's Python is unavailable in Jupyter, first ensure that the environment has the ipykernel package. (If not, install it by hand and add it to the environment.yml file.) Then, run the following command:
# assuming you have already activated your environment,
# replace $ENVIRONMENT_NAME with your environment's name.
python -m ipykernel install --user --name $ENVIRONMENT_NAME
Now, the environment will show up as a "kernel" for executing Python code in your Jupyter notebooks. (See Configure Jupyter and Jupyter Lab for more information on how to configure it.)
Now, how should you name your conda environment? See the page: Sanely name things consistently!