Take advantage of uv for one-off projects

As data scientists, we often create one-off scripts or notebooks for quick analyses, data exploration, or prototyping. These projects don't warrant a full-fledged development environment but still need proper dependency management. This is where uv shines, offering lightweight solutions for managing dependencies in scripts and notebooks without the overhead of creating virtual environments manually.

Using PEP 723 for script dependencies

Python Enhancement Proposal 723 (PEP 723) introduced a way to embed metadata directly in Python scripts as a specially formatted comment block. This approach is perfect for one-off scripts where you want to declare dependencies without maintaining separate configuration files.

Here's how you can use PEP 723-style inline script metadata:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "pandas>=2.0.0",
#   "matplotlib>=3.7.0",
#   "scikit-learn>=1.2.0",
# ]
# ///

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Your script code here

With this metadata in place, you can run your script using uv run:

uv run script.py

The beauty of this approach is that uv will automatically:

  1. Parse the inline metadata
  2. Install the specified dependencies in an isolated environment
  3. Run your script in that environment

This means you don't need to manually create a virtual environment, activate it, install dependencies, and then run your script. Everything happens in one command!
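
You don't even have to write the metadata block by hand: uv can manage it for you. Running uv add with the --script flag adds dependencies to a script's inline metadata, creating the block if it doesn't exist yet:

uv add --script script.py "pandas>=2.0.0" "matplotlib>=3.7.0"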

Self-contained notebooks

For data exploration and analysis, Jupyter notebooks are a popular choice. However, managing dependencies for notebooks can be challenging. Here are two excellent options that work well with uv:

Option 1: Using marimo with uvx

Marimo is a reactive notebook environment that offers several advantages over traditional Jupyter notebooks. You can launch it with uvx (uv's tool runner, an alias for uv tool run, which executes command-line tools in temporary isolated environments) as follows:

uvx marimo edit --sandbox /path/to/notebook.py

Inside your marimo notebook, you can specify dependencies using the same PEP 723-style metadata:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "pandas>=2.0.0",
#   "matplotlib>=3.7.0",
#   "scikit-learn>=1.2.0",
# ]
# ///

# Your notebook cells here

The --sandbox flag ensures that marimo creates an isolated environment for your notebook, using the dependencies specified in the metadata.
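
Because marimo notebooks are plain Python files, the metadata block and the notebook cells live together in a single script. Here's a minimal sketch of what such a file might look like; the exact boilerplate marimo generates can differ between versions, and the cell contents are purely illustrative:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "marimo",
#   "pandas>=2.0.0",
# ]
# ///

import marimo

app = marimo.App()

@app.cell
def _():
    import pandas as pd

    # A toy DataFrame; the last expression in a cell is displayed
    df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
    df
    return (df,)

if __name__ == "__main__":
    app.run()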

Option 2: Using juv for Jupyter notebooks

If you prefer traditional Jupyter notebooks, juv (Jupyter with uv) provides a seamless way to manage dependencies:

uvx juv init /path/to/notebook.ipynb

This command initializes the notebook with inline dependency metadata that uv understands. Dependencies are declared in a code cell at the top of the notebook, using the same PEP 723-style format:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "pandas>=2.0.0",
#   "matplotlib>=3.7.0",
#   "scikit-learn>=1.2.0",
# ]
# ///
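
From there, juv's CLI manages everything else. Based on its documented commands (verify with uvx juv --help), adding a dependency and launching the notebook in an isolated environment looks like this:

uvx juv add /path/to/notebook.ipynb pandas
uvx juv run /path/to/notebook.ipynb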

Benefits of this approach

Using uv for one-off projects offers several advantages:

  1. Simplicity: No need to create and manage separate configuration files or virtual environments
  2. Speed: uv is significantly faster than pip for installing dependencies
  3. Isolation: Each script or notebook runs in its own isolated environment, preventing dependency conflicts
  4. Reproducibility: Dependencies are explicitly declared within the script or notebook itself, and can even be locked to exact versions (see the example after this list)
  5. Portability: Scripts and notebooks are self-contained, making them easier to share with colleagues
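
On the reproducibility point: recent versions of uv can also lock a script's dependencies to exact versions. Assuming your uv release supports it (check uv lock --help), the following writes a script.py.lock file next to the script, and subsequent uv run invocations will resolve against it:

uv lock --script script.py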

Real-world examples

Here are some realistic examples of how you might use uv for one-off projects.

Data cleaning script

This example shows how to use uv for a simple data cleaning script.

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "pandas>=2.0.0",
#   "numpy>=1.24.0",
#   "loguru>=0.7.0",
# ]
# ///

import pandas as pd
import numpy as np
from loguru import logger
from pathlib import Path

logger.info("Starting data cleaning process")

# Load data
data_path = Path("data/raw/sales_data.csv")
df = pd.read_csv(data_path)

# Cleaning operations
df = df.dropna()
df["sale_date"] = pd.to_datetime(df["sale_date"])
df["revenue"] = df["quantity"] * df["price"]

# Save cleaned data
output_path = Path("data/processed/sales_data_clean.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(output_path, index=False)

logger.info(f"Cleaned data saved to {output_path}")

Exploratory data analysis notebook

This example shows how to use uv for an exploratory data analysis notebook.

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "pandas>=2.0.0",
#   "matplotlib>=3.7.0",
#   "seaborn>=0.12.0",
#   "scikit-learn>=1.2.0",
# ]
# ///

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from pathlib import Path

# Load the dataset
data_path = Path("data/processed/customer_data.csv")
df = pd.read_csv(data_path)

# EDA code follows...
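
To make this concrete, here is a minimal sketch of what those EDA steps might look like. The code only assumes the dataset has some numeric columns, since the schema of customer_data.csv is hypothetical:

# Quick look at distributions and summary statistics
print(df.describe())

# Pairwise relationships between numeric columns
sns.pairplot(df.select_dtypes(include="number"))
plt.savefig("pairplot.png")

# Project the numeric features onto the first two principal components
numeric = df.select_dtypes(include="number").dropna()
pca = PCA(n_components=2)
components = pca.fit_transform(numeric)

plt.figure()
plt.scatter(components[:, 0], components[:, 1], alpha=0.5)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Customers projected onto the first two principal components")
plt.savefig("pca_scatter.png")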

Conclusion

For one-off projects, uv provides an elegant solution that balances simplicity with proper dependency management. By using PEP 723-style inline metadata, you can create self-contained scripts and notebooks that are easy to run, share, and reproduce. Whether you're using plain Python scripts, marimo notebooks, or Jupyter notebooks with juv, this approach streamlines your workflow and lets you focus on the analysis rather than environment management.