Take advantage of uv for one-off projects
As data scientists, we often create one-off scripts or notebooks for quick analyses, data exploration, or prototyping. These projects don't warrant a full-fledged development environment but still need proper dependency management. This is where uv shines, offering lightweight solutions for managing dependencies in scripts and notebooks without the overhead of creating virtual environments manually.
Using PEP 723 for script dependencies
Python Enhancement Proposal 723 (PEP 723) introduced a way to specify metadata directly within Python scripts using inline comments. This approach is perfect for one-off scripts where you want to declare dependencies without creating separate configuration files.
Here's how you can use PEP 723-style inline script metadata:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pandas>=2.0.0",
#     "matplotlib>=3.7.0",
#     "scikit-learn>=1.2.0",
# ]
# ///
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Your script code here
With this metadata in place, you can run your script using uv run:
uv run script.py
The beauty of this approach is that uv will automatically:
- Parse the inline metadata
- Install the specified dependencies in an isolated environment
- Run your script in that environment
This means you don't need to manually create a virtual environment, activate it, install dependencies, and then run your script. Everything happens in one command!
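You don't even have to write the metadata block by hand: uv can scaffold and edit it for you. A minimal sketch of that workflow (the script name analysis.py is illustrative):
# Create a script with an inline metadata block
uv init --script analysis.py --python 3.11
# Record dependencies in the script's metadata
uv add --script analysis.py pandas matplotlib scikit-learn
# Run it in an automatically provisioned, isolated environment
uv run analysis.py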
Self-contained notebooks
For data exploration and analysis, Jupyter notebooks are a popular choice. However, managing dependencies for notebooks can be challenging. Here are two excellent options that work well with uv:
Option 1: Using marimo with uvx
Marimo is a reactive notebook environment that offers several advantages over traditional Jupyter notebooks. You can launch it through uvx (uv's tool runner, shorthand for uv tool run) as follows:
uvx marimo edit --sandbox /path/to/notebook.py
Inside your marimo notebook, you can specify dependencies using the same PEP 723-style metadata:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pandas>=2.0.0",
#     "matplotlib>=3.7.0",
#     "scikit-learn>=1.2.0",
# ]
# ///
# Your notebook cells here
The --sandbox flag ensures that marimo creates an isolated environment for your notebook, using the dependencies specified in the metadata.
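Because marimo stores notebooks as plain Python files, the same file can also be run top to bottom as a script, or served as a read-only app, still inside the sandbox. A sketch, assuming the notebook lives at notebook.py and lists marimo among its dependencies (the sandbox normally adds it for you):
# Execute the notebook as a regular script; uv reads the inline metadata
uv run notebook.py
# Serve the notebook as an app in an isolated environment
uvx marimo run --sandbox notebook.py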
Option 2: Using juv for Jupyter notebooks
If you prefer traditional Jupyter notebooks, juv (Jupyter with uv) provides a seamless way to manage dependencies:
uvx juv init /path/to/notebook.ipynb
This command creates a notebook whose first cell carries uv-managed inline metadata. You can then declare dependencies in that cell using PEP 723-style metadata:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pandas>=2.0.0",
#     "matplotlib>=3.7.0",
#     "scikit-learn>=1.2.0",
# ]
# ///
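In practice you rarely edit that cell by hand; juv can manage it and launch the notebook for you. A minimal sketch of the typical workflow (the notebook path and packages are illustrative):
# Create a notebook with an embedded metadata cell
uvx juv init analysis.ipynb
# Record dependencies in the notebook's inline metadata
uvx juv add analysis.ipynb pandas matplotlib scikit-learn
# Launch Jupyter in an isolated environment with those dependencies
uvx juv run analysis.ipynb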
Benefits of this approach
Using uv for one-off projects offers several advantages:
- Simplicity: No need to create and manage separate configuration files or virtual environments
- Speed: uv is significantly faster than pip at installing dependencies
- Isolation: Each script or notebook runs in its own isolated environment, preventing dependency conflicts
- Reproducibility: Dependencies are explicitly declared within the script or notebook itself (see the snippet after this list)
- Portability: Scripts and notebooks are self-contained, making them easier to share with colleagues
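To push reproducibility further, uv also reads a [tool.uv] table inside the metadata block. For example, exclude-newer limits resolution to packages published before a given date, so rerunning the script months later resolves to the same versions. A sketch (the cutoff date is illustrative):
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pandas>=2.0.0",
# ]
#
# [tool.uv]
# exclude-newer = "2025-01-01T00:00:00Z"
# ///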
Real-world examples
Here are some realistic examples of how you might use uv for one-off projects.
Data cleaning script
This example shows how to use uv for a simple data cleaning script.
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pandas>=2.0.0",
#     "loguru>=0.7.0",
# ]
# ///
from pathlib import Path
import pandas as pd
from loguru import logger
logger.info("Starting data cleaning process")
# Load data
data_path = Path("data/raw/sales_data.csv")
df = pd.read_csv(data_path)
# Cleaning operations
df = df.dropna()
df["sale_date"] = pd.to_datetime(df["sale_date"])
df["revenue"] = df["quantity"] * df["price"]
# Save cleaned data
output_path = Path("data/processed/sales_data_clean.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(output_path, index=False)
logger.info(f"Cleaned data saved to {output_path}")
Exploratory data analysis notebook
This one shows how to use uv for an exploratory data analysis notebook.
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pandas>=2.0.0",
#     "matplotlib>=3.7.0",
#     "seaborn>=0.12.0",
#     "scikit-learn>=1.2.0",
# ]
# ///
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
# Load the dataset
data_path = Path("data/processed/customer_data.csv")
df = pd.read_csv(data_path)
# EDA code follows...
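With the metadata in place, launching the notebook is just as simple; depending on the format you chose, either of the following works (paths are illustrative):
uvx juv run eda.ipynb
uvx marimo edit --sandbox eda.py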
Conclusion
For one-off projects, uv provides an elegant solution that balances simplicity with proper dependency management. By using PEP 723-style inline metadata, you can create self-contained scripts and notebooks that are easy to run, share, and reproduce. Whether you're using plain Python scripts, marimo notebooks, or Jupyter notebooks with juv, this approach streamlines your workflow and lets you focus on the analysis rather than environment management.