Secure Script Execution for LLM Agents
A secure pattern for executing code written by LLM agents in local environments.
Overview
This design document describes an approach for letting LLM agents write and execute Python code in a sandbox. Each script declares its interpreter and dependencies with PEP 723 inline metadata and runs inside a locked-down Docker container.
Core Components
1. Script Metadata
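Scripts are written with PEP 723 inline metadata headers. The standard fields `requires-python` and `dependencies` declare the interpreter and packages. PEP 723 reserves the `[tool]` table for tool-specific data, so the audit fields go in a subtable there (the `agent` table name is illustrative):
```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
#
# [tool.agent]
# auth = "agent-id-hash"
# purpose = "task-description"
# timestamp = "iso-timestamp"
# ///
# agent code here
```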
2. Docker Configuration
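The execution environment uses Astral's uv image, which ships Python with uv preinstalled for fast dependency resolution and installation:
```dockerfile
# Use a Python image with uv pre-installed
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
# Working directory for script execution
WORKDIR /app
# Enable bytecode compilation
ENV UV_COMPILE_BYTECODE=1
# Copy from the cache instead of linking since it's a mounted volume
ENV UV_LINK_MODE=copy
# Keep uv's cache in a writable location: the root filesystem is read-only at runtime
# and the `nobody` user has no writable home directory
ENV UV_CACHE_DIR=/tmp/uv-cache
# Reset the entrypoint, don't invoke `uv`
ENTRYPOINT []
# Run as non-root user for security
USER nobody
# Run the mounted script with uv, which reads its PEP 723 metadata
# (the /scripts mount path is illustrative; the executor can override CMD)
CMD ["uv", "run", "/scripts/script.py"]
```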
3. Script Execution Flow
- Agent generates Python code with metadata
- Code is written to a temporary directory
- Docker container is started with security constraints
- Script is executed in isolated environment
- Results are captured and returned to agent
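The flow above can be sketched as a single function that writes the code to a temporary directory, runs it with a timeout, and returns a structured result. `execute_in_sandbox` and `local_python_command` are hypothetical names. The local runner provides no isolation at all; it exists only so the flow can be exercised without Docker. In production, `build_command` would instead return a `docker run` invocation that mounts the temporary directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any, Callable


def execute_in_sandbox(
    code: str,
    build_command: Callable[[Path], list[str]],
    timeout: int = 30,
) -> dict[str, Any]:
    """Write `code` to a temp dir, run it, and return a structured result."""
    with tempfile.TemporaryDirectory() as tmp:
        script_path = Path(tmp) / "script.py"
        script_path.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                build_command(script_path),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising
            return {"status": "timeout", "stdout": "", "stderr": "", "returncode": None}
        return {
            "status": "ok" if proc.returncode == 0 else "error",
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "returncode": proc.returncode,
        }


def local_python_command(script_path: Path) -> list[str]:
    """Unsandboxed runner for testing the flow only."""
    return [sys.executable, str(script_path)]
```

One caveat when `build_command` returns a `docker run` command: the timeout only kills the Docker client process, not the container. A real executor should start the container with a known `--name` and run `docker kill` on it after a timeout.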
4. Security Measures
The implementation includes multiple layers of security:
- Container Restrictions:
  - Read-only root filesystem
  - No network access by default; declared dependencies must therefore already be available offline (for example, pre-installed or cached) or be fetched in a separate, network-enabled step
  - Limited CPU (1 core) and memory (512 MB)
  - Script directory mounted read-only
  - Results directory is the only persistent writable mount
  - All Linux capabilities dropped
  - No privilege escalation
- Code Validation:
  - PEP 723 metadata verification
  - Execution timeout enforcement
  - Structured error handling
  - Output validation
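The container restrictions map onto standard `docker run` flags. The helper below assembles them into an argument list; the function name, the `/scripts` and `/results` mount points, and the container name are hypothetical, while the flags themselves are regular Docker CLI options. Because of `--network none`, `uv run` cannot download anything, so scripts that declare dependencies still need one of the offline strategies noted above.

```python
from pathlib import Path


def docker_run_args(
    image: str,
    script_dir: Path,
    results_dir: Path,
    script_name: str = "script.py",
    container_name: str = "agent-sandbox",
) -> list[str]:
    """Build a `docker run` argument list applying the container restrictions."""
    return [
        "docker", "run", "--rm",
        "--name", container_name,                # lets the executor `docker kill` it
        "--read-only",                           # read-only root filesystem
        "--tmpfs", "/tmp:rw,exec",               # scratch space, e.g. for UV_CACHE_DIR
        "--network", "none",                     # no network access
        "--cpus", "1",                           # 1 CPU core
        "--memory", "512m",                      # 512 MB of RAM
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "-v", f"{script_dir.resolve()}:/scripts:ro",   # scripts: read-only
        "-v", f"{results_dir.resolve()}:/results:rw",  # results: writable
        image,
        # Overrides the image's CMD with the mounted script
        "uv", "run", f"/scripts/{script_name}",
    ]
```

Building an argument list (rather than a shell string) lets it go straight to `subprocess.run` without shell quoting issues.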
Implementation
The core implementation consists of two main classes:
- ScriptMetadata: Pydantic model for script metadata
- ScriptExecutor: Handles script writing and secure execution
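A minimal sketch of the metadata side follows. To stay dependency-free it uses a dataclass as a stand-in for the Pydantic model, and the `to_header` method, the `toml_string` helper, and the `[tool.agent]` table name are all illustrative assumptions. Escaping matters here: an agent-supplied `purpose` containing a newline must not be able to break out of the comment block, so control characters are emitted as TOML `\uXXXX` escapes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def toml_string(value: str) -> str:
    """Quote `value` as a TOML basic string, escaping quotes, backslashes, controls."""
    out = []
    for ch in value:
        if ch in '"\\':
            out.append("\\" + ch)
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append(f"\\u{ord(ch):04X}")  # keeps the header on one line per field
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


@dataclass
class ScriptMetadata:
    """Dependency-free stand-in for the Pydantic ScriptMetadata model."""
    requires_python: str = ">=3.11"
    dependencies: list[str] = field(default_factory=list)
    auth: str = ""
    purpose: str = ""
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def to_header(self) -> str:
        """Render a PEP 723 `script` block to prepend to the agent's code."""
        lines = [
            "# /// script",
            f"# requires-python = {toml_string(self.requires_python)}",
            "# dependencies = [",
            *(f"#   {toml_string(dep)}," for dep in self.dependencies),
            "# ]",
            "#",
            "# [tool.agent]",
            f"# auth = {toml_string(self.auth)}",
            f"# purpose = {toml_string(self.purpose)}",
            f"# timestamp = {toml_string(self.timestamp.isoformat())}",
            "# ///",
        ]
        return "\n".join(lines) + "\n"
```

Under this sketch, a `write_script` method would write `metadata.to_header() + code` into the temporary directory.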
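Example usage (the `@tool` decorator is assumed to come from LangChain, and the import path for the two classes is hypothetical):
```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
from uuid import uuid4

from langchain_core.tools import tool  # assumed agent framework

# Hypothetical module containing the two classes described above
from sandbox import ScriptExecutor, ScriptMetadata


@tool
def write_and_execute_script(
    code: str,
    python_version: str = ">=3.11",
    dependencies: Optional[List[str]] = None,
    purpose: str = "",
    timeout: int = 30,
) -> Dict[str, Any]:
    """Write and execute a Python script in a secure sandbox."""
    # Record who/why/when alongside the dependency declaration
    metadata = ScriptMetadata(
        requires_python=python_version,
        dependencies=dependencies or [],
        auth=str(uuid4()),  # per-call ID; use a stable agent ID hash if one exists
        purpose=purpose,
        timestamp=datetime.now(timezone.utc),
    )
    executor = ScriptExecutor()
    # Write header + code to a temporary directory, then run it in the container
    script_path = executor.write_script(code, metadata)
    return executor.run_script(script_path, timeout)
```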
Benefits
- Security: Multiple layers of isolation and restrictions
- Flexibility: Agents can write custom code solutions
- Dependency Management: Clean environment for each execution
- Resource Control: Per-container CPU, memory, and execution-time limits
- Auditability: Metadata tracking and logging
Testing
The test suite covers:
- Script writing and metadata handling
- Execution in sandbox environment
- Timeout enforcement
- Error handling
- Dependency management
- Resource restrictions
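As an illustration of the timeout and error-handling tests, the sketch below uses plain test functions (pytest can collect them as-is). They target a hypothetical `run_with_timeout` helper that runs code in a local child interpreter, standing in for the Docker-backed executor so the tests can run without Docker.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout: float) -> str:
    """Run `code` in a child interpreter; return "ok", "error" or "timeout"."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


def test_timeout_is_enforced() -> None:
    # A script that sleeps past the limit must be reported as a timeout
    assert run_with_timeout("import time; time.sleep(10)", timeout=1) == "timeout"


def test_fast_script_completes() -> None:
    assert run_with_timeout("print('done')", timeout=10) == "ok"


def test_error_is_reported() -> None:
    # An uncaught exception gives a non-zero exit code
    assert run_with_timeout("raise RuntimeError('boom')", timeout=10) == "error"
```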
Future Enhancements
Potential areas for improvement:
- Network access controls for specific domains
- Resource usage monitoring and logging
- Script validation and static analysis
- Caching of commonly used dependencies
- Support for additional runtime environments
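For script validation and static analysis, one possible starting point is an import check built on the standard `ast` module. The denylist and function name below are illustrative assumptions. Checks like this are easy to bypass (for example via `__import__` or `importlib`), so they could only complement the container restrictions, not replace them. Code that does not parse raises `SyntaxError`, which can itself be reported back to the agent.

```python
import ast

# Illustrative denylist; a real policy would be tuned to the deployment
DENIED_MODULES = {"ctypes", "socket", "subprocess"}


def find_denied_imports(code: str, denied: set[str] = DENIED_MODULES) -> list[str]:
    """Return top-level module names imported by `code` that appear in `denied`."""
    tree = ast.parse(code)  # raises SyntaxError on invalid code
    found: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # "ctypes.util" -> "ctypes"
            if top in denied and top not in found:
                found.append(top)
    return found
```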