Safe automation with coding agents
I discovered the power of autonomous agents the hard way. I asked Comet, an agentic browser, "how to archive repo" in the same casual way I'd ask Google. The agent interpreted this as a direct command and archived my LlamaBot repository. What I wanted was information; what I got was an unintended action with real consequences.
This incident taught me something important: coding agents promise to unlock significant productivity gains by working autonomously in the background, gathering context, running tests, searching documentation, and making progress on tasks without constant human intervention. The more autonomous they become, the more value they deliver. Yet this autonomy creates a fundamental tension: we need agents to act independently to realize their potential, but we must prevent them from taking irreversible actions we don't want.
The problem isn't unique to Comet. Any coding agent with sufficient autonomy can make destructive changes: deleting files, force-pushing to main, committing broken code, or modifying critical configurations. We need safeguards that allow agents to work freely on safe operations while blocking potentially harmful actions. The solution lies in configuring your development environment with intelligent boundaries, auto-approving read-only commands while requiring explicit approval for anything that modifies state.
I wrote about practical strategies for safe autonomous agent operation in a blog post, but here's how these practices fit into your data science workflow. This connects to the automation philosophy we discussed earlier. We want to automate repetitive tasks, but we need to do it safely. Auto-approving safe operations is automation that eliminates the friction of constant approval requests, while still maintaining control over potentially destructive actions.
Auto-approve safe command line commands
The foundation of autonomous coding agent operation is allowing certain command line commands to run without manual approval. Commands like grep/ripgrep, find/fd, pixi run pytest..., and similar read-only or context-gathering operations enable LLM agents to autonomously understand codebases and test suites. For CLI tools that interact with external services, I also auto-approve gh pr view, which allows the agent to gather context from GitHub pull requests while working in the background.
The critical rule: only auto-accept commands that are non-destructive. Never auto-approve git commit, git push, rm, or other filesystem, git, or state-modifying changes. This creates a safe boundary where agents can explore and learn, but cannot make irreversible changes without your explicit approval.
The way I think about this is simple: if a command only reads state, it's safe to auto-approve. If it modifies state, it requires my explicit approval. Read operations like grep, find, and cat fall into the safe category because they're pure reads that can't change anything. The same goes for code analysis tools like pytest, mypy, and ruff check when run without fix flags; they report issues but don't modify files. Context-gathering commands like gh pr view, git log, and git diff are also safe because they only display information.
The unsafe category includes anything that changes state. File system mutations like rm, mv, and cp can cause irreversible damage. Git write operations like git commit and git push modify repository state in ways that affect collaborators. Package installs like pixi add change your environment. These all require my explicit approval before execution.
The edge cases are where it gets interesting. I auto-approve pytest because test runs are read-only, but I require approval for any command that modifies files, even when the change is technically reversible. git add is a bit of a gray area, but I'm comfortable auto-approving it: it's easy to undo, and coding agents are often much faster than I am at selectively adding files to the staging area.
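To make that boundary concrete, here's a minimal Python sketch of the decision rule, written as a hypothetical pre-execution guard. The function name and the specific allowlist and denylist entries are my own illustrative choices, not a feature of Cursor or Claude Code, which have their own approval settings.

```python
import shlex

# Illustrative allowlist and denylist; adjust to your own tools and risk tolerance.
SAFE_PREFIXES = (
    "grep", "rg", "find", "fd", "cat",
    "git status", "git log", "git diff", "git add",  # git add: a judgment call, as discussed above
    "gh pr view", "gh run list", "gh run view",
    "pixi run pytest", "pytest", "mypy", "ruff check",
)
UNSAFE_PREFIXES = (
    "rm", "mv", "cp",
    "git commit", "git push",
    "pixi add", "pip install",
)


def requires_approval(command: str) -> bool:
    """Return True if a command should wait for explicit human approval."""
    tokens = shlex.split(command)
    normalized = " ".join(tokens)
    if "--fix" in tokens:  # e.g. ruff check --fix rewrites files, so it is no longer a pure read
        return True
    if any(normalized.startswith(prefix) for prefix in UNSAFE_PREFIXES):
        return True
    # Default to asking: anything not explicitly recognized as read-only needs approval.
    return not any(normalized.startswith(prefix) for prefix in SAFE_PREFIXES)


print(requires_approval("git diff --stat"))       # False: pure read, safe to auto-approve
print(requires_approval("git push origin main"))  # True: modifies shared state
```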
Enable automatic web search
For Cursor and Claude Code, automatic web search without approval requests is another powerful capability. I have web search auto-approved on my machine, which allows agents to look up documentation, error messages, and solutions independently. This is particularly valuable when agents encounter unfamiliar error messages or need to check current API documentation that may have changed since the model's training cutoff.
However, I monitor outputs for prompt poisoning, a known attack vector for AI systems that pull in content from the internet. The risk is that malicious content from web searches could influence the agent's behavior in subsequent actions. I've found this risk manageable for coding tasks, but I'm more cautious with agents that have broader system access or handle sensitive data. This is another form of safe automation: letting the agent gather information autonomously while maintaining awareness of potential risks.
Know your emergency stop shortcuts
Every coding agent platform provides keyboard shortcuts to cancel actions in progress. These are essential when you notice an agent looping, going down an unproductive path, or making changes you don't want. In Cursor, it's Ctrl+C. In VSCode with GitHub Copilot, it's Cmd+Esc. In Claude Code, it's Esc. If you're monitoring the agent's activity, these shortcuts let you intervene immediately when something goes wrong.
Correct agent behavior in real-time
When you catch an agent doing something undesirable, stop it immediately, then redirect it. I instruct agents to record corrections in AGENTS.md and continue with the updated guidance. I'll say something like: "No, I don't want you to do this thing. Instead, you should do that different thing. Record this in AGENTS.md, and then continue what you were doing."
This approach creates a persistent record of preferences that improves future agent behavior. The AGENTS.md file becomes a living document of your development standards and preferences, which agents can reference in future sessions. I've implemented this pattern in my personal productivity MCP server, which provides a standardized way to store and retrieve these preferences across different agent platforms. This connects back to the single source of truth philosophy: AGENTS.md becomes the authoritative record of how agents should behave, updated in real-time as you discover preferences.
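As an illustration of how such a preference store might look, here is a minimal MCP server sketch using the MCP Python SDK's FastMCP helper. The server name, tool names, and the choice to append directly to AGENTS.md are assumptions made for the example, not the actual implementation of my server.

```python
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("agent-preferences")  # illustrative server name

AGENTS_FILE = Path("AGENTS.md")  # assumed to live at the repository root


@mcp.tool()
def record_preference(correction: str) -> str:
    """Append a correction or preference to AGENTS.md so future sessions can reference it."""
    with AGENTS_FILE.open("a", encoding="utf-8") as f:
        f.write(f"- {correction}\n")
    return f"Recorded preference: {correction}"


@mcp.tool()
def list_preferences() -> str:
    """Return the current contents of AGENTS.md."""
    if AGENTS_FILE.exists():
        return AGENTS_FILE.read_text(encoding="utf-8")
    return "No preferences recorded yet."


if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio by default
```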
Write prescriptive prompts for complex tasks
I created the personal productivity MCP server to carry my favourite prompts from system to system. MCP (Model Context Protocol) servers provide a standardized way to expose tools and context to AI agents across different platforms. One thing I learned from my colleague Anand Murthy about writing such prompts is to be extremely prescriptive about the actions and tools that I want the agent to use.
Generic prompts like "help me debug this GitHub Actions workflow" leave too much room for interpretation. Instead, specify exact commands, tools, and steps. For example, if I'm looking to debug a GitHub Actions issue, the prompt that I have looks like this:
You are assisting me in debugging a failed GitHub Actions workflow. Please follow this detailed, step-by-step process to analyze and resolve the issue:
1. Extract the relevant workflow details from the provided URL: the repository owner, repository name, workflow run ID, workflow name, and the branch or commit that triggered the run.
2. Use the GitHub CLI to fetch workflow logs and status:
   - Run gh run list to confirm the workflow run exists.
   - Use gh run view <run-id> to access detailed run information.
   - Retrieve the complete logs with gh run view <run-id> --log.
   - Focus on failed job logs with gh run view <run-id> --log-failed.
3. Analyze the failure:
   - Identify which job or jobs failed, pinpoint the step where the failure occurred, and review associated error messages, exit codes, and stack traces.
   - Check for recurring problems like dependency issues, permission errors, timeouts, or resource constraints.
4. Examine the workflow configuration and how the environment is set up.
5. Offer targeted debugging guidance:
   - Clearly explain what went wrong, using straightforward language.
   - Recommend specific fixes or configuration adjustments.
   - Supply concrete commands or code snippets to help resolve the issue.
6. Suggest ways to prevent the problem from happening again.
7. Factor in context:
   - Take into account the type of project (such as Python or Node.js) and recommend relevant solutions.
   - Review recent changes to determine if they might be connected to the failure.
8. Propose workflow optimizations or improvements as appropriate.
9. Recommend follow-up actions:
   - Outline next steps for testing the fix.
   - Advise on improvements to monitoring or alerting.
   - Provide tips on how to avoid similar issues in the future.
Workflow URL: {workflow_url}
Focus on delivering actionable, specific solutions instead of generic troubleshooting tips. Rely on the GitHub CLI workflow commands above to gather thorough information about the failure.
Notice how prescriptive this prompt is. Rather than offering generic troubleshooting advice, it gives the agent a step-by-step procedure to follow, down to the exact CLI commands to run. Critically, those CLI commands (gh run list, gh run view) are commands that I have auto-approved in my IDE, so the agent can execute the entire workflow autonomously without interrupting me for approval at each step.
The prompt was written with AI assistance, which allows me to iterate to the level of detail I want with minimal effort. I start with a rough outline, then ask the agent to make it more specific, add command examples, and refine the steps until it's actionable enough for autonomous execution. This is another form of automation: write a detailed prompt once, reuse it across sessions.
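An MCP server can expose prompts as well as tools, which is one way to make a prompt like this reusable across sessions and platforms. The sketch below, again illustrative rather than my actual server, registers an abbreviated version of the debugging prompt with workflow_url as a parameter, using FastMCP's prompt decorator.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("productivity-prompts")  # illustrative server name

# Abbreviated version of the prescriptive prompt shown above.
DEBUG_TEMPLATE = """You are assisting me in debugging a failed GitHub Actions workflow.
Use gh run list to confirm the run exists, gh run view <run-id> to inspect it,
and gh run view <run-id> --log-failed to focus on the failing jobs.
Deliver actionable, specific solutions instead of generic troubleshooting tips.

Workflow URL: {workflow_url}
"""


@mcp.prompt()
def debug_github_actions(workflow_url: str) -> str:
    """Prescriptive GitHub Actions debugging prompt, parameterized by the failing run's URL."""
    return DEBUG_TEMPLATE.format(workflow_url=workflow_url)


if __name__ == "__main__":
    mcp.run()
```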
Use plan mode for complex tasks
Plan mode in Cursor and Claude Code significantly improves agent performance on complex tasks. Users of AI-assisted coding tools consistently report that plan mode helps agents stay on course compared to working without a structured plan. This mirrors how humans perform better with explicit plans.
The mechanism is straightforward: the agent first generates a detailed plan, you review and refine it, then the agent executes against that plan. This separation of planning and execution prevents the agent from going down rabbit holes or making premature implementation decisions.
In my experience, agents often complete tasks in one attempt after a few iterations on a well-defined plan. The key is ensuring the plan is specific and properly scoped before execution begins. I've found that plans work best when they include specific files and functions to modify, clear acceptance criteria, dependencies and ordering constraints, and test cases or validation steps. Without this structure, agents tend to make assumptions, skip steps, or get distracted by tangential improvements.
Managing multiple background agents
Multiple background agents can be powerful, but they require careful management. Unless agents are handling mundane, well-defined tasks, context switching between multiple active agents becomes challenging: you can only think about one agent's work at a time, and each switch carries significant cognitive overhead.
I've found that multiple agents work well when they're working on independent, well-scoped tasks. For example, one agent might be researching documentation while another refactors a specific module. But when tasks have dependencies or require coordination, a single agent with a clear plan tends to perform better than multiple agents trying to coordinate.
The cognitive load turns out to be more than keeping track of what each agent is doing; we also need to ensure they don't conflict with each other. Two agents modifying the same file simultaneously, or one agent's changes breaking assumptions another agent made, creates more problems than it solves. This is where the automation philosophy meets practical limits: we want to automate, but we need to be thoughtful about when parallel automation creates more complexity than it eliminates.
See this in action
The practices in this chapter connect to several core philosophies and practices throughout the book.
Automation philosophy
This chapter demonstrates the Automation philosophy in action. Auto-approving safe operations eliminates the friction of constant approval requests, allowing agents to work autonomously on read-only tasks. This is automation that creates compound interest: set up safe boundaries once, and agents can work independently on appropriate tasks.
Single source of truth
The practice of recording corrections in AGENTS.md connects to the Single source of truth philosophy. AGENTS.md becomes the authoritative record of agent behavior preferences, updated in real-time as you discover what works and what doesn't.
Emergency controls
Just as we discussed emergency stops for agents, the CI/CD chapter shows how automated systems need manual override capabilities. The principle is the same: automate what's safe, but always maintain the ability to intervene when needed.