Eric J Ma's Website

My coding agent learned a lesson and patched its own skill

written by Eric J. Ma on 2026-06-16 | tags: automation react productivity ai memory learning coding open source state tools


In this blog post, I share how my coding agent learned from my mistakes and automatically patched its own skills using a new OpenCode plugin I built called opencode-autolearn. Over ten days, it observed my coding sessions, extracted lessons, and updated its memory and skills without any manual intervention. I explain the architecture, design decisions, and real-world impact, including how it fixed a tricky Convex migration bug on its own. Curious how an agent can truly learn from your workflow and get better every session? Read on to find out!

I was debugging a transcript duplication bug in my voice-first gym coaching app. The coach's responses were being saved twice to the database, one turn apart. I traced it to a React 18 batching issue in the flush logic, refactored the state management into a hook, wrote a one-shot backfill to clean up the historical data, and committed everything.

Then I went to get a coffee.

When I came back, the skill file for Convex migrations had a new entry. The convex-migration-helper skill, installed in my repo a week ago and untouched since, now contained a six-line callout explaining that internalMutation functions cannot be invoked from the Convex CLI. The code example had been corrected from internalMutation to mutation. A reference file deep in the skill's references/ directory had been updated with the same fix.

Nobody told it to do that. In another session, my coding agent, GLM-5.2 on OpenCode, hit the internalMutation wall during the backfill and solved the problem on its own; a background review process then extracted the lesson and patched the skill.

Today I am releasing opencode-autolearn, the plugin that made that possible. It is open source, it runs on your machine, and after ten days of dogfooding across twenty-plus projects, it has logged 922 observations, spawned over 200 review sessions, and tracked 48 recurring patterns with reinforcement counts. My coding agent gets better every session, and I do nothing extra to make that happen.

Agents that forget

I wrote about building self-improving coding agents back in January. The core observation was simple: AI coding agents repeat the same mistakes across sessions because they have no mechanism to learn from corrections. Every session starts from scratch. You re-state the same preferences. You re-correct the same behaviors. You are babysitting a very fast intern.

The post identified two levers: AGENTS.md as repository memory, and skills as reusable playbooks. Both work. I use them every day. But the loop was still manual. I had to notice the pattern, decide what to do with it, and write the correction myself. I was doing the learning, then handing the agent the results.

The question that would not leave me alone: what if the agent could watch its own conversations and extract the lessons itself?

The inspiration from Hermes

A colleague, Edward Miracco, told me about a coding agent called Hermes. Hermes had a property I found fascinating: it got better at working with you over time. The model weights were the same. But Hermes maintained a persistent memory of corrections, preferences, and workarounds, and it updated that memory as you worked.

I wanted that for OpenCode, the coding agent I use daily. OpenCode has a plugin system, a skill discovery mechanism, and a session model that captures full conversation histories. All the raw materials were there. The missing piece was the feedback loop: something that watched the conversation, decided what was worth learning, and wrote the lessons down.

So I asked OpenCode to build it.

What it does

opencode-autolearn is an OpenCode plugin that does four things:

  1. Monitors conversations. A JavaScript plugin hooks into OpenCode's event system. It counts turns, buffers messages (with secret redaction), and watches for idle periods and session exits.
  2. Spawns review agents. Every five assistant turns, or when the session goes idle, or when you close the terminal, the plugin spawns a detached subprocess that runs a review agent. The review agent receives the buffered conversation and an instruction sheet.
  3. Extracts lessons. The review agent reads the conversation looking for corrections ("don't do X"), preferences ("I prefer Y"), workarounds that worked, and recurring patterns. For each one it finds, it takes action.
  4. Writes to durable stores. The review agent uses a Python CLI to update three things: a persistent memory file (loaded into every future session), a user profile (communication and workflow preferences), and skills (created or patched based on observed patterns).

The architecture is deliberately split: a thin JavaScript plugin that only counts and buffers, and a Python CLI that does the data management. The plugin never blocks the main session. Reviews run in a detached subprocess that the plugin fires and forgets. If the review fails, the conversation is saved to a fallback file for debugging. The main session never knows the difference.
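To make the "counts and buffers" half concrete, here is a minimal sketch of a turn-counting buffer with secret redaction. The real plugin is JavaScript and its redaction rules are not shown in this post, so this Python version is only an illustration: `ConversationBuffer`, `redact`, and the two regex patterns are hypothetical names and rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical redaction rules; the plugin's actual patterns are not documented here.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),                        # API-key-like tokens
    re.compile(r"(?i)(password|token|secret)\s*[=:]\s*\S+"),   # key=value style secrets
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


@dataclass
class ConversationBuffer:
    """Holds redacted messages and counts assistant turns since the last drain."""

    messages: list = field(default_factory=list)
    assistant_turns: int = 0

    def add(self, role: str, text: str) -> None:
        # Redact before storing so secrets never reach the review agent.
        self.messages.append({"role": role, "text": redact(text)})
        if role == "assistant":
            self.assistant_turns += 1

    def drain(self) -> list:
        """Hand the buffered messages to a review and reset the counters."""
        drained, self.messages = self.messages, []
        self.assistant_turns = 0
        return drained
```

Redacting at write time, before anything is buffered, means a failed review that dumps the conversation to a fallback file would also only ever see the redacted text.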

Design decisions

Four decisions shaped everything else.

The trigger needs no human

The review fires on its own. By default, every five assistant turns, or when the session goes idle, or when you close the terminal. You never type a command or decide when to review. The loop does not wait for you.
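The three triggers can be sketched as a small state machine. This is an illustrative Python model, not the plugin's JavaScript: the class name `ReviewTrigger`, its method names, and the 120-second idle timeout are assumptions, while the five-turn threshold comes from the default described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewTrigger:
    turn_threshold: int = 5        # from the post: every five assistant turns
    idle_seconds: float = 120.0    # hypothetical idle timeout
    turns_since_review: int = 0
    last_activity: float = 0.0

    def on_assistant_turn(self, now: float) -> Optional[str]:
        """Count a turn; fire when the threshold is reached."""
        self.turns_since_review += 1
        self.last_activity = now
        if self.turns_since_review >= self.turn_threshold:
            return self._fire("turn_threshold")
        return None

    def on_tick(self, now: float) -> Optional[str]:
        """Called periodically; fire if there is unreviewed work and the session is idle."""
        idle_for = now - self.last_activity
        if self.turns_since_review > 0 and idle_for >= self.idle_seconds:
            return self._fire("idle")
        return None

    def on_exit(self) -> Optional[str]:
        """Called on shutdown; fire only if some turns have not been reviewed yet."""
        if self.turns_since_review > 0:
            return self._fire("exit")
        return None

    def _fire(self, reason: str) -> str:
        # Reset so the same turns are not reviewed twice.
        self.turns_since_review = 0
        return reason
```

Each method returns the reason for firing, or `None` when no review is due, so the caller can log why a review was spawned.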

This is the decision I care about most. Other tools use human-triggered commands like /dream for their reflection step, and those work, until they do not. You remember to invoke them for a week. Then you get busy, you forget, and the learning stops. The trigger is the first thing to go when you have real work to do.

Making it automatic costs almost nothing. A handful of background subprocesses you never see. What you buy with that is a system that learns from every session, not just the ones where you remembered to pull the lever.

The trigger itself is just an OpenCode plugin. It hooks into OpenCode's event system, counts turns, and fires reviews. No fork, no separate daemon, no modified binary. You install the plugin and your existing OpenCode setup gains the feedback loop.

Markdown is the data store

Memory lives in ~/.autolearn/memory.md. User preferences live in ~/.autolearn/user-profile.md. Skills live in ~/.autolearn/skills/{name}/SKILL.md. All plain text. All human-readable. All directly loadable as OpenCode instructions.

I considered SQLite. It would be more queryable. But the agent needs to read these files as context, and OpenCode loads instruction files as plain markdown. A database would add a conversion layer between storage and context. Markdown files serve double duty: they are both the storage and the context injection. No conversion needed.

The agent reads the same file the user can open in a text editor. If I want to see what my agent has learned, I cat ~/.autolearn/memory.md. If I want to correct a lesson that is wrong, I edit the file. The next session picks up the change.
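A minimal sketch of what "markdown as the data store" can look like, assuming one lesson per bullet line. The actual entry format of memory.md and the CLI's function names are not shown in this post, so `append_memory` and `load_memory` are hypothetical.

```python
from pathlib import Path


def append_memory(memory_file: Path, lesson: str) -> bool:
    """Append a lesson as a markdown bullet; return False if it is already present."""
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    existing = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    bullet = f"- {lesson.strip()}"
    if bullet in existing.splitlines():
        return False  # exact duplicate; leave the file untouched
    with memory_file.open("a", encoding="utf-8") as fh:
        if existing and not existing.endswith("\n"):
            fh.write("\n")  # keep one bullet per line even after hand edits
        fh.write(bullet + "\n")
    return True


def load_memory(memory_file: Path) -> str:
    """Return the file verbatim: the stored text is also the context text."""
    return memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
```

Because `load_memory` returns the file as-is, a hand edit in a text editor and a write from the CLI are indistinguishable to the next session, which is the point of skipping a conversion layer.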

Reviews run as subprocesses

Each review runs as a separate opencode run invocation in a detached subprocess. The plugin sets AUTOLEARN_REVIEWER=1 in the subprocess environment so the review session does not trigger its own reviews (which would create an infinite loop).

The subprocess approach has three benefits. First, failures are isolated: a crashed review does not affect the main session. Second, the review has its own context window: it loads the autolearn-reviewer skill and gets a clean slate to evaluate the conversation. Third, concurrency is naturally limited to one review at a time, because the plugin tracks an in-process flag.
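The fire-and-forget spawn with a recursion guard can be sketched as follows. The environment variable name AUTOLEARN_REVIEWER comes from the description above; everything else (the function names, and passing the review command in as a parameter rather than hard-coding the opencode run arguments) is an assumption made for illustration, and the plugin itself does this in JavaScript.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

GUARD_VAR = "AUTOLEARN_REVIEWER"  # name taken from the post


def is_review_session(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the current process is itself a review session."""
    env = os.environ if env is None else env
    return env.get(GUARD_VAR) == "1"


def spawn_review(
    command: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[subprocess.Popen]:
    """Start the review command detached; return None inside a review session."""
    base_env = dict(os.environ if env is None else env)
    if is_review_session(base_env):
        return None  # already reviewing: spawning again would loop forever
    child_env = {**base_env, GUARD_VAR: "1"}
    return subprocess.Popen(
        list(command),
        env=child_env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: detach from the parent's session
    )
```

The caller never waits on the returned process, so a slow or crashed review cannot stall the main session.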

Skills are symlinked for auto-discovery

OpenCode discovers skills in ~/.agents/skills/. When the review agent creates a new skill in ~/.autolearn/skills/, the CLI symlinks it into ~/.agents/skills/ so OpenCode picks it up automatically. No restart needed, no configuration change.

This means the loop is: review agent observes a pattern, creates a skill, and the very next session can load that skill if the pattern recurs. The feedback loop closes itself.
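The symlink step is small enough to sketch in a few lines. The function name `link_skill` and its handling of stale links are assumptions for illustration, not the CLI's actual code; the two directories are passed in as arguments rather than hard-coded.

```python
from pathlib import Path


def link_skill(skill_dir: Path, discovery_root: Path) -> Path:
    """Symlink skill_dir into discovery_root and return the link path."""
    discovery_root.mkdir(parents=True, exist_ok=True)
    link = discovery_root / skill_dir.name
    if link.is_symlink():
        if link.resolve() == skill_dir.resolve():
            return link      # already linked; nothing to do
        link.unlink()        # stale link pointing somewhere else
    elif link.exists():
        # Never clobber a real directory the user installed by hand.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(skill_dir.resolve(), target_is_directory=True)
    return link
```

Calling it a second time with the same arguments is a no-op, so the review agent can re-link after every patch without checking first.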

The build history

The first commit was a working plugin with the full CLI. I had been thinking about the architecture for a few days, and the initial implementation came out in one piece: turn counting, message buffering, review spawning, memory management, skill creation and patching.

Then came a series of refinements, each driven by a real problem I hit while dogfooding:

  • Exit-triggered reviews. The first version only spawned reviews at turn thresholds. I kept losing the last few turns of a session because I would close the terminal before the threshold fired. So I added beforeExit and signal handlers (SIGINT, SIGTERM) to dispatch a final review on shutdown.

  • EARS specifications. After the initial build, I had my OpenCode agent write five low-level designs (LLDs) and eight EARS (Easy Approach to Requirements Syntax) specifications for the shipped features. This was partly discipline and partly debugging: the EARS specs let me trace each requirement to actual code paths, and the process surfaced edge cases I had missed. I also asked the agent to add @spec annotations in the plugin source linking each code block to its EARS requirement ID.

  • Skill symlinks. The initial version created skills in ~/.autolearn/skills/ but did not symlink them into ~/.agents/skills/. Skills existed but OpenCode could not discover them. The symlink step closed that gap.

  • Reinforcement tracking. Early on, memory entries were append-only. If the agent observed the same correction three times, it would add three entries. I added a strengths.json file that tracks how many times each observed pattern has been reinforced, and strengthen/weaken commands so the review agent can bump the count instead of duplicating the entry.

  • Full-text search. The review agent needs to answer "has this pattern come up before?" I built an FTS5 index over OpenCode's session database so the review agent can search past conversations before concluding "nothing to record." This catches recurring corrections that were never promoted to memory.

  • The review-runner wrapper. Reviews were leaving behind orphaned sessions in OpenCode's session list. I wrote a shell script wrapper that runs the review, captures the session ID from the JSON output, and deletes the session afterward. The plugin calls the wrapper instead of opencode run directly.
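The full-text search refinement can be illustrated with SQLite's FTS5 module. This is a sketch under stated assumptions: the table name `message_fts`, its columns, and both function names are hypothetical, since the schema of OpenCode's session database is not described in this post. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, messages) -> None:
    """Index (session_id, body) pairs in a hypothetical FTS5 table."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS message_fts "
        "USING fts5(session_id UNINDEXED, body)"
    )
    conn.executemany(
        "INSERT INTO message_fts (session_id, body) VALUES (?, ?)", messages
    )
    conn.commit()


def search(conn: sqlite3.Connection, phrase: str, limit: int = 5):
    """Return (session_id, body) rows matching the phrase, best match first."""
    # Quote the phrase so punctuation is not parsed as FTS5 query syntax.
    quoted = '"' + phrase.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT session_id, body FROM message_fts "
        "WHERE message_fts MATCH ? ORDER BY rank LIMIT ?",
        (quoted, limit),
    )
    return rows.fetchall()
```

A review agent could run a query like this before concluding "nothing to record": a hit in an older session suggests the correction is recurring rather than a one-off.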

The moment I knew it worked

For the first several days, I was not sure it was working. The plugin was spawning reviews. The observations log was filling up. But I had not seen the system do something I did not expect.

Then, during a session on my gym-coach project, I hit a wall with the Convex CLI. I had written a backfill mutation as an internalMutation, tried to invoke it with npx convex run, and discovered that the CLI can only call mutation, query, and action functions. internalMutation is private to Convex's internal calling mechanism. I had to convert the function to a regular mutation, run the backfill, then remove the one-shot code.

I committed the fix and moved on. The session ended. The review agent spawned.

When I looked at the convex-migration-helper skill the next day, it had been patched. The review agent had:

  1. Identified the workaround (convert internalMutation to mutation for CLI-invoked backfills).
  2. Found the existing skill that documented migration patterns (convex-migration-helper).
  3. Patched the skill's SKILL.md with a new entry explaining when to use mutation vs internalMutation.
  4. Patched the references/migration-patterns.md file, correcting the code example and adding a callout box.

The scope was right. It patched the specific reference file where the internalMutation pattern was documented. It went to the exact section that was wrong and fixed it.

That was the moment I stopped wondering whether the system worked.

Dogfooding by the numbers

I have been running opencode-autolearn on my machine for about ten days. Here is what the system has done in that time, without me lifting a finger:

| Metric | Count |
| --- | --- |
| Days running | 10 |
| Observations logged | 922 |
| Review sessions spawned | 200+ |
| Projects covered | 20+ |
| Persistent memory entries | 9 |
| User profile preferences captured | 7 |
| Reinforced patterns tracked | 48 |

Those 200+ reviews ran across projects including my gym-coach voice app, my Brain42 knowledge tools, my network analysis teaching materials, my blogbot automation, canvas-chat, and several others. The system watched every session, decided what was worth remembering, and wrote it down.

The reinforcement tracking is where the compounding shows up. The most-reinforced lesson, at 9 hits, is a SaaS multi-tenant safety rule: verify before configuring any SaaS service. I hit that pattern across multiple projects and sessions, and each time the review agent bumped the counter instead of adding a duplicate entry. The agent now treats that rule as high-priority context because the reinforcement count tells it this one matters.
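A counter like this needs very little machinery. The sketch below assumes strengths.json is a flat mapping from a pattern identifier to an integer count; the real file's schema is not shown in this post, and the function names and the example key are illustrative stand-ins for the CLI's strengthen and weaken commands.

```python
import json
from pathlib import Path


def _load(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def adjust(path: Path, pattern_id: str, delta: int) -> int:
    """Change a pattern's count by delta; drop the entry when it reaches zero."""
    counts = _load(path)
    new_count = max(0, counts.get(pattern_id, 0) + delta)
    if new_count == 0:
        counts.pop(pattern_id, None)
    else:
        counts[pattern_id] = new_count
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(counts, indent=2, sort_keys=True), encoding="utf-8")
    return new_count


def strengthen(path: Path, pattern_id: str) -> int:
    return adjust(path, pattern_id, +1)


def weaken(path: Path, pattern_id: str) -> int:
    return adjust(path, pattern_id, -1)


def ranked(path: Path) -> list:
    """Patterns sorted by reinforcement count, highest first."""
    return sorted(_load(path).items(), key=lambda kv: kv[1], reverse=True)
```

Sorting by count is what lets a frequently reinforced rule surface ahead of one-off observations when the memory is loaded as context.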

What the agent has learned

The review agent created four skills from scratch and patched one existing skill in a local repo. The created skills include:

  • A blogbot skill for generating social media posts from blog content. The review agent created it after watching me manually draft posts, then patched it with a URL verification step after observing me checking URLs by hand.
  • An evergreen-note-quality skill for my Obsidian vault, created after watching me audit note quality across multiple sessions. 128 lines of quality checklists covering atomicity, title-as-API, concept-orientation, and duplicate detection.
  • A remotion-best-practices skill, created from scratch after I worked through Remotion version-specific issues. 116 lines covering package structure, animations, fonts, and rendering.
  • A marimo-notebook-patterns skill, created after I hit repeated pitfalls editing marimo notebooks. Covers import patterns, cell parameter requirements, and button API signatures.

The local repo skill patch is the Convex CLI story I described above: the convex-migration-helper in the gym-coach repo, patched with the internalMutation lesson.

The persistent memory (memory.md) has nine entries. Each came from a real mistake. Each has prevented the same mistake in subsequent sessions. The user profile has captured seven preferences about how I like to work. Four examples: I prefer warm, personal blog conclusions. I expect agents to proactively load writing skills when editing prose. I want autonomous execution without confirmation prompts. I demand quantified evidence in architecture analysis. The agent read these from my conversations and wrote them down. Now every session starts with this context loaded.

The feedback loop for autolearn looks like this:

graph TD
    A[OpenCode session] -->|every 5 turns / idle / exit| B[autolearn.js plugin]
    B -->|spawn detached subprocess| C[autolearn-reviewer agent]
    C -->|reads conversation| D{Learning opportunity?}
    D -->|correction / preference| E[update memory.md]
    D -->|recurring pattern| F[create or patch skill]
    D -->|nothing worth recording| G[exit quietly]
    E -->|loaded into| A
    F -->|symlinked into ~/.agents/skills/| A

The core loop works: watch, review, learn, persist, discover. The system improves the agent's behavior without touching model weights.

Install it yourself

If you use OpenCode and want to try it:

curl -fsSL https://raw.githubusercontent.com/ericmjl/opencode-autolearn/main/install.sh | bash

The plugin activates on your next session. You will not notice it running. But after a few sessions, check ~/.autolearn/memory.md. Your agent has been taking notes.

The full source is on GitHub: ericmjl/opencode-autolearn. The design docs include five LLDs, eight EARS specifications, and a high-level design with full traceability. The README has the complete CLI reference and configuration options.

What comes next

The planned work is about portability. Right now, everything lives on one machine. If I switch laptops, I lose the learned memory and skills. The design docs include specs for E2E-encrypted sync (zero-knowledge server, AES-256-GCM, client-side key derivation) and multi-persona knowledge stores (isolated directories for work, personal, and OSS contexts). Those are designed but not yet built.

I also want to add a curator that runs on a schedule, consolidating narrow skills into broader umbrellas, archiving stale ones, and escalating high-reinforcement lessons to AGENTS.md when they hit a threshold. The curator skill exists and the CLI commands are there. What is missing is the scheduling integration and the first real run.

I keep thinking about the moment I saw the patched convex-migration-helper skill. I had not told anyone to fix it. I had not filed an issue or written a TODO. The conversation where I hit the internalMutation wall was over. I had moved on. But the system was still watching, and it decided that the workaround I found was worth remembering.

That is the property I wanted. The agent gets better without me steering the improvement. I do the work, the system does the learning, and the next session inherits the result.

The Hermes agent had it. Now OpenCode does too.

I hope autolearn brings you the same quiet compounding it has brought me: an agent that remembers your corrections, respects your preferences, and gets a little sharper every time you sit down to work.


Cite this blog post:
@article{
    ericmjl-2026-my-coding-agent-learned-a-lesson,
    author = {Eric J. Ma},
    title = {My coding agent learned a lesson and patched its own skill},
    year = {2026},
    month = {06},
    day = {16},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2026/6/16/my-coding-agent-learned-a-lesson},
}
  
