written by Eric J. Ma on 2026-03-29 | tags: ai llm coding architecture plugins opencode tools
I vibe-coded canvas-chat into existence in 48 hours, then spent weeks untangling the mess into a clean plugin architecture. The AI could execute my architectural vision, but it couldn't design it. This is the story of how I recovered from AI-generated slop — and why the architecture stays human.
I want to tell you about canvas-chat, a project I built with heavy AI assistance. It's a visual, non-linear chat interface where conversations are nodes on an infinite canvas — think branching, merging, and exploring topics as a directed acyclic graph.
The first commit landed on December 28, 2025. By December 30, it had sessions, matrix evaluation tables, web search, node tagging, and BM25 keyword search. The AI moved fast. Bugs got fixed in the next commit. Features piled in like Tetris blocks.
And yes, it was a mess.
Here's the thing though: the mess was recoverable. Not because the AI got better (it didn't, not really), but because I had battle-tested convictions about how the thing ought to be architected. And those convictions came from years of shipping software, watching architectures crumble, and learning what holds up.
This is the story of how we went from a jumbled 8,500-line app.js to a clean plugin architecture — and why you need battle-tested convictions to make that happen.
The first commit wasn't actually bad. The project had clean separation from day one:
- canvas.js — SVG pan/zoom/rendering
- graph.js — DAG data structure
- chat.js — LLM API + SSE streaming
- storage.js — IndexedDB persistence
- app.py — FastAPI backend

But app.js was already ~8,500 lines of everything else. Every slash command handler, every modal, every piece of feature logic — all tangled together. Want to add a new feature? You'd grep around in that monolith, hope you found the right spot, and pray you didn't break anything.
The AI could add features to this mess. It could add a /matrix command in a few prompts. It could add /search with Exa integration. But it couldn't see the structure — the latent architecture that would make the whole thing maintainable.
The refactoring started with the purest code — functions with no dependencies:
| Date | What Got Extracted | Why |
|---|---|---|
| Jan 4 | layout.js | Overlap detection is pure math |
| Jan 5 | highlight-utils.js | Text selection is isolated |
| Jan 7 | Feature modules | flashcards.js, committee.js, matrix.js, factcheck.js, research.js, code.js |
| Jan 10 | Core infrastructure | undo-manager.js, modal-manager.js, slash-command-menu.js |
This reduced app.js from ~8,500 to ~5,500 lines. But these were still just file splits. The code worked better, but there was no system binding it together.
The AI did this part reasonably well — when I asked "extract this function to a separate module," it could do it. But it never suggested "we should extract this" on its own. It needed direction.
This was the architectural leap. I asked the AI to create a plugin system, and it delivered — but only because I knew what a plugin system should look like.
We ended up with a three-level plugin architecture:
Level 1: Custom Node Types — Node protocols define rendering via a BaseNode class. Each node type can override renderContent(), getActions(), getSummaryText(), and more. Registered in node-registry.js.
Level 2: Feature Plugins — Extend a FeaturePlugin base class. Get AppContext via dependency injection (graph, canvas, chat, storage, modalManager, streamingManager). Define slash commands via getSlashCommands(). Lifecycle hooks: onLoad(), onUnload().
Level 3: Extension Hooks — Subscribe to events. CancellableEvent can block actions. Event names like command:before, node:created, node:deleted.
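To make the three levels concrete, here's a minimal sketch in plain JavaScript. The class and method names (FeaturePlugin, BaseNode, getSlashCommands(), onLoad(), onUnload(), renderContent(), getSummaryText()) are the ones described above; the bodies, and the FlashcardNode/FlashcardsPlugin pairing, are illustrative stand-ins rather than canvas-chat's actual code.

```javascript
// Level 2: a feature plugin gets its dependencies injected via AppContext.
class FeaturePlugin {
  constructor(context) {
    // context carries { graph, canvas, chat, storage, modalManager, streamingManager }
    this.context = context;
  }
  getSlashCommands() { return []; }
  onLoad() {}
  onUnload() {}
}

// Level 1: custom node types override rendering hooks on a base class.
class BaseNode {
  renderContent() { return ''; }
  getSummaryText() { return ''; }
}

class FlashcardNode extends BaseNode {
  renderContent() { return '<div class="flashcard"></div>'; }
  getSummaryText() { return 'Flashcard deck'; }
}

// A concrete plugin ties the two together via a slash command.
class FlashcardsPlugin extends FeaturePlugin {
  getSlashCommands() {
    return [{ name: '/flashcards', run: () => new FlashcardNode() }];
  }
}

const plugin = new FlashcardsPlugin({ graph: null, canvas: null });
const command = plugin.getSlashCommands()[0];
console.log(command.name, command.run() instanceof BaseNode); // /flashcards true
```

The point of the shape: a feature never reaches into app.js; everything it needs arrives through the injected context, and everything it offers is declared through the plugin's own methods.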
The key files created:
- feature-plugin.js — FeaturePlugin + AppContext
- feature-registry.js — slash command routing with priority (BUILTIN > OFFICIAL > COMMUNITY)
- plugin-events.js — CanvasEvent, CancellableEvent
- node-registry.js — node type registration

This is where the architecture became a real system. And it only happened because I knew what I wanted.
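The priority routing and cancellable events can be sketched like this. The internals are my guess at one reasonable implementation, not the contents of the real feature-registry.js or plugin-events.js; only the tier ordering, the CancellableEvent idea, and the command:before event name come from the design above.

```javascript
// Higher tiers win when two plugins register the same slash command.
const PRIORITY = { BUILTIN: 3, OFFICIAL: 2, COMMUNITY: 1 };

class FeatureRegistry {
  constructor() { this.commands = new Map(); }
  register(name, handler, tier) {
    const existing = this.commands.get(name);
    // Only overwrite an existing registration if the new tier outranks it.
    if (!existing || PRIORITY[tier] > PRIORITY[existing.tier]) {
      this.commands.set(name, { handler, tier });
    }
  }
  resolve(name) { return this.commands.get(name); }
}

// A cancellable event lets any subscriber veto an action.
class CancellableEvent {
  constructor(name, detail) {
    this.name = name;
    this.detail = detail;
    this.cancelled = false;
  }
  cancel() { this.cancelled = true; }
}

// Fire command:before hooks; if any hook cancels, the command never runs.
function dispatch(registry, name, hooks = []) {
  const event = new CancellableEvent('command:before', { name });
  hooks.forEach((hook) => hook(event));
  if (event.cancelled) return null;
  return registry.resolve(name);
}
```

With this shape, a community plugin can shadow-register /matrix without ever displacing the built-in one, and an extension hook can block a command without knowing anything about who implements it.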
The same pattern reached the Python side:
- pptx_endpoints.py — PowerPoint handling
- ddg_endpoints.py — DuckDuckGo search
- code_handler.py — Python code execution
- matrix_handler.py — matrix cell filling

Each follows a register_endpoints(app) pattern, loaded dynamically via importlib.
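A minimal sketch of that pattern, with a stand-in App class in place of a real FastAPI instance so the snippet runs on its own. Only the register_endpoints(app) convention and the importlib loading mirror the description above; the module and route names here are invented for illustration.

```python
import importlib
import sys
import types

# Stand-in for the FastAPI app: it just records registered routes.
class App:
    def __init__(self):
        self.routes = []

    def get(self, path):
        def decorator(fn):
            self.routes.append((path, fn))
            return fn
        return decorator

def load_handlers(app, module_names):
    """Import each handler module by name and let it register itself."""
    for name in module_names:
        module = importlib.import_module(name)
        module.register_endpoints(app)

# Simulate a handler module in memory (on disk this would be a file
# like ddg_endpoints.py exposing register_endpoints).
demo = types.ModuleType("demo_endpoints")

def _register(app):
    @app.get("/demo")
    def demo_route():
        return {"status": "ok"}

demo.register_endpoints = _register
sys.modules["demo_endpoints"] = demo

app = App()
load_handlers(app, ["demo_endpoints"])
print(app.routes[0][0])  # /demo
```

The appeal of the convention is that the app never imports handlers by name at the top of the file; adding a backend feature means dropping in a module that knows how to register itself.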
By late January, the plugin architecture was in place. Features were decoupled. The code was cleaner. And then GLM-4.5 started dropping curly braces.
No, really. The AI would "fix" one thing and introduce a missing bracket somewhere else. Merge conflicts became minefields. Features that worked yesterday stopped working today, not because of malice, but because the AI didn't understand the dependencies between modules. It was making elementary mistakes that a junior developer wouldn't make.
On January 24, I added Cypress E2E tests. Out of spite, honestly. The first commit gave us canvas_interactions.cy.js, node_selection.cy.js, and note_node.cy.js: three tests that told us whether the canvas still worked.
These tests caught the regressions the AI kept introducing. More importantly, they let me verify changes faster. Instead of manually testing every feature after each AI session, I could run the test suite and know whether things still worked.
The plugin architecture made the code testable. The tests caught what the AI broke.
| Phase | app.js size | Modules |
|---|---|---|
| Initial (Dec 2025) | ~8,500 lines | 5 files |
| After feature splits | ~5,500 lines | 11 files |
| After infrastructure | ~5,400 lines | 15 files |
| After plugin migration | ~5,400 lines | 25+ files |
| Today | ~4,700 lines | 35+ modules |
Here's what I learned from this process:
The AI can execute architecture, but it can't design it. It can split files when asked. It can implement a plugin system from a spec. But it won't look at an 8,500-line app.js and say "this should be a plugin system."
That vision, that opinion, comes from somewhere else. I didn't arrive at "we need a three-level plugin architecture" out of nowhere. It came from discussing tradeoffs with the AI, asking "what if we did it this way?" and "what are the tradeoffs of that approach?", and applying my best judgment to the options. The AI could explain the pros and cons of different approaches, but I had to pick which tradeoffs I was willing to accept.
The AI didn't teach me this. Experience taught me this.
Here's where it gets interesting. Because we built this modular foundation, I can now swap out the rendering layer. The canvas is currently raw SVG — and I want to move to Svelte Flow. The plugin system I built makes this possible:
- Feature plugins don't touch app.js internals; they use AppContext.
- Rendering is isolated in canvas.js; swapping to Svelte Flow means replacing that layer.

The abstraction layer we built (FeaturePlugin + AppContext + EventSystem) separates what features do from how they're rendered. That's what makes Svelte Flow viable as a drop-in replacement.
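One way to picture that seam. The SvgRenderer and SvelteFlowRenderer class names below are hypothetical (the real layer is canvas.js); the sketch only illustrates the claim that feature code depends on a stable interface, not on a concrete renderer.

```javascript
// Two interchangeable rendering backends with the same addNode() interface.
class SvgRenderer {
  constructor() { this.drawn = []; }
  addNode(node) { this.drawn.push(`svg:${node.id}`); }
}

class SvelteFlowRenderer {
  constructor() { this.drawn = []; }
  addNode(node) { this.drawn.push(`flow:${node.id}`); }
}

// Feature code depends only on context.renderer having addNode(),
// the way plugins depend on AppContext rather than on app.js internals.
function createNode(context, id) {
  const node = { id };
  context.renderer.addNode(node);
  return node;
}

// Swapping the rendering layer is a change in app wiring, not in features:
createNode({ renderer: new SvgRenderer() }, 'n1');
createNode({ renderer: new SvelteFlowRenderer() }, 'n1');
```

If the features had been calling into the SVG code directly, this swap would mean rewriting every feature; behind the context, it means rewriting one layer.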
You can undo AI vibe-coded slop. It's possible. But it requires you to have battle-tested convictions on how the thing ought to be.
The AI is an incredible executor. It can refactor, extract, implement. But the vision? That stays human. And that vision comes from battle-tested experience, from having seen enough codebases to know what works and what collapses under its own weight.
So if you're working with AI coding assistants: don't expect them to architect for you. Tell them what to build. Give them the structure. Then let them do the implementation.
That's how you get from a jumbled mess to something you can actually maintain.
@article{
ericmjl-2026-undoing-ai-vibe-coded-slop-with-ai,
author = {Eric J. Ma},
title = {Undoing AI vibe-coded slop with AI},
year = {2026},
month = {03},
day = {29},
howpublished = {\url{https://ericmjl.github.io}},
journal = {Eric J. Ma's Blog},
url = {https://ericmjl.github.io/blog/2026/3/29/undoing-ai-vibe-coded-slop-with-ai},
}
I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.
If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!
Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!