written by Eric J. Ma on 2026-01-06 | tags: javascript bioinformatics katex canvas algorithms bugfix highlighting selection web development ui
In this blog post, I share how a tricky text highlighting bug in my canvas-chat project led me to use a classic bioinformatics algorithm, Smith-Waterman, to solve messy browser selection issues—especially with KaTeX-rendered math. Instead of struggling with normalization, I reframed the problem as sequence alignment, which proved robust and effective. Curious how an algorithm from DNA analysis can fix web UI bugs?
I ran into a frustrating bug this week in canvas-chat, my experimental canvas-based chat interface I built at the end of last year. The bug seemed simple on the surface: when users selected text from a rendered markdown table and clicked to highlight it, the highlighting would sometimes stop partway through, or highlight the wrong characters entirely.
What started as a "quick fix" turned into a journey through several failed approaches before I remembered an algorithm from my undergraduate bioinformatics days. Sometimes the best solution to a problem comes from an unexpected domain.
Canvas-chat has a feature where you can select text from an AI response, and the app creates a "highlight" node that links back to the source. When you click on the highlight, the corresponding text in the source gets wrapped in a `<mark>` tag.
This worked fine for simple paragraphs. But when I tried it on tables containing KaTeX-rendered math, things went wrong:
What I expected to highlight: `66.00 (0.18±0.58)`

What actually got highlighted: `66.00 ( 0.18±0.58)`
The highlighting was off by more than a few characters, and would stop before the end of my selection. In some cases, it would highlight completely wrong sections.
The problem came from how KaTeX renders math and how browsers handle text selection.
KaTeX renders math with multiple text representations:
```html
<span class="katex">
  <!-- MathML for accessibility/screen readers -->
  <span class="katex-mathml">
    <math><mn>0.13</mn><mo>±</mo></math>
  </span>
  <!-- Visual HTML for display -->
  <span class="katex-html">
    <span class="mord">0.13</span>
    <span class="mord">±</span>
  </span>
</span>
```
When you select text that spans KaTeX-rendered content, `selection.toString()` gives you something like this:

```
"66.00 (
0.13
±
0.13±0.58)"
```
Notice the duplicated 0.13 and the random newlines? The browser included text from both the MathML (for accessibility) and the visual spans. Add in tabs between table cells and inconsistent spacing around operators, and you have a string that looks nothing like the clean HTML text content.
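To see why naive substring matching falls over here, compare the clean HTML text with what the browser hands back. The strings below mirror the example above:

```javascript
// The clean text content of the HTML (what we want to search in)
const cleanText = "66.00 (0.13±0.58)";

// What selection.toString() actually returns, with the MathML copy of
// the number duplicated and stray newlines inserted
const selection = "66.00 (\n0.13\n±\n0.13±0.58)";

// Exact matching fails outright:
console.log(cleanText.includes(selection)); // false
console.log(cleanText.indexOf(selection));  // -1
```

So any exact-match strategy is dead on arrival; the only question is how to tolerate the mess.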
My initial approach was to normalize both strings (the user's selection and the HTML text) before matching:
- Collapse runs of whitespace into single spaces
- Remove duplicated KaTeX text (e.g., `0.13 ± 0.13±` → `0.13±`)
- Normalize spacing around operators (e.g., `±`)

Then find the match in the normalized strings, and map the positions back to the original.
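A minimal version of the whitespace step might look like this. This is an illustrative sketch, not the project's code; the real version also had to deduplicate KaTeX text. Note that the normalization itself is easy — the hard part, as I found out, is mapping positions back afterward:

```javascript
// Sketch of the whitespace-normalization step (illustrative only).
// Collapses any run of whitespace into a single space and trims the ends.
// This throws away position information, which is exactly what made the
// approach painful.
function normalizeWhitespace(s) {
  return s.replace(/\s+/g, " ").trim();
}

console.log(normalizeWhitespace("66.00 (\n0.13\n±\n0.58)"));
// "66.00 ( 0.13 ± 0.58)"
```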
This is where things got complicated. I needed to track:

- Which original position each normalized character came from
- How collapsing whitespace runs and trimming leading characters shifted every downstream position
- How the math-spacing normalization and KaTeX deduplication each shifted positions again, on top of the whitespace mapping
The code became a tangled mess of position arrays and off-by-one bugs. Here's a simplified version of what it looked like:
```javascript
// Build mapping from normalized to original positions
const normalizedToOriginal = [];
let inWhitespace = false;
let leadingTrimmed = true;

for (let i = 0; i < fullText.length; i++) {
  const ch = fullText[i];
  if (/\s/.test(ch)) {
    if (!inWhitespace && !leadingTrimmed) {
      normalizedToOriginal.push(i);
    }
    inWhitespace = true;
  } else {
    leadingTrimmed = false;
    inWhitespace = false;
    normalizedToOriginal.push(i);
  }
}

// Then also account for math spacing normalization...
// And KaTeX deduplication...
// Each layer compounds the position mapping complexity
```
The position mapping kept breaking. I'd fix one case only to break another. I was trying to maintain a bijection between two strings that had been transformed through multiple non-invertible operations. It wasn't going to work.
After banging my head against the normalization approach for a while, I took a step back. What was I actually trying to do?
I had two strings:

- The user's selection text, polluted with KaTeX duplication, stray newlines, and inconsistent whitespace
- The clean text content of the source HTML
I needed to find where the user's selection "matched" in the HTML text, tolerating:

- Extra newlines and whitespace differences
- Duplicated content from KaTeX's dual rendering
- Other small character-level discrepancies
This is exactly what sequence alignment algorithms are designed for. In bioinformatics, we use these algorithms to compare DNA or protein sequences that may have evolved with insertions, deletions, and mutations. The classic algorithm for finding the best local alignment between two sequences is Smith-Waterman.
I learned Smith-Waterman as an undergraduate, probably around 2008. I never thought I'd use it for web development.
I didn't need to align the entire selection - I just needed to find where it started and ended in the HTML text. So I:

- Aligned a prefix of the selection against the HTML text to find the start position
- Aligned a suffix of the selection against the HTML text (reversing both strings) to find the end position
- Combined the two positions to get the full match region
Here's the core alignment function:
```javascript
function alignStart(queryPrefix, target) {
  const m = queryPrefix.length;
  const n = target.length;
  const MATCH = 2;     // Reward for matching characters
  const MISMATCH = -1; // Penalty for different characters
  const GAP = -1;      // Penalty for insertions/deletions
  const WS_MATCH = 1;  // Softer reward for whitespace matches

  // Build the scoring matrix
  const score = Array(m + 1).fill(null)
    .map(() => Array(n + 1).fill(0));

  let maxScore = 0;
  let maxI = 0, maxJ = 0;

  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const qChar = queryPrefix[i - 1].toLowerCase();
      const tChar = target[j - 1].toLowerCase();

      let matchVal;
      if (qChar === tChar) {
        matchVal = /\s/.test(qChar) ? WS_MATCH : MATCH;
      } else if (/\s/.test(qChar) && /\s/.test(tChar)) {
        matchVal = WS_MATCH; // Any whitespace matches any whitespace
      } else {
        matchVal = MISMATCH;
      }

      score[i][j] = Math.max(
        0,                              // Local alignment can restart anywhere
        score[i - 1][j - 1] + matchVal, // Diagonal: match/mismatch
        score[i - 1][j] + GAP,          // Up: gap in target
        score[i][j - 1] + GAP           // Left: gap in query
      );

      if (score[i][j] > maxScore) {
        maxScore = score[i][j];
        maxI = i;
        maxJ = j;
      }
    }
  }

  // Traceback to find start position
  // ... (walk backwards from maxI, maxJ to find where alignment began)
}
```
The key insight is that Smith-Waterman's local alignment naturally handles all the messiness. Extra newlines in the selection? They're just gaps. Duplicated numbers? They align to the same position. Different whitespace characters? They all match each other.
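The traceback elided above can be sketched like this. This is my reconstruction of the standard Smith-Waterman traceback, not the code from the pull request: starting from the highest-scoring cell, walk backwards until the local alignment score drops to zero, and the column you land on is where the match begins in the target:

```javascript
// Sketch of the traceback step (standard Smith-Waterman, reconstructed).
// Given the filled score matrix and the coordinates of the best cell,
// walk backwards until the local alignment score hits zero.
function tracebackStart(score, maxI, maxJ) {
  let i = maxI;
  let j = maxJ;
  while (i > 0 && j > 0 && score[i][j] > 0) {
    const diag = score[i - 1][j - 1];
    const up = score[i - 1][j];
    const left = score[i][j - 1];
    // Follow the move that produced this cell's score; prefer the diagonal.
    if (diag >= up && diag >= left) {
      i -= 1;
      j -= 1;
    } else if (up >= left) {
      i -= 1; // Gap in target
    } else {
      j -= 1; // Gap in query
    }
  }
  return j; // 0-based start position of the match in the target
}
```

For example, aligning the query `"ab"` against the target `"xab"` produces a best cell at the end of `"ab"`, and the traceback walks back to position 1, where the match begins.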
The new approach passes all the test cases that the normalization approach failed:
Test: Simple word

- HTML text: `"Hello world, this is a test."`
- Selection: `"world"`

Test: KaTeX duplication

- HTML text: `"66.00 (0.18 ± 0.18±0.58)"`
- Selection: `"66.00 (\n0.18\n±\n0.18±0.58)"`

Test: Cross-block selection

- HTML text: `"The Heading Some paragraph text here."`
- Selection: `"The Heading\n\nSome paragraph"`

I didn't invent anything new here. Smith-Waterman has been around since 1981. I just recognized that my web development problem was, at its core, a sequence alignment problem.
This is why I think it's valuable to study algorithms and techniques from different domains, even if they seem unrelated to your day-to-day work. You never know when dynamic programming from bioinformatics will solve your JavaScript text highlighting bug.
The normalization approach was trying to make two messy things identical before comparing them. The alignment approach embraced the messiness and asked: "Given that these are different, where do they best correspond?"
That's a fundamentally different framing, and it's the framing that actually matched the problem.
Interestingly, I couldn't find prior examples of using Smith-Waterman specifically for UI text highlighting or matching browser text selections to source HTML. The algorithm is well-established in bioinformatics for DNA and protein sequence alignment, and it appears in some fuzzy string matching contexts like spell-checking and record linkage. But applying it to handle the specific artifacts that browsers introduce when selecting text from rendered HTML with KaTeX, MathML, or complex table structures? That seems to be a new application. Sometimes the best solutions come from recognizing that your problem, despite appearing domain-specific, maps onto a well-solved problem from an entirely different field.
One more note: I didn't write the JavaScript implementation myself. I directed Claude Opus 4.5 in OpenCode to write it for me. My contribution was recognizing that this was a sequence alignment problem and describing the approach - the actual code was generated by the AI. This is becoming my preferred way to work: I provide the domain insight and algorithmic direction, and the AI handles the implementation details.
For those curious, the complete implementation is in the pull request. The key functions are:
- `alignStart(queryPrefix, target)` - Find where the query beginning matches
- `alignEnd(querySuffix, target)` - Find where the query end matches (by reversing and aligning)
- `findMatchRegion(query, target)` - Combine both to get the full match region

The algorithm runs in O(mn) time where m and n are the lengths of the strings being aligned. For typical text selections (tens to hundreds of characters), this is instantaneous. And unlike the normalization approach, it's robust and correct!
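To make the whole pipeline concrete, here is a condensed, runnable sketch. It is not the pull request's code: it aligns the entire query in a single Smith-Waterman pass and tracebacks to recover the region, rather than using the separate prefix/suffix alignments described above, but the scoring scheme matches the one in the post:

```javascript
// Condensed sketch of the approach (not the PR's implementation): one
// Smith-Waterman pass over the whole query, returning the [start, end)
// region of the target that best matches the messy selection.
function findMatchRegion(query, target) {
  const MATCH = 2, MISMATCH = -1, GAP = -1, WS_MATCH = 1;
  const m = query.length, n = target.length;
  const score = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  let best = 0, bi = 0, bj = 0;

  // Fill the scoring matrix, tracking the best-scoring cell.
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const q = query[i - 1].toLowerCase();
      const t = target[j - 1].toLowerCase();
      let s;
      if (q === t) s = /\s/.test(q) ? WS_MATCH : MATCH;
      else if (/\s/.test(q) && /\s/.test(t)) s = WS_MATCH;
      else s = MISMATCH;
      score[i][j] = Math.max(
        0,
        score[i - 1][j - 1] + s,
        score[i - 1][j] + GAP,
        score[i][j - 1] + GAP
      );
      if (score[i][j] > best) { best = score[i][j]; bi = i; bj = j; }
    }
  }

  // Traceback from the best cell to where the local alignment began.
  let i = bi, j = bj;
  while (i > 0 && j > 0 && score[i][j] > 0) {
    const diag = score[i - 1][j - 1], up = score[i - 1][j], left = score[i][j - 1];
    if (diag >= up && diag >= left) { i--; j--; }
    else if (up >= left) i--;
    else j--;
  }
  return { start: j, end: bj }; // target.slice(start, end) is the match
}

const target = "Hello world, this is a test.";
console.log(findMatchRegion("world", target)); // { start: 6, end: 11 }
```

The same function tolerates the newline-riddled KaTeX selections: gaps and whitespace mismatches lower the score locally, but the best-scoring region still lands on the intended span.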
```bibtex
@article{ericmjl-2026-how-i-fixed-a-browser-selection-bug-with-sequence-alignment-algorithms,
  author       = {Eric J. Ma},
  title        = {How I fixed a browser selection bug with sequence alignment algorithms},
  year         = {2026},
  month        = {01},
  day          = {06},
  howpublished = {\url{https://ericmjl.github.io}},
  journal      = {Eric J. Ma's Blog},
  url          = {https://ericmjl.github.io/blog/2026/1/6/how-i-fixed-a-browser-selection-bug-with-sequence-alignment-algorithms},
}
```
I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.
If you would like to sponsor the coffee that goes into making my posts, please consider GitHub Sponsors!
Finally, I do free 30-minute GenAI strategy calls for teams that are looking to leverage GenAI for maximum impact. Consider booking a call on Calendly if you're interested!