The Macro: Nobody Writes Enough Tests and Everyone Knows It
I have never met an engineering team that felt good about their test coverage. Not once. Every team I have ever talked to has the same story: they know testing is important, they have a coverage target on a dashboard somewhere, and they are consistently behind on it because there is always a feature to ship or a bug to fix that feels more urgent than writing tests for code that already works.
The numbers tell the story. Most engineering teams report test coverage somewhere between 40 and 70 percent, while the commonly cited industry target is above 80 percent. The gap is not a knowledge problem or a tooling problem. It is a prioritization problem. Writing tests is tedious. It does not ship features. It does not impress stakeholders. It does not make the product roadmap look good. But every time a regression ships to production because a code path was not tested, the entire team pays for it in incident response, hotfixes, and customer complaints.
UI bugs are a similar story. They are annoying, they pile up in the backlog, and they are rarely prioritized over feature work. A button that is misaligned on mobile. A form that does not validate correctly on edge cases. A modal that does not dismiss properly in Safari. These are not hard bugs to fix. They are boring bugs to fix. And because they are boring, they sit in the backlog for weeks or months while the team focuses on things that feel more important.
This is the gap that AI coding agents are attacking. Not the sexy work of building new features from scratch, but the essential, unglamorous work of testing and bug-fixing that every team needs and nobody wants to do. Cursor, Copilot (from GitHub, so I will note the Big Tech parentage and move on), Cody from Sourcegraph, and Codegen are all playing in the AI coding space. But most of them focus on code generation and completion. The testing and bug-fixing niche is more specific and potentially more valuable because the ROI is so easy to measure.
The Micro: From Ticket to PR Without a Human in the Loop
Marcel Tan (CEO) and Sohil Kshirsagar (CTO) cofounded Tusk and brought it through Y Combinator’s Winter 2024 batch. The product has evolved since launch, expanding from test generation into a broader AI coding agent that handles UI bugs and writes complete pull requests from tickets.
The workflow is clean. Tusk connects to your issue tracker (Jira, Linear, or whatever you use), reads a ticket, analyzes your codebase for context, writes the code to fix the bug or add the test, and opens a PR with the changes. For UI bug fixes, it understands the frontend framework, identifies the component causing the issue, and generates a fix that follows your existing code patterns. For test generation, it reads the production code, understands the business logic, and writes tests that cover meaningful scenarios rather than trivial ones.
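To make the shape of that workflow concrete, here is a minimal sketch of a ticket-to-PR loop. None of these class or function names come from Tusk's actual product; they are stand-ins for the stages described above (read ticket, gather codebase context, propose a change, open a PR), with the hard parts stubbed out.

```python
# Hypothetical ticket-to-PR pipeline. Every name here is illustrative,
# not Tusk's API; the LLM-driven steps are stubbed.
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str
    title: str
    description: str


@dataclass
class PullRequest:
    branch: str
    title: str
    files_changed: list


def gather_context(ticket: Ticket) -> list:
    # A real agent would search the repo for code related to the ticket;
    # here we fake it from the ticket title.
    return [f"src/{word.lower()}.py" for word in ticket.title.split()[:2]]


def propose_fix(ticket: Ticket, files: list) -> list:
    # Stub: a real agent would generate diffs against these files.
    return files


def open_pull_request(ticket: Ticket, changed: list) -> PullRequest:
    return PullRequest(
        branch=f"tusk/{ticket.key.lower()}",
        title=f"{ticket.key}: {ticket.title}",
        files_changed=changed,
    )


def ticket_to_pr(ticket: Ticket) -> PullRequest:
    context = gather_context(ticket)
    changed = propose_fix(ticket, context)
    return open_pull_request(ticket, changed)


pr = ticket_to_pr(Ticket("ENG-42", "Modal dismiss", "Modal does not dismiss in Safari"))
print(pr.branch)  # tusk/eng-42
```

The point of the sketch is the shape, not the internals: each stage is a seam where the agent can be evaluated independently, which is also why "acceptance rate on generated PRs" is a clean metric for this kind of product.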
The test generation side has some particularly smart features. CoverBot analyzes your current test coverage, identifies the most impactful gaps, and generates tests to hit your coverage targets. This is not random test generation. It prioritizes the code paths that are most likely to cause regressions if left untested. Tusk Drift handles API testing by replaying production traffic patterns, which means the tests reflect how your API is actually used rather than how a developer imagines it might be used.
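A toy version of that prioritization logic shows why it beats random test generation. The scoring heuristic here (uncovered lines weighted by recent churn, as a proxy for regression risk) is my assumption about how a tool like CoverBot might rank gaps, not Tusk's actual algorithm.

```python
# Toy coverage-gap prioritization: rank files by untested code weighted
# by recent churn. The heuristic is an assumption for illustration.

def prioritize_gaps(coverage, churn):
    """coverage: {file: (covered_lines, total_lines)};
    churn: {file: commits touching the file recently}.
    Returns files ordered from most to least test-worthy."""
    scores = {}
    for path, (covered, total) in coverage.items():
        uncovered = total - covered
        # Heavily edited files with many uncovered lines score highest.
        scores[path] = uncovered * (1 + churn.get(path, 0))
    return sorted(scores, key=scores.get, reverse=True)


coverage = {"billing.py": (20, 100), "utils.py": (90, 100), "legacy.py": (0, 50)}
churn = {"billing.py": 12, "utils.py": 1, "legacy.py": 0}

print(prioritize_gaps(coverage, churn))
# ['billing.py', 'legacy.py', 'utils.py']
```

Note that `billing.py` outranks `legacy.py` even though both are badly covered: 80 uncovered lines under heavy churn are more likely to regress than 50 uncovered lines nobody touches, which is exactly the "most impactful gaps" idea.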
Self-healing tests are the feature that will make or break long-term adoption. The biggest complaint about large test suites is maintenance. Business logic changes, a function signature gets updated, and suddenly 40 tests fail because they reference the old behavior. Tusk automatically updates tests when the underlying code changes, which addresses the number one reason teams give up on maintaining high coverage: the cost of keeping tests current as the codebase evolves.
Tusk offers a 14-day free trial, which is smart. This is a product whose value is immediately measurable: either the generated tests catch real bugs and the UI fixes hold up in review, or they do not. Two weeks is enough time for an engineering team to find out.
The competitive landscape is worth mapping out. Codium (now Qodo) focused on test generation early and has strong positioning in that niche. Diffblue does automated test writing for Java specifically. Launchable and Trunk focus on test infrastructure and optimization. On the broader AI coding agent side, Devin from Cognition, Sweep, and Factory all build AI agents that can handle engineering tasks end-to-end. Tusk’s positioning in the overlap between test generation and bug fixing is distinctive. Most competitors pick one lane. Tusk is arguing that these are the same workflow: understand the codebase, write code that makes it better, open a PR.
The Verdict
Testing and UI bug fixing are the perfect use case for AI coding agents because the cost of doing it manually is high, the cost of not doing it is higher, and the quality bar is measurable. Either the test catches bugs or it does not. Either the UI fix works or it does not.
At 30 days, I want to see the acceptance rate on generated PRs. If engineering teams are merging 60 percent or more of Tusk’s PRs without significant modification, that is a product that saves real time. Below 40 percent, the review overhead starts to eat into the efficiency gains. At 60 days, the self-healing tests feature needs to prove itself. Does it actually keep tests current as the codebase changes, or does it introduce subtle bugs by auto-updating tests that should have failed? At 90 days, the question is whether Tusk becomes part of the team’s standard workflow or stays as an occasional tool. The best developer tools disappear into the background. You stop thinking about them because they just work.
I think Tusk is onto something real. The backlog of “important but not urgent” engineering work is enormous at every company, and it is exactly the kind of work that AI agents can handle well. If your test coverage is at 50 percent and your UI bug backlog has 200 tickets, Tusk might be the most productive engineer on your team within a week. That is a compelling pitch.