December 20, 2025 edition

outship

Hire amazing engineers by seeing how they use AI

Outship Wants to Watch Engineers Work With AI and That's the Interview

Recruiting · Developer Tools · AI

The Macro: The Coding Interview Is Broken (Again)

Every few years, the tech industry collectively acknowledges that its hiring process doesn’t actually identify good engineers, and then keeps doing it anyway. Whiteboard interviews test memorization of algorithms most people will never use on the job. Take-home projects test whether a candidate has free time. Pair programming sessions test whether someone performs well while being watched by a stranger. None of these reliably predict whether a person can ship useful software in a real work environment.

And now there’s a new variable that makes the old playbook even more outdated. AI coding tools have fundamentally changed how software gets built. Engineers who are effective with Claude Code, Cursor, or similar tools produce code differently than engineers who aren’t. The process looks different. The debugging looks different. The way they decompose problems, evaluate suggestions, and recover from wrong turns looks different. And none of that shows up in a traditional coding interview where AI tools are either banned or awkwardly permitted without any framework for evaluating how they’re used.

The recruiting tech market has plenty of players, but most are optimizing the old paradigm. HackerRank and LeetCode test algorithmic problem-solving. Karat runs structured technical interviews with trained interviewers. CodeSignal offers standardized coding assessments. Interviewing.io provides anonymous practice interviews. All of these are useful for what they measure, but what they measure is increasingly disconnected from how engineering work actually happens.

The gap I see is this: nobody has built a credible way to evaluate engineers based on how they work with AI tools, which is quickly becoming the most important skill in software development. That’s the gap Outship is trying to fill.

The Micro: A Cloud Workspace That Records Everything

Outship, a Y Combinator W25 company, takes a straightforward approach to a complicated problem. They spin up a cloud VM for each candidate, pre-configured with VS Code, all necessary dependencies, and AI agents like Claude Code, Cursor, OpenAI Codex, and Qwen. The candidate works on a task, whether that’s an interview question, a take-home assignment, or even a real feature from the company’s codebase, and Outship captures everything: every prompt, every edit, every terminal command, every decision.
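It's easy to picture what that capture layer could look like. Here's a minimal sketch of a per-candidate event log as an append-only record; every name here is hypothetical and illustrative, not Outship's actual schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Literal

# Hypothetical event types a recording layer might distinguish.
EventKind = Literal["prompt", "ai_response", "file_edit", "terminal_command"]

@dataclass
class SessionEvent:
    kind: EventKind
    payload: str  # prompt text, diff, or shell command
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

@dataclass
class CandidateSession:
    candidate_id: str
    events: list[SessionEvent] = field(default_factory=list)

    def record(self, kind: EventKind, payload: str) -> None:
        self.events.append(SessionEvent(kind, payload))

    def timeline(self) -> list[dict]:
        # Flatten to plain dicts, e.g. for replay or export.
        return [asdict(e) for e in self.events]

session = CandidateSession("cand-001")
session.record("prompt", "Add pagination to the /orders endpoint")
session.record("terminal_command", "pytest tests/test_orders.py")
print(len(session.timeline()))  # → 2
```

The point of the append-only shape is that nothing is interpreted at capture time; judgment happens later, at replay.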

The founding team comes from UC Berkeley’s AI research lab. Saner Cakir, the CEO, worked on robot learning at BAIR. Kayla Lee, the COO, has a BS and MS in Computer Science from Berkeley, contributed to NLP publications at the same lab, and co-built an AI copilot for surgeons at UCSF. They’ve spent enough time around AI systems to understand that the interesting signal isn’t in the output, it’s in the process.

That’s the core insight, and I think it’s correct. Two engineers can produce identical code but arrive at it through completely different processes. One might write a precise prompt, evaluate the AI’s output critically, catch a subtle bug, and ship something clean. Another might accept the first suggestion without reading it, paste in error messages when things break, and brute-force their way to something that passes tests but is fragile. The final code might look similar. The engineering quality is very different.

The environment management piece is worth noting too. Setting up a consistent development environment for every candidate is one of those problems that sounds easy and isn’t. Different projects need different runtimes, different dependencies, different configurations. If your interview environment doesn’t mirror real working conditions, you’re testing the wrong thing. Because Outship handles the VM provisioning and secret management, the candidate walks into something that feels like a real workspace, not a sandboxed puzzle.
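One way to make that repeatable is a declarative per-role environment spec that compiles to setup steps. This is a sketch under assumed conventions; the spec fields and command names are invented for illustration, not Outship's API:

```python
from dataclasses import dataclass, field

# Hypothetical declarative environment spec for one candidate's VM.
@dataclass
class EnvSpec:
    runtime: str                                       # e.g. "python3.12", "node20"
    dependencies: list[str] = field(default_factory=list)
    secrets: list[str] = field(default_factory=list)   # injected at boot, never logged
    ai_tools: list[str] = field(default_factory=list)

def provision_script(spec: EnvSpec) -> list[str]:
    """Turn a spec into an ordered list of setup commands for the VM."""
    steps = [f"install-runtime {spec.runtime}"]
    steps += [f"install {dep}" for dep in spec.dependencies]
    steps += [f"inject-secret {name}" for name in spec.secrets]
    steps += [f"enable-tool {tool}" for tool in spec.ai_tools]
    return steps

backend = EnvSpec(
    runtime="python3.12",
    dependencies=["postgresql-client", "redis"],
    secrets=["STAGING_DB_URL"],
    ai_tools=["claude-code"],
)
print(len(provision_script(backend)))  # → 5
```

A declarative spec means every candidate for the same role gets a byte-identical starting point, which is what makes their recordings comparable.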

What I find most compelling is that this product doesn’t just serve the interviewer. It could genuinely help companies understand what “good AI-assisted engineering” looks like at their organization. If you’re a hiring manager and you’ve watched 50 candidates work with AI tools, you start to see patterns. You learn what the best engineers do differently. That’s organizational knowledge that didn’t exist before because nobody was capturing it.

The competitive response I’d expect: HackerRank and CodeSignal will probably add AI-tool-enabled assessment modes. Karat might update their interview rubrics. But bolting AI evaluation onto an existing assessment platform is harder than it sounds because the scoring criteria are fundamentally different. You’re not grading correctness or speed anymore. You’re grading process, judgment, and adaptability.
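To make "grading process" concrete: once you have an event log, you can compute process signals that a correctness-based platform simply has no column for. A toy sketch, with metrics I'm inventing purely for illustration (this is not Outship's rubric):

```python
from collections import Counter

def process_metrics(events: list[dict]) -> dict:
    """Hypothetical process signals derived from a recorded session."""
    kinds = Counter(e["kind"] for e in events)
    prompts = kinds["prompt"]
    edits = kinds["file_edit"]
    test_runs = sum(
        1 for e in events
        if e["kind"] == "terminal_command" and "pytest" in e["payload"]
    )
    return {
        # Did the candidate read and revise AI output, or paste it verbatim?
        "edits_per_prompt": edits / prompts if prompts else 0.0,
        # Did they verify behavior as they went?
        "test_runs": test_runs,
    }

log = [
    {"kind": "prompt", "payload": "refactor the parser"},
    {"kind": "file_edit", "payload": "diff --git a/parser.py ..."},
    {"kind": "file_edit", "payload": "diff --git a/tests.py ..."},
    {"kind": "terminal_command", "payload": "pytest -q"},
]
print(process_metrics(log))  # → {'edits_per_prompt': 2.0, 'test_runs': 1}
```

Defining which of these signals actually predict good hires is the hard research problem; the code is the trivial part.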

The Verdict

I think Outship is asking the right question at the right time. The shift toward AI-assisted development is not slowing down, and companies that can’t evaluate how candidates work with these tools are going to make worse hires.

At 30 days: what does the evaluation framework actually look like? Recording everything is the easy part. Turning those recordings into a structured, fair, repeatable assessment is where the product has to deliver. If it’s just “watch the replay and use your judgment,” that’s not much better than an unstructured interview.

At 60 days: who’s buying? This product could sell to fast-moving startups that already live in AI coding tools and want to hire people who share that fluency. Or it could sell to larger companies that are trying to figure out their AI development strategy and want data on what good looks like. Those are different sales motions.

At 90 days: does the data they’re collecting become a product in itself? If Outship has recordings of thousands of engineers working with AI tools across different task types, that’s a dataset nobody else has. The insights about what makes someone effective with AI-assisted development could be more valuable than the interview platform itself.

The old interview was about proving you could write code. The new interview might be about proving you can work with something that writes code for you. Outship is betting on that transition, and I think the bet is directionally correct.