A single tagline launched this thing: “Why use so many token when few do trick?”
That’s Caveman, a SKILL.md-based plugin for Claude Code built by JuliusBrussee. The premise is blunt: cut Claude’s output tokens by roughly 75% without sacrificing technical accuracy. As of April 15, 2026, it’s sitting at 31.6k GitHub stars. For a focused prompt-engineering utility that doesn’t pretend to do anything except one job, that number demands some attention.
It got solid traction on launch day, which is the kind of validation Product Hunt occasionally gets right. The idea isn’t complicated. You install Caveman in one line, pick a grunt level, and your AI coding assistant stops writing essays when you asked for a function. Fewer tokens out means lower API spend and faster turnaround. If you’ve watched Claude burn through credits explaining why it’s about to write the code before it writes the code, Caveman is aimed directly at you.
One early adopter put it plainly. “I installed Caveman on my Claude Code setup and it’s great,” the user told the project’s GitHub discussion thread, which is about as unambiguous as reviews get for developer tooling.
The plugin works across Claude Code, Cursor, Windsurf, and Copilot, among others. One line to install. That’s the pitch, and it’s short on purpose.
Grunts on a Spectrum
The design decision that sets Caveman apart from generic token-reduction approaches is the grunt level system. Most compression tools are binary: either you apply a rule or you don't, which means you're either in compressed mode for everything or you're not. Caveman gives you 4 distinct settings along a spectrum from slightly terse to something approaching the functional minimum, where the AI is basically dropping every word that isn't load-bearing.
This is a real usability distinction. There are moments in a coding workflow where context matters, where you want the model to walk you through why a particular approach handles edge cases a certain way, and there are moments where you need the function signature and the fix and nothing else. Being forced to choose one permanent mode for all of those situations is a tax on the tool. Having a dial is better.
At the upper end of those 4 settings, the output apparently reads, charitably, like the work of a developer under extreme time pressure who regards punctuation with contempt. That's not a criticism. That's exactly what some workflows need, and the fact that the project commits to that aesthetic rather than softening it is part of why it landed the way it did.
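The dial idea is simple enough to sketch. The level names and instruction text below are hypothetical, not Caveman's actual configuration; this only illustrates the general pattern of mapping a grunt level to a verbosity constraint appended to the assistant's instructions.

```python
# Hypothetical sketch of a grunt-level dial. Each level maps to a
# verbosity constraint appended to the assistant's base instructions.
# Level names and wording are illustrative, not Caveman's real config.
GRUNT_LEVELS = {
    1: "Be concise. Prefer code over prose.",
    2: "Minimal prose. At most one sentence of explanation per answer.",
    3: "Code and fixes only. No preamble, no summary.",
    4: "drop all non-load-bearing words. code. done.",
}

def build_system_prompt(base_prompt: str, grunt_level: int) -> str:
    """Append the constraint for the chosen grunt level to a base prompt."""
    if grunt_level not in GRUNT_LEVELS:
        raise ValueError(f"grunt level must be 1-4, got {grunt_level}")
    return f"{base_prompt}\n\n{GRUNT_LEVELS[grunt_level]}"

print(build_system_prompt("You are a coding assistant.", 3))
```

The point of the dial is that the constraint, not the model, is what changes: the same request can produce a walkthrough at level 1 and a bare diff at level 4.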
Commit message generation and one-line PR reviews ship with it. Commit messages are where AI assistants tend to go completely sideways, producing 3-paragraph summaries for a 2-line change. Trimming that behavior down is a legitimate improvement for anyone who does code review at any volume. Developers don’t want prose. They want signal.
Input compression is also in the feature set. That’s worth separating out because it means Caveman isn’t only squashing what the model says back to you. It’s compressing what you send in too, which is a broader token-efficiency argument than just “make the AI shut up faster.” That’s the quote the repo uses for one of its core use cases, and it’s accurate.
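The input side can be illustrated with a deliberately naive filler-stripping pass. This is not Caveman's actual algorithm, which works through Claude Code's skill system rather than preprocessing; the sketch only shows why trimming what you send in saves tokens just as trimming what comes back does.

```python
import re

# Naive illustrative input compressor: strips common filler phrases
# from a prompt before it is sent to the model. Purely a sketch of
# the idea, not how a SKILL.md-based tool actually does it.
FILLER = [
    r"\bcould you please\b",
    r"\bbasically\b",
    r",?\s*if possible\b",
]

def compress_prompt(prompt: str) -> str:
    for pattern in FILLER:
        prompt = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", prompt).strip()

before = "Could you please basically refactor this function, if possible?"
print(compress_prompt(before))  # → "refactor this function?"
```

Same request, fewer tokens on the meter, which is the whole argument in miniature.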
The Actual Technical Argument
Here’s why the 75% claim is worth treating as a real number rather than marketing copy.
Large language model APIs charge by token on both sides of the conversation, input and output. For developers running AI-assisted coding at any serious volume, output tokens from a verbose assistant accumulate quickly. Claude, specifically, tends toward thorough responses by default. That’s often a feature. It’s also sometimes 10 paragraphs of reasoning before 3 lines of code, which is a cost structure that doesn’t scale well when you’re making hundreds of API calls a day. If the 75% figure holds across real workloads, you’re looking at meaningful cost reduction for teams that depend on these tools heavily.
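The cost argument is back-of-envelope arithmetic. The prices and usage numbers below are placeholder assumptions, not Anthropic's actual rates or anyone's measured workload; the sketch just shows how a 75% output cut compounds across hundreds of daily calls.

```python
# Back-of-envelope token cost math. All constants are illustrative
# assumptions, not real pricing or measured usage.
PRICE_PER_M_OUTPUT = 15.00   # dollars per million output tokens (assumed)
CALLS_PER_DAY = 300          # hypothetical heavy-usage workload
TOKENS_PER_RESPONSE = 800    # verbose assistant response (assumed)
REDUCTION = 0.75             # Caveman's claimed output reduction

def daily_cost(tokens_per_response: float) -> float:
    """Daily output-token spend for the assumed call volume."""
    return CALLS_PER_DAY * tokens_per_response * PRICE_PER_M_OUTPUT / 1_000_000

baseline = daily_cost(TOKENS_PER_RESPONSE)
compressed = daily_cost(TOKENS_PER_RESPONSE * (1 - REDUCTION))
print(f"baseline ${baseline:.2f}/day, compressed ${compressed:.2f}/day")
# Under these assumptions: $3.60/day baseline vs $0.90/day compressed.
```

Multiply the gap by a team and a year and the number stops being a rounding error.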
The repo’s contributor activity supports the idea that this isn’t a weekend project someone posted and moved on from. As of April 15, 2026, the project has 140 commits and 1.5k forks. That’s consistent maintenance. The SKILL.md architecture means it’s not patching Claude’s behavior through some fragile workaround, it’s working within Claude Code’s own skill system, which gives it a more durable foundation than prompt injection hacks that break whenever the model updates.
The 275675% figure floating around launch-day coverage comes from a Product Hunt API source identifier in the campaign URL, the kind of detail that gets cited out of context. What actually matters is that the launch-day numbers were strong enough to push the project into visibility, and the GitHub growth since then has been organic.
What It Doesn’t Do
Caveman doesn’t make Claude smarter. It doesn’t improve code quality. It doesn’t give the model new capabilities. What it does is constrain how the model expresses what it already knows, pushing it toward the “just give me the code” end of the response spectrum. For developers who’ve already decided Claude’s underlying reasoning is good enough, that’s the right tradeoff. For developers who rely on the model’s explanations as part of their own learning process, dialing up to full caveman mode might strip out exactly what they came for.
There’s also a version of this concern at the team level. If you’re using AI-assisted coding to onboard junior developers who benefit from seeing why a solution works, not just what it is, then a tool built to eliminate explanatory text is solving the wrong problem for that context. That’s not a flaw in Caveman. That’s scope.
The 4 grunt levels exist precisely because the developers understood this. Level 1 doesn't look much different from a normally terse assistant. Level 4 looks like it was written by someone who thinks vowels are optional. Most teams will probably find a comfortable setting somewhere in the middle of the dial before they hit diminishing returns on readability.
Why This Got Traction
The 31.6k stars by April 15, 2026 aren’t entirely about what Caveman does. They’re partly about what it signals.
Developer tooling has spent the last few years moving toward maximalism: models that explain more, suggest more, write longer commit messages, generate fuller documentation. That trajectory has real value in the right contexts. It also has real costs, literal API costs and the subtler cost of attention, where a tool that talks too much starts to feel like a coworker who won't stop narrating. Caveman is a reaction to that. It's small, it's opinionated, and it's named after a joke that also happens to be the entire technical brief.
The “just give me the code” instinct is widespread among experienced developers. It’s the reason terminal interfaces haven’t died. It’s why terse error messages are often preferred to verbose ones. Caveman is, in some sense, a formalization of that preference as a configurable AI behavior, which is a thing that apparently 31.6k people wanted badly enough to star a repo about it.
The SKILL.md-based architecture also means it’s extensible in ways that raw prompt injection isn’t. That matters for longevity. Tools that survive in developer ecosystems tend to be the ones that can be adapted, not just installed. With 140 commits and 1.5k forks, the project already has the shape of something people are building on top of, not just dropping in and forgetting.
Where It Fits
Caveman is a niche tool that found its niche precisely because the niche is larger than it looks. The population of developers running Claude Code or Cursor or Copilot at any real volume, watching token costs accumulate, and muttering something about how they didn’t ask for a sonnet, that population isn’t small. It’s most of the developers using these tools professionally.
The 75% output token reduction claim is the number that will get tested and debated. Real-world results will vary by use case, grunt level, and how much the underlying task actually requires explanation. But the claim isn't implausible. And even a much smaller reduction would be meaningful at scale.
The tagline remains the most efficient summary of the project’s philosophy: “Why use so many token when few do trick?”