Is Claude Code Actually Better? 1,820 Hours of Data
What is Reazy?
Reazy is a text-to-speech app that lets you listen to anything you read. Chrome extension, web app: you upload a document or visit a webpage, and it reads it to you with custom AI voices.
It's the test case for this analysis. I've been building it solo for over 2.5 years, and I've tracked my focused hours for the last 22 months.
The tech stack:
- React and TypeScript on the frontend
- Ionic Capacitor for mobile
- Firebase for auth and storage
- Google Cloud Run for inference
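To make the stack concrete, here is a minimal sketch of how pieces like these typically connect: the frontend gets a Firebase ID token and sends text to a Cloud Run inference service for synthesis. The service URL, route, and request shape below are hypothetical, not Reazy's actual API.

```typescript
// Hypothetical sketch: a React/TypeScript frontend calling a Cloud Run TTS service,
// authenticated with a Firebase ID token. Names and routes are illustrative only.
import { getAuth } from "firebase/auth";

async function synthesize(text: string, voiceId: string): Promise<Blob> {
  const user = getAuth().currentUser;
  if (!user) throw new Error("Not signed in");

  // Firebase issues the ID token; the inference service verifies it server-side.
  const token = await user.getIdToken();

  const res = await fetch("https://tts-inference-example.a.run.app/synthesize", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ text, voiceId }),
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);

  // Audio blob to hand to the player in the reader UI.
  return res.blob();
}
```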
I'm not drawing a salary while I build Reazy, and I scoped a little too large for my first web development project. Shipping faster means revenue sooner. A reduction in debug time isn't just a stat; it's runway.
The Setup
Development Timeline
22 months • 1,820 hours tracked • 4 AI tools
22 months of development. 1,820 hours tracked in a spreadsheet. 1,387 commits across five repositories.
The question I want to answer: Did Claude Code actually make me more productive?
Over those 22 months, I used four different tools:
- GPT prompting / Copilot from January through October 2024
- Cursor from November 2024 through January 2025
- Windsurf from February through June 2025
- Claude Code from July 2025 to now
That gives us a natural experiment. Same developer, same codebase, using different agentic AI tools at different times.
Why This Is Hard to Measure
The Measurement Problem
How do you compare one detailed Claude-era commit to three terse pre-Claude commits?
They might represent the same amount of work, or completely different amounts.
For example, a typical Claude-era commit message reads: "Fixed nested scroll operation bug preventing search navigation to distant chunks."
Complicating Factors
Raw numbers are misleading. Here are four different ways to look at this data, each controlling for different complications.
Lens 1: Debug Session Duration
Debug sessions are complexity-independent. When you're hunting a bug, the feature is already built. You're just trying to find the root cause. It takes however long it takes.
This isolates "problem-solving speed" from "feature complexity."
The longest debug sessions in the pre-Claude era:
- Lazy Loading Text Files (Cursor, December 2024): 59 hours over 10 days. I had to restart completely after 30 hours when the first approach failed.
- Firebase Bundle Sizes (Cursor/Windsurf): 52 hours total, and this one happened twice, in June 2024 and June 2025. Same root cause, one year apart.
- Web App Authentication (Cursor, November 2024): 50+ hours over 11 days. I tried three different approaches before one worked.
- TanStack Virtual Migration (Windsurf, February 2025): 40+ hours over 12 days.
Now look at the Claude Code era:
- Search with Virtualization (November 2025): 35 hours over 7 days, and it produced 1,600 lines of documentation.
- Scroll Coordinator (November 2025): About 15 hours over 3 days, fully documented.
Pre-Claude debug sessions averaged around 50 hours. Claude Code sessions averaged around 25 hours. That's roughly half as long.
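Those averages come straight from the sessions listed above. Here is the quick check, treating the "50+" and "40+" sessions as 50 and 40 hours:

```typescript
// Quick sanity check of the averages quoted above.
const preClaudeHours = [59, 52, 50, 40]; // lazy loading, Firebase bundles, web auth, TanStack Virtual
const claudeCodeHours = [35, 15];        // search with virtualization, scroll coordinator

const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

console.log(avg(preClaudeHours));  // ~50 hours
console.log(avg(claudeCodeHours)); // 25 hours
console.log(1 - avg(claudeCodeHours) / avg(preClaudeHours)); // ~0.5, i.e. about half as long
```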
Debugging with Claude Code feels much easier. It feels smarter, even though I was already using Claude models inside Cursor and Windsurf. Often, pasting in the logs is enough to solve the problem within a couple of code-and-test iterations.
One caveat: the TanStack Virtual migration was more complex infrastructure work, so it isn't a perfect comparison. That said, I believe I would have implemented it faster with Claude Code.
Lens 2: Commit Behavior
Before Claude Code, I averaged about 33 commits per month. After? About 83. That's a 150% increase.
But here's the thing: this isn't "more output." It's a behavioral change.
Why Commits Increased
I can just ask "commit this with a good message" and get a clear, descriptive commit. That removes friction from the process.
More commits meant I felt safe breaking things and pushing forward. Easy to roll back if something goes wrong.
More commits isn't "more output"; it's better version control habits. Claude made committing so easy that I actually started doing it more.
The tool changed my behavior, not just my productivity.
Lens 3: Documentation Behavior
Documentation by File
Why Documentation Actually Happens Now
During debugging, Claude explains what went wrong and why the fix works. That explanation becomes documentation with minimal effort.
Claude uses these docs as context. The documentation isn't just for me; it's how Claude remembers architectural decisions and past debugging sessions.
- Pre-Claude: Other tools did not encourage this habit.
- Claude Code: 3,600+ lines of architecture docs, debug sessions, and patterns. It is almost essential with Claude Code.
It lets you build projects bigger than what fits in your head. Architecture decisions, debugging context, integration details: ask Claude. Those details belong in text so you can focus on higher-level issues like user experience.
Claude's memory is only as good as what you document.
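As a concrete illustration, an AI_docs folder along these lines is what makes that memory work. The specific files below are hypothetical, not the project's real layout:

```
AI_docs/
  architecture/
    scroll-coordinator.md              # how scrolling and virtualization interact
    text-chunking.md                   # how documents are split for playback
  debug-sessions/
    2025-11-search-virtualization.md   # root cause, fix, follow-ups
  patterns/
    firebase-bundle-loading.md         # lessons from the bundle-size incidents
```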
Lens 4: Qualitative Pattern Changes
Numbers don't capture everything. Some changes are about how work happens.
Conclusion
Claude Code was first and is still the best (as of December 2025), despite lots of competition.
Claude Code is a magical experience that makes coding easy.
The experience is qualitatively better than any agentic integrated development environment I've tried. The traction and progress they've made, even over the last few months, are very real.
The Wrapper Experience vs Claude Code
The wrapper experience:
- UI got clunky on long threads
- Complex multi-file work was slow
- Context compression issues
- Had to re-explain things

Claude Code:
- Zippy CLI experience
- Simple, clean design
- Full context preserved
- Easy to recover from mistakes
Evidence it's catching on
Cursor hired Claude Code's creators, Boris and Cat (developer and product manager). They stayed a couple of weeks before Anthropic was able to hire them back. Now Cursor has its own CLI agentic coding experience.
My feeling: Sonnet 4 hit product-market fit with Claude Code. Lots of people migrated to the better experience.
Gemini CLI, OpenAI Codex, and others have followed Claude's lead. Proof that the approach is effective.
Not all agentic coding tools are equal.
What The Data Shows
The Reality Check
As of the end of 2025, coding with agentic tools is still collaborative and iterative. A human in the loop is required, and 100% vibe coding probably takes longer than steering smartly toward your goals. Testing is still required too: although Claude Code is getting better at one-shot implementations, you still have to test and fix almost everything moderately complex. You still have to steer it, especially when things are complex (documentation to follow, parts to integrate). You also have to mind your architecture; sometimes Claude gets stuck in the weeds, and you have to step back and talk about the architecture at a higher level.
Documenting everything has been a game changer. AI_docs is Claude's memory. When I hit the same issues again, Claude already has the context. When I have documentation, I can also use Claude as an oracle: I ask for a command to do X, and it looks up the context and gives me the terminal syntax I asked for.
Appendix: Methodology
Data sources:
- hours.csv: 610 entries from April 2024 through December 2025
- Git commit history: 1,387 commits across reazy-app, reazy-inference, tts-training-pipeline, and supporting repos
- Personal notes and documentation
Tool timeline:
- GPT Prompting / Copilot: January 2024 - October 2024
- Cursor: November 2024 - January 2025
- Windsurf: February 2025 - June 2025
- Claude Code: July 2025 - present (first commit: July 14, 2025)
Commit attribution:
- 282 commits (20% of total) explicitly marked with "Generated with Claude Code"
- Many more commits in the Claude era are not explicitly marked but were created with Claude assistance
Limitations:
- Single developer, single project; may not generalize
- Skill growth over 22 months is a complicating factor
- Infrastructure maturity affects later development speed
- Commit granularity changed between eras
The goal was honest analysis, not advocacy. I tried to show the data fairly and let you draw your own conclusions.