Is Claude Code Actually Better? 1,820 Hours of Data
What is Reazy?
Reazy is a text-to-speech app that lets you listen to anything you read. Chrome extension, web app: you upload a document or visit a webpage, and it reads it to you with custom AI voices.
It's the test case for this analysis. I've been building it solo for over 2.5 years, and I've tracked my focused hours for the last 22 months.
The tech stack:
- React and TypeScript on the frontend
- Ionic Capacitor for mobile
- Firebase for auth and storage
- Google Cloud Run for inference
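To make the stack concrete, here is a minimal sketch of how pieces like these typically connect: the frontend gets a Firebase ID token and sends text to a Cloud Run inference service for synthesis. The service URL, route, and request shape below are hypothetical, not Reazy's actual API.

```typescript
// Hypothetical sketch: a React/TypeScript frontend calling a Cloud Run TTS service,
// authenticated with a Firebase ID token. Names and routes are illustrative only.
import { getAuth } from "firebase/auth";

async function synthesize(text: string, voiceId: string): Promise<Blob> {
  const user = getAuth().currentUser;
  if (!user) throw new Error("Not signed in");

  // Firebase issues the ID token; the inference service verifies it server-side.
  const token = await user.getIdToken();

  const res = await fetch("https://tts-inference-example.a.run.app/synthesize", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ text, voiceId }),
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);

  // Audio blob to hand to the player in the reader UI.
  return res.blob();
}
```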
I'm not drawing a salary while I build Reazy, and I scoped a little too large for my first web development project. Shipping faster means revenue sooner. A reduction in debug time isn't just a stat; it's runway.
The Setup
Development Timeline
22 months • 1,820 hours tracked • 4 AI tools
22 months of development. 1,820 hours tracked in a spreadsheet. 1,387 commits across five repositories.
The question I want to answer: Did Claude Code actually make me more productive?
Over those 22 months, I used four different tools:
- GPT prompting / Copilot from January through October 2024
- Cursor from November 2024 through January 2025
- Windsurf from February through June 2025
- Claude Code from July 2025 to now
That gives us a natural experiment. Same developer, same codebase, using different agentic AI tools at different times.
Why This Is Hard to Measure
The Measurement Problem
How do you compare one detailed Claude-era commit to three terse pre-Claude commits?
They might represent the same amount of work, or completely different amounts.
For example, a typical Claude-era commit message reads: "Fixed nested scroll operation bug preventing search navigation to distant chunks."
Complicating Factors
Raw numbers are misleading. Here are four different ways to look at this data, each controlling for different complications.
Lens 1: Debug Session Duration
Debug sessions are complexity-independent. When you're hunting a bug, the feature is already built. You're just trying to find the root cause. It takes however long it takes.
This isolates "problem-solving speed" from "feature complexity."
The longest debug sessions in the pre-Claude era:
- Lazy Loading Text Files (Cursor, December 2024): 59 hours over 10 days. I had to restart completely after 30 hours when the first approach failed.
- Firebase Bundle Sizes (Cursor/Windsurf): 52 hours total, and this one happened twice, in June 2024 and June 2025. Same root cause, one year apart.
- Web App Authentication (Cursor, November 2024): 50+ hours over 11 days. I tried three different approaches before one worked.
- TanStack Virtual Migration (Windsurf, February 2025): 40+ hours over 12 days.
Now look at the Claude Code era:
- Search with Virtualization (November 2025): 35 hours over 7 days, and it produced 1,600 lines of documentation.
- Scroll Coordinator (November 2025): About 15 hours over 3 days, fully documented.
Pre-Claude debug sessions averaged around 50 hours. Claude Code sessions averaged around 25 hours. That's roughly half as long.
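Those averages come straight from the sessions listed above. Here is the quick check, treating the "50+" and "40+" sessions as 50 and 40 hours:

```typescript
// Quick sanity check of the averages quoted above.
const preClaudeHours = [59, 52, 50, 40]; // lazy loading, Firebase bundles, web auth, TanStack Virtual
const claudeCodeHours = [35, 15];        // search with virtualization, scroll coordinator

const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

console.log(avg(preClaudeHours));  // ~50 hours
console.log(avg(claudeCodeHours)); // 25 hours
console.log(1 - avg(claudeCodeHours) / avg(preClaudeHours)); // ~0.5, i.e. about half as long
```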
Debugging with Claude Code feels much easier. It feels smarter, even though I was already using Claude models inside Cursor and Windsurf. Often, pasting in the logs is enough to solve the problem within a couple of code-and-test iterations.
One caveat: the TanStack Virtual migration was more complex infrastructure work, so it isn't a perfect comparison. That said, I believe I would have implemented it faster with Claude Code.
Lens 2: Commit Behavior
Before Claude Code, I averaged about 33 commits per month. After? About 83. That's a 150% increase.
But here's the thing: this isn't "more output." It's a behavioral change.
Why Commits Increased
I can just ask "commit this with a good message" and get a clear, descriptive commit. That removes friction from the process.
More commits meant I felt safe breaking things and pushing forward. Easy to roll back if something goes wrong.
More commits isn't "more output"; it's better version control habits. Claude made committing so easy that I actually started doing it more.
The tool changed my behavior, not just my productivity.
Lens 3: Documentation Behavior
Documentation by File
Why Documentation Actually Happens Now
During debugging, Claude explains what went wrong and why the fix works. That explanation becomes documentation with minimal effort.
Claude uses these docs as context. The documentation isn't just for me; it's how Claude remembers architectural decisions and past debugging sessions.
- Pre-Claude: Other tools did not encourage this habit.
- Claude Code: 3,600+ lines of architecture docs, debug sessions, and patterns. It is almost essential with Claude Code.
It lets you build projects bigger than what fits in your head. Architecture decisions, debugging context, integration details: ask Claude. Those details belong in text so you can focus on higher-level issues like user experience.
Claude's memory is only as good as what you document.
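As a concrete illustration, an AI_docs folder along these lines is what makes that memory work. The specific files below are hypothetical, not the project's real layout:

```
AI_docs/
  architecture/
    scroll-coordinator.md              # how scrolling and virtualization interact
    text-chunking.md                   # how documents are split for playback
  debug-sessions/
    2025-11-search-virtualization.md   # root cause, fix, follow-ups
  patterns/
    firebase-bundle-loading.md         # lessons from the bundle-size incidents
```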
Lens 4: Qualitative Pattern Changes
Numbers don't capture everything. Some changes are about how work happens.
Conclusion
Claude Code was first and is still the best (as of December 2025), despite lots of competition.
Claude Code is a magical experience that makes coding easy.
The experience is qualitatively better than any agentic integrated development environment I've tried. The traction and progress they've made, even over the last few months, are very real.
The Wrapper Experience vs Claude Code
The wrapper experience:
- UI got clunky on long threads
- Complex multi-file work was slow
- Context compression issues
- Had to re-explain things

Claude Code:
- Zippy CLI experience
- Simple, clean design
- Full context preserved
- Easy to recover from mistakes
Evidence it's catching on
Cursor hired Claude Code's creators, Boris and Cat (developer and product manager). They stayed a couple of weeks before Anthropic was able to hire them back. Now Cursor has its own CLI agentic coding experience.
My feeling: Sonnet 4 hit product-market fit with Claude Code. Lots of people migrated to the better experience.
Gemini CLI, OpenAI Codex, and others have followed Claude's lead. Proof that the approach is effective.
Not all agentic coding tools are equal.
What The Data Shows
The Reality Check
As of the end of 2025, coding with agentic tools is still collaborative and iterative. A human in the loop is required, and 100% vibe coding probably takes longer than steering smartly toward your goals. Testing is still required too: although Claude Code is getting better at one-shot implementations, you still have to test and fix almost everything moderately complex. You still have to steer it, especially when things are complex (documentation to follow, parts to integrate). You also have to mind your architecture; sometimes Claude gets stuck in the weeds, and you have to step back and talk about the architecture at a higher level.
Documenting everything has been a game changer. AI_docs is Claude's memory. When I hit the same issues again, Claude already has the context. When I have documentation, I can also use Claude as an oracle: I ask for a command to do X, and it looks up the context and gives me the terminal syntax I asked for.
Appendix: Methodology
Data sources:
- hours.csv: 610 entries from April 2024 through December 2025
- Git commit history: 1,387 commits across reazy-app, reazy-inference, tts-training-pipeline, and supporting repos
- Personal notes and documentation
Tool timeline:
- GPT Prompting / Copilot: January 2024 - October 2024
- Cursor: November 2024 - January 2025
- Windsurf: February 2025 - June 2025
- Claude Code: July 2025 - present (first commit: July 14, 2025)
Commit attribution:
- 282 commits (20% of total) explicitly marked with "Generated with Claude Code"
- Many more commits in the Claude era are not explicitly marked but were created with Claude assistance
Limitations:
- Single developer, single project; may not generalize
- Skill growth over 22 months is a complicating factor
- Infrastructure maturity affects later development speed
- Commit granularity changed between eras
The goal was honest analysis, not advocacy. I tried to show the data fairly and let you draw your own conclusions.