Harjot Gill of CodeRabbit: GPT-5 nearly doubles code review performance, conversion to paid customers expected to jump
Aug 7, 2025 with Harjot Gill
Key Points
- CodeRabbit's code review platform scores nearly twice as high with GPT-5 as with prior models on its hardest internal test cases, a set of problems no earlier model could solve.
- The company will not raise prices despite the performance uplift. Customers stay at the same monthly rate, since every competitor can access GPT-5 on identical terms.
- CodeRabbit's free-to-paid conversion rate doubled to roughly 30% when o1 preview launched, and Gill expects GPT-5 to drive another significant conversion jump, though production-scale results may diverge from lab benchmarks.
Summary
Harjot Gill, representing AI code review platform CodeRabbit, reports that GPT-5 scores nearly twice as high as GPT-4o, Claude Sonnet, and Claude Opus on the company's internal golden dataset of the most difficult pull request reviews, problems no prior model had been able to solve. CodeRabbit's product sits in a narrow category of genuinely reasoning-heavy AI applications, tasked with identifying race conditions, security vulnerabilities, and other complex code issues across developer pull requests.
“We would say it's almost 2x better than the next o3 or Sonnet or Opus at this time. It's a generational leap. Our conversion doubled after o1 preview came out — we went to close to 30% success in getting paid users. With GPT-5, we can see another big jump in the number of people who start becoming paid customers.”
The performance uplift will not translate into a price increase. Gill states explicitly there is no upsell plan, with customers receiving materially better output at the same monthly price point. That dynamic reflects a broader competitive reality: every rival can access GPT-5 on the same terms.
The more consequential business signal is conversion. When o1 preview launched, CodeRabbit's free-to-paid conversion rate doubled, reaching approximately 30%. Gill expects GPT-5 to drive another significant jump in paid customer conversion and a reduction in churn, though he cautions that lab benchmarks do not always survive contact with production-scale usage. False positive rates and hallucination frequency at scale remain under active observation.
The GPT-4 release cycle offers a cautionary reference point. Gill describes GPT-4 as a "Windows Vista moment" for CodeRabbit, a release where internal evals suggested parity but real-world performance regressed, causing a dip in conversion metrics. The o1 preview reversal that followed restored momentum and reset expectations for what reasoning model upgrades can do to a business built on inference quality.