Greg Brockman traces OpenAI's journey from GPT-1 to GPT-5 and explains why software engineering is being revolutionized
Aug 7, 2025 with Greg Brockman
Key Points
- GPT-5, launched today, represents a step-change in reliability for software engineering; Cursor has adopted it as its default model, signaling a breakthrough in code-writing and codebase-comprehension capability.
- OpenAI's scaling strategy treats pretraining, reinforcement learning, and test-time compute as multiplicative rather than additive vectors, with infrastructure partners Oracle and SoftBank supporting the compute buildout.
- Brockman identifies scalable oversight and chain-of-thought interpretability as foundational research priorities to handle outputs like 10,000-line programs that humans cannot practically review.
Summary
Greg Brockman traces OpenAI's foundational logic back to a 2017 LSTM experiment called the unsupervised sentiment neuron, which demonstrated that next-token prediction on Amazon reviews could yield state-of-the-art sentiment classification. That result, not a preconceived theory of scale, set the trajectory. The lesson embedded in OpenAI's DNA was methodological: push existing techniques to their limits before declaring them insufficient. Scale was the thing that worked, not the thing they set out to prove.
From GPT-1 to GPT-5
Brockman characterizes each generation with notable precision. GPT-1 was proof that transformer-based pretraining on public data produced transferable representations. GPT-2 surfaced the generation side, producing coherent if imperfect prose. GPT-3, released in 2020, was the first model barely above the threshold of practical utility, useful for demos and quick outputs but unreliable. GPT-4 crossed into genuine real-world utility, proving capable in health and coding contexts. GPT-5, launched on the day of this broadcast, is positioned as a reliability and utility step-change, with software engineering the clearest immediate domain.
GPT-5 is now the default model in Cursor, which Brockman describes as a significant industry signal of model quality. He argues the model handles the full software engineering stack, from code writing to codebase comprehension to agentic tool use, effectively lowering the barrier to programming for non-engineers.
The API Era and ChatGPT's Accidental Origins
The GPT-3 API launch in mid-2020 required Brockman and colleagues to drive around San Francisco offices in person, soliciting trial users. Inference latency at launch was roughly 150 to 250 milliseconds per token, optimized down to approximately 50 milliseconds before the API went live. The team set two internal milestones: find one paying customer, and find one use case OpenAI itself used daily. The first was hit within months; the second did not arrive until ChatGPT launched in November 2022.
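Those per-token latencies translate directly into sequential decoding throughput. A quick back-of-the-envelope conversion (the helper function is illustrative, not OpenAI code):

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert per-token latency into sequential decoding throughput."""
    return 1000.0 / ms_per_token

# Pre-optimization latency of 150-250 ms/token:
print(tokens_per_second(250))  # 4.0 tokens/s
print(tokens_per_second(150))  # ~6.7 tokens/s

# After optimization to ~50 ms/token:
print(tokens_per_second(50))   # 20.0 tokens/s
```

In other words, the optimization work took the API from a few words per second, noticeably slower than reading speed, to a rate usable in interactive products.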
The moment that convinced Brockman chat was the killer application came on August 8, 2022, during the initial post-training run of GPT-4. Despite bugs, the model exhibited unexpected multi-turn coherence from a single-turn instruction-following dataset, a sign of genuine generalization. That insight accelerated the decision to ship ChatGPT as infrastructure ahead of the GPT-4 release, which had been targeted for early 2023. A precursor product called WebGPT, built on GPT-3.5, required OpenAI to pay contractors to use it throughout 2022.
Scaling Architecture
Brockman identifies three compounding compute vectors: pretraining, reinforcement learning, and test-time compute. OpenAI has published scaling laws on all three. He describes these as multiplicative, not additive, and rejects the idea that hitting a data wall or a pretraining ceiling signals fundamental limits. Infrastructure partners cited include Oracle and SoftBank. His internal team, called Scaling, focuses on maximizing FLOPs delivered, coordinating large GPU clusters across synchronous training runs, and deploying models at production scale. Innovation at every layer of the stack, from CUDA kernel optimization to data curation, compounds into overall capability gains.
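The multiplicative-versus-additive distinction can be made concrete with a toy model (the 30% per-vector gains are invented for illustration, not figures from the interview):

```python
def additive_gain(gains):
    # Additive model: total improvement is the sum of per-vector gains.
    return sum(gains)

def multiplicative_gain(gains):
    # Multiplicative model: each vector scales the contribution of the others.
    total = 1.0
    for g in gains:
        total *= (1.0 + g)
    return total - 1.0

# Hypothetical 30% gains from pretraining, RL, and test-time compute:
gains = [0.3, 0.3, 0.3]
print(additive_gain(gains))        # ~0.9  -> 90% combined gain
print(multiplicative_gain(gains))  # ~1.197 -> ~120% combined gain
```

Under the multiplicative view, progress on any one vector raises the payoff of progress on the others, which is why a plateau on a single axis (such as a pretraining data wall) need not cap overall capability.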
Agents and the Year of Deep Research
Brockman pushes back gently on the framing that agents have underdelivered in 2025, noting the year is not over. His underlying model for progress is that capabilities which partially work in one generation become reliably robust in the next. Deep research, he suggests, was the partially-working capability of 2024 that became a breakout product in 2025. Computer-use agents are the current partially-working category.
He also reframes what valuable agents actually look like, arguing that flight booking is a misleading benchmark because the existing UI already encodes complex personal preferences efficiently. Higher-leverage agent applications are in areas like healthcare coordination, where no existing system helps patients synthesize advice across multiple specialists, and where text-only AI can add significant value without requiring computer-use capabilities.
Supervision, Interpretability, and the Path to AGI
Brockman frames scalable oversight as one of OpenAI's foundational research priorities, noting the problem was identified as early as 2017 alongside the first language modeling results. The concern is direct: if a model produces a 10,000-line program, human review is impractical. The proposed solution involves reviewer agent ensembles and preserving the integrity of chain-of-thought reasoning. OpenAI's position is to avoid optimizing chain-of-thought outputs to look good, so that internal reasoning remains interpretable and auditable.
On new knowledge generation, Brockman draws an explicit analogy to human learning: grounding in accumulated wisdom, experimentation in contained environments, and real-world feedback loops. He signals that moving AI systems from hermetically sealed RL environments to real-world interaction, including robotics, is a major near-term milestone for the company.
Policy
On Washington, Brockman praises the current administration's engagement with AI technology but frames the core ask as calibration rather than specific regulation. He invokes the OODA loop concept, arguing the government needs rapid iteration cycles to keep pace with model capability advances. His stated priority is ensuring American AI leadership promotes democratic values globally, not just domestic economic benefit.