OpenAI's Nikunj Handa on the Responses API's new MCP support and why remote MCP servers are a breakthrough for agentic apps
May 21, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Nikunj Handa
tools and features in the Responses API, their MCP support. We're going to have him explain the adoption of MCP, the standard. Welcome to the stream. How are you doing? Great to be here. This is exciting. Thanks. Welcome to the temple of technology.
Yeah, and that happens to be this podcast specifically, not OpenAI.
Also, big day in the OpenAI Slack today with the acquisition. That's a very exciting acquisition. I'm sure we'll get into design and decision-making around devices at some point, but let's start with today's announcement. Can you introduce yourself, exactly what you do at OpenAI, maybe even how long you've been there, and then what you released today? Yeah, sure. My name is Nick. I'm on the product team here working on our API. I've been here for almost two years now.
I've been working on the API since day one, mostly focused on getting new models out in the API, releasing new tools, and just making the API much easier for developers to use to build applications. So yeah, that's me. And then today, what's the big announcement around tools and features in the Responses API?
Yeah, sure. So about two months ago, we released the Responses API.
We used to have an API called Chat Completions, which was the thing that almost all of AI was built on top of. We created a new iteration of that endpoint called the Responses API, which is much more powerful: it supports built-in tools, it supports multiple model turns, and it's natively built for models like o3 and o4-mini, which have reasoning built into them. And today, we expanded the suite of tools that developers have access to.
The most exciting thing is support for remote MCP servers. Now developers can plug their Responses API integration into any remote MCP server across the internet. You have servers like Stripe, Shopify, Asana, Linear, HubSpot.
And I think this ecosystem is just going to grow so much more over time. Really excited to see what people build on top of it. Is the benefit of direct MCP support something about latency?
Because I imagine that I could be interacting with the ChatGPT API, then make a separate call to Stripe's API, and I could even have Cursor or Windsurf write that integration for me. And that doesn't seem that difficult.
So I've always wrestled with MCP: it's great, it's cool, it's a standard, but what's the real value besides saving a couple of developer hours writing an API binding library or wrapper? Yeah, I think there are a couple of angles to this.
So firstly, because these organizations and companies are building MCP servers, they're taking an LLM-first approach to what the API needs to expose to the agent. Stripe has a bunch of APIs that can be used to create a subscription or a customer or a product or a price.
But for an LLM, they can just combine those into a single function. And instead of returning a massive JSON object, they can return something that's very specific to the task being solved, so that the LLM can more easily understand what's happening.
So it's an opportunity to rewrite your APIs to be very LLM-first. That's the first thing. The second thing is friction. What you described is maybe two hours of work, but why do the two hours of work when you can do it in four lines of code in less than a minute?
This is just going to make it so much easier for developers to put multiple tools together. The other thing I'm really excited about is that the Responses API takes multiple turns.
In Chat Completions, you would have to put it in a for loop and make it keep going until it tells you it's done. With the Responses API, you give it three or four tools, you give it a task, and it keeps going automatically: a single API call finishes the whole thing and gives you the result.
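The "four lines of code" being described look roughly like this. This is a sketch based on OpenAI's announced `mcp` tool type for the Responses API; the model name, prompt, and `server_url` are illustrative, and exact parameter names may differ in the current SDK.

```python
import os

# The "mcp" tool type points the Responses API at a remote MCP server.
# mcp.stripe.com is the Stripe endpoint mentioned in the conversation;
# treat the rest of the payload as an assumed shape.
mcp_tool = {
    "type": "mcp",
    "server_label": "stripe",
    "server_url": "https://mcp.stripe.com",
    "require_approval": "never",
}

# Only attempt the network call when credentials are configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    # One call: the model loops over tool calls on the server side until
    # the task is done, instead of you driving a for loop as with
    # Chat Completions.
    resp = client.responses.create(
        model="gpt-4.1",
        tools=[mcp_tool],
        input="Create a payment link for a $10 test product.",
    )
    print(resp.output_text)
```

The contrast with Chat Completions is that the tool-calling loop (call model, execute tool, append result, repeat) moves inside the API.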
So it's truly magical when you get that. That's amazing. Roon famously wrote "Text Is the Universal Interface." Is it just a standing ovation for him over there today? Because this feels like a victory for the text-as-the-universal-interface paradigm.
But is there more nuance you could add to that philosophy of machines communicating with other machines via text? Because, again, the LLM understands JSON.
I understand what you're saying, and probably faster and smoother, but at the same time, we could just train an LLM to speak JSON. Correct. Yeah, totally. I think LLMs can understand JSON.
I guess the meta point there was: if you can optimize your tools and combine them for the tasks you want AI agents and developers to be able to do, that's much better. At the end of the day, a model is not deterministic.
So even if you ask it to call four tools one by one, there will be instances where it takes a wrong step and then needs to backtrack and fix it.
But if you have these curated functions and tools exposed via an MCP server that it can go to directly, it just makes it so much more reliable for agents to do the thing they're designed to do. So there's definitely some benefit to that.
In terms of text, yeah, that makes a lot of sense to me. How has your general high-level view on MCP evolved over the past few months? Were you skeptical at first, or were you always excited about its potential?
It seems like a total MCP victory at this point. Yeah. Initially, the main goal was: how do we get an ecosystem of tools started? OpenAI did this thing with plugins back in the day, where we built it all on top of OpenAPI specs.
Our plugins integration was a little too early for its time, because the models weren't that good at using these third-party tools really well. We didn't take a second stab at it, but the MCP folks did. And MCP is great; it supports so much more than just tools.
And it took off. It was initially local only, so there was some hesitation about how people would easily plug into this thing. But now, with the streamable HTTP transport, these servers can be hosted remotely across the internet, which is just perfect.
I'm so excited about what's going to happen with MCP over the coming months: the hundreds of remote MCP servers that are going to exist. Developers are just going to plug into them. It's just going to make things so much better. So really, really excited about the opportunities here.
Currently, when you hit an MCP server, you're kind of expecting a few seconds of latency, just because of how long it takes to run inference on these models. I've been beating the drum on: hey, some of these models are just good enough.
Let's bake them onto silicon and make them respond in like two milliseconds. I don't know if that's happening or not; you don't need to comment on that. With Midjourney V5, I was like, it's good enough, just make it instant. With V3, I'm ready for it to just be instantaneous.
Certainly with deep research and ChatGPT generally, I'm super satisfied. I don't really have any requests for improvements to the models; I just want faster and cheaper and easier.
But do you think that if inference does go through some sort of massive speed-up, whether from custom silicon or architectural redesign, geocaching or edge computing becomes important to the way MCP servers communicate with each other?
For example, if I'm going to the Portland AWS hosting of some MCP server, I don't actually want to be routed to Virginia by accident. So do you think that will require a rethink of the spec, or is it something that can just be naturally handled by load balancing?
You know, the spec is just about how you communicate between the MCP client and the MCP server. It tells you, literally, the JSON-RPC object that needs to go between the two parties. Optimizing the geocaching of it really depends on the service provider, right?
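The JSON-RPC object referred to here is defined by the MCP spec. As a rough illustration, a client invoking a tool sends a `tools/call` request shaped like this; the tool name and arguments below are made up for the example.

```python
import json

# MCP messages are JSON-RPC 2.0. This is the shape of a client-to-server
# request invoking a tool; "create_payment_link" and its arguments are
# hypothetical, Stripe-style examples.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "create_payment_link",
        "arguments": {"amount": 1000, "currency": "usd"},
    },
}
print(json.dumps(request, indent=2))
```

The transport (stdio locally, or streamable HTTP for remote servers) just carries these messages; routing a request to the nearest region is outside the spec, which is the point being made.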
So if I'm going to call mcp.stripe.com or allbirds.com/api/mcp (by the way, Allbirds has an MCP server; how crazy is that?), Shopify has to route it to the right place. We need an MCP server. Help us build one. I don't know what we'd do, but we want to get in on the action. Yeah.
We feel a little bit left out. Create a Shopify store and you'll have one instantly. Oh, that's good. Yeah. For the merch, you can integrate. Yeah. So, I mean, OpenAI already does a lot of geo-routing based on where the inference request is coming from.
So we try to route it to something near, and then when the request leaves OpenAI's infrastructure and goes to, say, Stripe's or Shopify's, they can do that as well,
based on either an explicit URL or something more magical on their end, where they detect where it's coming from and route it to the right place. Makes sense. Any other questions... trends in pricing? I know there are pricing and availability updates. Anything to share there?
No, pricing-wise, it's an infrastructure play over here. MCP is free to use; you just pay for the tokens. We also have code interpreter, which is the Python tool. By the way, the Python tool is the thing in o3 that can do GeoGuessr-like things when you upload your images.
So you can just bring that into the API. Yeah. So why is Python relevant there? What's actually happening is that when you put an image into the o3 model, it writes Python code to zoom in. Sure. It tries to find clues, crops the images, inverts them. Got it.
It then does web searches. So all the image manipulation is based on Python, and that's enabled by the code interpreter tool, which is also available in the API today. That's interesting. I noticed that because I took a picture of a podcast.
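The code interpreter tool described here is enabled the same way as other built-in tools, as another entry in `tools`. A minimal sketch, assuming the announced `code_interpreter` tool type for the Responses API; the container shape and parameter names may differ in the current SDK.

```python
import os

# code_interpreter gives the model a sandboxed Python container; "auto"
# asks the API to provision one for this request (assumed shape).
code_tool = {
    "type": "code_interpreter",
    "container": {"type": "auto"},
}

# Only attempt the network call when credentials are configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.responses.create(
        model="o3",
        tools=[code_tool],
        input="Write and run Python to count the primes below 100.",
    )
    print(resp.output_text)
```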
I wanted to see how tall the table was.
I uploaded these images from the Pat McAfee Show, and I noticed, when I unpacked o3's internal reasoning, that it was writing tons and tons of lines of image-processing code in Python, all to draw lines and count pixels to see how tall the table was relative to the person.
It wrote like 500 lines, 5,000 lines of Python. We were joking about it with o3 yesterday. I asked, "How many white boxes are in this picture?" And it was coding for almost 14 minutes and unfortunately failed. It was like putting a software engineer on the task for a month.
But now you get it in two minutes. It is incredible. I can see where that goes. It adds a completely different angle of attack on reasoning, and it needs to be an important tool in the tool chest. So yeah. How are you leveraging Codex today?
I'm sure you guys got early access, but I'm curious if and where it's fitting into your toolkit today. I see the engineers using it nonstop. We had to make a couple of changes for the launch, like fixing error messages.
These are such small things, and you can just fire up a Codex task to do them. I've used it for docs changes a bunch of times, where I'm just like: there's a typo here, change this heading. These are tiny things. I just write it up, forget about it, and it creates a PR.
I don't have to bug people anymore. It's really helpful. And I think engineers are actually doing even more complex tasks on top of it. I just see everyone in the company using it, which is really cool. Very cool.
What are you daily-driving as a personal smartphone these days? iPhone, Android, maybe something else? What do you got? I know, you got early access. You got something in your pocket. Okay, no leaks today, but we had to ask. Still part of Jony's legacy.
We tried to trick you into leaking the hardware here, but we'll get you next time. Yeah. Well, I'm sure we'll find out soon. But it's an exciting acquisition. Thank you so much for stopping by. It's very exciting. We'll talk to you soon. Yeah.
Congrats to you and the team on the launch. Fantastic. We'll talk to you soon. See you. Bye. Next up, we've got David Senra coming into the temple of technology, the godfather of podcasting himself. That's brother behavior. We need a lot of sound effects for this one.
He's been on an absolute tear, dropping podcast after podcast. But today we want to talk about Jony Ive. Yeah. And Apple and Steve Jobs. What would Steve Jobs be doing in this situation? I want to play out that scenario for sure. And then we've got Scott Wu coming in from Cognition,
talking about Devin and coding agents. Every company has a coding agent now. Crowded market. I want to know how it plays out. So we're excited to talk to him about that. Then we've got Joe Weisenthal from Bloomberg's Odd Lots coming on the show. We'll get an update on the market. It's been flat for a couple of days.
Rough week for Joe. It's been terrible. Yeah. He's not getting his dopamine fix. How is he getting through it? I want to know. I'm sure he's finding obscure exchanges where he might be going 100x short volatility. Maybe he's just printing. Might be shorting Costco at 50 times. Maybe.
We'll have to ask him. Anyway, let's do some more ads in the meantime. Let's tell you about Bezel. Your Bezel concierge is available now to source you any watch on the planet. Seriously, any watch. Download the app. Yeah.
We can also go through some... I'm actually shocked at how frequently I use Bezel. Like, I'm a