Raindrop raises $15M seed from Lightspeed to monitor AI agents in production — and the problem gets worse as agents get better
Dec 1, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Ben Hylak
Our last guest of the show is Ben Hylak. Did you do the Jaguar rebrand?
That's him. Ben, welcome.
And we'll follow him forever.
How are you? How are you?
Grab a seat. Hang out. Good to see you. Oh, you brought hats. Fantastic. Thank you. Please uh grab a seat, introduce yourself, introduce the company. What's the name?
Yes. So, my name is Ben Hylak. Um
yes,
let's take a second for the flow.
Fantastic.
This is kind of like a vintage Silicon Valley flow that you don't see much anymore. Somewhat of a lost art.
I appreciate it. You guys have great hair as well, so there's a lot of pressure. You'll notice I'm not wearing a hat today, and it's because... I did notice, actually. Yeah, I discovered a blow dryer, I think, around nine or ten months ago. That was big. Never been the same since. But yeah, my name is Ben Hylak, as you guys know. I'm the CTO of a company called Raindrop. Really simply put, we monitor agents in production. We were building a product ourselves, probably around two years ago now, which was a coding agent, and we realized there was this huge gap: if you're using Sentry, if you're using traditional analytics, they're covering the things users are clicking, but almost everything that's happening in your product, if you're making an agent, is just not covered. So you have no idea what's going on.
These agents are going absolutely wild. They're going haywire.
You know what's been insane? I think one of the things that's been really critical to our growth in the last couple of months has been realizing that as agents get better, this problem gets worse. That was not necessarily intuitive to us in the beginning. You think, oh, well, agents are going to get better, maybe this problem becomes less important. But actually, as they become more capable, they can use more tools and do more valuable work, so the failures matter more.
Exactly. So, for example, take a company like Replit. Maybe a year or two ago, or when they first launched, you couldn't quite get as far, right? Maybe you could just get a personal website or something. And so if it messed up at that point and got stuck, it's like, okay, maybe it's not the end of the world.
But now with Replit, you're able to build real applications. People are building real production applications. So now, if you get to a point where it gets stuck, something goes wrong, suddenly it's a real issue. That was not intuitive before.
So, "agent" is a pretty overloaded term at this point. I think of
everything from firing off a deep research report in ChatGPT, that's an agentic workflow, to some customer service agent that's happening completely behind the scenes, where the customer might not even know they're dealing with an agent. And then there's coding agents; there's a few that you mentioned. Are you dividing the market and trying to focus on an early landing zone first, or do you want to do all of those?
Yes. So, and I will say I agree, the word "agent" is overloaded. We were very hesitant to use it for a really long time, and then we realized it actually matters, of course.
So, we focus on products that have some sort of user input and, eventually, some sort of assistant output. That's our focus. What we're not focused on is, for example, specific ML pipelines, or things like translating text or even summarizing text. We want to see that the user has some sort of request and the assistant is responding to that request. And we map essentially everything that happens between that initial user input and what the assistant actually responds.
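That "map everything in between" idea can be sketched as a trace object that records every event between a user's request and the assistant's reply. This is a minimal illustrative sketch, not Raindrop's actual SDK; all class and method names here are hypothetical.

```python
import time
import uuid

class InteractionTrace:
    """Hypothetical sketch: one user-input -> assistant-output span,
    with everything in between recorded as ordered events."""

    def __init__(self, user_input):
        self.trace_id = str(uuid.uuid4())
        self.started = time.time()
        # The span opens with the user's request.
        self.events = [{"type": "user_input", "content": user_input}]
        self.output = None

    def log_event(self, event_type, **data):
        # Record everything that happens between input and output:
        # tool calls, retrievals, intermediate model steps, errors.
        self.events.append({"type": event_type, **data})

    def finish(self, assistant_output):
        # The span closes with the assistant's response.
        self.output = assistant_output
        self.events.append({"type": "assistant_output", "content": assistant_output})
        return self

# Usage: wrap one request/response cycle.
trace = InteractionTrace("Summarize my last three invoices")
trace.log_event("tool_call", name="fetch_invoices", args={"limit": 3})
trace.log_event("tool_result", name="fetch_invoices", ok=True)
trace.finish("Here are your three most recent invoices: ...")
print(len(trace.events))  # user_input + 2 tool events + assistant_output = 4
```

The point of the sketch is the shape of the data: a traditional analytics event is a single click, while an agent interaction is a whole ordered span, and the failures live in the middle of it.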
And then what's the go-to-market for you?
I mean, it's been a little crazy, actually. We've had a lot of inbound. Some of our biggest customers have been inbound. A lot of it has been... when we first launched, I guess this was six or seven months ago now, agents weren't as big of a deal. And so I think in the first month or two, we had a lot of customers that were like, "Okay, I have evals."
Sure. Sure. Sure. uh how are you thinking about the you know target like the best type of customer? Are you segmenting it by size? Do you want to go enterprise upfront because they're implementing agents at scale or are you more likely to see immediate results at the startup that just kind of gets it and they can hop on really quickly? Like how are you thinking about prioritizing if you are at all?
Yeah, it's a really good question. I think that we really look at the entire range and I think that we see and have always seen startups as being a really core part of keeping our company healthy.
Sure.
You know, I heard a while ago that PostHog has this metric where they look at what percentage of YC companies in every batch are using them.
Sure. Sure.
And so that's why we started with startups: they're able to move faster. For example, when a new model comes out... actually, to give a very specific example: GPT-5 introduced interleaved reasoning. It was one of the first models to do this, where it makes tool calls, looks at the results of those tool calls, thinks about it, and then makes more tool calls; takes that, thinks about it, makes more tool calls.
It sounds small or subtle, but it actually means that if you architected your system, your pipelines, the wrong way, you just couldn't use that, and it really helped. So where startups will just throw everything out the next day and ship a whole new thing in a week, the biggest enterprises are not going to do that. So you can learn really fast with startups. That being said, on the flip side, I think the problem we're solving is actually most painful for enterprises. The most critical, high-stakes environments are where failures cost the most in every single sense.
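The interleaved loop described above (think, call a tool, read the result, think again, call more tools) can be sketched in a few lines. This is a toy with a stubbed model, not the real GPT-5 API; the message format and tool registry here are assumptions for illustration.

```python
# Toy sketch of an interleaved reasoning/tool-call loop. The "model"
# is a stub: it asks for a tool first, then answers once it has seen
# the tool's result fed back into the conversation.

def stub_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "args": {"city": "SF"}}}
    return {"final": "It's 18°C in SF."}

# Hypothetical tool registry.
TOOLS = {"get_weather": lambda city: f"18°C in {city}"}

def run_agent(user_input, model, max_steps=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = model(messages)
        if "final" in reply:
            return reply["final"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["args"])
        # Feed the tool result back so the model can reason over it
        # before deciding whether to call more tools or answer.
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not produce a final answer")

print(run_agent("What's the weather in SF?", stub_model))  # It's 18°C in SF.
```

The architectural point from the conversation: if your pipeline assumes one model call per turn, this loop, where tool results flow back into further reasoning, simply doesn't fit, which is why systems built the "wrong way" couldn't use the new capability.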
Yeah.
What categories of agents are you excited about that are maybe underhyped today? Coding agents? Coding agents are sufficiently hyped, I think.
Coding agents are, and for good reason. Maybe they're deserving of even more hype, who knows. But what other category? I think people have been sold on the
AI BDR
but haven't exactly seen it deliver. Maybe companies are getting a ton of value from it, so much value that they don't want to come on TBPN and talk about it, because they don't want their competitors to know.
But then obviously CX feels sufficiently...
Sure, hyped. But what else are you seeing?
Man, there are so many different things. Speak, for example: language learning. I think
as models get better, that experience actually starts to become really, really viable. That's an example of something where, yeah, it existed a year ago, it existed two years ago, but as voice models get better, as the models themselves get better, it's actually not just...
You know, if you try to use ChatGPT to learn a language, you sort of can. But if you ask it to critique you, for example, it just never will. If you say something wrong, it just isn't going to stop and be like, "Hey, look, actually..."
it's still glazing." Absolutely right.
Yeah, exactly.
"Esta biblioteca" is the most complicated Spanish sentence.
It will. It will. Right. Um
"You're fluent." Exactly. "You're fluent. Yeah, you're pretty much good to go." And even if you can get it to the point where you can really prompt it into critiquing you, it'll just start critiquing everything, which is also not what you want as you're learning a language. So it turns out, and I think we see this with a lot of products, that getting something right is actually a lot of details and really, really understanding that domain. I think we're seeing that in literally every domain, whether it's marketing or even just the idea of having a personal assistant. Notably, we don't have that yet, which is crazy, right? We have these models, but none of us can just open a chat and say, "Hey, send this email." I don't think we've actually nailed it, mostly.
How are you thinking about, and I don't know, if you're Sentry for AI agents, does Sentry actually handle this, but the types of AI failures that happen for more infrastructural reasons? So the GPUs are on fire, or there just aren't enough GPUs in this particular cloud, and you see a spike in demand and you just can't provision more. Those types of more tactical errors, do you help with that?
Sort of would be the answer. What's actually really interesting is that one thing we realized about evals is that they don't catch those sorts of issues. You're kind of testing just the model, what the model is responding, but then there are all of these things that happen in between. I remember, really early on when we launched, one of the issues a customer caught was that their file upload was broken. A bunch of users all started complaining, like, "Oh, the file upload is taking too long." Okay, well, it's not an AI problem, but it is. And so we see that with tool calls. We saw one of our customers had an issue sort of like what you're saying: they have their own GPUs, they started having an infrastructure error, and it was mixing up responses between users. So users all started complaining, like, "Hey, what are you talking about? That's not my conversation." There was an increase in that.
I don't know if you're talking about Meta, but I think that happened at Meta.
It wasn't Meta; they're not one of our customers yet. But there was a situation, right? People could share... it was not that bad, but it was something like, I could share my chat with you, but if I shared it with you without knowing that I was sharing it, it would go out everywhere. So yeah, stuff like that happens.
Totally. There are all these sorts of things. So you can actually catch those sorts of problems. It's actually one of the key things:
that ground truth is actually really, really important. Because if you just see a few errors, let's say your agent calls tools, yeah, it's going to error once in a while, right? That might not be the biggest deal. But if you can see when it actually starts to affect users, that's really powerful.
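The distinction being drawn, background tool errors versus failures that actually reach users, can be sketched as a toy alerting rule. This is an illustrative sketch only; the session shape, baseline rate, and threshold are all assumptions, not Raindrop's actual logic.

```python
# Toy sketch: alert only when the share of *user-visible* failures
# (sessions where a tool errored AND the user complained or the
# session failed) rises well above a normal baseline. Occasional
# tool errors that the agent silently retries don't trigger alerts.

def should_alert(sessions, baseline_rate=0.02, factor=3.0):
    """sessions: list of dicts like
    {"tool_errors": 2, "user_complained": False}"""
    affected = sum(
        1 for s in sessions
        if s["tool_errors"] > 0 and s["user_complained"]
    )
    rate = affected / len(sessions)
    # Alert when user-visible failure rate exceeds baseline * factor.
    return rate > baseline_rate * factor

sessions = [
    {"tool_errors": 1, "user_complained": False},  # silent retry, fine
    {"tool_errors": 0, "user_complained": False},
    {"tool_errors": 3, "user_complained": True},   # failure reached the user
    {"tool_errors": 0, "user_complained": False},
]
print(should_alert(sessions))  # 1/4 = 0.25 > 0.06 -> True
```

The design point: raw error counts alone would page you constantly, so the signal worth alerting on is the correlation between errors and user impact.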
Yeah, that makes sense. What about degradation of models under the hood? I don't know if it's just a meme; I've noticed it here and there. I'm not benchmarking everything every night like some big companies. But it does feel like that sometimes, right? Sometimes I'm like, "Wait a minute, it used to respond in this many tokens, now it responds in this many. It used to look HD, now it looks standard definition."
No, I agree with you. I think it's real. I can't say too much, but I know that on at least one occasion, people were led to believe there wasn't a thing, and I know that there was. You know what I mean? I can't say who. It's a big company. And I noticed this, and I thought it was...
I can't say whose hands were...
I can't say which one got caught red-handed.
Yeah, exactly. And I thought it was a Cursor problem; it was some really absurd behavior. And then I went into ChatGPT and it was doing the same thing. But anyway, yeah, I think the reality is that every single one of these providers is having these sorts of problems. They're trying to optimize costs, they're trying to make changes, so I think it's natural.
And some of them I understand, where I'm like, okay, well, realistically, I haven't used that in a long time, I came back, and I don't really mind that you put me on the lower tier. I just hope that for the people that actually went and built businesses around this, that are using it at the API level, that are hopefully paying for the service at a high gross margin to you, you're not degrading the service behind their backs.
100%.
Right. So anyway, uh who did the deal? Anybody we know?
You want to hit the gong?
You want to hit the gong?
Oh, let's do it. Yeah. Yeah.
Hit hit the gong. Uh tell us how much you raised.
How much did you raise?
We raised $15 million in total.
Um, Lightspeed.
Who did the deal?
Bucky.
Yeah, let's go.
Let's hit it again for Bucky.
Let's hit it again for Bucky.
Bucky. We love
This one's for you, Bucky.
Uh, yeah, we're big fans of Bucky over here. So we just wanted to give him a shout-out.
Us, too. I think the moment we met him, we were like, "Okay." He matched our energy. Great vibe.
Yeah. Yeah. He's doing
How's building the team going?
Uh, it's going. It's going. I think we're really, really picky, we've realized, and so it's really hard. And I think hiring in San Francisco is really hard. We have a great team. It's honestly really, really small still. We're...
Well, if you want to get out of San Francisco, you could book a Wander with inspiring views, hotel-grade amenities, dreamy beds, top-tier cleaning, and 24/7 concierge service.
It's a vacation home, but better. You could do an off-site there.
We could do our offsite. That's beautiful. Do your offsite.
I once used a team offsite as a recruiting tactic. I said, "We are going on an offsite in two weeks," and posted a picture.
Oh, yeah. That created an amazing sense of urgency.
We're doing it. So, if you're watching right now, uh we'll I'll post the picture soon of of the house.
Fantastic.
Fantastic.
But we have an amazing team.
Yeah, I know. I figure if you're picky and you're in San Francisco, it's the most ruthless talent war, constantly.
You know, the other thing is that when you hire amazing people, they have zero tolerance for working with people who are not amazing. You can fool yourself as a founder, but your team feels it.
Have you had to bring anyone soup? Are you familiar with this?
I'm not familiar with this.
Okay. So, apparently... this is from Ashlee Vance. This is a scoop that just dropped on Core Memory, on the podcast. He had Mark Chen, OpenAI's research chief, on the show as part of a post-Gemini 3 sit-down to get the update from OpenAI. And he said, "I knew the AI talent wars were rough, but not this rough. Zuck is out there apparently delivering handmade soup."
Wow.
And OpenAI has soup counters, and so I guess...
Wait, they count how much soup? A soup counter?
I don't even know what this means, but
Oh, I see. They count how many?
No, no, I think it's like a counter, like where you get soup.
Aggressive. What exactly is the tit-for-tat here?
We can we we we can play this on the show later, but uh but yes, I mean
No, no, my partner has made meals for someone, you know, full meals. That sort of thing works. And we do typewritten notes. I'll write a note on a typewriter when we do our offer letter. So that adds something a little...
Are you messing with us?
No, I'm serious. I love typewriters.
No, I like that. It's just a way to show you actually value this message.
All the text is AI generated. I'm sure.
Of course. Yeah. I'm just copying from
I think it's a little bit of a newer thing. You're a revelation.
This is a statement.
Yeah. Yeah.
Having fun. Well, that's great. Congratulations on all the progress. Very excited. I'm sure you'll be back on the show soon, giving us plenty more updates. And it's been fun, because I believe we started tracking your journey via your viral joke post about doing the war or something. But we've always had fun featuring great people live in person.
One year ago today, roughly, I remember I was sitting in a parking lot, and I was listening... it was the first time I ever heard of you guys. You were reading one of my tweets, and it was just so surreal that people from the internet were reading my tweets. I think one of our customers sent it to us, actually. You printed it out. Yeah. So I called my mom today, and I was telling her, "Hey, I'm going to be on the..." I was like, "You're not going to know what it is, but remember those guys that were talking about that tweet?"
This was the whole shtick: little love letters to Silicon Valley folks. Just little messages of, hey, we found something that you did. Fun, because anyone can repost. It's easy to send a small thing; it's very hard to actually print it out, sit down, and talk about it. But we appreciate your post, and we appreciate you coming on the show and hanging out today. So thanks so much. We're going to close out the show and we'll talk to you in just a second.
While he's walking off, let me tell you about getbezel.com. Shop over 26,500 luxury watches, authenticated in-house by Bezel's team.