Jazzberry: AI agent for bug finding that thrives in the vibe-coding era by cloning PRs into sandboxes
Jun 11, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Matteo
Right. And we will bring in the next crew. Stoked for you guys. Oh yeah. Bring the phone out. We ran out of those. How many more do we have? We have a variety of goodies. Oh, I love the sweatshirts. You're owning a color. There we go. Ramp yellow. You guys are wearing pink.
You can have those if you want. Feel free. They're limited edition for this demo day; you can only get them this year. There we go. There we go. The colors are working together perfectly. Roughly the same saturation. Roughly the same on the orange background. On the orange paint. Yeah.
This is Jazzberry. It feels kind of vintage. It feels like I've known it. So what are you guys building? Oh, we're building an AI agent for bug finding. Okay. So, right now we have a PR bot. Um, so you make a pull request.
We'll take your code, clone it into a sandbox, and then we let an agent just go ham at it. Okay. Um, and then we tell you how we broke it. Okay. How much of this is just about speeding up the pace of development versus, like, is there a pen testing angle here?
Is that just a completely separate cybersecurity play? No, I think we do some pen testing. We want to basically find any kind of bug. Yeah. Um, so a lot of people take a limited approach to bug finding. They either do, like, coverage testing, or they try and find integration bugs.
Yeah, we really want to basically build an agent that can do any of that, or kind of whatever's best for your tool. So, like, when people are vibe coding, because they're just creating vibes all the time. Keep vibe coding. Yeah. No, we love vibe coders. We're here to help you make better code. No problems with it whatsoever.
As long as you buy our software. Yeah. You vibe code, we'll test it. Then you take our output, you put it back into Cursor. Just feed it back in. Talk to me about the prompts that you're using to actually have the agent go and hammer it. I imagine that's not just "try and find a bug."
You've probably gotten very good at, like, designing flows. There's probably a lot of work that goes into that. What goes into actually getting an agent to effectively hunt for bugs? Yeah.
So, the most important part is just to have a sandbox where it can run code, compile your code, run unit tests. Yep. Um, and so then we just get the agent to, yeah, go ham. And each time that it runs a small experiment on your code, it learns a little bit more.
And it's able to run a better test the next time. And so, it's able to search a repository, able to run commands to, you know, see if changes you made actually were propagated through all the files. So, we found a bug where someone updated a path but didn't update it everywhere.
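That stale-path class of bug is mechanical enough to sketch. The following is a minimal illustration, not Jazzberry's actual implementation (the function name and approach are assumptions): scan a repository for files that still reference a renamed path.

```python
import os

def find_stale_references(repo_root: str, old_path: str) -> list[tuple[str, int, str]]:
    """Scan text files under repo_root for lingering mentions of old_path."""
    hits = []
    for dirpath, _, filenames in os.walk(repo_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                with open(full, encoding="utf-8") as f:
                    for lineno, line in enumerate(f, 1):
                        if old_path in line:
                            hits.append((full, lineno, line.strip()))
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return hits
```

An agent with shell access gets the same effect from a `grep -rn` after seeing a path change in the diff; the point is that the check is cheap to run on every experiment.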
And so, those sorts of things. The agent's able to run any command that a person would. Yeah. Are you doing stuff like trying to stuff multiple variables into a single function, that type of stuff, where there isn't as much fault tolerance built into the code?
Maybe they need, like, you know, an if/else or try/except clause in there or something like that.
Is that the type of bugs that you're trying to find, or is it more about, like, scalability of code? Like, okay, you're making a database call right here, it looks fine now, but if we scale this up and there's a lot of demand, you're going to get cooked. I think it really depends.
So we use the pull request as kind of the initial seed. The change you make there kind of determines the path that we use for testing. So if you're trying to scale, then yeah, we'll kind of test as if you're trying to scale. Um, but if you're making path changes, we'll test as if you're making path changes.
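That "PR as seed" idea can be sketched as a toy router. Everything here, the function, the file-type rules, the strategy labels, is a hypothetical illustration rather than Jazzberry's code: the files a pull request touches pick the testing focus.

```python
def pick_test_focus(changed_files: list[str]) -> str:
    """Map the files a pull request touches to a rough testing strategy."""
    if any(f.endswith(".sql") or "migrations" in f for f in changed_files):
        return "load-test database calls"
    if any(f.endswith((".yml", ".yaml", ".toml", ".ini")) for f in changed_files):
        return "check config and path references"
    return "run unit tests and exploratory commands"
```

A real agent would presumably read the diff content too, not just filenames, but even this crude routing keeps expensive tests (like load testing) off PRs that can't need them.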
One of the things we find is that with vibe coding, there's often a different flavor of bugs happening. Okay. LLMs don't make the same kind of errors that people do, because with people, code grows organically. An LLM is, like, one-shotting things.
So it'll often, like, forget to add functions. Um, so it's not as easy as pointing to a line and going, like, this variable is wrong. It's like, no, no, you fundamentally missed, like, this whole section of things you were supposed to implement. Got it.
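The "forgot to add a function" failure mode is one a static pass can catch without even running the code. A rough single-module sketch, not their implementation, using Python's `ast` to flag names that are called but never defined or imported:

```python
import ast
import builtins

def undefined_calls(source: str) -> set[str]:
    """Return names called in this module but never defined or imported."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    called = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    defined.add(target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called.add(node.func.id)
    return called - defined
```

This ignores function parameters, attributes, and scoping, so it's only a sketch; tools like pyflakes do this properly, and actually executing the code in a sandbox catches the same class of miss at runtime.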
Talk to me about the go-to-market motion. Is this just a landing page, you're driving traffic to it, people sign up by themselves? Are you doing founder-led sales? All of the above? No, we're landing page. Like, this is our go-to-market right now. Go to jazzberry.ai. Go sign up. Go sign up. Yeah.
You can just go install the bot. We have a 7-day free trial. Okay. So, is it consumption versus seat-based pricing? What are you thinking? Yeah, it's seat-based pricing. Okay. So for every developer, it's 20 bucks a month. Okay. Um, just a simple kind of flat rate. Are you running into cost problems?
Because we've seen this, like, you know, the latest and greatest LLM comes out and it's really expensive. o3 just dropped by 80%, so anyone who was having a problem with their cost is probably fine right now. Uh, but how are you thinking about that side? You go.
Yeah, we found that for just, like, you know, running lots of experiments on your code to find bugs, it's actually better just to have a really fast and small model. Okay. And so, yeah, we haven't had these sorts of problems yet. So what does that mean, like a fine-tuned Llama?
We've talked to LLM training companies that have trained even smaller models, like just for JSON formatting, or just translation models, or just profanity detection. Um, are you thinking about going so small you could run it on a gaming GPU, or are we still talking about, like, the big boys? Yeah.
So, right now, we've actually gotten a lot of mileage out of Gemini Flash.
We started by, um, you know, fine-tuning with RL a model that was specifically good at using tools. And so, yeah, we're getting ready to do that again once we find all our pain points exactly in our current architecture, so we can train the exact right thing. These smaller, faster models that are targeted for the specific use case are better. So what were you guys doing before YC? We were both researchers. I was doing research in software testing using large language models. Cool. And Matteo was doing his PhD in reinforcement learning and formal methods. Very nice.
Very nice. So, our kind of research has come together to make this happen. How are the metrics? How's the raise coming together? How's the pitch for demo day? What are the goals? Excited about demo day. Goals are: raise a lot of money here. Let's go. Good luck. We're going big. Bring some big napkins around.
Recommend a Sharpie and a napkin. Just do it here. Perfect. Um, and then, yeah, we're slowly kind of growing. Well, I wouldn't say slowly; we've doubled our growth kind of every week for the past couple weeks. So, that's a better way to frame it. There we go.
But, yeah, I think we're up to, like, 18 different companies using our tool. That's great. So, yeah, it's been awesome. Cool. Well, good luck to you. 1,800 soon. Yeah, 1,800 after this, when all of you go subscribe. Yeah. Then we'll be at 1,800. Fantastic.