Miles Brundage launches AVERI, an independent AI auditing institute focused on frontier model safety
Jan 28, 2026 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Miles Brundage
I think Mark German took his Diet Coke, but you're welcome to one of mine if you get thirsty. But um please introduce yourself for everyone who might be watching.
Yeah, so thanks so much for having me. Uh my name is Miles Brundage. Uh I lead a new organization called Avery. It's called stands for AI verification and evaluation research institute and the basic idea is that AI is becoming critical infrastructure and you know everyone is depending on it but we don't really like make sure that it's safe and secure the way that we you know audit critical infrastructure. Uh, and you know, we need something analogous to the cyber security industry which propped up to kind of make the internet secure. We need that for AI systems. And so,
how many how many emails have you sent to Peter from uh Multibot in the last 48 hours?
I haven't actually
cuz you on on the show yesterday, he was like his number one thing was like he's like, I need somebody to just handle the inbound from security researchers [laughter] and I just can't I not even multibot can handle uh processing all the inbounds.
Yeah. Yeah. Security researchers are quite interested in that. But yeah, so the basic idea is, you know, we're kind of like a think tank that is trying to figure out how to build that new industry. We think that, you know, it's good for improving safety and security outcomes. If you have this rigorous auditing, specifically frontier AI systems, we're not so much focused on kind of the downstream. So like upstream, how do we make sure that this is infrastructure the whole society can can rely on and not have to rely on companies doing their own kind of testing or kind of like, oh, do I trust the CEO's vibe? Like that's not a good basis for uh for trust in a technology. And so, uh, you know, we we just kind of launched recently and we're, you know, kind of excited to talk about our work.
So, it's a nonprofit.
Yep.
For now.
Are we thinking for been through, you know, enough, you know, what one uh kind of controversial nonprofit to for-profit transition was enough for my life.
Okay. Yeah, that makes sense. Um, talk to us about uh how you're breaking down. I mean the the issue of just AI security safety, you can go so many different directions from, you know, the the GPT40 psychosis to uh fake news to uh paper clips and gray goo and and you know, a thousand years in the future. Um what is what's most interesting to talk about? What's most important to talk about? Uh there's also just like security issues as we see old bot
thing. Yeah. So we kind of break it up into four categories. So there's uh kind of unintended system behaviors and so that includes things like hallucinations um you know on the uh you know on the kind of more extreme end like big misalignment deception type type things where the where the AI system is kind of taking uh actions that are unaligned with the user's intent. Um and then there's misuse of the AI systems themselves. So kind of you know someone trying to carry out a cyber attack with claude which you know has been confirmed to occur. You know Anthropic was like hey like you know people connected to the Chinese government are doing this. So yeah,
very real issue. Uh the third category is what we call emergent social phenomena. So that's kind of like kind of emergent interactions between you know the human and the AI that lead to these like psychosis, addiction, you know kind of like you know degraded kind of like learning those kinds of things and you know these are all kind of different categories but ultimately you know you should look at all of them and then the fourth uh category is kind of you know normal security issues and that includes tampering with AI systems theft of AIP uh AI IP which you know there have been a couple confirmed cases of that like you know Kim let's talk about let's talk about Kimmy what's going on there. Are you up to date on the story?
Um, so people are saying that that you know there was some kind of like theft of like the model. That's it seems to me like the more likely explanation would be kind of one of two things. One is distillation like they're kind of you know they sampled from the uh you know the cloud API and the other is just that there's a lot of samples already on the internet and so they did kind of general scraping and then you know the way that these systems often work is they they've read a bunch of stuff from the internet and they're kind of selecting a persona of like okay what kind of thing am I and like they see a bunch of stuff on the internet where like AI type things are saying I'm Claude I'm chat GPT and that it'll just like default into that persona. Yeah, the the feedback loop of the the pre-training now that we have an internet that is aware of AI and LLM is fascinating. I I keep thinking about that New York Times interaction between uh who was it? It was a New York Times reporter who wrote about interacting with uh GPT4 in Microsoft and he was like mean to it and then that that article obviously was very well shared and so it got baked in the pre-training and he says that now if I go to a new LLM and it finds out who I am it's like sort of adversarial. So, this is weird. And I've noticed I've noticed, you know, it's really uh what is it? The the Rockus Bassilisk a little bit. Like, I've noticed that I I'll go to LLMs and because I write a lot on the internet and I'm obviously live stream and there's transcripts all over the place, uh that they pick up on who I am and what I do much quicker than I think most people. No, it's hard to be anonymous these days.
Yeah. You say thank you.
I do not. Um but I'm also not mean either. Sorry. I'm kind of like a I'm like a neutral actor when it comes to like I don't say thank you, but I'm also not like braiding them. So, when I did that thing the other day when everyone's like, "Oh, like ask Chad GBT like you know what, you know, like oh, make an image of what your experience is like being my like assistant or whatever." It was like it was like a it was like a it was like cozy like sitting in a chair like reading a book or whatever. So, it was not like oh I'm abused or whatever.
Yeah. Yeah. Talk about where you want to take the output of the work. obviously think tank I think Washington policy makers but also there's a a feedback loop of if you put out a really uh insightful statement or analysis the labs might absorb that directly uh who's the main uh audience
yeah so we're trying to be kind of a hub for various stakeholders so not just policy makers uh you know some other key ones are as you mentioned AI companies themselves both upstream the frontier AI developers also downstream like enterprise customers ers might want to know that, you know, they're they might be like, "Oh, well, I had meetings with, you know, Sam and Dario and so forth, but like I want, you know, I'm about to make a$10 billion kind of contract, like I want something more substantive, investors, insurers." Um, and so that's one of the things that, you know, we we just put out a big analysis with with various folks on, you know, how do you drive demand for this auditing? You know, make sure that it's high quality. And, you know, I think some of the mo the like most honest signals, so to speak, come from the private sector more so than regulation. And like I'm pro some kinds of regulation, but like in some sense insurers are a great case where their incentives are very aligned to not kind of mispric the risks and they might, you know, be supporters of high quality audits. And in fact, like one of our uh donors is the AI underwriting company, which is like one of the other players. I know you had Tudo on recently. They're like one of the other players in the
Sure. Sure. Sure. Yeah. Uh so uh the the the even though uh a big tech company might come to you to read an analysis uh they're not the ones that are actually funding the nonprofit. It's from a more
disperate. Yeah. So we we're trying to avoid like depending on you know depending on industry too much although like you know very pro there being kind of like companies that are making money off of this and kind of selling their services to industry as long as they're good disclosures of conflicts of interest and so forth. We we you know since we're a think tank, since we're doing a lot of policy analysis, we want to be kind of as as pure as possible. So we we don't have any majority donors. So far we haven't taken any cash from frontier companies. We do take API credits so that we can kind of like you know audit their audit you know their systems also kind of like you know use you know open eyes models to assess anthropics and vice versa that kind of thing. So we have like you know six frontier developers who've provided credits.
Cool. Um, what about uh I I I've been grappling with this fact that uh when the AI safety question came up, it was driven, and Dario touched on this in on his essay, how it was it was driven a lot by these like sci-fi doom scenarios. And I think a lot of people in tech sort of looked at AI as a tool, sort of incremental, okay, yeah, it's autocomplete, it's knowledge retrieval, it's Google search, whatever. Uh and then and then we wound up getting AI safety issues, but they manifested in very different ways than what was actually predicted. No one was predicting the GPT psychosis necessarily or some of the other things. People were predicting, oh, this will like swing the election and and and drive like everyone will be falling for fake news. Videos that are fake, they do go out, but they get debunked pretty quickly. Like I feel like we've responded pretty well to that, but then there's been a whole other host. So how do you think about the timeline of risk and how you want to how far you want to look into the future?
Yeah. So you know the the way that we think about it is that you know on the one hand like you know people at at my organization Avery have various perspectives on these things like I personally am on the like AI is going very quickly and we could see some crazy stuff very soon end of the spectrum but I don't think you need to believe that in order to be pro you know AI auditing. I think you know if you just kind of compare AI to you know other normal technologies and people sometimes say like well AI is not this this supernatural thing it's a normal technology well a lot of those normal technologies are audited for safety like if you buy a power bank it probably has been you know audited against like you know underwriters laboratory standards for electrical safety so you know it doesn't catch on fire and stuff like that and so I think like even just getting to the normal technology you know level uh would be an improvement and then you know the case for auditing is like even more crazy if you're like, "Okay, someone could take over the world with this."
Yeah, that makes sense. What were the biggest points uh that you agreed with in the Daario essay? Was there anything you wanted to push back on?
Um, I mean, I would love to hear more about auditing as one of, you know, anthropics, [laughter] you know, policy platforms, but I, you know, but but, you know, more seriously, I directionally broadly agree. I mean, I'm I'm, you know, I'm uh I'm maybe not the target audience as someone who, you know, has written and read a zillion kind of like these are the three five big issues in AI risks and here's what to do about them. So, like broadly I agree with this is very serious stuff and you know we should take this seriously. Jordy,
uh, how are you, how do you, uh, yesterday we were talking about, uh, the risk of I think people talk about sort of agentic AI systems and sort of getting on some runaway path where it's taking actions out in the world. And
I don't know if there's enough discussion around maybe how an AI system like that could actually recruit everyday humans, potentially millions of people, to kind of like join their cause. How are you looking at that
risk in the in the context of
overall AI psychosis, right? These these uh
point about like the just turn it off doesn't work if there's like 10,000 people that are like
camped out on top of the data center. I love it.
Yeah. I mean, I haven't thought so much about the kind of like recruiting angle, but what I will say is that it is important to audit not just like the models themselves, but also like how are they used? Do these platforms have good practices for like detecting if there's some crazy shenanigans like that going on? And I mean just you know an example you know this this clawed code thing where it was being used by these Chinese hackers that's like not like the individual interactions were okay you know because they were basically like decomposing the prompt in like various subcomponents that are benign. Um, and this was kind of like a known issue in the research was like, well, you know, if the if the model is just refusing stuff that looks like it's obviously malware or whatever, then you know, that's one thing, but if if the user kind of breaks it down into small chunks, um, then that's a big problem. And so,
and the risk is if you say like it's going to it's going to be easy to make it refuse to say like build me a bioweapon, but if I ask just like how do I learn about pipetting and then how do I how do I learn about this particular step and it doesn't know.
Yeah. and potentially like across multiple accounts. And so like how do you kind of stop that in a way that is kind of like you know preserves privacy obviously
not even just multiple accounts but multiple
people
models models. Oh yeah.
Yeah. No, it's like it's a super hard problem and and yeah, I mean I I would say like I'm interest and again like you know we I think it you know might be that we need to kind of just accept that like models you know more than like a year old or so like given how how fast kind of things get cheaper and get open source like we should just assume they will be like maximally misused in the worst possible way and just focus on like the very newest ones. Um but you know for the very newest ones I think you know we need to get better at detecting those sorts of things because right now some of the companies will say we found this stuff other companies it's not so clear that they are actually like trying to stop
so politicians have uh focused in on uh the the sort of risk or reality of rising energy costs.
Are you frustrated that so much of their attention is is on that versus some of these other issues? Um, I mean, I think the I think the energy cost thing is more of a real thing than the water uh kind of issue. And so like I so it's I don't think it's crazy to to worry about that, but um but yeah,
but it's certainly taking up mental bandwidth and and just like dominating the narrative when maybe there's bigger bigger risk.
Yeah. So I mean the you know my kind you know the things that I personally focus on are like you know there should be some kind of you know articulation what what what counts as safe enough or secure enough like and you know there's starting to be things like this and you know California New York um that say okay you know you uh you know they may don't say exactly how safe but at least you should share you know whether you've measured catastrophic risks and share that you have a safety you know a security policy like eventually we should kind of ratchet up the standard there but you know some set of standards then you need some evidence that people are actually following those standards and that's where auditing comes in that's where kind of um you know kind of transparency and and publishing system cards those kind of things come in and you need some set of incentives so uh to like make sure that people actually you know get face some penalty and so doing all that in a way that doesn't crush you know small businesses and focus focuses on you know the very frontier of of the AI systems that's kind of what my focus is more so than the energy stuff
what does an actual good audit or benchmark look like you can't just do like bad stuff per million tokens I assume. Uh how do you actually uh get to some sort of quantitative metric that's tractable understandable but still valuable and not gameable?
Yeah. So we're kind of going through this transition right now from the earlier period period where like the strongest safety case you can make was you know what people call an inability argument. It's like the model's too dumb to be worth worrying about in cyber security or bio. And so like kind of just like show that it's too dumb. That was kind of like the early wave of safety analysis and like we're starting to move into the world where actually like have you taught it to behave well and like let's just assume that it's capable because they're getting very capable. Are the mitigations good? And then it's mitigations at the model level like refusing stuff. Um there's uh mitigations at the system level like you have a classifier outside the model that kind of like blocks certain you know outputs and so forth. there's kind of at the platform level like detecting you know multiple fraudulent accounts that are you know coordinated and kind of like trying to steal the uh you know kind of like trying to distill the model and so forth. So those are the kinds of things that uh you know there's starting to be work like for example there are organizations like meter and Apollo research secure bio transluce etc that are kind of looking at different aspects of that um but right now it's like largely voluntary and you know they're they often only look at a very small subset of those risks. They're not really looking at the full the full gamut.
Very cool. Well congratulations on the launch. I want to ring the gong for
right here. Right here. Right here.
We love an AI nonprofit. Boom. Thank you so much for coming on down to the TV pit culture.
Let me tell you about Plaid. Plaid powers the apps you use to spend, save, borrow, and invest. Securely connecting bank accounts to move money, fight fraud, and improve lending now with AI.
That's right.
Uh we have some slight changes to the linear lineup. Uh we're shifting some folks around. There's some breaking news. We're going to bring on uh different guests. We're starting our lightning round at 1:30, but in the meantime, we can uh continue through the timeline. Uh, does that sound good, Jordan?
Sounds great.
Bubble Boy is talking about Claudebot in the race for Mac minis a bit over the last few days. I think it's become a very scary phenomenon. Uh, very scary realization that explains this crazy phenomenon. Put simply, building a gaming PC will be nearly impossible in the next 5 years. In fact, it already is for the vast majority of consumers. But I will go one step further. In the next 10 years, having any type of personal computing device will be unattainable. Fab capacity will be allocated to its most productive and profitable use, which is cloud and AI data centers. Even today, most of the software you run already won't work without an internet connection. But now, with the opportunity cost being so high,