Elorian raises $55M seed to build multimodal visual reasoning models — where current AI performs at preschool level
Apr 9, 2026 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Andrew Dai
Our next guest is Andrew Dai from Elorian. He's the co-founder and CEO, with a big raise coming in. Let's bring in Andrew from the waiting room. Andrew, how are you doing?
Hi. Good. Thanks.
Thanks so much for taking the time to join us. I'd love to hear a little bit about your background. Maybe we should start even before Google Brain and DeepMind. What was your academic track like going into tech?
Yeah, so I grew up in the UK. I lived in Manchester, Sunderland, and London, following my dad's work. I ended up in London and did my high school there. I went to the University of Cambridge for computer science, and then I went to the University of Edinburgh for my PhD in AI.
And what did you work on when you were at Google?
At Google I worked on a whole bunch of things. I worked on Google Now, I worked on Smart Reply and Smart Compose. But really, I had a paper about 12 years ago where we proposed language model pre-training and supervised fine-tuning,
and that's the paper that all the GPT papers cite.
Wow.
And here we are today.
Turned out to be a big deal.
Yeah.
I don't know if you're going to dig in deeper there, but at the time when you were writing that, how much conviction did you have around the paradigm? Did you expect scaling laws in particular?
Yeah, I knew it was going to be something big. And I could tell because when I was giving my poster at NeurIPS that year, one of the inventors of LSTMs, which was the biggest model at that time, said it just works. Like, the method just works.
And there I could tell, oh yeah, there's something happening here. But I never imagined it would get to this scale, and I never imagined I'd be doing it for more than a decade.
Yeah. So, did you go to NeurIPS this last year?
Yeah. Yeah, I did.
How has it changed?
It's a lot of language models now.
And VCs. A lot of VCs sneaking around too.
A lot of VCs. That's right.
Well then, take us through the decision to launch this company. What do you think needs to be different about the current strategies employed by AI labs?
Yeah. So, our company is built around visual reasoning as a first-class citizen, multimodal reasoning.
So looking at all the labs, you see there's a lot of focus on text, on language. That's been very effective, right? We have new paradigms in cybersecurity right now. But the visual capabilities are kind of getting left behind, and there you have problems where the models, on visual problems, are at the level of a preschooler, of a three-year-old.
Can you give me an example of that? What does that look like in practice?
Yeah. So in practice, you can tell these models to generate a pool table and they will make a perfectly good-looking pool table, right? But if you ask them to count the number of balls on the table, or count the number of bottles in a bar, then they will just hallucinate. They'll be off, sometimes by a large amount.
Yeah. So I've seen sometimes the reasoning models will ingest an image and then wind up writing a bunch of Python code to basically count pixels, very much not the way I would imagine a normal human would process counting something. At one point I was asking one to estimate the height of a desk, and it was writing all this math to manually try to understand the size of things, which was a reasonable approach. It wound up just being a normal-sized table, which was sort of underwhelming. But how are you thinking about the actual development, and what do you want to do differently? Are you focused more on a new architecture, new data sources, more scale? What's the shape of your strategy here?
Yeah. So, we are basically a full-stack team. We have experience with pre-training, data, and multimodality. So we are essentially building specialized models, and that includes new architectures for visual reasoning, very specialized sets of data with specialized data processing, and new algorithms. So we expect to be running the full gamut of changes, and this is really needed to make a breakthrough in visual reasoning.
What does good training data look like today? Is it synthetic? Is it images, videos, 3D, virtual worlds? What's the shape of what's valuable?
As you can expect, it's a bit of everything. But yeah, definitely natural data, so data that's not generated from a model, is more valuable and more useful. Data from a model, synthetic data, has the risk of putting the model into a weird place where it just outputs em dashes or just tries to repeat the same thing all the time. But data around the natural world, around the 3D world, that's invaluable.
What do you think the cybersecurity analogy is for visual reasoning? Like, cybersecurity is so well suited to text-based models, coding agents. The entire cybersecurity threat can sort of be understood as a big string of text, more or less. Where do you think visual reasoning goes in terms of applications?
Yeah. So for applications, one of the most promising ones that we're seeing is engineering. Right now all these engineers, mechanical engineers, hardware engineers, and also architects, are drawing all these diagrams in CAD software, which has been developed over the last few decades, but there haven't really been AI breakthroughs there. People are still doing basically the same thing they've been doing for the past few decades. And so what we will do is produce models that really understand these drawings. Like, say you have a real estate floor plan and you want to make this bedroom bigger, or add an extension to my house. Right now that would take weeks and lots of manual time, and then you have to follow building codes, make sure everything is correct. And that's because there's not really any visual reasoning in our current models, and so we think there's huge potential.
One of the hardest times we've ever laughed on this show was reacting to a floor plan that was generated by AI, and it looked so high fidelity. Every line was perfect. It didn't have any of the fuzziness that you'd expect from earlier models, but when you dug in, it made no sense. There were like 12 toilets and two baths. It didn't really understand the problem. So, very exciting. Well, you raised some money. Tell us about it. How much did you raise? We want to hit the gong for you.
Yes, we raised $55 million.
Congratulations. From who?
And that's from Spiker Ventures, Menlo Ventures, Automator, and Nvidia participating. And we have some great angels there, including Jeff Dean.
Jeff Dean. Awesome.
He's been on an angel investing tear. It's really exciting to see. I mean, obviously, he's a legend, but he's clearly very optimistic about all these different approaches going forward. So, very excited. Well, thank you so much for taking the time to come on. Great to meet you, Andrew. We'll talk to you soon. Have a good rest of your day.