Higgsfield AI CEO on reasoning engines in video generation and why the Instagram boyfriend market is under threat
Jul 17, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Alex Mashrabov
Uh, the horse had a little bit of a long neck, but otherwise photorealistic. Like, it nails the face really, really well. Not exactly sure how they do that. I want to talk to it. Welcome to the team. Welcome to the stream. Yeah. Higgsfield. Uh, we have a lot of amazing people on today.
Uh, I have been incredibly excited to speak with you. Um, we were just talking. I don't know if you caught it, but I posted an image generated by Higgsfield a couple days ago. I think most people still think it was real. It was a completely ridiculous image.
It was like this cinematic upshot of me on a horse. I don't know why people didn't catch that its neck was quite a bit bigger than a regular horse's neck, but it was great. Clearly a breakthrough. Yeah.
Uh, it was the first time that I've seen an image generation product and realized that you guys have probably already started to take over Instagram-style content. Um, but really, potentially disrupting the Instagram boyfriend market.
If people can just generate infinite images of themselves, Instagram boyfriends will have nothing to do. But great to have you on the show. Uh, would be great to get a quick background on yourself and the company, and then we'll get into a bunch of other stuff. Great. Uh, thank you very much for the kind words.
Uh, definitely, it's great to be here. Thank you for the invitation. So, quickly about myself: I'm a veteran in the video generative AI space. I built maybe some of the most iconic products in that space. Uh, maybe you remember Snap face filters. Oh really? No way. Yeah, totally. That's amazing.
Yeah, there were like a billion people throughout the world who played with that, and this was gen AI, right?
Um, and when the models were like a thousand times smaller than they are today. Although this product was specifically targeting the augmented reality use case, so basically an overlay on top of the existing camera. Yeah.
With all these learnings which I got from my exciting times at Snapchat, we started Higgsfield with a way bigger ambition: to create the camera of the next generation. Yep. Even today we can see that there is an emergence of UGC content. Um, the quality unfortunately goes down. Mhm.
And with Higgsfield we create new ways to tell stories, helping creators and brands get attention on social media. What's the key insight, do you think? How important is it to, um, create a great image generation model? Is it just scale, where you just need to throw a ton of compute at it, versus changes in the algorithm?
We've heard rumors that image generation in ChatGPT isn't pure diffusion. There's some transformer architecture in there, some different layers.
I've started to suspect that there are different layers in some of these, where there might be a different neural network or a different system to put text on top so the text is really clean. Are we kind of recreating Photoshop at a certain level?
Uh, talk to me about the actual technical infrastructure, to the degree that you can. Totally. Um, first of all, I think it is important to admit that today we are early in our journey with video AI technology.
I think Higgsfield is probably the first example of a technology platform which helps to create compelling content for social media. The next step is going to be to build a reasoning engine on top of that. Mhm.
So, let's say you post a video. The system is going to analyze your accounts and actually provide you with more suggestions of what to post. And eventually, I think in two years we are going to find ourselves in a new version of the world where most of the content out there is AI-generated.
There is no way to stop that. Mhm. And um, on our side, we do our best to provide a simple interface so that non-AI users (think about the market of social media professionals, tens of millions of people), this broader user base, can actually tap into the power of generative AI technology. Mhm.
Um, and I think we will see that the models are going to get way better than they are today. It is true that various research labs do experiment with various architectures.
We found that, um, our post-training techniques allow us to substantially differentiate from the competition, and we do believe that the general quality of the technology is already there to surpass, like, average human-produced content.
What, uh, I have this one eval for AI image generators where I ask them to create a Where's Waldo, and no system's been able to crack it. Um, and I think it has something to do with the density of information in a proper Where's Waldo.
You'll typically see hundreds of little characters doing very intricate things. And so it's very clear that the artists that create the actual Where's Waldos work at a very small scale, and they piece together the full image like it's a puzzle.
That's something that I feel like could be solved with a reasoning layer on top. You understand that you're trying to create something that's really, really layered, and so you need to kind of create tiles and then blend them together.
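The tile-and-blend idea described here can be sketched in a few lines. This is a toy illustration, not any product's actual pipeline: `tile_grid` and `blend` are hypothetical helpers, and a real system would call an image model per tile and feather the seams rather than averaging overlaps uniformly.

```python
# Sketch of "tile and blend" for dense scenes like a Where's Waldo:
# generate each tile independently, then merge the overlapping regions.

def tile_grid(width, height, tile, overlap):
    """Return (x0, y0, x1, y1) boxes covering the canvas with overlapping tiles."""
    step = tile - overlap
    boxes = []
    for y in range(0, max(height - overlap, 1), step):
        for x in range(0, max(width - overlap, 1), step):
            boxes.append((x, y, min(x + tile, width), min(y + tile, height)))
    return boxes

def blend(width, height, tiles):
    """Average overlapping tile pixels into one canvas (uniform weights).

    `tiles` is a list of (box, pixels) pairs, where pixels is a 2D list
    of grayscale values the size of the box.
    """
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for (x0, y0, x1, y1), pixels in tiles:
        for y in range(y0, y1):
            for x in range(x0, x1):
                acc[y][x] += pixels[y - y0][x - x0]
                cnt[y][x] += 1
    return [[acc[y][x] / cnt[y][x] for x in range(width)] for y in range(height)]
```

For example, an 8x8 canvas with 4-pixel tiles and 2 pixels of overlap, `tile_grid(8, 8, 4, 2)`, yields a 3x3 grid of overlapping boxes that covers every pixel.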
Uh, but this gets into the question of how we're going to generalize and scale reinforcement learning in LLMs and agentic workflows. Is there a similar path that we're going down in terms of image generation? Yeah, this is a great question, by the way.
I think this is a trillion-dollar question which you just asked. We all saw the power of reasoning engines with, maybe, models like o3, and then we see that at Grok 4 they actually spent more on reinforcement learning than they spent on the pre-training stage, right? Where is the market right now in terms of pre-training versus post-training in images, in your estimation? Yeah, I do believe that in the video AI space we are relatively early. We still see that the post-training stage in the core video models can be, um, maybe 20 to 50 times lower compared to the pre-training stage.
Wow. Yeah. But we are really just scratching the surface there. Sure. I think then the future is building the video reasoning engine, and this is a trillion-dollar question. Because think about the brands out there. Yeah.
Um, today what we are seeing is that brands and agencies start to actually experiment with various models. Um, and primarily we all rely on our stereotypical understanding of the customers and maybe some qualitative data which is available out there. Mhm.
I strongly believe that with this video reasoning engine, the way the stories are told is going to be completely different. Instead of just running one video, we can run hundreds of videos out there and A/B test them and see which one performs the best.
And today there are only a few top creators who actually do that. If you look at MrBeast and similar-sized creators, they do A/B test thumbnails very aggressively. We all know that they actually A/B test the hooks.
So far this privilege is only available to larger teams who have the manpower to actually do that, and the next-generation video reasoning engine will empower everyone to do that, which is going to lead to a boom in generation.
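The A/B-testing workflow described above can be sketched as a simple pick-the-winner loop. This is a hypothetical illustration: `engagement` stands in for whatever platform metric (likes, watch time, click-through) a real system would observe per posted variant.

```python
# Toy sketch of A/B testing video variants: observe engagement for each
# variant, then keep the one with the best mean score.

def pick_winner(variants, engagement, trials_per_variant=100):
    """Measure each variant's mean engagement and return (winner, scores).

    `engagement(v)` is a hypothetical callback returning one observed
    engagement sample for variant v (e.g. 1 if a shown viewer clicked).
    """
    scores = {}
    for v in variants:
        samples = [engagement(v) for _ in range(trials_per_variant)]
        scores[v] = sum(samples) / len(samples)
    return max(scores, key=scores.get), scores
```

With a deterministic stand-in for engagement, `pick_winner(["hook_a", "hook_b"], lambda v: {"hook_a": 0.02, "hook_b": 0.05}[v])` returns `"hook_b"` as the winner.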
I mean, what you're talking about is basically, like, RL-ing on humanity with human graders, which is the algorithm and likes, and that's effectively what the MrBeast algorithm is doing.
Uh, my question is: is there a way that we can bring that into the data center? Because if it stays on Instagram, if it stays on YouTube, it's probably pretty rough for you, because Google and Meta are going to have an advantage there.
But if you can figure out how to do RL with verifiable rewards, or something that looks like a rubric for grading, you know, and finding errors.
Are we going back to the generative adversarial network era, where you'll have two competing models to determine that? Like, whenever I generate something with Veo, it's always: the car's driving, looks amazing, then all of a sudden I'm looking at the front of the car and the car is driving backwards.
And clearly the model is getting confused, but we need maybe, like, a detector for that. How are we actually going to do RL at scale in imagery? This is a great question. So I think the first step is going to be reinforcement learning with AI feedback.
Part of that we already do at Higgsfield. Obviously at the post-training stage, not yet at the inference stage, just because of the cost associated with that. That makes sense.
Um, so we have to train the video generation model in a way that it's sort of competing with a video understanding model. And, like, at Higgsfield we cannot go and label millions of videos. That's why we have a set of powerful agents for video understanding. Mhm.
Which help to tune the video generation process. Yep. But this is a process in a vacuum by itself. What's going to be powerful is when we condition the outputs of the model and train the model based on the engagement data from social media. First it's going to be the number of likes and the number of comments.
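The generator-versus-critic loop described here (sample candidates, score them with a video understanding model, keep the best) can be sketched as a best-of-n filter. Both `generate` and `critic` are hypothetical stand-ins for real models, not anything from Higgsfield's stack.

```python
# Sketch of AI-feedback filtering: sample n candidate generations, score each
# with an automated critic, and keep the top ones (e.g. as post-training data).

def best_of_n(prompt, generate, critic, n=4, keep=1):
    """Generate n candidates for a prompt and return the top `keep` by critic score."""
    candidates = [generate(prompt, seed=i) for i in range(n)]
    ranked = sorted(candidates, key=critic, reverse=True)
    return ranked[:keep]
```

At scale, the kept candidates become the positive examples that tune the generator, which is the "competing models" dynamic described above.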
Although if we look at Meta ads, we see that they provide a very detailed breakdown and drop-off second by second. So training the models based on these outputs is going to lead to a completely new level of reasoning and success rates for the end customers. Cool.
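An engagement-conditioned training signal along the lines discussed here might blend the AI critic's score with likes, comments, and second-by-second retention data. The weights and the squashing constant below are made-up illustrative choices, not anything confirmed by the interview.

```python
# Minimal sketch of a blended reward: AI feedback plus observed engagement.
# All inputs are hypothetical stand-ins for real model and platform signals.

def retention_score(drop_off):
    """Mean fraction of viewers still watching, from per-second retention data."""
    return sum(drop_off) / len(drop_off)

def reward(critic_score, likes, comments, drop_off,
           w_critic=0.5, w_social=0.2, w_retention=0.3):
    """Blend AI feedback with engagement data into one scalar reward."""
    social = likes + 2 * comments          # comments weighted as a stronger signal
    social_norm = social / (social + 100)  # squash to (0, 1); 100 is arbitrary
    return (w_critic * critic_score
            + w_social * social_norm
            + w_retention * retention_score(drop_off))
```

A reward like this could then drive the same best-of-n filtering or RL fine-tuning, with the social terms grounding the model in what actually performs rather than what a critic alone prefers.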
We have another one in. I know we're totally over. Uh, what happens to the legacy Instagram influencer that has built a business basically on envy? They're constantly traveling around the world at the best hotels, on boats and private jets.
What happens when anybody can be on a private jet on a platform like Instagram, or on a yacht somewhere? Have you thought through the implications of the tech across different categories of content creators? Hopefully.
So, first of all, I need to admit that we're a technology platform first and foremost.
Although we think about ourselves as scientists, we are constantly monitoring and listening to the creators and how they use the technology. I cannot bring up the names here, although, like, some of the top 50 YouTubers in the world actually want to get rid of the team and build their own agency of AI influencers. To be honest, they have a bunch of ideas which they want to sell, and they don't want to condition their existence on social media just on their likeness today, because people are getting older and sometimes they become irrelevant.
We have seen many examples of that on social media, and, uh, creators actually want to create those digital agencies where they can express all their ideas through various synthetic AI influencers, and they can use the Higgsfield platform to do that. Yeah, it's a very, very wild time.
Uh, let's have you back on again soon. This is fascinating. Yeah, we'll talk to you soon. Thanks so much for hopping on, Alex.