Fastino raises $17.5M from Khosla to train sub-billion-parameter AI models on consumer GPUs
May 9, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring George Maloney
Sounds like an Italian name. Fastino. Fastino. Ask him about it. John, you love to— your Italian accent is fantastic. Yeah, it's my favorite. Uh, big news: Fastino trains AI models on cheap gaming GPUs and just raised $17.5 million from Khosla. Let's bring him into the studio.
We got uh Greg here from Fastino—or George. George, welcome. How you doing, guys? How you doing? You guys hear me all right? Yeah. Uh, first off, uh, how do you pronounce the company's name? Is it Fast-ino or Fast-eeno? So, funny enough, "festino" in Italian means a feast. Oh, okay.
But Fastino, it's kind of a play on fast and tiny, like a very quick and small model. It's our lame attempt at naming. You've been feasting on uh Nvidia graphics cards. You've been feasting on Khosla Ventures dollars. Uh, putting them together. Can you break down uh what's the news this week? All of the above. Yeah. Yeah.
Well, great great being on, guys. Am I your last one of the week? Uh, last one of the week. Capping it off. Capping it off. Hoping to save the best for last. You guys are looking pretty pretty fresh, by the way. I feel pretty underdressed. Yeah. Hit the soundboard. Let them know that we still got energy.
I want to hear the Ashton Hall effect. Not that one. The other one. Let's go. It's Friday, but we still got energy. We're not slacking off here. Next time I'm going to have a suit. I'll go buy one. Please bring it on. Bring it on. Suits are— you should have one for every day of the week.
Yes. Yes. Yes. I got to go to New York in a couple weeks, I think. Uh, I got to go get fitted first. We got a suit guy for you. Yeah. Yeah. We'll introduce you. Anyway, uh, break down the news and then we'll start talking about uh the business. Yeah. So, great being on. Uh, this week we launched TLMs.
So, uh, it's a family of language models called task-specific language models. Yeah, they're small, lightweight models that are really fast and they're built for AI developers. So, being task-specific really means that we're more accurate on enterprise tasks than large models like OpenAI's or Gemini.
And they cost a fraction to train. So, we spent less than $100K on GPUs to train our models. Uh, but we're beating industry benchmarks for enterprise tasks. Okay. So, are you fine-tuning open-source models? Are you ripping apart a mixture-of-experts model to just have a smaller set of weights? Give me the scope.
I've seen the GPT-3.5 circle's like this and the GPT-4 circle's huge. How big is your circle, I guess? So, we're not fine-tuning or distilling any open-source transformer-based models. Uh, what we're doing is very different. We took a different approach from the large labs.
Uh, we've built a new architecture that maintains high accuracy even with a very low parameter count. So we're not fully revealing exactly how many parameters, but all of our models are far below a billion parameters. Oh wow. Our architecture actually gets more accurate as the task becomes well defined.
So they're not generalist models. You cannot ask them to do anything, but they're extremely performant for the tasks that we built them for. Okay. Talk about the data sources. I imagine that if you're doing summarization, you need a whole bunch of examples of that to train on. Text to JSON.
Probably need some text and some JSON. A lot of stuff's out there on the web. Are you scraping? Are you crawling? Are you buying data? Are you using uh open source data sets? Where's the data coming from? Yeah, really all of the above.
I think there's a big debate out there in academia as to whether synthetic data or real-world data leads to a more accurate model. We've definitely been using a blend of all of the above. But the first models that we're rolling out—and you nailed them perfectly—
We have models for developers doing text to JSON, text to SQL. We have an agentic function calling model that we're putting out. I know you guys talk quite a lot about agents and yeah, an agent that can book a flight for me or can book a hotel for me.
So, we have a very lightweight model that inferences in milliseconds that can basically take what the user wants and call an API. So, very much developer focused. We've got models that can parse documents, redact personally identifiable information from documents, which is huge for banks, insurance companies.
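For listeners following along, the text-to-JSON task described here is roughly: free text in, fixed-schema JSON out. A toy stand-in is sketched below using plain regexes rather than any Fastino model (which isn't public); the function name and schema are invented, and only the input/output shape is the point:

```python
import json
import re

def extract_invoice_fields(text: str) -> dict:
    """Toy stand-in for a text-to-JSON model: pull a fixed schema
    (amount, date) out of free text with regexes. A real task-specific
    model would be trained for this; the regexes only show the shape."""
    amount = re.search(r"\$([\d,]+(?:\.\d{2})?)", text)
    date = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", text)
    return {
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
        "date": date.group(1) if date else None,
    }

print(json.dumps(extract_invoice_fields("Invoice dated 2025-05-09 for $1,250.00")))
```

The appeal of the task-specific framing is that the output schema is fixed, so accuracy is easy to measure per field.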
We have a really fun model. It's our favorite model in house. It's a profanity-censoring model. I love that. Amazing. We don't swear on this show. So, yeah. Yeah, keep it light in the office. But you can imagine we've had some fun late nights building synthetic data for the profanity model, which, in gaming, can be used anywhere.
It's pretty much the funnest red-teaming we can do. I try to plug my ears or turn the— That's great. Uh, I mean, if you're inferencing this in milliseconds, is there then a desire to, uh, deploy this at the edge? You know, run this in, you know, the cloud of the business that's actually deploying this.
Well, so I just have to ask, for the profanity model: could we run it in real time while we're doing the show? So if a guest, you know, drops an F-word— An F-word, yeah, it'll work in real time. It'll be much faster than an existing LLM.
That's— Yeah, I think the real TV shows, they have a system that puts it on delay and does something like this, but the delay has got to be so much faster if you're imagining, like, using Whisper and then this. Uh, there's a lot of things. But in terms of that latency—latency is really important to us.
I imagine it's important to, uh, your clients and customers. Are you seeing demand for "let us run your model—we'll still pay you, but we just don't want to go back and forth with your API"? Yeah. So there are a lot of ways you can deploy smaller, lightweight models. Obviously we're going to be heavily reliant on our API.
Yeah. But when models have a small enough footprint, which actually comes from a lower parameter count, you can run them on-prem, you can run them on CPUs, low-end GPUs. I think Ash and I—my co-founder Ash, he had a dev agent similar to Cursor in 2023, after I sold my last company.
I was actually an investor in his startup. He had a problem where his LLM costs ended up being higher than his headcount cost and it's a problem that a lot of agent companies face.
So, I actually spent a little bit of time as a GP after I sold my last company, and all of our portfolio companies were facing the same thing. Uh, rising LLM costs.
So, it's not only latency, as you mentioned—accuracy is a big problem with large LLMs—but frankly, the key issue that we've seen is that LLMs just aren't built for the enterprise. They're built for consumers, right? So, GPT, Gemini, they're trained on trillions of data points.
They're used by our friends, our family every day. They help you code. They help you get food recipes. They help you prep for podcast interviews. They're not built for high-scale enterprise tasks, right? But enterprises are spending millions of dollars a month on these large monolithic APIs.
I want to tell you guys a story. Sure. How GPT is being used. So my wife's dog got really sick a few weeks ago. Um, cute little guy, 13 years old. Uh, he has cancer. We went to the pet hospital and the doctor basically recommended that we put him down.
We had her best friend on the phone and we're trying to make a decision based on the doctor's recommendation, symptoms, the dog's age—what to do. And who was the tiebreaker? We asked GPT: "Hey, GPT, here's what's going on with my dog. Here's the situation." Wow. "Should we put him down?" Right.
So, my wife was using GPT to play— Yeah. —playmaker. Um, and for the record, GPT told us to put him down, and we didn't listen. He's doing well today. But wow, that's a crazy story. AI is the Antichrist, trying to take out the poor pup. Yeah.
Why would a large bank—Bank of America, Citi, JPMC—if all they're looking to do is analyze your bank statement or look at some log for fraud, why are they using the same model that my wife used to consult on her dog's mortality, right?
So, using these massive models—it's like, trying to come up with a cool metaphor for the show—it's like the DoorDash guy riding a Saturn V to deliver a pizza, right? It's so unnecessary, and frankly that's why large enterprises, large banks, haven't put ChatGPT or an LLM into a chatbot.
Yeah. Imagine it's analyzing your bank statement and it's just like, "$500 on dinner? You cannot afford to have a dog. You got to put that dog down." It's like, Chase, what are you doing? What are you doing? JPMorgan, cool it with the recommendations. Reasoning is a very sexy word in this space right now.
But from speaking with almost a hundred Fortune 500 enterprises since we announced our pre-seed round in Q4, they don't want models that can reason. A bank does not want a model that can reason its way through your last 100 chatbot users' logs and figure out their personal information.
So when you're building lightweight, task-specific models like this, our models are in-domain. They're only trained to do the task that the enterprise is using them for. So, I think we have that edge very much just in how the models are built. Okay.
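To make the redaction use case mentioned earlier concrete: the sketch below fakes the PII-redaction task with two regexes (an actual task-specific model would handle far messier cases and more entity types); the labels and patterns are the editor's invention, and only the input/output contract matters:

```python
import re

# Toy stand-in for a PII-redaction model: replace each matched entity
# with a bracketed label. Real redaction needs a trained model, not
# regexes; this only illustrates the contract.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@bank.com, SSN 123-45-6789"))
# → Contact [EMAIL], SSN [SSN]
```

The in-domain argument is that a model trained only on this task can't be coaxed into doing anything else, which is exactly the property a bank wants.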
Talk about the gaming GPUs—uh, worth less than $100,000 in total is what TechCrunch is reporting. Uh, did you build that yourself? Do you have your own data center, or are there clusters out there of low-end GPUs that are all rigged together from, like, legacy Bitcoin mining applications or something like that?
Where are you getting these? Yeah, so we have, uh, GPUs in house. We, uh, you know, use GPUs in the cloud. But all told, our models take about a couple hours to train. The training cost for one model is less than, you know, the cost of a Chipotle burrito.
But is that a 2012 Chipotle burrito or a 2023 one? One that didn't give me food poisoning, hopefully. But there's been so much said about when DeepSeek came out and they only spent $10 million on H100s, however questionable that number was.
I think what we're trying to do as a super small team is show that you don't even need H100s to build generative models for enterprises, right? So, we didn't use a single H100. We used T4s, gaming GPUs, low-end V100s. And you can do that.
Uh, I think we've proven that, you know, banks don't need a model that takes six months to run or is going to drain, you know, Lake Tahoe for a training run. Right? So, what's more important with that training run, FLOPs or memory? Because I imagine lower token counts—you haven't released it, but I imagine you can fit it in memory, and so that unlocks it. But, um, talk to me about the dynamics of, like, building a cluster and thinking about the different parameters that go into the cards that you select.
Yeah. So, we have a family of—it's going to be less than 10 models, most likely, for the next six months. I'll say that they're far below a billion parameters each. So, we don't need a cluster, even. We can just run these on one or two GPUs each for inference. For inference? Yeah.
And for training, it's the same thing. It's a very low-end GPU for an hour or two. Wow. We definitely believe that there's going to be a giant shift in how language models are used, right? So you guys have seen waves over the last couple decades.
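The "one or two GPUs" claim is easy to sanity-check with back-of-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. The 500M figure below is a hypothetical, since exact parameter counts haven't been disclosed:

```python
def model_memory_gb(params: int, bytes_per_param: int = 2) -> float:
    """Rough weight-memory estimate: params x bytes per parameter.
    fp16 uses 2 bytes per parameter; activations and KV cache add more."""
    return params * bytes_per_param / 1024**3

# Hypothetical 500M-parameter model in fp16: under 1 GB of weights,
# comfortably inside a 16 GB T4.
print(f"{model_memory_gb(500_000_000):.2f} GB")
```

By the same arithmetic, even a full billion parameters in fp16 is under 2 GB of weights, which is why sub-billion models fit on low-end cards with room to spare.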
So these massive IBM mainframes shifted into client-server architectures and open-source software. You used to have these massive monolithic, uh, builds that would ship a month at a time. Software applications shipped so much slower.
And then out came microservices, and you have a different release for your payment gateway and your APIs. Uh, this kind of workload partitioning, as we call it—it's completely going to change the landscape in language modeling.
So there was a report from Gartner that came out about a month ago saying that small, task-specific models are going to outpace LLMs in enterprise usage by three to one within three years. And we want to lead that. We definitely think that every developer is going to become an AI developer.
So every dev today will need to be able to integrate a language model into their code just like they're integrating an open-source npm package or Python library. It needs to be much simpler. Uh, and that's very much how we're looking to change the game. It's going to be really hard to compete with the big labs.
Yeah. If we just focus on the models—which, obviously, we're a foundational model company. Yeah. But we need to make much smoother developer workflow integrations. We need to make life easier for devs. And right now it's— Yeah. Where does this go?
I mean, I know, um, some of the stories about companies like Ramp, for example, our sponsor—they, uh, need to digitize receipts. So they get a lot of images. They do OCR. Uh, and I think Google provides an API for that. There's a bunch of companies that do OCR.
It comes through as this kind of messy cluster of text, then you pipe that through GPT-4 and boom, you have structured data. Uh, Llama comes out—okay, maybe it's getting cheaper. But this seems like something where you'd want to go to you guys and get an even cheaper model that's distilled just for that one task.
But that feels almost like you're a consulting shop. Or is there a place where a company says, "Hey, we've been using GPT-4 or Llama, and we've done, you know, 10 million inferences, and so we have a lot of data about what works, what doesn't.
Uh, can you train a custom model for us to drop our inference cost by a couple orders of magnitude?" Uh, or are you trying to focus more on more versatile foundation models that can be, uh, just tools in the tool chest and aren't kind of one-off specific systems for a specific task within a specific company?
Yeah, I think there are a few ways to answer that. The first one is probably talking about agents and how agentic systems are evolving. I'm definitely in the camp of thought that says that agentic systems will take over legacy SaaS systems within four to five years. Right?
So when you see how these agents are being put together, it's typically daisy chaining eight or 10 LLM calls. So in a chatbot, you want to parse a query.
You want to then figure out the right document to give back to the user, summarize the right chunk of that document, give it back all in real time, uh, with a very smooth chat interface. So we definitely see a world where you have different models for different tasks.
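The daisy chain sketched here—parse the query, retrieve the right document, summarize the right chunk—can be mocked up as three tiny "worker" functions. Each stand-in below is trivial editor-invented code (the real version would call a small model per step); only the pipeline shape comes from the conversation:

```python
# Each function stands in for one small task-specific model in the chain.

def parse_query(query: str) -> str:
    # Stand-in for a query-parsing model: normalize the question.
    return query.strip().lower().rstrip("?")

def retrieve(topic: str, corpus: dict) -> str:
    # Stand-in for a retrieval step: pick the doc whose key appears in the topic.
    for key, doc in corpus.items():
        if key in topic:
            return doc
    return ""

def summarize(doc: str, max_words: int = 8) -> str:
    # Stand-in for a summarization model: truncate to a word budget.
    return " ".join(doc.split()[:max_words])

corpus = {"refunds": "Refunds are processed within 5 business days of the return."}
answer = summarize(retrieve(parse_query("How do refunds work?"), corpus))
print(answer)
```

The design point being made is that each link in the chain is deterministic and narrow, so it can be a tiny fast model, while a large model (if used at all) only orchestrates which links to call.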
And we're not saying that we're going to replace large generalist LLMs. The models that are larger, they're good at reasoning. They're good at research. They do things like orchestration. So they'll help orchestrate this entire pipeline. That's still going to be the large model. That's still going to be your GPTs.
Your Geminis. But the actual agents, the workers that are calling APIs, that are doing these sort of deterministic, high-scale, high-throughput tasks—those are all going to be very small, intelligent, task-specific models. That's how we see it playing out. Cool. Uh, Nvidia's down 6% today.
Is that because of you? Is it— is the market pricing it in? I mean, billions of dollars have been evaporated from the markets on the news. No, but how do you think—you know, assuming Fastino just gains, you know, massive adoption over the coming years— Yeah.
How do you think that impacts the GPU market, GPU demand, broadly? Yeah, for the record, the low-end GPUs that we use are still Nvidia GPUs. We're still, you know, big Nvidia stans. Shout-out to Jensen.
Jensen might appreciate if you weren't so efficient and you, you know, raised, you know, $500 million and gave it to him. Have you thought about doing that? If he can help us scale and go more viral, we'll need more of them for inference.
For the record—but I think there's going to be a huge demand from consumers for, uh, LLMs as they are. I think we were lucky enough to have two of the first OpenAI investors on our cap table. That's right. And they discussed that in the very beginning—when there was a pitch deck, there wasn't a business model.
They didn't think that they were going to be a consumer company, but the consumer appetite for LLMs has gone crazy. Even during the DeepSeek moment—didn't DeepSeek get to number one on the App Store? My wife, my friends are all downloading DeepSeek.
So the need and the the hunger for consumers to, you know, automate their lives will constantly be driving the need for these larger generalist LLMs. We just don't believe they're needed for the enterprise. So we're taking a very different approach. Very cool. Makes total sense. Anything else? I love this conversation.
Me, too. This is super fun. He's an absolute dog. You're an absolute dog. You're an absolute dog. Call me anytime you guys want to talk about LLMs, and I might need to get some help on buying a suit for New York. Oh, fantastic. We've got you. And seriously, we got you. You got us.
I want to figure out this real-time censoring thing. I think it'd be hilarious. We have some guests that come on and try to drop F-bombs. It's unacceptable. Our children listen to this. It's unacceptable. And I think it'd be hilarious if it was, like, you know, made, like, a duck sound.
Yeah. Quack. It'd be great. So, well, our people will talk to your people. Yeah. Appreciate it, guys. Thanks so much for having us on. Thank you so much. Uh, hey, big news. Uh, yeah. Yeah, Rippling has raised 480—or sorry, 450 million—at a $16.8 billion valuation.
And bigger news—I was actually supposed to be texting with them. Let me see. YC is a customer. Really? And that is big, considering YC has created all the big payroll companies: Gusto, Deel, Rippling. Yeah, Gary must have been, uh, finding it hard to pick favorites. But, well, I'm texting with the team.
Hopefully get Parker on soon to talk about the business. Um uh one last thing. Uh we have a couple posts we want to go through. Uh but Michael, uh can you check the printer because I think we got a special print out for today. We haven't been printing posts very frequently, but we got one post that I wanted to print.
Hopefully, it printed. Let me see. I don't even know if it printed. It did not print. Did not print. All right. Is it working? I'm trying to print it again. And we might be out of paper, but Okay. But we can pull up the digital version. Anyway, uh Luke Metro said, "This show used to print out my tweets and read them.
Now they have the head of the army." So, thank you to Luke Metro. I tried to print it, but we haven't printed in so long that the printer's not working. Anyway, there's some other posts. Uh, did you see the drama in Anduril world, uh, about Matt Grimm, uh, taking Notice.co to task for, uh, selling some fake equity in the company? Yeah. So, it's hard to know. I mean, the way that Notice was displaying this information was, uh, not "consistently candid" is how I would describe it. Um, and then the funniest thing—you called this out to Matt.
Uh, apparently the CEO of Notice messaged Matt and said, "Hey, I'm FINRA registered, so I'm not supposed to post publicly on social media. Happy to continue the convo privately or do a call if you want. Let me know, buddy." And "buddy"—it's a war crime. Hitting Matt Grimm with "buddy" is a war crime. Straight to jail.
It's so bad. It's so brutal. It's so bad. Um, I actually think the Notice platform is pretty cool. Oh yeah. Like, they have a bunch of, you know— I mean, I imagine that there's probably some steelman here. There's a million ways that, you know—it's a big company, there's a lot of investors.
Someone could have come and, uh, figured out a way to put some money in. No, I think what was happening is it was an employee. Oh, really? Okay. Like, the reason that you would have zero fees— Sure. —is it was an employee's common stock, but it wasn't being— Yeah, and there should be transfer restrictions. I mean, these are very standard. So, there's something odd. I mean, there's been a big history of these odd, like, secondary sales. Like, uh, for a while people were doing forward contracts.
Did you ever follow that story? So, basically—I mean, I think they still work. I think they're definitely banned at most companies.
You're not supposed to do a forward contract, and it's in your— But the whole nature of a forward contract is that the company doesn't really— there's a very good chance the company would never find out. True. But they can still ban it in your employment agreement.
Like, it can still be something that you agree to when you join the company. So, basically, a forward contract—if you're not familiar—uh, it's the right to purchase the shares at a future date, uh, much like a stock option.
So you're basically writing an options contract against your shares, uh, with a third party, and the shares themselves don't actually transfer, and so the company in theory doesn't need to approve it. But they wouldn't necessarily know. And they wouldn't necessarily know. Yeah.
And so this was a big thing, very, very controversial. A lot of people don't like this. And even though it doesn't result in a whole bunch of, uh, like, legal complications—obviously you're not transferring the information rights, because you're not actually transferring the shares—
they couldn't sue the company. That's the big reason why you want a clean cap table: because you have to deal with every investor. You have to give them information. They can sue you. Uh, if they just own a forward contract, they probably can't do that. It's still a problem and creates all these market distortions.
So, uh, Grimm, taking them to task, says: "'We're one of the good guys,' says CEO of company who solicits retail investors to buy shares at a significantly inflated price, for what it's worth, in a privately held company they do not own shares of, have direct access to, or have any information or information rights from which to discern financial performance or market positioning or anything whatsoever to inform the proposed investment."
"All while not clarifying publicly—or, for that matter, to the clients they are soliciting—how exactly this exposure is structured or what precisely these clients are buying, and while privately hiding behind a claimed veil (fake, by the way) that their FINRA registration means they can't comment." When keeping it real goes too far. Uh, rough. So, clean it up. You don't want to have Grimm on your bad side. He could basically say this about most secondary brokers and/or platforms.
Um, anyway, we got some massive breaking news. We got some personnel news: Jacob Effron's been promoted to managing director at Redpoint. We love to see it. Very interesting. The rumors—this is a maxed-out contract. Rumors have been swirling for a long time.
He was in free agency—considered free agency—stuck with Redpoint, got promoted. Congratulations to him. Couldn't be more excited for everything ahead. And weird, uh, weird terminology over at Redpoint: not GP, not partner—managing director. Very investment-bank-like, you know, official business.
Uh, Andrew Reed had a funny post. He was, uh—coming from banking: "No matter how senior I get in venture, I'll always think that a managing director is more senior to me." MD'ed. Feels very senior. It does. It does. Um, anyway, uh, Aravind Srinivas over at Perplexity is taking more shots at Bloomberg.
"Bloomberg is such a joke. Well, Perplexity does real-time call transcriptions for free. Bloomberg has a 15-minute lag and it costs $30,000 a year." To be clear, that was Marcelo who said that. Somebody at TechCrunch will listen to this. Do not say "Bloomberg is a joke" is my quote.
Um, anyways, what Perplexity is doing, uh, around real-time call transcription—"TechCrunch reports" it as my quote, me quoting someone else as ours. That's what I was saying. So bad. That's a risk. The bar is low these days, folks. The bar is low. Uh, anyway, any other posts we want to go through? The Pope is American.
Vatican City has a Buc-ee's now, and a Waffle House, and a Costco. It's great. They've taken over. America is in control. It's fantastic. And we hope you have a great weekend. We hope you have a great Mother's Day. Pick up something simple. And thank you for joining us this week, folks.
We had a great time doing this show. Yeah, it was fantastic. Thanks for tuning in, and we'll see you next week. And remember, it's Mother's Day—we talked about this three hours ago—the Super Bowl of pronatalism, folks. Take care, everybody. Have a great Mother's Day. Cheers. Bye.