Speechify's Cliff Weitzman: 50M users, bootstrapped on conviction, and why hyperscalers help more than hurt
May 20, 2025 · Full transcript · This transcript is auto-generated and may contain errors.
Featuring Cliff Weitzman
no, it's not a joke. It's not a joke. Hit the Ashton Hall. Hit the Ashton Hall. Bring Br in. Welcome to the stream, Cliff. Welcome to the stream. Oh, we were expecting shirtless. John said you were going to be shirtless. Just any TV. You can do shirtless as well. Yeah. No, no, we're making we're making TVPN stringers.
Uh yeah, I I I I worked out with him in in this building where we record this. Uh he put up some fantastic numbers on the bench press, showed me a spreadsheet where he's uh what's it called? Thor 2.0. That's right. He's trying to become Thor 2.0 now. And so I've been telling him I'm trying to do Thor 3.0
and it's been getting under your skin. But uh has a list in his gym. How's it been? All the PRs of anyone who's ever worked out in that gym. Yeah. Yeah. Yeah. We're we're getting there. Uh are you in LA now? Well, where you at these days? I'm in LA. Nice. I'm in LA in Studio City.
Uh but I'm about to we we move every four months or so. So, I'm about to do SF, New York, Prague, back to New York, London, Rome, back to London, back to New York. Those are all four month Those are all four months. So, you're planning like years. That's next That's the next 40 days. Okay.
So, bouncing around back in Florida for for a set period of time. Okay. Yeah. So, so I mean, take us through the structure of the business.
Obviously, you can travel a lot, but uh I it's a fantastic story, and I just want to hear uh what you built, how you wound up there, and kind of the the the state of the union with Speechify. Yeah.
Give I I guess to give people a sense for the scale, which I think is important because typically people just think companies are big if enough VCs post about it, but you haven't raised very much money. Speechify has over 500,000 five-star reviews, over 50 million users, Chrome extension of the year, app of the day.
Uh I'm just going to keep I'm going to hit this a couple times. Um but uh but anyways, uh continue. Yeah. Yeah. Take us through it. Yeah. Happy to share more. So I'm super dyslexic and I have ADHD. So first, second, third, fourth grade, I had a really tough time learning how to read.
And when I was in right about to start college, I built this text to speech tool that would read out everything to me. And um in high school, my mom used to read my summer reading books to me. And just we didn't have time to finish the summer reading book for college.
So I cobbled together this thing that would read the stuff into my iPhone and then I'd listen to that on the plane. It worked. And when I was in school, I studied renewable energy engineering at Brown. Um I ended up building about 36 different products.
Everything from 3D printed skateboard brakes to iPhone apps and websites and payment systems. It's skateboard brakes. Skateboard brakes. I'll show you. Tell Yeah, tell tell me tell me about that. You want to go slower.
So, um, if you ride a longboard or a skateboard and you go down a concrete hill, you don't want to stop by putting your face against the pavement. So, what I did was 3D print a series of brakes that would attach to the back axle. Oh, wow. That actually works really well. Ah, that's great. That's a good product.
That That's perfect. Be huge on Shark Tank. That's like the perfect product for that. Could have been on the tank. Oh, you did. No way. That's the best.
I was considering going on a show, but it's like you end up sitting for like 48 hours in a trailer waiting to get called and another big opportunity that I had at the same time. I was like, Shark Tank isn't worth it enough. Yeah. So, at a certain point, I realized that software is much better.
So, I had built this kind of meme maker um on a flight from Logan Airport to SFO, published it, checked in on it 30, 40 days later, and it had 90,000 users. And I was like, "Oh, software is way easier than injection molding stuff and way easier than like boron doping silicon wafers."
Um, my thesis was on a more efficient solar cell that I was building. I was like, "All right." And around this time, I read this paper about narrow applications of deep learning. Uh, a bunch of academic papers. One of them was called WaveNet about autoregressive speech. It came out of DeepMind.
And I was like, "I can make a 100x better text-to-speech experience and a 100x better audiobooks experience." And I wasn't sure what to work on. I knew I didn't want to go get a job at Google or Palantir or Meta. Um, and I was like, "All right, well, I don't know what I want to do, so let me just write."
So, I wrote a 30-page paper about my worldviews. And the conclusion was I wanted to be the person that I needed the most when I was a kid. And the thing I really needed was someone to do my readings for me. Mhm. Like, okay, I'm going to fully send it on this.
So, I convinced two of my professors to sponsor me to stay in school as a visiting scholar. Basically, I guest taught classes. I got to be a meal plan, live on campus, use the gym, but not pay tuition or do homework. Gym's important. Amazing.
I was going to do that uh work teaching computer science over the summer unlimited indefinitely until something took off. And so 6 months in it took off. Um and so now there's like 50 million people who use it.
Um but the goal is to make sure that reading is never a barrier for learning for anyone no matter what your background is. So if you download the Speechify iPhone app or Chrome extension or Mac app or Android app, it lets you take a picture of a physical book.
It reads um it gives you play buttons throughout the internet. It's like the voice of the internet. You click play, it reads. Uh, and it coaches you to listen fast. So, I listen to two audio books a week. I've done that since I was 14. Um, so I've read more than 1,800 books by listen. You listen on like 3x or something?
Yeah, I listen on 3x. So intense. And so you can listen to the whole show in just one hour because it's a three hour. Yeah, it's one hour. This is why streaming doesn't work for me. It's got to be something that I can react. We're in RSS. We're in We're in We're on YouTube. So you can pull it in there. Listen.
That's exactly what I do. I I show off the live streams, you know, 80%. Yeah. Yeah. Yeah. Catch up at the end. Um, but look, I I learned English when I was 13. So, in the beginning, I would listen at 0.75x speed, build up to 1.5, 2x, 2.5, 3x. Um, and I'm obsessed with biographies, theology, philosophy, fantasy.
Um, and so in the beginning, nobody wanted to back a dyslexia education startup. AI was not yet hot.
Um the toughest point was um around 2017. We didn't have the level of engagement and retention that I wanted, and there were a bunch of other sexy projects I could have done. But from first principles, my conclusion was: I think that the trend that I most back is narrow applications of deep learning, today we call it generative AI, uh from a supply side, and from a demand side it's audio as a user interface. And I was like, the intersection of these two, if I wanted to build something in that intersection, it's Speechify, so I might as well stick with it.
Um, I ended up putting a gigantic help button on the app. It made like 20% of the screen real estate. Bright red help last message us. And if you clicked it, it just put you into an iMessage conversation with me. So like I didn't need you to enable notifications. There was no intercom. There was no uh email.
And if I messaged you back and you didn't respond, I would FaceTime audio call you and a spreadsheet of all of our users. And if for whatever reason you didn't use the product, I would call you every day until you used it to didn't use it. And then the product became really good. That's amazing.
Um, we when we were 25, 18 of the folks who worked at Speechify were previously either CEO, CTO, or VP of Engineering at their last company. A lot of them were ex-YC founders or folks who exited their company after their series A. Nice tidbit on fitness. 12 of the original teammates got six packs within 10 weeks of joining.
Uh, and we had and we have had at least 17 people who gained 10 pounds of muscle um in the first 10 weeks. It's amazing. Everyone back when we were here. Um, and we got a big Airbnb in a different city every four months or so.
And so at the time we were in LA and I convinced one person to move from India, one person to move from Bulgaria, one person to move from Mexico, one person to move to San Francisco. And I went and bought a bench press uh and I rented a truck, installed it in the apartment in my room.
Uh, and so we had me and Valentine were sleeping in and that's like the only kind of weights you need, right? It's just a bench everything. Um, and actually it gets even more deep because I went up to Marin to make sure that I was buying food for my parents because I didn't want them to go out during COVID.
And so two of the guys were in the apartment working out just body weight and they did a DEXA scan before and after and they found that they had like not gained any muscle even though they were working out like animals. And I was like, "No, you guys need more weight."
And so I brought the bench press um and we had two guys sleeping in a bedroom in the living room that we partitioned with a window. Two guys sleeping in the bed in the uh in the room and then one guy on an air mattress. And that was like the best time ever. Like incredible time.
Um and then we found just very strong product market fit. Ended up scaling really quickly. We closed partnerships with all the top publishers to resell their audiobooks and ebooks. My little brother Tyler started coding when he was seven building Dragon Ball Z websites.
When he was eight, he taught himself assembly to hack video games. He went to Exeter for high school, skipped four and a half years of math, skipped five years of computer science, did Stanford as an undergrad, dropped out to run a cyber security company, went back to Stanford to do his master's in AI.
And I was like, hey, if you can help me build a model that uh will compile in 3x real time, has better quality than any API that's currently out there um and meets the following requirements, I will like do anything. You know, I gave him a very nice compensation offer and he did it. It took him about 10 months.
He joined Speechify 5 years in as head of AI. So now we have a 40 person AI engineering team and we make the highest quality digital voices in the world. We are the largest supplier of speech AI in the world for consumers. Um and then so that's like level one, level two, level three.
Um the vibe now is we make everything multimodal. So if you imagine that 5% of the population would read books for fun on their own. Um and let's imagine 15% of the population would listen if you provided it as audio.
Now we're using text to video models to turn everything into like a full um audio and visual experience. That's like super high level for Speechify but happy to riff on anything. How big is the company now? We're 176 people now. Wow, that's huge. 32 countries.
So five software engineers, 40 people on the engineering team and the rest is growth. Yeah.
Talk about talk about the things that venture has tried to make you that the venture industrial complex has maybe tried to make you do that you've rejected because in in a in an alternate reality you would have raised $500 million by now and you know and and you know decided not to do that and that's influenced a lot of decisions I'm sure.
So part of it is the fact that between the ages of 14 and 18 I read 461 books. Um, a lot of them were about economics, philosophy, biographies. Um, and I also listen to a lot of fantasy. And when you listen to fantasy, you think, would I make the decision this character would have made in this point in time?
So, my favorite book is The Way of Kings by Brandon Sanderson. I'm obsessed with this character named Kaladin. I base my entire leadership style after Kaladin.
Um, and so when I was writing that 30-page paper to figure out what I wanted to do in my life, um, I became one of those lucky kids who when they were 21 figured out what they were put on earth to do. And for me it was to solve dyslexia. Um, and so I love what I do.
Like I wake up every morning early and I go to sleep very late. I sleep very little because I just I can't get enough of working on this. Um, and so I don't want to sell it. I want to own as much of it as possible and I consider equity holy.
Um mind you I was a solo founder for five years working on Speechify and again we had an amazing leadership team that uh blood sweat and tears we figured out how to hire. Last year 50,000 people applied to work in the technical positions at Speechify. 10,000 people took the asynchronous uh engineering challenges.
Um but we had like incredible offers for our series A from like literally the top firms. We said no. They doubled the offers. We still said no because I didn't want to sell 20% of the company. Um, and even with seed, we just picked folks who we thought were really good.
So, founders of Instagram, of Twitter, of Robinhood, of you know, all these companies that I could learn from who did consumer subscription really well. Um, and that was was key. Um, so for example, Dylan Field from Figma taught me a ton very early on.
Um, and my nuance is if you take money from a fund, by and large, they have a fiduciary duty to their LPs. If you take money from an individual, they have just a duty to themselves. And so, you get a rolodex of someone who's really great, uh, but you're not beholden.
That being said, there are some businesses where it makes a lot of sense to raise money.
In our case, we have a very simple business and we understand the user extremely well and it's a very niche audience initially for people who have dyslexia, ADHD, low vision, autism, anxiety, concussion, second language learners, and then the entire productivity suite.
And at this point, we've worked on speech for eight and a half years. So, it's really hard to catch up to the product innovations as well as the AI side. Um, and so we are always talking to investors and don't get me wrong, we have raised money from, you know, funds as well.
Um, but we've just done it where we typically look for three things. Number one, it's folks who either have an evergreen model, so they can ride with us because I'm going to be CEO of Speechify in 80 years. Um, they have a history of philanthropy and education or an experience with dyslexia and their family.
And we think really highly of them. They can see around corners that we can't. Either they've done an amazing job taking multiple companies public um or they've backed founders in ways. So, you would go public. We would go public but into the future not in the short term. Yeah.
What what are you thinking about competition from the hyperscalers, the big tech companies? Google IO is today. Microsoft had their uh build keynote yesterday.
Uh it seems like an obvious place that uh competition could come from and yet uh on the product side a lot of the big tech companies just don't seem to be it iterating on the application layer side as fast as most people expected. So what's that been like?
Has anything slowed you down over the past two years with competition from the big guys? Oh, they've helped us a ton. So I'll give you two heuristics that are relevant to any founders then I'll give you the specifics on Speechify.
The first one is in my personal opinion value occurs at the application layer more than it occurs at the model layer and models become commoditized. Do you ever did you ever doubt yourself?
Did you ever think, oh, maybe value is going to accrue to the the model layer because there were there was like a there was like an 18-month period where everyone was like models will do everything.
Well, I'm I'm giving you what my opinion is today and and the only reason I have a strong opinion is I've gone deeper into it than almost anyone has, right? We bought millions of dollars of H100 GPUs. We set them up in our own data center. We have a 40 person team who does research and development on models.
So the second part I was going to say is but if you really want to be indestructible you got to own both the model and the data and the application layer.
Um and that's what made ChatGPT and OpenAI so successful is they had uh Brockman work on the user-facing side and they had Ilya work on the research side and if they just had the research they'd just be another lab.
It's application layer that made it incredible and uh so that's the first thing because you want to be able to control the quality and the cost as well as the user experience. And then why did WhatsApp get acquired by Meta for $19 billion is because they had all the users. And so you own the end relationship.
You want to have the phone number, the credit card, the email of that user, which we talk in a minute about kind of the changes that came with Apple. Um, so that's the first thing. The second thing is, and this is very unique, sorry, not unique, ubiquitous.
If you are Apple, Meta, Google, Netflix, there is no point in doing any project unless you think that project is going to make you one to a hundred billion dollars per year in the long term. Mhm. We had a point where we introduced translation into Speechify. This is in 2020, 2021.
And Mike Krieger, who was the founder of Instagram, um CTO of Instagram, uh was like, "You should consider removing this feature." And I'm like, "Are you kidding me? Do you know how long I spent building this feature?" But I have respect for Mike. So I took it out.
Install to trial increased, activation increased, users use the product more. And I was like, I got to remove this. Why? Why? Because it was distracting. Speech. One thing to give you the best experience listening to something. The second you add an extra button, users go down that path.
So, you know, from interesting has problem. But they had a problem. You guys have such insane scale that you could immediately see something that again just removing some complexity. Back back then we didn't have mad scale, but you could see it in 3 days of usage. like there's enough. No.
Um, so Clippy is horrible because Word is horrible because the design is terrible. But you know who has amazing design? Google. Yeah. One uh uh uh one text field. Who else has amazing design? ChatGPT. One text field, one button. And that's why they're successful.
If they had multiple buttons, they would be far less successful. Now, here's the key. Text to speech is buried five levels deep in the menu. When I was 19, I made a video on YouTube under an anonymous account because at the time I was embarrassed for people to learn I was dyslexic.
And it was called how to text to speech macree. And if you search that phrase on Google even today, my video from when I was 19 is number one. Why? Because it's so difficult to activate text to speech that you need 19-year-old Cliff to show you how to do it. And you can't pause. You can't play.
And by the way, if you want to listen like me, what I ended up doing in college is I would use the terminal to change the speed because the GUI didn't go fast enough and it back down. I had to restart my computer. So I was restarting my MacBook multiple times a day when I was in college.
And then I was like, screw this. I'm going to build a Mac app with keyboard shortcuts. I'm going to build a thing that OCRs the screen. And so my dream is for Google in mobile Chrome to add a gigantic play button that appears in every website that lets you listen to it. Why?
Because it will make everyone addicted to listening.
And then when they want to listen to a long PDF and they want to do it offline or they want to do it in my voice or Snoop Dogg voice or their own voice or they want to translate it to another language or they want to scan a physical document, Google doesn't cover that.
Apple doesn't cover that and people use multiple platforms. So you need to use Speechify and so Speechify is the premium experience for text to speech. What I need is for people to become educated about the idea that you can listen to stuff.
And so a lot of our work over the last eight years has been to educate the market. And so there's really two goals. Number one, build the most exceptional product like extreme. We have four core principles. Extreme product quality, leading with love, frugality, and speed. So, we talk to users a lot.
We take really good care of each other. We don't waste money on anything. We build our own SaaS tools and we move really fast. We ship a lot to production. Um, and the second goal is to educate people that, hey, you can practice being a fast listener. And if you practice, anyone can do it.
My dad is 65 years old and he English is not his first language. And the joke we have in my family is he wishes he had a mute button for me and I wish I had a speed up button for him. That's very sweet. He now listens at 2.5x speed to everything.
And so my life is better because my dad listens and talks faster now. Uh but anyone can learn how to do this. It's just a matter of practice. When you come into first grade, no one expects you to be a good reader first, second, third, fourth, fifth, 12th grade. We expect 12 years for you to become a good reader.
But you can become a good listener after listening to 15 audiobooks. You're not going to be good on the first one. You're not going to be good on the 10th one. And being a good listener means three things. Number one, you can do something else and listen at the same time. Drive, cook, walk, whatever.
Number two, you can listen to more than 2x speed. Number three, five weeks later, I ask you about the book, you have great retention of everything that you read in the book. And it will not happen when you start. You have to practice. But here's the thing, reading is a hack we invented, right?
24 characters on some dead trees. Listening, we evolved to be good at listening, right? Telling stories over the fire. So if you were a bad listener, you were removed from the genetic pool. If you're a bad reader, you were not removed from the genetic pool because otherwise I wouldn't be around.
Um, and so, uh, when you read, 30% of your brain is dedicated towards decoding. 70% is comprehending. When you listen, like 3% is dedicated toward decoding, the rest is comprehending. And so, you can understand a lot better. And once you practice listening fast, you have that skill for life.
And especially if you have ADHD and you get distracted easily, the speed of listening matches the speed at which their mind is working. And that helps people focus, especially in school. but also in work. Um, and so that's the goal of the company is to just make that accessible to everybody.
And right now there's 450,000 audio books, but there's 100 million books. We make all those accessible plus your emails, plus PDFs and everything else. It's amazing. Incredible. You are correct. Uh, wish we had more time. Yeah, we got to get a workout in soon. We will. Yeah. Come by before you before you take off.
We got to make sure you break that PR. Yeah. Yeah, I'm working on I I think I'm pretty close. Uh, we'll talk to you soon, Cliff. Thanks. Thanks for coming on. You're the man. You got it. Bye. Bye. Uh really bunch of VCs are going to reach out to him now. He really has a fantastic business.
Uh let me tell you about Bezel. Go to getbezel.com. Your Bezel concierge is available now to source you any watch on the planet. Seriously, any watch. They got Rolexes. They got the GMT Master 2, the Batman, the precursor to the Batman payroll suite or the Batman. Uh you're really on to something there.
I think it's good. I think that's a feature. Disney payroll. I mean, Logan Paul Logan Paul has a energy drink. Why not? Jake Paul has deodorant. Jake Paul has deodorant. Why doesn't Batman have a ERP system? Why not? Or Shaq Payroll. Shaq payroll. Shaq payroll would be pretty good.
Um, before we bring in the Google folks, we should talk about Joe Weisenthal. He says, "Your neighbor is online. He thinks the BRICS are creating a new currency that will soon replace the dollar. He thinks Blackstone owns 60% of single family homes. He's long Cardano."
And this is a response to BuccoCapital Bloke who says, "Your neighbor isn't online. He doesn't think about tariffs. His Honda CR-V only has 80k miles on it. Good for another 100k. He doesn't have an opinion on the rating of our sovereign debt. Only the only flation he cares about is shrinkflation.
Yet his 401k goes up just the same. I think the uh the Blackstone owning 60% of family homes is the funniest meme. this whole idea of like, you know, oh, like Blackstone's the most powerful financial institution in the world just because they index everything. Turning single family homes into pods. They are.
Uh, all right. Let's bring in you know