Shirish Sathaye, General Partner, Cervin:
Hi Anshu, thanks for taking the time to talk with me today. We are excited to be invested in ThirdAI. We were among the first investors in the company, which is an exciting opportunity to change the economics of artificial intelligence and machine learning. First things first, tell us a bit about yourself and ThirdAI.
Anshumali Shrivastava, Founder and CEO, ThirdAI:
I’m Anshumali Shrivastava. I’m the founder and CEO of ThirdAI. I’m also a tenured faculty member in Computer Science at Rice, where I’ve been since 2015. My specialization is training really large neural networks, and I have been working on that since 2010.
SS: So you’re a professor at Rice. You are established there, you have graduate students. What inspired you to start a company?
AS: For a bit of background, I was a math major as an undergrad. Then I worked for a few years at FICO, building credit score models, and that is how I got interested in machine learning. With my math background and this renewed interest in data-driven sciences, I started my Ph.D. at Cornell, where I looked at fundamental information retrieval problems. Around this time, we solved an open problem and got the NeurIPS best paper award, and we showed that certain theoretical information retrieval problems could be solved efficiently. This was the direction I was headed in when I joined Rice. Then I focused on how to use similar ideas to make neural networks faster. I’ll not bore you with all of the technical details, but we figured out a non-trivial way to train neural networks that keeps the same accuracy but requires 1,000 or 10,000 times fewer operations. In some of the early papers we put out in the academic community, we talked about a neural network that, on some CPUs, beat GPUs by 5 or 10X in speed. Now that excited everybody. At this point, my co-founder Tharun and I were convinced that we’d created a very valuable thing, because AI demand is shortly going to surpass any computing capabilities. What we have is a technology that can dramatically increase the performance and efficiency of AI. That’s when we decided we would do the hard work and take this journey.
SS: So obviously, there is a lot of excitement around AI and large language models. Now along with that excitement, there are some fears about it turning sentient and turning on humanity, but let’s put that aside for now. Besides that, there are also many challenges that have to do with the economics of training models, including environmental problems, cost issues, and availability issues. Can you speak to that? I’d love to hear from an expert.
AS: I recently gave a talk about what it really takes to build a large language model and the cost of it. Unfortunately, there is a lack of information about this topic.
So imagine you are an enterprise and want to build a large language model; you have two options. One is to go with the usual suspects, take your data, send it to a special private cloud, and train the model. Now we are talking about terabytes of your information. So if you are an enterprise that deals with payroll or something like that, let’s say you need all the payroll information to be sitting there to train. Even if you trust the cloud and are fine with having one more copy in an AI-ready cloud, you should know that having two or three copies in the cloud doubles or triples the privacy risk. Mind you, to train this large language model, you need dedicated infrastructure, so it cannot be where your data resides. So you are transferring your data.
The alternative is to use an open source model. Now that’s a viable route, but it requires that you first build the infrastructure yourself. This infrastructure requires a large hardware cluster. If you are relying on cloud infrastructure, you still need to send your data to the cloud. You need specialized engineers to build this pipeline. To get the model to work, you are relying on the open source community to progress in your desired direction, which can be a challenge.
The whole friction arises because of hardware tension. Data and AI cannot sit together. AI sits on its own hardware, and data sits on its own hardware. And you have to move the data from its hardware to the AI hardware to train it. Also, AI building is a constant process, so you have to constantly deal with the hardware barrier and friction. Sooner or later, enterprises will realize that, and that is where ThirdAI comes in. We are bringing AI to the data, which is much easier.
SS: Now let’s talk about your transition from professor to entrepreneur. How does the world of academia compare to the startup world?
AS: There are quite a few differences, but one big one is that academia is more focused on the rigor of an idea, while startups and companies are more focused on the rigor of execution.
Another difference comes from how you interact with people. When you are a professor, every Ph.D. candidate is like a small company, with their own objectives and goals they are working towards. In a startup, by contrast, you have a team all working towards the same goal. And I think adjusting to this is a hard part of the transition from academia to the startup world. It’s also the fun part of the transition.