Exclusive Talk with Devvret Rishi, CEO and Cofounder at Predibase

Devvret Rishi is the CEO and Cofounder of Predibase. Previously, he was an ML product leader at Google, working across products such as Firebase, Google Research, the Google Assistant, and Vertex AI. While there, Dev was also the first product lead for Kaggle, a data science and machine learning community with over 8 million users worldwide. Dev's academic background is in computer science and statistics, and he holds a master's degree in computer science from Harvard University with a focus on ML.

Asif: What inspired you to found Predibase, and what gap in the market did you aim to address?

Devvret: We started Predibase in 2021 with the mission to democratize deep learning. At that time, we saw that leading tech companies like Google, Apple, and Uber (where my co-founders and I previously worked) were leveraging neural network models, especially large pre-trained ones, to build better systems for tasks like recommendation engines and working with unstructured data such as text and images. However, most companies were still relying on outdated methods like linear regression or tree-based models. Our goal was to democratize access to these advanced neural networks.

We built Predibase on top of an open-source project my co-founder Piero had started while at Uber. Initially, we believed the way to democratize deep learning would be through platforms like ours, but we were surprised by how quickly the field evolved. What really changed the game was the emergence of transformer models with massive parameter counts. When scaled up by 100x or 1000x, these models gained emergent generative properties. Suddenly, engineers could interact with them simply by prompting, without any initial training.

Our platform initially focused on fine-tuning models like BERT in 2021-2022, which were considered large at the time. But as generative AI evolved, we saw that engineers needed more than just pre-trained models; they needed a way to customize them efficiently. This reinforced our original vision. While we initially focused on democratizing deep learning through fine-tuning, we realized that the need for customization platforms like Predibase had only grown stronger.

Asif: Your results seem almost magical; how do you do it?

Devvret: The core of our success comes from recognizing that machine learning has fundamentally changed. Five years ago, the way you trained models was by throwing a lot of data at them, training from scratch, and waiting hours or days for the process to converge. While training and fine-tuning aren't going away, there has been a fundamental shift in how models are trained. The biggest trend driving this shift is the technical innovation behind Low-Rank Adaptation (LoRA). LoRA introduced the idea that you can keep the original weights frozen and train only a small set of added parameters, typically less than 1% of the model's size, while still achieving performance comparable to fine-tuning all 7 billion parameters of a 7B model. This approach lets the model perform at a high level while being much more efficient to train.
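To make the "less than 1%" figure concrete, here is a back-of-the-envelope count in Python. It assumes the standard LoRA formulation, in which a frozen weight matrix of shape d_out x d_in receives a trainable update B·A, with B of shape d_out x r and A of shape r x d_in, so each adapted matrix adds r·(d_in + d_out) parameters. The model shape used here (hidden size 4096, 32 layers, adapters on the four attention projections, rank 8) is an illustrative assumption loosely modeled on common 7B architectures, not a Predibase default.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight matrix.

    The update is delta_W = B @ A, with B of shape (d_out, rank) and
    A of shape (rank, d_in), so only rank * (d_in + d_out) values are trained.
    """
    return rank * (d_in + d_out)


# Assumed, illustrative 7B-style shape: hidden size 4096, 32 layers,
# LoRA applied to the four attention projections (q, k, v, o) in every layer.
HIDDEN = 4096
LAYERS = 32
TARGETS_PER_LAYER = 4
RANK = 8
BASE_PARAMS = 7_000_000_000  # "all 7 billion parameters" from the interview

trainable = LAYERS * TARGETS_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)
fraction = trainable / BASE_PARAMS

print(f"trainable LoRA params: {trainable:,}")
print(f"fraction of base model: {fraction:.4%}")
```

With these assumed settings the script reports 8,388,608 trainable parameters, roughly 0.12% of 7 billion. Adapting more layers or raising the rank increases the count proportionally.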

Many customers assume that training or fine-tuning models will take days and cost tens of thousands of dollars. In contrast, with Predibase, we can fine-tune most models in 30 minutes to an hour for as little as $5-$50. This efficiency empowers teams to experiment more freely and reduces the barriers to building custom models.

So I think the magic in our results is really threefold:

The first key insight we had was recognizing that the way models are trained would change significantly. We fully committed to parameter-efficient fine-tuning, enabling users to achieve high-quality results much faster and with a much smaller computational footprint.

The second step was integrating parameter-efficient training with parameter-efficient serving. We used LoRA-based training and LoRA-optimized serving through our open-source framework, LoRAX. LoRAX allows a single deployment to support multiple fine-tuned models, which means you can achieve excellent results by having many specialized fine-tunes (perhaps one per customer) without significantly increasing serving costs.

The final ingredient behind our success is a lot of hard work and benchmarking. We've fine-tuned on hundreds of billions of tokens on our platform, and we've trained tens of thousands of models ourselves. This hands-on experience has given us deep insights into which parameter combinations work best for different use cases. When a customer uploads a dataset and selects a model, we have prior knowledge of how to train that model most effectively: what LoRA rank to use, how large the model should be, and how long to train it. It all comes down to being empirical, and our extensive research, including the Predibase Fine-Tuning Leaderboard, has been baked into the platform to make this process seamless for users.
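The multi-adapter serving idea in the second point can be illustrated with a toy, pure-Python sketch: one frozen base weight matrix is loaded once, and each request names a small LoRA adapter whose low-rank update B(Ax), scaled by alpha/r, is added to the shared base output. This is a conceptual model of the standard LoRA formulation only. The class, method, and adapter names are hypothetical and do not reflect LoRAX's actual code or API, and a production server would do this on GPUs with batched kernels rather than Python loops.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class MultiAdapterLayer:
    """Toy linear layer: one frozen base weight shared by many LoRA adapters.

    Hypothetical illustration of multi-adapter serving, not LoRAX's implementation.
    """

    def __init__(self, base: Matrix, alpha: float = 1.0) -> None:
        self.base = base  # frozen base weight W, loaded once and shared
        self.alpha = alpha  # LoRA scaling numerator; update is scaled by alpha / rank
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}  # adapter id -> (A, B)

    def register(self, adapter_id: str, a: Matrix, b: Matrix) -> None:
        """Add an adapter: A has shape (rank, d_in), B has shape (d_out, rank)."""
        self.adapters[adapter_id] = (a, b)

    def forward(self, x: Vector, adapter_id: Optional[str] = None) -> Vector:
        y = matvec(self.base, x)  # base computation shared by every request
        if adapter_id is None:
            return y  # plain base-model output
        a, b = self.adapters[adapter_id]
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax), never forms B @ A
        scale = self.alpha / len(a)  # len(a) is the adapter rank r
        return [yi + scale * di for yi, di in zip(y, delta)]


# Two hypothetical rank-1 adapters ("customer-a", "customer-b") on a 2x2 identity base.
layer = MultiAdapterLayer(base=[[1.0, 0.0], [0.0, 1.0]])
layer.register("customer-a", a=[[1.0, 1.0]], b=[[1.0], [0.0]])
layer.register("customer-b", a=[[0.0, 2.0]], b=[[0.0], [1.0]])

# A mixed batch: each request picks its own adapter (or none) against the same base.
requests = [([1.0, 2.0], "customer-a"), ([1.0, 2.0], "customer-b"), ([1.0, 2.0], None)]
outputs = [layer.forward(x, adapter_id) for x, adapter_id in requests]
print(outputs)  # [[4.0, 2.0], [1.0, 6.0], [1.0, 2.0]]
```

Because the base matrix is shared, each extra adapter costs only its two small matrices in memory, which is why many per-customer fine-tunes can sit behind a single deployment.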

Asif: Where/when does your solution deliver the best results?

Devvret: Our platform delivers the best results for specialized tasks. As one of our customers put it, "Generalized intelligence might be great, but we don't need our point-of-sale assistant to recite French poetry."

We've seen this in our Fine-Tuning Leaderboard as well, which shows that fine-tuned models excel at handling specific, focused tasks. LoRA-based fine-tuning and serving are especially effective in these scenarios, enabling organizations to achieve high-quality results tailored to their needs. This approach ensures they get the precision they require without the unnecessary overhead of larger, general-purpose models.

Asif: How does your solution help address the huge cost of running LLMs?

Devvret: We've built over 50 optimizations into our fine-tuning stack, incorporating the latest findings from the research community. These optimizations allow you to fine-tune models with minimal resources while still achieving high-quality results. As a result, fine-tuning can typically be completed in minutes or hours, not days, for just $5 to $50, a fraction of what traditional methods would cost.

On the inference side, where a typical organization allocates most of its spend, we tackle costs with GPU autoscaling, so you only pay for the compute you use. Turbo LoRA ensures models are optimized for fast inference with low latency, and our LoRAX framework allows multiple fine-tuned models to run from a single GPU. This means you can efficiently serve fine-tuned models from fewer GPUs, helping keep your infrastructure costs low while supporting high-volume real-time workloads.

Asif: Large enterprises are very concerned about data security and IP. How do you address this?

Devvret: We get it: data security and IP protection are top priorities, especially for enterprises handling sensitive information. That's why we offer the ability to deploy Predibase in your Virtual Private Cloud or in our cloud. This ensures that data stays under your control, with all the security policies you need, including SOC 2 Type II compliance. Whether you're in finance, healthcare, or any other regulated industry, you can fine-tune and deploy models with the confidence that your data and IP are safe.

Asif: How easy/complicated is it to use Predibase?

Devvret: You can get started with Predibase in about 10 lines of code. Whether you're an engineer or a data scientist, our platform abstracts away the complexities of fine-tuning and deploying models. You can get started through our web interface or SDK, upload your dataset, select a model, and kick off training in no time. We've built Predibase to make fine-tuning as simple as possible, so teams can focus on outcomes instead of wrestling with infrastructure.

Asif: Inference speed is key in many use cases. How does Predibase help with that aspect?

Devvret: Predibase boosts inference speed with Turbo LoRA, which increases throughput by up to 4x, and FP8 quantization, which roughly halves the memory footprint of model weights compared with 16-bit precision, allowing faster processing. On top of that, the LoRAX framework lets multiple fine-tuned models run on a single GPU, reducing costs and improving efficiency. With GPU autoscaling, the platform adjusts resources in real time based on demand, ensuring fast responses during traffic spikes without overpaying for idle infrastructure. This combination delivers fast, cost-effective model serving, whether for production workloads or high-volume AI applications.
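The "cuts the memory footprint in half" point follows from simple arithmetic on weight storage: a 16-bit format (FP16 or BF16) uses 2 bytes per parameter and FP8 uses 1. The sketch below assumes a 7-billion-parameter model and counts weights only; it ignores the KV cache, activations, and any quantization scale factors, so the real savings for a full serving process will differ.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


params = 7_000_000_000  # assumed model size for illustration
fp16 = weight_memory_gb(params, "fp16")
fp8 = weight_memory_gb(params, "fp8")
print(f"fp16: {fp16:.1f} GB, fp8: {fp8:.1f} GB, ratio: {fp8 / fp16:.2f}")
```

For the assumed 7B model this gives about 14 GB of weights in 16-bit precision versus about 7 GB in FP8, which can leave more GPU memory for batching or for additional adapters.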

Asif: How fast is the payback on the fine-tuning initial cost?

Devvret: The payback on fine-tuning with Predibase is incredibly fast because LoRA fine-tuning is remarkably cheap compared to full fine-tuning. Many people still assume that fine-tuning is expensive, imagining the high costs of full model retraining, but with LoRA, fine-tuning typically costs only $5 to $50 for a job, making it a low-risk, high-return investment. With Predibase, enterprises can fine-tune efficiently without running dozens of expensive, time-consuming experiments. This enables rapid deployment of specialized, high-performing models.

Asif: How are you different from other fine-tuning providers?

Devvret: Predibase stands out with a comprehensive fine-tuning platform that just works: no out-of-memory errors while training or unexpected drops in throughput while serving. We've built 50+ optimizations directly into our stack to ensure smooth, high-performance fine-tuning. Combined with LoRAX, which lets you efficiently serve hundreds of fine-tuned adapters on a single GPU, our Turbo LoRA, FP8 quantization, and GPU autoscaling make our model serving infrastructure industry-leading, delivering faster responses at lower costs.

We've seen too many teams get bogged down managing infrastructure, building data pipelines, and debugging fragmented open-source tools, leaving less time to actually build and productionize AI. That's why we provide an end-to-end platform backed by a dedicated team of ML engineers to help you every step of the way. Whether you prefer the flexibility of SaaS in our cloud or full control with VPC deployments in yours, Predibase frees you from the operational burden, so you can focus on delivering impactful AI solutions.

Asif: What are some of the companies that you're working with and what problem are they solving with SLMs?

Devvret: Checkr leverages Predibase to improve the accuracy and efficiency of background checks. They process millions of checks monthly, but 2% of the data in one part of the background check workflow (often messy and unstructured) needed human review. With Predibase, Checkr fine-tuned a small language model, achieving 90%+ accuracy, outperforming GPT-4, and reducing inference costs by 5x. This enabled them to replace manual review with real-time automated decisions, meeting tight latency SLAs and improving customer experience.

Convirza, on the other hand, processes over a million phone calls per month to extract actionable insights that help coach call agents. Previously, managing infrastructure for their AI models was complex and often too much of a burden for their small AI team. With Predibase's LoRAX multi-adapter serving, they're able to consolidate 60 adapters into a single deployment, reducing overhead and allowing them to iterate on new models much faster. This efficiency lets them focus on building AI solutions, not infrastructure, unlocking new capabilities for their customers, like creating bespoke call performance indicators on the fly.

Both companies highlight how small language models fine-tuned on Predibase outperform larger models while cutting costs, improving response times, and streamlining operations.

Asif: How do you see the industry evolving?

Devvret: There are two big wars happening in generative AI infrastructure. The first is the competition between small, fine-tuned language models and large, general-purpose models. The second is the battle between open-source and commercial solutions.

The question that comes up a lot is: will the future be about small, task-specific, fine-tuned models, or large, general-purpose ones? I'm convinced it's going to be more and more about small, fine-tuned models, and we've already seen this shift starting. In 2023, the market's focus was all about making models as big as possible, which worked well for quick prototyping. But as companies move into production, the focus shifts to cost, quality, and latency.

A lot of studies have pointed out that the economics of Gen AI haven't always added up: too much spend, too little benefit. You can't justify spending billions on infrastructure to solve relatively simple automation tasks. That's where smaller, task-specific models come in. As teams graduate from prototyping into production, these models will grow in importance.

And if you look at organizations using Gen AI seriously at scale, almost all of them follow this path as they mature. It's the same reason OpenAI felt the need to roll out something like GPT-4o mini. I think this trend will continue, and it's a good thing for the industry because it forces costs to align with ROI.

Talking about the second trend, my view is that the entire pie for both open-source and commercial models will grow very quickly, but the open-source share will grow much faster than the commercial side. According to an a16z generative AI survey from 2023, people were looking to spend a lot on LLMs, especially in the enterprise segment. But in 2023 (the year of prototyping, as many people say), an estimated 80 to 90% of usage was closed source. However, two-thirds of AI leaders have expressed plans to increase their open-source usage, targeting a 50/50 split.

Historically, most machine learning has been built on open-source architectures, so this shift aligns with the broader trajectory of the industry.

Asif: What problems are left unsolved and where do you see the greatest opportunity?

Devvret: I think the biggest unsolved problem, and one I find really exciting, is how to create a flywheel where models get better as they're used. What I mean is introducing a real active learning process for LLMs. Right now, what I hear from organizations is that when they move to production, they can often get a model to 70% accuracy with prompt engineering alone. But as they try to push further, they only see marginal improvements, maybe going from 70% to 71%.

What they really want is a way to reach 80% or 90% accuracy, and they hope that by deploying the model, they can collect enough data to keep improving it. But that workflow isn't solved yet. The way many companies handle it now is by releasing a model at 70%, collecting production data, manually reviewing it, and then fine-tuning the model based on annotated datasets. But this approach just doesn't scale; there's no way to manually review enough data, especially as LLMs handle millions of queries in production.

The real opportunity, in my opinion, lies in building a system where models can improve automatically over time. For example, if a model launches with 70% accuracy in a new domain, you need a way to leverage production data to fine-tune it iteratively. I think the key will be applying some of the breakthroughs we're already seeing (like using LLMs as judges or generating synthetic data) to create that flywheel. With such a system, a model could launch at 50-70% accuracy, collect data from real use, and improve on its own.

This idea was partially realized in recommender systems, but it hasnâ€™t yet been achieved with generative AI at scale. Thatâ€™s where I think the industry is headed, and itâ€™s where I see the most exciting potential for growth.

This interview was originally published in the Marktechpost Small Language Model (SLM) Magazine 2024.

The post Exclusive Talk with Devvret Rishi, CEO and Cofounder at Predibase appeared first on MarkTechPost.
