By providing AI functions for SQL analysts, Databricks continues to integrate data, ML, and AI across its platform. AI Functions in Databricks SQL are pre-built, easy-to-use functions that incorporate machine learning models into SQL queries. These functions enable data analysts and engineers to leverage AI capabilities without the need for extensive machine learning expertise. By simply calling these functions within standard SQL statements, users can perform complex AI tasks such as natural language processing, image recognition, and predictive analytics. AI Functions provide general-purpose and task-specific functions.
Task-Specific AI Functions
The fastest and easiest approach to get started with AI Functions is to begin with a task-specific function, since you will be calling a GenAI model managed and maintained by Databricks. I consider the following functions to be the simplest entry point because they’re similar to work you’ve probably done with a chat model.
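As a minimal sketch of what such an entry-point call can look like, the two queries below run ai_fix_grammar and ai_translate against literal strings, so no tables are needed. The sample sentences are made up for illustration, and the statements will only run if AI Functions are enabled in your workspace.

```sql
-- Correct the grammar of a literal string (the sample text is illustrative).
SELECT ai_fix_grammar('This sentence have some mistake') AS fixed_text;

-- Translate a literal string; the second argument is the target language code.
SELECT ai_translate('Hello, how are you?', 'es') AS translated_text;
```

If either statement fails with an unsupported-function error, that is a quick signal that the preview is not available in your environment.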
Once you have experimented with these basic commands, you can move on to applying core ML concepts directly in SQL. Try these commands in the context of a simple but real ML task to get a sense of how much these pre-built functions can do.
I strongly encourage you to try these commands in your environment. First of all, they are in Public Preview at this time, so even executing one command will validate whether or not you have this functionality enabled. Also, by the time you have moved from ai_translate to ai_forecast, you will have seen the potential. Fixing grammar may not be impressive on its own, but the classification and extraction functions really begin to show how AI can practically bridge the gaps that exist in any large enterprise Databricks installation. There are gaps in knowledge and experience between, and even within, data science and data engineering teams that AI can help bridge.
Task-Specific Examples
Assume you have a corpus of customer reviews and you want to perform standard sentiment analysis. This is considered a very foundational ML problem, but AI on SQL makes this capability accessible to any user.
SELECT review_text, ai_analyze_sentiment(review_text) AS sentiment FROM customer_reviews;
Clustering and classification algorithms are also made very intuitive.
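-- ai_classify takes text and a list of candidate labels; the product_description column and the labels here are illustrative.
SELECT product_id, ai_classify(product_description, ARRAY('clothing', 'electronics', 'furniture')) AS product_category FROM product_catalog;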
You can perform custom data extraction that would be extremely brittle and complex otherwise.
-- ai_extract takes the text and an array of labels to extract, and returns one field per label.
SELECT review_id, review_text, ai_extract(review_text, ARRAY('product name', 'rating', 'key feature', 'reported issue')) AS extracted_info FROM customer_reviews;
General AI Functions
Databricks only considers ai_query to be a general function, but I also include ai_gen and vector_search. The vector_search function lets you query a Mosaic AI Vector Search index using SQL. I found this one interesting because it seems more like a Mosaic AI extension than an AI function: it is very product-specific, and I can’t help but notice that it bucks the ai_ naming convention. The ai_query and ai_gen functions are similar, but they have critical differences.
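For the vector_search function mentioned above, a call might look like the following sketch. The index name and the query text are hypothetical, and the named-argument form shown here is an assumption about your runtime: older versions are documented with a query argument instead of query_text, so check the reference for your environment.

```sql
-- Hypothetical index name; replace it with the full path of your own index.
SELECT *
FROM vector_search(
  index => 'main.reviews.customer_reviews_index',
  query_text => 'battery drains quickly',  -- illustrative search text
  num_results => 5
);
```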
ai_gen is not considered by Databricks to be a general AI function because it’s specifically designed for, and constrained by, the SQL domain. It’s intended to enable data exploration and query assistance by taking natural language questions about data in the form of a prompt. Also, it’s limited to Databricks-hosted foundation models optimized for AI Functions.
ai_query is not limited in its model selection. You can use the same Databricks-maintained models or invoke fine-tuned foundation models deployed on Mosaic AI, foundation models hosted outside of Databricks, or even traditional ML or DL models built with libraries such as scikit-learn, XGBoost, or PyTorch. At its most basic, you provide an endpoint and a prompt, and optionally any additional configuration parameters.
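To make the difference concrete, here is a minimal sketch of both calls. The endpoint name my-chat-endpoint is hypothetical and stands in for whatever model serving endpoint you have access to. The customer_reviews table is the one assumed in the earlier examples.

```sql
-- ai_gen: you pass only a prompt; the model is chosen for you.
SELECT ai_gen('Summarize the main benefits of a data lakehouse in two sentences.') AS answer;

-- ai_query: the first argument names the serving endpoint to call.
-- 'my-chat-endpoint' is a hypothetical endpoint name.
SELECT
  review_id,
  ai_query(
    'my-chat-endpoint',
    CONCAT('Summarize this customer review in one sentence: ', review_text)
  ) AS summary
FROM customer_reviews
LIMIT 10;
```

The LIMIT clause keeps the number of model invocations small while you experiment, since each row triggers a call to the endpoint.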
Conclusion
AI Functions in Databricks SQL represent a significant leap forward in democratizing AI capabilities within the data lakehouse environment. By making complex ML tasks accessible through familiar SQL syntax, Databricks empowers data professionals to enhance their analytics workflows with cutting-edge AI technology. These functions promise to become indispensable tools in the modern data analyst’s toolkit.
Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential across your enterprise.