
    Build a Gemini-Powered DataFrame Agent for Natural Language Data Analysis with Pandas and LangChain

    June 10, 2025

    In this tutorial, we’ll learn how to harness the power of Google’s Gemini models alongside the flexibility of Pandas to perform both straightforward and sophisticated analyses of the classic Titanic dataset. By combining the ChatGoogleGenerativeAI client with LangChain’s experimental Pandas DataFrame agent, we’ll set up an interactive “agent” that can interpret natural-language queries: it inspects data, computes statistics, uncovers correlations, and generates visual insights, all without us writing manual code for each task. We’ll walk through basic exploration steps (like counting rows or computing survival rates), delve into advanced analyses such as survival rates by demographic segment and fare–age correlations, compare modifications across multiple DataFrames, and finally build custom scoring and pattern-mining routines to extract novel insights.

    !pip install langchain_experimental langchain_google_genai pandas
    
    
    import os
    import pandas as pd
    import numpy as np
    from langchain.agents.agent_types import AgentType
    from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
    from langchain_google_genai import ChatGoogleGenerativeAI
    
    
    os.environ["GOOGLE_API_KEY"] = "Use Your Own API Key"

    First, we install the required libraries (langchain_experimental, langchain_google_genai, and pandas) with pip to enable the DataFrame agent and Google Gemini integration, then import the core modules. Finally, we set the GOOGLE_API_KEY environment variable, and we’re ready to instantiate a Gemini-powered Pandas agent for conversational data analysis.

    def setup_gemini_agent(df, temperature=0, model="gemini-1.5-flash"):
        llm = ChatGoogleGenerativeAI(
            model=model,
            temperature=temperature,
            convert_system_message_to_human=True
        )
       
        agent = create_pandas_dataframe_agent(
            llm=llm,
            df=df,
            verbose=True,
            agent_type=AgentType.OPENAI_FUNCTIONS,
            allow_dangerous_code=True
        )
        return agent

    This helper function initializes a Gemini-powered LLM client with our chosen model and temperature, then wraps it in a LangChain Pandas DataFrame agent that can execute natural-language queries (including “dangerous” generated code, hence allow_dangerous_code=True) against our DataFrame. We simply pass in a DataFrame to get back an interactive agent ready for conversational analysis.

    def load_and_explore_data():
        print("Loading Titanic Dataset...")
        df = pd.read_csv(
            "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv"
        )
        print(f"Dataset shape: {df.shape}")
        print(f"Columns: {list(df.columns)}")
        return df

    This function fetches the Titanic CSV directly from the Pandas GitHub repository, prints its dimensions and column names as a quick sanity check, and returns the loaded DataFrame so we can immediately begin our exploratory analysis.
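
    Under the hood, that sanity check is ordinary Pandas. A minimal sketch of the same checks on a hypothetical inline mini-frame (so it runs without the network call; the column names mirror the Titanic CSV, the values are illustrative):

```python
import pandas as pd

# Hypothetical mini-frame standing in for the Titanic CSV.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# The same checks load_and_explore_data() prints for the real dataset.
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(f"Missing ages: {df['Age'].isna().sum()}")
```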

    def basic_analysis_demo(agent):
        print("\nBASIC ANALYSIS DEMO")
        print("=" * 50)
       
        queries = [
            "How many rows and columns are in the dataset?",
            "What's the survival rate (percentage of people who survived)?",
            "How many people have more than 3 siblings?",
            "What's the square root of the average age?",
            "Show me the distribution of passenger classes"
        ]
       
        for query in queries:
            print(f"\nQuery: {query}")
            try:
                result = agent.invoke(query)
                print(f"Result: {result['output']}")
            except Exception as e:
                print(f"Error: {e}")

    This demo routine kicks off a “Basic Analysis” session by printing a header, then iterates through a set of common exploratory queries (dataset dimensions, survival rates, family counts, and class distributions) against our Titanic DataFrame agent. For each natural-language prompt, it invokes the agent, captures its output, and prints either the result or an error.
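
    For reference, the Pandas the agent ends up generating for queries like these is straightforward. A sketch on a hypothetical mini-frame (illustrative values, not the real Titanic data):

```python
import pandas as pd

# Toy stand-in for the Titanic frame.
df = pd.DataFrame({
    "Survived": [1, 0, 0, 1, 0],
    "SibSp":    [0, 4, 1, 0, 5],
})

# "What's the survival rate?" boils down to:
survival_rate = df["Survived"].mean() * 100   # percentage of survivors

# "How many people have more than 3 siblings?" becomes a boolean sum:
many_siblings = (df["SibSp"] > 3).sum()

print(survival_rate, many_siblings)
```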

    def advanced_analysis_demo(agent):
        print("\nADVANCED ANALYSIS DEMO")
        print("=" * 50)
       
        advanced_queries = [
            "What's the correlation between age and fare?",
            "Create a survival analysis by gender and class",
            "What's the median age for each passenger class?",
            "Find passengers with the highest fares and their details",
            "Calculate the survival rate for different age groups (0-18, 18-65, 65+)"
        ]
       
        for query in advanced_queries:
            print(f"\nQuery: {query}")
            try:
                result = agent.invoke(query)
                print(f"Result: {result['output']}")
            except Exception as e:
                print(f"Error: {e}")

    This “Advanced Analysis” function prints a header, then runs a series of more sophisticated queries against our Gemini-powered DataFrame agent: computing correlations, performing stratified survival analyses, calculating median statistics, and applying detailed filters. It invokes each natural-language prompt in turn, captures the agent’s responses, and prints the results (or errors), demonstrating how easily we can leverage conversational AI for deeper, segmented insights into our dataset.
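
    The stratified queries map onto groupby and pd.cut. A minimal sketch on a hypothetical mini-frame (column names mirror the Titanic CSV, values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Sex":      ["female", "male", "female", "male", "female", "male"],
    "Pclass":   [1, 3, 1, 3, 2, 2],
    "Age":      [4, 30, 58, 70, 25, 40],
})

# "Survival analysis by gender and class" is one groupby.
by_group = df.groupby(["Sex", "Pclass"])["Survived"].mean()

# "Survival rate for different age groups (0-18, 18-65, 65+)" via pd.cut.
bins = pd.cut(df["Age"], bins=[0, 18, 65, 120],
              labels=["0-18", "18-65", "65+"])
by_age = df.groupby(bins, observed=True)["Survived"].mean()

print(by_group)
print(by_age)
```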

    def multi_dataframe_demo():
        print("\nMULTI-DATAFRAME DEMO")
        print("=" * 50)
       
        df = pd.read_csv(
            "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv"
        )
       
        df_filled = df.copy()
        df_filled["Age"] = df_filled["Age"].fillna(df_filled["Age"].mean())
       
        agent = setup_gemini_agent([df, df_filled])
       
        queries = [
            "How many rows in the age column are different between the two datasets?",
            "Compare the average age in both datasets",
            "What percentage of age values were missing in the original dataset?",
            "Show summary statistics for age in both datasets"
        ]
       
        for query in queries:
            print(f"\nQuery: {query}")
            try:
                result = agent.invoke(query)
                print(f"Result: {result['output']}")
            except Exception as e:
                print(f"Error: {e}")

    This demo illustrates how to spin up a Gemini-powered agent over multiple DataFrames, in this case the original Titanic data and a version with missing ages imputed, so we can ask cross-dataset comparison questions (differences in the Age column, average-age comparisons, missing-value percentages, and side-by-side summary statistics) using simple natural-language prompts.
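
    The first comparison query reduces to counting the rows the imputation touched. A sketch on a hypothetical Age column with gaps (mean imputation only changes rows that were NaN, so the diff count equals the missing count):

```python
import pandas as pd

# Hypothetical Age column with gaps, mirroring the multi-frame demo.
df = pd.DataFrame({"Age": [22.0, None, 24.0, None]})

df_filled = df.copy()
df_filled["Age"] = df_filled["Age"].fillna(df_filled["Age"].mean())

# NaN != NaN, so count differences via the original's missing mask.
n_diff = df["Age"].isna().sum()
pct_missing = df["Age"].isna().mean() * 100

print(n_diff, pct_missing, df_filled["Age"].tolist())
```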

    def custom_analysis_demo(agent):
        print("\nCUSTOM ANALYSIS DEMO")
        print("=" * 50)
       
        custom_queries = [
            "Create a risk score for each passenger based on: Age (higher age = higher risk), Gender (male = higher risk), Class (3rd class = higher risk), Family size (alone or large family = higher risk). Then show the top 10 highest risk passengers who survived",
           
            "Analyze the 'deck' information from the cabin data: Extract deck letter from cabin numbers, Show survival rates by deck, Which deck had the highest survival rate?",
           
            "Find interesting patterns: Did people with similar names (same surname) tend to survive together? What's the relationship between ticket price and survival? Were there any age groups that had 100% survival rate?"
        ]
       
        for i, query in enumerate(custom_queries, 1):
            print(f"\nCustom Analysis {i}:")
            print(f"Query: {query[:100]}...")
            try:
                result = agent.invoke(query)
                print(f"Result: {result['output']}")
            except Exception as e:
                print(f"Error: {e}")

    This routine kicks off a “Custom Analysis” session by walking through three complex, multi-step prompts: building a passenger risk-scoring model, extracting and evaluating deck-based survival rates, and mining surname-based survival patterns and fare/age relationships. It shows how easily our Gemini-powered agent handles bespoke, domain-specific investigations from natural-language queries alone.
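
    The first custom prompt asks the agent to invent a scoring rubric. One possible hand-rolled version, with illustrative weights (not the scoring the agent would necessarily derive), might look like:

```python
import pandas as pd

# Hypothetical passengers; weights below are illustrative only.
df = pd.DataFrame({
    "Age":    [70, 8, 30],
    "Sex":    ["male", "female", "male"],
    "Pclass": [3, 1, 3],
    "SibSp":  [0, 1, 4],
    "Parch":  [0, 1, 2],
})

family = df["SibSp"] + df["Parch"] + 1  # family size incl. the passenger
df["Risk"] = (
    (df["Age"].fillna(df["Age"].median()) / 80)   # older -> higher risk
    + (df["Sex"] == "male").astype(int)           # male -> higher risk
    + (df["Pclass"] == 3).astype(int)             # 3rd class -> higher risk
    + ((family == 1) | (family > 4)).astype(int)  # alone or large family
)

top = df.sort_values("Risk", ascending=False).head(10)
print(top)
```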

    def main():
        print("Advanced Pandas Agent with Gemini Tutorial")
        print("=" * 60)
       
        if not os.getenv("GOOGLE_API_KEY"):
            print("Warning: GOOGLE_API_KEY not set!")
            print("Please set your Gemini API key as an environment variable.")
            return
       
        try:
            df = load_and_explore_data()
            print("\nSetting up Gemini Agent...")
            agent = setup_gemini_agent(df)
           
            basic_analysis_demo(agent)
            advanced_analysis_demo(agent)
            multi_dataframe_demo()
            custom_analysis_demo(agent)
           
            print("\nTutorial completed successfully!")
           
        except Exception as e:
            print(f"Error: {e}")
            print("Make sure you have installed all required packages and set your API key.")
    
    
    if __name__ == "__main__":
        main()
    

    The main() function serves as the entry point for the tutorial. It verifies that our Gemini API key is set, loads and explores the Titanic dataset, and initializes the conversational Pandas agent. It then sequentially runs the basic, advanced, multi-DataFrame, and custom analysis demos, wrapping the whole workflow in a try/except block to catch and report any errors before signaling successful completion.

    df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
    agent = setup_gemini_agent(df)
    
    
    agent.invoke("What factors most strongly predicted survival?")
    agent.invoke("Create a detailed survival analysis by port of embarkation")
    agent.invoke("Find any interesting anomalies or outliers in the data")

    Finally, we directly load the Titanic data, instantiate our Gemini-powered Pandas agent, and fire off three one-off queries. We identify key survival predictors, break down survival by embarkation port, and uncover anomalies or outliers. We achieve all this without modifying any of our demo functions.

    In conclusion, combining Pandas with Gemini via a LangChain DataFrame agent transforms data exploration from writing boilerplate code into crafting clear, natural-language queries. Whether we’re computing summary statistics, building custom risk scores, comparing multiple DataFrames, or drilling into nuanced survival analyses, a few lines of setup give us an interactive analytics assistant that can adapt to new questions on the fly, surface hidden patterns, and accelerate our workflow.



    The post Build a Gemini-Powered DataFrame Agent for Natural Language Data Analysis with Pandas and LangChain appeared first on MarkTechPost.
