
    A Coding Implementation of Extracting Structured Data Using LangSmith, Pydantic, LangChain, and Claude 3.7 Sonnet

    March 25, 2025

    This tutorial builds a structured data extraction pipeline with LangChain and Claude 3.7 Sonnet, turning raw text into typed records. LangSmith traces the model's tool calls, so you can debug and monitor the extraction as it runs. Pydantic schemas define the output format, and a LangChain prompt template tells Claude how to perform the task. Results can be refined with examples rather than model training. The same pattern applies to use cases ranging from document processing to automated data entry.

    First, we need to install the necessary packages. We'll use langchain-core and langchain-anthropic to interface with the Claude model, plus langchain, which provides the init_chat_model helper used later.

    !pip install --upgrade langchain-core
    !pip install langchain langchain-anthropic

    If you’re using LangSmith for tracing and debugging, you can set up environment variables:

    LANGSMITH_TRACING=True
    LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
    LANGSMITH_API_KEY="Your API KEY"
    LANGSMITH_PROJECT="extraction_api"
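
    In a notebook, the same variables can be set from Python before the model is created. The following is a minimal sketch, not part of the original tutorial: the helper name configure_langsmith is hypothetical, and the lowercase "true" value is an assumption based on how LangSmith's own documentation writes it.

```python
import os
from typing import Dict


def configure_langsmith(api_key: str, project: str = "extraction_api") -> Dict[str, str]:
    """Set the LangSmith variables for the current process and return them.

    Hypothetical helper; the variable names are the ones listed in the tutorial.
    """
    settings = {
        # Assumption: lowercase "true", whereas the listing above writes True.
        "LANGSMITH_TRACING": "true",
        "LANGSMITH_ENDPOINT": "https://api.smith.langchain.com",
        "LANGSMITH_API_KEY": api_key,
        "LANGSMITH_PROJECT": project,
    }
    # os.environ only affects this process and the children it starts.
    os.environ.update(settings)
    return settings


# Typical notebook usage (left commented out so nothing prompts on import):
# import getpass
# configure_langsmith(getpass.getpass("Enter API key for LangSmith: "))
```

    Reading the key with getpass keeps it out of the notebook's saved cells.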

    Next, we must define the schema for the information we want to extract. We’ll use Pydantic models to create a structured representation of a person.

    from typing import Optional
    from pydantic import BaseModel, Field
    
    
    class Person(BaseModel):
        """Information about a person."""

        name: Optional[str] = Field(default=None, description="The name of the person")
        hair_color: Optional[str] = Field(
            default=None, description="The color of the person's hair if known"
        )
        height_in_meters: Optional[str] = Field(
            default=None, description="Height measured in meters"
        )
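
    Because every field is Optional with a default of None, the schema accepts partial records, which is what lets the model leave unknown attributes empty. The sketch below checks this without calling any model. It assumes Pydantic v2 (for model_dump), and the sample value is made up for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )


# Only the name is known; the remaining fields fall back to None.
partial = Person(name="Alan Smith")
print(partial.model_dump())
# {'name': 'Alan Smith', 'hair_color': None, 'height_in_meters': None}
```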
    

    Now, we’ll define a prompt template that instructs Claude on how to perform the extraction task:

    from langchain_core.prompts import ChatPromptTemplate


    prompt_template = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are an expert extraction algorithm. "
                "Only extract relevant information from the text. "
                "If you do not know the value of an attribute asked to extract, "
                "return null for the attribute's value.",
            ),
            ("human", "{text}"),
        ]
    )
    

    This template provides clear instructions to the model about its task and how to handle missing information.

    Next, we’ll initialize the Claude model that will perform our information extraction:

    import getpass
    import os

    from langchain.chat_models import init_chat_model

    # Prompt for the key only if it is not already set in the environment.
    if not os.environ.get("ANTHROPIC_API_KEY"):
        os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter API key for Anthropic: ")

    # Claude 3.7 Sonnet, accessed through the Anthropic provider.
    llm = init_chat_model("claude-3-7-sonnet-20250219", model_provider="anthropic")

    Now, we’ll configure our LLM to return structured output according to our schema:

    structured_llm = llm.with_structured_output(schema=Person)

    This key step binds the Person schema to the model, so each response comes back as a Person object rather than free text.
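
    To see what the model can learn from the schema, it helps to print the JSON Schema that Pydantic generates for Person. This is a sketch assuming Pydantic v2; exactly how LangChain forwards the schema to Claude is not shown in this tutorial, so treat the output as an approximation of what the model receives.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )


# The class docstring and the Field descriptions both end up in the schema,
# which is why they are worth writing carefully.
schema = Person.model_json_schema()
print(json.dumps(schema, indent=2))
```

    The docstring and the field descriptions are the only guidance attached to each attribute, so vague descriptions tend to produce vague extractions.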

    Let’s test our extraction system with a simple example:

    text = "Alan Smith is 6 feet tall and has blond hair."
    prompt = prompt_template.invoke({"text": text})
    result = structured_llm.invoke(prompt)
    print(result)
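
    Note that the text gives the height in feet while the schema asks for meters, so the model has to convert the value itself. A small check like the one below can confirm the extracted string is plausible. It is a sketch only: the helper names and the tolerance are hypothetical, and the sample strings are not real model output.

```python
from typing import Optional

FEET_TO_METERS = 0.3048  # exact by definition of the international foot


def feet_to_meters(feet: float) -> float:
    """Convert a length in feet to meters."""
    return feet * FEET_TO_METERS


def height_matches(extracted: Optional[str], expected_m: float, tolerance: float = 0.05) -> bool:
    """Return True if the extracted string parses to a value close to expected_m."""
    if extracted is None:
        return False
    try:
        value = float(extracted)
    except ValueError:
        # Strings such as "1.83 m" are not plain numbers and are rejected here.
        return False
    return abs(value - expected_m) <= tolerance


expected = feet_to_meters(6)
print(height_matches("1.83", expected))  # a converted value passes
print(height_matches("6", expected))     # an unconverted value in feet fails
```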

    Now, let’s try a more complex example that extracts several people at once:

    from typing import List
    
    
    class Data(BaseModel):
        """Container for extracted information about people."""
        people: List[Person] = Field(default_factory=list, description="List of people mentioned in the text")
    
    
    structured_llm = llm.with_structured_output(schema=Data)
    
    
    text = "My name is Jeff, my hair is black and I am 6 feet tall. Anna has the same color hair as me."
    prompt = prompt_template.invoke({"text": text})
    result = structured_llm.invoke(prompt)
    print(result)


    # Next example: text that mentions a person only in passing.
    text = "The solar system is large, (it was discovered by Nicolaus Copernicus), but earth has only 1 moon."
    prompt = prompt_template.invoke({"text": text})
    result = structured_llm.invoke(prompt)
    print(result)
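
    Downstream code should handle a Data object whether it holds several people, people with missing attributes, or nobody at all. The following sketch uses hand-written payloads instead of real model output. The helper describe_people and the sample values are hypothetical, and Pydantic v2 is assumed for model_validate.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )


class Data(BaseModel):
    """Container for extracted information about people."""

    people: List[Person] = Field(
        default_factory=list, description="List of people mentioned in the text"
    )


def describe_people(data: Data) -> List[str]:
    """Render one summary line per person, marking missing attributes as unknown."""
    lines = []
    for person in data.people:
        name = person.name or "unknown"
        hair = person.hair_color or "unknown"
        height = person.height_in_meters or "unknown"
        lines.append(f"{name}: hair={hair}, height_m={height}")
    return lines


# Hand-written payloads standing in for model output.
two_people = Data.model_validate(
    {
        "people": [
            {"name": "Jeff", "hair_color": "black", "height_in_meters": "1.83"},
            {"name": "Anna", "hair_color": "black"},
        ]
    }
)
no_people = Data.model_validate({"people": []})

print(describe_people(two_people))
print(describe_people(no_people))
```

    Using default_factory=list means an empty result is an empty list, not None, so loops over data.people need no special case.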

    In conclusion, this tutorial builds a structured information extraction system with LangChain and Claude that turns unstructured text into organized data about people. The approach relies on Pydantic schemas and a custom prompt, and it can be refined with examples instead of a specialized training pipeline. Adapting it to another domain is mostly a matter of changing the schema and the prompt.


    Here is the Colab Notebook.

    The post A Coding Implementation of Extracting Structured Data Using LangSmith, Pydantic, LangChain, and Claude 3.7 Sonnet appeared first on MarkTechPost.
