A Coding Implementation of Extracting Structured Data Using LangSmith, Pydantic, LangChain, and Claude 3.7 Sonnet

Unlock the power of structured data extraction with LangChain and Claude 3.7 Sonnet, transforming raw text into actionable insights. This tutorial focuses on tracing LLM tool calling using LangSmith, enabling real-time debugging and performance monitoring of your extraction system. We utilize Pydantic schemas for precise data formatting and LangChain’s flexible prompting to guide Claude. Experience example-driven refinement, eliminating the need for complex training. This is a glimpse into LangSmith’s capabilities, showcasing how to build robust extraction pipelines for diverse applications, from document processing to automated data entry.

First, we need to install the necessary packages. We’ll use langchain-core and langchain_anthropic to interface with the Claude model.

Copy CodeCopiedUse a different Browser

!pip install --upgrade langchain-core
!pip install langchain_anthropic

If you’re using LangSmith for tracing and debugging, you can set up environment variables:

Copy CodeCopiedUse a different Browser

LANGSMITH_TRACING=True
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="Your API KEY"
LANGSMITH_PROJECT="extraction_api"

Next, we must define the schema for the information we want to extract. We’ll use Pydantic models to create a structured representation of a person.

Copy CodeCopiedUse a different Browser

from typing import Optional
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""


    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )

Now, we’ll define a prompt template that instructs Claude on how to perform the extraction task:

Copy CodeCopiedUse a different Browser

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder




prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert extraction algorithm. "
            "Only extract relevant information from the text. "
            "If you do not know the value of an attribute asked to extract, "
            "return null for the attribute's value.",
        ),


        ("human", "{text}"),
    ]
)

This template provides clear instructions to the model about its task and how to handle missing information.

Next, we’ll initialize the Claude model that will perform our information extraction:

Copy CodeCopiedUse a different Browser

import getpass
import os


if not os.environ.get("ANTHROPIC_API_KEY"):
    os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter API key for Anthropic: ")


from langchain.chat_models import init_chat_model


llm = init_chat_model("claude-3-7-sonnet-20250219", model_provider="anthropic")

Now, we’ll configure our LLM to return structured output according to our schema:

Copy CodeCopiedUse a different Browser

structured_llm = llm.with_structured_output(schema=Person)

This key step tells the model to format its responses according to our Person schema.

Let’s test our extraction system with a simple example:

Copy CodeCopiedUse a different Browser

text = "Alan Smith is 6 feet tall and has blond hair."
prompt = prompt_template.invoke({"text": text})
result = structured_llm.invoke(prompt)
print(result)

Now, Let’s try a more complex example:

Copy CodeCopiedUse a different Browser

from typing import List


class Data(BaseModel):
    """Container for extracted information about people."""
    people: List[Person] = Field(default_factory=list, description="List of people mentioned in the text")


structured_llm = llm.with_structured_output(schema=Data)


text = "My name is Jeff, my hair is black and I am 6 feet tall. Anna has the same color hair as me."
prompt = prompt_template.invoke({"text": text})
result = structured_llm.invoke(prompt)
print(result)




# Next example
text = "The solar system is large, (it was discovered by Nicolaus Copernicus), but earth has only 1 moon."
prompt = prompt_template.invoke({"text": text})
result = structured_llm.invoke(prompt)
print(result)

In conclusion, this tutorial demonstrates building a structured information extraction system with LangChain and Claude that transforms unstructured text into organized data about people. The approach uses Pydantic schemas, custom prompts, and example-driven improvement without requiring specialized training pipelines. The system’s power comes from its flexibility, domain adaptability, and utilization of advanced LLM reasoning capabilities.

Here is the Colab Notebook. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 85k+ ML SubReddit.

The post A Coding Implementation of Extracting Structured Data Using LangSmith, Pydantic, LangChain, and Claude 3.7 Sonnet appeared first on MarkTechPost.

Source: Read MoreÂ

Sunshine And March Vibes (2025 Wallpapers Edition)

The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

How To Fix Largest Contentful Paint Issues With Subpart Analysis

How To Prevent WordPress SQL Injection Attacks

Microsoft’s allegiance isn’t to OpenAI’s pricey models — Satya Nadella’s focus is selling any AI customers want for maximum profits

If you think you can do better than Xbox or PlayStation in the Console Wars, you may just want to try out this card game

Surviving a 10 year stint in dev hell, this retro-styled hack n’ slash has finally arrived on Xbox

Save $400 on the best Samsung TVs, laptops, tablets, and more when you sign up for Verizon 5G Home or Home Internet

NodeSource N|Solid Runtime Release – May 2025: Performance, Stability & the Final Update for v18

NodeSource N|Solid Runtime Release – May 2025: Performance, Stability & the Final Update for v18

Big Changes at Meteor Software: Our Next Chapter

Apps in Generative AI – Transforming the Digital Experience

Microsoft’s allegiance isn’t to OpenAI’s pricey models — Satya Nadella’s focus is selling any AI customers want for maximum profits

Microsoft’s allegiance isn’t to OpenAI’s pricey models — Satya Nadella’s focus is selling any AI customers want for maximum profits

If you think you can do better than Xbox or PlayStation in the Console Wars, you may just want to try out this card game

Surviving a 10 year stint in dev hell, this retro-styled hack n’ slash has finally arrived on Xbox

A Coding Implementation of Extracting Structured Data Using LangSmith, Pydantic, LangChain, and Claude 3.7 Sonnet

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs

CVE-2025-47785 – Emlog SQL Injection and Remote Code Execution

CodeSOD: One Month

This AI Paper Explores New Ways to Utilize and Optimize Multimodal RAG System for Industrial Applications

CVE-2025-26783 – Samsung Mobile Processor, Wearable Processor, and Modem Exynos RRC Denial of Service Vulnerability

How LotteON built dynamic A/B testing for their personalized recommendation system

May 2025 Patch Tuesday forecast: Panic, change, and hope

Black Basta Ransomware Affiliates Possibly Exploited Windows Bug as a Zero-Day

Audible: Build seamless purchase experiences

A Coding Implementation of Extracting Structured Data Using LangSmith, Pydantic, LangChain, and Claude 3.7 Sonnet

Related Posts