    NuMind Releases NuExtract: A Lightweight Text-to-JSON LLM Specialized for the Task of Structured Extraction

    June 25, 2024

    NuMind has introduced NuExtract, a text-to-JSON language model built for structured data extraction from text. The model is designed to turn unstructured text into structured data efficiently. Its design and training methodology position it as an alternative to larger existing models, combining high extraction performance with cost-efficiency.

    The NuExtract family spans models from 0.5 billion to 7 billion parameters and achieves similar or superior extraction capabilities compared to popular, much larger language models (LLMs). The family consists of three distinct models: NuExtract-tiny, NuExtract, and NuExtract-large. These models have performed well across various extraction tasks, often outperforming significantly larger LLMs.

    NuExtract is available in three trained versions:

    NuExtract-tiny (0.5B): This lightweight model is ideal for applications requiring efficient performance with minimal computational resources. Despite its small size, NuExtract-tiny performs better than some larger models, making it suitable for tasks where resource constraints are a priority.

    NuExtract (3.8B): This model balances size and performance, making it well-suited for more demanding extraction tasks. It leverages a moderate number of parameters to deliver high accuracy and versatility, handling a wide range of structured extraction tasks efficiently.

    NuExtract-large (7B): The most powerful version, designed for the most complex and intensive extraction tasks. With 7 billion parameters, NuExtract-large achieves performance levels comparable to top-tier LLMs like GPT-4 while being significantly smaller and more cost-effective. This model is perfect for applications requiring the highest accuracy and detail in data extraction.

    The primary challenge NuExtract addresses is structured extraction, which involves extracting diverse information types such as entities, quantities, dates, and hierarchical relationships from documents. The extracted information is structured as JSON, making it easier to parse, integrate into databases, or use for automated actions. For instance, extracting data from a document and organizing it into a hierarchical tree structure in JSON format is a task NuExtract handles with high precision and efficiency.
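To make the idea of a hierarchical extraction tree concrete, here is a minimal sketch in Python. The sentence, the field names, and the nesting are invented for illustration and are not taken from NuExtract's documentation or output.

```python
import json

# Invented input sentence, used only to illustrate the target structure.
text = "Acme Corp hired Jane Doe as CTO on 2024-03-01 with a budget of 2 million USD."

# Hypothetical extraction result: entities, a date, and a quantity,
# organized as a tree rather than as a flat list of matches.
extracted = {
    "company": {
        "name": "Acme Corp",
        "hires": [
            {
                "person": "Jane Doe",
                "role": "CTO",
                "start_date": "2024-03-01",
                "budget": {"amount": 2000000, "currency": "USD"},
            }
        ],
    }
}

# Serialize to JSON so the result can be stored or handed to other tools.
payload = json.dumps(extracted, indent=2)

# Parsing the payload gives back the same tree, ready for database inserts
# or automated actions.
restored = json.loads(payload)
first_hire = restored["company"]["hires"][0]
print(first_hire["person"], first_hire["budget"]["amount"])
```

Because the quantity sits under the hire it belongs to, a downstream consumer does not have to guess which number relates to which entity.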

    Structured extraction tasks vary significantly in complexity. Traditional methods such as regular expressions or non-generative machine learning models can handle simple entity extraction, but they fall short on more complex tasks that require deeper hierarchical extraction. Modern generative LLMs, including GPT-4, have advanced these capabilities by enabling the generation of deep extraction trees. However, NuExtract has shown that it can achieve similar results with much smaller models, making it a more practical solution for many applications.
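The limits of pattern-based extraction can be seen in a small Python sketch. The sentence and the unit list are assumptions chosen for illustration; the regular expression finds every quantity but returns them as a flat list with no hierarchy.

```python
import re

# Invented example sentence describing a simple procedure.
text = "Stir 5 g of NaCl in 200 mL of water for 30 min at 25 C."

# Matches a number followed by one of a few hard-coded units.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(mL|min|g|C)\b")

# Each match becomes a (value, unit) pair.
flat = [(float(value), unit) for value, unit in QUANTITY.findall(text)]
print(flat)
```

The pattern recovers all four quantities, but nothing in the result says that 5 g belongs to NaCl, that 200 mL belongs to water, or that the last two values are reaction conditions. Recovering that structure is what the hierarchical extraction described above requires.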

    One of NuExtract’s key advantages is its ability to handle both zero-shot and fine-tuned extraction scenarios. In a zero-shot setting, the model extracts information based solely on a predefined template or schema, without requiring task-specific training data. This capability is particularly valuable for applications where creating large annotated datasets is impractical. Additionally, NuExtract can be fine-tuned for specific applications, further improving its performance on specialized tasks.
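Template-driven zero-shot extraction can be sketched as follows. This is a minimal illustration, assuming the template is a JSON object with empty values; the helper `build_prompt` and the section labels in the prompt are hypothetical and are not NuExtract's documented prompt format.

```python
import json


def build_prompt(template: dict, text: str) -> str:
    """Combine an empty template and the input text into one prompt string.

    The section labels are a hypothetical layout chosen for this sketch.
    """
    schema = json.dumps(template, indent=4)
    return f"### Template:\n{schema}\n### Text:\n{text}\n### Output:\n"


# The template lists the fields to fill; empty strings mark the slots.
template = {
    "company": "",
    "hires": [{"person": "", "role": "", "start_date": ""}],
}

prompt = build_prompt(
    template,
    "Acme Corp hired Jane Doe as CTO on 2024-03-01.",
)
print(prompt)
```

Changing the extraction task only means changing the template; no labeled examples are needed, which is the point of the zero-shot setting.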

    To train NuExtract, the developers took a novel approach: they drew a large and diverse corpus of text from the C4 dataset and annotated it using a modern LLM with carefully crafted prompts. This synthetic data was then used to fine-tune a compact, generic foundation model, resulting in a highly specialized task-specific model. This training methodology helps NuExtract generalize across different domains, making it versatile for various structured extraction tasks.

    The model consistently produces valid JSON outputs, adheres to the schema, and accurately extracts relevant information. For example, in tests involving the parsing of chemical reactions, NuExtract successfully identified, classified, and extracted quantities of chemical substances and reaction conditions such as duration and temperature. This high accuracy demonstrates NuExtract’s potential to tackle complex extraction tasks in chemistry, medicine, law, and finance.
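Valid JSON and schema adherence are both properties that can be checked mechanically. The sketch below shows one way to do it, assuming a template whose leaves are empty strings and whose lists hold a single example item; `conforms` and `parse_and_check` are hypothetical helpers, not part of any NuExtract tooling.

```python
import json


def conforms(template, value) -> bool:
    """Return True if `value` has the same nested shape as `template`."""
    if isinstance(template, dict):
        return (
            isinstance(value, dict)
            and set(value) == set(template)
            and all(conforms(template[key], value[key]) for key in template)
        )
    if isinstance(template, list):
        if not isinstance(value, list):
            return False
        if not template:
            return True  # An empty template list places no constraint on items.
        # The first template item is the pattern every output item must match.
        return all(conforms(template[0], item) for item in value)
    # Leaves are expected to be strings in this sketch.
    return isinstance(value, str)


def parse_and_check(raw: str, template) -> bool:
    """Check that `raw` is valid JSON and follows the template's structure."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return conforms(template, value)


template = {"substances": [{"name": "", "quantity": ""}], "temperature": ""}
good = '{"substances": [{"name": "NaCl", "quantity": "5 g"}], "temperature": "25 C"}'
bad = '{"substances": [{"name": "NaCl"}], "temperature": "25 C"}'

print(parse_and_check(good, template), parse_and_check(bad, template))
```

A check like this verifies structure only; whether the extracted values are correct still has to be measured against annotated examples.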

    NuExtract’s compact size offers several practical benefits. Smaller models are less expensive to run, allowing for cost-effective inference. They also enable local deployment, essential for applications requiring data privacy. The ease of fine-tuning these models makes them adaptable to specific use cases, further enhancing their utility.
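A back-of-the-envelope calculation shows why size matters for local deployment. The parameter counts come from the model list above; the 2 bytes per parameter (16-bit weights) is an assumption, and the estimate covers weights only, ignoring activations and other runtime overhead.

```python
# Parameter counts as stated for the three NuExtract versions.
PARAMETERS = {
    "NuExtract-tiny": 500_000_000,
    "NuExtract": 3_800_000_000,
    "NuExtract-large": 7_000_000_000,
}

# Assumption: weights stored in a 16-bit format, i.e. 2 bytes per parameter.
BYTES_PER_PARAMETER = 2


def weight_memory_gb(parameter_count: int) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return parameter_count * BYTES_PER_PARAMETER / 1e9


estimates = {name: weight_memory_gb(count) for name, count in PARAMETERS.items()}
for name, gigabytes in estimates.items():
    print(f"{name}: about {gigabytes:.1f} GB of weights")
```

Under these assumptions the smallest model needs roughly 1 GB for its weights and the largest roughly 14 GB, which indicates the class of hardware each version would need for local inference.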

    In conclusion, NuExtract by NuMind represents a significant leap forward in structured data extraction from text. Its innovative design, efficient training methodology, and impressive performance across various tasks make it a valuable tool for transforming unstructured text into structured data. The model’s ability to perform well in both zero-shot and fine-tuned settings, coupled with its cost-efficiency and ease of deployment, positions it as a leading solution for modern data extraction challenges.

    The post NuMind Releases NuExtract: A Lightweight Text-to-JSON LLM Specialized for the Task of Structured Extraction appeared first on MarkTechPost.
