
    Transformers Can Now Predict Spreadsheet Cells without Fine-Tuning: Researchers Introduce TabPFN Trained on 100 Million Synthetic Datasets

    April 15, 2025

    Tabular data is widely used in fields such as scientific research, finance, and healthcare. Traditionally, machine learning models such as gradient-boosted decision trees have been preferred for tabular data because they handle heterogeneous, structured datasets effectively. Despite their popularity, these methods have notable limitations: they generalize poorly to unseen data distributions, cannot easily transfer learned knowledge between datasets, and are difficult to integrate with neural network pipelines because decision trees are non-differentiable.

    Researchers from the University of Freiburg, Berlin Institute of Health, Prior Labs, and ELLIS Institute have introduced a novel approach named Tabular Prior-data Fitted Network (TabPFN). TabPFN leverages transformer architectures to address common limitations associated with traditional tabular data methods. The model significantly surpasses gradient-boosted decision trees in both classification and regression tasks, especially on datasets with fewer than 10,000 samples. Notably, TabPFN demonstrates remarkable efficiency, achieving better results in just a few seconds compared to several hours of extensive hyperparameter tuning required by ensemble-based tree models.

    TabPFN utilizes in-context learning (ICL), a technique popularized by large language models, in which the model learns to solve tasks from contextual examples supplied at inference time. The researchers adapted this concept to tabular data by pre-training TabPFN on millions of synthetically generated datasets. This training regime lets the model implicitly learn a broad spectrum of predictive algorithms, removing the need for extensive dataset-specific training. Unlike traditional deep learning models, TabPFN processes the entire labeled training set and the test samples together in a single forward pass through the network, which substantially improves computational efficiency.
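To make the interface difference concrete, here is a minimal numpy sketch of the in-context prediction pattern: training rows, labels, and test rows are passed together in one call, mirroring TabPFN's single forward pass. A 1-nearest-neighbor rule stands in for the pre-trained transformer; the function name and the toy data are illustrative, not from the paper.

```python
import numpy as np

def in_context_predict(X_train, y_train, X_test):
    """One call, no separate fit step: the 'model' sees train and test
    rows together, mirroring TabPFN's single-forward-pass interface.
    A 1-nearest-neighbor rule stands in for the transformer."""
    # pairwise squared distances between each test row and each train row
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # each test row inherits the label of its nearest training row
    return y_train[d.argmin(axis=1)]

X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.1, -0.1], [0.9, 1.2]])
print(in_context_predict(X_train, y_train, X_test))  # [0 1]
```

The key design point is that "training" amounts to conditioning on context at inference time; all the learning happened during pre-training on synthetic tasks.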

    The architecture of TabPFN is specifically designed for tabular data, employing a two-dimensional attention mechanism tailored to effectively utilize the inherent structure of tables. This mechanism allows each data cell to interact with others across rows and columns, effectively managing different data types and conditions such as categorical variables, missing data, and outliers. Furthermore, TabPFN optimizes computational efficiency by caching intermediate representations from the training set, significantly accelerating inference on subsequent test samples.
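The two-dimensional attention idea can be sketched as alternating attention passes over the two table axes. The following numpy sketch (an assumption about the general pattern, not the paper's exact architecture) represents each cell as an embedding vector and attends first across columns within each row, then across rows within each column:

```python
import numpy as np

def softmax(a, axis):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention; leading axes are treated as batch
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def two_d_attention(cells):
    # cells: (rows, cols, d) -- each table cell carries a d-dim embedding
    # 1) attend across columns within each row (feature interactions)
    across_cols = attention(cells, cells, cells)       # (rows, cols, d)
    # 2) attend across rows within each column (sample interactions)
    t = across_cols.swapaxes(0, 1)                     # (cols, rows, d)
    return attention(t, t, t).swapaxes(0, 1)           # (rows, cols, d)

rng = np.random.default_rng(0)
table = rng.normal(size=(4, 3, 8))  # 4 samples, 3 features, 8-dim cells
out = two_d_attention(table)
print(out.shape)  # (4, 3, 8)
```

Because attention is applied per-cell rather than per-row, irregular conditions such as a missing value can in principle be handled locally (e.g., with a learned placeholder embedding) instead of corrupting a whole row representation.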

    Empirical evaluations highlight TabPFN’s substantial improvements over established models. Across benchmark suites including the AutoML Benchmark and OpenML-CTR23, TabPFN consistently outperforms widely used models such as XGBoost, CatBoost, and LightGBM. On classification problems, TabPFN showed notable gains in normalized ROC AUC scores relative to extensively tuned baselines; on regression problems, it likewise achieved better normalized RMSE scores.
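The "normalized" scores above rescale each dataset's raw metric relative to the methods being compared, so results are comparable across datasets with very different baseline difficulty. A minimal sketch of one common convention (min-max normalization per dataset; the exact convention used in the paper and the scores below are assumptions for illustration):

```python
import numpy as np

def normalize_scores(raw, higher_is_better=True):
    """Min-max normalize scores across methods, per dataset, so the best
    method on each dataset maps to 1.0 and the worst to 0.0.
    raw: array of shape (n_datasets, n_methods)."""
    lo = raw.min(axis=1, keepdims=True)
    hi = raw.max(axis=1, keepdims=True)
    norm = (raw - lo) / (hi - lo)
    return norm if higher_is_better else 1.0 - norm

# hypothetical ROC AUC scores: 3 datasets x 3 methods
auc = np.array([[0.91, 0.88, 0.85],
                [0.70, 0.72, 0.69],
                [0.99, 0.95, 0.90]])
print(normalize_scores(auc).mean(axis=0))  # per-method average
```

Averaging the normalized scores over datasets then gives a single comparable number per method, which is how such leaderboard-style summaries are typically reported.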

    TabPFN’s robustness was also extensively evaluated across datasets characterized by challenging conditions, such as numerous irrelevant features, outliers, and substantial missing data. In contrast to typical neural network models, TabPFN maintained consistent and stable performance under these challenging scenarios, demonstrating its suitability for practical, real-world applications.

    Beyond its predictive strengths, TabPFN also exhibits fundamental capabilities typical of foundation models. It effectively generates realistic synthetic tabular datasets and accurately estimates probability distributions of individual data points, making it suitable for tasks such as anomaly detection and data augmentation. Additionally, the embeddings produced by TabPFN are meaningful and reusable, providing practical value for downstream tasks including clustering and imputation.

    In summary, the development of TabPFN signifies an important advancement in modeling tabular data. By integrating the strengths of transformer-based models with the practical requirements of structured data analysis, TabPFN offers enhanced accuracy, computational efficiency, and robustness, potentially facilitating substantial improvements across various scientific and business domains.

    Here is the Paper.

    The post Transformers Can Now Predict Spreadsheet Cells without Fine-Tuning: Researchers Introduce TabPFN Trained on 100 Million Synthetic Datasets appeared first on MarkTechPost.

