A group of researchers in France introduced DrBenchmark to address the need for evaluating masked language models in French, particularly in the biomedical domain. NLP has seen significant advances, especially in pre-trained language models (PLMs), yet evaluating these models remains difficult because evaluation protocols vary from study to study. The scarcity of biomedical evaluation benchmarks in languages other than English and Chinese has made this even more challenging, leaving a gap in assessing the accuracy of the latest French biomedical models.
Existing approaches to evaluating French language models lacked standardized protocols and comprehensive benchmark datasets, leading to inconsistent results and stalling progress in NLP research. DrBenchmark is the first publicly available French biomedical language understanding benchmark. It comprises 20 diverse tasks, including named-entity recognition, part-of-speech tagging, question answering, semantic textual similarity, and classification. The primary contribution of DrBenchmark is its aggregation of diverse downstream tasks into a single benchmark, allowing the intrinsic qualities of pre-trained language models to be assessed from multiple perspectives. The paper also evaluates eight state-of-the-art pre-trained masked language models (MLMs) trained on general and biomedical data: French generalist models, cross-lingual generalist models, French biomedical models, and an English biomedical model.
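Because each task is distributed as a standard dataset, loading any one of them for evaluation reduces to a single call. Here is a minimal sketch in Python; the Hub identifier and configuration name are assumptions for illustration and may differ from the identifiers the DrBenchmark authors actually publish:

```python
from datasets import load_dataset

# Hypothetical Hub identifier and configuration for one of the benchmark's
# NER tasks; the real identifiers published by the authors may differ.
quaero = load_dataset("Dr-BERT/QUAERO", "emea")

# Inspect one training example (tokens and their NER tags).
print(quaero["train"][0])
```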
DrBenchmark offers a modular, reproducible, and easily customizable automated protocol for fair comparison among language models. It leverages the HuggingFace Datasets and Transformers libraries for data loading, fine-tuning, and evaluation, and the protocol ensures consistency by fine-tuning every model with the same hyperparameters on each downstream task. Results from the experiments reveal that no single model excels across all tasks, highlighting the importance of domain-specific models for peak performance in the biomedical field. Interestingly, even though French biomedical models perform best on most tasks, certain out-of-domain models or models trained in other languages remain competitive on specific tasks.
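To make the fixed-hyperparameter idea concrete, here is a minimal sketch of fine-tuning one candidate model on a classification task with the Transformers Trainer API. The checkpoint, dataset identifier, column names, label count, and hyperparameter values are all assumptions for illustration, not the paper's actual settings:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Any checkpoint under evaluation can be slotted in here; camembert-base
# stands in for one of the French generalist models.
model_name = "camembert-base"

# Hypothetical classification task; the Hub identifier and the column
# names ("text", "label") are assumptions for illustration.
dataset = load_dataset("Dr-BERT/MORFITT")

tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    # Truncate/pad so every model sees identically prepared inputs.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

# num_labels=2 is a placeholder; the real task may have more classes.
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=2)

# The same hyperparameters are reused for every model on a given task, so
# score differences reflect the pre-trained models, not the tuning budget.
# These values are illustrative, not the paper's.
args = TrainingArguments(
    output_dir="drbenchmark-cls",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    seed=42,  # fixed seed for reproducibility
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```

Re-running the same script with a different model_name while keeping args untouched is what makes the resulting scores comparable across models.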
In conclusion, the paper presents DrBenchmark to address the lack of evaluation resources for French biomedical NLP models. By aggregating diverse downstream tasks into a comprehensive benchmark, DrBenchmark enables fair comparison among pre-trained language models. The evaluation results highlight the importance of employing domain-specific models for optimal performance in biomedical NLP tasks, while also showing that certain models trained in other languages or outside the domain can still compete on specific tasks, underscoring the need for further research in this area.
Check out the Paper and Project page. All credit for this research goes to the researchers of this project.