    Benchmarking Federated Learning for Large Language Models with FedLLM-Bench

    June 12, 2024

Large language models (LLMs) have achieved remarkable success across domains, but training them centrally requires massive data collection and annotation, which is costly for any single party. Federated learning (FL) has emerged as a promising alternative, enabling collaborative training of LLMs on decentralized data while preserving privacy (FedLLM). Although frameworks such as OpenFedLLM, FederatedScope-LLM, and FedML-LLM have been developed, along with methods addressing data quality, intellectual property, privacy, and resource constraints in FedLLM, a significant challenge remains: the lack of realistic benchmarks. Existing works construct artificial FL datasets by partitioning centralized datasets, failing to capture the properties of real-world cross-user data.

Numerous methods have been proposed to address data heterogeneity in federated learning, a major challenge in which clients’ datasets are drawn from different distributions. These include regularization, gradient correction, feature alignment, adjusted aggregation weights, momentum, and leveraging pre-trained models. FedLLM itself has gained traction recently, with frameworks such as OpenFedLLM, FederatedScope-LLM, and FedML-LLM, and methods such as FedbiOT for model property protection and FFA-LoRA for differential privacy. Yet a key limitation persists: prior works evaluate on artificially crafted federated datasets built by partitioning centralized ones, which fail to capture the complexities of real-world cross-user data.
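To make the regularization idea concrete, here is a minimal toy sketch of a FedProx-style local update followed by FedAvg-style aggregation. This is an illustrative example with made-up gradients, not the paper's implementation; the function name and hyperparameter values are assumptions.

```python
import numpy as np

def fedprox_local_step(w, w_global, grad, lr=0.1, mu=0.01):
    # The proximal term mu * (w - w_global) pulls each client's weights
    # back toward the global model, limiting client drift when local
    # data distributions are heterogeneous.
    return w - lr * (grad + mu * (w - w_global))

# Toy illustration: two clients whose local gradients disagree.
w_global = np.zeros(3)
g1 = np.array([1.0, 0.0, 0.0])
g2 = np.array([-1.0, 0.5, 0.0])
w1 = fedprox_local_step(w_global.copy(), w_global, g1)
w2 = fedprox_local_step(w_global.copy(), w_global, g2)

# FedAvg-style aggregation: average the two local models.
w_new = (w1 + w2) / 2
```

With `w == w_global` at the start of the round, the proximal term is zero, so each client takes a plain gradient step; the term only kicks in as local weights drift over multiple local steps.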

    Researchers from Shanghai Jiao Tong University, Tsinghua University, and Shanghai AI Laboratory propose FedLLM-Bench, the first realistic benchmark for FedLLM. It offers a comprehensive testbed with four datasets: Fed-Aya (multilingual instruction tuning), Fed-WildChat (multi-turn chat instruction tuning), Fed-ChatbotIT (single-turn chat instruction tuning), and Fed-ChatbotPA (preference alignment). These datasets are naturally split by real-world user IDs across 38 to 747 clients, capturing realistic federated properties like cross-device data partitioning. The datasets exhibit diversity in languages, data quality, quantity, sequence lengths, and user preferences, mirroring real-world complexities. FedLLM-Bench integrates these datasets with 8 baseline methods and 6 evaluation metrics to facilitate method comparisons and exploration of new research directions.
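The key difference from prior benchmarks is that clients correspond to real users rather than random shards of a centralized corpus. A short sketch of what partitioning by user ID looks like (the record field names here are illustrative, not the benchmark's actual schema):

```python
from collections import defaultdict

def partition_by_user(records):
    # Group records into federated clients by their real user ID,
    # rather than artificially splitting a centralized dataset.
    clients = defaultdict(list)
    for rec in records:
        clients[rec["user_id"]].append(rec["text"])
    return dict(clients)

# Hypothetical chat records from three users.
records = [
    {"user_id": "u1", "text": "hola"},
    {"user_id": "u2", "text": "hello"},
    {"user_id": "u1", "text": "¿cómo estás?"},
    {"user_id": "u3", "text": "bonjour"},
]
clients = partition_by_user(records)
```

Clients produced this way naturally differ in language, quantity, and style, which is exactly the heterogeneity that artificial partitions of a centralized dataset tend to miss.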

FedLLM-Bench is organized around four components: training methods, datasets, dataset analysis, and evaluation metrics. For training methods, it covers federated instruction tuning and preference alignment using parameter-efficient LoRA fine-tuning, along with eight baseline FL methods, including FedAvg, FedProx, SCAFFOLD, FedAvgM, FedAdagrad, FedYogi, and FedAdam. The benchmark includes four diverse datasets, Fed-Aya (multilingual instruction tuning), Fed-ChatbotIT, Fed-WildChat, and Fed-ChatbotPA, capturing realistic properties such as varied languages, quality, quantity, sequence lengths, and user preferences. Extensive dataset analysis reveals inter- and intra-dataset diversity in length, instructions, quality, embeddings, and quantity. Evaluation uses six metrics: four open-ended (MT-Bench, Vicuna bench, AdvBench, Ref-GPT4) and two close-ended (MMLU, HumanEval).
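With parameter-efficient LoRA fine-tuning, only the small adapter matrices travel between clients and server, which keeps communication cheap. A minimal sketch of FedAvg over LoRA adapters, weighted by client sample counts; the shapes, key names, and sample counts are illustrative assumptions, not the benchmark's actual configuration:

```python
import numpy as np

def fedavg_lora(adapters, num_samples):
    # Aggregate only the small LoRA adapter matrices, weighted by each
    # client's sample count; the frozen base model never needs to move.
    total = sum(num_samples)
    agg = {}
    for key in adapters[0]:
        agg[key] = sum(n / total * a[key] for a, n in zip(adapters, num_samples))
    return agg

# Three hypothetical clients, each holding rank-2 LoRA factors.
rng = np.random.default_rng(0)
client_adapters = [
    {"lora_A": rng.normal(size=(4, 2)), "lora_B": rng.normal(size=(2, 4))}
    for _ in range(3)
]
global_adapter = fedavg_lora(client_adapters, num_samples=[100, 50, 50])
```

Communicating only the adapter matrices instead of full model weights is what makes federated fine-tuning of billion-parameter LLMs practical on resource-constrained clients.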

The benchmark evaluates the implemented methods across all four datasets. On the multilingual Fed-Aya, most federated methods outperform local training on average, though no single method dominates across languages, suggesting opportunities for language personalization. On Fed-ChatbotIT, all federated approaches improve instruction-following ability over local training without compromising general capabilities, with FedAdagrad performing best overall. On Fed-WildChat, federated methods consistently surpass local training for both single-turn and multi-turn conversations, with FedAvg the most effective in the multi-turn setting. On Fed-ChatbotPA preference alignment, federated training improves both instruction following and safety relative to local training, with FedAvgM, FedProx, SCAFFOLD, and FedAvg the top performers. Across all datasets, federated learning demonstrates clear benefits over isolated local training by leveraging collaborative data.

In summary, the researchers introduce FedLLM-Bench, the first realistic benchmark for FedLLM. Its core contribution is a suite of four diverse datasets spanning instruction tuning and preference alignment, exhibiting real-world properties such as varied languages, data quality, quantity, instruction styles, sequence lengths, embeddings, and user preferences across 38 to 747 clients. Together with eight training methods and six evaluation metrics, extensive experiments on FedLLM-Bench benchmark classical federated approaches and explore new research directions such as cross-lingual collaboration and differential privacy. By providing a comprehensive, practical testbed that mirrors real-world complexities, FedLLM-Bench aims to reduce effort, enable fair comparisons, and propel progress in the emerging area of collaborative, privacy-preserving training of large language models.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Benchmarking Federated Learning for Large Language Models with FedLLM-Bench appeared first on MarkTechPost.
