    This AI Paper Introduces PARSCALE (Parallel Scaling): A Parallel Computation Method for Efficient and Scalable Language Model Deployment

    May 21, 2025

Over time, the pursuit of better performance has pushed researchers to scale language models up, typically by increasing the number of parameters or extending their computational capacity. As a result, the development and deployment of language models now depend heavily on the availability of substantial computational resources and memory.

Despite these advances, increasing model size or generating more tokens to enhance reasoning leads to significant challenges. Parameter-scaling methods such as dense scaling and Mixture-of-Experts scaling, which increase the number of trainable weights, demand far more memory. Inference-time scaling, by contrast, requires models to generate longer sequences or run multiple reasoning steps, which adds latency and slows deployment. While effective, these approaches do not adapt well to all scenarios and fail to address deployment efficiency in low-resource settings such as mobile devices or embedded systems.

    Researchers from Zhejiang University and Alibaba Group proposed a new approach termed PARSCALE, which stands for Parallel Scaling. This method shifts focus from increasing model size or output length to increasing the model’s parallel computations during training and inference. By applying multiple learnable transformations to the input, the model executes several forward passes in parallel and aggregates their outputs dynamically. PARSCALE retains the model’s original parameter count and boosts computational diversity, making it an adaptable solution for various tasks and model architectures without requiring specialized datasets or changes in training protocols.
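As a rough illustration of the idea, the sketch below shows the core pattern: P parallel passes through the same model under different learnable input transformations, combined by an input-dependent weighted sum. The function names and the toy scorer here are hypothetical stand-ins, not the paper's implementation.

```python
import torch

def parallel_scale(f, transforms, scorer, x):
    # P forward passes through the *same* function f, one per learnable transform.
    outs = torch.stack([f(t(x)) for t in transforms])        # (P, ...) stacked outputs
    # Dynamic weights: one score per stream, normalized with softmax.
    w = torch.softmax(scorer(outs), dim=0)                    # (P,)
    return (w.view(-1, *([1] * (outs.dim() - 1))) * outs).sum(dim=0)

# Toy demo with stand-in components.
f = lambda z: z * 2.0                                          # shared "model"
transforms = [lambda z, b=float(i): z + b for i in range(4)]   # 4 fixed offsets standing in for learned transforms
scorer = lambda o: o.mean(dim=tuple(range(1, o.dim())))        # crude per-stream score
y = parallel_scale(f, transforms, scorer, torch.randn(3, 5))   # -> shape (3, 5)
```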

At the technical level, PARSCALE appends several distinct, learnable prefixes to the same input, producing multiple parallel versions. The model processes these simultaneously, and the outputs are aggregated using a dynamic weighted sum calculated by a multilayer perceptron. This structure introduces only about 0.2% extra parameters per stream, a minor addition compared to full parameter scaling. The model uses prefix tuning to distinguish each parallel stream via unique key-value caches, allowing for efficient memory reuse. The approach also benefits from GPU-friendly parallelization, which keeps latency low despite the additional computation. This design ensures scalability without modifying the core architecture and can be applied even to frozen pretrained models by training only the new prefix and aggregation parameters.
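A minimal PyTorch sketch of this prefix-based design is shown below. It is an illustrative approximation under simplifying assumptions: prefixes are prepended as input embeddings rather than injected as per-layer key-value caches, aggregation is applied per token over hidden states, and the backbone is a toy Transformer encoder. The class and parameter names (ParallelScalingWrapper, num_streams, prefix_len, score_mlp) are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

class ParallelScalingWrapper(nn.Module):
    """Illustrative sketch of prefix-based parallel scaling around a shared backbone.

    Only the P learnable prefixes and the small aggregation MLP add parameters;
    the backbone can stay frozen. Simplified stand-in, not the paper's implementation.
    """
    def __init__(self, backbone, d_model, num_streams=8, prefix_len=16):
        super().__init__()
        self.backbone = backbone                    # shared (possibly frozen) model body
        self.num_streams = num_streams
        # One learnable prefix per parallel stream: (P, prefix_len, d_model).
        self.prefixes = nn.Parameter(torch.randn(num_streams, prefix_len, d_model) * 0.02)
        # Small MLP that scores each stream's per-token output for dynamic weighting.
        self.score_mlp = nn.Sequential(
            nn.Linear(d_model, d_model // 4), nn.SiLU(), nn.Linear(d_model // 4, 1)
        )

    def forward(self, x):
        # x: (batch, seq, d_model) token embeddings.
        b, s, d = x.shape
        p, l = self.num_streams, self.prefixes.shape[1]
        # Prepend each stream's prefix and fold streams into the batch dimension,
        # so the P passes run as one GPU-friendly batched forward call.
        x_rep = x.unsqueeze(1).expand(b, p, s, d)
        pre = self.prefixes.unsqueeze(0).expand(b, p, l, d)
        streams = torch.cat([pre, x_rep], dim=2).reshape(b * p, l + s, d)
        h = self.backbone(streams)[:, l:, :].reshape(b, p, s, d)  # drop prefix positions
        # Dynamic weighted sum over streams, weights from a softmax over MLP scores.
        w = torch.softmax(self.score_mlp(h).squeeze(-1), dim=1)    # (b, p, s)
        return (w.unsqueeze(-1) * h).sum(dim=1)                    # (b, s, d)

# Toy usage: a tiny Transformer encoder stands in for the language model body.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
model = ParallelScalingWrapper(backbone, d_model=64, num_streams=4, prefix_len=8)
out = model(torch.randn(2, 10, 64))   # -> (2, 10, 64)
```

In this sketch the number of parallel streams plays the role of the paper's scaling knob P, and when the backbone is frozen only the prefixes and the scoring MLP would be trained.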

The researchers conducted extensive experiments on models ranging from 0.5B to 4.4B parameters, with the number of parallel streams P set from 1 to 8. When trained on 42 billion tokens, models with P = 8 matched the performance of models with up to 4.4 billion parameters while requiring significantly less memory and latency. Specifically, on a 1.6B model, PARSCALE incurred 22× less memory increase and 6× less latency increase than parameter scaling at the same performance. On downstream tasks, PARSCALE yielded up to a 34% improvement on GSM8K and 23% on MMLU. Coding performance also improved significantly: models with 1.6B parameters and P = 8 achieved results comparable to those of a 4.4B-parameter model. The method also proved effective during post-training and parameter-efficient fine-tuning, maintaining high performance even when the core model parameters remained unchanged.

    This paper introduced a strategy that rethinks how language models can be scaled. Instead of inflating model size or inference steps, it focuses on efficiently reusing existing computation. The researchers’ approach addresses time and memory inefficiencies while maintaining or improving performance. This demonstrates a compelling shift in scaling methods and sets a direction for deploying advanced models in constrained environments using parallel computation effectively.


Check out the Paper. All credit for this research goes to the researchers of this project.

