    ChuXin: A Fully Open-Sourced Language Model with a Size of 1.6 Billion Parameters

    May 11, 2024

The capacity of large language models (LLMs) to produce high-quality text across many application domains has driven a revolution in natural language generation. Open models essentially come in two types: 1) those that release only the model weights and data sources, and 2) those that make all model-related information publicly available, including training data, data sampling ratios, training logs, intermediate checkpoints, and evaluation methods (e.g., TinyLlama, OLMo, and StableLM 1.6B). Full access to open language models is vital for the research community to thoroughly investigate these models’ capabilities and limitations and to understand their inherent biases and potential risks, and this remains true despite the continued improvements in the performance of community-released models.

Meet ChuXin 1.6B, a 1.6-billion-parameter open-source language model. It was trained on 2.3 trillion tokens of open-source data drawn from a variety of sources, including encyclopedias, online publications, and public knowledge bases in English and Chinese. The project takes inspiration from other open-source efforts such as OLMo, TinyLlama, and StableLM 1.6B. To reach an input length of 1 million tokens, the researchers extended ChuXin’s context window by continuing pre-training on datasets derived from longer texts. They argue that cultivating a broad and diverse ecosystem of such models is the best way to improve the scientific understanding of open language models and to drive the technical advances that make them more practical.

For the backbone, the team used the LLaMA2 architecture, scaled to about 1.6 billion parameters. The main design choices of ChuXin 1.6B, as described by the researchers, are summarized below.
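
As a rough illustration of scale, here is a hedged sketch of a LLaMA2-style configuration built with Hugging Face’s LlamaConfig. All hyperparameters (hidden size, layer count, head count, intermediate size) are hypothetical choices that land near 1.6 billion parameters with the 102,400-token vocabulary described below; they are not values reported for ChuXin.

```python
# Hypothetical LLaMA2-style configuration at roughly 1.6B parameters.
# None of these hyperparameters are taken from the ChuXin paper.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=102_400,           # matches the tokenizer vocabulary described below
    hidden_size=2048,             # hypothetical
    intermediate_size=5632,       # hypothetical SwiGLU inner width
    num_hidden_layers=24,         # hypothetical
    num_attention_heads=16,       # hypothetical
    max_position_embeddings=4096,
    rms_norm_eps=1e-6,
    tie_word_embeddings=False,
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e9:.2f}B parameters")
```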

Rotary positional embeddings (RoPE): The model uses the rotary positional embedding (RoPE) technique to capture the relationships between tokens at different positions in the sequence.
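
As a minimal sketch of the technique (the standard rotate-half formulation, not code from the ChuXin release), RoPE rotates each query/key dimension pair by a position-dependent angle so that attention scores depend on relative offsets:

```python
# Minimal RoPE sketch (standard rotate-half formulation).
import torch

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0):
    # One frequency per pair of dimensions, one angle per (position, frequency).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)        # (seq_len, head_dim/2)
    return torch.cat([angles, angles], dim=-1)       # (seq_len, head_dim)

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    half = x.shape[-1] // 2
    return torch.cat([-x[..., half:], x[..., :half]], dim=-1)

def apply_rope(q: torch.Tensor, k: torch.Tensor, angles: torch.Tensor):
    # q, k: (batch, n_heads, seq_len, head_dim). The rotation encodes absolute
    # position so that q·k dot products depend only on relative distance.
    cos, sin = angles.cos(), angles.sin()
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot
```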

RMSNorm: Pre-normalization, which normalizes the input before each sub-layer of the transformer, gives a more stable training process. For this normalization strategy the model uses RMSNorm, which also improves training efficiency.
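
A minimal RMSNorm sketch is shown below; the epsilon value is an assumption rather than a figure from the paper:

```python
# Minimal RMSNorm sketch, as used for pre-normalization before each sub-layer.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root-mean-square instead of subtracting a mean.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms
```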

Attention mask: Following StableLM’s lead, the team implemented a block-diagonal attention mask that resets the mask at EOS (end-of-sequence) tokens within packed sequences. This prevents attention from crossing document boundaries and further improves the model’s performance during its cool-down phase.
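
A minimal sketch of how such a mask can be built from EOS positions is shown below; the EOS token id and example sequence are hypothetical, and this illustrates the general technique rather than the authors’ implementation:

```python
# Block-diagonal causal mask that "resets" at EOS tokens, so packed documents
# cannot attend to one another. The EOS id below is a hypothetical example.
import torch

def block_diagonal_causal_mask(token_ids: torch.Tensor, eos_id: int) -> torch.Tensor:
    # token_ids: (seq_len,) packed sequence containing several documents.
    seq_len = token_ids.shape[0]
    # Assign a document index that increments right after every EOS token.
    doc_ids = torch.cumsum((token_ids == eos_id).long(), dim=0)
    doc_ids = torch.cat([torch.zeros(1, dtype=torch.long), doc_ids[:-1]])
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)          # (L, L)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return same_doc & causal  # True where attention is allowed

ids = torch.tensor([5, 7, 2, 9, 4, 2, 8])  # 2 = hypothetical EOS id
print(block_diagonal_causal_mask(ids, eos_id=2).int())
```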

Tokenizer: The data was tokenized with the DeepSeek LLM tokenizer, which is based on Byte-level Byte-Pair Encoding (BBPE) from the tokenizers library. The vocabulary contains 102,400 tokens, and the tokenizer was trained on a 24 GB multilingual corpus. In addition, the tokenizer improves the encoding of numerical data by splitting numbers into individual digits.
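
To illustrate the general recipe (not the actual DeepSeek tokenizer configuration), a byte-level BPE tokenizer with digit splitting can be sketched with the tokenizers library; the corpus file and special tokens are placeholders:

```python
# Hedged sketch: training a byte-level BPE tokenizer that splits numbers into
# individual digits. Not the real DeepSeek LLM tokenizer configuration.
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.Sequence([
    pre_tokenizers.Digits(individual_digits=True),    # "12345" -> "1" "2" "3" "4" "5"
    pre_tokenizers.ByteLevel(add_prefix_space=False),  # byte-level (BBPE) pieces
])
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=102_400,                                # vocabulary size from the paper
    special_tokens=["<s>", "</s>"],                    # hypothetical special tokens
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # placeholder corpus path

print(tokenizer.encode("price: 12345").tokens)
```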

Activation function: The team used SwiGLU as the activation function.
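
A minimal sketch of a SwiGLU feed-forward block (the LLaMA-style gated MLP) is shown below; the projection layout is the standard one and is not taken from the paper:

```python
# Minimal SwiGLU feed-forward block sketch (LLaMA-style gated MLP).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU(x) = down( SiLU(gate(x)) * up(x) )
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```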

The team’s training process used only pre-training datasets obtained from HuggingFace, which makes it easier for others to reproduce the pre-trained model. They trained from scratch with a 4,096-token context length and optimized training speed with several efficient implementations, starting with FlashAttention-2 to increase device throughput. Training ran in BFloat16 mixed precision, with all-reduce operations kept in FP32. The researchers report little difference in loss between training on unique data and training on data repeated over several epochs, so they trained for two epochs on 2 trillion (2T) tokens.
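
A hedged sketch of the BF16-compute / FP32-all-reduce setup using PyTorch FSDP is shown below; the wrapping strategy and surrounding training loop are assumptions, not the authors’ actual code:

```python
# Minimal sketch of BF16 mixed precision with FP32 gradient all-reduce via
# PyTorch FSDP; assumes torch.distributed has already been initialized.
# An illustration of the precision policy, not the ChuXin training code.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

def wrap_for_bf16_training(model: torch.nn.Module) -> FSDP:
    precision_policy = MixedPrecision(
        param_dtype=torch.bfloat16,   # forward/backward compute in BF16
        reduce_dtype=torch.float32,   # gradient all-reduce kept in FP32
        buffer_dtype=torch.bfloat16,
    )
    return FSDP(model, mixed_precision=precision_policy)
```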

To test the model’s performance on Chinese tasks, the team uses CMMLU and C-Eval, two benchmarks for Chinese comprehension and reasoning, and HumanEval to test how well the model generates code. ChuXin’s pre-training progress was also tracked on commonsense-reasoning benchmarks. The results show that, except on OpenBookQA, performance on most tasks improves as the number of training tokens increases.
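
For readers who want to run this kind of evaluation themselves, a hedged sketch using EleutherAI’s lm-evaluation-harness is shown below; the checkpoint path is a placeholder, the task names should be checked against the installed harness version, and this is not the authors’ evaluation pipeline:

```python
# Hedged sketch of benchmark evaluation with EleutherAI's lm-evaluation-harness.
# The model path is a placeholder and task names may differ across harness versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/chuxin-1.6b,dtype=bfloat16",  # placeholder path
    tasks=["cmmlu", "ceval-valid", "openbookqa"],                # illustrative tasks
    batch_size=8,
)
print(results["results"])
```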

In the future, the team envisions releasing larger and more advanced models, incorporating features such as instruction tuning and multi-modal integration. They also plan to share the challenges they faced and the solutions they devised while developing ChuXin, aiming to inspire the open-source community and stimulate further progress in language modeling.

Check out the Paper. All credit for this research goes to the researchers of this project.
