    The Evolution of Chinese Large Language Models (LLMs)

    June 11, 2024

Pre-trained language model development has advanced significantly in recent years, especially with the advent of large-scale models. For languages such as English, there is no shortage of open-source chat models. The Chinese language, however, has not seen equivalent progress. To bridge this gap, several Chinese models have been introduced, showcasing innovative approaches and achieving remarkable results. This article surveys some of the most prominent Chinese Large Language Models (LLMs).

    Yi 

The Yi model family is well known for its multidimensional capabilities, spanning basic language models to multimodal applications. The Yi models, available in 34B and 6B parameter versions, perform well on benchmarks such as MMLU. The vision-language models in this family align visual representations with the language model's semantic space, built on careful data engineering and scalable supercomputing infrastructure. Pre-training on a massive 3.1 trillion token corpus underpins reliable results and strong performance across a range of tasks.

    HF Page: https://huggingface.co/01-ai

    GitHub Page: https://github.com/01-ai/Yi
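
A minimal sketch of loading a Yi checkpoint with Hugging Face transformers follows; the "01-ai/Yi-6B" repo id (linked from the HF page above) and the generation settings are illustrative assumptions, not an official quickstart.

```python
# Minimal sketch: loading a Yi base model with Hugging Face transformers.
# The repo id "01-ai/Yi-6B" is assumed from the family's HF page; the 34B
# variant follows the same pattern.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```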

    QWEN

QWEN is a comprehensive collection of language models, comprising base pre-trained models and fine-tuned chat models. The QWEN series performs exceptionally well across a variety of downstream tasks. The chat models stand out in particular for their use of Reinforcement Learning from Human Feedback (RLHF), exhibiting sophisticated tool-use and planning skills that keep them competitive even against larger models. Specialized variants such as CODE-QWEN and MATH-QWEN-CHAT, which excel at coding and mathematics tasks, demonstrate the series' versatility.

    HF Page: https://huggingface.co/Qwen/Qwen-14B

    GitHub Page: https://github.com/QwenLM/Qwen
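
A chat sketch following the project's README; the "Qwen/Qwen-14B-Chat" repo id and the model.chat() helper come from Qwen's own remote modeling code, so treat both as assumptions to verify against the repo.

```python
# Sketch of chatting with a QWEN chat model. First-generation Qwen
# checkpoints ship their own modeling code, hence trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-14B-Chat"  # assumed chat variant of the base model above
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
).eval()

# model.chat() returns the reply plus the updated conversation history.
response, history = model.chat(tokenizer, "What is RLHF?", history=None)
print(response)
```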

    DeepSeek-V2

DeepSeek-V2 is a mixture-of-experts (MoE) model that balances strong performance with cost-effective operation. It supports a context length of 128K tokens and comprises 236B total parameters, of which only 21B are activated per token. Through its DeepSeekMoE and Multi-head Latent Attention (MLA) architectures, the model achieves notable efficiency gains, cutting training costs by 42.5% and increasing throughput.

    GitHub Page: https://github.com/deepseek-ai/DeepSeek-V2
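
To make the "only 21B of 236B parameters active per token" idea concrete, here is a toy top-k routed MoE layer in PyTorch. It is a conceptual sketch of sparse expert routing in general, not DeepSeekMoE itself; all dimensions are made up.

```python
# Toy MoE layer: a router scores experts per token and only the top-k
# experts actually run, so most parameters stay inactive for any one token.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x).softmax(dim=-1)     # (tokens, experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                  # run just the chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE(dim=16)
tokens = torch.randn(4, 16)
print(moe(tokens).shape)  # torch.Size([4, 16])
```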

    WizardLM

WizardLM uses LLMs rather than manual human input to overcome the difficulty of creating high-complexity instruction data. Using a technique called Evol-Instruct, the model iteratively rewrites instructions to increase their complexity. Fine-tuning LLaMA on this AI-generated data produces WizardLM, whose instructions outperform human-created ones in human evaluations. The model also compares favorably with OpenAI's ChatGPT.

    GitHub Page: https://github.com/nlpxucan/WizardLM
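
A conceptual sketch of an Evol-Instruct-style loop follows: an LLM repeatedly rewrites an instruction to make it harder. The generate() callable and the prompt templates are hypothetical placeholders, not the paper's exact prompts.

```python
# Evol-Instruct-style evolution loop (conceptual sketch). Each round asks
# an LLM to rewrite the latest instruction into a more complex variant.
import random

EVOLVE_TEMPLATES = [
    "Add one more constraint or requirement to this instruction:\n{instruction}",
    "Rewrite this instruction so it requires multi-step reasoning:\n{instruction}",
    "Make this instruction more specific by adding concrete details:\n{instruction}",
]

def evolve(instruction: str, generate, rounds: int = 3) -> list[str]:
    """Return the chain of progressively harder instructions."""
    chain = [instruction]
    for _ in range(rounds):
        prompt = random.choice(EVOLVE_TEMPLATES).format(instruction=chain[-1])
        chain.append(generate(prompt))  # the LLM produces the harder variant
    return chain

# Usage with any text-generation function (API client, local model, ...):
# harder = evolve("Write a function that reverses a string.", my_llm_generate)
```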

    GLM-130B

With 130 billion parameters, the bilingual (English and Chinese) GLM-130B model competes with GPT-3 (davinci) in performance. Despite various technological obstacles during training, GLM-130B beats ERNIE TITAN 3.0 on Chinese benchmarks and outperforms several key models on English benchmarks. Thanks to a special scaling property that enables INT4 quantization after training without performance loss, it is a highly effective option for large-scale model deployment.

    GitHub Page: https://github.com/THUDM/GLM-130B
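
The INT4 property mentioned above boils down to simple arithmetic: weights are mapped into the signed 4-bit range and rescaled at inference. Below is a generic round-trip sketch of that arithmetic, not GLM-130B's actual kernels.

```python
# Post-training INT4 weight quantization, illustrated: scale weights into
# the signed 4-bit range [-8, 7], round, then rescale on the fly.
import numpy as np

def quantize_int4(w: np.ndarray):
    scale = np.abs(w).max() / 7.0  # map the largest-magnitude weight to +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int4(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```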

    CogVLM

CogVLM is a sophisticated visual language model whose architecture deeply integrates vision and language. In contrast to shallow alignment techniques, CogVLM uses a trainable visual expert module and achieves state-of-the-art performance across several cross-modal benchmarks. The variety of applications it supports, including visual grounding and image captioning, demonstrates its strong performance and versatility.

    HF Page: https://huggingface.co/THUDM/CogVLM

    GitHub Page: https://github.com/THUDM/CogVLM

    Baichuan-7B

Using 4-bit weights and 16-bit activations, the Baichuan-7B models are optimized for on-device deployment and reach state-of-the-art performance on Chinese and English benchmarks. This quantization makes Baichuan-7B suitable for a wide range of uses, ensuring efficient operation in practical settings.

    HF Page: https://huggingface.co/baichuan-inc/Baichuan-7B

    InternLM

InternLM, a 100B-parameter multilingual model trained on over a trillion tokens, excels at Chinese, English, and coding tasks. Refined with high-quality human-annotated dialogue data and RLHF, InternLM produces responses consistent with morality and human values, making it a strong option for intricate exchanges.

    HF Page: https://huggingface.co/internlm

    GitHub Page: https://github.com/InternLM/InternLM
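
The RLHF step mentioned above depends on a reward model trained from human preference pairs. As a generic illustration (not InternLM's training code), the standard pairwise Bradley-Terry loss looks like this:

```python
# Pairwise reward-model loss used in RLHF pipelines: the reward assigned
# to the human-preferred response should exceed that of the rejected one.
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(r_chosen - r_rejected).mean()

r_chosen = torch.tensor([1.2, 0.4])    # reward scores for preferred answers
r_rejected = torch.tensor([0.3, 0.9])  # reward scores for rejected answers
print(preference_loss(r_chosen, r_rejected))
```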

    Skywork-13B

With 3.2 trillion training tokens under its belt, Skywork-13B is among the most extensively trained bilingual models. A two-stage training technique helps it perform well on both general-purpose and domain-specific tasks. The work also addresses data contamination concerns, presenting a novel leakage detection method with the goal of democratizing access to high-quality LLMs.

    GitHub Page: https://github.com/SkyworkAI/Skywork
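
As rough intuition for what leakage detection guards against, the toy n-gram overlap check below flags benchmark text that also appears verbatim in a training corpus. Skywork's actual method is more sophisticated; treat this purely as an illustration of the general idea.

```python
# Toy contamination check: what fraction of a benchmark sample's n-grams
# also occur in the training corpus? High overlap suggests leakage.
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(benchmark: str, corpus: str, n: int = 8) -> float:
    bench = ngrams(benchmark, n)
    return len(bench & ngrams(corpus, n)) / max(len(bench), 1)

corpus = "the quick brown fox jumps over the lazy dog near the river bank today"
sample = "the quick brown fox jumps over the lazy dog near the river"
print(f"{contamination_rate(sample, corpus):.2f}")  # 1.00 -> fully contaminated
```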

    ChatTTS

ChatTTS is a generative text-to-speech model that supports both Chinese and English dialogue scenarios. Trained on more than 100,000 hours of speech data, it produces highly accurate and natural-sounding speech.

    GitHub Page: https://github.com/cronrpc/ChatTTS-webui
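
A basic inference sketch follows, based on the upstream ChatTTS project's README; the Chat.load/Chat.infer API has changed between releases, so treat the exact calls as assumptions to check against the version you install.

```python
# Sketch of basic ChatTTS inference (API names may differ across versions).
import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load()  # downloads/loads the pretrained models

texts = ["Hello, this is a ChatTTS demo.", "你好，这是一段中文示例。"]
wavs = chat.infer(texts)  # one waveform (numpy array) per input text

# ChatTTS outputs 24 kHz audio; reshape to (channels, frames) for saving.
torchaudio.save("output.wav", torch.from_numpy(wavs[0]).reshape(1, -1), 24000)
```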

    Hunyuan-DiT

Hunyuan-DiT is a text-to-image diffusion transformer with exceptionally fine-grained understanding of both Chinese and English. The model's architecture, including its positional encoding, text encoder, and transformer structure, is carefully crafted to maximize performance. Hunyuan-DiT also benefits from an extensive data pipeline that supports iterative model optimization through ongoing evaluation and refinement. Image captions are refined with a Multimodal Large Language Model to improve language comprehension, which also allows Hunyuan-DiT to take part in multi-turn multimodal conversations. Human evaluations confirm that the model sets a new state of the art in Chinese-to-image generation.
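
For experimentation, the diffusers library ships a HunyuanDiTPipeline; the sketch below assumes the "Tencent-Hunyuan/HunyuanDiT-Diffusers" repo id from the project's documentation, a recent diffusers release, and a CUDA device.

```python
# Text-to-image generation with Hunyuan-DiT via diffusers (sketch).
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

# The model understands fine-grained Chinese prompts as well as English ones.
image = pipe(prompt="一只穿着宇航服的柴犬，水彩画风格").images[0]
image.save("hunyuan_dit_sample.png")
```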

    ERNIE 3.0 

ERNIE 3.0 addresses the limitations of conventional pre-trained models that rely on plain text alone without incorporating further knowledge. The model performs well on both natural language understanding and generation tasks thanks to its combined architecture of auto-regressive and auto-encoding networks. Trained on a 4TB plaintext corpus and a large-scale knowledge graph, the 10-billion-parameter model beats state-of-the-art models on 54 Chinese natural language processing tasks. Its English counterpart topped the SuperGLUE benchmark, even surpassing human performance.

    HF Page: https://huggingface.co/nghuyong/ernie-3.0-base-zh
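A minimal feature-extraction sketch using the checkpoint on the HF page above, which loads directly with transformers; the example sentence is illustrative.

```python
# Loading ERNIE 3.0's base Chinese checkpoint and extracting hidden states.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-3.0-base-zh")
model = AutoModel.from_pretrained("nghuyong/ernie-3.0-base-zh")

inputs = tokenizer("百度开发的预训练语言模型", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_dim)
```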

And many more…

    The post The Evolution of Chinese Large Language Models (LLMs) appeared first on MarkTechPost.
