In recent research, a team from IEIT Systems has developed Yuan 2.0-M32, a language model built on the Mixture of Experts (MoE) architecture. It shares its base design with Yuan 2.0-2B but is distinguished by its use of 32 experts, only two of which are active for any given token, which keeps the model's computational structure efficient.
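To make the "32 experts, 2 active per token" pattern concrete, here is a minimal sketch of a top-2 MoE layer in PyTorch. The class name, layer sizes, and the plain linear gate are illustrative assumptions rather than the model's actual implementation; Yuan 2.0-M32 replaces this kind of linear gate with its Attention Router, described below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Illustrative top-2 MoE layer: 32 experts, 2 active per token (sizes are hypothetical)."""
    def __init__(self, d_model=2048, d_ff=8192, num_experts=32, top_k=2):
        super().__init__()
        self.top_k = top_k
        # A plain linear gate is used here; Yuan 2.0-M32 swaps this for an Attention Router.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (num_tokens, d_model)
        logits = self.gate(x)                    # (num_tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the two chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # only the selected experts are evaluated
            for e in idx[:, k].unique().tolist():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out
```

With two of 32 experts selected, each token touches only 2/32 = 1/16 of the expert parameters (shared attention and embedding weights aside), which lines up with the training-compute savings discussed later in the article.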
In contrast to conventional router networks, the model introduces an Attention Router that improves expert selection and increases overall accuracy. Yuan 2.0-M32 was trained from scratch on 2,000 billion tokens, yet its training computation amounted to only 9.25% of that required by a dense model at the same parameter scale.
In terms of performance, Yuan 2.0-M32 shows strong capability across a range of areas, including mathematics and coding. It activates just 3.7 billion of its 40 billion total parameters per token and uses 7.4 GFlops of forward computation per token, roughly 1/19th of the Llama3-70B model's requirements (70 billion parameters divided by 3.7 billion is about 19).
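As a rough back-of-the-envelope check on these figures (assuming, purely for illustration, that compute scales in proportion to the number of active parameters):

```python
# Illustrative arithmetic only; the proportionality assumption is ours, not the paper's.
active_params = 3.7e9   # active parameters per token, Yuan 2.0-M32
total_params  = 40e9    # total parameters, Yuan 2.0-M32
llama3_params = 70e9    # dense Llama3-70B parameters

print(active_params / total_params)    # 0.0925 -> matches the reported 9.25% training-compute figure
print(llama3_params / active_params)   # ~18.9  -> matches the reported "1/19th of Llama3-70B"
```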
Yuan 2.0-M32 also performs well on benchmarks, surpassing Llama3-70B with scores of 55.89 on MATH and 95.8 on ARC-Challenge, despite its smaller active parameter count and smaller computational footprint.
An important development is Yuan 2.0-M32's adoption of the Attention Router. This routing mechanism improves the model's accuracy and performance by concentrating the selection process on the experts most relevant to each token. In contrast to traditional routing techniques, this approach to expert selection highlights the potential for greater accuracy and efficiency in MoE models.
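The article does not spell out the router's internals, but the sketch below shows one way an attention-style gate can "consider the correlation between experts" when scoring them. The projection names, shapes, and scoring formula are assumptions made for illustration, not the exact mechanism from the Yuan 2.0-M32 paper.

```python
import torch
import torch.nn as nn

class AttentionRouterSketch(nn.Module):
    """Hypothetical attention-style gate: each expert's score attends over all experts,
    so selecting one expert can depend on the others (illustrative only)."""
    def __init__(self, d_model=2048, num_experts=32, d_router=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.num_experts, self.d_router = num_experts, d_router
        # Project each token into per-expert query/key vectors and a per-expert value score.
        self.q = nn.Linear(d_model, num_experts * d_router, bias=False)
        self.k = nn.Linear(d_model, num_experts * d_router, bias=False)
        self.v = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x):                                    # x: (num_tokens, d_model)
        t = x.size(0)
        q = self.q(x).view(t, self.num_experts, self.d_router)
        k = self.k(x).view(t, self.num_experts, self.d_router)
        v = self.v(x).unsqueeze(-1)                          # (num_tokens, experts, 1)
        # Expert-to-expert attention: the logit for expert i mixes information from all experts.
        attn = torch.softmax(q @ k.transpose(1, 2) / self.d_router ** 0.5, dim=-1)
        logits = (attn @ v).squeeze(-1)                      # (num_tokens, experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        return torch.softmax(weights, dim=-1), idx           # gate weights + chosen experts
```

Compared with the independent linear gate in the earlier sketch, the attention step lets the score assigned to one expert reflect how the token relates to the other experts, which is the kind of inter-expert correlation the Attention Router is designed to capture.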
The team summarizes its primary contributions as follows:
The team presents the Attention Router, which considers the correlation between experts. Compared with conventional routing techniques, this method yields a notable gain in accuracy.
The team has created and released the Yuan 2.0-M32 model, which has 40 billion total parameters, 3.7 billion of which are active. The model uses a structure of 32 experts, of which only two are active for each token.
Yuan 2.0-M32's training is highly efficient, using only 1/16 of the computing power required by a dense model with a comparable number of parameters, while its inference cost is comparable to that of a dense 3.7-billion-parameter model. This keeps the model efficient and cost-effective both during training and in real-world deployment.
Check out the Paper, Model, and GitHub. All credit for this research goes to the researchers of this project.