AMD has recently introduced its new language model, AMD-135M (also released as AMD-Llama-135M), a significant addition to the landscape of small language models. Based on the LLaMA2 model architecture, the model packs 135 million parameters and was trained from scratch on AMD Instinct MI250 accelerators. This release marks a crucial milestone for AMD in its endeavor to establish a strong foothold in the competitive AI industry.
Background and Technical Specifications
The AMD-135M is built on the LLaMA2 model architecture and integrates features that support a range of applications, particularly text generation and language comprehension. The model is designed to work seamlessly with the Hugging Face Transformers library, making it accessible to developers and researchers. With a hidden size of 768, 12 layers (blocks), and 12 attention heads, it can handle complex tasks while maintaining high efficiency. The activation function is SwiGLU, and layer normalization is based on RMSNorm. Positional information is encoded with rotary positional embeddings (RoPE), enhancing the model's ability to understand and generate contextual information accurately.
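For readers who want to see how these hyperparameters fit together, here is a minimal sketch of the architecture expressed as a Hugging Face `LlamaConfig`. The vocabulary size and feed-forward (intermediate) size are not stated in the article; the values below are assumptions chosen to land near the 135M-parameter count.

```python
# Minimal sketch: the architecture described above as a Hugging Face LlamaConfig.
# vocab_size and intermediate_size are assumptions (not stated in the article).
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=768,               # hidden size from the article
    num_hidden_layers=12,          # 12 decoder blocks
    num_attention_heads=12,        # 12 attention heads
    max_position_embeddings=2048,  # 2048-token context window
    hidden_act="silu",             # SiLU gating in the Llama MLP, i.e. SwiGLU
    vocab_size=32000,              # assumed LLaMA2 tokenizer vocabulary
    intermediate_size=2048,        # assumed feed-forward width
)

# RMSNorm and RoPE are built into the Llama architecture in Transformers.
# This builds a randomly initialized model of the same shape; the released
# checkpoint should instead be loaded with from_pretrained (see below).
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```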
The release of this model is not just about the hardware specifications but also about the software and datasets that power it. AMD-135M has been pretrained on two key datasets: SlimPajama and Project Gutenberg. SlimPajama is a deduplicated version of RedPajama, which draws on sources such as Common Crawl, C4, GitHub, Books, ArXiv, Wikipedia, and StackExchange. The Project Gutenberg dataset provides access to a vast repository of classical texts, enabling the model to grasp varied language structures and vocabularies.
Key Features of AMD-135M
AMD-135M has remarkable features that set it apart from other models in the market. Some of these key features include:
Parameter Size: 135 million parameters, allowing for efficient processing and generation of text.
Number of Layers: 12 layers with 12 attention heads for in-depth analysis and contextual understanding.
Hidden Size: 768, offering the capability to handle various language modeling tasks.
Attention Type: Multi-Head Attention, enabling the model to focus on different aspects of the input data simultaneously.
Context Window Size: 2048 tokens, allowing the model to handle longer input sequences effectively.
Pretraining and Finetuning Datasets: The SlimPajama and Project Gutenberg datasets are utilized for pretraining, and the StarCoder dataset is used for finetuning, ensuring comprehensive language understanding.
Training Configuration: The model employs a learning rate of 6e-4 with a cosine learning-rate schedule (as sketched below), and it was trained over multiple epochs for effective pretraining and finetuning.
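As a rough illustration of that training configuration, the snippet below sets up a 6e-4 cosine learning-rate schedule with the Transformers scheduler helper. The warmup length and total step count are placeholders, not values reported by AMD, and the model here is a stand-in.

```python
# Sketch of a 6e-4 cosine learning-rate schedule using transformers' helper.
# Warmup and total step counts are illustrative assumptions.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(768, 768)  # stand-in for the actual 135M model
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=2_000,      # assumed; not stated in the article
    num_training_steps=100_000,  # assumed; not stated in the article
)

for step in range(3):
    # forward/backward pass of the real training loop would go here
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
print(scheduler.get_last_lr())  # learning rate ramping up during warmup
```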
Deployment and Usage
The AMD-135M can be easily deployed and used through the Hugging Face Transformers library: the model is loaded with the `LlamaForCausalLM` class and its tokenizer with `AutoTokenizer`. This ease of integration makes it a favorable option for developers looking to incorporate language modeling capabilities into their applications. Additionally, the model can serve as a draft model for speculative decoding with CodeLlama, further extending its usability for code generation tasks. This makes AMD-135M particularly useful for developers working on programming-related text generation and other NLP applications.
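The snippet below is a minimal sketch of that workflow. The repository id `amd/AMD-Llama-135m`, the prompt, and the CodeLlama target model name are assumptions; check the official Hugging Face model card for the exact identifiers.

```python
# Minimal sketch: loading and running AMD-135M with Hugging Face Transformers.
# The repo id "amd/AMD-Llama-135m" is assumed; verify it on the model card.
from transformers import AutoTokenizer, LlamaForCausalLM

repo_id = "amd/AMD-Llama-135m"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = LlamaForCausalLM.from_pretrained(repo_id)

prompt = "The key advantage of small language models is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For speculative decoding, a larger target model passes the 135M model as its
# assistant (draft) model, assuming the two share a compatible tokenizer.
# The target model name below is illustrative, not confirmed by the article:
# target = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
# outputs = target.generate(**inputs, assistant_model=model, max_new_tokens=64)
```

The idea behind speculative decoding is that the small model cheaply drafts several candidate tokens, which the larger target model then verifies in a single forward pass, speeding up generation without changing the target model's outputs.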
Performance Evaluation
The performance of AMD-135M has been evaluated with the lm-evaluation-harness on various NLP benchmarks, such as SciQ, WinoGrande, and PIQA. The results indicate that the model is highly competitive, offering performance comparable to other models in its parameter range. For instance, it achieved a pass rate of approximately 32.31% on the HumanEval benchmark using MI250 GPUs, a strong result for a model of this size. This shows that AMD-135M can serve as a reliable model for both research and commercial applications in natural language processing.
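For readers who want to run a similar evaluation, here is a sketch using the lm-evaluation-harness Python API (v0.4+ assumed). The repository id and task selection are assumptions based on the benchmarks named above, and locally obtained scores may differ from AMD's reported numbers.

```python
# Sketch: benchmarking the model with lm-evaluation-harness (v0.4+ API assumed).
# The repo id is assumed; task names match the benchmarks mentioned above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=amd/AMD-Llama-135m",
    tasks=["sciq", "winogrande", "piqa"],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```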
In conclusion, the release of AMD-135M underscores AMD’s commitment to advancing AI technologies and providing accessible, high-performance models for the research community. Its robust architecture and advanced training techniques position AMD-135M as a formidable competitor in the rapidly evolving landscape of AI models.
Check out the model on Hugging Face and the accompanying details. All credit for this research goes to the researchers of this project.