    Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 are now available on SageMaker JumpStart

    December 7, 2024

    Today, we are excited to announce that Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407—twelve-billion-parameter large language models from Mistral AI that excel at text generation—are available for customers through Amazon SageMaker JumpStart. You can try these models with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference. In this post, we walk through how to discover, deploy, and use the Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 models for a variety of real-world use cases.

    Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 overview

    Mistral NeMo, a powerful 12B parameter model developed through collaboration between Mistral AI and NVIDIA and released under the Apache 2.0 license, is now available on SageMaker JumpStart. This model represents a significant advancement in multilingual AI capabilities and accessibility.

    Key features and capabilities

    Mistral NeMo features a 128k token context window, enabling processing of extensive long-form content. The model demonstrates strong performance in reasoning, world knowledge, and coding accuracy. Both pre-trained base and instruction-tuned checkpoints are available under the Apache 2.0 license, making it accessible for researchers and enterprises. The model’s quantization-aware training facilitates optimal FP8 inference performance without compromising quality.

    Multilingual support

    Mistral NeMo is designed for global applications, with strong performance across multiple languages including English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This multilingual capability, combined with built-in function calling and an extensive context window, helps make advanced AI more accessible across diverse linguistic and cultural landscapes.

    Tekken: Advanced tokenization

    The model uses Tekken, an innovative tokenizer based on tiktoken. Trained on over 100 languages, Tekken offers improved compression efficiency for natural language text and source code.
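    To see the effect in practice, you can compare token counts between Tekken and an earlier Mistral tokenizer with the Hugging Face transformers library. The following is a minimal sketch, assuming you have transformers installed and access to the mistralai/Mistral-Nemo-Instruct-2407 and mistralai/Mistral-7B-Instruct-v0.3 repositories on the Hugging Face Hub (both may require accepting their terms first):

    from transformers import AutoTokenizer

    # Tekken (Mistral NeMo) vs. the earlier v3 tokenizer, compared on a small code snippet.
    tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
    v3 = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

    sample = "def greet(name):\n    return f'Hello, {name}!'"

    print("Tekken tokens:", len(tekken.encode(sample)))
    print("v3 tokens:", len(v3.encode(sample)))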

    SageMaker JumpStart overview

    SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is the Model Hub, which offers a vast catalog of pre-trained models, such as DBRX, for a variety of tasks.

    You can now discover and deploy both Mistral NeMo models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and machine learning operations (MLOps) controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in a secure AWS environment and under your virtual private cloud (VPC) controls, helping to support data security.

    Prerequisites

    To try out both NeMo models in SageMaker JumpStart, you will need the following prerequisites:

    • An AWS account that will contain all your AWS resources.
    • An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, see Identity and Access Management for Amazon SageMaker.
    • Access to Amazon SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
    • Access to accelerated instances (GPUs) for hosting the model.
    • This model requires an ml.g6.12xlarge instance. SageMaker JumpStart provides a simplified way to access and deploy over 100 different open source and third-party foundation models. To launch an endpoint hosting Mistral NeMo from SageMaker JumpStart, you may need to request a service quota increase to access an ml.g6.12xlarge instance for endpoint usage. You can request service quota increases through the console, AWS Command Line Interface (AWS CLI), or API to allow access to those additional resources; a sketch of the API route follows this list.
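    As one example of the API route, the following sketch uses the Service Quotas API through boto3 to locate the relevant quota and file an increase request. The quota name filter string is an assumption; verify the exact quota name and code in your account before requesting:

    import boto3

    quotas = boto3.client("service-quotas")

    # Find the endpoint-usage quota for ml.g6.12xlarge (quota name assumed; verify in your account).
    paginator = quotas.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        for quota in page["Quotas"]:
            if "ml.g6.12xlarge for endpoint usage" in quota["QuotaName"]:
                print(quota["QuotaName"], quota["QuotaCode"], "current:", quota["Value"])
                if quota["Value"] < 1:
                    # Request capacity for at least one instance.
                    quotas.request_service_quota_increase(
                        ServiceCode="sagemaker",
                        QuotaCode=quota["QuotaCode"],
                        DesiredValue=1.0,
                    )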

    Discover Mistral NeMo models in SageMaker JumpStart

    You can access NeMo models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

    SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, see Amazon SageMaker Studio.

    In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.

    Then choose Hugging Face.

    From the SageMaker JumpStart landing page, you can search for NeMo in the search box. The search results will list Mistral NeMo Instruct and Mistral NeMo Base.

    You can choose the model card to view details about the model, such as its license, the data used to train it, and how to use it. You will also find the Deploy button to deploy the model and create an endpoint.

    Deploy the model in SageMaker JumpStart

    Deployment starts when you choose the Deploy button. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.

    Deploy the model with the SageMaker Python SDK

    To deploy using the SDK, we start by selecting the Mistral NeMo Base model, specified by the model_id with the value huggingface-llm-mistral-nemo-base-2407. You can deploy the model on SageMaker with the following code. Similarly, you can deploy NeMo Instruct using its own model ID.

    from sagemaker.jumpstart.model import JumpStartModel

    # The end-user license agreement (EULA) must be accepted explicitly to deploy this model.
    accept_eula = True

    # Mistral NeMo Base; use huggingface-llm-mistral-nemo-instruct-2407 for the Instruct variant.
    model = JumpStartModel(model_id="huggingface-llm-mistral-nemo-base-2407")
    predictor = model.deploy(accept_eula=accept_eula)
    

    This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel, as sketched below. The accept_eula value must be explicitly set to True to accept the end-user license agreement (EULA). Also make sure that your account-level service quota allows at least one ml.g6.12xlarge instance for endpoint usage; you can follow the instructions in AWS service quotas to request a service quota increase.
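    For example, the following minimal sketch pins the instance type from the prerequisites and names the endpoint; the endpoint name is our own choice for illustration:

    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(
        model_id="huggingface-llm-mistral-nemo-base-2407",
        instance_type="ml.g6.12xlarge",  # non-default: pin the hosting instance
    )
    predictor = model.deploy(
        accept_eula=True,
        endpoint_name="mistral-nemo-base-2407",  # non-default: explicit endpoint name
    )

    After the endpoint is in service, you can run inference against it through the SageMaker predictor: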

    payload = {
        "messages": [
            {
                "role": "user",
                "content": "Hello"
            }
        ],
        "max_tokens": 1024,
        "temperature": 0.3,
        "top_p": 0.9,
    }
    
    response = predictor.predict(payload)['choices'][0]['message']['content'].strip()
    print(response)
    

    An important thing to note here is that we’re using the djl-lmi v12 inference container, so we’re following the Large Model Inference (LMI) chat completions API schema when sending a payload to both Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407.
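    Because both endpoints accept the same chat completions schema, it can be convenient to wrap request and response handling in a small helper. The following is a minimal sketch; the function name and default sampling parameters are our own:

    def chat(predictor, content, max_tokens=1024, temperature=0.3, top_p=0.9):
        """Send a single user message to the endpoint and return the reply text."""
        payload = {
            "messages": [{"role": "user", "content": content}],
            "max_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
        }
        response = predictor.predict(payload)
        return response["choices"][0]["message"]["content"].strip()

    print(chat(predictor, "Hello"))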

    Mistral-NeMo-Base-2407

    You can interact with the Mistral-NeMo-Base-2407 model like other standard text generation models, where the model processes an input sequence and outputs predicted next words in the sequence. In this section, we provide some example prompts and sample output. Keep in mind that the base model is not instruction fine-tuned.

    Text completion

    Tasks involving predicting the next token or filling in missing tokens in a sequence:

    payload = {
        "messages": [
            {
                "role": "user",
                "content": "The capital of France is ___."
            }
        ],
        "max_tokens": 10,
        "temperature": 0.3,
        "top_p": 0.9,
    }
    
    response = predictor.predict(payload)['choices'][0]['message']['content'].strip()
    print(response)
    

    The following is the output:

    Paris
    The capital of France is Paris.
    

    Mistral NeMo Instruct

    The Mistral-NeMo-Instruct-2407 model is a quick demonstration that the base model can be fine-tuned to achieve compelling performance. You can follow the steps provided to deploy the model and use the model_id value of huggingface-llm-mistral-nemo-instruct-2407 instead.
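    For example, a minimal deployment sketch for the Instruct variant looks like this:

    from sagemaker.jumpstart.model import JumpStartModel

    # Deploy the instruction-tuned variant; this creates a second endpoint,
    # separate from the base model deployment, and the EULA must again be accepted.
    model = JumpStartModel(model_id="huggingface-llm-mistral-nemo-instruct-2407")
    predictor = model.deploy(accept_eula=True)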

    The instruction-tuned NeMo model can be tested with the following tasks:

    Code generation

    Mistral NeMo Instruct demonstrates benchmarked strengths for coding tasks. Mistral states that its Tekken tokenizer for NeMo is approximately 30% more efficient at compressing source code. For example, see the following code:

    payload = {
        "messages": [
            {
                "role": "user",
                "content": """Create a Binary Search Tree class with methods for insertion, searching, and in-order traversal."""
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.3,
        "top_p": 0.9,
    }
    # Call the endpoint once and reuse the result; calling predict() twice would
    # run two separate generations and waste tokens.
    result = predictor.predict(payload)
    text_response = result['choices'][0]['message']['content'].strip()
    tokens = result['usage']
    print(text_response)
    print(tokens)
    

    The following is the output:

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
    
    class BinarySearchTree:
        def __init__(self):
            self.root = None
    
        def insert(self, key):
            if not self.root:
                self.root = Node(key)
            else:
                self._insert(self.root, key)
    
        def _insert(self, node, key):
            if key < node.key:
                if node.left:
                    self._insert(node.left, key)
                else:
                    node.left = Node(key)
            elif key > node.key:
                if node.right:
                    self._insert(node.right, key)
                else:
                    node.right = Node(key)
    
        def search(self, key):
            return self._search(self.root, key)
    
        def _search(self, node, key):
            if not node or node.key == key:
                return node
    
            if key < node.key:
                return self._search(node.left, key)
            else:
                return self._search(node.right, key)
    
        def inorder_traversal(self):
            self._inorder_traversal(self.root)
            print()
    
        def _inorder_traversal(self, node):
            if node:
                self._inorder_traversal(node.left)
                print(node.key, end=" ")
                self._inorder_traversal(node.right)
    
    # Example usage:
    bst = BinarySearchTree()
    bst.insert(50)
    bst.insert(30)
    bst.insert(20)
    bst.insert(40)
    bst.insert(70)
    bst.insert(60)
    bst.insert(80)
    
    print("In-order traversal:")
    bst.inorder_traversal()  # Output: 20 30 40 50 60 70 80
    
    print(f"Search 40: {bst.search(40).key if bst.search(40) else 'Not found'}")
    print(f"Search 90: {bst.search(90).key if bst.search(90) else 'Not found'}")
    {'prompt_tokens': 22, 'completion_tokens': 433, 'total_tokens': 455}
    

    The model demonstrates strong performance on code generation tasks, with the completion_tokens count offering insight into how the tokenizer’s code compression represents programming languages using fewer tokens.
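    As a rough way to quantify this from the response itself, you can relate the characters generated to the tokens consumed using the usage statistics returned above:

    # Approximate characters-per-token ratio from the usage stats above.
    # Higher values suggest denser packing of source code per token.
    chars_per_token = len(text_response) / tokens['completion_tokens']
    print(f"~{chars_per_token:.1f} characters per completion token")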

    Advanced math and reasoning

    The model also reports strengths in mathematical and reasoning accuracy. For example, see the following code:

    payload = {
        "messages": [
            # The system and user turns must be separate message objects; merging them
            # into one dict would silently drop the system prompt (duplicate keys).
            {
                "role": "system",
                "content": "You are an expert in mathematics and reasoning. Your role is to provide examples, explanations, and insights related to mathematical concepts, problem-solving techniques, and logical reasoning."
            },
            {
                "role": "user",
                "content": """Calculating the orbital period of an exoplanet:
                 Given: An exoplanet orbits its star at a distance of 2.5 AU (Astronomical Units). The star has a mass of 1.2 solar masses.
                 Task: Calculate the orbital period of the exoplanet in Earth years."""
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.3,
        "top_p": 0.9,
    }
    response = predictor.predict(payload)['choices'][0]['message']['content'].strip()
    print(response)
    

    The following is the output:

    To calculate the orbital period of an exoplanet, we can use Kepler's Third Law, which states that the square of the orbital period (P) is directly proportional to the cube of the semi-major axis (a) of the orbit and inversely proportional to the mass (M) of the central body. The formula is:
    
    P^2 = (4 * π^2 * a^3) / (G * M)
    
    where:
    - P is the orbital period in years,
    - a is the semi-major axis in AU (Astronomical Units),
    - G is the gravitational constant (6.67430 × 10^-11 m^3 kg^-1 s^-2),
    - M is the mass of the star in solar masses.
    
    First, we need to convert the mass of the star from solar masses to kilograms. The mass of the Sun is approximately 1.98847 × 10^30 kg. So, the mass of the star is:
    
    M = 1.2 * 1.98847 × 10^30 kg = 2.386164 × 10^30 kg
    
    Now, we can plug the values into Kepler's Third Law:
    
    P^2 = (4 * π^2 * (2.5 AU)^3) / (G * M)
    
    Since 1 AU is approximately 1.496 × 10^11 meters, the semi-major axis in meters is:
    
    a = 2.5 AU * 1.496 × 10^11 m/AU = 3.74 × 10^12 m
    
    Now, we can calculate P^2:
    
    P^2 = (4 * π^2 * (3.74 × 10^12 m)^3) / (6.67430 × 10^-11 m^3 kg^-1 s^-2 * 2.386164 × 10^30 kg)
    
    P^2 = (4 * π^2 * 5.62 × 10^36 m^3) / (1.589 × 10^20 m^3 kg^-1 s^-2)
    
    P^2 = 4.15 × 10^16 s^2
    
    Now, we take the square root to find the orbital period in seconds:
    
    P = √(4.15 × 10^16 s^2) ≈ 2.04 × 10^8 s
    
    Finally, we convert the orbital period from seconds to Earth years (1 Earth year = 31,557,600 seconds):
    
    P = (2.04 × 10^8 s) / (31,557,600 s/year) ≈ 6.47 years
    
    Therefore, the orbital period of the exoplanet is approximately 6.47 Earth years.
    

    Language translation task

    In this task, let’s test Mistral’s new Tekken tokenizer. Mistral states that the tokenizer is two times and three times more efficient at compressing Korean and Arabic, respectively.

    Here, we use some text for translation:

    text= """
    "How can our business leverage Mistral NeMo with our new RAG application?"
    "What is our change management strategy once we roll out this new application to the field?
    """
    

    We set our prompt to instruct the model on the translation to Korean and Arabic:

    prompt=f"""
    
    text={text}
    
    Translate the following text into these languages:
    
    1. Korean
    2. Arabic
    
    Label each language section accordingly""".format(text=text)
    

    We can then set the payload:

    payload = {
        "messages": [
            {
                "role": "system",
                "content": "You are an expert in language translation."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.3,
        "top_p": 0.9,
    }
    # One request; reuse the response for both the translated text and the token usage.
    result = predictor.predict(payload)
    text_response = result['choices'][0]['message']['content'].strip()
    tokens = result['usage']
    print(text_response)
    print(tokens)
    

    The following is the output:

    **1. Korean**
    
    - "우리의 비즈니스가 Mistral NeMo를 어떻게 활용할 수 있을까요?"
    - "이 새 애플리케이션을 현장에 롤아웃할 때 우리의 변화 관리 전략은 무엇입니까?"
    
    **2. Arabic**
    
    - "كيف يمكن لعمليتنا الاست من Mistral NeMo مع تطبيق RAG الجديد؟"
    - "ما هو استراتيجيتنا في إدارة التغيير بعد تفعيل هذا التطبيق الجديد في الميدان؟"
    {'prompt_tokens': 61, 'completion_tokens': 243, 'total_tokens': 304}
    

    The translation results demonstrate how the number of completion_tokens used is significantly reduced, even for tasks that are typically token-intensive, such as translations involving languages like Korean and Arabic. This improvement is made possible by the optimizations provided by the Tekken tokenizer. Such a reduction is particularly valuable for token-heavy applications, including summarization, language generation, and multi-turn conversations. By enhancing token efficiency, the Tekken tokenizer allows for more tasks to be handled within the same resource constraints, making it an invaluable tool for optimizing workflows where token usage directly impacts performance and cost.

    Clean up

    After you’re done running the notebook, make sure to delete all resources that you created in the process to avoid additional billing. Use the following code:

    # Delete the model and endpoint to avoid ongoing charges.
    predictor.delete_model()
    predictor.delete_endpoint()

    Conclusion

    In this post, we showed you how to get started with Mistral NeMo Base and Instruct in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

    For more Mistral resources on AWS, check out the Mistral-on-AWS GitHub repository.


    About the authors

    Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics.

    Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.

    Shane Rai is a Principal Generative AI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML services provided by AWS, including model offerings from top tier foundation model providers.
