
Experimenting with Llama 3.1 – 405B Model with 128k window size (8B and 70B)

    July 25, 2024

    Meta has released its latest Llama iteration, Llama 3.1, which is by far the world’s largest and most capable openly available foundation model. With over 300 million total downloads of all Llama versions to date, this new release is poised to supercharge innovation and unlock opportunities for growth and exploration.

What is the Llama 3.1 405B variation?

Llama 3.1 is a large language model designed to rival the top AI models in state-of-the-art capabilities across general knowledge, steerability, math, tool use, and multilingual translation. Llama 3.1 brings huge improvements, including a new model with a whopping 405 billion parameters.

    Key Features of Llama 3.1

    The latest iteration of Llama comes with several key features that make it stand out from the competition. Some of the most notable features are –

    • Multilingual Support: Llama 3.1 supports multiple languages, which makes it a valuable tool for businesses and organizations that operate globally.
    • Long-Form Text Summarization: This model is capable of summarizing long-form text with ease (thanks to the expanded 128k context window), making it an ideal tool for researchers, journalists, and students.
    • Model Distillation: Llama 3.1 can distill complex models into smaller, more manageable versions, making it easier to train and deploy AI models in real-world scenarios.

Llama 3.1 has made headlines after introducing the new 405B parameter model. This huge model will most likely not fit on your laptop or desktop, but we can try our hands on the smaller models, i.e. the 70B and 8B.

    Llama 405B vs. 70B vs. 8B

The difference between the three variations is clear: the larger the model, the higher the quality of its results. The 405B model has more knowledge about the world, so it will most likely generate better results than its smaller variations. But the 405B model cannot be easily accessed.

The model is so large that the majority of its users will have to pay for cloud infrastructure to run it. We cannot run it on our own machines, even though it is openly available for download.

    We can, however, try our hands on 70B and 8B variations.

Llama 3.1 8B can be run on most modern laptops and desktops. Its smaller parameter count keeps the download at only about 5GB, and it requires a little over 5GB of memory to load.

💡
Please note that 8B models take over 5GB of memory to load. For me, it took 5.17GB. If you're running the Windows operating system (which itself requires a lot of memory), you should have 16GB RAM. You may be able to run it on less RAM but may experience lag.

If you have less than 16GB of memory, try a lightweight Linux distribution such as MX Linux and use Ollama.
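If you go the Ollama route, getting the 8B model running is typically just a couple of commands. The install script is Ollama's official one-liner for Linux, and the model tag assumes Ollama's published `llama3.1:8b` tag:

```shell
# Install Ollama using its official install script (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Download the ~5GB 8B model and start an interactive chat session
ollama pull llama3.1:8b
ollama run llama3.1:8b
```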


    The memory usage of a model like Llama 3.1 (70B) depends on various factors. Here are some general estimates for memory usage –

    • Small batch sizes (~1-16): Llama 3.1 (70B) typically requires around 6-12 GB of RAM per GPU.
    • Medium batch sizes (~32-64): Memory usage increases to around 18-24 GB per GPU.
    • Large batch sizes (~128 and above): Expect memory usage to reach up to 40-60 GB or more per GPU.

    Keep in mind that these estimates are approximate and can vary depending on the specific implementation, optimization strategies, and hardware configurations used. If you’re interested in precise memory usage for a particular use case, I recommend conducting experiments with your chosen hardware setup.
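As a rough sanity check on figures like these, load-time memory scales with parameter count times bytes per parameter (i.e. the quantization level). The sketch below is a back-of-the-envelope estimator, not an official formula; the 20% overhead factor for runtime buffers is my own assumption:

```python
def estimate_load_memory_gb(params_billion, bytes_per_param=2.0, overhead=1.2):
    """Rough load-time memory: parameter count x bytes per parameter,
    plus ~20% overhead for runtime buffers (the overhead is an assumption)."""
    return params_billion * bytes_per_param * overhead

# Compare FP16 (2 bytes/param) with 4-bit quantization (0.5 bytes/param),
# which is what tools like Ollama commonly ship by default.
for name, params in [("8B", 8), ("70B", 70), ("405B", 405)]:
    fp16 = estimate_load_memory_gb(params, bytes_per_param=2.0)
    q4 = estimate_load_memory_gb(params, bytes_per_param=0.5)
    print(f"Llama 3.1 {name}: ~{fp16:.0f}GB at FP16, ~{q4:.0f}GB at 4-bit")
```

Under these assumptions, a 4-bit 8B model lands near the ~5GB footprint observed above, while even a 4-bit 405B model would need a few hundred gigabytes.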

    My experience with Llama 70B model

I tried running Llama 3.1 70B with all three batch sizes on my desktop (Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, 40GB RAM, Nvidia GTX 1050 4GB), and I can report that the small and medium batch sizes loaded for me. Even though the medium batch size loaded in memory, it needed more GPU power.

The 70B largest batch size could not load because it required over 40GB of memory and 4.5GB of GPU memory.

My experiment confirms that to successfully run the 3.1 70B model, you must have at least 6GB of GPU memory, a little over 40GB of system memory (just to load the model), and at least 40GB of storage.

The small batch size worked successfully for me; however, it was very slow (because of the large parameter count).

The 3.1 small batch size weighs around 15GB and consumed 15.03GB of system memory. The medium batch size is over 16GB in size and consumed 16.30GB on my system. Small and medium batch sizes can be run only if you have at least 20GB RAM (running Windows, or a little less with a lightweight Linux distribution) and 6GB of GPU memory.

Similarly, the Llama 3.1 8B variation has three batch sizes, and all three worked perfectly fine for me. You need a little over 5GB of free system memory, around 3GB of GPU memory, and 5GB on disk.
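Once a model is pulled, you can also query it programmatically through Ollama's local HTTP API instead of the interactive prompt. The sketch below targets Ollama's documented `/api/generate` route on its default port 11434; the helper function names are my own:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot completions
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate route."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    """Send the prompt to a locally running Ollama server and return its reply."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires `ollama pull llama3.1:8b` and a running Ollama server):
#   print(generate("llama3.1:8b", "How are you doing today?"))
```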

    How Does Llama 3.1 Compare to Other Models?

    In terms of performance, Llama 3.1 is competitive with leading foundation models such as GPT-4, GPT-4o, and Claude 3.5 Sonnet. Its smaller models are also competitive with closed and open models that have a similar number of parameters.

    Meta is planning to continue developing and improving this model in the coming months. With its advanced capabilities and multilingual support, this model is expected to become a game-changer in the world of AI.

    Conclusion

Overall, the experiment concludes that even though such large models are openly available, they cannot be used on the devices the majority of users have. The Llama 3.1 70B small batch size can be loaded on home computers, but it's so slow that it's barely usable. It took around 10 seconds just to output the following line – Thank you for asking! I'm doing well, thanks to you.

That's where smaller models come into play. The 8B variation (largest batch size) will load and work perfectly fine on most home computers, gaming laptops, or desktops. One can also train the model on their own data to get more accurate results.
