    I Ran Local LLMs on My Android Phone

    September 16, 2025

    Like it or not, AI is here to stay. For those concerned about data privacy, there are several local AI options available. Tools like Ollama and LM Studio make things easier.

    Those options, however, are aimed at desktop users and require significant computing power.

    What if you want to use local AI on your smartphone? Sure, one way would be to deploy Ollama with a web GUI on your server and access it from your phone.

    But there is another way: use an application that lets you install and run LLMs (or should I say SLMs, small language models) directly on your phone, instead of relying on a local AI server on another computer.

    Allow me to share my experience experimenting with LLMs on a phone.

    📋
    Smartphones these days have powerful processors, and some even have dedicated AI processors on board: Snapdragon 8 Gen 3, Apple’s A17 Pro, and Google Tensor G4, to name a few. Yet the models that can run on a phone are often vastly different from the ones you use on a proper desktop or server.

    Here’s what you’ll need:

    • An app that allows you to download the language models and interact with them.
    • Suitable LLMs that have been specifically created for running on mobile devices.

    Apps for running LLMs locally on a smartphone

    After researching, I decided to explore the following applications for this purpose. Let me share their features and details.

    1. MLC Chat

    MLC Chat supports popular models like Llama 3.2, Gemma 2, Phi 3.5, and Qwen 2.5, offering offline chat, translation, and multimodal tasks through a sleek interface. Its plug-and-play setup with pre-configured models, NPU optimization (e.g., Snapdragon 8 Gen 2+), and beginner-friendly design makes it a good choice for on-device AI.

    You can download the MLC Chat APK from their GitHub release page.

    Android is looking to restrict sideloading of APK files. I don’t know what will happen then, but you can use APK files for now.

    Put the APK file on your Android device, open the Files app, and tap the APK file to begin installation. Enable “Install from Unknown Sources” in your device settings if prompted, then follow the on-screen instructions to complete the installation.

    (Screenshot: Enable APK installation)

    Once installed, open the MLC Chat app and select a model from the list, such as Phi-2, Gemma 2B, Llama-3 8B, or Mistral 7B. Tap the download icon to install the model; I recommend opting for smaller models like Phi-2. Models are downloaded on first use and cached locally for offline use.

    (Screenshot: Tap the download button to download a model)

    Tap the Chat icon next to the downloaded model. Start typing prompts to interact with the LLM offline. Use the reset icon to start a new conversation if needed.


    2. SmolChat (Android)

    SmolChat is an open-source Android app that runs any GGUF-format model (like Llama 3.2, Gemma 3n, or TinyLlama) directly on your device, offering a clean, ChatGPT-like interface for fully offline chatting, summarization, rewriting, and more.

    Install SmolChat from the Google Play Store. Open the app and choose a GGUF model from the app’s model list, or manually download one from Hugging Face. If downloading manually, place the model file in the app’s designated storage directory (check the app settings for the path).
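    If you go the manual route, files hosted on Hugging Face follow a predictable direct-download URL pattern. Here is a small helper I use to build those links (my own sketch; the repo ID and filename below are just an illustrative example, so check the actual model page for the real names):

```python
def gguf_download_url(repo_id: str, filename: str) -> str:
    """Build the direct download URL for a file hosted on Hugging Face."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

# Example: a 4-bit quantized TinyLlama build (illustrative names).
url = gguf_download_url(
    "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
)
print(url)
```

    You can open the resulting URL in your phone’s browser to download the file, then move it to the app’s model directory.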


    3. Google AI Edge Gallery

    Google AI Edge Gallery is an experimental open-source Android app (with iOS support planned) that brings Google’s on-device AI to your phone, letting you run powerful models like Gemma 3n and other Hugging Face models fully offline after download. The application is built on Google’s LiteRT framework.

    You can download it from the Google Play Store. Open the app and browse the list of provided models, or manually download a compatible model from Hugging Face.

    Select the downloaded model and start a chat session. Enter text prompts or upload images (if supported by the model) to interact locally. Explore features like prompt discovery or vision-based queries if available.


    Top Mobile LLMs to try out

    Here are the best ones I’ve used:

    • Google’s Gemma 3n (2B): Blazing-fast for multimodal tasks, including image captions, translations, and even solving math problems from photos. Best for quick, visual-based AI assistance.
    • Meta’s Llama 3.2 (1B/3B): Strikes the perfect balance between size and smarts; it’s great for coding help and private chats. The 1B version runs smoothly even on mid-range phones. Best for developers and privacy-conscious users.
    • Microsoft’s Phi-3 Mini (3.8B): Shockingly good at summarizing long documents despite its small size. Best for students, researchers, or anyone drowning in PDFs.
    • Alibaba’s Qwen-2.5 (1.8B): Surprisingly strong at visual question answering; ask it about an image, and it actually understands. Best for multimodal experiments.
    • TinyLlama-1.1B: The lightweight champ that runs on almost any device without breaking a sweat. Best for older phones or users who just need a simple chatbot.

    All these models are distributed in aggressively quantized form (typically GGUF or safetensors files), so they’re tiny but still capable. You can grab them from Hugging Face: just download one, load it into an app, and you’re set.
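    As a rough back-of-the-envelope check (my own sketch, not from any app’s documentation), you can estimate a quantized model’s file size from its parameter count and bits per weight:

```python
def estimate_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough file size of a quantized model: params * bits / 8 bytes."""
    return num_params * bits_per_weight / 8 / 1e9  # decimal GB

# A 2B-parameter model at 4-bit quantization fits in roughly 1 GB,
# while the same model at 8-bit needs about twice the space.
print(estimate_size_gb(2e9, 4))  # 1.0
print(estimate_size_gb(2e9, 8))  # 2.0
```

    Real GGUF files add some overhead for metadata and mixed-precision layers, but this gives you a quick feel for what will fit on your phone’s storage.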

    Challenges I faced while running LLMs locally on an Android smartphone

    Getting large language models (LLMs) to run smoothly on my phone has been equally exhilarating and frustrating.

    On my Snapdragon 8 Gen 2 phone, models like Llama 3-4B run at a decent 8-10 tokens per second, which is usable for quick queries. But when I tried the same on my backup Galaxy A54 (6 GB RAM), it choked. Loading even a 2B model pushed the device to its limits. I quickly learned that Phi-3-mini (3.8B) or Gemma 2B are far more practical for mid-range hardware.
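    To put those throughput numbers in perspective, a quick calculation (my own, with illustrative token counts) shows why 8-10 tokens per second feels fine for short queries but sluggish for long answers:

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a given throughput."""
    return num_tokens / tokens_per_second

# A short 80-token reply at 10 tok/s arrives in about 8 seconds,
# but a 500-token explanation at 8 tok/s takes over a minute.
print(generation_time_s(80, 10))  # 8.0
print(generation_time_s(500, 8))  # 62.5
```

    That is why I mostly stick to short, focused prompts on the phone and leave long-form generation to a desktop setup.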

    The first time I ran a local AI session, I was shocked to see 50% of the battery gone in under 90 minutes. MLC Chat offers a power-saving mode for this reason. Turning off background apps to free up RAM also helps.

    I also experimented with 4-bit quantized models (like Qwen-1.5-2B-Q4) to save storage, but noticed they struggle with complex reasoning. For medical or legal queries, I had to switch back to 8-bit versions. They were slower but far more reliable.

    Conclusion

    I love the idea of having an AI assistant that works exclusively for me: no monthly fees, no data leaks. Need a translator in a remote village? A virtual assistant on a long flight? A private brainstorming partner for sensitive ideas? Your phone becomes all of these while staying offline and untraceable.

    I won’t lie, it’s not perfect. Your phone isn’t a data center, so you’ll face challenges like battery drain and occasional overheating. But in exchange you get total privacy, zero cost, and offline access.

    The future of AI isn’t just in the cloud, it’s also on your device.

    Author Info

    Bhuwan Mishra is a full-stack developer, with Python and Go as his tools of choice. He takes pride in building and securing web applications, APIs, and CI/CD pipelines, as well as tuning servers for optimal performance. He also has a passion for working with Kubernetes.
