
    A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV

    March 27, 2025

Monocular depth estimation involves predicting scene depth from a single RGB image, a fundamental task in computer vision with wide-ranging applications, including augmented reality, robotics, and 3D scene understanding. In this tutorial, we implement Intel's MiDaS, a state-of-the-art model for high-quality depth prediction from a single image, using its DPT_Large vision-transformer variant. Leveraging Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, this tutorial lets you upload your own image and easily visualize the corresponding depth map.

    !pip install -q timm opencv-python matplotlib

    First, we install the necessary Python libraries—timm for model support, opencv-python for image processing, and matplotlib for visualizing the depth maps.

    !git clone https://github.com/isl-org/MiDaS.git
    %cd MiDaS

    Then, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transformation utilities.

import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from torchvision.transforms import Compose
from google.colab import files

from midas.dpt_depth import DPTDepthModel
from midas.transforms import Resize, NormalizeImage, PrepareForNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    We import all the necessary libraries and MiDaS components required for loading the model, preprocessing images, handling uploads, and visualizing depth predictions. Then we set the computation device to GPU (CUDA) if available; otherwise, it defaults to CPU, ensuring system compatibility.

# Download the pretrained DPT_Large weights via torch.hub and prepare the model for inference
model = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
model.to(device)
model.eval()

Here, we download the pretrained MiDaS DPT_Large model via Intel's torch.hub entrypoint, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.
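If Colab assigns a CPU-only runtime, DPT_Large can be slow. As a lighter alternative, the MiDaS hub also publishes a small variant and ready-made preprocessing pipelines; the entrypoint names below come from the intel-isl/MiDaS hubconf, so treat this as a sketch to verify against the repository you cloned:

# Lighter model for CPU-only sessions (sketch; names per the intel-isl/MiDaS hubconf)
small_model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small", pretrained=True)
small_model.to(device)
small_model.eval()

# Matching preprocessing pipelines are also published on the hub
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
small_transform = midas_transforms.small_transform  # companion to MiDaS_small

Note that these hub transforms take a raw RGB NumPy array and return a batched tensor directly, unlike the dict-based Compose pipeline we define below.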

transform = Compose([
    Resize(384, 384, resize_target=None, keep_aspect_ratio=True,
           ensure_multiple_of=32, resize_method="upper_bound"),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet()
])

We define MiDaS's image preprocessing pipeline, which resizes the input image (keeping its aspect ratio and snapping both dimensions to multiples of 32), normalizes pixel values with ImageNet statistics, and converts the image to the channels-first layout the network expects.
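A quick sanity check on a dummy image confirms the pipeline produces that channels-first, multiple-of-32 shape; the shapes below are illustrative:

# Sanity check on a dummy RGB image with values in [0, 1] (illustrative shapes)
dummy = {"image": np.random.rand(480, 640, 3).astype(np.float32)}
out = transform(dummy)["image"]
print(out.shape)  # e.g. (3, 288, 384): channels first, both sides multiples of 32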

uploaded = files.upload()
for filename in uploaded:
    img = cv2.imread(filename)                           # BGR uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0   # RGB float in [0, 1]
    break

We allow the user to upload an image in Colab, read it using OpenCV, convert it from BGR to RGB for accurate color representation, and scale its pixel values to [0, 1], the range the MiDaS normalization transform expects.
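If you are running outside Colab, where google.colab.files is unavailable, a simple alternative is to fetch a test image over HTTP; the URL below is a placeholder:

# Alternative to files.upload() outside Colab (placeholder URL)
import urllib.request

url = "https://example.com/test.jpg"  # replace with a real image URL
data = np.asarray(bytearray(urllib.request.urlopen(url).read()), dtype=np.uint8)
img = cv2.imdecode(data, cv2.IMREAD_COLOR)            # BGR uint8
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0    # RGB float in [0, 1]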

img_input = transform({"image": img})["image"]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)

with torch.no_grad():
    prediction = model(input_tensor)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

depth_map = prediction.cpu().numpy()

    Now, we apply the preprocessing transform to the uploaded image, convert it to a tensor, perform depth prediction using the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array.
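Keep in mind that MiDaS predicts relative inverse depth rather than metric distance, so the raw values are only meaningful up to scale and shift. To export the result as an image, a simple min-max normalization works; this is a sketch beyond the original tutorial, and the filename is hypothetical:

# Min-max normalize the relative inverse depth to [0, 1] for export (sketch)
d_min, d_max = depth_map.min(), depth_map.max()
depth_norm = (depth_map - d_min) / (d_max - d_min + 1e-8)  # epsilon guards against division by zero

# Save as an 8-bit grayscale PNG (hypothetical filename)
cv2.imwrite("depth_map.png", (depth_norm * 255).astype(np.uint8))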

plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap='inferno')
plt.title("Depth Map")
plt.axis("off")

plt.tight_layout()
plt.show()

Finally, we create a side-by-side visualization of the original image and its corresponding depth map using Matplotlib. The depth map is displayed using the 'inferno' colormap for better contrast; since MiDaS predicts relative inverse depth, brighter regions correspond to parts of the scene closer to the camera.

In conclusion, by completing this tutorial, we have successfully deployed Intel's MiDaS model on Google Colab to perform monocular depth estimation from just an RGB image. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we built a robust pipeline that generates high-quality depth maps with minimal setup. This implementation is a strong foundation for further exploration, including video depth estimation, real-time applications, and integration into AR/VR systems.
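As a pointer toward the video use case mentioned above, the same model can be applied frame by frame. The following is a minimal sketch under stated assumptions: the input path is a placeholder, and no temporal smoothing is applied, so some per-frame flicker is expected:

# Minimal per-frame video depth sketch (placeholder path; no temporal smoothing)
cap = cv2.VideoCapture("input.mp4")
depth_frames = []
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) / 255.0
    inp = torch.from_numpy(transform({"image": rgb})["image"]).unsqueeze(0).to(device)
    with torch.no_grad():
        pred = model(inp)
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=frame.shape[:2], mode="bicubic", align_corners=False
        ).squeeze()
    depth_frames.append(pred.cpu().numpy())
cap.release()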


Here is the Colab Notebook.

    The post A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV appeared first on MarkTechPost.
