
    A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV

    March 27, 2025

Monocular depth estimation involves predicting scene depth from a single RGB image, a fundamental task in computer vision with wide-ranging applications, including augmented reality, robotics, and 3D scene understanding. In this tutorial, we implement Intel's MiDaS, a state-of-the-art model for high-quality depth prediction from a single image, using its DPT_Large vision-transformer variant. Leveraging Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, this tutorial lets you upload your own image and easily visualize the corresponding depth map.

    !pip install -q timm opencv-python matplotlib

    First, we install the necessary Python libraries—timm for model support, opencv-python for image processing, and matplotlib for visualizing the depth maps.

    !git clone https://github.com/isl-org/MiDaS.git
    %cd MiDaS

    Then, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transformation utilities.

import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from torchvision.transforms import Compose
from google.colab import files

from midas.dpt_depth import DPTDepthModel
from midas.transforms import Resize, NormalizeImage, PrepareForNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    We import all the necessary libraries and MiDaS components required for loading the model, preprocessing images, handling uploads, and visualizing depth predictions. Then we set the computation device to GPU (CUDA) if available; otherwise, it defaults to CPU, ensuring system compatibility.

# Download the pretrained DPT_Large weights via torch.hub and prepare the model for inference
model = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
model.to(device)
model.eval()

Here, we download the pretrained MiDaS DPT_Large model via Intel's torch.hub entrypoint, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.
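If Colab assigns a CPU-only runtime, DPT_Large can be slow. As a lighter alternative, the MiDaS hub also publishes a small variant and ready-made preprocessing pipelines; the entrypoint names below come from the intel-isl/MiDaS hubconf, so treat this as a sketch to verify against the repository you cloned:

# Lighter model for CPU-only sessions (sketch; names per the intel-isl/MiDaS hubconf)
small_model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small", pretrained=True)
small_model.to(device)
small_model.eval()

# Matching preprocessing pipelines are also published on the hub
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
small_transform = midas_transforms.small_transform  # companion to MiDaS_small

Note that these hub transforms take a raw RGB NumPy array and return a batched tensor directly, unlike the dict-based Compose pipeline we define below.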

transform = Compose([
    Resize(384, 384, resize_target=None, keep_aspect_ratio=True,
           ensure_multiple_of=32, resize_method="upper_bound"),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet()
])

We define MiDaS's image preprocessing pipeline, which resizes the input image (keeping its aspect ratio and snapping both dimensions to multiples of 32), normalizes pixel values with ImageNet statistics, and converts the image to the channels-first layout the network expects.
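A quick sanity check on a dummy image confirms the pipeline produces that channels-first, multiple-of-32 shape; the shapes below are illustrative:

# Sanity check on a dummy RGB image with values in [0, 1] (illustrative shapes)
dummy = {"image": np.random.rand(480, 640, 3).astype(np.float32)}
out = transform(dummy)["image"]
print(out.shape)  # e.g. (3, 288, 384): channels first, both sides multiples of 32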

uploaded = files.upload()
for filename in uploaded:
    img = cv2.imread(filename)                           # BGR uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0   # RGB float in [0, 1]
    break

We allow the user to upload an image in Colab, read it using OpenCV, convert it from BGR to RGB for accurate color representation, and scale its pixel values to [0, 1], the range the MiDaS normalization transform expects.
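If you are running outside Colab, where google.colab.files is unavailable, a simple alternative is to fetch a test image over HTTP; the URL below is a placeholder:

# Alternative to files.upload() outside Colab (placeholder URL)
import urllib.request

url = "https://example.com/test.jpg"  # replace with a real image URL
data = np.asarray(bytearray(urllib.request.urlopen(url).read()), dtype=np.uint8)
img = cv2.imdecode(data, cv2.IMREAD_COLOR)            # BGR uint8
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0    # RGB float in [0, 1]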

img_input = transform({"image": img})["image"]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)

with torch.no_grad():
    prediction = model(input_tensor)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

depth_map = prediction.cpu().numpy()

    Now, we apply the preprocessing transform to the uploaded image, convert it to a tensor, perform depth prediction using the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array.
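Keep in mind that MiDaS predicts relative inverse depth rather than metric distance, so the raw values are only meaningful up to scale and shift. To export the result as an image, a simple min-max normalization works; this is a sketch beyond the original tutorial, and the filename is hypothetical:

# Min-max normalize the relative inverse depth to [0, 1] for export (sketch)
d_min, d_max = depth_map.min(), depth_map.max()
depth_norm = (depth_map - d_min) / (d_max - d_min + 1e-8)  # epsilon guards against division by zero

# Save as an 8-bit grayscale PNG (hypothetical filename)
cv2.imwrite("depth_map.png", (depth_norm * 255).astype(np.uint8))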

plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap='inferno')
plt.title("Depth Map")
plt.axis("off")

plt.tight_layout()
plt.show()

Finally, we create a side-by-side visualization of the original image and its corresponding depth map using Matplotlib. The depth map is displayed using the 'inferno' colormap for better contrast; since MiDaS predicts relative inverse depth, brighter regions correspond to parts of the scene closer to the camera.

In conclusion, by completing this tutorial, we have successfully deployed Intel's MiDaS model on Google Colab to perform monocular depth estimation from just an RGB image. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we built a robust pipeline that generates high-quality depth maps with minimal setup. This implementation is a strong foundation for further exploration, including video depth estimation, real-time applications, and integration into AR/VR systems.
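As a pointer toward the video use case mentioned above, the same model can be applied frame by frame. The following is a minimal sketch under stated assumptions: the input path is a placeholder, and no temporal smoothing is applied, so some per-frame flicker is expected:

# Minimal per-frame video depth sketch (placeholder path; no temporal smoothing)
cap = cv2.VideoCapture("input.mp4")
depth_frames = []
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) / 255.0
    inp = torch.from_numpy(transform({"image": rgb})["image"]).unsqueeze(0).to(device)
    with torch.no_grad():
        pred = model(inp)
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=frame.shape[:2], mode="bicubic", align_corners=False
        ).squeeze()
    depth_frames.append(pred.cpu().numpy())
cap.release()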


Here is the Colab Notebook.

    The post A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV appeared first on MarkTechPost.
