Advancements in Extended Reality (XR) have enabled the fusion of real-world entities with the virtual world. However, despite innumerable sensors, a plethora of cameras, and computationally expensive computer vision techniques, this integration raises a few critical questions. 1) Does this blend truly capture the essence of real-world objects, or does it merely treat them as a backdrop? 2) If we continue along this path at the current pace, will XR be feasibly accessible to the masses anytime soon? Viewed on its own, without machine learning interventions, the future of XR seems hazy: A) current efforts transport surrounding objects into XR, but the integration is superficial and lacks meaningful interaction, and B) the masses are unlikely to be enthusiastic about clearing the technological hurdles required to experience the XR described in (A). When AI and its many fascinating applications, such as real-time unsupervised segmentation and generative content creation, come into the picture, solid ground is set for XR to achieve a future of truly seamless integration.
A team of researchers at Google recently unveiled XR-Objects, which, in their own words, aims to make interacting with physical objects as natural as "right-clicking a digital file to open its context menu, but applied to physical objects." The paper introduces Augmented Object Intelligence (AOI), which employs AI to extract digital information from analog objects, a task previously established as strenuous. AOI represents a paradigm shift toward the seamless integration of real and virtual content, giving users the freedom to perform context-appropriate digital interactions. To achieve this, the Google researchers combined AR advances in spatial understanding via SLAM with object detection and segmentation, integrated with a Multimodal Large Language Model (MLLM).
XR-Objects offers object-centric interaction, in contrast to the application-centric approach of tools such as Google Lens. Interactions are anchored directly to objects within the user's environment and surfaced through a World-Space UI, which saves the user the hassle of navigating through applications and manually selecting objects. To ensure aesthetic appeal and avoid clutter, digital information is presented in semi-transparent bubbles that serve as subtle, minimalist prompts.
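To make the object-centric idea concrete, here is a minimal sketch of an object proxy that carries its own context menu, in the spirit of right-clicking a file. Note that every name below is illustrative; the actual XR-Objects prototype is an AR application, not this simplified Python model.

```python
# Illustrative sketch only: models the object-centric idea where each
# detected physical object carries its own context menu. All names are
# hypothetical and not taken from the XR-Objects codebase.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class MenuAction:
    label: str                            # text shown in the bubble UI
    run: Callable[["ObjectProxy"], str]   # what happens when selected


@dataclass
class ObjectProxy:
    label: str                            # detector class, e.g. "bottle"
    anchor: Tuple[float, float, float]    # 3D anchor from AR localization
    actions: List[MenuAction] = field(default_factory=list)

    def open_context_menu(self) -> List[str]:
        # Analogous to right-clicking a file: list this object's actions.
        return [action.label for action in self.actions]


# Example: a bottle anchored in the kitchen offering an MLLM-backed query.
bottle = ObjectProxy(
    "bottle",
    (0.4, 0.9, 1.2),
    [MenuAction("Ask about this", lambda o: f"query MLLM about {o.label}")],
)
print(bottle.open_context_menu())  # ['Ask about this']
```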
The framework behind this state of the art in XR is straightforward, a four-part strategy: A) object detection, B) localization and anchoring of objects, C) coupling each object with an MLLM, and D) action execution. Google's MediaPipe library, which essentially uses a mobile-optimized CNN, handles the first task and generates 2D bounding boxes that initiate AR anchoring and localization. Currently, this CNN is trained on the COCO dataset, which covers around 80 object categories. Depth maps are then used for AR localization, and an object proxy template containing the object's context menu is instantiated. Finally, an MLLM (PaLI) is coupled with each object, and the cropped bounding box from step A becomes the prompt. This is what makes the system stand out: it can identify "Superior Dark Soy Sauce" rather than just an ordinary bottle in your kitchen.
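A minimal sketch of this detection-to-MLLM pipeline is below, using MediaPipe's object detector (an EfficientDet-Lite model trained on COCO's roughly 80 categories). The paper couples each crop with PaLI; since PaLI is not publicly exposed, `query_mllm` here is a hypothetical placeholder for whatever multimodal model endpoint you have access to, and the image filename is likewise an assumption.

```python
# Sketch of steps A and C: detect objects with MediaPipe, then prompt an
# MLLM with each cropped bounding box. Depth-based AR anchoring (step B)
# and action execution (step D) are omitted here.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from PIL import Image

# Step A: a mobile-optimized CNN yields 2D bounding boxes.
options = vision.ObjectDetectorOptions(
    base_options=python.BaseOptions(model_asset_path="efficientdet_lite0.tflite"),
    score_threshold=0.5,
)
detector = vision.ObjectDetector.create_from_options(options)

frame = mp.Image.create_from_file("kitchen.jpg")
result = detector.detect(frame)


def query_mllm(image_crop: Image.Image, prompt: str) -> str:
    """Hypothetical stand-in for the MLLM call (PaLI in the paper)."""
    raise NotImplementedError("plug in your multimodal model here")


full = Image.open("kitchen.jpg")
for det in result.detections:
    box = det.bounding_box  # origin_x, origin_y, width, height, in pixels
    # Step C: the cropped bounding box itself becomes the visual prompt,
    # letting the MLLM distinguish "Superior Dark Soy Sauce" from a
    # generic COCO "bottle".
    crop = full.crop((box.origin_x, box.origin_y,
                      box.origin_x + box.width, box.origin_y + box.height))
    print(det.categories[0].category_name,
          query_mllm(crop, "What exactly is this object?"))
```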
Google performed a user study comparing XR-Objects against a Gemini-powered chatbot, and given the above context, the results were no surprise. XR-Objects won clear victories in task completion time and in the HMD form factor, while preferences in the phone form factor were split between the chatbot and XR-Objects. HALIE survey results for the two were similar. Participants also gave appreciative feedback on how helpful and efficient XR-Objects was, along with suggestions for improving its ergonomics.
This new AOI paradigm is promising and should grow as LLM capabilities accelerate. It will be interesting to see whether its competitor Meta, which has made massive strides in segmentation and LLMs, develops new solutions to supersede XR-Objects and take XR to a new zenith.
Check out the Paper and Details. All credit for this research goes to the researchers of this project.