In recent years, the field of image generation has changed drastically, largely driven by latent-based generative models such as Latent Diffusion Models (LDMs) and Masked Image Models (MIMs). These models rely on reconstructive autoencoders, such as VQGAN and VAE, which compress images into a compact, low-dimensional latent space, enabling them to produce highly realistic images. Given the dominance of autoregressive (AR) generative models, such as Large Language Models, in natural language processing (NLP), it is natural to ask whether a similar approach can work for images. Yet even when autoregressive models operate on the same latent space as LDMs and MIMs, they still fall short in image generation, a sharp contrast to NLP, where the autoregressive GPT family dominates.
Current methods like LDMs and MIMs rely on reconstructive autoencoders, such as VQGAN and VAE, to map images into a latent space, but these approaches face challenges with stability and performance. Notably, in VQGAN, as image reconstruction quality improves (indicated by a lower reconstruction FID), overall generation quality can actually decline. To address this, the researchers propose the Discriminative Generative Image Transformer (DiGIT). Unlike traditional autoencoder approaches, DiGIT decouples the training of the encoder and decoder, starting with encoder-only training through a discriminative self-supervised model.
A team of researchers from the School of Data Science and the School of Computer Science and Technology at the University of Science and Technology of China, the State Key Laboratory of Cognitive Intelligence, and Zhejiang University proposes the Discriminative Generative Image Transformer (DiGIT). The method separates the training of encoders and decoders, beginning with encoder-only training through a discriminative self-supervised model. This strategy stabilizes the latent space, making it more robust for autoregressive modeling. Inspired by VQGAN, they then convert the encoder's latent feature space into discrete tokens using K-means clustering. The research suggests that image autoregressive models can operate much like GPT models in NLP. The main contributions of this work are: a unified perspective on the relationship between latent space and generative models, emphasizing the importance of stable latent spaces; a novel method that separates the training of encoders and decoders to stabilize the latent space; and an effective discrete image tokenizer that improves the performance of image autoregressive models.
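To make the tokenizer idea concrete, here is a minimal sketch of the K-means step described above. It assumes patch embeddings have already been extracted from a frozen discriminative self-supervised encoder (e.g., a DINO-style ViT); the function name, cluster count, and feature dimensions are illustrative assumptions, not the authors' actual code or configuration.

```python
# Minimal sketch: build a discrete codebook by clustering patch features
# from a frozen self-supervised encoder. The cluster count (8192) and
# feature shapes are assumptions for illustration only.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(patch_features: np.ndarray, num_clusters: int = 8192) -> np.ndarray:
    """Cluster (N, D) SSL patch embeddings into num_clusters centroids.

    The returned (num_clusters, D) centroid array acts as the token
    vocabulary: each centroid index is one discrete image token.
    """
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    kmeans.fit(patch_features)
    return kmeans.cluster_centers_

# Usage with small hypothetical numbers: 10k patches of dimension 768,
# clustered into a 512-entry codebook.
codebook = build_codebook(
    np.random.randn(10_000, 768).astype(np.float32), num_clusters=512
)
```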
At inference time, each image patch is matched to the nearest token in the codebook. Training a causal Transformer to predict the next token over these sequences yields strong results on ImageNet: DiGIT surpasses previous techniques in both image understanding and image generation, and shows that a smaller token grid can lead to higher accuracy. The experiments highlight the effectiveness of the proposed discriminative tokenizer, whose benefit grows as the number of parameters increases. The study also found that increasing the number of K-means clusters improves accuracy, reinforcing the advantage of a larger vocabulary in autoregressive modeling.
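Below is a hedged sketch of the nearest-token assignment just described: each patch embedding receives the index of its closest codebook centroid, and the resulting id sequence is what the causal Transformer models left to right. The `codebook` variable refers back to the sketch above; the Transformer itself is assumed and not shown, and all shapes are illustrative.

```python
# Sketch of nearest-centroid tokenization: map each patch embedding to
# the index of its closest codebook entry (squared Euclidean distance),
# producing the token sequence for next-token prediction.
import numpy as np

def tokenize(patch_features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return a (P,) array of token ids for (P, D) patch features.

    Uses the expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 so the
    full (P, K, D) broadcast is never materialized.
    """
    x2 = (patch_features ** 2).sum(axis=1, keepdims=True)  # (P, 1)
    c2 = (codebook ** 2).sum(axis=1)                       # (K,)
    dists = x2 - 2.0 * patch_features @ codebook.T + c2    # (P, K)
    return dists.argmin(axis=1)

# Example: a 16x16 patch grid (256 patches, 768-dim features) with a
# 512-entry codebook becomes a 256-token sequence for the AR model.
token_ids = tokenize(np.random.randn(256, 768), np.random.randn(512, 768))
```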
In conclusion, this paper presents a unified view of the relationship between latent space and generative models, highlighting the importance of a stable latent space in image generation, and introduces a simple yet effective image tokenizer together with an autoregressive generative model, DiGIT. The results also challenge the common belief that strong reconstruction implies an effective latent space for autoregressive generation. With this work, the researchers aim to rekindle interest in the generative pre-training of image autoregressive models, encourage a reevaluation of the fundamental components that define latent space for generative models, and take a step toward new techniques and methods.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.