
    TiTok: An Innovative AI Method for Tokenizing Images into 1D Latent Sequences

    June 14, 2024

In recent years, image generation has made significant progress thanks to advances in both transformers and diffusion models. Mirroring trends in generative language modeling, many modern image generation models rely on standard image tokenizers and de-tokenizers. Despite their success, these tokenizers share a fundamental design limitation: they assume the latent space must retain a 2D structure so that each latent token maps directly to the location of a corresponding image patch.

This paper discusses three existing strands of work in image processing and understanding. First, image tokenization has been a fundamental approach since the early days of deep learning, using autoencoders to compress high-dimensional images into low-dimensional latent representations and then decode them back. Second, tokenization for image understanding supports tasks such as image classification, object detection, segmentation, and multimodal large language models (MLLMs). Third, image generation methods have evolved from sampling variational autoencoders (VAEs) to generative adversarial networks (GANs), diffusion models, and autoregressive models.

Researchers from the Technical University of Munich and ByteDance have proposed an approach that tokenizes images into 1D latent sequences, named the Transformer-based 1-Dimensional Tokenizer (TiTok). TiTok consists of a Vision Transformer (ViT) encoder, a ViT decoder, and a vector quantizer, similar to typical Vector-Quantized (VQ) model designs. During tokenization, the image is divided into patches, which are flattened and concatenated with a 1D sequence of latent tokens. After the ViT encoder processes the image features, the resulting latent tokens form the image’s latent representation.
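The quantization step of such a VQ design can be sketched as a nearest-neighbor lookup into a learned codebook. The NumPy sketch below is illustrative only: the sizes (32 latent tokens, 16-dim embeddings, 4,096 codebook entries) and the random stand-in for the ViT encoder output are assumptions, not TiTok’s actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 32 latent tokens, 16-dim embeddings, 4096-entry codebook.
num_latents, dim, codebook_size = 32, 16, 4096

# Stand-in for the ViT encoder output: one embedding per latent token.
latents = rng.standard_normal((num_latents, dim))

# Stand-in for the vector quantizer's learned codebook.
codebook = rng.standard_normal((codebook_size, dim))

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (L2 distance)."""
    # (num_latents, codebook_size) matrix of pairwise distances.
    d = np.linalg.norm(latents[:, None, :] - codebook[None, :, :], axis=-1)
    ids = d.argmin(axis=1)            # discrete token IDs
    return ids, codebook[ids]         # IDs and their quantized embeddings

ids, quantized = quantize(latents, codebook)
print(ids.shape, quantized.shape)     # (32,) (32, 16)
```

The image is thus represented by just 32 integer IDs; the ViT decoder later reconstructs pixels from the quantized embeddings.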

Beyond reconstruction, TiTok also proves efficient in image generation within a standard pipeline. MaskGIT is used as the generation framework for its simplicity and effectiveness: a MaskGIT model can be trained simply by replacing its VQGAN tokenizer with the TiTok model. The process begins by pre-tokenizing the image into 1D discrete tokens; at each training step, a random ratio of the latent tokens is replaced with mask tokens. A bidirectional transformer then takes this masked token sequence as input and predicts the discrete token IDs of the masked positions.
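The masking step of this training loop can be sketched as follows. The sequence length, codebook size, and the convention of reserving an extra ID for the mask token are assumptions for illustration; in the real pipeline the masked sequence is fed to a bidirectional transformer that predicts the hidden IDs.

```python
import numpy as np

rng = np.random.default_rng(0)

codebook_size = 4096
mask_id = codebook_size                 # extra ID reserved for the mask token
tokens = rng.integers(0, codebook_size, size=32)  # pre-tokenized 1D sequence

def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of token IDs with the mask ID."""
    n_mask = max(1, int(mask_ratio * tokens.size))
    idx = rng.choice(tokens.size, size=n_mask, replace=False)
    masked = tokens.copy()
    masked[idx] = mask_id
    return masked, idx                  # the transformer must predict tokens[idx]

ratio = rng.uniform(0.1, 1.0)           # a fresh random ratio per training step
masked, idx = mask_tokens(tokens, ratio, rng)
```

The training target is simply the original IDs at the masked positions, which is what makes swapping tokenizers under MaskGIT so straightforward.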

TiTok provides a far more compact latent representation than traditional methods. For example, a 256 × 256 × 3 image can be reduced to just 32 discrete tokens, compared to the 256 or 1024 tokens used by earlier techniques. With the same generator framework, TiTok achieves a gFID of 1.97, outperforming the MaskGIT baseline by 4.21 on the ImageNet 256 × 256 benchmark. TiTok’s advantages grow at higher resolutions: on the ImageNet 512 × 512 benchmark, it not only outperforms the leading diffusion model DiT-XL/2 but also uses 64 times fewer image tokens, yielding a generation process that is 410 times faster.
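A quick check of the compression arithmetic quoted above; the grid interpretations of the 256- and 1024-token baselines are assumptions for illustration.

```python
# Token counts quoted in the article; grid labels are assumed interpretations.
titok_tokens = 32                        # TiTok's 1D latent sequence
baselines = {"16x16 token grid": 256, "32x32 token grid": 1024}

for name, n in baselines.items():
    print(f"{name}: {n} tokens -> {n // titok_tokens}x fewer with TiTok")
# At 256x256 this gives 8x and 32x reductions; the 64x figure in the text
# refers to the higher-resolution 512x512 setting.
```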

In this paper, the researchers introduced TiTok, a method that tokenizes images into 1D latent sequences and can be used to reconstruct and generate natural images. Its compact formulation represents an image with 8 to 64 times fewer tokens than commonly used 2D tokenizers. The compact 1D tokens also speed up training and inference of the generation model while achieving a competitive FID on the ImageNet benchmarks. Future work will focus on more efficient image representation and generation models built on 1D image tokenization.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.


    The post TiTok: An Innovative AI Method for Tokenizing Images into 1D Latent Sequences appeared first on MarkTechPost.
