
    CoordTok: A Scalable Video Tokenizer that Learns a Mapping from Co-ordinate-based Representations to the Corresponding Patches of Input Videos

    December 26, 2024

    Breaking videos into smaller, meaningful parts for vision models remains challenging, particularly for long videos. Vision models rely on these parts, called tokens, to process and understand video data, but producing tokens efficiently is difficult. Recent tokenizers compress video better than older methods, yet they still struggle to scale to large video datasets. A key issue is that they do not fully exploit temporal coherence, the tendency of nearby frames to look alike over short periods, which video codecs rely on for efficient compression. These tokenizers are also computationally expensive to train and are limited to short clips, which makes them poorly suited to capturing long-range patterns and processing longer videos.

    Current video tokenization methods have high computational costs and struggle to handle long video sequences efficiently. Early approaches used image tokenizers to compress videos frame by frame but ignored the natural continuity between frames, which reduced their effectiveness. Later methods introduced spatiotemporal layers, reduced redundancy, and used adaptive encoding, but they still had to reconstruct entire video frames during training, which limited them to short clips. Video generation models, including autoregressive models, masked generative transformers, and diffusion models, are likewise limited to short sequences.

    To solve this, researchers from KAIST and UC Berkeley proposed CoordTok, which learns a mapping from coordinate-based representations to the corresponding patches of input videos. Motivated by recent advances in 3D generative models, CoordTok encodes a video into factorized triplane representations and reconstructs patches corresponding to randomly sampled (x, y, t) coordinates. This approach allows large tokenizer models to be trained directly on long videos without requiring excessive resources. The video is divided into space-time patches and processed using transformer layers, with the decoder mapping sampled (x, y, t) coordinates to corresponding pixels. This reduces both memory and computational costs while preserving video quality.
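The coordinate lookup described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the plane sizes (16×16, 16×32, 16×32), the channel count, the use of bilinear interpolation, and every function name are hypothetical. The sizes are chosen so that the total equals the 1280 tokens quoted in the article, which does not itself state the plane sizes. The learned encoder and the decoder that turns features into pixel patches are omitted; random values stand in for learned features.

```python
import random


def make_plane(height, width, channels, rng):
    """Return a height x width grid of feature vectors (random stand-ins)."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(channels)]
             for _ in range(width)] for _ in range(height)]


def bilinear(plane, u, v):
    """Bilinearly interpolate a feature vector at normalized (u, v) in [0, 1].

    u runs along the rows of the plane and v along its columns.
    """
    h, w = len(plane), len(plane[0])
    r, c = u * (h - 1), v * (w - 1)
    r0, c0 = int(r), int(c)
    r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
    fr, fc = r - r0, c - c0
    corners = zip(plane[r0][c0], plane[r0][c1], plane[r1][c0], plane[r1][c1])
    return [
        (1 - fr) * (1 - fc) * p00 + (1 - fr) * fc * p01
        + fr * (1 - fc) * p10 + fr * fc * p11
        for p00, p01, p10, p11 in corners
    ]


def query_triplane(planes, x, y, t):
    """Concatenate the features found on the xy, yt, and xt planes."""
    return (bilinear(planes["xy"], x, y)
            + bilinear(planes["yt"], y, t)
            + bilinear(planes["xt"], x, t))


rng = random.Random(0)
CHANNELS = 8  # hypothetical feature width

# Hypothetical plane sizes; each grid cell counts as one token.
planes = {
    "xy": make_plane(16, 16, CHANNELS, rng),
    "yt": make_plane(16, 32, CHANNELS, rng),
    "xt": make_plane(16, 32, CHANNELS, rng),
}
num_tokens = sum(len(p) * len(p[0]) for p in planes.values())

# Query only a handful of random (x, y, t) coordinates per step
# instead of reconstructing every patch of every frame.
coords = [(rng.random(), rng.random(), rng.random()) for _ in range(4)]
features = [query_triplane(planes, x, y, t) for x, y, t in coords]

print(num_tokens, len(features), len(features[0]))  # prints: 1280 4 24
```

Because each step touches only the sampled coordinates, the work per step depends on the number of samples rather than on the full video length, which is the intuition behind the memory savings the article describes.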

    Building on this idea, the researchers gave CoordTok a hierarchical architecture that captures both local and global features of a video. The encoder processes space-time patches with transformer layers and produces the factorized triplane representation, which lets the model handle longer videos without excessive computational resources.

    For example, CoordTok encoded a 128-frame video with 128×128 resolution into 1280 tokens, while baselines required 6144 or 8192 tokens to achieve similar reconstruction quality. Reconstruction quality was further improved by fine-tuning with both ℓ2 loss and LPIPS loss. This combination of strategies reduced memory usage by up to 50% and lowered computational costs while maintaining high-quality video reconstruction, with CoordTok-L achieving a PSNR of 26.9.
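To put the quoted numbers in context, the short sketch below computes how many times fewer tokens CoordTok uses than the baselines and shows the standard PSNR definition, 10·log10(MAX² / MSE). The token counts come from the article; the function names, the 8-bit pixel range (MAX = 255), and the toy pixel values are assumptions made for illustration, and the article does not say which pixel range its PSNR figure uses.

```python
import math


def token_reduction(baseline_tokens, tokens):
    """How many times fewer tokens are used relative to a baseline."""
    return baseline_tokens / tokens


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


COORDTOK_TOKENS = 1280  # from the article
for baseline in (6144, 8192):  # baseline token counts from the article
    ratio = token_reduction(baseline, COORDTOK_TOKENS)
    print(f"{baseline} tokens -> {ratio:.1f}x more than CoordTok")

# Toy example: one pixel is off by 10 intensity levels out of 255.
example_db = psnr([0, 255], [0, 245])
```

The loop reports reductions of 4.8x and 6.4x. Higher PSNR means the reconstruction is closer to the original, so a figure such as 26.9 is only meaningful when compared against baselines evaluated on the same data.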

    In conclusion, CoordTok is an efficient video tokenizer that uses coordinate-based representations to reduce computational cost and memory requirements when encoding long videos.

    It enables memory-efficient training of video generation models and makes it possible to handle long videos with fewer tokens. However, it is less effective on highly dynamic videos, and the researchers suggest possible improvements such as using multiple content planes or adaptive methods. This work can serve as a starting point for future research on scalable video tokenizers and generation, which could benefit both understanding and generating long videos.


    Check out the Paper and Project. All credit for this research goes to the researchers of this project.

    The post CoordTok: A Scalable Video Tokenizer that Learns a Mapping from Co-ordinate-based Representations to the Corresponding Patches of Input Videos appeared first on MarkTechPost.
