
    Future Token Prediction Model (FTP): A New AI Training Method for Transformers that Predicts Multiple Future Tokens

    November 3, 2024

    Causal language models such as GPTs struggle to maintain semantic coherence over long stretches of text because they are trained to predict only one token ahead. This design has powered much of the progress in generative AI, but it often produces "topic drift" in longer outputs: each predicted token depends only on the preceding tokens, with no broader view of where the sequence is headed. That limits the practical usefulness of these models in real-world applications that demand strict topic adherence, such as narrative generation, content creation, and coding. Enabling multi-token prediction would substantially improve the semantic continuity, accuracy, and coherence of the sequences these models generate.
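
    To make the contrast concrete, here is a minimal sketch (in PyTorch, not the paper's code) of the standard one-token-ahead objective next to a naive multi-token objective in which each position is also asked to predict the next k tokens; the logits_per_offset input is a hypothetical model output with one prediction head per future offset:

    import torch.nn.functional as F

    def next_token_loss(logits, tokens):
        # Standard causal LM loss: position t predicts token t+1.
        # logits: (batch, seq_len, vocab); tokens: (batch, seq_len)
        return F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
        )

    def multi_token_loss(logits_per_offset, tokens):
        # logits_per_offset: list of (batch, seq_len, vocab) tensors,
        # one per future offset d = 1..k; position t predicts token t+d.
        total = 0.0
        for d, logits in enumerate(logits_per_offset, start=1):
            total = total + F.cross_entropy(
                logits[:, :-d].reshape(-1, logits.size(-1)),
                tokens[:, d:].reshape(-1),
            )
        return total / len(logits_per_offset)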

    Multi-token prediction has been attempted in several ways, each with its own limitations. Models that predict multiple tokens by splitting embeddings or attaching several language-model heads are computationally expensive and often underperform. Encoder-decoder Seq2Seq models do support multi-token prediction, but they fail to condense the past context into a single embedding, which introduces considerable inefficiency. Masked language models such as BERT can predict several masked tokens within a sequence, but they cannot generate text left to right, which restricts their use for sequential prediction. ProphetNet uses an n-gram prediction strategy, yet it does not adapt flexibly across a wide range of data types. The recurring drawbacks of these approaches are poor scalability, wasted computation, and generally unimpressive quality when predicting over long contexts.

    Researchers from EPFL introduce the Future Token Prediction (FTP) model, a new architecture that builds broader, context-aware token embeddings to enable seamless multi-token prediction. In contrast with standard models, the embedding produced by the top layers of a transformer encoder is expanded into a "pseudo-sequence" that a small transformer decoder cross-attends over to predict the upcoming tokens. This encoder-decoder arrangement lets FTP carry context from the preceding tokens into its predictions, yielding smoother transitions and better topic coherence across multiple predicted tokens. Because its embeddings encode wider sequence context, FTP delivers stronger continuity in generated text, making it well suited to content generation and other applications that require long-form semantic coherence.
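
    A rough sketch of how this encoder, pseudo-sequence, and small cross-attending decoder could be wired together is shown below. It is an illustration of the idea in PyTorch under assumed shapes and module choices, not the authors' implementation; the future_queries input (query vectors for the tokens to be predicted) is a hypothetical detail:

    import torch.nn as nn

    class FTPSketch(nn.Module):
        # Illustrative skeleton: a causal encoder summarizes the prefix,
        # its top-layer state is expanded into a pseudo-sequence, and a
        # small decoder cross-attends over it to predict future tokens.
        def __init__(self, vocab_size, d_model=768, pseudo_len=12,
                     enc_layers=12, dec_layers=3, n_heads=12):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc_layer, enc_layers)
            # Linear projection of the top-layer embedding into pseudo_len vectors.
            self.to_pseudo = nn.Linear(d_model, pseudo_len * d_model)
            dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            self.decoder = nn.TransformerDecoder(dec_layer, dec_layers)
            self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
            self.lm_head.weight = self.embed.weight   # shared embedding weights
            self.pseudo_len, self.d_model = pseudo_len, d_model

        def forward(self, tokens, future_queries):
            # tokens: (batch, seq_len); future_queries: (batch, k, d_model)
            causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
            h = self.encoder(self.embed(tokens), mask=causal)
            last = h[:, -1]                            # top-layer state of the last position
            pseudo = self.to_pseudo(last).view(-1, self.pseudo_len, self.d_model)
            out = self.decoder(future_queries, pseudo) # cross-attention over the pseudo-sequence
            return self.lm_head(out)                   # (batch, k, vocab)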

    The FTP model uses a modified GPT-2 architecture with a 12-layer encoder and a 3-layer decoder. The encoder's token embedding is linearly projected up in dimensionality and reshaped into a pseudo-sequence of 12 vectors, which the decoder cross-attends over to capture sequence context. Embedding weights are shared between the encoder and decoder; the model is trained on OpenWebText with the GPT-2 tokenizer and optimized with AdamW at a batch size of 500 and a learning rate of 4e-4. A gamma parameter of 0.8 progressively discounts the loss on tokens further into the future, keeping the immediate predictions the most accurate. In this way, FTP maintains semantic coherence without substantial computational overhead, striking a sensible trade-off between efficiency and performance.
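
    The gamma-based discounting described above might look like the following in code; this is a sketch under the assumption that the model emits logits for each of the next k tokens, with gamma = 0.8 as stated:

    import torch
    import torch.nn.functional as F

    def discounted_future_loss(future_logits, future_targets, gamma=0.8):
        # future_logits: (batch, k, vocab) -- predictions for the next k tokens.
        # future_targets: (batch, k)       -- the actual next k tokens.
        # Tokens further in the future are down-weighted by gamma**d, so the
        # immediate next-token prediction dominates the objective.
        batch, k, vocab = future_logits.shape
        weights = gamma ** torch.arange(k, dtype=future_logits.dtype,
                                        device=future_logits.device)
        per_token = F.cross_entropy(
            future_logits.reshape(-1, vocab),
            future_targets.reshape(-1),
            reduction="none",
        ).view(batch, k)
        return (per_token * weights).sum(dim=1).mean() / weights.sum()

    With the stated settings, the optimizer would simply be torch.optim.AdamW(model.parameters(), lr=4e-4), run with a batch size of 500.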

    The evaluation shows that the model improves significantly over traditional GPTs on key metrics: markedly lower perplexity, better predictive accuracy, and greater stability on long-sequence tasks. It also achieves higher recall, precision, and F1 in BERT-based assessments of textual quality, indicating closer semantic alignment with reference text. On text classification tasks such as IMDB and Amazon reviews, FTP consistently reaches lower validation loss and higher accuracy than GPT models. Most importantly, FTP follows the topic of the generated text more coherently, as reflected in higher cosine similarity scores in long-sequence evaluations, confirming its ability to produce coherent, contextually relevant content across varied applications.
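
    For reference, BERT-based similarity scores and embedding cosine similarity of the kind reported above are typically computed along the following lines; the model choice (bert-base-uncased) and mean pooling here are illustrative assumptions, not the paper's exact evaluation protocol:

    import torch
    import torch.nn.functional as F
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")

    def sentence_embedding(text):
        # Mean-pool the last hidden states into a single vector per text.
        batch = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = bert(**batch).last_hidden_state
        return hidden.mean(dim=1)

    generated = "..."   # model continuation for a prompt
    reference = "..."   # ground-truth continuation
    coherence = F.cosine_similarity(sentence_embedding(generated),
                                    sentence_embedding(reference)).item()
    print(f"cosine similarity: {coherence:.3f}")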


    The FTP model marks a meaningful shift in causal language modeling: it addresses the central inefficiency of single-token prediction by learning an embedding that supports a wider, context-sensitive view for multi-token prediction. The gains show up in both prediction accuracy and semantic coherence, reflected in improved perplexity and BERT-based scores across a range of tasks. Its pseudo-sequence cross-attention mechanism gives generated text a more consistent narrative flow, a key requirement for applications that depend on topic-coherent, semantically faithful language modeling.


    Check out the paper for full details; this article appeared first on MarkTechPost. All credit for this research goes to the researchers of the project.
