
    GameFactory: Leveraging Pre-trained Video Models for Creating New Game

    January 19, 2025

    Video diffusion models have emerged as powerful tools for video generation and physics simulation, showing promise in developing game engines. These generative game engines function as video generation models with action controllability, allowing them to respond to user inputs like keyboard and mouse interactions. A critical challenge in this field is scene generalization – the ability to create new game scenes beyond existing ones. While collecting large-scale action-annotated video datasets would be the most straightforward approach to achieve this, such annotation is prohibitively expensive and impractical for open-domain scenarios. This limitation creates a barrier to developing versatile game engines that can generate diverse and novel game environments.

    Recent approaches in video generation and game physics have explored various methodologies, with video diffusion models emerging as a significant advancement. These models have evolved from U-Net to Transformer-based architectures, enabling the generation of more realistic and longer-duration videos. Methods like Direct-a-Video offer basic camera control, while MotionCtrl and CameraCtrl provide more complex camera pose manipulation. In the gaming domain, projects like DIAMOND, GameNGen, and PlayGen have attempted game-specific implementations but suffer from overfitting to specific games and datasets, showing limited scene generalization capabilities.

    Researchers from The University of Hong Kong and Kuaishou Technology have proposed GameFactory, a framework designed to address scene generalization in game video generation. The framework utilizes video diffusion models pre-trained on open-domain video data to enable the creation of entirely new and diverse games. The researchers also developed a multi-phase training strategy that separates game-style learning from action control to overcome the domain gap between open-domain priors and limited game datasets. In addition, they released GF-Minecraft, a high-quality action-annotated video dataset, and extended the framework to support autoregressive action-controllable game video generation, enabling the production of unlimited-length interactive game videos, as sketched below.
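
    The autoregressive extension works by conditioning each newly generated clip on the tail of the previous clip plus the latest user actions, which is what allows videos of unbounded length. The following is a minimal PyTorch-style sketch of that loop; generate_clip, the overlap size, and the tensor shapes are hypothetical placeholders for illustration, not the authors' actual API.

        import torch

        def generate_clip(context: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
            # Placeholder for the action-conditioned diffusion sampler (hypothetical, not the real model).
            # Returns a new clip of latent frames with the same shape as the conditioning context.
            return torch.randn_like(context)

        def autoregressive_rollout(first_clip: torch.Tensor, action_stream, overlap: int = 4) -> torch.Tensor:
            # Generate an open-ended video by repeatedly re-conditioning on the last `overlap` frames.
            clips = [first_clip]
            context = first_clip[:, -overlap:]          # (B, overlap, C, H, W) conditioning frames
            for actions in action_stream:               # e.g. per-step keyboard/mouse tensors
                clip = generate_clip(context, actions)  # sample the next clip given context + actions
                clips.append(clip)
                context = clip[:, -overlap:]            # slide the conditioning window forward
            return torch.cat(clips, dim=1)              # concatenate clips along the time axis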

    GameFactory employs a multi-phase training strategy to achieve effective scene generalization and action control. The process begins with a pre-trained video diffusion model and proceeds through three phases. In Phase #1, the model uses LoRA adaptation to specialize in the target game domain while preserving most original parameters. Phase #2 focuses exclusively on training the action control module, with the pre-trained parameters and LoRA frozen; this separation prevents style-control entanglement and lets the model focus purely on learning action controls. In Phase #3, the LoRA weights are removed while the action control module parameters are retained, allowing the system to generate controlled game videos across diverse open-domain scenarios without being tied to a specific game style, as illustrated in the sketch below.
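
    One way to picture this schedule is as phase-dependent freezing of three parameter groups: the frozen backbone, the LoRA adapters, and the action control module. The sketch below is a minimal PyTorch illustration of that idea; the module names and the simple additive action adapter are assumptions for clarity, not the paper's implementation.

        import torch
        import torch.nn as nn

        class ActionControlModule(nn.Module):
            # Hypothetical action-control adapter; the paper's actual module differs.
            def __init__(self, dim: int, action_dim: int):
                super().__init__()
                self.proj = nn.Linear(action_dim, dim)

            def forward(self, hidden: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
                # Add a projected action embedding to every latent token.
                return hidden + self.proj(action).unsqueeze(1)

        def set_trainable(params, flag: bool):
            for p in params:
                p.requires_grad_(flag)

        def configure_phase(phase: int, base_params, lora_params, action_params):
            # Freeze/unfreeze parameter groups to mirror the three-phase schedule.
            set_trainable(base_params, False)       # pre-trained backbone stays frozen throughout
            if phase == 1:                          # Phase #1: LoRA learns the game style
                set_trainable(lora_params, True)
                set_trainable(action_params, False)
            elif phase == 2:                        # Phase #2: only the action control module trains
                set_trainable(lora_params, False)
                set_trainable(action_params, True)
            elif phase == 3:                        # Phase #3: LoRA dropped at inference, action control kept
                set_trainable(lora_params, False)
                set_trainable(action_params, False)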

    Evaluation of GameFactory’s performance reveals significant insights into different control mechanisms and their effectiveness. Cross-attention outperforms concatenation for discrete control signals such as keyboard inputs, as measured by the Flow-MSE metric. However, concatenation proves more effective for continuous mouse movement signals, likely because the similarity computation in cross-attention tends to diminish the impact of the control signal’s magnitude. In terms of style consistency, measured by CLIPSim and FID, the different methods show comparable performance thanks to the decoupled style learning of Phase #1. The system masters both basic atomic actions and complex combined movements across diverse game scenarios; the sketch below contrasts the two injection mechanisms.
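
    The comparison implies two distinct injection paths: discrete key presses attended to via cross-attention, and continuous mouse deltas concatenated with the latent features so that their magnitude is preserved. Here is a minimal PyTorch sketch of both, with hypothetical shapes and module names.

        import torch
        import torch.nn as nn

        class KeyboardCrossAttention(nn.Module):
            # Discrete key presses injected via cross-attention (hypothetical shapes).
            def __init__(self, dim: int, num_keys: int):
                super().__init__()
                self.key_embed = nn.Embedding(num_keys, dim)
                self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

            def forward(self, hidden: torch.Tensor, keys: torch.LongTensor) -> torch.Tensor:
                # hidden: (B, T, dim) latent tokens; keys: (B, K) indices of pressed keys
                ctx = self.key_embed(keys)              # (B, K, dim) key embeddings as context
                out, _ = self.attn(hidden, ctx, ctx)    # latent tokens attend to the key context
                return hidden + out

        class MouseConcatControl(nn.Module):
            # Continuous mouse deltas fused by concatenation, preserving signal magnitude.
            def __init__(self, dim: int, mouse_dim: int = 2):
                super().__init__()
                self.fuse = nn.Linear(dim + mouse_dim, dim)

            def forward(self, hidden: torch.Tensor, mouse: torch.Tensor) -> torch.Tensor:
                # hidden: (B, T, dim) latent tokens; mouse: (B, T, 2) per-frame (dx, dy)
                return self.fuse(torch.cat([hidden, mouse], dim=-1))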

    In this paper, the researchers introduced GameFactory, which represents a significant advancement in generative game engines by addressing the crucial challenge of scene generalization in game video generation. The framework demonstrates the feasibility of creating new games through generative interactive videos by effectively utilizing open-domain video data and a multi-phase training strategy. While this marks an important milestone, several challenges remain before generative game engines are fully capable, including diverse level creation, implementation of gameplay mechanics, development of player feedback systems, in-game object manipulation, and real-time game generation. GameFactory establishes a promising foundation for future research in this evolving field.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    The post GameFactory: Leveraging Pre-trained Video Models for Creating New Game appeared first on MarkTechPost.
