Humans often argue about how players should act while watching videos of sports or games. They can also reason about what the impact of an action would be. Inspired by these abilities, a group of researchers recently proposed a new task, named Playable Video Generation.
It seeks to learn actions from real-world video clips without supervision. The user can control the video by selecting actions and immediately seeing their impact. The approach relies on an encoder-decoder architecture, which uses a discrete bottleneck layer to obtain a representation of the transitions between frames.
The method uses a reconstruction loss on the generated video as the main driving loss. This enables learning without supervision and without knowing the exact number of actions in advance. The method shows state-of-the-art performance on both real-world and synthetic datasets. It learns a rich set of actions, enabling the user to enjoy a gaming-like experience.
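The core idea can be illustrated with a minimal sketch: an encoder infers a discrete action label from a pair of consecutive frames, and a decoder reconstructs the next frame from the current frame plus that label alone, so the label is forced to carry the transition information. Note this is a toy NumPy sketch of the discrete-bottleneck idea, not the authors' actual architecture; all dimensions, weight matrices, and function names here are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not taken from the paper).
FRAME_DIM = 64      # flattened frame features
NUM_ACTIONS = 4     # size of the discrete action space
EMBED_DIM = 8       # action embedding size

# Random matrices standing in for trained encoder/decoder networks.
W_enc = rng.standard_normal((2 * FRAME_DIM, NUM_ACTIONS)) * 0.1
action_embeddings = rng.standard_normal((NUM_ACTIONS, EMBED_DIM)) * 0.1
W_dec = rng.standard_normal((FRAME_DIM + EMBED_DIM, FRAME_DIM)) * 0.1

def encode_action(frame_t, frame_t1):
    """Infer a discrete action label from two consecutive frames."""
    logits = np.concatenate([frame_t, frame_t1]) @ W_enc
    # Discrete bottleneck: only the action index survives, not the logits.
    return int(np.argmax(logits))

def decode_frame(frame_t, action):
    """Predict the next frame from the current frame and the action label."""
    inp = np.concatenate([frame_t, action_embeddings[action]])
    return inp @ W_dec

# One reconstruction step on random stand-in "frames".
frame_t = rng.standard_normal(FRAME_DIM)
frame_t1 = rng.standard_normal(FRAME_DIM)
action = encode_action(frame_t, frame_t1)
recon = decode_frame(frame_t, action)
loss = float(np.mean((recon - frame_t1) ** 2))  # reconstruction loss
print("action:", action, "loss:", round(loss, 3))
```

At inference time, the same decoder is driven by a user-chosen action index instead of the encoder's prediction, which is what makes the generated video "playable". The real method additionally needs a differentiable discretization (e.g. a Gumbel-softmax-style relaxation) so the reconstruction loss can train the encoder through the argmax.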
This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as a bottleneck. The network is constrained to learn a rich action space using, as main driving loss, a reconstruction loss on the generated video. We demonstrate the effectiveness of the proposed approach on several datasets with wide environment variety. Further details, code and examples are available on our project page.
Research paper: Menapace, W., Lathuilière, S., Tulyakov, S., Siarohin, A., and Ricci, E., "Playable Video Generation", 2021, arXiv:2101.12195. Link: https://arxiv.org/abs/2101.12195