Visual Storytelling with Generative Models of Video

Speaker: Dr. Ruben Villegas

From: Google Brain

Abstract

Generative models of images have made remarkable progress in the past few years. This gives users an opportunity to unleash their creativity and generate images from just a few words describing what they want to see. However, taking the creative process a step further to create stories that are coherent over time remains challenging. Image generation techniques are not built to model how a scene evolves over time and thus cannot generate temporally coherent outputs out of the box.

In this talk, I will present Phenaki, the first model capable of generating coherent video given a sequence of open-domain prompts that evolve over time. Generating videos from text is particularly challenging due to the computational cost, the limited quantity of high-quality text-video data, and the variable length of videos. To address these issues, we introduce a new model for learning video representations that compresses a video into a small set of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text, we use a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, we demonstrate how joint training on a large corpus of image-text pairs together with a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e., time-variable text or a story) in the open domain. In addition, compared to per-frame baselines, the proposed video encoder-decoder computes fewer tokens per video yet achieves better spatio-temporal consistency.
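
To illustrate why causal attention in time accommodates variable-length videos, here is a minimal sketch in plain Python. It is not Phenaki's implementation: the function names `causal_mask` and `causal_attention` are hypothetical, each frame is reduced to a small feature vector, and queries, keys, and values are the raw features with no learned projections. The point it shows is that a frame only attends to itself and earlier frames, so the encoding of the first few frames does not change when more frames are appended.

```python
import math


def causal_mask(num_frames):
    """mask[t][s] is True when frame t may attend to frame s (s <= t)."""
    return [[s <= t for s in range(num_frames)] for t in range(num_frames)]


def causal_attention(frames):
    """Toy single-head attention over per-frame feature vectors.

    Assumption for illustration: queries, keys, and values are the raw
    features themselves (no learned projections).
    """
    n = len(frames)
    dim = len(frames[0])
    mask = causal_mask(n)
    outputs = []
    for t in range(n):
        # Scaled dot-product scores against allowed (past and current) frames only.
        scores = [
            sum(q * k for q, k in zip(frames[t], frames[s])) / math.sqrt(dim)
            for s in range(n)
            if mask[t][s]
        ]
        # Numerically stable softmax over the allowed frames.
        peak = max(scores)
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        # Weighted average of the allowed frames' features.
        outputs.append(
            [
                sum(w * frames[s][d] for s, w in enumerate(weights)) / total
                for d in range(dim)
            ]
        )
    return outputs


# Appending frames leaves the outputs for the earlier frames unchanged.
short_video = [[1.0, 0.0], [0.0, 1.0]]
long_video = short_video + [[1.0, 1.0], [0.5, -0.5]]
prefix_matches = causal_attention(long_video)[:2] == causal_attention(short_video)
```

Because of this prefix property, the same encoder can in principle process a single image (one frame) or a clip of any length, which is consistent with the joint image-text and video-text training described above.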

Locations:

HEC: 101B

Virtual

Contact:

Cherry Place cherry@crcv.ucf.edu

Calendar: CS/CRCV Seminars
Category: Speaker/Lecture/Seminar
Tags: UCFCRCV
