CS/CRCV Seminars

Advancing Temporal Action Localization: Efficient Large Model Adaptation and Open-Vocabulary Recognition in Videos

Date: Fri, 07 Feb 2025, 14:00 to 15:00 (UTC-05:00)

Location: Research 1 101


Speaker: Ms. Akshita Gupta

From: TU Darmstadt

Abstract

In this talk, I will discuss advances in Temporal Action Localization (TAL), focusing on two key innovations: efficient large-model adaptation and open-vocabulary recognition in videos.

The first part of the talk introduces the Long-Short-range Adapter (LoSA), a memory-efficient backbone adapter designed for untrimmed videos. LoSA adapts intermediate layers of the backbone over multiple temporal ranges to enhance video features, enabling end-to-end adaptation of billion-parameter models such as VideoMAEv2. This makes it practical to apply state-of-the-art video models to long, untrimmed video data.

The second part of the talk explores the OVFormer framework, which addresses Open-Vocabulary TAL. OVFormer leverages a language model to generate rich class descriptions and aligns these descriptions with video features using cross-attention. The framework employs a two-stage training strategy to enable generalization to novel categories, extending the range of recognizable actions beyond predefined categories.

I will also briefly discuss my internship at Apple, where I worked on generating speech from videos of people and their transcripts.

For more info, please follow this link.

Event page: https://events.ucf.edu/event/3728353/advancing-temporal-action-localization-efficient-large-model-adaptation-and-open-vocabulary-recognition-in-videos/

Contact: Cherry Place, cherry@crcv.ucf.edu

Category: Speaker/Lecture/Seminar

Status: Happening As Scheduled