BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//UNIFY
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE


BEGIN:VEVENT
UID:https://events.ucf.edu/event/3728353/advancing-temporal-action-localization-efficient-large-model-adaptation-and-open-vocabulary-recognition-in-videos/
DTSTAMP:20250207T140000
DTSTART:20250207T140000
DTEND:20250207T150000
LOCATION:Virtual and Research 1: 101

SUMMARY:Advancing Temporal Action Localization: Efficient Large Model Adaptation and Open-Vocabulary Recognition in Videos
URL:https://events.ucf.edu/event/3728353/advancing-temporal-action-localization-efficient-large-model-adaptation-and-open-vocabulary-recognition-in-videos/
DESCRIPTION:Speaker: Ms. Akshita Gupta\n\nFrom: TU Darmstadt\n\nAbstract\n\nIn this talk, I will discuss advances in Temporal Action Localization (TAL), focusing on two key innovations: Efficient Large Model Adaptation and Open-Vocabulary Recognition in Videos.\n\nThe first part of the talk introduces the Long-Short-range Adapter (LoSA), a memory-efficient backbone adapter designed for untrimmed videos. LoSA modifies intermediate layers across various temporal ranges to enhance video features, enabling end-to-end adaptation of billion-parameter models like VideoMAEv2. This approach ensures efficient utilization of state-of-the-art video models, even with the complexities of untrimmed video data.\n\nThe second part of the talk explores the OVFormer framework, which addresses Open-Vocabulary TAL. OVFormer leverages a language model to generate rich class descriptions and aligns these descriptions with video features using cross-attention. The framework employs a two-stage training strategy to enable generalization to novel categories, extending the range of recognizable actions beyond predefined categories.\n\nAdditionally, I will briefly discuss my internship work at Apple, where I worked on generating speech from videos of people and their transcripts.\n\nFor more info, please see: https://www.crcv.ucf.edu/wp-content/uploads/2018/11/Gupta-Flyer.pdf\n\nVirtual Location URL: https://ucf.zoom.us/j/93193076532?pwd=lZcxqayQ8jhY6Fg2sPfJJfIXTrENC5.1&from=addon
END:VEVENT


END:VCALENDAR