Towards Deploying AI in the Real World

Thursday, March 9, 2023 11 a.m. to noon

Speaker: Dr. Ser Nam Lim

From: Facebook AI

Abstract

In this talk, I will focus on three of our most recent works, one published at CVPR 2023 and two at ECCV 2022. In all of these works, our focus is on AI technologies that will actually work in the real world. Breaking that down, the criteria are (1) models that work in the open-world setting, (2) significantly reducing the amount of annotation needed to train a model, and (3) updating models quickly to adapt to the real world's rapidly changing data trends.

For (1), I will present a work in which we found that we can leverage Vision-Language Pretrained (VLP) models, which are widely known to capture open-world, language-grounded image semantics, for zero-shot image semantic segmentation. The idea is simple: InfoNCE-based contrastive learning on the image's patch tokens sets the SOTA in zero-shot open-vocabulary image segmentation.

For (2), I will present a work in the field of image captioning, an important task with many real-world applications, one of which is social platforms, where image captioning opens the door to integration with Large Language Models (LLMs) in recommendation and search systems. The caveat with image captioning is that a huge amount of annotation is usually required, and on an ongoing basis, to keep up with changing trends, especially on social platforms. To alleviate this, I will describe how we leverage large-scale pre-trained object detectors and match detected objects with nouns in the text without any supervision. This can also be extended in two ways: (1) beyond objects, where we can utilize technologies such as scene graphs or relationship/attribute detectors to match text, and (2) to non-English text.

Finally, for (3), I will present the last paper, on "visual prompt tuning", an approach in which we borrow the concept of prompt tuning from NLP and apply it to computer vision. The basic idea is to train lightweight visual prompts for new tasks without going through a much more laborious full-blown fine-tuning. This opens the door to building a zoo of prompts that can be adapted to new tasks as they come along. Interestingly, this work was actually productionized at Meta and produced significant performance gains, demonstrating the efficacy of our approach.
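The patch-token idea behind (1) can be illustrated with a minimal sketch: class names are embedded by a text encoder, patch tokens are contrasted against them with an InfoNCE loss, and at inference each patch is simply assigned its nearest class embedding. All names, shapes, and the temperature value below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Unit-normalize vectors so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce_loss(patch_emb, text_emb, labels, temperature=0.07):
    """InfoNCE over patch tokens: each patch is pulled toward the text
    embedding of its positive class and pushed away from the others.
    patch_emb: (N, d) patch tokens; text_emb: (C, d) class-name embeddings;
    labels: (N,) index of the positive class per patch (illustrative setup)."""
    p = l2_normalize(patch_emb)
    t = l2_normalize(text_emb)
    logits = p @ t.T / temperature                    # (N, C) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def zero_shot_segment(patch_emb, text_emb):
    """At inference, label each patch with its most similar class embedding;
    no pixel-level mask annotations are needed."""
    sims = l2_normalize(patch_emb) @ l2_normalize(text_emb).T
    return sims.argmax(axis=1)                        # (N,) per-patch class ids
```

Because the class set is only a list of names fed through the text encoder, new vocabulary can be added at inference time without retraining, which is what makes the setting open-vocabulary.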
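The supervision-free matching step in (2) can be sketched as a simple pairing of detector outputs with nouns extracted from unpaired text. The function name, the synonym-table mechanism, and the data below are hypothetical illustrations; the actual work matches objects and nouns at the representation level, not by string lookup.

```python
def match_objects_to_nouns(detections, nouns, synonyms=None):
    """Pair pre-trained detector outputs with caption nouns, with no
    human-annotated image-caption pairs.
    detections: list of (label, confidence) from an off-the-shelf detector.
    nouns: nouns pulled from unpaired text (e.g. by a POS tagger).
    Returns (detector label, matched noun, confidence) triples that can
    serve as free pseudo-supervision for captioning."""
    synonyms = synonyms or {}
    matches = []
    for label, conf in detections:
        # A detector label matches itself or any listed synonym of it.
        candidates = {label} | set(synonyms.get(label, []))
        for noun in nouns:
            if noun in candidates:
                matches.append((label, noun, conf))
    return matches
```

The same pattern extends naturally to the two directions mentioned above: scene-graph or relationship/attribute detectors can be matched against verbs and adjectives rather than nouns, and a multilingual noun extractor makes the text side language-agnostic.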
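The core mechanic of (3) is that the backbone stays frozen and only a handful of prompt tokens, prepended to the patch-token sequence, are trained per task. The class below is a minimal sketch under assumed names and shapes, not Meta's production implementation.

```python
import numpy as np

class VisualPromptTuning:
    """Sketch of visual prompt tuning: adapt a frozen vision backbone to a
    new task by learning only a small set of prompt tokens that are
    prepended to the patch tokens before the backbone runs."""

    def __init__(self, frozen_backbone, num_prompts, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.backbone = frozen_backbone                       # weights stay fixed
        # The prompts are the ONLY trainable parameters for this task.
        self.prompts = rng.normal(0.0, 0.02, (num_prompts, dim))

    def forward(self, patch_tokens):
        # Prepend learned prompts; the backbone itself is untouched,
        # so many tasks can share one backbone with per-task prompt sets.
        tokens = np.concatenate([self.prompts, patch_tokens], axis=0)
        return self.backbone(tokens)
```

Since each task only adds `num_prompts * dim` parameters, storing a "zoo" of prompts for many tasks costs a tiny fraction of keeping one fully fine-tuned backbone per task.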



Locations:

R1:101: Research 1, Room 101

Calendar:

CRCV

Category:

Speaker/Lecture/Seminar

Tags:

Aii UCFCRCV