Peeling Back the Layers of Deep Neural Networks—A Venture from Implementation Perspective

Speaker: Dr. Aritra Dutta

From: University of Southern Denmark

Abstract

When the training data is very large or the deep neural network is too big for a single device, distributed parallel training, in the form of either data or model parallelism, becomes essential. In both cases, parallelism introduces various overheads. Network communication is one such significant overhead in large-scale distributed deep learning. To mitigate the problem, many compressed communication schemes, in the form of sparsification or quantization of stochastic gradients, have been proposed.
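As a rough illustration of what gradient sparsification means, the sketch below keeps only the k largest-magnitude entries of a gradient vector and transmits them as (index, value) pairs. Top-k selection is one commonly used sparsifier; the abstract does not say which compressors the talk covers, and the function names `top_k_sparsify` and `densify` are hypothetical, chosen only for this example.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad; return (indices, values)."""
    if not 0 < k <= len(grad):
        raise ValueError("k must be between 1 and len(grad)")
    # Indices ordered by descending absolute value (ties keep the lower index).
    order = sorted(range(len(grad)), key=lambda i: -abs(grad[i]))
    indices = sorted(order[:k])
    return indices, [grad[i] for i in indices]


def densify(indices, values, n):
    """Rebuild a length-n dense vector on the receiving side; dropped entries are 0."""
    dense = [0.0] * n
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


# Toy gradient: only 2 of 5 entries are communicated.
grad = [0.1, -2.0, 0.03, 1.5, -0.2]
idx, vals = top_k_sparsify(grad, 2)
restored = densify(idx, vals, len(grad))
```

In this toy case the sender transmits two index/value pairs instead of five floats; in real training the saving depends on how indices are encoded and on the communication library, which is the kind of implementation detail the talk focuses on.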

However, there are significant discrepancies between theory and practice. The theoretical analysis of most existing compression methods relies on assumptions that generally do not hold in practice. For example, theoreticians, motivated by general practice, design compressors with attractive theoretical properties and show communication gains. Nevertheless, researchers and practitioners face a daunting task when choosing an appropriate compression technique despite the potential theoretical gains. The reason is that training speed and model accuracy depend on multiple factors, such as the basic framework used for the implementation, the communication library, the network bandwidth, and the characteristics of the model, to name a few.

This talk will provide an overview of gradient compression methods for distributed deep learning from the implementation perspective. We show that if the practical implementation aspects are better realized, they can provide strong theoretical foundations of compressed communication that are deployable in the real-world training of state-of-the-art deep neural network models. Understanding communication compression from the practical point of view would be advantageous for creating foundational aspects of scalable, parallel, and distributed algorithms for challenging paradigms such as federated learning.

Addendum to the main talk. I will also highlight my ongoing collaborative research on two other topics: (i) I will introduce the first-ever ground and drone-view video dataset, Dual-View Drone (DVD); and (ii) I will discuss a few key aspects of my work on designing and analyzing randomized iterative algorithms for solving large-scale linear systems, Ax = b, which frequently arise in practice and demand effective iterative solvers.
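For readers unfamiliar with randomized iterative solvers for Ax = b, the sketch below shows randomized Kaczmarz, a well-known method of this type. It is offered only as an example of the family; the abstract does not state which algorithms the speaker analyzes. The function name, the iteration count, and the small test system are assumptions made for illustration.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximate a solution of a consistent system Ax = b by row projections."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    # Sample rows with probability proportional to their squared norms.
    sq_norms = [sum(a * a for a in row) for row in A]
    for _ in range(iters):
        i = rng.choices(range(len(A)), weights=sq_norms)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / sq_norms[i]
        # Project the current iterate onto the hyperplane row . x = b[i].
        x = [xj + step * a for xj, a in zip(x, row)]
    return x


# Small consistent system (hypothetical data) with exact solution [1, 2].
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]
x = randomized_kaczmarz(A, b)
```

Each iteration touches a single row of A, which is what makes methods of this kind attractive when the system is too large to factor directly.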


Locations:

MSB: 318

Virtual

Contact:

Cherry Place cherry@crcv.ucf.edu

Calendar: CS/CRCV Seminars
Category: Speaker/Lecture/Seminar
Tags: UCFCRCV
