Computer Architecture Series: Preyesh Dalmia

Thursday, January 23, 2025, 1 p.m. to 2 p.m.

Join us via Zoom as we welcome Preyesh Dalmia, a deep learning architect at NVIDIA. He'll be presenting "Reducing Synchronization and Communication Overheads in GPUs."

ABSTRACT: General-purpose programmable accelerators such as GPUs are now behaving as peers in heterogeneous systems rather than accelerators attached to CPUs, due to the broadening spectrum of applications that can run on them. They have evolved from primarily being used for data-parallel streaming applications to now encompassing an ever-growing list of applications such as big data processing, deep learning, graph analytics and data mining. A common characteristic of many of these modern applications is their reliance on fine-grained synchronization and data sharing. However, these applications currently perform poorly because GPUs lack efficient synchronization support. Simple software-driven coherence protocols and the absence of appropriate software support make synchronization operations expensive when they are required. Synchronization can be either local or global, depending on which thread blocks are synchronizing. Scopes were introduced in GPU memory models to make synchronization relatively inexpensive when it’s only required locally. However, global synchronization remains a bottleneck that limits application performance as applications scale.

Thus, a holistic approach is required to tackle the inefficiencies in how synchronization is used in both single- and multi-GPU systems. Hardware-software frameworks have been proposed that use knowledge of the GPU memory hierarchy and algorithmic properties of applications to improve the efficiency of GPU global synchronization. These techniques have advanced the state of the art for global synchronization in GPUs, resulting in performance and energy improvements across a wide range of modern GPU applications.

PREYESH DALMIA is a deep learning architect at NVIDIA, where he enhances deep learning model training performance through optimizations across the software and hardware stack. He recently graduated with a doctorate in computer engineering from the University of Wisconsin-Madison, where his research focused on data movement in heterogeneous systems. During his doctorate, he proposed optimizations to mitigate synchronization and communication overheads in GPUs at different scales from both a software and hardware perspective.

Location:


Contact:


Calendar:

ECE Calendar

Category:

Speaker/Lecture/Seminar

Tags:

UCF Department of Electrical and Computer Engineering, computer architecture, Preyesh Dalmia