D-Matrix Corsair: Transforming Generative Inference from Unsustainable to Attainable

Thursday, August 22, 2024, noon to 1 p.m.

In the rapidly changing world of large language models (LLMs), one concern prevails: the affordability of running inference on them. It has been less than two years since OpenAI first released ChatGPT, and since then, enterprises and academia have raced to improve the efficiency and affordability of deploying LLMs for data-center inference. While this work has yielded significant improvements on existing hardware, the distinct challenges LLMs present, including their low data reuse and high memory-bandwidth requirements, call for a new and radical approach.
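To make the low-reuse point concrete, here is a back-of-envelope sketch of why batch-1 token generation is bandwidth-bound rather than compute-bound. The hardware and model numbers are illustrative assumptions for the sake of the arithmetic, not figures from the talk or from d-Matrix:

```python
# Back-of-envelope estimate of why per-token LLM decoding is
# memory-bandwidth-bound: each generated token must stream every
# weight from memory, so bandwidth, not FLOPs, sets the floor.
# All numbers below are illustrative assumptions.

PARAMS = 70e9          # hypothetical 70B-parameter model
BYTES_PER_PARAM = 2    # fp16/bf16 weights
HBM_BW = 3.35e12       # bytes/s, roughly an H100's HBM bandwidth
PEAK_FLOPS = 989e12    # FLOP/s, roughly an H100's dense bf16 peak

weight_bytes = PARAMS * BYTES_PER_PARAM
flops_per_token = 2 * PARAMS            # ~2 FLOPs per weight (GEMV)

t_bandwidth = weight_bytes / HBM_BW     # time to stream the weights once
t_compute = flops_per_token / PEAK_FLOPS

print(f"bandwidth-bound floor: {t_bandwidth * 1e3:.1f} ms/token")
print(f"compute-bound floor:   {t_compute * 1e3:.3f} ms/token")
# At batch size 1 the bandwidth floor is a few hundred times the
# compute floor: the "low reuse" problem the abstract refers to.
```

Because each weight is used only once per generated token, adding raw FLOPs does not help; only more memory bandwidth (or moving compute into the memory itself) lowers the floor.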

d-Matrix is addressing these challenges with a first-of-its-kind, datacenter-scale, chiplet-based in-memory computing platform and a corresponding software stack that make serving LLMs affordable and efficient. In this talk, d-Matrix software engineer Gaurav Jain will cover the dataflow computing paradigm that lets them address the memory-bandwidth-bound nature of LLM inference, how their SRAM-based in-memory compute differs from previous solutions, and how they leverage the PyTorch and MLIR stack to achieve 3x to 20x improvements in inference latency on state-of-the-art LLMs.
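For context on the PyTorch integration point such a stack can plug into, the sketch below shows torch.compile's custom-backend hook, through which a captured FX graph could be handed off to a vendor compiler (for example, an MLIR-based pipeline). The backend here, dmatrix_like_backend, is a hypothetical stand-in that only inspects the graph and falls back to eager execution; d-Matrix's actual lowering is not described in this abstract:

```python
# Minimal sketch of a torch.compile custom backend. A real vendor
# backend would translate the FX graph into its compiler IR (e.g.
# MLIR) and return compiled device code; this one just lists the
# ops it would lower, then runs the graph unchanged.
import torch
import torch.fx

def dmatrix_like_backend(gm: torch.fx.GraphModule, example_inputs):
    for node in gm.graph.nodes:
        if node.op == "call_function":
            print("would lower:", node.target)
    return gm.forward  # fall back to executing the graph as-is

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
compiled = torch.compile(model, backend=dmatrix_like_backend)
out = compiled(torch.randn(2, 16))
```

The appeal of this hook is that models stay in ordinary PyTorch while the backend decides, graph by graph, what to offload to the accelerator.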


Calendar: College of Engineering and Computer Science

Category: Speaker/Lecture/Seminar

Tags: UCF Department of Electrical and Computer Engineering, computer architecture, Gaurav Jain, dMatrix