November 30th - December 2nd, 2022

FAI Summit 2022

Join us at one of the fastest-growing Artificial Intelligence summits, where stakeholders, industry experts, and AI enthusiasts network and explore the future of AI.

Register

Why Attend FAI Summit

Learn from top AI and ML industry leaders
Unmissable discussions in panel sessions
Explore AI use cases and trends
Hands-on networking
Discover the latest in AR and VR
Meet emerging tech startups
Advance your career
Unlock the power of sustainable AI

Watch FAI Summit Trailer

Meet Some of our Speakers

Deepak Rana
Founder, CEO
ThinkScan Technologies
Janise McNair
Associate Professor
University of Florida
Abiodun Musa Aibinu
Vice Chancellor
Summit University, Offa, Nigeria
PhD, Mechatronics, Robotics and Automation Engineering
Tiffany Rios Live
Digital Solution Area Specialist
Microsoft
Dave Ojika
Founder, CEO
Flapmax
Seyong Lee
Senior Computer Scientist,
Oak Ridge National Laboratory
Amos Omokpo 
Software Support Automation Manager, Symbotic
Founder & CTO, twoMatches 
Prashanth Thinakaran 
Member of Technical Staff,
Cerebras 
Faika Bashoglu 
Assistant Professor, European University of Lefke
Benjamin Udokwu
ESG Advisor & VC Scout, Flapmax
Venture Partner,
Republic 
Clara M. Mosquera Lopez
AI/Machine Learning Research Scientist,
Assistant Professor,
Oregon Health and Science University 
Joo-Young Kim
Assistant Professor,
KAIST
Director of AISS 
Shinjae Yoo 
Computational Scientist & ML Group Lead, BNL
Director of AI & ML at Scale
Aman Arora
PhD Graduate Fellow,
University of Texas at Austin
Luigi Meschini
Business Development
Rainmaking
Head of Accelerator Funding Network,
The Accelerator Network Ltd
View More Speakers

What is FAI Summit?

A space for entrepreneurs, acclaimed C-suite executives, industry leaders, and tech lovers alike to dive into the realm of Artificial Intelligence and technology within the African ecosystem, through a combination of physical and virtual events: presentations, cutting-edge discussions, networking, sustainability solutions, and much more. Register to participate, learn, or play today!

Why Attend FAI Summit?

Three days filled with unmissable workshops, speakers, industry-leading experts, leading tech brands, startups, networking opportunities, and more. FAI Summit brings together like-minded innovators and go-getters, both physically and virtually, to spark meaningful discussions and healthy relationships. It is an opportunity for aspiring individuals and veterans of the tech industry to join forces in discussing the impact of Artificial Intelligence on modern society, now that AI has moved to the front burner in the adoption of technology for digital transformation.

Segments

AI Builders Garage

Emerging Tech

Women In Tech

Startup AI

VC Roundtable

Sustainability

Panel Session

Research

FAI Certification

Community

Youth Entrepreneurship

Unlocking data is key to improving business efficiency, and Artificial Intelligence is one of the critical technologies for putting big data to use.

Developers Hackathon

Making seamless mobility a reality in Africa

FAI Summit Agenda

13:00 - 13:10 WAT
07:00 - 07:10 EST
10 Minutes
Opening Remarks
Opening Remarks
Dave Ojika PhD
Founder, CEO
Flapmax
13:10 - 13:30 WAT
07:10 - 07:30 EST
20 Minutes
Keynote Address I
Scaling Digital Transformation in Emerging Markets: Role of Artificial Intelligence (Theme: Digital Transformation)
Professor Musa Aibinu (PhD in Mechatronics and AI)
Vice-Chancellor, Summit University, Offa, Kwara State, Nigeria
13:30 - 13:45 WAT
07:30 - 07:45 EST
15 Minutes
Presentation
Presentation: Automation
Amos Omokpo
Manager, Software Support Automation, Symbotic
13:45 - 13:50 WAT
07:45 - 07:50 EST
5 Minutes
Break
Break and Music Interlude
13:50 - 14:05 WAT
07:50 - 08:05 EST
15 Minutes
Presentation
Presentation: Sustainability
Benjamin Udokwu
Managing Partner, Climatr
14:05 - 14:45 WAT
08:05 - 08:45 EST
40 Minutes
Panel Session
Panel Session: Sustainability
MODERATOR
Divya Vellanki
Product Management, FAI Institute
PANELISTS
Innocent Orikiiriza
CEO, KaCyber
Hank Selke
MD and CEO, SnarkHealth
Benjamin Udokwu
Managing Partner, Climatr
14:45 - 15:05 WAT
08:45 - 09:05 EST
20 Minutes
Emerging Tech
Blockchain
Nancy Min
Founder, Ecolong
15:05 - 15:10 WAT
09:05 - 09:10 EST
5 Minutes
Break
Break and Music Interlude  
15:10 - 15:50 WAT
09:10 - 09:50 EST
40 Minutes
VC Roundtable
VC Roundtable
MODERATOR
Eric Edokpa
Business Development Manager, Flapmax
PANELISTS
Matthew Mardsen
Founder, Dealbase Africa
Luigi Meschini
Head of Accelerator Funding Network, The Accelerator Network
Maksymilian Kusmierek
Managing Director, Galactiv EOOD
Avik Ashar
Partner, Saison Capital
15:50 - 16:20 WAT
09:50 - 10:20 EST
30 Minutes
Panel Session
Panel Session: “Digital Transformation & Women in AI”  
MODERATOR
Betty Wairegi
PANELISTS
Dapiriye Briggs
Onikepo Amodu
Itunu Gbadamosi
Sophia Ekeh
Kamali Mathiazhagan
Modupeola Savage
16:20 - 16:25 WAT
10:20 - 10:25 EST
5 Minutes
Break
Break and Music Interlude
16:25 - 16:45 WAT
10:25 - 10:45 EST
20 Minutes
Keynote Address II
Scaling Digital Transformation in Emerging Markets: Role of Artificial Intelligence (Theme: Cybersecurity + Edge AI)
Deepak Rana
Founder and CEO, ThinkScan
16:45 - 17:05 WAT
10:45 - 11:05 EST
20 Minutes
Emerging Tech
Virtual Reality
Benjamin Lok, PhD
Computer Science Professor and Entrepreneur, University of Florida
17:05 - 17:30 WAT
11:05 - 11:30 EST
25 Minutes
Panel Session
Panel Session: AI & Financial Services “Fintech Services in Africa: Use of AI to enable safe transactions”
MODERATOR
Eric Edokpa
Business Development Manager, Flapmax
PANELISTS
Isa Aliyu Shata
VP/CEO, Kongapay
Amitesh S.
CTO, Capsa Technology
Mustapha Suberu
CEO, Capsa Technology
17:30 - 17:35 WAT
11:30 - 11:35 EST
5 Minutes
Break
Break and Music Interlude  
17:35 - 17:55 WAT
11:35 - 11:55 EST
20 Minutes
Emerging Tech
AR/VR
Nilanjan Goswami, PhD
Graphics Architect, Meta
17:55 - 18:10 WAT
11:55 - 12:10 EST
15 Minutes
Presentation
Presentation: Healthcare
Faika Bashoglu, PhD
Pharmacist and Assistant Professor, European University of Lefke
18:10 - 18:40 WAT
12:10 - 12:40 EST
30 Minutes
Panel Session
Panel Session: Healthcare
MODERATOR
G Anthony Reina, PhD, MD
Senior Data Scientist, Resilience
PANELISTS
Faika Bashoglu, PhD
Pharmacist and Assistant Professor, European University of Lefke
Hank Selke
MD, Co-Founder and CEO, SnarkHealth
Dr. Trish Scanlan
CEO, We are TLM
18:40 - 18:55 WAT
12:40 - 12:55 EST
15 Minutes
Presentation
Presentation: “AI in Healthcare”
Rashwan Dany
SnarkHealth Project Intern, University of Florida
18:55 - 19:00 WAT
12:55 - 13:00 EST
5 Minutes
Closing Remarks
Acknowledgement/ Closing Remarks
Abiola Jimoh
Senior Programs Manager, Flapmax
13:00 - 13:20 WAT
07:00 - 07:20 EST
20 Minutes
Keynote Address III
Scaling Digital Transformation in Emerging Markets: Role of Artificial Intelligence (Theme: Women in AI)
Tiffany Rose Live
Microsoft, Wentors Ambassador
13:20 - 13:50 WAT
07:20 - 07:50 EST
30 Minutes
Panel Session
Panel Session: “Digital Transformation & Women in AI”
MODERATOR
Adepeju Shittu
PANELISTS
Divya Vellanki
Anuoluwapo Tayo-Alabi
Maureen Mbugua
Margaret Medina
Ufua Ameh
Wentors
13:50 - 13:55 WAT
07:50 - 07:55 EST
5 Minutes
Break
Break and Music Interlude  
13:55 - 14:15 WAT
07:55 - 08:15 EST
20 Minutes
Emerging Tech
Emerging Tech: Robotics
Professor Abiodun Musa Aibinu PhD (Mechatronics and AI)
Vice-Chancellor, Summit University, Offa, Kwara State, Nigeria
14:15 - 14:55 WAT
08:15 - 08:55 EST
40 Minutes
Startup AI
Startup AI: Lightning Talks
MODERATOR
Dave Ojika
CEO, Flapmax
SPEAKERS
Edwin Lubanga
CTO, SnarkHealth
Amitesh S.
CTO, Capsa Technology
Mustapha Suberu
Co-Founder and CEO, Capsa Technology
Paulus Indongo
CTO, K-12Plus
14:55 - 15:00 WAT
08:55 - 09:00 EST
5 Minutes
Break
Break and Music Interlude  
15:00 - 15:20 WAT
09:00 - 09:20 EST
20 Minutes
Startup AI
Startup AI: Lightning Talks (Continued)
MODERATOR
Dave Ojika
CEO, Flapmax
SPEAKERS
Samuel Ogbujimma
CTO, LegitCar
Amsatou Mbengue, PhD
CTO, KaCyber
15:20 - 15:40 WAT
09:20 - 09:40 EST
20 Minutes
Keynote Address IV
Scaling Digital Transformation in Emerging Markets: Role of Artificial Intelligence (Theme: AI/IoT in Agriculture)
Janise McNair
Associate Professor, University of Florida
15:40 - 15:55 WAT
09:40 - 09:55 EST
15 Minutes
Presentation
Presentation “AI & Agriculture”
Sadiq Falalu, MFR
CEO, Falgates
15:55 - 16:25 WAT
09:55 - 10:25 EST
30 Minutes
Panel Session
Panel Session: Cyberinfrastructure, Apps, and Security
MODERATOR
Janise McNair
Associate Professor, University of Florida
PANELISTS
Jumoke Oloyede
Senior Specialist, Threat Intelligence and Hunting, MTN Group
Cat S.
Principal Adversary Emulation Engineer, MITRE
Sadiq Falalu
CEO, Falgates
Rich Wurden
CEO, Aigen
16:25 - 16:30 WAT
10:25 - 10:30 EST
5 Minutes
Break
Break and Music Interlude  
16:30 - 16:50 WAT
10:30 - 10:50 EST
20 Minutes
Emerging Tech
Emerging Tech: AI
Haitang Wang, PhD
Data Scientist, Amazon
16:50 - 17:05 WAT
10:50 - 11:05 EST
15 Minutes
Presentation
Presentation: AI & Edutech
Emeka Nzeih
Head, Professional Services and Certification, Digital Bridge Institute
17:05 - 17:35 WAT
11:05 - 11:35 EST
30 Minutes
Panel Session
Panel Session: AI & Education
MODERATOR
Eric Edokpa
Business Development Manager, Flapmax
PANELISTS
Ozaal Zesha
CEO, ClassNotes
Emeka Nzeih
Head, Professional Services and Certification, Digital Bridge Institute
Paulus Indongo
Co-Founder and CTO, K-12Plus
Lisa Woodson
Project Coordinator, Continuing and Professional Development, San Jacinto
17:35 - 17:40 WAT
11:35 - 11:40 EST
5 Minutes
Break
Break and Music Interlude
17:40 - 18:00 WAT
11:40 - 12:00 EST
20 Minutes
AI Builders Garage
AI Builders Garage: Winners Announcements
Benard Irungu
Senior Programs Manager, Flapmax
18:00 - 18:05 WAT
12:00 - 12:05 EST
5 Minutes
Closing Remarks
Acknowledgement and Closing Remarks
Sheila Perocho
HR and Admin Manager, Flapmax
10:30 - 10:35
5 Minutes
Welcome Address
Welcome Address
Dave Ojika
Flapmax

Keynote: SHREC is an NSF Center for Space, High-Performance, and Resilient Computing. In this talk, we will give an overview of the SHREC Center, its composition, and its mission, and present an overview of four active projects at the University of Florida site of SHREC under the umbrella of Heterogeneous Computing for Data Science:
Compute Cache Hierarchy: focusing on compute-near-memory technologies such as FPGAs and compute-in-memory technologies such as PIM (Processing-in-Memory) and IPU (Intelligent Processing Unit) devices.
Heterogeneous Point Cloud Net (HgPCN): a heterogeneous architecture for embedded 3-D point-cloud inference that aims to satisfy the stringent real-time requirements of applications on the computing edge.
Productive Computational Science (PCS) Platform: a programming abstraction that is accelerator-system agnostic, focusing on scalability and productivity to meet the demands of rapidly changing AI workloads and heterogeneous architectures.
End-to-end ML Pipeline: leveraging Intel’s AI Analytics Toolkit to develop an end-to-end ML pipeline using oneAPI.

Herman Lam
Associate Professor, University of Florida

Bio: Herman Lam is an Associate Professor of Electrical and Computer Engineering at the University of Florida. Currently, his main research interest is in heterogeneous computing (HGC) and reconfigurable computing (RC), focusing on methods and tools for the acceleration and deployment of scientifically impactful applications on scalable RC and HGC systems. He was a Co-PI of the 2012 Alexander Schwarzkopf Prize for Technology Innovation from the National Science Foundation for “Novo-G: An innovative and synergistic research project and the world’s most powerful reconfigurable supercomputer”. Dr. Lam has authored or co-authored over 175 refereed conference and journal articles and one textbook. He served as the Associate Director of CHREC, the NSF Center for High-Performance Reconfigurable Computing. Currently, Dr. Lam is the University of Florida Site Director of the NSF Center for Space, High-Performance, and Resilient Computing. Academically, Dr. Lam was the Director of the Computer Engineering undergraduate program in the College of Engineering at the University of Florida from 2012-2021.  

Contact information: http://www.hlam.ece.ufl.edu/ hlam@ufl.edu

10:35 - 11:30
55 Minutes
Keynote Session
Heterogeneous Computing for AI @ SHREC
Herman Lam
University of Florida and NSF SHREC

Abstract: By providing highly efficient one-sided communication with a globally shared memory space, Partitioned Global Address Space (PGAS) has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile, the FPGA is getting attention as an alternative compute platform for HPC systems, with the benefits of custom computing and design flexibility. However, unlike the traditional message passing interface, PGAS has not been explored on FPGAs. This talk presents FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. We implement the core functions of the GASNet specification on the FPGA for native PGAS integration in hardware, while its programming interface is designed to be highly compatible with legacy software. Our experiments show that FSHMEM achieves a peak bandwidth of 3813 MB/s, more than 95% of the theoretical maximum, outperforming prior work by 9.5×. It records 0.35us and 0.59us latency for remote write and read operations, respectively. Finally, we conduct a case study on two Intel D5005 FPGA nodes integrating Intel's deep learning accelerator. The two-node system programmed by FSHMEM achieves 1.94× and 1.98× speedup for matrix multiplication and convolution operations, respectively, showing its scalability potential for HPC infrastructure.

Yashael Faith Arthanto
Hardware Engineer, Rebellions AI

Bio: Yashael Faith Arthanto is from Indonesia. He received a BS degree from Bandung Institute of Technology, Indonesia, in 2019 and an MS degree in EEE from KAIST, South Korea, in 2022. He now works for an AI chip startup called Rebellions in South Korea.

Contact Information: yashael.faith@alumni.kaist.ac.kr

11:30 - 12:00
30 Minutes
Session 1
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure
Yashael Faith Arthanto
Rebellions AI/KAIST

Abstract: Graph neural networks (GNNs) are becoming increasingly important in many applications such as social science, natural science, and autonomous driving. Driven by real-time inference requirements, GNN acceleration has become a key research topic. Given the largely diverse GNN model types, such as graph convolutional networks, graph attention networks, and graph isomorphism networks, with arbitrary aggregation methods and edge attributes, designing a generic GNN accelerator is challenging. In this talk, we discuss our proposed generic and efficient GNN accelerator, called FlowGNN, which can easily accommodate a wide range of GNN types. Without losing generality, FlowGNN can outperform CPUs and GPUs by up to 400 times. In addition, we discuss an open-source automation flow, GNNBuilder, which allows users to design their own GNNs in PyTorch and then automatically generates accelerator code targeting FPGAs.
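The aggregate-then-transform pattern that all of these GNN variants share (and that a generic accelerator must support) can be sketched in a few lines of NumPy. This toy mean-aggregation graph convolution is an illustration only, not code from FlowGNN or GNNBuilder; the layer, graph, and weights are invented for the example:

```python
import numpy as np

def gcn_layer(A, X, W):
    """Toy graph-convolution layer: aggregate neighbor features, then transform.

    A: (n, n) adjacency matrix, X: (n, f_in) node features, W: (f_in, f_out) weights.
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-normalize (mean aggregation)
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)  # ReLU activation

# 3-node path graph: 0 - 1 - 2
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X = np.eye(3)          # one-hot node features
W = np.ones((3, 2))    # trivial weights, normally learned
H = gcn_layer(A, X, W) # (3, 2) output embeddings
```

A real GNN would stack several such layers and learn `W`; the accelerator's job is to pipeline the irregular neighbor gather (`A_hat @ X`) with the dense transform (`@ W`).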

Callie Hao
Assistant Professor, Georgia Institute of Technology

Bio: Dr. Cong (Callie) Hao is an assistant professor in ECE at Georgia Tech. She was a postdoctoral fellow at Georgia Tech from 2020-2021 and at UIUC from 2018-2020. She received the Ph.D. degree in Electrical Engineering from Waseda University in 2017, and the M.S. and B.S. degrees in Computer Science and Engineering from Shanghai Jiao Tong University. Her primary research interests lie in the joint area of efficient hardware design and machine learning algorithms, including software/hardware co-design for reconfigurable and high-efficiency computing and agile electronic design automation tools.  

Contact information: https://sharclab.ece.gatech.edu/ callie.hao@ece.gatech.edu

12:00 - 12:30
30 Minutes
Session 1
Generic and automated graph neural network acceleration
Callie Hao
Georgia Tech
12:30-13:30
60 Minutes
Lunch
Lunch Break

Abstract: Hardware accelerators can help data scientists and ML engineers run their applications much faster, but deploying these accelerators used to be quite challenging. In this talk we will show how ML developers can utilize the power of hardware accelerators like FPGAs with zero code changes. FPGAs are adaptable hardware platforms that can offer great performance, low latency, and reduced OpEx for applications like machine learning. We will show how users can enjoy the performance of hardware accelerators while keeping the same ease of deployment as any other computing platform.

Chris Kachris
CEO, InAccel

Bio: Christoforos Kachris is the co-founder and CEO of InAccel, which helps companies speed up their AI/ML applications using hardware accelerators (FPGAs) in the cloud or on-prem. Christoforos holds a Ph.D. in Computer Engineering from Delft University of Technology and has more than 20 years of experience in hardware acceleration. He is the editor of “Hardware Accelerators in Data Centers” and co-author of more than 80 scientific peer-reviewed publications on FPGA-based hardware acceleration (with more than 2,400 citations). He supervised three winners of the international Open Hardware contest for contributions to ML acceleration in 2018 and 2020.

Contact Information: https://inaccel.com/ chris@inaccel.com

13:30 - 14:00
30 Minutes
Session 2
How to speedup your ML applications, instantly
Chris Kachris
InAccel

Abstract: Many large-scale physics experiments, such as ATLAS at the Large Hadron Collider, the Deep Underground Neutrino Experiment, and sPHENIX at the Relativistic Heavy Ion Collider, rely on accurate simulations to inform data analysis and derive scientific results. However, the inevitable discrepancy between simulation and experiment requires corrections using heuristics in a conventional analysis workflow. It also prevents data-driven models, learned on simulation data, from inferring on experiment data directly. Our goal is to develop machine learning methods that can bridge the gap between simulations and experiments. Our initial effort demonstrated the feasibility of such an approach using a Vision Transformer-augmented U-Net under the CycleGAN framework. In this talk, I will present our model (UVCGAN) and its applications on two tiers of data from Liquid Argon Time Projection Chamber simulations. UVCGAN is also competitive against other advanced image translation models on open benchmark data sets.

Yihui (Ray) Ren
Associate Research Scientist, BNL

Bio: Yihui, a.k.a. "Ray", works in the general area of Artificial Intelligence (AI), its applications in science and its interaction with novel hardware. Ray's current research topics include unpaired image translation to bridge the gap between simulation and experiments, neural network optimization and deployment for real-time systems, novel hardware exploration and benchmarking, privacy-preserving AI, and bringing advanced AI methods to scientific domains.

Contact Information: https://www.bnl.gov/staff/yren yren@bnl.gov

14:00 - 14:30
30 Minutes
Session 2
Bridging Gaps between Simulation and Experiment
Yihui Ren
BNL

Abstract: Brief Information regarding the topic presented by the speaker.


Abelardo Jara-Berrocal
AMD

Bio: Brief biography of the speaker.


Contact Information:

14:30 - 15:00
30 Minutes
Session 2
Topic Title
Abelardo Jara-Berrocal
AMD
15:00 - 15:10
10 Minutes
Break
Coffee Break

Abstract: Large Language Models are shifting “what’s possible” in AI, but distributed training across thousands of traditional accelerators is massively complex and always suffers diminishing returns as more compute is added. Always? No longer. In this talk, I will give an overview of the Cerebras Wafer-Scale Cluster, which involved fundamentally redesigning chips, systems, compilers, workflow scaling, and beyond. I will present a cluster of 16 Cerebras CS-2 nodes, with nearly 13 million AI cores, that achieves near-perfect linear scaling across more cores than the world’s most powerful supercomputer.

Prashanth Thinakaran
MTS, Cerebras Systems

Bio: Prashanth holds a Ph.D. in Computer Science and Engineering from Penn State; his research focused on systems aspects of high-performance and cloud computing. He has authored several conference papers and a book chapter in the area. He currently works at Cerebras Systems as an AI Cluster Infrastructure Engineer, developing the systems that enable large-scale AI model training on Cerebras's Wafer-Scale Cluster. This system recently made news for training the largest AI model on a single device and won the ACM Gordon Bell Special Prize for HPC COVID research. News: https://www.cerebras.net/company/news/

Contact information: prashanth.thina@gmail.com

15:10 - 15:40
30 Minutes
Session 3
Near perfect AI scaling on 13 Million cores: Cerebras Architectural Overview
Prashanth Thinakaran
Cerebras

Abstract: From edge to AI and HPC, computer architectures are becoming more heterogeneous and complex. The systems typically have fat nodes, with multicore CPUs and multiple hardware accelerators such as GPUs, FPGAs, and DSPs. This complexity is causing a crisis in programming systems and performance portability. Several programming systems are working to address these challenges, but the increasing architectural diversity is forcing software stacks and applications to be specialized for each architecture, resulting in poor portability and productivity. This talk argues that a more agile, proactive, and intelligent runtime system is essential to increase performance portability and improve user productivity. To this end, this talk introduces a new runtime system called IRIS. IRIS enables programmers to write portable and flexible programs across diverse heterogeneous architectures for different application domains from embedded/mobile computing to AI and HPC computing, by orchestrating multiple programming platforms in a single execution and programming environment.

Seyong Lee
Senior R&D Staff, Programming Systems Group @ ORNL

Bio: Seyong Lee is a Senior R&D Staff in Computer Science and Mathematics Division at Oak Ridge National Laboratory. His research interests include parallel programming and performance optimization in heterogeneous computing environments, program analysis, and optimizing compilers. He received his PhD in Electrical and Computer Engineering from Purdue University, USA. He is a member of the OpenACC Technical Committee and a former member of the Exascale Computing Project PathForward Working Group. He served as a program committee/guest editor/external reviewer for various conferences, journals, and proposals. His SC10 paper won the best student paper award, and his PPoPP09 paper was selected as the most cited paper among all papers published in PPoPP between 2009 and 2014. He received the IEEE Computer Society TCHPC Award for Excellence for Early Career Researchers in High Performance Computing at SC16 and served as an award committee member for 2017 IEEE CS TCHPC Award.  

Contact information: lees2@ornl.gov http://ft.ornl.gov/~lees2/

15:40 - 16:10
30 Minutes
Session 3
IRIS: A Portable Programming Framework for Extremely Heterogeneous Computing
Seyong Lee
ORNL
16:10 - 16:45
35 Minutes
Socials
Virtual social event (rotating 5 min breakout rooms)

Abstract: Deep learning technology has made significant progress on various cognitive tasks once believed impossible for computers to do as well as humans, including image classification, object detection, speech recognition, and natural language processing. However, the vast adoption of deep learning also highlights its shortcomings, such as limited generalizability and lack of interpretability. In addition, application-specific deep learning models require lots of manually annotated training samples with sophisticated learning schemes. Witnessing the performance saturation of early models such as the MLP, CNN, and RNN, one notable recent innovation in deep learning architecture is the transformer model, introduced in 2017. It has two good properties over conventional models on the path toward artificial general intelligence. First, the performance of transformer models continues to grow with their model sizes and training data. Second, transformers can be pre-trained with vast amounts of unlabeled data through unsupervised or self-supervised learning and can be fine-tuned quickly for each application. In this talk, I will present a multi-FPGA acceleration appliance named DFX for accelerating hyperscale transformer-based AI models. Optimized for OpenAI’s GPT (Generative Pre-trained Transformer) models, it manages to execute an end-to-end inference with low latency and high throughput. DFX uses model parallelism and an optimized dataflow that is model- and hardware-aware for fast simultaneous workload execution among multiple devices. Its compute cores operate on custom instructions and support entire GPT operations, including multi-head attention, layer normalization, token embedding, and the LM head. We implement the proposed hardware architecture on four Xilinx Alveo U280 FPGAs and utilize all of the channels of the high-bandwidth memory (HBM) and the maximum number of compute resources for high hardware efficiency.
Finally, DFX achieves 5.58× speedup and 3.99× energy efficiency over four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is also 8.21× more cost-effective than the GPU appliance, suggesting that it can be a promising alternative in cloud datacenters.
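For reference, the scaled dot-product attention at the heart of the multi-head attention blocks such accelerators target can be written in a few lines of NumPy. This is a software sketch of the math only, not code from DFX (which implements it with custom instructions in hardware); the shapes and random inputs are illustrative:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: the inner loop of every transformer layer."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq, seq) similarity matrix
    return softmax(scores) @ V               # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V)  # (4, 8): one output vector per query position
```

The `(seq, seq)` score matrix is why inference latency grows quickly with sequence length, and why dedicated hardware for this operation pays off.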

Joo-Young Kim
Assistant Professor, School of EE, KAIST

Bio: Joo-Young Kim received the B.S., M.S., and Ph.D. degrees in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 2005, 2007, and 2010, respectively. He is currently an Assistant Professor in the School of Electrical Engineering at KAIST and the Director of the AI Semiconductor Systems (AISS) research center. His research interests span various aspects of hardware design, including VLSI design, computer architecture, FPGAs, domain-specific accelerators, hardware/software co-design, and agile hardware development. Before joining KAIST, Joo-Young was a Senior Hardware Engineering Lead at Microsoft Azure, working on hardware acceleration for its hyperscale big-data analytics platform, Azure Data Lake. Before that, he was one of the initial members of the Catapult project at Microsoft Research, where he deployed a fabric of FPGAs in datacenters to accelerate critical cloud services such as machine learning, data storage, and networking. Joo-Young is a recipient of the 2016 and 2014 IEEE Micro Top Picks Awards, the 2010 and 2008 DAC/ISSCC Student Design Contest Awards, and the 2006 A-SSCC Student Design Contest Award. He served as an Associate Editor for the IEEE Transactions on Circuits and Systems I: Regular Papers (2020-2021).
Contact Information: https://castlab.kaist.ac.kr/our-team/joo-young-kim/ jooyoung1203@kaist.ac.kr

10:00 - 10:30
30 Minutes
Session 4
A Multi-FPGA Appliance for Accelerating Inference of Hyperscale Transformer Models
Joo-Young Kim
KAIST

Abstract: This talk will go over different AutoML methods that share the common themes of efficiency and hardware awareness. We will present (1) a fast predictor-based search algorithm, (2) zero-cost NAS proxies that significantly speed up the evaluation phase of NAS, (3) accurate hardware latency prediction in NAS, (4) automated hardware-DNN codesign, and (5) real application case studies in which NAS delivered significant improvements. Our projects are all part of an effort to enable on-device deployment of DNNs within constrained hardware devices, an area in which AutoML can play a big role.
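To give a flavor of what hardware-aware search means, here is a toy exhaustive search over a two-layer width space. The search space, latency model, proxy score, and budget below are all made-up stand-ins for illustration, not the predictor-based algorithms or zero-cost proxies from the talk:

```python
# Toy hardware-aware architecture search: pick the architecture with the best
# accuracy proxy that still fits a latency budget on the target device.
search_space = [(w1, w2) for w1 in (16, 32, 64) for w2 in (16, 32, 64)]

def latency_est(arch):
    # pretend latency (ms) grows linearly with total layer width
    return 0.01 * (arch[0] + arch[1])

def proxy_score(arch):
    # stand-in for a zero-cost accuracy proxy: bigger capacity scores higher
    return arch[0] * arch[1]

budget_ms = 1.0
feasible = [a for a in search_space if latency_est(a) <= budget_ms]
best = max(feasible, key=proxy_score)  # best proxy score within the budget
```

Real NAS replaces `proxy_score` with a learned predictor or zero-cost proxy and `latency_est` with measurements or a latency model of the target hardware, but the constrained-maximization shape of the problem is the same.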

Mohamed Abdelfattah
Assistant Professor, Cornell Tech - Cornell University

Bio: Mohamed Abdelfattah is an Assistant Professor at Cornell Tech and in the Electrical and Computer Engineering Department at Cornell University. His research group is designing the next generation of machine-learning-centric computer systems for both datacenters and mobile devices. He received his BSc from the German University in Cairo, his MSc from the University of Stuttgart, and his PhD from the University of Toronto. After his PhD, Mohamed spent five years at Intel and Samsung Research.


Contact Information: https://www.mohsaied.com/ mohamed@cornell.edu

10:30 - 11:00
30 Minutes
Session 4
Neural Architecture Search
Mohamed Abdelfattah
Cornell

Abstract: Hardware architectures are undergoing major shifts due to the slowing of Moore's law. Innovations in packaging technology have given rise to chiplet-based architectures, which we are seeing in CPUs and GPUs. In the near future, we will see integration of heterogeneous chiplet modules on the same compute package, allowing a mix of accelerated workloads on the same architecture without offloading to PCIe-attached accelerators. In addition, thanks to coherent fabrics like CXL, we will see tighter integration between coherent accelerators and processors. This may also allow disaggregation of memory and storage at rack level. CXL will be an enabling technology for CPU-memory and CPU-accelerator disaggregation into fine-grained resource pools, allowing cloud vendors to provide precisely sized compute blocks for user workloads. Silicon photonics and in-package optics will ensure that the latency induced by this disaggregated architecture stays within tolerable limits and does not come at the price of performance for latency-sensitive applications in most cases. All these hardware-stack innovations will have a massive impact on future accelerators and their development. It will be important to look at these underlying trends in hardware design and what they offer so that software can leverage the performance gains accordingly. Architects will need to leverage the hardware and get performance with low overheads. The aim of this talk is to look at a CXL-enabled switch to pool memory, accelerators, and storage with compute elements to provide customers economical and performance-oriented cloud infrastructure on demand. In addition, we aim to build a co-designed serverless API to provision these units and provide a cost-efficient experience to users.

Gaurav Kaul
HPE

Bio: Gaurav Kaul is a senior systems architect at Hewlett Packard Enterprise (HPE), where he leads HPC and AI systems design for large customers in the EMEA region, including pre-exascale machines like Archer2 (University of Edinburgh), LUMI (Finland), and Shaheen (Saudi Arabia). His work involves working with customers to understand their workloads, hardware-software co-design on upcoming generations of accelerators and CPUs from AMD, Intel, and Nvidia, and onboarding users by sharing best practices and knowledge. In addition to his role at HPE, Gaurav is involved in various standards for HPC and AI systems design, including OCP, CXL, UCIe, and MLIR. Prior to HPE, he worked at AWS, Intel, and IBM in various systems-related domains and processor design. He holds a Master's in Computer Science from the University of Manchester and lives in London, UK with his family.

 
Contact information: gaurav.kaul@hpe.com  

11:00 - 11:30
30 Minutes
Session 4
Impact of Chip and System Level Disaggregation – A Hardware-Software Codesign Approach
Gaurav Kaul
HPE

Abstract: For programming FPGA-based accelerators, high-level synthesis (HLS) is the mainstream approach. Unfortunately, HLS leaves a significant programmability gap in terms of reconfigurability, customization, and versatility: 1. FPGA physical design can take hours; 2. FPGA reconfiguration time limits HLS from targeting complex workloads; and 3. HLS tools do not reason about cross-workload flexibility. Overlay approaches mitigate the above by mapping programmable designs (e.g. CPU, GPU, etc.) on top of FPGAs. However, the abstraction gap between overlay and FPGA leads to low efficiency/utilization. Our work develops a new FPGA programming paradigm, where an overlay architecture is automatically specialized to a set of representative applications. The key innovation is a highly customizable overlay design space based on spatial architectures, which encompass a range of designs from application-specific to general purpose. We leverage and extend prior work on accelerator compilers, SoC generation, and fast design space exploration (DSE) to create an end-to-end FPGA acceleration system called OverGen. OverGen can compete in performance with state-of-the-art HLS techniques, while requiring 10,000x less compile time and reconfiguration time.

Tony Nowatzki
Associate Professor, UCLA

Bio: Tony Nowatzki is an associate professor in the Computer Science Department at the University of California, Los Angeles, where he leads the PolyArch Research Group. He joined UCLA in 2017 after completing his PhD at the University of Wisconsin-Madison. He was also a consultant for Simple Machines Inc., an AI hardware startup that used several of his patents in fabricated chips. His academic recognition includes four IEEE Micro Top Picks awards, a CACM Research Highlight, best paper nominations at MICRO and HPCA, and a PLDI Distinguished Paper Award.
 

Contact information: https://web.cs.ucla.edu/~tjn/ tjn@cs.ucla.edu  

11:30 - 12:00
30 Minutes
Session 4
Overlay Generation: A New Paradigm for Productive FPGA Acceleration
Tony Nowatzki
UCLA
12:00-13:00
60 Minutes
Lunch
Lunch break

Abstract: In this project, we set out to find innovative ways to improve inference performance on CPUs for better resource utilization. By using sparse multiplication libraries and vendor-provided optimized libraries, we can notably improve ML inference performance.

Jared Baumann
Flapmax

Bio: Jared Baumann is a dual-major graduate of TSU who has worked with Flapmax for a little over a year. He specializes in developing low-level solutions for improving performance.

Contact information: jared@flapmax.com

13:00 - 13:30
30 Minutes
Session 5
Optimizing Machine Learning Inference performance on CPU
Jared Baumann
Flapmax
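The sparse-multiplication idea behind this session can be sketched with SciPy's sparse-matrix support. This is an illustrative example only, not the speaker's actual code; the matrix sizes and the 5% density are hypothetical stand-ins for a pruned model layer:

```python
# Illustrative sketch: a "pruned" inference layer where 95% of the weights
# are zero. Storing the weights in CSR format lets the matmul skip zeros,
# which is the source of the speedup described in the abstract.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical pruned weight matrix: only 5% of entries are nonzero.
weights = sparse.random(512, 512, density=0.05, random_state=0, format="csr")
activations = rng.standard_normal((512, 64))

dense_out = weights.toarray() @ activations   # baseline dense matmul
sparse_out = weights @ activations            # CSR matmul skips zero entries

# Both paths compute the same result; the sparse path performs far fewer
# multiply-adds (roughly density * the dense operation count).
assert np.allclose(dense_out, sparse_out)
```

Vendor libraries (such as Intel's oneMKL sparse routines) apply the same principle with hardware-tuned kernels.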

Abstract: Entrepreneurs, researchers, and AI practitioners who are climate-conscious (and receive monthly electricity bills) and searching for a modern IT infrastructure to build their solutions need look no further. IBM Power server hardware paired with a Red Hat OpenShift container software stack that stays "on" (with 99.999% availability) and a hardware-based AI accelerator (the on-chip Matrix Math Accelerator) maintains a lower energy footprint and Total Cost of Ownership (TCO) than x86 servers. At this session, learn about IBM Power servers, how you can leverage them to drive your AI roadmap, and how to benefit from the expanding ecosystem of vendors that support IBM Power.

Azer Khan
GTM Leader, Banking & Industry Modernization, IBM

Bio: Yihui, a.k.a. "Ray", works in the general area of Artificial Intelligence (AI), its applications in science and its interaction with novel hardware. Ray's current research topics include unpaired image translation to bridge the gap between simulation and experiments, neural network optimization and deployment for real-time systems, novel hardware exploration and benchmarking, privacy-preserving AI, and bringing advanced AI methods to scientific domains.

Contact Information: https://www.bnl.gov/staff/yren yren@bnl.gov

13:30 - 14:00
30 Minutes
Session 5
How entrepreneurs can benefit from sustainable IBM Power servers (with built in AI-accelerators) to build OpenShift-based IT infrastructures
Azer Khan
IBM

Abstract: In the space of hardware acceleration alternatives, FPGAs lie in the middle of the programmability-efficiency spectrum, with GPUs being more programmable and ASICs being more efficient. FPGAs provide massive parallelism and are reconfigurable, which makes them very well suited for the fast-changing needs of DL applications. But how can we minimize the gap between ASICs and FPGAs in performance and efficiency while retaining their strength, reconfigurability? This talk dives into our research that attempts to answer this question by exploring better reconfigurable fabrics for Deep Learning. We will discuss how FPGAs are evolving into domain-specific reconfigurable fabrics. Specifically, we will look at new blocks called Tensor Slices and CoMeFa RAMs, which are a significant step toward closing the performance gap between FPGAs and ASICs. We will peek into the architecture of these blocks and discuss the performance improvement and energy reduction that can be obtained for DL applications by using modern FPGAs containing them.

Aman Arora
Ph.D. Fellow, UT Austin

Bio: Aman Arora is a PhD candidate and Graduate Fellow at The University of Texas at Austin. His research focuses on optimizing FPGA architectures to make them better Deep Learning accelerators. He has over 12 years of experience in the semiconductor industry in design, verification, testing, and architecture roles. He is on the academic job market for faculty positions starting next Fall.

Contact information: https://amanarora.site aman.kbm@utexas.edu

14:00 - 14:30
30 Minutes
Session 5
In Search of the Right Reconfigurable Fabric for DL
Aman Arora
UT Austin

Abstract: HPC developers and users in scientific domains such as climate modeling, CAD for manufacturing, CFD, molecular dynamics, histopathology, seismology, protein folding, high-energy physics, and astrophysics are exploring ways to use AI models to augment and accelerate HPC simulations, or to develop AI solutions for them. Their problems typically involve extremely large, multi-dimensional input data unlike those used in popular DL domains such as image recognition, recommendation engines, and language and text translation. This leads to significantly large memory usage and I/O ingestion challenges in both the input data pipeline and end-to-end HPC-AI workload pipelines. The talk will feature Intel's best practices for optimizing HPC/AI workloads, including code optimizations for training and inference using Intel-optimized TensorFlow and PyTorch with Intel extensions. On Intel's 4th Gen processors (code-named Sapphire Rapids) with HBM, using AMX/TMUL instructions that support mixed-precision FP32 and BFloat16 training and quantized inference, these optimizations yield up to 3-4X improvement over Intel's 3rd Gen Scalable Processors using AVX-512.

Nalini Kumar
Intel

Bio: Nalini Kumar works on HPC/AI workload optimization, analysis, and modeling at Intel in Santa Clara. Her primary research interests are in applying parallel, high-performance, and reconfigurable computing to traditional HPC as well as AI workloads. She is also interested in performance modeling and prediction of full applications and workflows on large-scale systems. She received her PhD and MS in Electrical and Computer Engineering from the University of Florida.

Contact information: nalini.kumar@intel.com  

14:30 - 15:00
30 Minutes
Session 5
HPC AI Workload Best Practices on Next Gen Intel® Xeon® Max processors (codenamed Sapphire Rapids)
Nalini Kumar
Intel
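As a rough illustration of the quantized-inference idea mentioned in the abstract above, the sketch below hand-rolls symmetric int8 quantization in NumPy. It is not Intel's library code, and all names and sizes are invented; real deployments would use the optimized kernels the talk describes:

```python
# Illustrative sketch of quantized inference: replace a float32 matmul with
# an int8 matmul (accumulated in int32) plus a rescale. Real toolchains
# (e.g., Intel's extensions with AMX) do this with hardware instructions.
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map max |x| to 127."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64)).astype(np.float32)  # hypothetical weights
x = rng.standard_normal((64, 8)).astype(np.float32)   # hypothetical inputs

qw, sw = quantize_int8(w)
qx, sx = quantize_int8(x)

# Integer matmul in int32, then rescale back to float with the product
# of the two quantization scales.
y_int8 = (qw.astype(np.int32) @ qx.astype(np.int32)).astype(np.float32) * (sw * sx)
y_fp32 = w @ x

# The quantized result tracks the FP32 result within quantization error.
rel_err = np.abs(y_int8 - y_fp32).max() / np.abs(y_fp32).max()
```

The appeal is that 8-bit operands quarter the memory traffic of FP32 and map onto wide integer matrix units, at the cost of a small, bounded accuracy loss.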
15:00 - 15:10
10 Minutes
Break
Coffee Break

Abstract: From the edge to AI and HPC, computer architectures are becoming more heterogeneous and complex. These systems typically have fat nodes with multicore CPUs and multiple hardware accelerators such as GPUs, FPGAs, and DSPs. This complexity is causing a crisis in programming systems and performance portability. Several programming systems are working to address these challenges, but the increasing architectural diversity is forcing software stacks and applications to be specialized for each architecture, resulting in poor portability and productivity. This talk argues that a more agile, proactive, and intelligent runtime system is essential to increase performance portability and improve user productivity. To this end, it introduces a new runtime system called IRIS. IRIS enables programmers to write portable and flexible programs across diverse heterogeneous architectures, for application domains ranging from embedded and mobile computing to AI and HPC, by orchestrating multiple programming platforms in a single execution and programming environment.

Seyong Lee
Senior R&D Staff, Programming Systems Group @ ORNL

Bio: Seyong Lee is a Senior R&D Staff Member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. His research interests include parallel programming and performance optimization in heterogeneous computing environments, program analysis, and optimizing compilers. He received his PhD in Electrical and Computer Engineering from Purdue University, USA. He is a member of the OpenACC Technical Committee and a former member of the Exascale Computing Project PathForward Working Group. He has served as a program committee member, guest editor, and external reviewer for various conferences, journals, and proposals. His SC10 paper won the Best Student Paper Award, and his PPoPP09 paper was selected as the most cited paper among all papers published in PPoPP between 2009 and 2014. He received the IEEE Computer Society TCHPC Award for Excellence for Early Career Researchers in High Performance Computing at SC16 and served on the committee for the 2017 IEEE CS TCHPC Award.

Contact information: lees2@ornl.gov http://ft.ornl.gov/~lees2/

15:40 - 16:10
30 Minutes
Session 3
IRIS: A Portable Programming Framework for Extremely Heterogeneous Computing
Seyong Lee
ORNL
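For intuition only: a runtime like IRIS must decide, task by task, which device should run each piece of work. The toy greedy scheduler below illustrates that idea. It is not the IRIS API; the device names, relative speeds, and the earliest-finish-time policy are all invented for illustration:

```python
# Toy illustration of heterogeneous task placement (NOT the IRIS API).
# Each device has a relative speed; each task is greedily placed on the
# device that would finish it earliest (earliest-finish-time policy).

def schedule(tasks, devices):
    """tasks: list of work amounts; devices: dict name -> relative speed.
    Returns (placement: task id -> device name, finish: device -> busy time)."""
    finish = {name: 0.0 for name in devices}  # time each device becomes free
    placement = {}
    for tid, work in enumerate(tasks):
        # Pick the device with the earliest completion time for this task.
        best = min(devices, key=lambda n: finish[n] + work / devices[n])
        finish[best] += work / devices[best]
        placement[tid] = best
    return placement, finish

# Four equal tasks on a slow "cpu" and a 4x-faster "gpu": the first three
# land on the gpu, and the fourth runs on the otherwise-idle cpu in parallel.
placement, finish = schedule([4.0, 4.0, 4.0, 4.0],
                             {"cpu": 1.0, "gpu": 4.0})
```

A real runtime also tracks data placement and inter-task dependencies, which is where much of the portability challenge described in the abstract lies.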
15:10 - 15:55
45 Minutes
Panel Session
Panel Session: Pathway to Scale-Out and Scale-Up AI
MODERATOR
Yi-Chung Chen
MediaTek
PANELISTS
Chris Kachris
InAccel
Azer Khan
IBM
Seyong Lee
ORNL
Dave Ojika
Flapmax

Startups

KaCyber

Making seamless mobility a reality in Africa

Legit Car

Building Africa’s biggest vehicle data service

SnarkHealth

Partner with your doctor. Leverage your data. Pay less.

Capsa

The Amazon for Invoices in Africa

Sectors

Industrial
Health
Education
Sustainability
Agriculture
FinTech

Partners