A conventional machine learning (ML) framework relies on a centralized model in which large amounts of data from various sources are gathered and stored in the high-capacity cloud for analysis. Here, training an ML model generally involves computationally expensive iterative algorithms running on high-performance computing platforms in the cloud. The proliferation of smart connected devices at the edge of the Internet of Things, however, presents unique opportunities for real-time edge analytics while ensuring low latency and data privacy. Because on-device intelligence is paramount, it has become necessary to provide a computing framework that trains and updates ML models on streaming data and delivers real-time analytics within the limited hardware resources and tight power budget of these edge devices. To build such a framework, we are designing fast, distributed, memory-efficient training algorithms with negligible communication overhead for ML models by integrating ideas from machine learning, parallel programming, distributed computing, and hardware design. The overarching goal is to accelerate training on the edge. To that end, we propose a decentralized, highly scalable training framework for ML models on a distributed network of computing devices, and we are building corresponding ML hardware accelerators for these devices.
Research Area: Software Authentication for IoT
Internet-enabled embedded devices are deployed daily as part of the Internet of Things (IoT). Many of these devices are built with little to no regard for security, which has the potential for disastrous consequences. Our research aims to provide lightweight security mechanisms through hardware-provided introspection. Specific goals are to minimize both the hardware cost and the overhead imposed on the software. The end goal is to provide a higher level of security for embedded system software while remaining as transparent as possible.
Research Area: Autonomous Algorithms
This research focuses on optimizing heuristic-based pathfinding algorithms. As a pathfinding problem grows more complex, designing heuristic functions for the pathfinding algorithm becomes more difficult. Our approach is to design and optimize a highly accurate heuristic function via evolution, so that the pathfinding algorithm can find an optimal path using minimal resources.
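To illustrate why the heuristic matters, here is a minimal sketch of A* search on a small grid with a pluggable heuristic. It is not the group's method: the grid, the function names, and the use of A* with a Manhattan-distance heuristic are all assumptions chosen for illustration, and the evolved heuristic described above is not reproduced. The sketch only shows where a heuristic function enters the search and how it can reduce the number of nodes expanded.

```python
import heapq


def a_star(grid, start, goal, heuristic):
    """Return (path_cost, nodes_expanded) on a 4-connected grid; 1 marks a wall."""
    rows, cols = len(grid), len(grid[0])
    # Queue entries are (estimated total cost, cost so far, cell).
    frontier = [(heuristic(start, goal), 0, start)]
    best_cost = {start: 0}
    expanded = 0
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if cost > best_cost.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this cell was found later
        expanded += 1
        if node == goal:
            return cost, expanded
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    priority = new_cost + heuristic((nr, nc), goal)
                    heapq.heappush(frontier, (priority, new_cost, (nr, nc)))
    return None, expanded  # goal unreachable


def manhattan(a, b):
    """Hand-written heuristic (illustrative stand-in for a learned or evolved one)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def zero(a, b):
    """No heuristic information: A* degenerates to uniform-cost search."""
    return 0


# Hypothetical 4x4 map; 0 is free space and 1 is a wall.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

cost_informed, expanded_informed = a_star(GRID, (0, 0), (3, 3), manhattan)
cost_blind, expanded_blind = a_star(GRID, (0, 0), (3, 3), zero)
print(cost_informed, expanded_informed, cost_blind, expanded_blind)
```

Both runs return the same optimal path cost, but the informed search expands no more nodes than the uninformed one. A more accurate heuristic narrows the search further, which is the resource saving that heuristic optimization targets.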
Syed Ali Hasnain
Research Area: Photonics Hardware Accelerator for Reservoir Computing
This research focuses on designing photonic hardware accelerators for Reservoir Computing (RC), a subset of Recurrent Neural Networks (RNNs) that does not require training of the hidden layers. Training an RNN is a time-consuming and computationally intensive task, and implementing one in digital hardware is even more challenging because of all the nonlinear nodes involved. Our approach focuses on implementing RC using delayed-feedback models in an optoelectronic architecture.
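The idea that only the readout is trained can be sketched in software. The following is a simplified, hypothetical model rather than the group's optoelectronic design: a ring of virtual nodes stands in for the delay loop, `tanh` stands in for the physical nonlinearity, and the parameter values (`feedback`, `input_gain`, `ridge`), the random mask, and the toy task of recalling the previous input are all assumptions. The reservoir itself is fixed; only the linear readout weights are fitted, here by ridge regression.

```python
import math
import random


def run_reservoir(inputs, mask, feedback=0.5, input_gain=0.8):
    """Drive a ring of virtual nodes; return one state vector per input sample."""
    n = len(mask)
    state = [0.0] * n
    states = []
    for u in inputs:
        # Each virtual node combines the masked input with the previous-step
        # value of its neighbour on the ring (index -1 wraps to the last node).
        state = [
            math.tanh(feedback * state[i - 1] + input_gain * mask[i] * u)
            for i in range(n)
        ]
        states.append(state)
    return states


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def train_readout(states, targets, ridge=1e-3):
    """Ridge regression: the only trained part of the system."""
    n = len(states[0])
    gram = [
        [sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(s[i] * t for s, t in zip(states, targets)) for i in range(n)]
    return solve(gram, rhs)


rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
mask = [rng.choice((-1.0, 1.0)) for _ in range(10)]  # fixed, never trained

# Toy memory task: from the state at step k, recall the input at step k - 1.
states = run_reservoir(inputs, mask)[1:]
targets = inputs[:-1]
weights = train_readout(states, targets)

predictions = [sum(w * x for w, x in zip(weights, s)) for s in states]
mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
baseline = sum(t * t for t in targets) / len(targets)  # error of predicting 0
print(mse, baseline)
```

Because training reduces to a single linear solve over recorded states, the nonlinear dynamics can be left to fixed physical hardware, which is what makes a delayed-feedback implementation attractive.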
Research Area: ML Architecture
Training deep learning networks involves continuous weight updates across the layers of the network using the backpropagation (BP) algorithm. This results in expensive memory and computation overheads during training. Consequently, most deep learning accelerators today employ pre-trained weights and focus only on improving the design of the inference phase. The recent trend is to build a complete deep learning accelerator by incorporating the training module. Such efforts require an ultra-fast chip architecture for executing the BP algorithm. Resistive memory (ReRAM) has attracted attention to this end because of its low power consumption and analog in-memory computing capability. There have been several recent attempts to design ReRAM-based deep learning accelerators. However, ReRAM cells are susceptible to thermal and process variations, which lead to computing faults. In this project, we aim to design a reliability-aware ReRAM crossbar for an efficient deep learning accelerator architecture.
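The effect of cell variation on in-memory computing can be sketched with an idealized crossbar model. In a crossbar, each column current is the sum of the input voltages multiplied by the cell conductances along that column, so a matrix-vector product is computed in one analog step. The code below is an assumption-laden illustration, not the project's design: the multiplicative Gaussian perturbation is a hypothetical variation model, the values are in arbitrary units, and wire resistance and other non-idealities are ignored.

```python
import random


def crossbar_mvm(voltages, conductances):
    """Ideal column currents: I_j = sum over rows i of V_i * G_ij."""
    cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(cols)
    ]


def perturb(conductances, sigma, rng):
    """Apply multiplicative Gaussian variation to every cell (hypothetical model)."""
    return [[g * (1.0 + rng.gauss(0.0, sigma)) for g in row] for row in conductances]


# Hypothetical 2x2 weight matrix stored as conductances, arbitrary units.
G = [[1.0, 2.0],
     [3.0, 4.0]]
V = [0.5, 1.0]

ideal = crossbar_mvm(V, G)  # [3.5, 5.0]
rng = random.Random(0)
noisy = crossbar_mvm(V, perturb(G, sigma=0.1, rng=rng))
errors = [abs(n - i) for n, i in zip(noisy, ideal)]
print(ideal, noisy, errors)
```

Even a modest per-cell deviation shifts every column current that the cell contributes to, so errors in stored weights propagate directly into the computed result. This is the kind of fault a reliability-aware crossbar design has to tolerate or correct.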