Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ningyuan Cao

Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures

May 07, 2024
Ruiyang Qin, Zheyu Yan, Dewen Zeng, Zhenge Jia, Dancheng Liu, Jianbo Liu, Zhi Zheng, Ningyuan Cao, Kai Ni, Jinjun Xiong, Yiyu Shi

Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Although such learning methods can be optimized to reduce resource utilization, the overall required resources remain a heavy burden on edge devices. Instead, Retrieval-Augmented Generation (RAG), a resource-efficient LLM learning method, can improve the quality of the LLM-generated content without updating model parameters. However, the RAG-based LLM may involve repetitive searches on the profile data in every user-LLM interaction. This search can lead to significant latency along with the accumulation of user data. Conventional efforts to decrease latency result in restricting the size of saved user data, thus reducing the scalability of RAG as user data continuously grows. It remains an open question: how to free RAG from the constraints of latency and scalability on edge devices? In this paper, we propose a novel framework to accelerate RAG via Computing-in-Memory (CiM) architectures. It accelerates matrix multiplications by performing in-situ computation inside the memory while avoiding the expensive data transfer between the computing unit and memory. Our framework, Robust CiM-backed RAG (RoCR), utilizing a novel contrastive learning-based training method and noise-aware training, can enable RAG to efficiently search profile data with CiM. To the best of our knowledge, this is the first work utilizing CiM to accelerate RAG.

Via

Access Paper or Ask Questions

Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random Indices

Nov 09, 2019
Foroozan Karimzadeh, Ningyuan Cao, Brian Crafton, Justin Romberg, Arijit Raychowdhury

Figure 1 for Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random Indices

Figure 2 for Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random Indices

Figure 3 for Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random Indices

Figure 4 for Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random Indices

Deep neural networks (DNNs) have been emerged as the state-of-the-art algorithms in broad range of applications. To reduce the memory foot-print of DNNs, in particular for embedded applications, sparsification techniques have been proposed. Unfortunately, these techniques come with a large hardware overhead. In this paper, we present a hardware-aware pruning method where the locations of non-zero weights are derived in real-time from a Linear Feedback Shift Registers (LFSRs). Using the proposed method, we demonstrate a total saving of energy and area up to 63.96% and 64.23% for VGG-16 network on down-sampled ImageNet, respectively for iso-compression-rate and iso-accuracy.

Via

Access Paper or Ask Questions

A Light-powered, Always-On, Smart Camera with Compressed Domain Gesture Detection

Aug 16, 2016
Anvesha A, Shaojie Xu, Ningyuan Cao, Justin Romberg, Arijit Raychowdhury

Figure 1 for A Light-powered, Always-On, Smart Camera with Compressed Domain Gesture Detection

Figure 2 for A Light-powered, Always-On, Smart Camera with Compressed Domain Gesture Detection

Figure 3 for A Light-powered, Always-On, Smart Camera with Compressed Domain Gesture Detection

Figure 4 for A Light-powered, Always-On, Smart Camera with Compressed Domain Gesture Detection

In this paper we propose an energy-efficient camera-based gesture recognition system powered by light energy for "always on" applications. Low energy consumption is achieved by directly extracting gesture features from the compressed measurements, which are the block averages and the linear combinations of the image sensor's pixel values. The gestures are recognized using a nearest-neighbour (NN) classifier followed by Dynamic Time Warping (DTW). The system has been implemented on an Analog Devices Black Fin ULP vision processor and powered by PV cells whose output is regulated by TI's DC-DC buck converter with Maximum Power Point Tracking (MPPT). Measured data reveals that with only 400 compressed measurements (768x compression ratio) per frame, the system is able to recognize key wake-up gestures with greater than 80% accuracy and only 95mJ of energy per frame. Owing to its fully self-powered operation, the proposed system can find wide applications in "always-on" vision systems such as in surveillance, robotics and consumer electronics with touch-less operation.

Via

Access Paper or Ask Questions