X-TIME: An in-memory engine for accelerating machine learning on tabular
data with CAMs
- URL: http://arxiv.org/abs/2304.01285v3
- Date: Fri, 2 Feb 2024 21:14:18 GMT
- Title: X-TIME: An in-memory engine for accelerating machine learning on tabular
data with CAMs
- Authors: Giacomo Pedretti, John Moon, Pedro Bruel, Sergey Serebryakov, Ron M.
Roth, Luca Buonanno, Archit Gajjar, Tobias Ziegler, Cong Xu, Martin Foltin,
Paolo Faraboschi, Jim Ignowski, Catherine E. Graves
- Abstract summary: Modern tree-based Machine Learning models shine in extracting relevant information from structured data.
In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM.
Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU.
- Score: 19.086291506702413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured, or tabular, data is the most common format in data science. While
deep learning models have proven formidable in learning from unstructured data
such as images or speech, they are less accurate than simpler approaches when
learning from tabular data. In contrast, modern tree-based Machine Learning
(ML) models shine in extracting relevant information from structured data. An
essential requirement in data science is to reduce model inference latency in
cases where, for example, models are used in a closed loop with simulation to
accelerate scientific discovery. However, the hardware acceleration community
has mostly focused on deep neural networks and largely ignored other forms of
machine learning. Previous work has described the use of an analog content
addressable memory (CAM) component for efficiently mapping random forests. In
this work, we focus on an overall analog-digital architecture implementing a
novel increased precision analog CAM and a programmable network on chip
allowing the inference of state-of-the-art tree-based ML models, such as
XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology
show 119x lower latency at 9740x higher throughput compared with a
state-of-the-art GPU, with a 19W peak power consumption.
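As a rough illustration of the idea mentioned in the abstract (mapping tree models onto a CAM), the sketch below flattens a decision tree into rows of per-feature ranges, so that inference becomes a single range-match lookup per tree. This is a minimal software analogy under stated assumptions, not the paper's architecture: the tuple-based tree encoding, the "go left if value < threshold" convention, and the names `CamRow`, `tree_to_rows`, `cam_lookup`, and `ensemble_predict` are all hypothetical and chosen only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class CamRow:
    """One root-to-leaf path, stored as a [low, high) range per feature."""
    lows: list
    highs: list
    leaf_value: float


def tree_to_rows(node, n_features, lows=None, highs=None):
    """Flatten a tree into CAM-style rows.

    Assumed encoding: a leaf is a number; an internal node is a tuple
    (feature_index, threshold, left_subtree, right_subtree), where the
    left branch is taken when x[feature_index] < threshold.
    """
    lows = [-math.inf] * n_features if lows is None else lows
    highs = [math.inf] * n_features if highs is None else highs
    if not isinstance(node, tuple):
        return [CamRow(list(lows), list(highs), float(node))]
    feature, threshold, left, right = node
    # Left branch tightens the upper bound, right branch the lower bound.
    left_highs = list(highs)
    left_highs[feature] = min(highs[feature], threshold)
    right_lows = list(lows)
    right_lows[feature] = max(lows[feature], threshold)
    return (tree_to_rows(left, n_features, lows, left_highs)
            + tree_to_rows(right, n_features, right_lows, highs))


def cam_lookup(rows, x):
    """Return the leaf value of the row whose ranges all contain x."""
    for row in rows:
        if all(lo <= v < hi for lo, v, hi in zip(row.lows, x, row.highs)):
            return row.leaf_value
    raise ValueError("no row matched the query")


def ensemble_predict(forest_rows, x):
    """Sum the matched leaf values over all trees (boosting-style)."""
    return sum(cam_lookup(rows, x) for rows in forest_rows)


# Hypothetical two-feature tree: split on feature 0, then on feature 1.
tree = (0, 0.5, 1.0, (1, 2.0, 2.0, 3.0))
rows = tree_to_rows(tree, n_features=2)
prediction = ensemble_predict([rows, rows], [0.7, 1.0])
```

In hardware, all rows would be compared against the query in parallel rather than in a Python loop; the loop here only mimics the matching rule.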
Related papers
- Scaling Up Diffusion and Flow-based XGBoost Models [5.944645679491607]
We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models.
With better implementation it can be scaled to datasets 370x larger than previously used.
We present results on large-scale scientific datasets as part of the Fast Calorimeter Simulation Challenge.
arXiv Detail & Related papers (2024-08-28T18:00:00Z)
- Bridging the Sim-to-Real Gap with Bayesian Inference [53.61496586090384]
We present SIM-FSVGD for learning robot dynamics from data.
We use low-fidelity physical priors to regularize the training of neural network models.
We demonstrate the effectiveness of SIM-FSVGD in bridging the sim-to-real gap on a high-performance RC racecar system.
arXiv Detail & Related papers (2024-03-25T11:29:32Z)
- Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos.
The disentangled learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters.
Extensive experiments on five benchmarks show that DiST delivers better performance than existing state-of-the-art methods by convincing gaps.
arXiv Detail & Related papers (2023-09-14T17:58:33Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning [19.220263739291685]
Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI).
We propose a neural structured learning (NSL) framework through building synthesized graphs.
Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance.
arXiv Detail & Related papers (2022-10-26T18:38:42Z)
- Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of some established sequential supervised models, such as RNNs, LSTMs, or Transformers, with relatively less known alternative models based on reservoir computing.
arXiv Detail & Related papers (2022-09-29T08:16:52Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- Transformer-Based Behavioral Representation Learning Enables Transfer Learning for Mobile Sensing in Small Datasets [4.276883061502341]
We provide a neural architecture framework for mobile sensing data that can learn generalizable feature representations from time series.
This architecture combines benefits from CNN and Transformer architectures to enable better prediction performance.
arXiv Detail & Related papers (2021-07-09T22:26:50Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Computation on Sparse Neural Networks: an Inspiration for Future Hardware [20.131626638342706]
We describe the current status of the research on the computation of sparse neural networks.
We discuss the model accuracy influenced by the number of weight parameters and the structure of the model.
We show that for practically complicated problems, it is more beneficial to search large and sparse models in the weight dominated region.
arXiv Detail & Related papers (2020-04-24T19:13:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.