X-TIME: An in-memory engine for accelerating machine learning on tabular
data with CAMs
- URL: http://arxiv.org/abs/2304.01285v3
- Date: Fri, 2 Feb 2024 21:14:18 GMT
- Title: X-TIME: An in-memory engine for accelerating machine learning on tabular
data with CAMs
- Authors: Giacomo Pedretti, John Moon, Pedro Bruel, Sergey Serebryakov, Ron M.
Roth, Luca Buonanno, Archit Gajjar, Tobias Ziegler, Cong Xu, Martin Foltin,
Paolo Faraboschi, Jim Ignowski, Catherine E. Graves
- Abstract summary: Modern tree-based Machine Learning models shine in extracting relevant information from structured data.
In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM.
Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU.
- Score: 19.086291506702413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured, or tabular, data is the most common format in data science. While
deep learning models have proven formidable in learning from unstructured data
such as images or speech, they are less accurate than simpler approaches when
learning from tabular data. In contrast, modern tree-based Machine Learning
(ML) models shine in extracting relevant information from structured data. An
essential requirement in data science is to reduce model inference latency in
cases where, for example, models are used in a closed loop with simulation to
accelerate scientific discovery. However, the hardware acceleration community
has mostly focused on deep neural networks and largely ignored other forms of
machine learning. Previous work has described the use of an analog content
addressable memory (CAM) component for efficiently mapping random forests. In
this work, we focus on an overall analog-digital architecture implementing a
novel increased precision analog CAM and a programmable network on chip
allowing the inference of state-of-the-art tree-based ML models, such as
XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology
show 119x lower latency at 9740x higher throughput compared with a
state-of-the-art GPU, with a 19W peak power consumption.
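As a rough illustration of the mapping the abstract refers to, the sketch below encodes each root-to-leaf path of a single decision tree as a CAM row of per-feature [low, high) intervals; inference is then a parallel match of the input against every row, which is what the analog CAM performs in one step. The thresholds and leaf values are illustrative assumptions, not the paper's hardware flow:

```python
import numpy as np

# Hypothetical tree with splits on x0 < 0.5 and x1 < 1.0, mapped so each
# root-to-leaf path becomes one CAM row: (lows, highs, leaf_value).
cam_rows = [
    (np.array([-np.inf, -np.inf]), np.array([0.5, np.inf]), 0.2),    # x0 < 0.5
    (np.array([0.5, -np.inf]),     np.array([np.inf, 1.0]), -0.1),   # x0 >= 0.5, x1 < 1.0
    (np.array([0.5, 1.0]),         np.array([np.inf, np.inf]), 0.7), # x0 >= 0.5, x1 >= 1.0
]

def cam_match(x):
    """Return the leaf value of the unique row whose intervals all contain x.
    In hardware every row is checked in parallel in a single analog step;
    a forest sums the matched leaf values across trees."""
    for lows, highs, leaf in cam_rows:
        if np.all((x >= lows) & (x < highs)):  # all features in range -> row fires
            return leaf
    raise ValueError("rows should partition the input space")

print(cam_match(np.array([0.7, 0.4])))  # second row fires -> -0.1
```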
Related papers
- Scaling Up Diffusion and Flow-based XGBoost Models [5.944645679491607]
We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models.
With a better implementation, it can be scaled to datasets 370x larger than those previously used.
We present results on large-scale scientific datasets as part of the Fast Calorimeter Simulation Challenge.
arXiv Detail & Related papers (2024-08-28T18:00:00Z)
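A 1-D toy sketch of the idea in the paper above, under the standard linear-interpolation flow-matching setup: an XGBoost regressor is fit to the conditional velocity x1 - x0 given (x_t, t), and samples are then generated by Euler integration. The hyperparameters and target distribution are illustrative assumptions; the paper's actual implementation differs.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 10_000
x0 = rng.normal(size=n)                       # noise samples
x1 = rng.normal(loc=3.0, scale=0.5, size=n)   # toy 1-D "data" samples
t = rng.uniform(size=n)
xt = (1 - t) * x0 + t * x1                    # linear interpolation path
v = x1 - x0                                   # conditional flow-matching target

model = xgb.XGBRegressor(n_estimators=200, max_depth=6)
model.fit(np.column_stack([xt, t]), v)        # tree model as the velocity field

# Generation: integrate dx/dt = v(x, t) from noise to data with Euler steps.
x, steps = rng.normal(size=1000), 50
for k in range(steps):
    tk = np.full_like(x, k / steps)
    x += model.predict(np.column_stack([x, tk])) / steps
print(x.mean(), x.std())                      # should approach ~3.0 and ~0.5
```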
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
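The bin-and-sample idea behind such subset compression can be caricatured as below: partition the dataset into non-overlapping bins, then draw uniformly from every bin so the subset keeps coverage of the full distribution. DQ's actual bin construction is more involved than the k-means stand-in used here, and the embeddings are synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 32))   # stand-in for per-sample embeddings

# Partition into bins, then sample uniformly per bin.
n_bins, per_bin = 50, 10
bins = KMeans(n_clusters=n_bins, n_init=10, random_state=0).fit_predict(X)
keep = np.concatenate([
    rng.choice(np.flatnonzero(bins == b),
               size=min(per_bin, int((bins == b).sum())), replace=False)
    for b in range(n_bins)
])
print(len(keep))                    # ~500 indices: a ~20x smaller subset
```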
- Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning [19.220263739291685]
Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI).
We propose a neural structured learning (NSL) framework built on synthesized graphs.
Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs not only produces small SER models but also enhances model performance.
arXiv Detail & Related papers (2022-10-26T18:38:42Z)
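A minimal PyTorch sketch of the neural-structured-learning idea above: synthesize a graph by nearest-neighbour search over sample features, then add a neighbour-consistency term to the supervised loss. The model, feature sizes, and weighting alpha are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(16, 4)       # stand-in for a lightweight SER model
x = torch.randn(32, 16)              # batch of speech features (illustrative)
y = torch.randint(0, 4, (32,))       # emotion labels

# Synthesized graph: connect each sample to its nearest neighbour.
d = torch.cdist(x, x).fill_diagonal_(float("inf"))
nn_idx = d.argmin(dim=1)

def nsl_loss(x, y, nn_idx, alpha=0.1):
    logits = model(x)
    sup = F.cross_entropy(logits, y)
    # Graph term: neighbours should receive similar predictions.
    graph = F.kl_div(F.log_softmax(model(x[nn_idx]), dim=-1),
                     F.softmax(logits, dim=-1), reduction="batchmean")
    return sup + alpha * graph

print(nsl_loss(x, y, nn_idx))
```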
- Benchmarking GPU and TPU Performance with Graph Neural Networks [0.0]
This work analyzes and compares GPU and TPU performance when training a Graph Neural Network (GNN) developed to solve a real-life pattern recognition problem.
Characterizing the new class of models acting on sparse data may prove helpful in optimizing the design of deep learning libraries and future AI accelerators.
arXiv Detail & Related papers (2022-10-21T21:03:40Z)
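A device-timing harness in the spirit of such benchmarks might look like the sketch below; the model is a placeholder, not the paper's GNN, and the key detail is synchronizing the device so asynchronous GPU kernels are actually counted:

```python
import time
import torch

# Placeholder model and dense data; the paper benchmarks a GNN on sparse inputs.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(4096, 128, device=device)
y = torch.randn(4096, 1, device=device)

def step():
    opt.zero_grad()
    torch.nn.functional.mse_loss(model(x), y).backward()
    opt.step()

for _ in range(10):                      # warm-up, excluded from timing
    step()
if device == "cuda":
    torch.cuda.synchronize()             # drain queued kernels before timing
t0 = time.perf_counter()
for _ in range(100):
    step()
if device == "cuda":
    torch.cuda.synchronize()             # wait for async GPU work to finish
print(f"{(time.perf_counter() - t0) / 100 * 1e3:.3f} ms/step on {device}")
```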
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to the effective use of machine learning tools in multi-physics problems is coupling them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
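As a sketch of the incremental-online setting these systems target, a model can be updated one sample at a time with no replay buffer; the scikit-learn classifier and label rule below are stand-ins for the MCU-deployed algorithms the paper compares:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss")        # small linear model, MCU-scale memory
classes = np.array([0, 1])

# Stream samples one at a time, as a gesture or vision sensor would deliver them.
for _ in range(1000):
    x = rng.normal(size=(1, 4))             # one feature vector
    y = np.array([int(x[0, 0] > 0)])        # illustrative ground-truth rule
    clf.partial_fit(x, y, classes=classes)  # incremental update, no batch storage

X_test = rng.normal(size=(200, 4))
print(clf.score(X_test, (X_test[:, 0] > 0).astype(int)))
```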
- Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z)
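The pseudo-feature idea mentioned above can be sketched as follows: fit a predictor of a downstream-only feature from the features both tables share, then synthesize that column for the upstream table so pre-training can use the full downstream feature set. The data and the choice of a random-forest imputer are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
up = rng.normal(size=(5000, 3))      # upstream rows: shared features only
down = rng.normal(size=(200, 3))     # downstream rows: shared features...
extra = down @ np.array([0.5, -1.0, 0.3]) + rng.normal(0, 0.1, 200)  # ...plus one more

# Fit the downstream-only feature from shared features, then impute it upstream.
imputer = RandomForestRegressor(n_estimators=100, random_state=0).fit(down, extra)
up_full = np.column_stack([up, imputer.predict(up)])   # shared + pseudo-feature
print(up_full.shape)                 # (5000, 4)
```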
- Transformer-Based Behavioral Representation Learning Enables Transfer Learning for Mobile Sensing in Small Datasets [4.276883061502341]
We provide a neural architecture framework for mobile sensing data that can learn generalizable feature representations from time series.
This architecture combines benefits from CNN and Transformer architectures to enable better prediction performance.
arXiv Detail & Related papers (2021-07-09T22:26:50Z)
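An illustrative PyTorch hybrid in that spirit: a 1-D convolution extracts local patterns from the sensor time series, a Transformer encoder models long-range structure, and a pooled head predicts the target. All sizes are assumptions, not the paper's architecture:

```python
import torch

class CNNTransformer(torch.nn.Module):
    def __init__(self, channels=6, d_model=64, n_classes=2):
        super().__init__()
        self.conv = torch.nn.Conv1d(channels, d_model, kernel_size=5, padding=2)
        layer = torch.nn.TransformerEncoderLayer(d_model, nhead=4,
                                                 batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
        self.head = torch.nn.Linear(d_model, n_classes)

    def forward(self, x):              # x: (batch, time, channels)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local features
        h = self.encoder(h)                                # global attention
        return self.head(h.mean(dim=1))                    # pool over time

print(CNNTransformer()(torch.randn(8, 128, 6)).shape)      # torch.Size([8, 2])
```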
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
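One way to read the BNN estimator, sketched with Monte-Carlo dropout standing in for the BNN: the estimated accuracy of the model-under-test is its expected agreement with the BNN's predicted label distribution over an unlabeled pool. In practice the BNN would first be trained on the small labeled set; here it is left untrained purely to show the mechanics:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
mut = torch.nn.Linear(8, 3)          # stand-in for the model-under-test
bnn = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                          torch.nn.Dropout(0.5), torch.nn.Linear(32, 3))

pool = torch.randn(1000, 8)          # unlabeled test pool
with torch.no_grad():
    mut_pred = mut(pool).argmax(dim=-1)

# Monte-Carlo dropout: average label probabilities over stochastic passes.
bnn.train()                          # keep dropout active at "inference"
with torch.no_grad():
    probs = torch.stack([F.softmax(bnn(pool), dim=-1)
                         for _ in range(50)]).mean(0)

# Estimated accuracy = expected agreement between model-under-test and BNN.
est_acc = probs[torch.arange(len(pool)), mut_pred].mean()
print(float(est_acc))
```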
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High-speed, low-energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated in simulations of predicting Boston house prices and of training a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
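In software terms, the "one computational step" corresponds to a closed-form least-squares solve: the crosspoint array physically settles to the solution that a digital system would compute with a pseudoinverse. A toy stand-in for the housing-regression example, with synthetic data of the same shape:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the Boston-housing example: y = X @ w_true + noise.
X = rng.normal(size=(506, 13))
w_true = rng.normal(size=13)
y = X @ w_true + rng.normal(0, 0.01, size=506)

# One-step learning: a single closed-form solve of the least-squares problem,
# which the analog crosspoint circuit performs in place.
w = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(w, w_true, atol=0.01))   # weights recovered in "one step"
```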