Massively Parallel and Asynchronous Tsetlin Machine Architecture
Supporting Almost Constant-Time Scaling
- URL: http://arxiv.org/abs/2009.04861v4
- Date: Wed, 9 Jun 2021 15:17:34 GMT
- Title: Massively Parallel and Asynchronous Tsetlin Machine Architecture
Supporting Almost Constant-Time Scaling
- Authors: K. Darshana Abeyrathna, Bimal Bhattarai, Morten Goodwin, Saeed Gorji,
Ole-Christoffer Granmo, Lei Jiao, Rupsa Saha, Rohan K. Yadav
- Abstract summary: Tsetlin Machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed.
Each TM clause votes for or against a particular class, with classification resolved using a majority vote.
We propose a novel scheme for desynchronizing the evaluation of clauses, eliminating the voting bottleneck.
- Score: 11.57427340680871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using logical clauses to represent patterns, Tsetlin Machines (TMs) have
recently obtained competitive performance in terms of accuracy, memory
footprint, energy, and learning speed on several benchmarks. Each TM clause
votes for or against a particular class, with classification resolved using a
majority vote. While the evaluation of clauses is fast, being based on binary
operators, the voting makes it necessary to synchronize the clause evaluation,
impeding parallelization. In this paper, we propose a novel scheme for
desynchronizing the evaluation of clauses, eliminating the voting bottleneck.
In brief, every clause runs in its own thread for massive native parallelism.
For each training example, we keep track of the class votes obtained from the
clauses in local voting tallies. The local voting tallies allow us to detach
the processing of each clause from the rest of the clauses, supporting
decentralized learning. This means that the TM most of the time will operate on
outdated voting tallies. We evaluated the proposed parallelization across
diverse learning tasks and it turns out that our decentralized TM learning
algorithm copes well with working on outdated data, resulting in no significant
loss in learning accuracy. Furthermore, we show that the proposed approach
provides up to 50 times faster learning. Finally, learning time is almost
constant for reasonable clause amounts (employing from 20 to 7,000 clauses on a
Tesla V100 GPU). For sufficiently large clause numbers, computation time
increases approximately proportionally. Our parallel and asynchronous
architecture thus allows processing of massive datasets and operating with more
clauses for higher accuracy.
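The decentralized voting scheme from the abstract can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' CUDA implementation: the clause evaluation is a stand-in random predicate (real TM clauses are conjunctions of binary literals), and the tally here covers a single class.

```python
import random
import threading

# Illustrative sketch of the decentralized voting scheme (not the authors'
# CUDA implementation): each clause runs in its own thread and adds its
# vote to a shared tally without a global synchronization barrier, so a
# clause reading the tally will mostly see a slightly outdated count.

NUM_CLAUSES = 8
EXAMPLE_ID = 42                  # stand-in for one training example
tally = {"votes": 0}             # local voting tally for one class
tally_lock = threading.Lock()    # protects the single shared counter

def run_clause(clause_id):
    polarity = 1 if clause_id % 2 == 0 else -1          # paired for/against votes
    rng = random.Random(clause_id * 7919 + EXAMPLE_ID)  # per-thread RNG
    if rng.random() < 0.5:                              # stand-in clause output
        with tally_lock:
            tally["votes"] += polarity
    # A real clause would now read the (possibly outdated) tally to decide
    # its own feedback, independently of all other clauses.

threads = [threading.Thread(target=run_clause, args=(c,)) for c in range(NUM_CLAUSES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("class score:", tally["votes"])  # sign gives the majority vote
```

Because each clause only increments or decrements the tally under a short lock and never waits on the other clauses, the per-clause work is fully decoupled, which is the property the paper exploits for massive parallelism.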
Related papers
- Data Classification With Multiprocessing [6.513930657238705]
Python multiprocessing is used to test this hypothesis with different classification algorithms.
We conclude that ensembling improves accuracy and multiprocessing reduces execution time for selected algorithms.
arXiv Detail & Related papers (2023-12-23T03:42:13Z)
- Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts [50.00305136008848]
We propose a framework for parallel corpus mining, which provides a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera.
For both English--Japanese and English--Chinese lecture translations, we extracted parallel corpora of approximately 50,000 lines and created development and test sets.
This study also suggests guidelines for gathering and cleaning corpora, mining parallel sentences, cleaning noise in the mined data, and creating high-quality evaluation splits.
arXiv Detail & Related papers (2023-11-07T03:50:25Z)
- Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation [89.88214896713846]
STA score considers two critical factors: temporal redundancy and semantic importance.
We apply the STA module to off-the-shelf video Transformers, including VideoSwin.
Results: on Kinetics-400 and Something-Something V2, the method achieves around a 30% overall computation reduction with a negligible 0.2% accuracy drop.
arXiv Detail & Related papers (2023-08-08T19:38:15Z)
- Parallel Algorithms Align with Neural Execution [7.535219325248997]
Parallel algorithms, however, may exploit their full computational power, therefore requiring fewer layers to be executed.
This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework.
arXiv Detail & Related papers (2023-07-08T21:28:20Z)
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks [108.4568236569645]
Chain-of-thought prompting (CoT) is by far the state-of-the-art method for these tasks.
We propose Program of Thoughts (PoT), which uses language models to express the reasoning process as a program.
PoT can show an average performance gain over CoT by around 12% across all the evaluated datasets.
arXiv Detail & Related papers (2022-11-22T21:06:00Z) - Lifelong Bandit Optimization: No Prior and No Regret [70.94238868711952]
We develop LIBO, an algorithm which adapts to the environment by learning from past experience.
We assume a kernelized structure where the kernel is unknown but shared across all tasks.
Our algorithm can be paired with any kernelized or linear bandit algorithm and guarantees optimal performance.
arXiv Detail & Related papers (2022-10-27T14:48:49Z)
- Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z)
- Coalesced Multi-Output Tsetlin Machines with Clause Sharing [7.754230120409288]
Using finite-state machines to learn patterns, Tsetlin machines (TMs) have obtained competitive accuracy and learning speed across several benchmarks.
We introduce clause sharing, merging multiple TMs into a single one.
Our empirical results on MNIST, Fashion-MNIST, and Kuzushiji-MNIST show that CoTM obtains significantly higher accuracy than TM on 50- to 1K-clause configurations.
arXiv Detail & Related papers (2021-08-17T12:52:01Z)
- GPU-Accelerated Optimizer-Aware Evaluation of Submodular Exemplar Clustering [5.897728689802829]
Optimization of submodular functions constitutes a viable way to perform clustering.
Strong approximation guarantees and feasible optimization w.r.t. streaming data make this clustering approach favorable.
Exemplar-based clustering is one of the possible submodular functions, but suffers from high computational complexity.
Half-precision GPU computation led to large speedups of up to 452x compared to single-precision, single-thread CPU computations.
arXiv Detail & Related papers (2021-01-21T18:23:44Z)
- Increasing the Inference and Learning Speed of Tsetlin Machines with Clause Indexing [9.440900386313215]
The Tsetlin Machine (TM) is a machine learning algorithm founded on the classical Tsetlin Automaton (TA) and game theory.
We report up to 15 times faster classification and three times faster learning on MNIST and Fashion-MNIST image classification, and IMDb sentiment analysis.
arXiv Detail & Related papers (2020-04-07T08:16:07Z)
- Non-Autoregressive Machine Translation with Disentangled Context Transformer [70.95181466892795]
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens.
We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts.
Our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
arXiv Detail & Related papers (2020-01-15T05:32:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.