Massively Parallel and Asynchronous Tsetlin Machine Architecture
Supporting Almost Constant-Time Scaling
- URL: http://arxiv.org/abs/2009.04861v4
- Date: Wed, 9 Jun 2021 15:17:34 GMT
- Title: Massively Parallel and Asynchronous Tsetlin Machine Architecture
Supporting Almost Constant-Time Scaling
- Authors: K. Darshana Abeyrathna, Bimal Bhattarai, Morten Goodwin, Saeed Gorji,
Ole-Christoffer Granmo, Lei Jiao, Rupsa Saha, Rohan K. Yadav
- Abstract summary: Tsetlin Machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed.
Each TM clause votes for or against a particular class, with classification resolved using a majority vote.
We propose a novel scheme for desynchronizing the evaluation of clauses, eliminating the voting bottleneck.
- Score: 11.57427340680871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using logical clauses to represent patterns, Tsetlin Machines (TMs) have
recently obtained competitive performance in terms of accuracy, memory
footprint, energy, and learning speed on several benchmarks. Each TM clause
votes for or against a particular class, with classification resolved using a
majority vote. While the evaluation of clauses is fast, being based on binary
operators, the voting makes it necessary to synchronize the clause evaluation,
impeding parallelization. In this paper, we propose a novel scheme for
desynchronizing the evaluation of clauses, eliminating the voting bottleneck.
In brief, every clause runs in its own thread for massive native parallelism.
For each training example, we keep track of the class votes obtained from the
clauses in local voting tallies. The local voting tallies allow us to detach
the processing of each clause from the rest of the clauses, supporting
decentralized learning. This means that the TM most of the time will operate on
outdated voting tallies. We evaluated the proposed parallelization across
diverse learning tasks and it turns out that our decentralized TM learning
algorithm copes well with working on outdated data, resulting in no significant
loss in learning accuracy. Furthermore, we show that the proposed approach
provides up to 50 times faster learning. Finally, learning time is almost
constant for reasonable clause amounts (employing from 20 to 7,000 clauses on a
Tesla V100 GPU). For sufficiently large clause numbers, computation time
increases approximately proportionally. Our parallel and asynchronous
architecture thus allows processing of massive datasets and operating with more
clauses for higher accuracy.
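The decentralized voting scheme from the abstract can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' CUDA implementation: the clause evaluation is a stand-in random predicate (real TM clauses are conjunctions of binary literals), and the tally here covers a single class.

```python
import random
import threading

# Illustrative sketch of the decentralized voting scheme (not the authors'
# CUDA implementation): each clause runs in its own thread and adds its
# vote to a shared tally without a global synchronization barrier, so a
# clause reading the tally will mostly see a slightly outdated count.

NUM_CLAUSES = 8
EXAMPLE_ID = 42                  # stand-in for one training example
tally = {"votes": 0}             # local voting tally for one class
tally_lock = threading.Lock()    # protects the single shared counter

def run_clause(clause_id):
    polarity = 1 if clause_id % 2 == 0 else -1          # paired for/against votes
    rng = random.Random(clause_id * 7919 + EXAMPLE_ID)  # per-thread RNG
    if rng.random() < 0.5:                              # stand-in clause output
        with tally_lock:
            tally["votes"] += polarity
    # A real clause would now read the (possibly outdated) tally to decide
    # its own feedback, independently of all other clauses.

threads = [threading.Thread(target=run_clause, args=(c,)) for c in range(NUM_CLAUSES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("class score:", tally["votes"])  # sign gives the majority vote
```

Because each clause only increments or decrements the tally under a short lock and never waits on the other clauses, the per-clause work is fully decoupled, which is the property the paper exploits for massive parallelism.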
Related papers
- Data Classification With Multiprocessing [6.513930657238705]
Python multiprocessing is used to test this hypothesis with different classification algorithms.
We conclude that ensembling improves accuracy and multiprocessing reduces execution time for selected algorithms.
arXiv Detail & Related papers (2023-12-23T03:42:13Z)
- Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts [50.00305136008848]
We propose a framework for parallel corpus mining, which provides a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera.
For both English--Japanese and English--Chinese lecture translations, we extracted parallel corpora of approximately 50,000 lines and created development and test sets.
This study also suggests guidelines for gathering and cleaning corpora, mining parallel sentences, cleaning noise in the mined data, and creating high-quality evaluation splits.
arXiv Detail & Related papers (2023-11-07T03:50:25Z)
- Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation [89.88214896713846]
STA score considers two critical factors: temporal redundancy and semantic importance.
We apply the STA module to off-the-shelf video Transformers, including VideoSwin.
Results: on Kinetics-400 and Something-Something V2, the method achieves around a 30% overall computation reduction with a negligible 0.2% accuracy drop.
arXiv Detail & Related papers (2023-08-08T19:38:15Z)
- Parallel Algorithms Align with Neural Execution [7.535219325248997]
Parallel algorithms, however, may exploit their full computational power, therefore requiring fewer layers to be executed.
This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework.
arXiv Detail & Related papers (2023-07-08T21:28:20Z)
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks [108.4568236569645]
Chain-of-thought prompting (CoT) is by far the state-of-the-art method for these tasks.
We propose Program of Thoughts (PoT), which uses language models to express the reasoning process as a program.
PoT can show an average performance gain over CoT by around 12% across all the evaluated datasets.
arXiv Detail & Related papers (2022-11-22T21:06:00Z) - Lifelong Bandit Optimization: No Prior and No Regret [70.94238868711952]
We develop LIBO, an algorithm which adapts to the environment by learning from past experience.
We assume a kernelized structure where the kernel is unknown but shared across all tasks.
Our algorithm can be paired with any kernelized or linear bandit algorithm and guarantees optimal performance.
arXiv Detail & Related papers (2022-10-27T14:48:49Z)
- Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z)
- Coalesced Multi-Output Tsetlin Machines with Clause Sharing [7.754230120409288]
Using finite-state machines to learn patterns, Tsetlin machines (TMs) have obtained competitive accuracy and learning speed across several benchmarks.
We introduce clause sharing, merging multiple TMs into a single one.
Our empirical results on MNIST, Fashion-MNIST, and Kuzushiji-MNIST show that CoTM obtains significantly higher accuracy than TM on 50- to 1K-clause configurations.
arXiv Detail & Related papers (2021-08-17T12:52:01Z)
- GPU-Accelerated Optimizer-Aware Evaluation of Submodular Exemplar Clustering [5.897728689802829]
Optimization of submodular functions constitutes a viable way to perform clustering.
Strong approximation guarantees and feasible optimization w.r.t. streaming data make this clustering approach favorable.
Exemplar-based clustering is one of the possible submodular functions, but suffers from high computational complexity.
Half-precision GPU computation led to large speedups of up to 452x compared to single-precision, single-thread CPU computations.
arXiv Detail & Related papers (2021-01-21T18:23:44Z)
- Increasing the Inference and Learning Speed of Tsetlin Machines with Clause Indexing [9.440900386313215]
The Tsetlin Machine (TM) is a machine learning algorithm founded on the classical Tsetlin Automaton (TA) and game theory.
We report up to 15 times faster classification and three times faster learning on MNIST and Fashion-MNIST image classification, and IMDb sentiment analysis.
arXiv Detail & Related papers (2020-04-07T08:16:07Z)
- Non-Autoregressive Machine Translation with Disentangled Context Transformer [70.95181466892795]
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens.
We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts.
Our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
arXiv Detail & Related papers (2020-01-15T05:32:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.