Coalesced Multi-Output Tsetlin Machines with Clause Sharing
- URL: http://arxiv.org/abs/2108.07594v1
- Date: Tue, 17 Aug 2021 12:52:01 GMT
- Title: Coalesced Multi-Output Tsetlin Machines with Clause Sharing
- Authors: Sondre Glimsdal and Ole-Christoffer Granmo
- Abstract summary: Using finite-state machines to learn patterns, Tsetlin machines (TMs) have obtained competitive accuracy and learning speed across several benchmarks.
We introduce clause sharing, merging multiple TMs into a single one.
Our empirical results on MNIST, Fashion-MNIST, and Kuzushiji-MNIST show that CoTM obtains significantly higher accuracy than TM on $50$- to $1$K-clause configurations.
- Score: 7.754230120409288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using finite-state machines to learn patterns, Tsetlin machines (TMs) have
obtained competitive accuracy and learning speed across several benchmarks,
with a frugal memory and energy footprint. A TM represents patterns as
conjunctive clauses in propositional logic (AND-rules), each clause voting for
or against a particular output. While a TM is efficient for single-output
problems, multi-output problems require a separate TM per output. Employing multiple
TMs hinders pattern reuse because each TM then operates in a silo. In this
paper, we introduce clause sharing, merging multiple TMs into a single one.
Each clause is related to each output by using a weight. A positive weight
makes the clause vote for output $1$, while a negative weight makes the clause
vote for output $0$. The clauses thus coalesce to produce multiple outputs. The
resulting coalesced Tsetlin Machine (CoTM) simultaneously learns both the
weights and the composition of each clause by employing interacting Stochastic
Searching on the Line (SSL) and Tsetlin Automata (TA) teams. Our empirical
results on MNIST, Fashion-MNIST, and Kuzushiji-MNIST show that CoTM obtains
significantly higher accuracy than TM on $50$- to $1$K-clause configurations,
indicating an ability to repurpose clauses. E.g., accuracy goes from $71.99$%
to $89.66$% on Fashion-MNIST when employing $50$ clauses per class (22 Kb
memory). While TM and CoTM accuracy is similar when using more than $1$K
clauses per class, CoTM reaches peak accuracy $3\times$ faster on MNIST with
$8$K clauses. We further investigate robustness towards imbalanced training
data. Our evaluations on imbalanced versions of IMDb- and CIFAR10 data show
that CoTM is robust towards high degrees of class imbalance. Being able to
share clauses, we believe CoTM will enable new TM application domains that
involve multiple outputs, such as learning language models and auto-encoding.
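The abstract above describes the core mechanism: a single pool of conjunctive clauses is evaluated once, and a signed weight per (clause, output) pair turns each clause output into votes for or against every output. Below is a minimal, illustrative sketch of that forward voting pass in Python/NumPy; the function names, array layouts, and toy data are assumptions made for exposition, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of coalesced clause voting (not the authors' code).
# x       : binary input features, shape (n_features,)
# include : which literals each shared clause includes (decided by TA teams
#           in the real CoTM), shape (n_clauses, 2 * n_features); the literal
#           vector is the features followed by their negations.
# weights : signed integer weights relating every clause to every output
#           (learned with SSL in the real CoTM), shape (n_clauses, n_outputs).

def clause_outputs(x, include):
    literals = np.concatenate([x, 1 - x])          # x_k and NOT x_k
    # A conjunctive (AND) clause is 1 only if all of its included literals are 1.
    return np.all((literals[None, :] == 1) | ~include, axis=1).astype(int)

def cotm_predict(x, include, weights):
    c = clause_outputs(x, include)                 # evaluated once, shared by all outputs
    class_sums = c @ weights                       # positive weight: vote for 1, negative: vote for 0
    return int(np.argmax(class_sums)), class_sums

# Toy usage with random state, only to show the shapes involved.
rng = np.random.default_rng(0)
n_features, n_clauses, n_outputs = 8, 4, 3
x = rng.integers(0, 2, n_features)
include = rng.random((n_clauses, 2 * n_features)) < 0.2
weights = rng.integers(-3, 4, size=(n_clauses, n_outputs))
print(cotm_predict(x, include, weights))
```

In the actual CoTM, the include masks are learned by Tsetlin Automata teams and the weights by Stochastic Searching on the Line; the sketch covers only inference, and for independent binary outputs one would threshold each class sum rather than take an argmax.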
Related papers
- ETHEREAL: Energy-efficient and High-throughput Inference using Compressed Tsetlin Machine [0.3121107735397556]
The Tsetlin Machine (TM) is a novel alternative to deep neural networks (DNNs).
We introduce a training approach that incorporates excluded automata states to sparsify TM logic patterns in both positive and negative clauses.
Compared to standard TMs, ETHEREAL TM models can reduce model size by up to 87.54%, with only a minor accuracy compromise.
arXiv Detail & Related papers (2025-02-08T16:58:43Z)
- Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs).
We propose the Perplexity Attention Weighted Network (PAWN), which uses the LLM's last hidden states and token positions to weight a sum of features derived from next-token distribution metrics across the sequence length.
PAWN shows competitive, and sometimes better, in-distribution performance than the strongest baselines, with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z)
- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing [52.64837396100988]
MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length.
We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$.
We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, as well as a synthetic reasoning benchmark associative recall.
arXiv Detail & Related papers (2023-12-09T16:12:25Z)
- Building Concise Logical Patterns by Constraining Tsetlin Machine Clause Size [11.43224924974832]
This paper introduces a novel variant of TM learning: Clause Size Constrained TMs (CSC-TMs).
As soon as a clause includes more literals than the constraint allows, it starts expelling literals (a minimal sketch of this behaviour appears after this list).
Our results show that CSC-TM maintains accuracy with up to 80 times fewer literals.
arXiv Detail & Related papers (2023-01-19T17:37:48Z)
- Neural Machine Translation with Contrastive Translation Memories [71.86990102704311]
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios.
We propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence.
In the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence.
arXiv Detail & Related papers (2022-12-06T17:10:17Z)
- Learning Best Combination for Efficient N:M Sparsity [75.34103761423803]
N:M sparsity learning can be naturally characterized as a search for the best combination within a finite collection (a simplified illustration appears after this list).
We show that our learning best combination (LBC) performs consistently better than off-the-shelf N:M sparsity methods across various networks.
arXiv Detail & Related papers (2022-06-14T07:51:31Z)
- Fast, Effective and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders [66.76141128555099]
We show that it is possible to turn masked language models into universal lexical and sentence encoders even without any additional data and without supervision.
We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT.
Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples.
We report huge gains over off-the-shelf models with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages.
arXiv Detail & Related papers (2021-04-16T10:49:56Z)
- Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling [11.57427340680871]
Tsetlin Machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed.
Each TM clause votes for or against a particular class, with classification resolved using a majority vote.
We propose a novel scheme for desynchronizing the evaluation of clauses, eliminating the voting bottleneck.
arXiv Detail & Related papers (2020-09-10T13:48:33Z)
- Extending the Tsetlin Machine With Integer-Weighted Clauses for Increased Interpretability [9.432068833600884]
Building machine learning models that are both interpretable and accurate is an unresolved challenge for many pattern recognition problems.
Using a linear combination of conjunctive clauses in propositional logic, Tsetlin Machines (TMs) have shown competitive performance on diverse benchmarks.
Here, we address the accuracy-interpretability challenge by equipping the TM clauses with integer weights.
arXiv Detail & Related papers (2020-05-11T14:18:09Z)
- A Regression Tsetlin Machine with Integer Weighted Clauses for Compact Pattern Representation [9.432068833600884]
The Regression Tsetlin Machine (RTM) addresses the lack of interpretability impeding state-of-the-art nonlinear regression models.
We introduce integer-weighted clauses, reducing computation cost N times and increasing interpretability.
We evaluate the potential of the integer weighted RTM using six artificial datasets.
arXiv Detail & Related papers (2020-02-04T12:06:16Z)
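For the CSC-TM entry above: the summary states that a clause expels literals once it includes more than the constraint allows. The snippet below is a deliberately simplified sketch of such a size budget, assuming a rule that randomly excludes literals until the budget is met; the paper's actual mechanism acts through Tsetlin Automaton feedback, and the function name and data layout are assumptions.

```python
import numpy as np

# Simplified clause-size constraint (assumption-laden; not the CSC-TM paper's
# exact mechanism, which works through TA feedback).
# include      : boolean include-mask over one clause's candidate literals
# max_literals : the clause-size budget

def enforce_clause_size(include, max_literals, rng):
    include = include.copy()
    over = int(include.sum()) - max_literals
    if over > 0:
        # Expel randomly chosen literals until the clause fits the budget.
        expel = rng.choice(np.flatnonzero(include), size=over, replace=False)
        include[expel] = False
    return include

rng = np.random.default_rng(1)
clause = rng.random(16) < 0.5                      # a clause over 16 candidate literals
print(clause.sum(), enforce_clause_size(clause, 4, rng).sum())
```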
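For the LBC entry above: N:M sparsity keeps N weights out of every block of M, so the finite collection mentioned in the summary is the C(M, N) possible keep-masks per block. The sketch below enumerates that collection for one block and scores each candidate by the weight magnitude it preserves; magnitude scoring is a generic baseline criterion, not LBC's learned selection, and all names are assumptions.

```python
from itertools import combinations
import numpy as np

# Generic illustration of the finite collection behind N:M sparsity.
def best_nm_mask(block, n_keep):
    m = len(block)
    candidates = list(combinations(range(m), n_keep))  # the finite collection: C(m, n_keep) masks
    # Score each candidate mask by the weight magnitude it preserves (baseline criterion).
    best = max(candidates, key=lambda idx: np.abs(block[list(idx)]).sum())
    mask = np.zeros(m, dtype=bool)
    mask[list(best)] = True
    return mask

block = np.array([0.3, -1.2, 0.05, 0.8])   # one block of M = 4 weights
mask = best_nm_mask(block, n_keep=2)       # 2:4 sparsity keeps 2 of every 4
print(mask, block * mask)
```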