Coalesced Multi-Output Tsetlin Machines with Clause Sharing
- URL: http://arxiv.org/abs/2108.07594v1
- Date: Tue, 17 Aug 2021 12:52:01 GMT
- Title: Coalesced Multi-Output Tsetlin Machines with Clause Sharing
- Authors: Sondre Glimsdal and Ole-Christoffer Granmo
- Abstract summary: Using finite-state machines to learn patterns, Tsetlin machines (TMs) have obtained competitive accuracy and learning speed across several benchmarks.
We introduce clause sharing, merging multiple TMs into a single one.
Our empirical results on MNIST, Fashion-MNIST, and Kuzushiji-MNIST show that CoTM obtains significantly higher accuracy than TM on $50$- to $1$K-clause configurations.
- Score: 7.754230120409288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using finite-state machines to learn patterns, Tsetlin machines (TMs) have
obtained competitive accuracy and learning speed across several benchmarks,
with a frugal memory and energy footprint. A TM represents patterns as
conjunctive clauses in propositional logic (AND-rules), each clause voting for
or against a particular output. While a TM is efficient for single-output
problems, multi-output problems require a separate TM per output. Employing multiple
TMs hinders pattern reuse because each TM then operates in a silo. In this
paper, we introduce clause sharing, merging multiple TMs into a single one.
Each clause is related to each output by using a weight. A positive weight
makes the clause vote for output $1$, while a negative weight makes the clause
vote for output $0$. The clauses thus coalesce to produce multiple outputs. The
resulting coalesced Tsetlin Machine (CoTM) simultaneously learns both the
weights and the composition of each clause by employing interacting Stochastic
Searching on the Line (SSL) and Tsetlin Automata (TA) teams. Our empirical
results on MNIST, Fashion-MNIST, and Kuzushiji-MNIST show that CoTM obtains
significantly higher accuracy than TM on $50$- to $1$K-clause configurations,
indicating an ability to repurpose clauses. E.g., accuracy goes from $71.99$%
to $89.66$% on Fashion-MNIST when employing $50$ clauses per class (22 Kb
memory). While TM and CoTM accuracy is similar when using more than $1$K
clauses per class, CoTM reaches peak accuracy $3\times$ faster on MNIST with
$8$K clauses. We further investigate robustness towards imbalanced training
data. Our evaluations on imbalanced versions of IMDb- and CIFAR10 data show
that CoTM is robust towards high degrees of class imbalance. Being able to
share clauses, we believe CoTM will enable new TM application domains that
involve multiple outputs, such as learning language models and auto-encoding.
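The abstract above describes the core mechanism: a single pool of conjunctive clauses is evaluated once, and a signed weight per (clause, output) pair turns each clause output into votes for or against every output. Below is a minimal, illustrative sketch of that forward voting pass in Python/NumPy; the function names, array layouts, and toy data are assumptions made for exposition, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of coalesced clause voting (not the authors' code).
# x       : binary input features, shape (n_features,)
# include : which literals each shared clause includes (decided by TA teams
#           in the real CoTM), shape (n_clauses, 2 * n_features); the literal
#           vector is the features followed by their negations.
# weights : signed integer weights relating every clause to every output
#           (learned with SSL in the real CoTM), shape (n_clauses, n_outputs).

def clause_outputs(x, include):
    literals = np.concatenate([x, 1 - x])          # x_k and NOT x_k
    # A conjunctive (AND) clause is 1 only if all of its included literals are 1.
    return np.all((literals[None, :] == 1) | ~include, axis=1).astype(int)

def cotm_predict(x, include, weights):
    c = clause_outputs(x, include)                 # evaluated once, shared by all outputs
    class_sums = c @ weights                       # positive weight: vote for 1, negative: vote for 0
    return int(np.argmax(class_sums)), class_sums

# Toy usage with random state, only to show the shapes involved.
rng = np.random.default_rng(0)
n_features, n_clauses, n_outputs = 8, 4, 3
x = rng.integers(0, 2, n_features)
include = rng.random((n_clauses, 2 * n_features)) < 0.2
weights = rng.integers(-3, 4, size=(n_clauses, n_outputs))
print(cotm_predict(x, include, weights))
```

In the actual CoTM, the include masks are learned by Tsetlin Automata teams and the weights by Stochastic Searching on the Line; the sketch covers only inference, and for independent binary outputs one would threshold each class sum rather than take an argmax.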
Related papers
- ETHEREAL: Energy-efficient and High-throughput Inference using Compressed Tsetlin Machine [0.3121107735397556]
The Tsetlin Machine (TM) is a novel alternative to deep neural networks (DNNs).
We introduce a training approach that incorporates excluded automata states to sparsify TM logic patterns in both positive and negative clauses.
Compared to standard TMs, ETHEREAL TM models can reduce model size by up to 87.54%, with only a minor accuracy compromise.
arXiv Detail & Related papers (2025-02-08T16:58:43Z)
- Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs).
We propose the Perplexity Attention Weighted Network (PAWN), which uses the LLM's last hidden states and token positions to weight a sum of features derived from next-token distribution metrics across the sequence length.
PAWN shows competitive, and sometimes better, in-distribution performance than the strongest baselines, with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z)
- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing [52.64837396100988]
MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length.
We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$.
We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, as well as a synthetic reasoning benchmark associative recall.
arXiv Detail & Related papers (2023-12-09T16:12:25Z)
- Building Concise Logical Patterns by Constraining Tsetlin Machine Clause Size [11.43224924974832]
This paper introduces a novel variant of TM learning: Clause Size Constrained TMs (CSC-TMs).
As soon as a clause includes more literals than the constraint allows, it starts expelling literals (a minimal sketch of this behaviour appears after this list).
Our results show that CSC-TM maintains accuracy with up to 80 times fewer literals.
arXiv Detail & Related papers (2023-01-19T17:37:48Z)
- Neural Machine Translation with Contrastive Translation Memories [71.86990102704311]
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios.
We propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence.
In the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence.
arXiv Detail & Related papers (2022-12-06T17:10:17Z)
- Learning Best Combination for Efficient N:M Sparsity [75.34103761423803]
N:M sparsity learning can be naturally characterized as a search for the best combination within a finite collection (a simplified illustration appears after this list).
We show that our learning best combination (LBC) performs consistently better than off-the-shelf N:M sparsity methods across various networks.
arXiv Detail & Related papers (2022-06-14T07:51:31Z)
- Fast, Effective and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders [66.76141128555099]
We show that it is possible to turn masked language models into universal lexical and sentence encoders even without any additional data and without supervision.
We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT.
Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples.
We report huge gains over off-the-shelf models with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages.
arXiv Detail & Related papers (2021-04-16T10:49:56Z)
- Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling [11.57427340680871]
Tsetlin Machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed.
Each TM clause votes for or against a particular class, with classification resolved using a majority vote.
We propose a novel scheme for desynchronizing the evaluation of clauses, eliminating the voting bottleneck.
arXiv Detail & Related papers (2020-09-10T13:48:33Z)
- Extending the Tsetlin Machine With Integer-Weighted Clauses for Increased Interpretability [9.432068833600884]
Building machine learning models that are both interpretable and accurate is an unresolved challenge for many pattern recognition problems.
Using a linear combination of conjunctive clauses in propositional logic, Tsetlin Machines (TMs) have shown competitive performance on diverse benchmarks.
Here, we address the accuracy-interpretability challenge by equipping the TM clauses with integer weights.
arXiv Detail & Related papers (2020-05-11T14:18:09Z)
- A Regression Tsetlin Machine with Integer Weighted Clauses for Compact Pattern Representation [9.432068833600884]
The Regression Tsetlin Machine (RTM) addresses the lack of interpretability impeding state-of-the-art nonlinear regression models.
We introduce integer-weighted clauses, reducing computation cost N times and increasing interpretability.
We evaluate the potential of the integer weighted RTM using six artificial datasets.
arXiv Detail & Related papers (2020-02-04T12:06:16Z)
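For the CSC-TM entry above: the summary states that a clause expels literals once it includes more than the constraint allows. The snippet below is a deliberately simplified sketch of such a size budget, assuming a rule that randomly excludes literals until the budget is met; the paper's actual mechanism acts through Tsetlin Automaton feedback, and the function name and data layout are assumptions.

```python
import numpy as np

# Simplified clause-size constraint (assumption-laden; not the CSC-TM paper's
# exact mechanism, which works through TA feedback).
# include      : boolean include-mask over one clause's candidate literals
# max_literals : the clause-size budget

def enforce_clause_size(include, max_literals, rng):
    include = include.copy()
    over = int(include.sum()) - max_literals
    if over > 0:
        # Expel randomly chosen literals until the clause fits the budget.
        expel = rng.choice(np.flatnonzero(include), size=over, replace=False)
        include[expel] = False
    return include

rng = np.random.default_rng(1)
clause = rng.random(16) < 0.5                      # a clause over 16 candidate literals
print(clause.sum(), enforce_clause_size(clause, 4, rng).sum())
```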
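For the LBC entry above: N:M sparsity keeps N weights out of every block of M, so the finite collection mentioned in the summary is the C(M, N) possible keep-masks per block. The sketch below enumerates that collection for one block and scores each candidate by the weight magnitude it preserves; magnitude scoring is a generic baseline criterion, not LBC's learned selection, and all names are assumptions.

```python
from itertools import combinations
import numpy as np

# Generic illustration of the finite collection behind N:M sparsity.
def best_nm_mask(block, n_keep):
    m = len(block)
    candidates = list(combinations(range(m), n_keep))  # the finite collection: C(m, n_keep) masks
    # Score each candidate mask by the weight magnitude it preserves (baseline criterion).
    best = max(candidates, key=lambda idx: np.abs(block[list(idx)]).sum())
    mask = np.zeros(m, dtype=bool)
    mask[list(best)] = True
    return mask

block = np.array([0.3, -1.2, 0.05, 0.8])   # one block of M = 4 weights
mask = best_nm_mask(block, n_keep=2)       # 2:4 sparsity keeps 2 of every 4
print(mask, block * mask)
```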