DFAMiner: Mining minimal separating DFAs from labelled samples
- URL: http://arxiv.org/abs/2405.18871v1
- Date: Wed, 29 May 2024 08:31:34 GMT
- Authors: Daniele Dell'Erba, Yong Li, Sven Schewe
- Abstract summary: DFAMiner is a passive learning tool for learning minimal separating deterministic finite automata (DFA) from a set of labelled samples.
We first propose a simple and linear-time algorithm that incrementally constructs a three-valued DFA from a set of labelled samples.
We then apply our tool to mining a minimal separating DFA for the labelled samples by minimising the constructed automata via a reduction to solving SAT problems.
- Score: 8.196522686887713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose DFAMiner, a passive learning tool for learning minimal separating deterministic finite automata (DFA) from a set of labelled samples. Separating automata are an interesting class of automata that occurs generally in regular model checking and has raised interest in foundational questions of parity game solving. We first propose a simple and linear-time algorithm that incrementally constructs a three-valued DFA (3DFA) from a set of labelled samples given in the usual lexicographical order. This 3DFA has accepting and rejecting states as well as don't-care states, so that it can exactly recognise the labelled examples. We then apply our tool to mining a minimal separating DFA for the labelled samples by minimising the constructed automata via a reduction to solving SAT problems. Empirical evaluation shows that our tool outperforms current state-of-the-art tools significantly on standard benchmarks for learning minimal separating DFAs from samples. Progress in the efficient construction of separating DFAs can also lead to finding the lower bound of parity game solving, where we show that DFAMiner can create optimal separating automata for simple languages with up to 7 colours. Future improvements might offer inroads to better data structures.
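The 3DFA construction described in the abstract can be sketched in a few lines. The following is an illustrative Python approximation, not DFAMiner's actual implementation (which builds the automaton incrementally from samples given in lexicographical order and then minimises it via a SAT reduction); states reached by positive samples are accepting, states reached by negative samples are rejecting, and everything else is a don't-care state, so the 3DFA exactly recognises the labelled samples:

```python
# Minimal sketch (not DFAMiner's code): build a three-valued DFA (3DFA)
# as a prefix tree from labelled samples. Function and variable names
# are illustrative choices, not taken from the paper.

ACCEPT, REJECT, DONT_CARE = "+", "-", "?"

def build_3dfa(samples):
    """samples: iterable of (word, label) with label in {'+', '-'}."""
    trans = {}                 # (state, symbol) -> state
    labels = {0: DONT_CARE}    # state 0 is the initial state
    next_state = 1
    for word, label in samples:
        state = 0
        for symbol in word:
            if (state, symbol) not in trans:
                trans[(state, symbol)] = next_state
                labels[next_state] = DONT_CARE
                next_state += 1
            state = trans[(state, symbol)]
        if labels[state] not in (DONT_CARE, label):
            raise ValueError(f"conflicting labels for {word!r}")
        labels[state] = label
    return trans, labels

def classify(trans, labels, word):
    """Return '+', '-', or '?' for a word under the constructed 3DFA."""
    state = 0
    for symbol in word:
        if (state, symbol) not in trans:
            return DONT_CARE
        state = trans[(state, symbol)]
    return labels[state]
```

A minimal separating DFA is then any total DFA that accepts every word the 3DFA accepts and rejects every word it rejects, while being free on the don't-care states; DFAMiner searches for the smallest such DFA via SAT.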
Related papers
- Temper-Then-Tilt: Principled Unlearning for Generative Models through Tempering and Classifier Guidance [51.532841645285835]
We study machine unlearning in large generative models by framing the task as density ratio estimation to a target distribution. We show it can fail to faithfully unlearn with finite samples when the forget set represents a sharp, concentrated data distribution. We introduce Temper-Then-Tilt Unlearning (T3-Unlearning), which freezes the base model and applies a two-step inference procedure.
arXiv Detail & Related papers (2026-02-10T19:08:40Z) - Efficient Decomposition Identification of Deterministic Finite Automata from Examples [12.847589897236373]
We study DFA decomposition identification problems (DFA-DIPs). Existing DFA-DIP approaches depend on SAT encodings derived from Augmented Prefix Tree Acceptors (APTAs). One of our key innovations replaces the APTA with a 3-valued DFA derived directly from labeled examples.
arXiv Detail & Related papers (2025-09-29T06:50:03Z) - Nearly Optimal Sample Complexity for Learning with Label Proportions [54.67830198790247]
We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal.
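The LLP setting summarised above can be made concrete with a small sketch. This is an illustrative objective, not the paper's algorithm: each bag reveals only its fraction of positive examples, and a predictor is fit by matching predicted bag-level proportions under square loss (the function name `bag_square_loss` is our own choice):

```python
# Illustrative Learning-from-Label-Proportions objective: only bag-level
# positive proportions are observed, and the model is scored by how well
# its average prediction per bag matches the observed proportion.

def bag_square_loss(predict, bags):
    """bags: list of (examples, positive_proportion) pairs.
    predict: maps a single example to a probability in [0, 1]."""
    loss = 0.0
    for examples, proportion in bags:
        predicted = sum(predict(x) for x in examples) / len(examples)
        loss += (predicted - proportion) ** 2
    return loss / len(bags)
```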
arXiv Detail & Related papers (2025-05-08T15:45:23Z) - Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness [53.96714099151378]
We propose a three-step approach for parameter-efficient fine-tuning of image-text foundation models.
Our method improves two key components: minority sample identification and the robust training algorithm.
Our theoretical analysis shows that our PPA enhances minority group identification and is Bayes optimal for minimizing the balanced group error.
arXiv Detail & Related papers (2025-03-12T15:46:12Z) - Towards Interactive Deepfake Analysis [40.0271474912034]
This paper aims to explore interactive deepfake analysis by performing instruction tuning on multi-modal large language models (MLLMs).
To address these issues, we introduce (1) a GPT-assisted data construction process resulting in an instruction-following dataset called DFA-Instruct, (2) a benchmark named DFA-Bench, designed to comprehensively evaluate the capabilities of MLLMs in deepfake detection, deepfake classification, and artifact description, and (3) construct an interactive deepfake analysis system called DFA-GPT, as a strong baseline for the community.
arXiv Detail & Related papers (2025-01-02T09:34:11Z) - Instruction Tuning with Retrieval-based Examples Ranking for Aspect-based Sentiment Analysis [7.458853474864602]
Aspect-based sentiment analysis (ABSA) identifies sentiment information related to specific aspects and provides deeper market insights to businesses and organizations.
Recent studies have proposed using fixed examples for instruction tuning to reformulate ABSA as a generation task.
This study proposes an instruction learning method with retrieval-based example ranking for ABSA tasks.
arXiv Detail & Related papers (2024-05-28T10:39:10Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation [45.21355506181213]
We propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs.
Based on this, we propose an automatic alignment method, Direct Large Model Alignment (DLMA).
In the experimental stage, our DLMA method could surpass the RLHF method without relying on human-annotated preference data.
arXiv Detail & Related papers (2024-02-19T07:46:40Z) - Optimization meets Machine Learning: An Exact Algorithm for Semi-Supervised Support Vector Machines [0.9831489366502302]
Support vector machines (SVMs) are well-studied supervised learning models for binary classification.
We present a new branch approach for S3VMs using semidefinite programming (SDP) relaxations.
SDP relaxation provides bounds significantly stronger than the ones available in the literature.
arXiv Detail & Related papers (2023-12-15T13:44:54Z) - Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z) - Automata Learning from Preference and Equivalence Queries [17.33092604696224]
We propose a novel variant of the active automata learning problem: actively learn finite automata using preference queries.
REMAP is guaranteed to correctly infer automata of minimal complexity under exact equivalence queries.
Our empirical evaluations indicate that REMAP scales to large automata and is effective at learning correct automata from consistent teachers.
arXiv Detail & Related papers (2023-08-18T04:49:45Z) - Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
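The over-sampling technique underlying the entry above can be illustrated with classic SMOTE-style interpolation (AutoSMOTE's hierarchical reinforcement learning over these decisions is not shown; the function name and parameters here are our own illustrative choices): a synthetic minority sample is generated by interpolating between a minority point and one of its nearest minority neighbours.

```python
import random

# Minimal SMOTE-style sketch: synthesise one minority-class sample by
# linear interpolation between a random minority point and one of its
# k nearest minority neighbours (Euclidean distance).

def smote_sample(minority, k=3, rng=random):
    x = rng.choice(minority)
    # k nearest minority neighbours of x, excluding x itself
    neighbours = sorted(
        (p for p in minority if p is not x),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
    )[:k]
    n = rng.choice(neighbours)
    t = rng.random()  # interpolation coefficient in [0, 1)
    return tuple(a + t * (b - a) for a, b in zip(x, n))
```

Each synthetic point lies on the segment between two existing minority points, so it stays inside the convex hull of the minority class.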
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Sparse Conditional Hidden Markov Model for Weakly Supervised Named
Entity Recognition [68.68300358332156]
We propose the sparse conditional hidden Markov model (Sparse-CHMM) to evaluate noisy labeling functions.
Sparse-CHMM is optimized through unsupervised learning with a three-stage training pipeline.
It achieves a 3.01 average F1 score improvement on five comprehensive datasets.
arXiv Detail & Related papers (2022-05-27T20:47:30Z) - Fast Variational AutoEncoder with Inverted Multi-Index for Collaborative
Filtering [59.349057602266]
Variational AutoEncoder (VAE) has been extended as a representative nonlinear method for collaborative filtering.
We propose to decompose the inner-product-based softmax probability based on the inverted multi-index.
FastVAE can outperform the state-of-the-art baselines in terms of both sampling quality and efficiency.
arXiv Detail & Related papers (2021-09-13T08:31:59Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - AFP-SRC: Identification of Antifreeze Proteins Using Sparse
Representation Classifier [5.285065659030821]
Species living in extreme cold environments fight against the harsh conditions using antifreeze proteins (AFPs).
We propose a computational framework for the prediction of AFPs using a sample-specific classification method using the sparse reconstruction.
The proposed method is found to outperform existing approaches in terms of balanced accuracy and Youden's index.
arXiv Detail & Related papers (2020-09-11T08:24:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.