Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes
- URL: http://arxiv.org/abs/2003.06899v1
- Date: Sun, 15 Mar 2020 19:30:31 GMT
- Title: Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes
- Authors: Andre Mendes, Julian Togelius, Leandro dos Santos Coelho
- Abstract summary: In multi-stage processes, decisions occur in an ordered sequence of stages.
We introduce a framework that combines adversarial autoencoders (AAE), multi-task learning (MTL), and multi-label semi-supervised learning (MLSSL).
Using real-world data from different domains, we show that our approach outperforms other state-of-the-art methods.
- Score: 5.933303832684138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-stage processes, decisions occur in an ordered sequence of stages.
Early stages usually have more observations with general information
(easier/cheaper to collect), while later stages have fewer observations but
more specific data. This situation can be represented by a dual funnel
structure, in which the sample size decreases from one stage to the other while
the information increases. Training classifiers in this scenario is challenging
since information in the early stages may not contain distinct patterns to
learn (underfitting). In contrast, the small sample size in later stages can
cause overfitting. We address both cases by introducing a framework that
combines adversarial autoencoders (AAE), multi-task learning (MTL), and
multi-label semi-supervised learning (MLSSL). We improve the decoder of the AAE
with an MTL component so it can jointly reconstruct the original input and use
feature nets to predict the features for the next stages. We also introduce a
sequence constraint in the output of an MLSSL classifier to guarantee the
sequential pattern in the predictions. Using real-world data from different
domains (selection process, medical diagnosis), we show that our approach
outperforms other state-of-the-art methods.
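The central architectural idea, a shared encoder whose decoder is trained on two tasks at once (reconstructing the current stage's input while feature nets predict the features observed at the next stage), can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the use of PyTorch, and the loss weighting are assumptions, and the adversarial regularization of the latent code is omitted for brevity.

```python
# Minimal sketch (assumptions only, not the paper's code) of an encoder whose
# decoder serves two tasks: reconstructing the stage-k input and predicting the
# features observed at stage k+1. The adversarial discriminator on the latent
# code is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EncoderMultiTaskDecoder(nn.Module):
    def __init__(self, in_dim, latent_dim, next_stage_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        # Task 1: reconstruct the original stage input (autoencoder objective).
        self.reconstruction_head = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim),
        )
        # Task 2: "feature net" predicting the next stage's features.
        self.next_stage_head = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, next_stage_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return z, self.reconstruction_head(z), self.next_stage_head(z)


def multitask_loss(model, x, x_next, alpha=0.5):
    """Weighted sum of reconstruction loss and next-stage prediction loss."""
    _, x_hat, x_next_hat = model(x)
    return F.mse_loss(x_hat, x) + alpha * F.mse_loss(x_next_hat, x_next)
```

Under this reading, the shared encoder is what lets the plentiful early-stage observations shape the representation that the data-poor later stages rely on.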
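The sequence constraint on the MLSSL output can be read as requiring that a sample predicted positive at a later stage must also be positive at every earlier stage (for example, a candidate cannot pass the third round of a selection process without passing the first two). One possible encoding, a soft penalty during training plus a monotone projection at inference, is sketched below; the paper's exact formulation may differ.

```python
# Illustrative sequence constraint on multi-label, multi-stage outputs: the
# per-stage probabilities should be non-increasing, since passing stage k
# implies having passed stages 1..k-1. This hinge-style penalty is an assumed
# formulation, not necessarily the paper's.
import torch


def sequence_penalty(stage_probs):
    """stage_probs: (batch, n_stages) sigmoid outputs, one column per stage."""
    # Positive wherever a later stage is scored higher than the stage before it.
    violations = stage_probs[:, 1:] - stage_probs[:, :-1]
    return torch.clamp(violations, min=0).pow(2).mean()


def enforce_sequence(stage_probs):
    """At inference, make predictions stage-consistent via a running minimum."""
    return torch.cummin(stage_probs, dim=1).values
```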
Related papers
- On Inter-dataset Code Duplication and Data Leakage in Large Language Models [4.148857672591562]
This paper explores the phenomenon of inter-dataset code duplication and its impact on evaluating large language models (LLMs).
Our findings reveal a potential threat to the evaluation of LLMs across multiple SE tasks, stemming from the inter-dataset code duplication phenomenon.
We provide evidence that open-source models could be affected by inter-dataset duplication.
arXiv Detail & Related papers (2024-01-15T19:46:40Z) - Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt [80.43623986759691]
We introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD.
The framework equips unsupervised anomaly detection (UAD) with continual learning capability through contrastively-learned prompts.
We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation.
arXiv Detail & Related papers (2024-01-02T03:37:11Z) - Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by moving the task instructions to a position after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z) - Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning (ICL) in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: a Deep-Thinking stage and a test stage.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z) - Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z) - Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene [10.822477939237459]
We propose contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning.
CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
arXiv Detail & Related papers (2021-06-04T08:17:48Z) - Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning [58.2091760793799]
We propose a novel contrastive prototype learning with augmented embeddings (CPLAE) model.
With a class prototype as an anchor, CPL aims to pull the query samples of the same class closer and those of different classes further away (a generic sketch of this pull/push objective appears after this list).
Extensive experiments on several benchmarks demonstrate that our proposed CPLAE achieves new state-of-the-art.
arXiv Detail & Related papers (2021-01-23T13:22:44Z) - Multi-Stage Transfer Learning with an Application to Selection Process [5.933303832684138]
In multi-stage processes, decisions happen in an ordered sequence of stages.
In this work, we propose a Multi-StaGe Transfer Learning (MSGTL) approach that uses knowledge from simple classifiers trained in early stages.
We show that it is possible to control the trade-off between conserving knowledge and fine-tuning using a simple probabilistic map.
arXiv Detail & Related papers (2020-06-01T21:27:04Z) - Conditional Mutual information-based Contrastive Loss for Financial Time Series Forecasting [12.0855096102517]
We present a representation learning framework for financial time series forecasting.
In this paper, we propose to first learn compact representations from time series data, then use the learned representations to train a simpler model for predicting time series movements.
arXiv Detail & Related papers (2020-02-18T15:24:33Z) - Few-Shot Learning as Domain Adaptation: Algorithm and Analysis [120.75020271706978]
Few-shot learning uses prior knowledge learned from the seen classes to recognize the unseen classes.
Because the seen and unseen classes differ, their data follow different distributions; this class-difference-caused distribution shift can be considered a special case of domain shift.
We propose a prototypical domain adaptation network with attention (DAPNA) to explicitly tackle such a domain shift problem in a meta-learning framework.
arXiv Detail & Related papers (2020-02-06T01:04:53Z)
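For the CPLAE entry above, the pull/push behaviour around class prototypes can be illustrated with a generic prototype-anchored contrastive loss: each query is attracted to its own class prototype and repelled from the others through a softmax over negative distances. The distance choice, temperature, and tensor shapes are assumptions for illustration; this is not the exact CPLAE objective.

```python
# Generic prototype-anchored contrastive loss (textbook-style sketch, not the
# CPLAE formulation): queries are pulled toward their class prototype and
# pushed away from the other prototypes.
import torch
import torch.nn.functional as F


def prototype_contrastive_loss(queries, prototypes, labels, temperature=0.1):
    """queries: (B, D) embeddings; prototypes: (C, D); labels: (B,) class ids."""
    # Squared Euclidean distance from every query to every class prototype.
    dists = torch.cdist(queries, prototypes).pow(2)   # shape (B, C)
    logits = -dists / temperature                     # closer prototype => larger logit
    return F.cross_entropy(logits, labels)
```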