What is Next when Sequential Prediction Meets Implicitly Hard Interaction?
- URL: http://arxiv.org/abs/2202.06620v1
- Date: Mon, 14 Feb 2022 11:15:28 GMT
- Title: What is Next when Sequential Prediction Meets Implicitly Hard Interaction?
- Authors: Kaixi Hu, Lin Li, Qing Xie, Jianquan Liu, Xiaohui Tao
- Abstract summary: The Hardness Aware Interaction Learning framework (HAIL) consists of two base sequential learning networks and mutual exclusivity distillation (MED).
Our framework can be easily extended to more peer base networks.
- Score: 12.093590031186034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hard interaction learning between source sequences and their next targets is challenging, and it arises in a myriad of sequential prediction tasks. During the training process, most existing methods focus on explicitly hard interactions caused by wrong responses. However, a model might produce correct responses by capturing only a subset of learnable patterns, which results in implicitly hard interactions with the remaining unlearned patterns. As such, its generalization performance is weakened. The problem becomes more serious in sequential prediction because of interference from a substantial number of similar candidate targets.
To this end, we propose a Hardness Aware Interaction Learning framework
(HAIL) that mainly consists of two base sequential learning networks and mutual
exclusivity distillation (MED). The base networks are initialized differently
to learn distinctive view patterns, thus gaining different training
experiences. MED lets each network draw these experiences from its peer in the form of the unlikelihood of correct responses, providing mutual exclusivity knowledge that helps identify implicitly hard interactions. Moreover, we deduce that
the unlikelihood essentially introduces additional gradients to push the
pattern learning of correct responses. Our framework can be easily extended to
more peer base networks. Evaluation is conducted on four datasets covering
cyber and physical spaces. The experimental results demonstrate that our
framework outperforms several state-of-the-art methods in terms of top-k based
metrics.
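The abstract does not spell out the exact MED objective, so the snippet below is only a minimal, hypothetical sketch of the idea it describes: two differently initialized peer networks exchange the unlikelihood each assigns to the correct next target, and that unlikelihood weights extra gradient on the correct responses. The helper names (med_term, hail_step), the confidence-based weighting, and the coefficient alpha are illustrative assumptions rather than the authors' implementation.
```python
# Hypothetical sketch of HAIL-style training with mutual exclusivity
# distillation (MED) between two differently initialized peer networks.
# The weighting scheme and interfaces are assumptions for illustration.
import torch.nn.functional as F


def med_term(student_logits, peer_probs, targets):
    """Extra loss on the correct targets, weighted by the peer's unlikelihood
    (1 - peer probability) of those targets, so interactions the peer has not
    yet learned receive additional gradient (an assumed reading of MED)."""
    peer_unlikelihood = 1.0 - peer_probs.gather(
        -1, targets.unsqueeze(-1)).squeeze(-1).detach()
    nll = F.cross_entropy(student_logits, targets, reduction="none")
    return (peer_unlikelihood * nll).mean()


def hail_step(net_a, net_b, seqs, targets, alpha=0.5):
    """One joint training step; each net maps a batch of source sequences to
    logits over the candidate next targets."""
    logits_a, logits_b = net_a(seqs), net_b(seqs)
    # Standard next-target prediction loss for each base network.
    ce = F.cross_entropy(logits_a, targets) + F.cross_entropy(logits_b, targets)
    # Mutual exclusivity distillation: each network draws the peer's
    # unlikelihood of the correct responses as extra training signal.
    med = (med_term(logits_a, F.softmax(logits_b, dim=-1), targets)
           + med_term(logits_b, F.softmax(logits_a, dim=-1), targets))
    return ce + alpha * med
```
Under this assumed weighting, the gradient on each correct target's logit is scaled by the peer's unlikelihood, which is one way to realize the abstract's claim that the unlikelihood introduces additional gradients to push the pattern learning of correct responses.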
Related papers
- A distributional simplicity bias in the learning dynamics of transformers [50.91742043564049]
We show that transformers, trained on natural language data, also display a simplicity bias.
Specifically, they sequentially learn many-body interactions among input tokens, reaching a saturation point in the prediction error for low-degree interactions.
This approach opens up the possibilities of studying how interactions of different orders in the data affect learning, in natural language processing and beyond.
arXiv Detail & Related papers (2024-10-25T15:39:34Z)
- Diffusing States and Matching Scores: A New Framework for Imitation Learning [16.941612670582522]
Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function.
In recent years, diffusion models have emerged as a non-adversarial alternative to GANs.
We show our approach outperforms GAN-style imitation learning baselines across various continuous control problems.
arXiv Detail & Related papers (2024-10-17T17:59:25Z)
- ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference [21.24566458648584]
We propose ConsistentEE, an early exiting method consistent in training and inference.
A policy network is added to decide whether an instance should exit or continue.
We incorporate the memorized layer into the reward function design, which allows "easy" instances to focus more on acceleration.
arXiv Detail & Related papers (2023-12-19T06:16:13Z)
- Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary [65.268245109828]
The non-human-like behaviour of contemporary pre-trained language models (PLMs) is a major factor undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent behaviour issue by improving PLM awareness.
arXiv Detail & Related papers (2023-10-24T06:15:15Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
- Learning Transferable Adversarial Robust Representations via Multi-view Consistency [57.73073964318167]
We propose a novel meta-adversarial multi-view representation learning framework with dual encoders.
We demonstrate the effectiveness of our framework on few-shot learning tasks from unseen domains.
arXiv Detail & Related papers (2022-10-19T11:48:01Z)
- Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks [6.439477789066243]
Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks.
Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood.
We identify three logit characteristics essential to learning adversarial robustness.
arXiv Detail & Related papers (2021-08-26T19:09:15Z)
- Linear Mode Connectivity in Multitask and Continual Learning [46.98656798573886]
We investigate whether multitask and continual solutions are similarly connected.
We propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution.
arXiv Detail & Related papers (2020-10-09T10:53:25Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.