Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls
- URL: http://arxiv.org/abs/2208.03295v1
- Date: Fri, 5 Aug 2022 17:33:33 GMT
- Title: Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls
- Authors: Da Ju, Jing Xu, Y-Lan Boureau, Jason Weston
- Abstract summary: We study how to perform robust learning in such an environment.
We introduce a benchmark evaluation, SafetyMix, which can evaluate methods that learn safe vs. toxic language.
We propose and analyze several mitigating learning algorithms that identify trolls either at the example or at the user level.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The promise of interaction between intelligent conversational agents and
humans is that models can learn from such feedback in order to improve.
Unfortunately, such exchanges in the wild will not always involve human
utterances that are benign or of high quality, and will include a mixture of
engaged (helpers) and unengaged or even malicious users (trolls). In this work
we study how to perform robust learning in such an environment. We introduce a
benchmark evaluation, SafetyMix, which can evaluate methods that learn safe vs.
toxic language in a variety of adversarial settings to test their robustness.
We propose and analyze several mitigating learning algorithms that identify
trolls either at the example or at the user level. Our main finding is that
user-based methods, which take into account that troll users will exhibit
adversarial behavior across multiple examples, work best in a variety of
settings on our benchmark. We then test these methods in a further real-life
setting of conversations collected during deployment, with similar results.
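
To make the user-level idea concrete, below is a minimal sketch of the kind of filtering the abstract describes: per-example safety scores are aggregated per user, and users whose behavior looks adversarial across many examples are dropped before training. The scoring function, thresholds, and data layout are illustrative assumptions, not the exact SafetyMix algorithms from the paper.

```python
# Minimal sketch of user-level vs. example-level troll filtering
# (illustrative assumptions; not the paper's exact algorithms).
# Each example is assumed to carry a user id and a safety score in
# [0, 1] from some classifier, where higher means more adversarial.
from collections import defaultdict

def filter_troll_users(examples, user_threshold=0.5):
    """Drop every example from users whose mean safety score is high."""
    scores = defaultdict(list)
    for ex in examples:
        scores[ex["user_id"]].append(ex["safety_score"])
    # A user is flagged as a troll when their behavior looks adversarial
    # on average across all their examples, not on any single one.
    trolls = {u for u, s in scores.items() if sum(s) / len(s) > user_threshold}
    return [ex for ex in examples if ex["user_id"] not in trolls]

def filter_toxic_examples(examples, example_threshold=0.5):
    """Example-level baseline: keep or drop each example on its own score."""
    return [ex for ex in examples if ex["safety_score"] <= example_threshold]
```

The contrast between the two functions is the abstract's main point: a troll's individual utterances may each look borderline, but aggregating over a user's history makes the pattern detectable.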
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning (2023-11-21)
  We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
- Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores (2023-07-11)
  This paper presents a new method that uses scores provided by humans, instead of pairwise preferences, to improve the feedback efficiency of interactive reinforcement learning. We show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback than pairwise preference learning methods.
- Active Learning of Ordinal Embeddings: A User Study on Football Data (2022-07-26)
  Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
- What makes you change your mind? An empirical investigation in online group decision-making conversations (2022-07-25)
  We investigate methods for detecting what makes someone change their mind, incorporating techniques such as neural text classification and language-agnostic change point detection. Evaluation of these methods shows that while the task is not trivial, the best approach is a language-aware model with learning-to-rank training.
- On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting (2022-04-01)
  User-defined keyword spotting is the task of detecting new spoken terms defined by users. Previous works incorporate self-supervised learning models or apply meta-learning algorithms. Our results show that HuBERT combined with a Matching network achieves the best performance.
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models (2021-11-04)
  Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind benign accuracy.
- Towards Improved and Interpretable Deep Metric Learning via Attentive Grouping (2020-11-17)
  Grouping has been commonly used in deep metric learning for computing diverse features. We propose an improved and interpretable grouping method that can be integrated flexibly with any metric learning framework.
- Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback (2020-09-16)
  We present a novel approach for incorporating user feedback and evaluate it using three distinct strategies. Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
- Let Me At Least Learn What You Really Like: Dealing With Noisy Humans When Learning Preferences (2020-02-15)
  We propose a modification to uncertainty sampling which uses the expected output value to help speed up the learning of preferences (see the sketch after this list). We compare our approach with an uncertainty sampling baseline and conduct an ablation study to test the validity of each component of our approach.
- On the interaction between supervision and self-play in emergent communication (2020-02-04)
  We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency. We find that first training agents via supervised learning on human data, followed by self-play, outperforms the converse.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.