DynaSent: A Dynamic Benchmark for Sentiment Analysis
- URL: http://arxiv.org/abs/2012.15349v1
- Date: Wed, 30 Dec 2020 22:38:21 GMT
- Title: DynaSent: A Dynamic Benchmark for Sentiment Analysis
- Authors: Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela
- Abstract summary: We introduce DynaSent, a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis.
DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform.
It has a total of 121,634 sentences, each validated by five crowdworkers.
- Score: 31.724648265584445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark
task for ternary (positive/negative/neutral) sentiment analysis. DynaSent
combines naturally occurring sentences with sentences created using the
open-source Dynabench Platform, which facilitates human-and-model-in-the-loop
dataset creation. DynaSent has a total of 121,634 sentences, each validated by
five crowdworkers, and its development and test splits are designed to produce
chance performance for even the best models we have been able to develop; when
future models solve this task, we will use them to create DynaSent version 2,
continuing the dynamic evolution of this benchmark. Here, we report on the
dataset creation effort, focusing on the steps we took to increase quality and
reduce artifacts. We also present evidence that DynaSent's Neutral category is
more coherent than the comparable category in other benchmarks, and we motivate
training models from scratch for each round over successive fine-tuning.
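To make the validation scheme concrete, here is a minimal sketch, assuming a simple majority rule over the five ternary crowdworker labels per sentence; the field names and the three-of-five threshold are illustrative assumptions, not the official distribution format.

```python
from collections import Counter

# Minimal sketch of aggregating five ternary crowdworker labels into a gold
# label by simple majority (at least three of five votes). The field names
# and the threshold are illustrative assumptions, not the official schema.

def majority_gold_label(worker_labels, min_votes=3):
    """Return the majority label, or None when no label reaches min_votes."""
    label, votes = Counter(worker_labels).most_common(1)[0]
    return label if votes >= min_votes else None

examples = [
    {"sentence": "The service was quick and friendly.",
     "worker_labels": ["positive", "positive", "positive", "neutral", "positive"]},
    {"sentence": "It opens at nine on weekdays.",
     "worker_labels": ["neutral", "neutral", "positive", "neutral", "negative"]},
    {"sentence": "Not sure what to make of this place.",
     "worker_labels": ["neutral", "negative", "positive", "negative", "neutral"]},
]

for ex in examples:
    print(ex["sentence"], "->", majority_gold_label(ex["worker_labels"]))
```

Under this rule, a sentence whose five labels yield no majority (like the third toy example) would simply receive no gold label.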
Related papers
- Evaluation of Text-to-Video Generation Models: A Dynamics Perspective [94.2662603491163]
Existing evaluation protocols primarily focus on temporal consistency and content continuity.
We propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V models.
arXiv Detail & Related papers (2024-07-01T08:51:22Z) - Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation [51.99752147380505]
This paper presents a benchmark self-evolving framework to dynamically evaluate Large Language Models (LLMs)
We utilize a multi-agent system to manipulate the context or question of original instances, reframing them into new evolving instances with high confidence.
Our framework widens performance discrepancies both between different models and within the same model across various tasks.
arXiv Detail & Related papers (2024-02-18T03:40:06Z) - Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech [107.81472531864195]
Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions.
We present Dynamic-SUPERB, a benchmark for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion.
arXiv Detail & Related papers (2023-09-18T06:43:30Z) - A Theory of Dynamic Benchmarks [24.170405353348592]
We study the benefits and practical limitations of dynamic benchmarking.
These results provide a theoretical foundation and a causal explanation for observed bottlenecks in empirical work.
arXiv Detail & Related papers (2022-10-06T18:56:46Z) - Time Will Change Things: An Empirical Study on Dynamic Language Understanding in Social Media Classification [5.075802830306718]
We empirically study social media NLU in a dynamic setup, where models are trained on past data and tested on future data.
We show that auto-encoding and pseudo-labeling used together provide the best robustness to this dynamicity.
arXiv Detail & Related papers (2022-10-06T12:18:28Z) - Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z) - Dyna-T: Dyna-Q and Upper Confidence Bounds Applied to Trees [0.9137554315375919]
We present a preliminary investigation of a novel algorithm called Dyna-T.
In reinforcement learning (RL), a planning agent maintains its own representation of the environment as a model.
Experience can be used to learn a better model or to directly improve the value function and policy.
arXiv Detail & Related papers (2022-01-12T15:06:30Z) - Dynabench: Rethinking Benchmarking in NLP [82.26699038776812]
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.
Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation.
We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform.
arXiv Detail & Related papers (2021-04-07T17:49:17Z)
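The Dynabench entry above describes the human-and-model-in-the-loop workflow that DynaSent builds on. Below is a minimal sketch of one collection step under stated assumptions: the stand-in classifier, the helper function, and the record fields are hypothetical, not part of the Dynabench API.

```python
# Minimal sketch of a human-and-model-in-the-loop collection step in the
# spirit of the Dynabench workflow described above. The stand-in model and
# the record fields are hypothetical, not the platform's actual API.

def collect_example(model, sentence, target_label):
    """A worker writes `sentence` aiming for `target_label`; the example is
    flagged as model-fooling when the in-the-loop model predicts otherwise."""
    model_label = model(sentence)
    return {
        "sentence": sentence,
        "target_label": target_label,
        "model_label": model_label,
        "fooled_model": model_label != target_label,
    }

# Toy stand-in classifier: calls anything containing "not" negative.
toy_model = lambda s: "negative" if "not" in s.lower() else "positive"

print(collect_example(toy_model,
                      "The decor was not exactly subtle, but I loved it.",
                      "positive"))
```

In a workflow like the one described above, examples that fool the in-the-loop model would then be sent to crowdworkers for validation before entering the benchmark.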
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all information) and is not responsible for any consequences of its use.