Related papers: Fool Me Twice: Entailment from Wikipedia Gamification

URL: http://arxiv.org/abs/2104.04725v1
Date: Sat, 10 Apr 2021 09:58:40 GMT
Title: Fool Me Twice: Entailment from Wikipedia Gamification
Authors: Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, Jordan Boyd-Graber
Abstract summary: Gamification encourages adversarial examples, drastically lowering the number of examples that can be solved. We release FoolMeTwice, a dataset of challenging entailment pairs collected through a fun multi-player game.
Score: 12.071302977728221
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We release FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game. Gamification encourages adversarial examples, drastically lowering the number of examples that can be solved using "shortcuts" compared to other popular entailment datasets. Players are presented with two tasks. The first task asks the player to write a plausible claim based on the evidence from a Wikipedia page. The second one shows two plausible claims written by other players, one of which is false, and the goal is to identify it before the time runs out. Players "pay" to see clues retrieved from the evidence pool: the more evidence the player needs, the harder the claim. Game-play between motivated players leads to diverse strategies for crafting claims, such as temporal inference and diverting to unrelated evidence, and results in higher quality data for the entailment and evidence retrieval tasks. We open source the dataset and the game code.

Related papers

Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding [97.05584099530226]
We introduce MF$^2$, a new benchmark for evaluating whether models can comprehend, consolidate, and recall key narrative information from full-length movies. For each pair, models must correctly identify both the true and false claims. Our experiments demonstrate that both open-weight and closed state-of-the-art models fall well short of human performance.
arXiv Detail & Related papers (2025-06-06T17:58:36Z)
Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain. We propose an adversarial algorithm to make the retriever component robust against distribution shift. We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
Give Me More Details: Improving Fact-Checking with Latent Retrieval [58.706972228039604]
Evidence plays a crucial role in automated fact-checking. Existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine. We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
arXiv Detail & Related papers (2023-05-25T15:01:19Z)
JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions [75.42526766746515]
We propose a new commonsense reasoning dataset based on human's Interactive Fiction (IF) gameplay walkthroughs. Our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge. Experiments show that the introduced dataset is challenging to previous machine reading models as well as the new large language models.
arXiv Detail & Related papers (2022-10-18T19:20:53Z)
Combining Sequential and Aggregated Data for Churn Prediction in Casual Freemium Games [0.0]
In freemium games, the revenue from a player comes from the in-app purchases made and the advertisement to which that player is exposed. Within this scenario, it is extremely important to be able to detect promptly when a player is about to quit playing. We investigate how to improve the current state-of-the-art in churn prediction by combining sequential and aggregate data.
arXiv Detail & Related papers (2022-09-06T14:49:18Z)
Efficient tracking of team sport players with few game-specific annotations [1.052782170493037]
We propose a new generic method to track team sport players during a full game thanks to few human annotations collected via a semi-interactive system. Non-ambiguous tracklets and their appearance features are automatically generated with a detection and a reidentification network both pre-trained on public datasets. We demonstrate the efficiency of our approach on a challenging rugby sevens dataset.
arXiv Detail & Related papers (2022-04-08T13:11:30Z)
Collusion Detection in Team-Based Multiplayer Games [57.153233321515984]
We propose a system that detects colluding behaviors in team-based multiplayer games. The proposed method analyzes the players' social relationships paired with their in-game behavioral patterns. We then automate the detection using Isolation Forest, an unsupervised learning technique specialized in highlighting outliers.
arXiv Detail & Related papers (2022-03-10T02:37:39Z)
An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
arXiv Detail & Related papers (2021-11-08T23:38:47Z)
6MapNet: Representing soccer players from tracking data by a triplet network [19.343859572602558]
We build a triplet network named 6MapNet that can effectively capture the movement styles of players using in-game GPS data. Our subnetworks then map these heatmap pairs into feature vectors whose similarity corresponds to the actual similarity of playing styles.
arXiv Detail & Related papers (2021-09-10T07:57:12Z)
HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification [74.66819506353086]
HoVer is a dataset for many-hop evidence extraction and fact verification. It challenges models to extract facts from several Wikipedia articles that are relevant to a claim. Most of the 3/4-hop claims are written in multiple sentences, which adds to the complexity of understanding long-range dependency relations.
arXiv Detail & Related papers (2020-11-05T20:33:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.