Natural Language Inference with Self-Attention for Veracity Assessment
of Pandemic Claims
- URL: http://arxiv.org/abs/2205.02596v1
- Date: Thu, 5 May 2022 12:11:31 GMT
- Title: Natural Language Inference with Self-Attention for Veracity Assessment
of Pandemic Claims
- Authors: M. Arana-Catania, Elena Kochkina, Arkaitz Zubiaga, Maria Liakata, Rob
Procter, Yulan He
- Abstract summary: We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19.
We then propose novel techniques for automated veracity assessment based on Natural Language Inference.
- Score: 54.93898455714295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a comprehensive work on automated veracity assessment from dataset
creation to developing novel methods based on Natural Language Inference (NLI),
focusing on misinformation related to the COVID-19 pandemic. We first describe
the construction of the novel PANACEA dataset consisting of heterogeneous
claims on COVID-19 and their respective information sources. The dataset
construction includes work on retrieval techniques and similarity measurements
to ensure a unique set of claims. We then propose novel techniques for
automated veracity assessment based on Natural Language Inference including
graph convolutional networks and attention-based approaches. We have carried
out experiments on evidence retrieval and veracity assessment on the dataset
using the proposed techniques, found them to be competitive with SOTA methods,
and provide a detailed discussion of the results.
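To make the NLI formulation concrete, the sketch below scores a claim against retrieved evidence with an off-the-shelf NLI model. It is a minimal illustration of treating (evidence, claim) as a premise-hypothesis pair, not the paper's proposed GCN- or attention-based architecture; the checkpoint name (roberta-large-mnli) and the averaging-based aggregation rule are assumptions made for the example, and the evidence-retrieval step is omitted.

```python
# Minimal sketch: claim-evidence veracity scoring via an off-the-shelf NLI model.
# NOTE: this is not the paper's method; it only illustrates the underlying NLI
# formulation. The model name and the aggregation rule are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # assumed off-the-shelf NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def nli_scores(evidence: str, claim: str) -> dict:
    """Return label -> probability for one (premise=evidence, hypothesis=claim) pair."""
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    # Label names are taken from the model config (e.g. CONTRADICTION/NEUTRAL/ENTAILMENT).
    return {model.config.id2label[i]: probs[i].item() for i in range(probs.size(0))}

def assess_claim(claim: str, evidence_sentences: list[str]) -> str:
    """Toy aggregation: average entailment vs. contradiction probability over the
    retrieved evidence and pick the stronger signal (an assumption for this sketch)."""
    scores = [nli_scores(ev, claim) for ev in evidence_sentences]
    entail = sum(s.get("ENTAILMENT", 0.0) for s in scores) / len(scores)
    contra = sum(s.get("CONTRADICTION", 0.0) for s in scores) / len(scores)
    return "supported" if entail >= contra else "refuted"

if __name__ == "__main__":
    claim = "Vitamin C cures COVID-19."
    evidence = ["There is no clinical evidence that vitamin C cures COVID-19."]
    print(assess_claim(claim, evidence))  # expected: "refuted"
```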
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets [0.0]
This research explores the integration of language embeddings for active learning in autonomous driving datasets.
Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning.
arXiv Detail & Related papers (2024-02-11T22:53:21Z)
- Statistical properties and privacy guarantees of an original distance-based fully synthetic data generation method [0.0]
This work shows the technical feasibility of generating publicly releasable synthetic data using a multi-step framework.
By successfully assessing the quality of data produced using a novel multi-step synthetic data generation framework, we showed the technical and conceptual soundness of the Open-CESP initiative.
arXiv Detail & Related papers (2023-10-10T12:29:57Z)
- A deep Natural Language Inference predictor without language-specific training data [44.26507854087991]
We present an NLP technique to tackle the Natural Language Inference (NLI) problem between pairs of sentences in a target language of choice without a language-specific training dataset.
We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model.
The model has been evaluated over machine translated Stanford NLI test dataset, machine translated Multi-Genre NLI test dataset, and manually translated RTE3-ITA test dataset.
arXiv Detail & Related papers (2023-09-06T10:20:59Z)
- Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT [0.0]
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT.
To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset.
arXiv Detail & Related papers (2023-06-23T15:15:13Z)
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
- Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark [54.782397511033345]
Crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases like annotation artifacts.
We present a new unified cross-dataset benchmark with 14 NLI datasets and re-evaluate 9 widely-used neural network-based NLI models.
Our proposed evaluation scheme and experimental baselines could provide a basis to inspire future reliable NLI research.
arXiv Detail & Related papers (2020-10-15T11:50:12Z)
- COVID-19 therapy target discovery with context-aware literature mining [5.839799877302573]
We propose a system for contextualization of empirical expression data by approximating relations between entities.
In order to exploit a larger scientific context by transfer learning, we propose a novel embedding generation technique.
arXiv Detail & Related papers (2020-07-30T18:37:36Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)