Natural Language Inference with Self-Attention for Veracity Assessment
of Pandemic Claims
- URL: http://arxiv.org/abs/2205.02596v1
- Date: Thu, 5 May 2022 12:11:31 GMT
- Title: Natural Language Inference with Self-Attention for Veracity Assessment
of Pandemic Claims
- Authors: M. Arana-Catania, Elena Kochkina, Arkaitz Zubiaga, Maria Liakata, Rob
Procter, Yulan He
- Abstract summary: We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19.
We then propose novel techniques for automated veracity assessment based on Natural Language Inference.
- Score: 54.93898455714295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a comprehensive work on automated veracity assessment from dataset
creation to developing novel methods based on Natural Language Inference (NLI),
focusing on misinformation related to the COVID-19 pandemic. We first describe
the construction of the novel PANACEA dataset consisting of heterogeneous
claims on COVID-19 and their respective information sources. The dataset
construction includes work on retrieval techniques and similarity measurements
to ensure a unique set of claims. We then propose novel techniques for
automated veracity assessment based on Natural Language Inference including
graph convolutional networks and attention-based approaches. We have carried
out experiments on evidence retrieval and veracity assessment on the dataset
using the proposed techniques, found them to be competitive with SOTA methods,
and provide a detailed discussion of the results.
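To make the NLI formulation concrete, the sketch below scores a claim against retrieved evidence with an off-the-shelf NLI model. It is a minimal illustration of treating (evidence, claim) as a premise-hypothesis pair, not the paper's proposed GCN- or attention-based architecture; the checkpoint name (roberta-large-mnli) and the averaging-based aggregation rule are assumptions made for the example, and the evidence-retrieval step is omitted.

```python
# Minimal sketch: claim-evidence veracity scoring via an off-the-shelf NLI model.
# NOTE: this is not the paper's method; it only illustrates the underlying NLI
# formulation. The model name and the aggregation rule are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # assumed off-the-shelf NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def nli_scores(evidence: str, claim: str) -> dict:
    """Return label -> probability for one (premise=evidence, hypothesis=claim) pair."""
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    # Label names are taken from the model config (e.g. CONTRADICTION/NEUTRAL/ENTAILMENT).
    return {model.config.id2label[i]: probs[i].item() for i in range(probs.size(0))}

def assess_claim(claim: str, evidence_sentences: list[str]) -> str:
    """Toy aggregation: average entailment vs. contradiction probability over the
    retrieved evidence and pick the stronger signal (an assumption for this sketch)."""
    scores = [nli_scores(ev, claim) for ev in evidence_sentences]
    entail = sum(s.get("ENTAILMENT", 0.0) for s in scores) / len(scores)
    contra = sum(s.get("CONTRADICTION", 0.0) for s in scores) / len(scores)
    return "supported" if entail >= contra else "refuted"

if __name__ == "__main__":
    claim = "Vitamin C cures COVID-19."
    evidence = ["There is no clinical evidence that vitamin C cures COVID-19."]
    print(assess_claim(claim, evidence))  # expected: "refuted"
```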
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets [0.0]
This research explores the integration of language embeddings for active learning in autonomous driving datasets.
Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning.
arXiv Detail & Related papers (2024-02-11T22:53:21Z)
- Statistical properties and privacy guarantees of an original distance-based fully synthetic data generation method [0.0]
This work shows the technical feasibility of generating publicly releasable synthetic data using a multi-step framework.
By successfully assessing the quality of data produced using a novel multi-step synthetic data generation framework, we showed the technical and conceptual soundness of the Open-CESP initiative.
arXiv Detail & Related papers (2023-10-10T12:29:57Z)
- A deep Natural Language Inference predictor without language-specific training data [44.26507854087991]
We present an NLP technique to tackle the Natural Language Inference (NLI) problem between pairs of sentences in a target language of choice without a language-specific training dataset.
We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model.
The model has been evaluated over machine translated Stanford NLI test dataset, machine translated Multi-Genre NLI test dataset, and manually translated RTE3-ITA test dataset.
arXiv Detail & Related papers (2023-09-06T10:20:59Z)
- Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT [0.0]
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT.
To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset.
arXiv Detail & Related papers (2023-06-23T15:15:13Z)
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
- Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark [54.782397511033345]
Crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases like annotation artifacts.
We present a new unified cross-dataset benchmark with 14 NLI datasets and re-evaluate 9 widely-used neural network-based NLI models.
Our proposed evaluation scheme and experimental baselines could provide a basis to inspire future reliable NLI research.
arXiv Detail & Related papers (2020-10-15T11:50:12Z)
- COVID-19 therapy target discovery with context-aware literature mining [5.839799877302573]
We propose a system for contextualization of empirical expression data by approximating relations between entities.
In order to exploit a larger scientific context by transfer learning, we propose a novel embedding generation technique.
arXiv Detail & Related papers (2020-07-30T18:37:36Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)