UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying
Multilingual Check-worthy Claims
- URL: http://arxiv.org/abs/2109.09232v1
- Date: Sun, 19 Sep 2021 21:46:16 GMT
- Title: UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying
Multilingual Check-worthy Claims
- Authors: Ipek Baris Schlicht, Angel Felipe Magnoss\~ao de Paula and Paolo Rosso
- Abstract summary: In this paper, we propose a language identification task as an auxiliary task to mitigate unintended bias.
Our results show that joint training of language identification and check-worthy claim detection tasks can provide performance gains for some of the selected languages.
- Score: 6.167830237917659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying check-worthy claims is often the first step of automated
fact-checking systems. Tackling this task in a multilingual setting has been
understudied. Encoding inputs with multilingual text representations could be
one approach to solve the multilingual check-worthiness detection. However,
this approach could suffer if cultural bias exists within the communities on
determining what is check-worthy.In this paper, we propose a language
identification task as an auxiliary task to mitigate unintended bias.With this
purpose, we experiment joint training by using the datasets from CLEF-2021
CheckThat!, that contain tweets in English, Arabic, Bulgarian, Spanish and
Turkish. Our results show that joint training of language identification and
check-worthy claim detection tasks can provide performance gains for some of
the selected languages.
Related papers
- Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset [13.041053110012246]
We introduce a statistical test, the Preference Proportion Test, for identifying such unreliable subsets.
We find that filtering this low-quality data out when training models for the downstream task of phonetic transcription brings substantial benefits.
arXiv Detail & Related papers (2024-10-05T21:41:49Z) - Interpretable Unified Language Checking [42.816372695828306]
We present an interpretable, unified, language checking (UniLC) method for both human and machine-generated language.
We find that LLMs can achieve high performance on a combination of fact-checking, stereotype detection, and hate speech detection tasks.
arXiv Detail & Related papers (2023-04-07T16:47:49Z) - CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual
Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT)
arXiv Detail & Related papers (2022-09-05T17:36:14Z) - UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles
for Detecting Patronizing and Condescending Language [0.0]
Patronizing and condescending language (PCL) is everywhere, but rarely is the focus on its use by media towards vulnerable communities.
In this paper, we describe our system for detecting such language which was submitted to SemEval 2022 Task 4: Patronizing and Condescending Language Detection.
arXiv Detail & Related papers (2022-04-18T13:22:10Z) - Bridging Cross-Lingual Gaps During Leveraging the Multilingual
Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z) - Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets)
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z) - On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z) - Handshakes AI Research at CASE 2021 Task 1: Exploring different
approaches for multilingual tasks [0.22940141855172036]
The aim of the CASE 2021 Shared Task 1 was to detect and classify socio-political and crisis event information in a multilingual setting.
Our submission contained entries in all of the subtasks, and the scores obtained validated our research finding.
arXiv Detail & Related papers (2021-10-29T07:58:49Z) - Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual
Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, the peak performance is not met using the general-purpose multilingual text encoders off-the-shelf', but rather relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z) - Enhancing Answer Boundary Detection for Multilingual Machine Reading
Comprehension [86.1617182312817]
We propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision.
A mixed Machine Reading task, which translates the question or passage to other languages and builds cross-lingual question-passage pairs.
A language-agnostic knowledge masking task by leveraging knowledge phrases mined from web.
arXiv Detail & Related papers (2020-04-29T10:44:00Z) - On the Importance of Word Order Information in Cross-lingual Sequence
Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.