ArNLI: Arabic Natural Language Inference for Entailment and
Contradiction Detection
- URL: http://arxiv.org/abs/2209.13953v1
- Date: Wed, 28 Sep 2022 09:37:16 GMT
- Title: ArNLI: Arabic Natural Language Inference for Entailment and
Contradiction Detection
- Authors: Khloud Al Jallad, Nada Ghneim
- Abstract summary: We have created a dataset of more than 12k sentences, named ArNLI, which will be publicly available.
We propose an approach that detects contradictions between pairs of Arabic sentences using a contradiction vector combined with a language model vector as input to a machine learning model.
The best results were achieved with a Random Forest classifier, reaching accuracies of 99%, 60%, and 75% on PHEME, SICK, and ArNLI respectively.
- Score: 1.8275108630751844
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Natural Language Inference (NLI) is a hot research topic in natural
language processing; contradiction detection between sentences is a special
case of NLI. It is considered a difficult NLP task, and it has a large impact
when added as a component to many NLP applications, such as question answering
systems and text summarization. Arabic is one of the most challenging
low-resource languages for contradiction detection due to its rich lexical and
semantic ambiguity. We have created a dataset of more than 12k sentences,
named ArNLI, which will be publicly available. Moreover, we have applied a new
model inspired by the Stanford contradiction detection solutions proposed for
English. Our approach detects contradictions between pairs of Arabic sentences
by feeding a contradiction vector, combined with a language model vector, into
a machine learning model. We analyzed the results of different traditional
machine learning classifiers and compared them on our created dataset (ArNLI)
and on automatic translations of the English PHEME and SICK datasets. The best
results were achieved with a Random Forest classifier, reaching accuracies of
99%, 60%, and 75% on PHEME, SICK, and ArNLI respectively.
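The abstract names the pipeline only at a high level. The following is a
minimal sketch of the feature-combination idea, assuming hypothetical
stand-ins: a toy contradiction_features() built from lexical overlap and
negation cues, and TF-IDF in place of the paper's actual language model
vector (both stand-ins are illustrative assumptions, not the authors'
features).

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Toy stand-in for the paper's contradiction vector: lexical overlap
    # plus a negation-mismatch cue (illustrative assumption only).
    NEGATIONS = {"لا", "ليس", "لم", "لن", "ما"}  # common Arabic negation particles

    def contradiction_features(premise, hypothesis):
        p, h = set(premise.split()), set(hypothesis.split())
        overlap = len(p & h) / max(len(p | h), 1)
        neg_mismatch = float(bool(p & NEGATIONS) != bool(h & NEGATIONS))
        return np.array([overlap, neg_mismatch])

    # Toy labeled pairs (1 = contradiction, 0 = no contradiction).
    pairs = [
        ("الجو حار اليوم", "الجو ليس حارا اليوم", 1),
        ("الجو حار اليوم", "درجة الحرارة مرتفعة اليوم", 0),
    ]

    # TF-IDF is only a placeholder for the language model vector here.
    vectorizer = TfidfVectorizer()
    vectorizer.fit([p + " " + h for p, h, _ in pairs])

    def featurize(premise, hypothesis):
        lm_vec = vectorizer.transform([premise + " " + hypothesis]).toarray().ravel()
        return np.concatenate([contradiction_features(premise, hypothesis), lm_vec])

    X = np.vstack([featurize(p, h) for p, h, _ in pairs])
    y = np.array([label for _, _, label in pairs])

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(clf.predict(X))  # sanity check on the training pairs

The point is only the shape of the approach: hand-crafted contradiction cues
concatenated with a dense text representation and fed to a traditional
classifier such as Random Forest.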
Related papers
- Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning [5.5119571570277826]
Cross-lingual word alignment plays a crucial role in various natural language processing tasks.
A recent study proposed a BiLSTM-based encoder-decoder model that outperforms pre-trained language models in low-resource settings.
We propose incorporating contrastive learning into the BiLSTM-based encoder-decoder framework.
arXiv Detail & Related papers (2024-07-06T11:56:41Z)
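The entry above names contrastive learning but not its exact form; a common
instantiation, assumed here purely for illustration (not necessarily the
paper's loss), is an InfoNCE objective over aligned word-embedding pairs:

    import numpy as np

    def info_nce_loss(src, tgt, temperature=0.1):
        """InfoNCE over aligned embeddings: row i of src should match row i
        of tgt, with the other rows in the batch acting as negatives.
        A generic contrastive objective, assumed rather than taken from
        the paper."""
        src = src / np.linalg.norm(src, axis=1, keepdims=True)
        tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
        logits = src @ tgt.T / temperature            # (n, n) cosine similarities
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))    # -log p(correct pair)

    rng = np.random.default_rng(0)
    src, tgt = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
    print(info_nce_loss(src, tgt))

Minimizing this pulls aligned source/target embeddings together and pushes
apart the in-batch negatives.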
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Pretrained Models for Multilingual Federated Learning [38.19507070702635]
We study how multilingual text impacts Federated Learning (FL) algorithms.
We explore three multilingual language tasks: language modeling, machine translation, and text classification.
Our results show that using pretrained models reduces the negative effects of FL, helping FL models perform close to or better than centralized (no-privacy) learning.
arXiv Detail & Related papers (2022-06-06T00:20:30Z)
- Polish Natural Language Inference and Factivity -- an Expert-based Dataset and Benchmarks [0.07734726150561087]
The dataset contains entirely natural language utterances in Polish.
It is a representative sample with regard to the frequency of main verbs and other linguistic features.
BERT-based models consuming only the input sentences capture most of the complexity of NLI/factivity.
arXiv Detail & Related papers (2022-01-10T18:32:55Z)
- Analysing The Impact Of Linguistic Features On Cross-Lingual Transfer [3.299672391663527]
We analyze a state-of-the-art multilingual model and try to determine what impacts good transfer between languages.
We show that looking at particular syntactic features is 2-4 times more helpful in predicting performance than an aggregated syntactic similarity.
arXiv Detail & Related papers (2021-05-12T21:22:58Z)
- Combining Context-Free and Contextualized Representations for Arabic Sarcasm Detection and Sentiment Identification [0.0]
This paper presents team SPPU-AASM's submission for the WANLP ArSarcasm shared task 2021, which centers on sarcasm and sentiment polarity detection of Arabic tweets.
The proposed system achieves an F1-sarcastic score of 0.62 and an F-PN score of 0.715 on the sarcasm and sentiment detection tasks, respectively.
arXiv Detail & Related papers (2021-03-09T19:39:43Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product; and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering.
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
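Approach (ii) in the entry above is concrete enough to sketch. Assuming
embeddings grouped per language, one plausible reading (an assumption, not
the paper's exact recipe) is per-language standardization:

    import numpy as np

    def remove_language_statistics(emb_by_lang):
        """Subtract each language's mean embedding and divide by its
        per-dimension standard deviation (a sketch of approach (ii))."""
        out = {}
        for lang, emb in emb_by_lang.items():
            mu = emb.mean(axis=0, keepdims=True)
            sigma = emb.std(axis=0, keepdims=True) + 1e-8  # avoid divide-by-zero
            out[lang] = (emb - mu) / sigma
        return out

    rng = np.random.default_rng(0)
    embs = {"ar": rng.normal(2.0, 1.5, size=(100, 16)),
            "en": rng.normal(-1.0, 0.5, size=(100, 16))}
    centered = remove_language_statistics(embs)
    print({lang: (round(e.mean(), 6), round(e.std(), 6)) for lang, e in centered.items()})

After this step, each language's embeddings have zero mean and unit variance
per dimension, which removes one source of language identity from the shared
space.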
- How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation [82.96358326053115]
We investigate sensitivity of probing task results to structural design choices.
We probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as identified for English.
We find that results on English do not transfer to other languages.
arXiv Detail & Related papers (2020-06-16T12:37:50Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
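For reference, the information-theoretic operationalization in the last entry
treats probing as estimating the mutual information between a representation
R and a linguistic property T. In standard notation (a sketch of the usual
formulation, not a quotation from the paper):

    I(T; R) = H(T) - H(T \mid R), \qquad
    H(T \mid R) \le \mathbb{E}\left[ -\log q_\theta(t \mid r) \right]

where q_\theta is the probe: a better probe tightens the upper bound on
H(T | R) and therefore the lower bound on I(T; R).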