Compositional Evaluation on Japanese Textual Entailment and Similarity
- URL: http://arxiv.org/abs/2208.04826v1
- Date: Tue, 9 Aug 2022 15:10:56 GMT
- Title: Compositional Evaluation on Japanese Textual Entailment and Similarity
- Authors: Hitomi Yanaka and Koji Mineshima
- Abstract summary: Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models.
Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English.
There are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English.
- Score: 20.864082353441685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are
widely used benchmark tasks for compositional evaluation of pre-trained
language models. Despite growing interest in linguistic universals, most
NLI/STS studies have focused almost exclusively on English. In particular,
there are no available multilingual NLI/STS datasets in Japanese, which is
typologically different from English and can shed light on the currently
controversial behavior of language models in matters such as sensitivity to
word order and case particles. Against this background, we introduce JSICK, a
Japanese NLI/STS dataset that was manually translated from the English dataset
SICK. We also present a stress-test dataset for compositional inference,
created by transforming syntactic structures of sentences in JSICK to
investigate whether language models are sensitive to word order and case
particles. We conduct baseline experiments on different pre-trained language
models and compare the performance of multilingual models when applied to
Japanese and other languages. The results of the stress-test experiments
suggest that the current pre-trained language models are insensitive to word
order and case marking.
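
The stress test described above perturbs word order and case particles in JSICK sentences. As a rough illustration only (not the authors' released code), the sketch below shows how such perturbations can be generated for a toy Japanese sentence; the example sentence and the particle-swapping heuristic are assumptions made for this sketch.

```python
# Illustrative sketch (not from the paper): generating word-order and
# case-particle perturbations in the spirit of the JSICK stress test.
# The example sentence and the particle-swapping heuristic are assumptions.

def swap_case_particles(sentence: str) -> str:
    """Swap nominative が and accusative を, which inverts the argument
    structure (who acts on whom) while keeping the surface vocabulary."""
    placeholder = "\u0001"  # temporary marker so the swap stays symmetric
    return (sentence.replace("が", placeholder)
                    .replace("を", "が")
                    .replace(placeholder, "を"))


def scramble_word_order(subject: str, obj: str, verb: str) -> list[str]:
    """Return the canonical SOV order and the scrambled OSV order; both are
    grammatical in Japanese and express the same proposition, so a
    compositional model should treat them as meaning-equivalent."""
    return [f"{subject}が{obj}を{verb}", f"{obj}を{subject}が{verb}"]


if __name__ == "__main__":
    premise = "犬が猫を追いかけた"  # "The dog chased the cat."
    print(swap_case_particles(premise))                # 犬を猫が追いかけた -> meaning flips
    print(scramble_word_order("犬", "猫", "追いかけた"))  # same meaning, different order
```

Pairs produced by the particle swap should change the entailment label, while pairs produced by scrambling should not; the paper's stress-test results suggest that current pre-trained models change their predictions little in either case.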
Related papers
- Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation [13.713981533436135]
This paper analyzes the features of monotonic translations, which follow the word order of the source language, in simultaneous interpreting (SI).
We analyzed the characteristics of chunk-wise monotonic translation (CMT) sentences using the NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation dataset.
We further investigated the features of CMT sentences by evaluating the output of existing speech translation (ST) and simultaneous speech translation (simulST) models on the same dataset.
arXiv Detail & Related papers (2024-06-13T09:10:16Z)
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
- Cross-lingual Transfer or Machine Translation? On Data Augmentation for Monolingual Semantic Textual Similarity [2.422759879602353]
Cross-lingual transfer of Wikipedia data improves performance on monolingual STS.
We find that the Wikipedia domain outperforms the NLI domain for these languages, in contrast to prior studies that focused on NLI as training data.
arXiv Detail & Related papers (2024-03-08T12:28:15Z)
- Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models [18.874880342410876]
We present Jamp, a Japanese benchmark focused on temporal inference.
Our dataset includes a range of temporal inference patterns, which enables us to conduct fine-grained analysis.
We evaluate the generalization capacities of monolingual/multilingual LMs by splitting our dataset based on tense fragments.
arXiv Detail & Related papers (2023-06-19T07:00:14Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- IndicXNLI: Evaluating Multilingual Inference for Indian Languages [9.838755823660147]
IndicXNLI is an NLI dataset for 11 Indic languages.
By fine-tuning different pre-trained LMs on IndicXNLI, we analyze various cross-lingual transfer techniques.
These experiments provide us with useful insights into the behaviour of pre-trained models for a diverse set of languages.
arXiv Detail & Related papers (2022-04-19T09:49:00Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)