JCoLA: Japanese Corpus of Linguistic Acceptability
- URL: http://arxiv.org/abs/2309.12676v1
- Date: Fri, 22 Sep 2023 07:35:45 GMT
- Title: JCoLA: Japanese Corpus of Linguistic Acceptability
- Authors: Taiga Someya, Yushi Sugimoto, Yohei Oseki
- Abstract summary: We introduce JCoLA (Japanese Corpus of Linguistic Acceptability), which consists of 10,020 sentences annotated with binary acceptability judgments.
We then evaluate the syntactic knowledge of 9 different types of Japanese language models on JCoLA.
- Score: 3.6141428739228902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural language models have exhibited outstanding performance in a range of
downstream tasks. However, there is limited understanding regarding the extent
to which these models internalize syntactic knowledge, so that various datasets
have recently been constructed to facilitate syntactic evaluation of language
models across languages. In this paper, we introduce JCoLA (Japanese Corpus of
Linguistic Acceptability), which consists of 10,020 sentences annotated with
binary acceptability judgments. Specifically, those sentences are manually
extracted from linguistics textbooks, handbooks and journal articles, and split
into in-domain data (86 %; relatively simple acceptability judgments extracted
from textbooks and handbooks) and out-of-domain data (14 %; theoretically
significant acceptability judgments extracted from journal articles), the
latter of which is categorized by 12 linguistic phenomena. We then evaluate the
syntactic knowledge of 9 different types of Japanese language models on JCoLA.
The results demonstrated that several models could surpass human performance
for the in-domain data, while no models were able to exceed human performance
for the out-of-domain data. Error analyses by linguistic phenomena further
revealed that although neural language models are adept at handling local
syntactic dependencies like argument structure, their performance wanes when
confronted with long-distance syntactic dependencies like verbal agreement and
NPI licensing.
Related papers
- Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models [0.0]
We propose a method in which we use token-based and sentence-based augmentation methods to generate counterfactual sentence pairs.
We show that the proposed method can improve the performance and robustness of the NLI model.
arXiv Detail & Related papers (2024-10-28T03:43:25Z) - Evaluating Neural Language Models as Cognitive Models of Language
Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous.
When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models.
We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - An Application of Pseudo-Log-Likelihoods to Natural Language Scoring [5.382454613390483]
A language model with relatively few parameters and training steps can outperform it on a recent large data set.
We produce some absolute state-of-the-art results for common sense reasoning in binary choice tasks.
We argue that robustness of the smaller model ought to be understood in terms of compositionality.
arXiv Detail & Related papers (2022-01-23T22:00:54Z) - On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z) - A Massively Multilingual Analysis of Cross-linguality in Shared
Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z) - Reranking Machine Translation Hypotheses with Structured and Web-based
Language Models [11.363601836199331]
Two structured language models are applied for N-best rescoring.
We find that the combination of these language models increases the BLEU score up to 1.6% absolutely on blind test sets.
arXiv Detail & Related papers (2021-04-25T22:09:03Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z) - XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z) - Recurrent Neural Network Language Models Always Learn English-Like
Relative Clause Attachment [17.995905582226463]
We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish.
English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences.
arXiv Detail & Related papers (2020-05-01T01:21:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.