To Drop or Not to Drop? Predicting Argument Ellipsis Judgments: A Case Study in Japanese
- URL: http://arxiv.org/abs/2404.11315v2
- Date: Sun, 27 Oct 2024 07:28:25 GMT
- Title: To Drop or Not to Drop? Predicting Argument Ellipsis Judgments: A Case Study in Japanese
- Authors: Yukiko Ishizuki, Tatsuki Kuribayashi, Yuichiroh Matsubayashi, Ryohei Sasano, Kentaro Inui,
- Abstract summary: We study whether and why a particular argument should be omitted across over 2,000 data points in the balanced corpus of Japanese.
The data indicate that native speakers overall share common criteria for such judgments.
A gap between the system's predictions and human judgments is revealed in specific linguistic aspects.
- Score: 26.659122101710068
- Abstract: Speakers sometimes omit certain arguments of a predicate in a sentence; such omission is especially frequent in pro-drop languages. This study addresses a question about ellipsis (what explains native speakers' ellipsis decisions?) motivated by interest in human discourse processing and in writing assistance for this choice. To this end, we first collect large-scale human annotations of whether and why a particular argument should be omitted, covering over 2,000 data points in a balanced corpus of Japanese, a prototypical pro-drop language. The data indicate that native speakers overall share common criteria for such judgments and further clarify their quantitative characteristics, e.g., the distribution of related linguistic factors in the balanced corpus. Furthermore, we examine the performance of a language model-based argument ellipsis judgment model, revealing gaps between the system's predictions and human judgments in specific linguistic aspects. We hope this fundamental resource encourages further studies on natural human ellipsis judgment.
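The abstract does not detail how the language model-based judgment model is set up; as a rough, hypothetical illustration, the task can be framed as binary classification ("drop" vs. "keep") over a marked argument span given its context. In the sketch below, the encoder name, the [ARG] markers, the label indexing, and the toy example are all assumptions rather than the authors' actual configuration, and the classification head would still need fine-tuning on the annotated judgments.

```python
# Hypothetical sketch: argument ellipsis judgment as binary classification.
# The model name, [ARG] markers, and label order are illustrative assumptions;
# the classification head is untrained here and would need fine-tuning on the
# collected ellipsis annotations before its scores are meaningful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "cl-tohoku/bert-base-japanese-v3"  # assumed Japanese encoder (needs fugashi/unidic-lite)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def ellipsis_score(context: str, sentence_with_marked_arg: str) -> float:
    """Return P(drop) for the argument enclosed in [ARG] ... [/ARG] (label 1 = drop, assumed)."""
    inputs = tokenizer(context, sentence_with_marked_arg,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Toy usage: the marked argument is the subject "私は" ("I"), a typical drop candidate.
print(ellipsis_score("昨日、友人に会った。", "[ARG]私は[/ARG]とても嬉しかった。"))
```

In practice the [ARG] and [/ARG] markers would be registered as special tokens and the model trained on the collected annotations; the point of the sketch is only the input/output framing.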
Related papers
- Does Dependency Locality Predict Non-canonical Word Order in Hindi? [5.540151072128081]
Dependency length minimization is a significant predictor of non-canonical (OSV) syntactic choices.
Discourse predictability emerges as the primary determinant of constituent-order preferences.
This work sheds light on the role of expectation adaptation in word-ordering decisions.
arXiv Detail & Related papers (2024-05-13T13:24:17Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Is Argument Structure of Learner Chinese Understandable: A Corpus-Based Analysis [8.883799596036484]
This paper presents a corpus-based analysis of argument structure errors in learner Chinese.
The data for analysis includes sentences produced by language learners as well as their corrections by native speakers.
We couple the data with semantic role labeling annotations that are manually created by two senior students.
arXiv Detail & Related papers (2023-08-17T21:10:04Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- Cross-Lingual Speaker Identification Using Distant Supervision [84.51121411280134]
We propose a speaker identification framework that addresses issues such as lack of contextual reasoning and poor cross-lingual generalization.
We show that the resulting model outperforms previous state-of-the-art methods on two English speaker identification benchmarks by up to 9% in accuracy, and by 5% with only distant supervision.
arXiv Detail & Related papers (2022-10-11T20:49:44Z)
- Construction and Evaluation of a Self-Attention Model for Semantic Understanding of Sentence-Final Particles [0.0]
Sentence-final particles serve an essential role in spoken Japanese because they express the speaker's mental attitudes toward a proposition and/or an interlocutor.
This paper proposes a self-attention model that takes various subjective senses in addition to language and images as input and learns the relationship between words and subjective senses.
arXiv Detail & Related papers (2022-10-01T13:54:54Z)
- Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real-world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z)
- Text as Causal Mediators: Research Design for Causal Estimates of Differential Treatment of Social Groups via Language Aspects [7.175621752912443]
We propose a causal research design for observational (non-experimental) data to estimate the natural direct and indirect effects of social group signals on speakers' responses.
We illustrate the promises and challenges of this framework via a theoretical case study of the effect of an advocate's gender on interruptions from justices during U.S. Supreme Court oral arguments.
arXiv Detail & Related papers (2021-09-15T19:15:35Z)
- Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that those models that make use of abstract representations of preceding linguistic context best predict the changes made by people in the course of serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z)