Learning to Write Rationally: How Information Is Distributed in Non-Native Speakers' Essays
- URL: http://arxiv.org/abs/2411.03550v1
- Date: Tue, 05 Nov 2024 23:09:37 GMT
- Title: Learning to Write Rationally: How Information Is Distributed in Non-Native Speakers' Essays
- Authors: Zixin Tang, Janet G. van Hell
- Abstract summary: We compare essays written by second language learners with various native language (L1) backgrounds to investigate how they distribute information in their non-native language (L2) production.
Analyses of surprisal and constancy of entropy rate indicated that writers with higher L2 proficiency can reduce the expected uncertainty of language production while still conveying informative content.
- Score: 1.5039745292757671
- Abstract: People tend to distribute information evenly in language production for better and clearer communication. In this study, we compared essays written by second language learners with various native language (L1) backgrounds to investigate how they distribute information in their non-native language (L2) production. Analyses of surprisal and constancy of entropy rate indicated that writers with higher L2 proficiency can reduce the expected uncertainty of language production while still conveying informative content. However, the uniformity of information distribution showed less variability among different groups of L2 speakers, suggesting that this feature may be universal in L2 essay writing and less affected by L2 writers' variability in L1 background and L2 proficiency.
Related papers
- LLM-based Translation Inference with Iterative Bilingual Understanding
We propose a novel Iterative Bilingual Understanding Translation (IBUT) method based on the cross-lingual capabilities of large language models (LLMs).
The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately.
The proposed IBUT outperforms several strong comparison methods.
Date: 2024-10-16T13:21:46Z
- Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
With Retrieval Augmented Generation (RAG), Large Language Models (LLMs) are playing a pivotal role in information search.
We studied LLMs' linguistic preferences in a RAG-based information search setting.
We found that LLMs displayed systemic bias towards information in the same language as the query language in both information retrieval and answer generation.
Date: 2024-07-07T21:26:36Z
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
Date: 2024-02-29T21:05:37Z
- Supervised Knowledge Makes Large Language Models Better In-context Learners
Large Language Models (LLMs) exhibit emergent in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs in that it 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
Date: 2023-12-26T07:24:46Z
- Language Representation Projection: Can We Transfer Factual Knowledge across Languages in Multilingual Language Models?
We propose two parameter-free Language Representation Projection modules (LRP2).
The first module converts non-English representations into English-like equivalents, while the second module reverts English-like representations back into representations of the corresponding non-English language.
Experimental results on the mLAMA dataset demonstrate that LRP2 significantly improves factual knowledge retrieval accuracy and facilitates knowledge transferability across diverse non-English languages.
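The summary does not say how the two parameter-free modules work. One parameter-free construction, assumed here purely for illustration and not confirmed by the summary, is a mean shift: subtract the centroid of the source language's representations and add the English centroid, then reverse the shift to return. The function names and toy vectors below are hypothetical.

```python
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of equal-length vectors (e.g. hidden states of one language)."""
    return [fmean(column) for column in zip(*vectors)]


def to_english_like(h, src_mean, en_mean):
    """Hypothetical module 1: h - mu_src + mu_en."""
    return [x - s + e for x, s, e in zip(h, src_mean, en_mean)]


def back_to_source(h, src_mean, en_mean):
    """Hypothetical module 2: the inverse shift, h - mu_en + mu_src."""
    return [x - e + s for x, s, e in zip(h, src_mean, en_mean)]


# Toy 2-dimensional "hidden states"; a real model would use one layer's activations.
src_states = [[1.0, 2.0], [3.0, 4.0]]
en_states = [[0.0, 0.0], [2.0, 2.0]]
mu_src, mu_en = centroid(src_states), centroid(en_states)

projected = to_english_like([1.0, 2.0], mu_src, mu_en)
restored = back_to_source(projected, mu_src, mu_en)
print(projected, restored)  # [0.0, 0.0] [1.0, 2.0]
```

Because the shift uses only centroids computed from existing representations, it adds no trainable parameters, and the second function exactly undoes the first.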
Date: 2023-11-07T08:16:16Z
- Cross-Lingual Knowledge Editing in Large Language Models
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
The effect of editing knowledge in a source language on a different target language is still unknown.
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
Date: 2023-09-16T11:07:52Z
- X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs
X-PARADE is the first cross-lingual dataset of paragraph-level information divergences.
Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language.
Aligned paragraphs are sourced from Wikipedia pages in different languages.
Date: 2023-09-16T04:34:55Z
- SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT
Cross-linguistic transfer is the influence of the linguistic structure of a speaker's native language on the successful acquisition of a foreign language.
We find that NLP literature has not given enough attention to the phenomenon of negative transfer.
Our findings call for further research using our novel Transformer-based SLA models.
Date: 2023-05-31T06:22:07Z
- Understanding Translationese in Cross-Lingual Summarization
Cross-lingual summarization (CLS) aims to generate a concise summary in a target language different from that of the source document.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
Date: 2022-12-14T13:41:49Z
- A bifurcation threshold for contact-induced language change
This paper proposes a mathematical model of contact-induced language change based on reinforcement learning and nonlinear dynamics.
The model is evaluated with the help of two case studies, morphological levelling in Afrikaans and the erosion of null subjects in Afro-Peruvian Spanish.
Date: 2021-11-23T18:21:12Z