Unique Chinese Linguistic Phenomena
- URL: http://arxiv.org/abs/2004.00499v3
- Date: Thu, 29 Oct 2020 06:07:07 GMT
- Title: Unique Chinese Linguistic Phenomena
- Authors: Shengbin Jia
- Abstract summary: Linguistics holds unique characteristics of generality, stability, and nationality.
The differences between Chinese and English linguistics are mainly reflected in morphology and syntax.
- Score: 4.020523898765406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linguistics holds unique characteristics of generality, stability, and
nationality, which will affect the formulation of extraction strategies and
should be incorporated into the relation extraction. Chinese open relation
extraction is not well-established, because the complexity of Chinese
linguistics makes it harder to operate and the methods developed for English
are not compatible with those for Chinese. The differences between Chinese and English
linguistics are mainly reflected in morphology and syntax.
Related papers
- Analyzing The Language of Visual Tokens [48.62180485759458]
We take a natural-language-centric approach to analyzing discrete visual languages.
We show that higher token innovation drives greater entropy and lower compression, with tokens predominantly representing object parts.
We also show that visual languages lack cohesive grammatical structures, leading to higher perplexity and weaker hierarchical organization compared to natural languages.
arXiv Detail & Related papers (2024-11-07T18:59:28Z)
- The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models [64.43764443000003]
We identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts.
We propose a new metric for compositionality without such linguistic priors.
arXiv Detail & Related papers (2023-10-04T12:48:33Z)
- Discourse Representation Structure Parsing for Chinese [8.846860617823005]
We explore the feasibility of Chinese semantic parsing in the absence of labeled data for Chinese meaning representations.
We propose a test suite designed explicitly for Chinese semantic parsing, which provides fine-grained evaluation for parsing performance.
Our experimental results show that the difficulty of Chinese semantic parsing is mainly caused by adverbs.
arXiv Detail & Related papers (2023-06-16T09:47:45Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while the composition is more crucial to the success of cross-linguistic transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer [15.578267998149743]
We show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order.
There is a strong correlation between transfer performance and word embedding alignment between languages.
Our results call for focus in multilingual models on explicitly improving word embedding alignment between languages.
arXiv Detail & Related papers (2021-10-27T21:25:39Z)
- Controlled Evaluation of Grammatical Knowledge in Mandarin Chinese Language Models [22.57309958548928]
We investigate whether structural supervision improves language models' ability to learn grammatical dependencies in typologically different languages.
We train LSTMs, Recurrent Neural Network Grammars, Transformer language models, and generative parsing models on datasets of different sizes.
We find suggestive evidence that structural supervision helps with representing syntactic state across intervening content and improves performance in low-data settings.
arXiv Detail & Related papers (2021-09-22T22:11:30Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Pragmatic information in translation: a corpus-based study of tense and mood in English and German [70.3497683558609]
Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research.
We consider the correspondence between English and German tense and mood in translation.
Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.
arXiv Detail & Related papers (2020-07-10T08:15:59Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Compositionality and Generalization in Emergent Languages [42.68870559695238]
We study whether the language emerging in deep multi-agent simulations possesses a similar ability to refer to novel primitive combinations.
We find no correlation between the degree of compositionality of an emergent language and its ability to generalize.
The more compositional a language is, the more easily it will be picked up by new learners.
arXiv Detail & Related papers (2020-04-20T08:30:14Z)
- A Corpus of Adpositional Supersenses for Mandarin Chinese [15.757892250956715]
This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese.
Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria.
We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English.
arXiv Detail & Related papers (2020-03-18T18:59:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.