ChiEngMixBench: Evaluating Large Language Models on Spontaneous and Natural Chinese-English Code-Mixed Generation
- URL: http://arxiv.org/abs/2601.16217v1
- Date: Fri, 02 Jan 2026 08:18:27 GMT
- Title: ChiEngMixBench: Evaluating Large Language Models on Spontaneous and Natural Chinese-English Code-Mixed Generation
- Authors: Qingyan Yang, Tongxi Wang, Yunsheng Luo,
- Abstract summary: We introduce ChiEngMixBench, the first benchmark designed to evaluate code-mixing ability in authentic community contexts.<n>ChiEngMixBench formulates code-mixing as a cognitive alignment problem, characterized by two complementary signals: Spontaneity and Naturalness.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code-mixing is increasingly prevalent in interactions between humans and large language models, yet existing work often reduces it to a translation or convertibility problem, making it difficult to assess whether a model's switching behavior is context-appropriate and aligned with human conventions. We introduce ChiEngMixBench, the first benchmark designed to evaluate code-mixing ability in authentic community contexts, built upon a general construction pipeline that enables scalable dataset development across domains and bilingual pairs. ChiEngMixBench formulates code-mixing as a cognitive alignment problem, characterized by two complementary signals: Spontaneity and Naturalness. Empirical evaluation shows that our metrics can systematically distinguish code-mixing performance across models. Beyond benchmarking, we further uncover an implicitly emergent Terminology Layering Strategy, a phenomenon consistent with the Matrix Language Frame (MLF) theory, indicating structured cognitive alignment between multilingual large language models and human communication.
Related papers
- UniCoR: Modality Collaboration for Robust Cross-Language Hybrid Code Retrieval [21.36577755250534]
We propose UniCoR, a self-supervised framework that learns Unified Code Representations framework to learn unified and robust code representations.<n>We show that UniCoR outperforms all baseline models, achieving an average improvement of 8.64% in MRR and 11.54% in MAP over the best-performing baseline.<n>UniCoR exhibits stability in hybrid code retrieval and generalization capability in cross-language scenarios.
arXiv Detail & Related papers (2025-12-11T09:15:38Z) - On the Entity-Level Alignment in Crosslingual Consistency [62.33186691736433]
SubSub and SubInj integrate English translations of subjects into prompts across languages, leading to substantial gains in factual recall accuracy and consistency.<n>These interventions reinforce the entity representation alignment in the conceptual space through model's internal pivot-language processing.
arXiv Detail & Related papers (2025-10-11T16:26:50Z) - Interference Matrix: Quantifying Cross-Lingual Interference in Transformer Encoders [55.749883010057545]
We construct an interference matrix by training and evaluating small BERT-like models on all possible language pairs.<n>Our analysis reveals that interference between languages is asymmetrical and that its patterns do not align with traditional linguistic characteristics.
arXiv Detail & Related papers (2025-08-04T10:02:19Z) - Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [56.61984030508691]
We present the first mechanistic interpretability study of language confusion.<n>We show that confusion points (CPs) are central to this phenomenon.<n>We show that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion.
arXiv Detail & Related papers (2025-05-22T11:29:17Z) - Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training [58.696660064190475]
We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities.<n>To better explore the power of code-switching for language alignment during pre-training, we investigate the strategy of synthetic code-switching.
arXiv Detail & Related papers (2025-04-02T15:09:58Z) - FUSE : A Ridge and Random Forest-Based Metric for Evaluating MT in Indigenous Languages [3.5507492850515323]
This paper presents the winning submission of the RaaVa team to the Americas 2025 Shared Task 3 on Automatic Evaluation Metrics for Machine Translation.<n>We introduce Feature-Union Scorer (FUSE) for Evaluation, FUSE integrates Ridge regression and Gradient Boosting to model translation quality.<n>Results show that FUSE consistently achieves higher Pearson and Spearman correlations with human judgments.
arXiv Detail & Related papers (2025-03-28T06:58:55Z) - Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models [3.63819860423174]
Structural priming is a cognitive phenomenon where exposure to a particular syntactic structure increases the likelihood of producing the same structure in subsequent utterances.<n>We introduce PRISMATIC, the first multimodal structural priming dataset.<n>We propose the Syntactic Preservation Index (SPI), a novel reference-free evaluation metric designed specifically to assess structural priming effects in sentence level.
arXiv Detail & Related papers (2025-02-24T21:33:27Z) - Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models [16.82812708514889]
Code-switching, the phenomenon of alternating between two or more languages in a single conversation, presents unique challenges for Natural Language Processing (NLP)
Most existing research focuses on either syntactic constraints or neural generation, with few efforts to integrate linguistic theory with large language models (LLMs) for generating natural code-switched text.
We introduce EZSwitch, a novel framework that combines Equivalence Constraint Theory (ECT) with LLMs to produce linguistically valid and fluent code-switched text.
arXiv Detail & Related papers (2024-10-30T03:03:32Z) - Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.<n>We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z) - MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating
Chinese and English Computational Language Models [44.74364661212373]
This paper proposes MulCogBench, a cognitive benchmark dataset collected from native Chinese and English participants.
It encompasses a variety of cognitive data, including subjective semantic ratings, eye-tracking, functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG)
Results show that language models share significant similarities with human cognitive data and the similarity patterns are modulated by the data modality and stimuli complexity.
arXiv Detail & Related papers (2024-03-02T07:49:57Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.