Proposition from the Perspective of Chinese Language: A Chinese
Proposition Classification Evaluation Benchmark
- URL: http://arxiv.org/abs/2309.09602v1
- Date: Mon, 18 Sep 2023 09:18:39 GMT
- Title: Proposition from the Perspective of Chinese Language: A Chinese
Proposition Classification Evaluation Benchmark
- Authors: Conghui Niu, Mengyang Hu, Lin Bo, Xiaoli He, Dong Yu, Pengyuan Liu
- Abstract summary: We propose a comprehensive multi-level proposition classification system based on linguistics and logic.
We create a large-scale Chinese proposition dataset PEACE from multiple domains.
Results show the importance of properly modeling the semantic features of propositions.
- Score: 21.91454409571424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing proposition classification typically relies on logical
constants. Compared with Western languages such as English, which lean towards
hypotaxis, Chinese in daily expression often relies on semantic or logical
understanding rather than explicit logical connectives, exhibiting the
characteristics of parataxis. However, existing research has rarely paid
attention to this issue, even though accurately classifying these propositions
is crucial for natural language understanding and reasoning. In this paper, we put forward the concepts of
explicit and implicit propositions and propose a comprehensive multi-level
proposition classification system based on linguistics and logic.
Correspondingly, we create a large-scale Chinese proposition dataset PEACE from
multiple domains, covering all categories related to propositions. To evaluate
the Chinese proposition classification ability of existing models and explore
their limitations, we conduct evaluations on PEACE using several different
methods, including a rule-based method, SVM, BERT, RoBERTa, and ChatGPT.
Results show the importance of properly modeling the semantic features of
propositions. BERT has relatively good proposition classification capability,
but lacks cross-domain transferability. ChatGPT performs poorly, but its
classification ability can be improved by providing more proposition
information. Many issues are still far from being resolved and require further
study.
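As an illustration of the evaluation setup described in the abstract, the sketch below fine-tunes a Chinese BERT for proposition classification with Hugging Face transformers. It is a minimal sketch under stated assumptions: the model name bert-base-chinese, the files peace_train.csv and peace_test.csv with text/label columns, and the label count are all illustrative, since the abstract does not specify PEACE's format or the paper's exact configuration.

```python
# Minimal sketch: fine-tuning a Chinese BERT for proposition classification.
# Assumptions (not from the paper): PEACE is stored as CSV with "text" and
# "label" columns, and there are 8 proposition categories.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-chinese"   # assumed encoder
NUM_LABELS = 8                     # hypothetical number of categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS)

data = load_dataset("csv", data_files={"train": "peace_train.csv",
                                       "test": "peace_test.csv"})

def tokenize(batch):
    # Propositions are sentence-level, so a modest max length suffices.
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="peace-bert", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; add compute_metrics for accuracy
```

Cross-domain transferability, which the abstract flags as BERT's weakness, could then be probed by training on one domain's split and evaluating on another's.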
Related papers
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704] (2024-04-09)
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
- Do We Need Language-Specific Fact-Checking Models? The Case of Chinese [15.619421104102516] (2024-01-27)
This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese.
We first demonstrate the limitations of translation-based methods and multilingual large language models, highlighting the need for language-specific systems.
We propose a Chinese fact-checking system that can better retrieve evidence from a document by incorporating context information.
- Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification [4.498100922387482] (2023-08-14)
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient.
Previous results demonstrated that these methods can even improve performance on some classification tasks.
This paper investigates how these techniques influence classification performance and computation costs compared to full fine-tuning (a minimal LoRA sketch follows after this list).
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811] (2023-06-08)
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
- CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification [57.62886091828512] (2022-11-11)
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
- How to Agree to Disagree: Managing Ontological Perspectives using Standpoint Logic [2.9005223064604073] (2022-06-14)
Standpoint Logic is a simple, yet versatile multi-modal logic "add-on" for existing KR languages.
We provide a polytime translation into the standpoint-free version of First-Order Standpoint Logic.
We then establish a similar translation for the very expressive description logic SROIQb_s underlying the OWL 2 DL language.
- A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference [54.678516076366506] (2022-05-31)
Natural Language Inference (NLI) is an increasingly important task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
- Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT [7.057643880514415] (2021-01-26)
We investigate how Multilingual BERT (mBERT) encodes grammar by examining how the high-order grammatical feature of morphosyntactic alignment is manifested across the embedding spaces of different languages.
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036] (2020-10-13)
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055] (2020-04-09)
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
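To make the adapter/LoRA comparison from the parameter-efficient fine-tuning entry above concrete, here is a minimal LoRA sketch using the peft library. The base model xlm-roberta-base, the rank, and the target modules are illustrative assumptions, not the cited paper's configuration.

```python
# Minimal LoRA sketch (illustrative; not the cited paper's exact setup).
# LoRA freezes the pretrained weights and trains small low-rank update
# matrices, so only a fraction of the parameters receive gradients.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=4)  # hypothetical label count

lora_config = LoraConfig(
    r=8,                                # rank of the low-rank updates
    lora_alpha=16,                      # scaling factor for the updates
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections to adapt
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically ~1% of all parameters
```

Training then proceeds with an ordinary fine-tuning loop, but with far fewer trainable parameters, which is the efficiency trade-off the cited paper measures against full fine-tuning.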