Synonym Knowledge Enhanced Reader for Chinese Idiom Reading
Comprehension
- URL: http://arxiv.org/abs/2011.04499v1
- Date: Mon, 9 Nov 2020 15:28:53 GMT
- Title: Synonym Knowledge Enhanced Reader for Chinese Idiom Reading
Comprehension
- Authors: Siyu Long and Ran Wang and Kun Tao and Jiali Zeng and Xin-Yu Dai
- Abstract summary: Machine reading comprehension (MRC) is the task that asks a machine to answer questions based on a given context.
We first define the concept of literal meaning coverage to measure the consistency between semantics and literal meanings for Chinese idioms.
To fully utilize the synonymic relationship, we propose the synonym knowledge enhanced reader.
Experimental results on ChID, a large-scale Chinese idiom reading comprehension dataset, show that our model achieves state-of-the-art performance.
- Score: 22.25730077173127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine reading comprehension (MRC) is the task that asks a machine to answer
questions based on a given context. For Chinese MRC, due to the non-literal and
non-compositional semantic characteristics, Chinese idioms pose unique
challenges for machines to understand. Previous studies tend to treat idioms
separately without fully exploiting the relationship among them. In this paper,
we first define the concept of literal meaning coverage to measure the
consistency between semantics and literal meanings for Chinese idioms. With the
definition, we prove that the literal meanings of many idioms are far from
their semantics, and we also verify that the synonymic relationship can
mitigate this inconsistency, which would be beneficial for idiom comprehension.
Furthermore, to fully utilize the synonymic relationship, we propose the
synonym knowledge enhanced reader. Specifically, for each idiom, we first
construct a synonym graph according to the annotations from a high-quality
synonym dictionary or the cosine similarity between the pre-trained idiom
embeddings and then incorporate the graph attention network and gate mechanism
to encode the graph. Experimental results on ChID, a large-scale Chinese idiom
reading comprehension dataset, show that our model achieves state-of-the-art
performance.
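The abstract mentions building a synonym graph from the cosine similarity between pre-trained idiom embeddings. The following is a minimal sketch of that one step under stated assumptions, not the authors' implementation. The function names, the `top_k` and `min_sim` parameters, and the toy three-dimensional vectors are all hypothetical. The graph attention network and gate mechanism that encode the graph are not covered here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_synonym_graph(embeddings, top_k=2, min_sim=0.5):
    """Link each idiom to its most similar neighbours.

    Assumption: an edge is kept when the similarity is at least `min_sim`,
    and each idiom keeps at most `top_k` neighbours. Both thresholds are
    illustrative choices, not values taken from the paper.
    """
    graph = {}
    for idiom, vec in embeddings.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != idiom
        ]
        kept = [(sim, other) for sim, other in scored if sim >= min_sim]
        # Highest similarity first; ties are broken by name for determinism.
        kept.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[idiom] = [other for _, other in kept[:top_k]]
    return graph


# Toy embeddings standing in for pre-trained idiom vectors (hypothetical data).
toy_embeddings = {
    "idiom_a": [1.0, 0.0, 0.0],
    "idiom_b": [0.9, 0.1, 0.0],
    "idiom_c": [0.0, 1.0, 0.0],
    "idiom_d": [0.8, 0.0, 0.6],
}

graph = build_synonym_graph(toy_embeddings, top_k=2, min_sim=0.5)
for idiom, neighbours in graph.items():
    print(idiom, "->", neighbours)
```

In this toy setup an idiom with no sufficiently similar neighbour simply gets an empty adjacency list. A dictionary-based variant would replace the similarity scoring with a lookup of annotated synonyms.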
Related papers
- SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation [55.26111461168754]
We introduce SMILE: Semantic Metric Integrating Lexical Exactness, a novel approach that combines sentence-level semantic understanding with keyword-level semantic understanding and easy keyword matching.
It is highly correlated with human judgments and computationally lightweight, bridging the gap between lexical and semantic evaluation.
arXiv Detail & Related papers (2025-11-21T17:30:18Z)
- Chengyu-Bench: Benchmarking Large Language Models for Chinese Idiom Understanding and Use [1.5129424416840094]
Chengyu-Bench comprises 2,937 human-verified examples covering 1,765 common idioms sourced from diverse corpora.
We evaluate leading LLMs and find they achieve over 95% accuracy on Evaluative Connotation, but only 85% on Appropriateness and 40% top-1 accuracy on Open Cloze.
Chengyu-Bench demonstrates that while LLMs can reliably gauge idiom sentiment, they still struggle to grasp the cultural and contextual nuances essential for proper usage.
arXiv Detail & Related papers (2025-06-22T17:26:09Z)
- That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context? [64.38544995251642]
We study semantic ambiguities that exist in the source (English in this work) itself.
We focus on idioms that are open to both literal and figurative interpretations.
We find that current MT models consistently translate English idioms literally, even when the context suggests a figurative interpretation.
arXiv Detail & Related papers (2023-10-23T06:38:49Z)
- Discourse Representation Structure Parsing for Chinese [8.846860617823005]
We explore the feasibility of Chinese semantic parsing in the absence of labeled data for Chinese meaning representations.
We propose a test suite designed explicitly for Chinese semantic parsing, which provides fine-grained evaluation for parsing performance.
Our experimental results show that the difficulty of Chinese semantic parsing is mainly caused by adverbs.
arXiv Detail & Related papers (2023-06-16T09:47:45Z)
- Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training [50.100992353488174]
We introduce CDBERT, a new learning paradigm that enhances the semantic understanding of Chinese PLMs with dictionary knowledge and the structure of Chinese characters.
We name the two core modules of CDBERT as Shuowen and Jiezi, where Shuowen refers to the process of retrieving the most appropriate meaning from Chinese dictionaries.
Our paradigm demonstrates consistent improvements on previous Chinese PLMs across all tasks.
arXiv Detail & Related papers (2023-05-30T05:48:36Z)
- Comprehending and Ordering Semantics for Image Captioning [124.48670699658649]
We propose a new Transformer-style architecture, namely Comprehending and Ordering Semantics Networks (COS-Net).
COS-Net unifies enriched semantic comprehension and learnable semantic ordering in a single architecture.
arXiv Detail & Related papers (2022-06-14T15:51:14Z)
- Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation [55.52888815590317]
Unlike literal expressions, idioms' meanings do not directly follow from their parts.
NMT models are often unable to translate idioms accurately and over-generate compositional, literal translations.
We investigate whether the non-compositionality of idioms is reflected in the mechanics of the dominant NMT model, Transformer.
arXiv Detail & Related papers (2022-05-30T17:59:32Z)
- Chinese Idiom Paraphrasing [33.585450600066395]
Chinese idioms are hard for children and non-native speakers to understand.
This study proposes a novel task, denoted as Chinese Idiom Paraphrasing (CIP).
CIP aims to rephrase idiom-containing sentences into non-idiomatic ones while preserving the original sentence's meaning.
arXiv Detail & Related papers (2022-04-15T17:24:25Z)
- An In-depth Study on Internal Structure of Chinese Words [34.864343591706984]
This work proposes to model the deep internal structures of Chinese words as dependency trees with 11 labels for distinguishing syntactic relationships.
We manually annotate a word-internal structure treebank (WIST) consisting of over 30K multi-char words from Chinese Penn Treebank.
We present detailed and interesting analysis on WIST to reveal insights on Chinese word formation.
arXiv Detail & Related papers (2021-06-01T09:09:51Z)
- Do Context-Aware Translation Models Pay the Right Attention? [61.25804242929533]
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so.
In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words?
We introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations.
Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words.
arXiv Detail & Related papers (2021-05-14T17:32:24Z)
- A BERT-based Dual Embedding Model for Chinese Idiom Prediction [8.903106634925853]
The Chinese idiom prediction task is to select the correct idiom from a set of candidate idioms given a context with a blank.
We propose a BERT-based dual embedding model to encode the contextual words as well as to learn dual embeddings of the idioms.
arXiv Detail & Related papers (2020-11-04T16:12:39Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.