Transformer-GCRF: Recovering Chinese Dropped Pronouns with General
Conditional Random Fields
- URL: http://arxiv.org/abs/2010.03224v1
- Date: Wed, 7 Oct 2020 07:06:09 GMT
- Title: Transformer-GCRF: Recovering Chinese Dropped Pronouns with General
Conditional Random Fields
- Authors: Jingxuan Yang, Kerui Xu, Jun Xu, Si Li, Sheng Gao, Jun Guo, Ji-Rong
Wen, Nianwen Xue
- Abstract summary: We present a novel framework that combines the strengths of the Transformer network with General Conditional Random Fields (GCRF) to model the dependencies between pronouns in neighboring utterances.
Results on three Chinese conversation datasets show that the Transformer-GCRF model outperforms the state-of-the-art dropped pronoun recovery models.
- Score: 54.03719496661691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pronouns are often dropped in Chinese conversations and recovering the
dropped pronouns is important for NLP applications such as Machine Translation.
Existing approaches usually formulate this as a sequence labeling task:
predicting, for each token, whether a pronoun is dropped before it and, if so,
the pronoun's type. Each utterance is treated as a separate sequence and
labeled independently.
Although these approaches have shown promise, labeling each utterance
independently ignores the dependencies between pronouns in neighboring
utterances. Modeling these dependencies is critical to improving the
performance of dropped pronoun recovery. In this paper, we present a novel
framework that combines the strengths of the Transformer network with General
Conditional Random Fields (GCRF) to model the dependencies between pronouns in
neighboring utterances. Results on three Chinese conversation datasets show
that the Transformer-GCRF model outperforms the state-of-the-art dropped
pronoun recovery models. Exploratory analysis also demonstrates that the GCRF
helps capture the dependencies between pronouns in neighboring utterances,
thus contributing to the performance improvements.
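As background on the formulation, the sketch below shows a conventional Transformer-encoder tagger with a linear-chain CRF-style decoder over a single utterance. It is a minimal illustration under stated assumptions (tag inventory, sizes, and vocabulary are invented), not the authors' Transformer-GCRF, whose general CRF additionally links tags across neighboring utterances. Training would maximize the CRF log-likelihood of gold tag sequences; only Viterbi decoding is shown here.

```python
# Minimal sketch (not the authors' code): a Transformer encoder scores a
# dropped-pronoun tag for the position before each token, and a learned
# transition matrix gives a linear-chain CRF decoded with Viterbi.
import torch
import torch.nn as nn

class TransformerTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.emit = nn.Linear(d_model, num_tags)                    # per-token emission scores
        self.trans = nn.Parameter(torch.zeros(num_tags, num_tags))  # CRF transition scores

    def forward(self, tokens):                     # tokens: (batch, seq)
        return self.emit(self.encoder(self.embed(tokens)))  # (batch, seq, num_tags)

    @torch.no_grad()
    def viterbi(self, emissions):                  # emissions: (seq, num_tags)
        score, back = emissions[0], []
        for t in range(1, emissions.size(0)):
            total = score.unsqueeze(1) + self.trans + emissions[t].unsqueeze(0)
            score, idx = total.max(dim=0)          # best previous tag for each current tag
            back.append(idx)
        best = [int(score.argmax())]
        for idx in reversed(back):                 # follow back-pointers
            best.append(int(idx[best[-1]]))
        return best[::-1]

tagger = TransformerTagger(vocab_size=8000, num_tags=17)  # e.g. 16 pronoun types + "none"
emissions = tagger(torch.randint(0, 8000, (1, 12)))[0]
print(tagger.viterbi(emissions))                  # most likely tag sequence for 12 tokens
```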
Related papers
- Constructing Cloze Questions Generatively [2.2719421441459406]
We present a generative method for constructing cloze questions from an article using neural networks and WordNet.
CQG selects an answer key for a given sentence, segments it into a sequence of instances, generates instance-level distractor candidates (IDCs) using a transformer and sibling synsets.
It then removes inappropriate IDCs, ranks the remaining IDCs based on contextual embedding similarities, as well as synset and lexical relatedness, forms distractor candidates by replacing instances with the corresponding top-ranked IDCs, and checks if they are legitimate phrases.
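As a hedged illustration of the sibling-synset step in this pipeline, the sketch below collects co-hyponyms of an answer word from WordNet as raw distractor candidates; the transformer-based generation, contextual-embedding ranking, and phrase-legitimacy checks are not reproduced, and the function name is ours.

```python
# Sketch: sibling synsets (co-hyponyms) as distractor candidates.
# Requires nltk with the WordNet corpus: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def sibling_distractors(word, limit=10):
    candidates = set()
    for synset in wn.synsets(word):
        for hypernym in synset.hypernyms():
            for sibling in hypernym.hyponyms():   # co-hyponyms share a hypernym
                if sibling != synset:
                    candidates.update(sibling.lemma_names())
    candidates.discard(word)
    return sorted(candidates)[:limit]

print(sibling_distractors("violin"))  # e.g. siblings such as cello or viola
```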
arXiv Detail & Related papers (2024-10-05T18:55:38Z)
- Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased? [26.583741801345507]
We present a dataset of over 5 million instances to measure pronoun fidelity in English.
Our results show that pronoun fidelity is not robust, even in a simple, naturalistic setting where humans achieve nearly 100% accuracy.
arXiv Detail & Related papers (2024-04-04T01:07:14Z)
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively to transformers on small to medium sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Causal interventions expose implicit situation models for commonsense language understanding [3.290878132806227]
We analyze performance on the Winograd Challenge, where a single context cue shifts interpretation of an ambiguous pronoun.
We identify a circuit of attention heads that are responsible for propagating information from the context word.
These analyses suggest distinct pathways through which implicit situation models are constructed to guide pronoun resolution.
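To make "causal intervention on attention heads" concrete, here is a toy sketch (our construction, with illustrative sizes) that computes multi-head attention explicitly so one head's output can be zeroed and the downstream effect measured; the paper's actual circuit analysis on a pretrained model is more involved.

```python
# Sketch: zero-ablating one attention head as a causal intervention.
import torch

def multihead_attention(x, Wq, Wk, Wv, n_heads, ablate_head=None):
    seq, d = x.shape
    dh = d // n_heads
    q = (x @ Wq).view(seq, n_heads, dh)
    k = (x @ Wk).view(seq, n_heads, dh)
    v = (x @ Wv).view(seq, n_heads, dh)
    out = torch.zeros(seq, n_heads, dh)
    for h in range(n_heads):
        att = torch.softmax(q[:, h] @ k[:, h].T / dh ** 0.5, dim=-1)
        out[:, h] = att @ v[:, h]
    if ablate_head is not None:
        out[:, ablate_head] = 0.0              # the intervention: remove this head
    return out.reshape(seq, d)

torch.manual_seed(0)
x = torch.randn(5, 8)                          # 5 tokens, model width 8
Wq, Wk, Wv = (torch.randn(8, 8) for _ in range(3))
clean = multihead_attention(x, Wq, Wk, Wv, n_heads=2)
ablated = multihead_attention(x, Wq, Wk, Wv, n_heads=2, ablate_head=0)
print((clean - ablated).abs().max())           # size of head 0's causal effect
```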
arXiv Detail & Related papers (2023-06-06T17:36:43Z)
- Mapping of attention mechanisms to a generalized Potts model [50.91742043564049]
We show that training a neural network is exactly equivalent to solving the inverse Potts problem by the so-called pseudo-likelihood method.
We also compute the generalization error of self-attention in a model scenario analytically using the replica method.
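For readers unfamiliar with the inverse Potts problem, the sketch below evaluates the pseudo-(log-)likelihood objective for a generic Potts model, i.e. the sum over sites of log P(s_i | s_-i); shapes and random data are illustrative, and the paper's analytic mapping to attention is not reproduced.

```python
# Sketch: pseudo-log-likelihood of a Potts model with couplings J and fields h.
import numpy as np

def pseudo_log_likelihood(samples, J, h):
    # samples: (N, L) states in {0..q-1}; J: (L, L, q, q); h: (L, q)
    N, L = samples.shape
    total = 0.0
    for s in samples:
        for i in range(L):
            energy = h[i].copy()                # score of each candidate state at site i
            for j in range(L):
                if j != i:
                    energy += J[i, j, :, s[j]]  # interaction with the fixed neighbors
            total += energy[s[i]] - np.log(np.exp(energy).sum())
    return total / N

rng = np.random.default_rng(0)
L, q, N = 6, 3, 100
samples = rng.integers(0, q, size=(N, L))
J = rng.normal(scale=0.1, size=(L, L, q, q))
h = rng.normal(scale=0.1, size=(L, q))
print(pseudo_log_likelihood(samples, J, h))     # maximized over (J, h) in the inverse problem
```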
arXiv Detail & Related papers (2023-04-14T16:32:56Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
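A rough sketch of the marker-based projection idea: wrap each source entity in slot markers, translate the marked sequence, and read the target-language spans back off the markers. The marker format and the stubbed translate function are our assumptions standing in for the paper's multilingual labeled sequence translation model.

```python
# Sketch: projecting NER labels through translation via slot markers.
import re

def mark(tokens, spans):                       # spans: (start, end, label), end exclusive
    out = []
    for i, tok in enumerate(tokens):
        out += [f"__{lb}__" for st, en, lb in spans if i == st]
        out.append(tok)
        out += [f"__/{lb}__" for st, en, lb in spans if i == en - 1]
    return " ".join(out)

def translate(marked):                         # placeholder for a real MT model
    return marked.replace("New York", "Nueva York")

def project(translated):                       # recover labeled spans from markers
    return [(m.group(2), m.group(1))
            for m in re.finditer(r"__([A-Z]+)__ (.+?) __/\1__", translated)]

marked = mark(["I", "visited", "New", "York"], [(2, 4, "LOC")])
print(project(translate(marked)))              # [('Nueva York', 'LOC')]
```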
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- Joint Entity and Relation Canonicalization in Open Knowledge Graphs using Variational Autoencoders [11.259587284318835]
Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples.
Existing approaches to this problem are two-step: first, they generate embedding representations for both noun and relation phrases; then a clustering algorithm groups them, using the embeddings as features.
In this work, we propose Canonicalizing Using Variational AutoEncoders (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach.
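The toy autoencoder below (our construction, not CUVA's actual architecture) illustrates the end-to-end idea: soft cluster assignments and cluster centroids are learned jointly with the embedding, rather than clustering precomputed embeddings in a second step.

```python
# Sketch: joint embedding + clustering in one differentiable model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointClusterAE(nn.Module):
    def __init__(self, d_in, d_z, n_clusters):
        super().__init__()
        self.encode = nn.Linear(d_in, d_z)
        self.assign = nn.Linear(d_z, n_clusters)            # soft cluster logits
        self.centroids = nn.Parameter(torch.randn(n_clusters, d_z))
        self.decode = nn.Linear(d_z, d_in)

    def forward(self, x):
        z = self.encode(x)
        probs = torch.softmax(self.assign(z), dim=-1)       # cluster assignment
        z_hat = probs @ self.centroids                      # mixture of centroids
        return self.decode(z_hat), probs

model = JointClusterAE(d_in=300, d_z=64, n_clusters=10)
x = torch.randn(5, 300)                                     # e.g. 5 phrase embeddings
recon, probs = model(x)
loss = F.mse_loss(recon, x)                                 # CUVA adds variational/cluster terms
loss.backward()                                             # centroids and encoder update together
```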
arXiv Detail & Related papers (2020-12-08T22:58:30Z)
- A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution [55.39835612617972]
Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to.
As one important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and still challenging for existing models.
We conduct extensive experiments to show that even though current models are achieving good performance on the standard evaluation set, they are still not ready to be used in real applications.
arXiv Detail & Related papers (2020-09-27T01:40:01Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism that generates phrase representations from the corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
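As a minimal, hedged sketch of the general idea (details here are our assumptions, not the paper's exact mechanism), the module below forms one phrase representation from its token representations with learned attentive pooling.

```python
# Sketch: attentive pooling of token representations into a phrase vector.
import torch
import torch.nn as nn

class AttentivePhrasePooling(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.score = nn.Linear(d_model, 1)     # learned relevance score per token

    def forward(self, token_reps):             # (phrase_len, d_model)
        weights = torch.softmax(self.score(token_reps), dim=0)
        return (weights * token_reps).sum(dim=0)  # weighted sum -> (d_model,)

pool = AttentivePhrasePooling(d_model=512)
phrase = pool(torch.randn(3, 512))             # e.g. a 3-token source phrase
print(phrase.shape)                            # torch.Size([512])
```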
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.