Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?
- URL: http://arxiv.org/abs/2206.14969v1
- Date: Thu, 30 Jun 2022 01:43:05 GMT
- Title: Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?
- Authors: Xiang Zhou, Shiyue Zhang, Mohit Bansal
- Abstract summary: We propose a Masked Part-of-Speech Model (MPoSM) to facilitate flexible dependency modeling.
MPoSM can model arbitrary tag dependency and perform POS induction through the objective of masked POS reconstruction.
We achieve competitive results on both the English Penn WSJ dataset and the universal treebank containing 10 diverse languages.
- Score: 94.68962249604749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous Part-Of-Speech (POS) induction models usually make certain independence assumptions (e.g., Markov, unidirectional, local dependency) that do not hold in real languages. For example, subject-verb agreement can be both long-distance and bidirectional. To facilitate flexible dependency modeling,
we propose a Masked Part-of-Speech Model (MPoSM), inspired by the recent
success of Masked Language Models (MLM). MPoSM can model arbitrary tag
dependency and perform POS induction through the objective of masked POS
reconstruction. We achieve competitive results on both the English Penn WSJ
dataset as well as the universal treebank containing 10 diverse languages.
Though modeling the long-term dependency should ideally help this task, our
ablation study shows mixed trends in different languages. To better understand
this phenomenon, we design a novel synthetic experiment that can specifically
diagnose the model's ability to learn tag agreement. Surprisingly, we find that
even strong baselines fail to solve this problem consistently in a very
simplified setting: the agreement between adjacent words. Nonetheless, MPoSM
achieves overall better performance. Lastly, we conduct a detailed error
analysis to shed light on other remaining challenges. Our code is available at
https://github.com/owenzx/MPoSM
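For readers who want the gist of the objective in code, below is a minimal, illustrative PyTorch sketch of masked POS reconstruction: a local per-word tagger proposes soft tags, some tag positions are masked, and a bidirectional encoder must reconstruct the local tag posteriors at the masked positions. All module names, sizes, and the KL-style loss here are our assumptions for illustration, not the authors' implementation (see the repository above for that).
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMPoSM(nn.Module):
    # Toy masked POS model: a local tagger proposes soft tags, some tag
    # positions are masked, a bidirectional encoder reconstructs them.
    def __init__(self, vocab_size, num_tags=45, dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.local_tagger = nn.Linear(dim, num_tags)    # local POS logits per word
        self.tag_emb = nn.Embedding(num_tags, dim)      # one embedding per induced tag
        self.mask_emb = nn.Parameter(torch.zeros(dim))  # [MASK] placeholder for tags
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.context = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(dim, num_tags)

    def forward(self, words, mask):
        # words: (B, T) token ids; mask: (B, T) bool, True where the tag is hidden
        h = self.word_emb(words)
        soft_tags = F.softmax(self.local_tagger(h), dim=-1)  # q(tag | word), local view
        tag_vecs = soft_tags @ self.tag_emb.weight           # soft mixture of tag embeddings
        tag_vecs = torch.where(mask.unsqueeze(-1),
                               self.mask_emb.expand_as(tag_vecs), tag_vecs)
        ctx = self.context(h + tag_vecs)                     # words plus (masked) tag info
        recon = F.log_softmax(self.out(ctx), dim=-1)
        # Reconstruct the local tag posterior at masked positions only.
        kl = F.kl_div(recon, soft_tags.detach(), reduction="none").sum(-1)
        return (kl * mask).sum() / mask.sum().clamp(min=1)
```
A training step would sample random mask positions per sentence and minimize the returned loss; the argmax of the local tagger then gives the induced tags.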
Related papers
- Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages [3.716965622352967]
We propose new criteria to evaluate the quality of lexical representation and vocabulary overlap observed in sub-word tokenizers.
Our findings show that the overlap of vocabulary across languages can be actually detrimental to certain downstream tasks.
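As a hedged illustration of what such a criterion could look like in practice, the sketch below measures cross-lingual vocabulary overlap as the Jaccard similarity between the subword types a shared tokenizer actually emits on each language's corpus; the paper's exact criteria may differ.
```python
# Illustrative cross-lingual vocabulary-overlap measure for a subword
# tokenizer (a sketch; the paper's exact criteria may differ).
def used_vocab(tokenize, corpus):
    # Subword types the tokenizer actually emits on a corpus.
    vocab = set()
    for sentence in corpus:
        vocab.update(tokenize(sentence))
    return vocab

def vocab_overlap(tokenize, corpus_a, corpus_b):
    # Jaccard overlap between the subword types used by two languages.
    a, b = used_vocab(tokenize, corpus_a), used_vocab(tokenize, corpus_b)
    return len(a & b) / len(a | b)

# Toy demo with whitespace splitting standing in for a real subword model
# (e.g., a trained SentencePiece tokenizer):
print(vocab_overlap(str.split, ["the cat sat"], ["the dog ran"]))  # 0.2
```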
arXiv Detail & Related papers (2023-05-26T18:06:49Z)
- Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)
- Deanthropomorphising NLP: Can a Language Model Be Conscious? [19.390107818044736]
We take the position that large language models cannot be sentient or conscious, and that LaMDA in particular exhibits no advances over other similar models that would qualify it.
We see the claims of sentience as part of a wider tendency to use anthropomorphic language in NLP reporting.
arXiv Detail & Related papers (2022-11-21T14:18:25Z)
- Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks [39.39138995087475]
We ask how much of human-like thinking can be captured by learning statistical patterns in language alone.
Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization.
We find that humans are far more robust than LLMs on this benchmark.
arXiv Detail & Related papers (2022-05-11T18:14:33Z)
- Contrastive Learning for Prompt-Based Few-Shot Language Learners [14.244787327283335]
We present a contrastive learning framework that clusters inputs from the same class under different augmented "views".
We create different "views" of an example by appending it with different language prompts and contextual demonstrations.
Our method can improve over the state-of-the-art methods in a diverse set of 15 language tasks.
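For intuition, a supervised contrastive loss over such prompt-created views can be sketched as below; the loss form, function names, and temperature are illustrative assumptions, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    # features: (N, D) encodings of prompt-augmented views; labels: (N,)
    # Views whose underlying examples share a class label are pulled together.
    z = F.normalize(features, dim=-1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    logits = sim.masked_fill(eye, float("-inf"))        # never contrast with self
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    per_anchor = -(log_prob.masked_fill(~pos, 0.0)).sum(1) / pos.sum(1).clamp(min=1)
    return per_anchor[pos.any(1)].mean()                # anchors with >=1 positive

# Each "view" would be encode(x + prompt_i): the same input appended with a
# different prompt / demonstration set before encoding.
```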
arXiv Detail & Related papers (2022-05-03T04:56:45Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
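A generic sketch of such a unit-level language model is shown below; the LSTM sizes and the unit inventory are placeholders, not the paper's configuration.
```python
# Minimal LSTM language model over sub-word linguistic units such as
# phonemes or syllables (a generic sketch, not the paper's architecture).
import torch
import torch.nn as nn

class UnitLM(nn.Module):
    def __init__(self, num_units, dim=128, layers=2):
        super().__init__()
        self.emb = nn.Embedding(num_units, dim)
        self.lstm = nn.LSTM(dim, dim, num_layers=layers, batch_first=True)
        self.out = nn.Linear(dim, num_units)

    def forward(self, units):
        # units: (B, T) ids of phonemes or syllables
        h, _ = self.lstm(self.emb(units))
        return self.out(h)  # logits over the next unit

model = UnitLM(num_units=50)  # e.g., a small phoneme inventory plus specials
x = torch.randint(0, 50, (2, 16))
loss = nn.functional.cross_entropy(
    model(x[:, :-1]).reshape(-1, 50), x[:, 1:].reshape(-1))
```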
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose ABINet, an autonomous, bidirectional, and iterative network for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
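One way to realize such a graph encoder is a single graph-convolution layer that mixes each token's semantic-dependency neighbors into its representation before the task head; the layer below is an illustrative sketch, with the adjacency assumed to come from a semantic parser's predicted arcs.
```python
# One graph-convolution layer over a semantic dependency graph, mixing
# parse neighbors into token states during finetuning (illustrative only).
import torch
import torch.nn as nn

class SemGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (B, T, D) token states from a pretrained encoder
        # adj: (B, T, T) 0/1 float matrix of semantic-dependency arcs
        deg = adj.sum(-1, keepdim=True).clamp(min=1)  # normalize by degree
        neighbors = (adj @ h) / deg                   # average neighbor states
        return torch.relu(h + self.lin(neighbors))    # residual parse context

h = torch.randn(2, 8, 64)      # encoder outputs for a batch of 8-token inputs
adj = torch.zeros(2, 8, 8)
adj[:, 1, 4] = 1.0             # hypothetical arc: predicate at 1 -> argument at 4
out = SemGCNLayer(64)(h, adj)
```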
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle [88.65264818967489]
We propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).
The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model.
Experiments show that SOM can achieve strong results in language modeling, incremental parsing and syntactic generalization tests.
arXiv Detail & Related papers (2020-10-21T17:39:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.