Related papers: Punctuation Restoration Improves Structure Understanding Without Supervision

URL: http://arxiv.org/abs/2402.08382v4
Date: Sun, 30 Mar 2025 20:35:33 GMT
Title: Punctuation Restoration Improves Structure Understanding Without Supervision
Authors: Junghyun Min, Minho Lee, Woochul Lee, Yeonsoo Lee
Abstract summary: We show that punctuation restoration as a learning objective improves performance on structure-related tasks. Our results show that punctuation restoration is an effective learning objective that can improve structure understanding.
Score: 5.925894224649895
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unsupervised learning objectives like autoregressive and masked language modeling constitute a significant part in producing pre-trained representations that perform various downstream applications from natural language understanding to conversational tasks. However, despite impressive generative capabilities of recent large language models, their abilities to capture syntactic or semantic structure within text lag behind. We hypothesize that the mismatch between linguistic performance and competence in machines is attributable to insufficient learning of linguistic structure knowledge via currently popular pre-training objectives. Working with English, we show that punctuation restoration as a learning objective improves performance on structure-related tasks like named entity recognition, open information extraction, chunking, and part-of-speech tagging. Punctuation restoration results in $\blacktriangle \geq 2\%$p improvement in 16 out of 18 experiments, across 6 out of 7 tasks. Our results show that punctuation restoration is an effective learning objective that can improve structure understanding and yield more robust structure-aware representations of natural language in base-sized models.
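To make the objective named in the abstract concrete, the following is a minimal sketch of how punctuation-restoration training pairs could be built from raw text. It is not the authors' pipeline: the per-word tagging formulation, the lowercasing step, the label names in `PUNCT`, and the helper `make_example` are all assumptions introduced here for illustration.

```python
import re

# Hypothetical label set (an assumption, not from the paper): the punctuation
# mark that follows each word, or "O" when no mark follows it.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION", "!": "EXCLAIM"}


def make_example(sentence):
    """Return (lowercased words without punctuation, per-word labels)."""
    words, labels = [], []
    for token in sentence.split():
        # Split each whitespace token into a word body and trailing punctuation.
        match = re.match(r"^(.*?)([,.?!]*)$", token)
        body, trail = match.group(1), match.group(2)
        if not body:
            # Skip tokens that consist only of punctuation.
            continue
        words.append(body.lower())
        # Only the first trailing mark is used as the label.
        labels.append(PUNCT[trail[0]] if trail else "O")
    return words, labels


words, labels = make_example("Hello, world. How are you?")
print(list(zip(words, labels)))
```

A model trained on such pairs receives the stripped words as input and is asked to predict the labels, so the supervision signal comes from the text itself and needs no manual annotation.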

Related papers

Knowledge Graph-Infused Fine-Tuning for Structured Reasoning in Large Language Models [41.59092188743925]
It proposes a fine-tuning algorithm framework based on knowledge graph injection. It builds on pretrained language models and introduces structured graph information for auxiliary learning. It demonstrates better semantic consistency and contextual logic modeling in scenarios involving structural reasoning and entity extraction.
arXiv Detail & Related papers (2025-08-20T04:52:12Z)
Annotating FrameNet via Structure-Conditioned Language Generation [15.877232416259805]
We propose a framework to produce novel frame-semantically annotated sentences following an overgenerate-and-filter approach. Our results show that conditioning on rich, explicit semantic information tends to produce generations with high human acceptance.
arXiv Detail & Related papers (2024-06-07T11:01:15Z)
Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks. We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking. We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z)
Emergent Linguistic Structures in Neural Networks are Fragile [20.692540987792732]
Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks. We propose a framework to assess the consistency and robustness of linguistic representations.
arXiv Detail & Related papers (2022-10-31T15:43:57Z)
An Empirical Revisiting of Linguistic Knowledge Fusion in Language Understanding Tasks [33.765874588342285]
Infusing language models with syntactic or semantic knowledge from structural linguistic priors has shown improvements on many language understanding tasks. We conduct an empirical study of replacing parsed graphs or trees with trivial ones for tasks in the GLUE benchmark. It reveals that the gains might not be significantly attributed to explicit linguistic priors but rather to more feature interactions brought by fusion layers.
arXiv Detail & Related papers (2022-10-24T07:47:32Z)
Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction. Our generative learning achieves a strong performance improvement and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
DeepStruct: Pretraining of Language Models for Structure Prediction [64.84144849119554]
We pretrain language models on a collection of task-agnostic corpora to generate structures from text. Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks. We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets.
arXiv Detail & Related papers (2022-05-21T00:58:22Z)
Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics [14.863872352905629]
This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations. We first present a mechanism for hindsight instruction replay utilizing expert feedback. Second, we propose a seq2seq model to generate linguistic hindsight instructions.
arXiv Detail & Related papers (2022-04-08T22:01:36Z)
Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue-exclusive features. To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives. Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)
ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in the pre-training phase to obtain a deeper understanding of the entities and their relations in text. Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
Retrofitting Structure-aware Transformer Language Model for End Tasks [34.74181162627023]
We consider retrofitting a structure-aware Transformer language model for facilitating end tasks. A middle-layer structural learning strategy is leveraged for structure integration. Experimental results show that the retrofitted structure-aware Transformer language model achieves improved perplexity.
arXiv Detail & Related papers (2020-09-16T01:07:07Z)
Semantics-Aware Inferential Network for Natural Language Understanding [79.70497178043368]
We propose a Semantics-Aware Inferential Network (SAIN). Taking explicit contextualized semantics as a complementary input, the inferential module of SAIN enables a series of reasoning steps over semantic clues. Our model achieves significant improvement on 11 tasks including machine reading comprehension and natural language inference.
arXiv Detail & Related papers (2020-04-28T07:24:43Z)
Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE). We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets. We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.