Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts
- URL: http://arxiv.org/abs/2309.09877v1
- Date: Mon, 18 Sep 2023 15:37:30 GMT
- Title: Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts
- Authors: Joseph Gatto, Sarah M. Preum
- Abstract summary: We employ Abstract Meaning Representation (AMR) graphs as a means to model low-resource Health NLP tasks.
AMRs are well suited to model online health texts as they represent multi-sentence inputs, abstract away from complex terminology, and model long-distance relationships.
Our experiments show that we can improve performance on 6 low-resource health NLP tasks by augmenting text embeddings with semantic graph embeddings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: User-generated texts available on the web and social platforms are often long
and semantically challenging, making them difficult to annotate. Obtaining
human annotation becomes increasingly difficult as problem domains become more
specialized. For example, many health NLP problems require domain experts to be
a part of the annotation pipeline. Thus, it is crucial that we develop
low-resource NLP solutions able to work with this set of limited-data problems.
In this study, we employ Abstract Meaning Representation (AMR) graphs as a
means to model low-resource Health NLP tasks sourced from various online health
resources and communities. AMRs are well suited to model online health texts as
they can represent multi-sentence inputs, abstract away from complex
terminology, and model long-distance relationships between co-referring tokens.
AMRs thus improve the ability of pre-trained language models to reason about
high-complexity texts. Our experiments show that we can improve performance on
6 low-resource health NLP tasks by augmenting text embeddings with semantic
graph embeddings. Our approach is task agnostic and easy to merge into any
standard text classification pipeline. We experimentally validate that AMRs are
useful in the modeling of complex texts by analyzing performance through the
lens of two textual complexity measures: the Flesch-Kincaid Reading Level and
Syntactic Complexity. Our error analysis shows that AMR-infused language models
perform better on complex texts and generally show less predictive variance in
the presence of changing complexity.
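The fusion the abstract describes can be pictured as a late concatenation of a text embedding and a graph embedding ahead of any standard classifier. The sketch below is an illustration under stated assumptions, not the authors' implementation: embed_amr_graph is a hypothetical placeholder encoder, the sentence-transformer model is an arbitrary choice, and AMR strings would in practice come from a parser such as amrlib.

```python
# Minimal sketch: augment text embeddings with AMR graph embeddings
# before a standard classifier. embed_amr_graph is a hypothetical
# placeholder, NOT the paper's graph encoder.
import hashlib
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

text_encoder = SentenceTransformer("all-MiniLM-L6-v2")

def embed_amr_graph(amr: str, dim: int = 256) -> np.ndarray:
    # Placeholder: hashed bag of PENMAN tokens, L2-normalized.
    vec = np.zeros(dim)
    for token in amr.replace("(", " ").replace(")", " ").split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def featurize(texts: list[str], amrs: list[str]) -> np.ndarray:
    text_emb = text_encoder.encode(texts)                     # (n, 384)
    graph_emb = np.stack([embed_amr_graph(a) for a in amrs])  # (n, 256)
    return np.concatenate([text_emb, graph_emb], axis=1)      # (n, 640)

# Toy usage with hand-written AMRs for two health posts:
texts = ["I have had headaches for two weeks.",
         "Is ibuprofen safe to take with my other meds?"]
amrs = ["(h / have-03 :ARG0 (i / i) :ARG1 (h2 / headache))",
        "(s / safe-01 :ARG1 (i / ibuprofen))"]
clf = LogisticRegression(max_iter=1000).fit(featurize(texts, amrs), [0, 1])
```

For reference, the Flesch-Kincaid grade level used in the complexity analysis is the standard formula 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59, where higher grades indicate more complex text.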
Related papers
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
The model then undergoes controllable finetuning via a novel constraint-optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Surveying the Landscape of Text Summarization with Deep Learning: A Comprehensive Review [2.4185510826808487]
Deep learning has revolutionized natural language processing (NLP) by enabling the development of models that can learn complex representations of language data.
Deep learning models for NLP typically use large amounts of data to train deep neural networks, allowing them to learn the patterns and relationships in language data.
The survey examines how such deep neural networks are applied to text summarization tasks.
arXiv Detail & Related papers (2023-10-13T21:24:37Z)
- Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges.
We propose a simple scheme to extract relevant contextual information into an approximate state hash.
Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
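As a rough illustration only (the paper's extraction scheme is more involved), an approximate state hash can be as simple as hashing the order-insensitive set of salient tokens in the latest observation; the inventory argument here is an assumed extra input:

```python
# Hypothetical sketch of an approximate state hash for a text game.
import hashlib
import re

STOPWORDS = {"the", "a", "an", "is", "are", "you", "of", "and", "to", "in"}

def approx_state_hash(observation: str, inventory: list[str]) -> str:
    # Keep only content-bearing tokens; sorting makes the hash
    # order-insensitive, so paraphrases of the same state collide.
    tokens = re.findall(r"[a-z']+", observation.lower())
    salient = sorted(set(t for t in tokens if t not in STOPWORDS) | set(inventory))
    return hashlib.md5(" ".join(salient).encode()).hexdigest()[:12]

print(approx_state_hash("You are in a kitchen. A fridge is here.", ["key"]))
```

Such a hash can key a tabular value function or deduplicate visited states without storing full observations.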
arXiv Detail & Related papers (2022-10-15T21:52:39Z)
- To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP [0.0]
We investigate three categories of text augmentation methodologies which perform changes on the syntax.
We compare them on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families.
Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT.
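For intuition, one syntax-level operation explored in this literature is cropping: keeping the root of the dependency tree together with a single dependent's subtree. Below is a minimal sketch with spaCy, assuming the small English model is installed; it is not the paper's exact implementation:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def crop_augment(sentence: str) -> list[str]:
    """Cropped variants: the root plus one core dependent's subtree."""
    doc = nlp(sentence)
    root = next(tok for tok in doc if tok.head == tok)
    variants = []
    for child in root.children:
        if child.dep_ in ("nsubj", "dobj", "obj", "obl", "prep"):
            kept = sorted(list(child.subtree) + [root], key=lambda t: t.i)
            variants.append(" ".join(t.text for t in kept))
    return variants

print(crop_augment("The tired nurse checked the patient's chart quickly."))
```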
arXiv Detail & Related papers (2021-11-18T10:52:48Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation, making fine-grained semantic differences hard to measure.
This paper addresses the issue with a mask-and-predict strategy.
We take the words in the longest common subsequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) measure to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
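The mask-and-predict step can be sketched with any off-the-shelf masked language model: mask the same shared word in two paired texts and compare the predicted distributions at that position. A simplified illustration, not the paper's full NDD computation:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def masked_dist(text: str, word: str) -> torch.Tensor:
    """Distribution the MLM predicts at word's (first) position when masked."""
    ids = tok(text, return_tensors="pt")["input_ids"][0]
    pos = (ids == tok.convert_tokens_to_ids(word)).nonzero()[0, 0]
    ids = ids.clone()
    ids[pos] = tok.mask_token_id
    with torch.no_grad():
        logits = mlm(ids.unsqueeze(0)).logits[0, pos]
    return torch.softmax(logits, dim=-1)

# Divergence at a word shared by two highly overlapped texts:
p = masked_dist("the drug eased my severe pain", "pain")
q = masked_dist("the drug worsened my severe pain", "pain")
kl = torch.sum(p * (p.log() - q.log()))
print(f"KL at 'pain': {kl.item():.3f}")
```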
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Modular Self-Supervision for Document-Level Relation Extraction [17.039775384229355]
We propose decomposing document-level relation extraction into relation detection and argument resolution.
We conduct a thorough evaluation in biomedical machine reading for precision oncology, where cross-paragraph relation mentions are prevalent.
Our method outperforms prior state of the art, such as multi-scale learning and graph neural networks, by over 20 absolute F1 points.
arXiv Detail & Related papers (2021-09-11T20:09:18Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence [48.765579605145454]
We propose to explicitly segment target text into fragment units and align them with their data correspondences.
The resulting architecture maintains the same expressive power as neural attention models.
On both E2E and WebNLG benchmarks, we show the proposed model consistently outperforms its neural attention counterparts.
arXiv Detail & Related papers (2020-05-03T14:28:28Z)
- Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
We present Contextual Discourse Vectors (CDV), a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
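The overall shape of such a model can be sketched as two hierarchical LSTM towers scored by a dot product. A simplified PyTorch sketch; the actual CDV model also carries the multi-task entity/aspect heads, which are omitted here:

```python
import torch
import torch.nn as nn

class HierEncoder(nn.Module):
    """Word-level LSTM per sentence, then a sentence-level LSTM
    over the resulting sentence vectors."""
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.word_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.sent_lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, doc: torch.Tensor) -> torch.Tensor:
        # doc: (num_sents, max_words) token ids for one document
        _, (h_word, _) = self.word_lstm(self.emb(doc))
        sent_vecs = h_word[-1].unsqueeze(0)       # (1, num_sents, dim)
        _, (h_sent, _) = self.sent_lstm(sent_vecs)
        return h_sent[-1].squeeze(0)              # (dim,) document vector

class DualEncoder(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.query_enc = HierEncoder(vocab_size, dim)
        self.doc_enc = HierEncoder(vocab_size, dim)

    def score(self, query: torch.Tensor, doc: torch.Tensor) -> torch.Tensor:
        return self.query_enc(query) @ self.doc_enc(doc)  # dot-product relevance

model = DualEncoder(vocab_size=30000)
query = torch.randint(0, 30000, (1, 8))   # 1 sentence x 8 tokens
doc = torch.randint(0, 30000, (5, 12))    # 5 sentences x 12 tokens
print(model.score(query, doc).item())
```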
arXiv Detail & Related papers (2020-02-03T15:47:19Z)