Selective Text Augmentation with Word Roles for Low-Resource Text
Classification
- URL: http://arxiv.org/abs/2209.01560v1
- Date: Sun, 4 Sep 2022 08:13:11 GMT
- Title: Selective Text Augmentation with Word Roles for Low-Resource Text
Classification
- Authors: Biyang Guo, Songqiao Han, Hailiang Huang
- Abstract summary: Different words may play different roles in text classification, which inspires us to strategically select the proper roles for text augmentation.
In this work, we first identify the relationships between the words in a text and the text category from the perspectives of statistical correlation and semantic similarity.
We present a new augmentation technique called STA (Selective Text Augmentation) where different text-editing operations are selectively applied to words with specific roles.
- Score: 3.4806267677524896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation techniques are widely used in text classification tasks to
improve the performance of classifiers, especially in low-resource scenarios.
Most previous methods conduct text augmentation without considering the
different functionalities of the words in the text, which may generate
unsatisfactory samples. Different words may play different roles in text
classification, which inspires us to strategically select the proper roles for
text augmentation. In this work, we first identify the relationships between
the words in a text and the text category from the perspectives of statistical
correlation and semantic similarity and then utilize them to divide the words
into four roles -- Gold, Venture, Bonus, and Trivial words, which have
different functionalities for text classification. Based on these word roles,
we present a new augmentation technique called STA (Selective Text
Augmentation) where different text-editing operations are selectively applied
to words with specific roles. STA can generate diverse and relatively clean
samples, while preserving the original core semantics, and is also quite simple
to implement. Extensive experiments on 5 benchmark low-resource text
classification datasets illustrate that augmented samples produced by STA
successfully boost the performance of classification models, significantly
outperforming previous non-selective methods, including two large language
model-based techniques. Cross-dataset experiments further indicate that STA can
help the classifiers generalize better to other datasets than previous methods.
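The abstract describes two steps: score each word by its statistical correlation with the class and its semantic similarity to it, split words into the four roles, and then apply role-specific edit operations. The sketch below is a minimal illustration of that idea, not the paper's exact method: the correlation measure (fraction of a word's occurrences inside the target class), the `similarity` callback, the 0.5 thresholds, and the mapping of role names to quadrants are all assumptions.

```python
from collections import Counter

def assign_roles(docs, labels, target, similarity, corr_t=0.5, sim_t=0.5):
    """Divide words into Gold/Venture/Bonus/Trivial roles by crossing
    statistical correlation with semantic similarity to the target class.
    `similarity` maps a word to a [0, 1] score (a stand-in for an
    embedding-based measure; an assumption, not the paper's definition)."""
    in_class, total = Counter(), Counter()
    for doc, lab in zip(docs, labels):
        for w in doc.split():
            total[w] += 1
            if lab == target:
                in_class[w] += 1
    roles = {}
    for w, n in total.items():
        corr = in_class[w] / n          # share of occurrences inside the class
        sim = similarity(w)
        if corr >= corr_t and sim >= sim_t:
            roles[w] = "gold"           # class-indicating and on-topic
        elif sim >= sim_t:
            roles[w] = "venture"        # on-topic but weakly correlated
        elif corr >= corr_t:
            roles[w] = "bonus"          # correlated but semantically off-topic
        else:
            roles[w] = "trivial"        # safest to edit away
    return roles

def selective_delete(doc, roles):
    """One selective edit operation: drop only Trivial words, so the
    augmented sample keeps the original core semantics."""
    return " ".join(w for w in doc.split() if roles.get(w) != "trivial")

# Toy demo on hypothetical data:
docs = ["great fun movie", "boring dull movie"]
labels = ["pos", "neg"]
roles = assign_roles(docs, labels, "pos",
                     similarity=lambda w: 1.0 if w in {"great", "fun"} else 0.0)
```

With this toy corpus, "great" and "fun" land in Gold, "movie" (correlated with both classes, off-topic under the toy similarity) in Bonus, and "boring"/"dull" in Trivial, so `selective_delete` removes only the latter.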
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z) - RankAug: Augmented data ranking for text classification [0.0]
RankAug is a text-ranking approach that detects and filters out the top augmented texts.
We demonstrate that the judicious selection of filtering techniques can yield a substantial improvement of up to 35% in classification accuracy for under-represented classes.
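The RankAug summary describes ranking augmented texts and keeping only the top ones. A minimal sketch of that filtering idea follows; the token-level Jaccard scorer and the `keep` parameter are illustrative assumptions, not the paper's ranking model.

```python
def jaccard(a, b):
    """Token-overlap similarity between two texts (a simple stand-in
    for a learned ranking score)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def rank_and_filter(original, candidates, score=jaccard, keep=2):
    """RankAug-style filtering sketch: score each augmented candidate
    against the original text and keep only the top-ranked ones."""
    ranked = sorted(candidates, key=lambda c: score(original, c), reverse=True)
    return ranked[:keep]

# Hypothetical usage: drop the augmentation that drifted off-topic.
kept = rank_and_filter("the movie was great",
                       ["the movie was good",
                        "film great",
                        "completely unrelated text"],
                       keep=2)
```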
arXiv Detail & Related papers (2023-11-08T08:47:49Z) - Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, a decent performance gain is achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z) - Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM.
Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z) - To Augment or Not to Augment? A Comparative Study on Text Augmentation
Techniques for Low-Resource NLP [0.0]
We investigate three categories of text augmentation methodologies which perform changes on the syntax.
We compare them on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families.
Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT.
arXiv Detail & Related papers (2021-11-18T10:52:48Z) - Hierarchical Heterogeneous Graph Representation Learning for Short Text
Classification [60.233529926965836]
We propose a new method called SHINE, which is based on graph neural network (GNN) for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z) - What Have Been Learned & What Should Be Learned? An Empirical Study of
How to Selectively Augment Text for Classification [0.0]
We propose STA (Selective Text Augmentation) to selectively augment the text, where the informative, class-indicating words are emphasized but the irrelevant or noisy words are diminished.
Experiments on four English and Chinese text classification benchmark datasets demonstrate that STA can substantially outperform the non-selective text augmentation methods.
arXiv Detail & Related papers (2021-09-01T04:03:11Z) - TF-CR: Weighting Embeddings for Text Classification [6.531659195805749]
We introduce a novel weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings.
Experiments on 16 classification datasets show the effectiveness of TF-CR, leading to improved performance scores over existing weighting schemes.
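The TF-CR summary says the scheme upweights words that are both frequent and exclusive to a category. One plausible reading, sketched below, multiplies a within-category term frequency by a category ratio (the share of a word's occurrences that fall in that category); this exact formula is an assumption inferred from the abstract, not the paper's definition.

```python
from collections import Counter

def tf_cr(docs, labels):
    """Sketch of a Term Frequency-Category Ratio weighting: for each
    (word, category) pair, weight = TF within the category * the share
    of the word's total occurrences belonging to that category."""
    per_cat, total, cat_tokens = {}, Counter(), Counter()
    for doc, lab in zip(docs, labels):
        for w in doc.split():
            per_cat.setdefault(lab, Counter())[w] += 1
            total[w] += 1
            cat_tokens[lab] += 1
    weights = {}
    for lab, counts in per_cat.items():
        for w, n in counts.items():
            tf = n / cat_tokens[lab]   # frequency within the category
            cr = n / total[w]          # how exclusive the word is to it
            weights[(w, lab)] = tf * cr
    return weights

# Toy demo: "a" is frequent and exclusive to category "x", so it
# receives a higher weight there than the shared word "b".
weights = tf_cr(["a a b", "b c"], ["x", "y"])
```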
arXiv Detail & Related papers (2020-12-11T19:23:28Z) - Be More with Less: Hypergraph Attention Networks for Inductive Text
Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interactions between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z) - Exploiting Class Labels to Boost Performance on Embedding-based Text
Classification [16.39344929765961]
Embeddings of different kinds have recently become the de facto standard as features used for text classification.
We introduce a weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings.
arXiv Detail & Related papers (2020-06-03T08:53:40Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.