Text Guide: Improving the quality of long text classification by a text
selection method based on feature importance
- URL: http://arxiv.org/abs/2104.07225v1
- Date: Thu, 15 Apr 2021 04:10:08 GMT
- Title: Text Guide: Improving the quality of long text classification by a text
selection method based on feature importance
- Authors: Krzysztof Fiok (1), Waldemar Karwowski (1), Edgar Gutierrez (1)(2),
Mohammad Reza Davahli (1), Maciej Wilamowski (3), Tareq Ahram (1), Awad
Al-Juaid (4), and Jozef Zurada (5) ((1) Department of Industrial Engineering
and Management Systems, University of Central Florida, USA, (2) Center for
Latin-American Logistics Innovation, LOGyCA, Bogota, Colombia (3) Faculty of
Economic Sciences, University of Warsaw, Warsaw, Poland (4) Department of
Industrial Engineering, College of Engineering, Taif University, Saudi Arabia
(5) Business School, University of Louisville, USA)
- Abstract summary: We propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit.
We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of text classification methods has improved greatly over the
last decade for text instances of less than 512 tokens. This limit has been
adopted by most state-of-the-art transformer models due to the high
computational cost of analyzing longer text instances. To mitigate this problem
and to improve classification for longer texts, researchers have sought to
resolve the underlying causes of the computational cost and have proposed
optimizations for the attention mechanism, which is the key element of every
transformer model. In our study, we are not pursuing the ultimate goal of long
text classification, i.e., the ability to analyze entire text instances at one
time while preserving high performance at a reasonable computational cost.
Instead, we propose a text truncation method called Text Guide, in which the
original text length is reduced to a predefined limit in a manner that improves
performance over naive and semi-naive approaches while preserving low
computational costs. Text Guide benefits from the concept of feature
importance, a notion from the explainable artificial intelligence domain. We
demonstrate that Text Guide can be used to improve the performance of recent
language models specifically designed for long text classification, such as
Longformer. Moreover, we discovered that parameter optimization is the key to
Text Guide performance and must be conducted before the method is deployed.
Future experiments may reveal additional benefits provided by this new method.
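The core idea, as described above, is to fill a fixed token budget with the most informative parts of a long text instead of naively keeping only its beginning. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes a list of "important features" (e.g., tokens ranked highly by a feature-importance method from explainable AI) is already available, uses whitespace tokenization, and splits sentences on periods. The function name and its behavior are hypothetical.

```python
# Hypothetical sketch of Text Guide-style truncation: keep sentences that
# mention important features first, then fill the remaining token budget
# from the beginning of the text, preserving the original sentence order.

def text_guide_truncate(text, important_features, token_limit):
    """Return a truncated version of `text` no longer than `token_limit`
    whitespace tokens, preferring sentences containing important features."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    features = {f.lower() for f in important_features}

    def contains_feature(sentence):
        # A sentence is "important" if it shares any token with the
        # feature set (a crude stand-in for feature-importance scoring).
        return bool(set(sentence.lower().split()) & features)

    selected = set()
    used_tokens = 0
    # First pass: sentences containing important features.
    for idx, sent in enumerate(sentences):
        if contains_feature(sent):
            n = len(sent.split())
            if used_tokens + n <= token_limit:
                selected.add(idx)
                used_tokens += n
    # Second pass: fill the remaining budget from the start of the text.
    for idx, sent in enumerate(sentences):
        if idx not in selected:
            n = len(sent.split())
            if used_tokens + n <= token_limit:
                selected.add(idx)
                used_tokens += n
    # Restore original order so the truncated text stays coherent.
    return ". ".join(sentences[i] for i in sorted(selected)) + "."
```

This mirrors the contrast drawn in the abstract: a naive approach would simply keep the first `token_limit` tokens, whereas a feature-importance-guided selection can retain decisive sentences from anywhere in the document at the same computational cost.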
Related papers
- Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace [52.24866347353916]
We propose an efficient method to explore the target embedding in a textual subspace.
We also propose an efficient selection strategy for determining the basis of the textual subspace.
Our method opens the door to more efficient representation learning for personalized text-to-image generation.
arXiv Detail & Related papers (2024-06-30T06:41:21Z)
- Key Information Retrieval to Classify the Unstructured Data Content of Preferential Trade Agreements [17.14791553124506]
We introduce a novel approach to long-text classification and prediction.
We employ embedding techniques to condense the long texts, aiming to diminish the redundancy therein.
Experimental outcomes indicate that our method realizes considerable performance enhancements in classifying long texts of Preferential Trade Agreements.
arXiv Detail & Related papers (2024-01-23T06:30:05Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Optimizing Readability Using Genetic Algorithms [0.0]
This research presents ORUGA, a method that tries to automatically optimize the readability of any text in English.
The core idea behind the method is that certain factors affect the readability of a text, some of which are quantifiable.
In addition, this research seeks to preserve both the original text's content and form through multi-objective optimization techniques.
arXiv Detail & Related papers (2023-01-01T09:08:45Z)
- Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms the recent PPLM, GeDi, and DExperts methods on PPL and on sentiment accuracy measured by an external classifier of generated texts.
At the same time, it is easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- Discourse-Aware Prompt Design for Text Generation [13.835916386769474]
We show that prompt based conditional text generation can be improved with simple and efficient methods.
First, we show that a higher-level discourse structure of human-written text can be modelled with hierarchical blocking on prefix parameters.
Second, we propose sparse prefix tuning by introducing attention sparsity on the prefix parameters at different layers of the network and learning sparse transformations on the softmax function.
arXiv Detail & Related papers (2021-12-10T18:15:44Z) - Text Counterfactuals via Latent Optimization and Shapley-Guided Search [15.919650185010491]
We study the problem of generating counterfactual text for a classification model.
We aim to minimally alter the text to change the model's prediction.
White-box approaches have been successfully applied to similar problems in vision.
arXiv Detail & Related papers (2021-10-22T05:04:40Z) - Data Augmentation in Natural Language Processing: A Novel Text
Generation Approach for Long and Short Text Classifiers [8.19984844136462]
We present and evaluate a text generation method suitable to increase the performance of classifiers for long and short texts.
In a simulated low-data regime, additive accuracy gains of up to 15.53% are achieved.
We discuss implications and patterns for the successful application of our approach on different types of datasets.
arXiv Detail & Related papers (2021-03-26T13:16:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.