Text Guide: Improving the quality of long text classification by a text
selection method based on feature importance
- URL: http://arxiv.org/abs/2104.07225v1
- Date: Thu, 15 Apr 2021 04:10:08 GMT
- Title: Text Guide: Improving the quality of long text classification by a text
selection method based on feature importance
- Authors: Krzysztof Fiok (1), Waldemar Karwowski (1), Edgar Gutierrez (1)(2),
Mohammad Reza Davahli (1), Maciej Wilamowski (3), Tareq Ahram (1), Awad
Al-Juaid (4), and Jozef Zurada (5) ((1) Department of Industrial Engineering
and Management Systems, University of Central Florida, USA, (2) Center for
Latin-American Logistics Innovation, LOGyCA, Bogota, Colombia (3) Faculty of
Economic Sciences, University of Warsaw, Warsaw, Poland (4) Department of
Industrial Engineering, College of Engineering, Taif University, Saudi Arabia
(5) Business School, University of Louisville, USA)
- Abstract summary: We propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit.
We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of text classification methods has improved greatly over the
last decade for text instances of less than 512 tokens. This limit has been
adopted by most state-of-the-art transformer models due to the high
computational cost of analyzing longer text instances. To mitigate this problem
and to improve classification for longer texts, researchers have sought to
resolve the underlying causes of the computational cost and have proposed
optimizations for the attention mechanism, which is the key element of every
transformer model. In our study, we are not pursuing the ultimate goal of long
text classification, i.e., the ability to analyze entire text instances at one
time while preserving high performance at a reasonable computational cost.
Instead, we propose a text truncation method called Text Guide, in which the
original text length is reduced to a predefined limit in a manner that improves
performance over naive and semi-naive approaches while preserving low
computational costs. Text Guide benefits from the concept of feature
importance, a notion from the explainable artificial intelligence domain. We
demonstrate that Text Guide can be used to improve the performance of recent
language models specifically designed for long text classification, such as
Longformer. Moreover, we discovered that parameter optimization is the key to
Text Guide performance and must be conducted before the method is deployed.
Future experiments may reveal additional benefits provided by this new method.
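The core idea, as described above, is to fill a fixed token budget with the most informative parts of a long text instead of naively keeping only its beginning. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes a list of "important features" (e.g., tokens ranked highly by a feature-importance method from explainable AI) is already available, uses whitespace tokenization, and splits sentences on periods. The function name and its behavior are hypothetical.

```python
# Hypothetical sketch of Text Guide-style truncation: keep sentences that
# mention important features first, then fill the remaining token budget
# from the beginning of the text, preserving the original sentence order.

def text_guide_truncate(text, important_features, token_limit):
    """Return a truncated version of `text` no longer than `token_limit`
    whitespace tokens, preferring sentences containing important features."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    features = {f.lower() for f in important_features}

    def contains_feature(sentence):
        # A sentence is "important" if it shares any token with the
        # feature set (a crude stand-in for feature-importance scoring).
        return bool(set(sentence.lower().split()) & features)

    selected = set()
    used_tokens = 0
    # First pass: sentences containing important features.
    for idx, sent in enumerate(sentences):
        if contains_feature(sent):
            n = len(sent.split())
            if used_tokens + n <= token_limit:
                selected.add(idx)
                used_tokens += n
    # Second pass: fill the remaining budget from the start of the text.
    for idx, sent in enumerate(sentences):
        if idx not in selected:
            n = len(sent.split())
            if used_tokens + n <= token_limit:
                selected.add(idx)
                used_tokens += n
    # Restore original order so the truncated text stays coherent.
    return ". ".join(sentences[i] for i in sorted(selected)) + "."
```

This mirrors the contrast drawn in the abstract: a naive approach would simply keep the first `token_limit` tokens, whereas a feature-importance-guided selection can retain decisive sentences from anywhere in the document at the same computational cost.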
Related papers
- Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace [52.24866347353916]
We propose an efficient method to explore the target embedding in a textual subspace.
We also propose an efficient selection strategy for determining the basis of the textual subspace.
Our method opens the door to more efficient representation learning for personalized text-to-image generation.
arXiv Detail & Related papers (2024-06-30T06:41:21Z)
- Key Information Retrieval to Classify the Unstructured Data Content of Preferential Trade Agreements [17.14791553124506]
We introduce a novel approach to long-text classification and prediction.
We employ embedding techniques to condense the long texts, aiming to diminish the redundancy therein.
Experimental outcomes indicate that our method realizes considerable performance enhancements in classifying long texts of Preferential Trade Agreements.
arXiv Detail & Related papers (2024-01-23T06:30:05Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Optimizing Readability Using Genetic Algorithms [0.0]
This research presents ORUGA, a method that tries to automatically optimize the readability of any text in English.
The core idea behind the method is that certain factors affect the readability of a text, some of which are quantifiable.
In addition, this research seeks to preserve both the original text's content and form through multi-objective optimization techniques.
arXiv Detail & Related papers (2023-01-01T09:08:45Z)
- Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms the recent PPLM, GeDi, and DExperts methods on PPL and on sentiment accuracy measured by an external classifier of generated texts.
At the same time, it is easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- Discourse-Aware Prompt Design for Text Generation [13.835916386769474]
We show that prompt based conditional text generation can be improved with simple and efficient methods.
First, we show that a higher-level discourse structure of human-written text can be modelled with hierarchical blocking on prefix parameters.
Second, we propose sparse prefix tuning by introducing attention sparsity on the prefix parameters at different layers of the network and learning sparse transformations on the softmax function.
arXiv Detail & Related papers (2021-12-10T18:15:44Z) - Text Counterfactuals via Latent Optimization and Shapley-Guided Search [15.919650185010491]
We study the problem of generating counterfactual text for a classification model.
We aim to minimally alter the text to change the model's prediction.
White-box approaches have been successfully applied to similar problems in vision.
arXiv Detail & Related papers (2021-10-22T05:04:40Z) - Data Augmentation in Natural Language Processing: A Novel Text
Generation Approach for Long and Short Text Classifiers [8.19984844136462]
We present and evaluate a text generation method suitable to increase the performance of classifiers for long and short texts.
In a simulated low-data regime, additive accuracy gains of up to 15.53% are achieved.
We discuss implications and patterns for the successful application of our approach on different types of datasets.
arXiv Detail & Related papers (2021-03-26T13:16:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.