Key Information Retrieval to Classify the Unstructured Data Content of
Preferential Trade Agreements
- URL: http://arxiv.org/abs/2401.12520v1
- Date: Tue, 23 Jan 2024 06:30:05 GMT
- Title: Key Information Retrieval to Classify the Unstructured Data Content of
Preferential Trade Agreements
- Authors: Jiahui Zhao, Ziyi Meng, Stepan Gordeev, Zijie Pan, Dongjin Song,
Sandro Steinbach, Caiwen Ding
- Abstract summary: We introduce a novel approach to long-text classification and prediction.
We employ embedding techniques to condense the long texts, aiming to diminish the redundancy therein.
Experimental outcomes indicate that our method realizes considerable performance enhancements in classifying long texts of Preferential Trade Agreements.
- Score: 17.14791553124506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid proliferation of textual data, predicting long texts has
emerged as a significant challenge in the domain of natural language
processing. Traditional text prediction methods encounter substantial
difficulties when grappling with long texts, primarily due to the presence of
redundant and irrelevant information, which impedes the model's capacity to
capture pivotal insights from the text. To address this issue, we introduce a
novel approach to long-text classification and prediction. Initially, we employ
embedding techniques to condense the long texts, aiming to diminish the
redundancy therein. Subsequently, the Bidirectional Encoder Representations from
Transformers (BERT) embedding method is utilized for text classification
training. Experimental outcomes indicate that our method realizes considerable
performance enhancements in classifying long texts of Preferential Trade
Agreements. Furthermore, the condensation of text through embedding methods not
only augments prediction accuracy but also substantially reduces computational
complexity. Overall, this paper presents a strategy for long-text prediction,
offering a valuable reference for researchers and engineers in the natural
language processing sphere.
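The abstract does not say exactly how the embeddings are used to condense a long agreement. One minimal sketch, assuming a retrieval-style reading of "key information retrieval", works like this: split the text into sentences, score each sentence against a short query describing the provisions of interest, drop near-duplicate sentences, and keep the best-scoring ones within a word budget. The bag-of-words `embed` function is a hypothetical stand-in for a learned sentence embedding such as BERT. The names `condense`, `query`, `max_words`, `min_score` and `redundancy` are illustrative assumptions, not the authors' code, and the later BERT classification step is not shown.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    # Treaty text with numbered clauses would need a more careful segmenter.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(sentence: str) -> Counter:
    # Hypothetical stand-in for a learned sentence embedding (e.g. BERT):
    # a sparse bag-of-words vector of lowercase term counts.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def condense(text: str, query: str, max_words: int,
             min_score: float = 0.0, redundancy: float = 0.9) -> str:
    """Keep the sentences most similar to `query`, skipping near-duplicates,
    until the word budget is used up; return them in their original order."""
    sentences = split_sentences(text)
    vectors = [embed(s) for s in sentences]
    q = embed(query)
    scores = [cosine(v, q) for v in vectors]
    # Most relevant first; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    kept: list[int] = []
    budget = max_words
    for i in ranked:
        if scores[i] <= min_score:
            break  # scores are descending, so every remaining sentence is irrelevant
        if any(cosine(vectors[i], vectors[j]) >= redundancy for j in kept):
            continue  # near-duplicate of a sentence already kept
        n_words = len(sentences[i].split())
        if n_words <= budget:
            kept.append(i)
            budget -= n_words
    return " ".join(sentences[i] for i in sorted(kept))


if __name__ == "__main__":
    pta = ("The parties shall eliminate tariffs on goods. "
           "This agreement enters into force in 2020. "
           "The parties shall eliminate tariffs on goods. "
           "Rules of origin apply to textiles.")
    print(condense(pta, "tariffs rules of origin", max_words=20))
```

With a 20-word budget, the repeated tariff clause is dropped as redundant and the entry-into-force sentence as irrelevant. The output is "The parties shall eliminate tariffs on goods. Rules of origin apply to textiles." Restoring document order after ranking keeps the condensed text readable for the downstream classifier.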
Related papers
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
Date: 2023-11-14T08:51:00Z
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
Date: 2023-10-02T09:35:27Z
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
Date: 2023-06-06T03:37:41Z
- A Survey of Text Representation Methods and Their Genealogy [0.0]
In recent years, with the advent of highly scalable artificial-neural-network-based text representation methods, the field of natural language processing has seen unprecedented growth and sophistication.
We provide a survey of current approaches, by arranging them in a genealogy, and by conceptualizing a taxonomy of text representation methods to examine and explain the state-of-the-art.
Date: 2022-11-26T15:22:01Z
- Generating Textual Adversaries with Minimal Perturbation [11.758947247743615]
We develop a novel attack strategy to find adversarial texts with high similarity to the original texts.
Our approach achieves higher success rates and lower perturbation rates in four benchmark datasets.
Date: 2022-11-12T04:46:07Z
- Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges.
We propose a simple scheme to extract relevant contextual information into an approximate state hash.
Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
Date: 2022-10-15T21:52:39Z
- Text Guide: Improving the quality of long text classification by a text selection method based on feature importance [0.0]
We propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit.
We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification.
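The summary only states that Text Guide cuts a text down to a predefined limit using feature importance. A simplified sketch of that idea might look like the following. It assumes per-word importance scores are already available, for example from a separately trained classifier. The `importance` dictionary, the `window` size and the fill-from-the-start fallback are hypothetical choices, not the paper's exact procedure.

```python
def guided_truncate(tokens: list[str], importance: dict[str, float],
                    limit: int, window: int = 2) -> list[str]:
    """Shorten `tokens` to at most `limit` tokens, keeping windows of context
    around the highest-importance words (a simplified, hypothetical variant)."""
    if len(tokens) <= limit:
        return list(tokens)  # already short enough; nothing to cut
    selected: set[int] = set()
    # Visit candidate words from most to least important.
    for word in sorted(importance, key=importance.get, reverse=True):
        for pos, tok in enumerate(tokens):
            if tok != word:
                continue
            start, stop = max(0, pos - window), min(len(tokens), pos + window + 1)
            new = set(range(start, stop)) - selected
            if len(selected) + len(new) <= limit:
                selected |= new  # keep this fragment only if it fits the budget
    # Hypothetical fallback: spend any leftover budget on the opening tokens.
    for i in range(len(tokens)):
        if len(selected) >= limit:
            break
        selected.add(i)
    return [tokens[i] for i in sorted(selected)]


if __name__ == "__main__":
    text = "the parties agree that tariffs on steel fall to zero by 2030 and quotas end"
    print(guided_truncate(text.split(), {"tariffs": 0.9, "quotas": 0.5}, limit=8, window=1))
```

Unlike plain head truncation, which would keep only "the parties agree that tariffs on steel fall", this keeps the context around "quotas" near the end of the text as well.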
Date: 2021-04-15T04:10:08Z
- Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate at the sentence level, we consistently use a broader context.
Date: 2020-12-07T12:09:37Z
- Review Regularized Neural Collaborative Filtering [11.960536488652354]
We propose a flexible neural recommendation framework named Review Regularized Recommendation (R3 for short).
It consists of a neural collaborative filtering part that focuses on prediction output, and a text processing part that serves as a regularizer.
Our preliminary results show that, with a simple text processing approach, R3 can achieve better prediction performance than state-of-the-art text-aware methods.
Date: 2020-08-20T18:54:27Z
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
A global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
Date: 2020-03-27T09:19:25Z
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
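To make the text-to-text idea concrete: every task, including classification and regression, is cast as mapping an input string to a target string, and a task prefix tells the model what to do. The following is a hypothetical preprocessing helper. The function name, task keys, example field names and label words are assumptions. The prefixes and the rounding of similarity scores to the nearest 0.2 imitate the style described in the T5 paper and are not its released preprocessing code.

```python
def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Cast one labelled example into an (input text, target text) pair.

    Task names, field names and the label words are illustrative assumptions.
    """
    if task == "translate_en_de":
        return f"translate English to German: {example['en']}", example["de"]
    if task == "sst2":
        # Classification: the class label becomes a word the model must generate.
        label = "positive" if example["label"] == 1 else "negative"
        return f"sst2 sentence: {example['sentence']}", label
    if task == "stsb":
        # Regression: round the score to the nearest 0.2 and emit it as a string.
        score = round(example["score"] * 5) / 5
        return (f"stsb sentence1: {example['s1']} sentence2: {example['s2']}",
                f"{score:.1f}")
    if task == "summarize":
        return f"summarize: {example['document']}", example["summary"]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    print(to_text_to_text("stsb", {"s1": "A cat sits.", "s2": "A cat is sitting.", "score": 3.77}))
```

Every task now yields a pair of strings, so a single model, loss and decoding procedure can serve all of them. This is what makes one unified framework possible.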
Date: 2019-10-23T17:37:36Z
This list is automatically generated from the titles and abstracts of the papers on this site.