Rank over Class: The Untapped Potential of Ranking in Natural Language
Processing
- URL: http://arxiv.org/abs/2009.05160v4
- Date: Fri, 3 Dec 2021 18:26:42 GMT
- Title: Rank over Class: The Untapped Potential of Ranking in Natural Language
Processing
- Authors: Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough
- Abstract summary: We argue that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould.
We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences.
In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification.
- Score: 8.637110868126546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text classification has long been a staple within Natural Language Processing
(NLP) with applications spanning across diverse areas such as sentiment
analysis, recommender systems and spam detection. With such a powerful
solution, it is often tempting to use it as the go-to tool for all NLP problems
since when you are holding a hammer, everything looks like a nail. However, we
argue here that many tasks which are currently addressed using classification
are in fact being shoehorned into a classification mould and that if we instead
address them as a ranking problem, we not only improve the model, but we
achieve better performance. We propose a novel end-to-end ranking approach
consisting of a Transformer network responsible for producing representations
for a pair of text sequences, which are in turn passed into a context
aggregating network outputting ranking scores used to determine an ordering to
the sequences based on some notion of relevance. We perform numerous
experiments on publicly-available datasets and investigate the applications of
ranking in problems often solved using classification. In an experiment on a
heavily-skewed sentiment analysis dataset, converting ranking results to
classification labels yields an approximately 22% improvement over
state-of-the-art text classification, demonstrating the efficacy of text
ranking over text classification in certain scenarios.
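To make the proposed pipeline concrete, the following is a minimal sketch in PyTorch with Hugging Face Transformers, assuming a BERT encoder, a small scoring head standing in for the context aggregating network, and a margin ranking loss; the class name PairwiseRanker, the encoder choice and the head architecture are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class PairwiseRanker(nn.Module):
    """Encodes a text sequence and maps it to a scalar ranking score."""
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Stand-in for the paper's context aggregating network.
        self.score_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def score(self, **encoded) -> torch.Tensor:
        # Use the [CLS] representation as the sequence embedding.
        cls = self.encoder(**encoded).last_hidden_state[:, 0]
        return self.score_head(cls).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = PairwiseRanker()

# A pair of sequences where the first should be ranked above the second.
higher = tokenizer("an absolute joy to read", return_tensors="pt")
lower = tokenizer("dull and forgettable", return_tensors="pt")

s_hi, s_lo = model.score(**higher), model.score(**lower)
# Margin ranking loss: push s_hi above s_lo by at least the margin.
loss = nn.MarginRankingLoss(margin=1.0)(s_hi, s_lo, torch.ones_like(s_hi))
loss.backward()

Turning the learned scores back into classification labels, as in the skewed sentiment experiment, can then be as simple as thresholding or bucketing the scores, though the abstract does not specify the exact conversion rule used.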
Related papers
- Text Classification using Graph Convolutional Networks: A Comprehensive Survey [11.1080224302799]
Graph convolutional network (GCN)-based approaches have gained considerable traction in this domain over the last decade.
This work summarizes and categorizes various GCN-based text classification approaches with regard to architecture and mode of supervision.
arXiv Detail & Related papers (2024-10-12T07:03:42Z)
- AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings [53.78802457488845]
We introduce the idea of any-granularity ranking, which leverages multi-vector embeddings to rank at varying levels of granularity.
We demonstrate the application of proposition-level ranking to post-hoc citation addition in retrieval-augmented generation; a generic multi-vector scoring sketch follows this entry.
arXiv Detail & Related papers (2024-05-23T20:04:54Z)
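The AGRaME summary does not spell out its scoring function; multi-vector ranking is, however, commonly realized with a late-interaction (MaxSim-style) score, sketched below under that assumption. Ranking at a finer granularity then amounts to scoring with only the vectors of the chosen unit.

import torch
import torch.nn.functional as F

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    # query_vecs: (Q, D), one embedding per query token/unit
    # doc_vecs:   (T, D), one embedding per document token
    q = F.normalize(query_vecs, dim=-1)
    d = F.normalize(doc_vecs, dim=-1)
    sim = q @ d.T                       # (Q, T) pairwise cosine similarities
    return sim.max(dim=1).values.sum()  # best document match per query unit

query = torch.randn(8, 128)  # 8 query-token embeddings (toy values)
doc = torch.randn(50, 128)   # 50 document-token embeddings
full_query_score = maxsim_score(query, doc)
unit_score = maxsim_score(query[:4], doc)  # score only one sub-span's vectors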
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems in domains with limited amounts of annotated instances; a minimal prompt sketch follows this entry.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
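A minimal illustration of the in-context learning route from the entry above: the only "training" signal is a handful of labeled examples placed directly in the prompt. The examples, labels, and prompt format here are invented for illustration; the paper's actual prompts may differ.

# Few-shot in-context classification: labeled examples go straight
# into the prompt and no task-specific fine-tuning takes place.
few_shot = [
    ("The delivery was late and the box was crushed.", "negative"),
    ("Wonderful service, will order again!", "positive"),
]

def build_prompt(text: str) -> str:
    lines = ["Classify each review as positive or negative.", ""]
    for example, label in few_shot:
        lines += [f"Review: {example}", f"Label: {label}", ""]
    lines += [f"Review: {text}", "Label:"]
    return "\n".join(lines)

# Any instruction-following language model can complete the final "Label:".
print(build_prompt("The acting was superb."))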
- BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification [0.27195102129095]
We show that classification tasks still suffer from a performance gap when the underlying distribution of topics changes.
We quantify this phenomenon empirically with a large corpus and a large set of topics.
We suggest and successfully test a possible remedy: after augmenting the training dataset with topically-controlled synthetic texts, the F1 score improves by up to 50% for some topics.
arXiv Detail & Related papers (2023-11-27T18:53:31Z)
- Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z)
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
- Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification [66.02091763340094]
Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor.
LaGoNN is effective at flagging undesirable content and at text classification, and it improves the performance of SetFit; a sketch of the nearest-neighbor augmentation idea follows this entry.
arXiv Detail & Related papers (2023-02-17T15:43:29Z)
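A sketch of the nearest-neighbor idea behind LaGoNN, assuming a sentence-transformers encoder: the model gains no new learnable parameters, the input is simply enriched with its most similar labeled example before classification. The augmentation format below is a guess; the paper's exact template may differ.

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# A toy labeled pool; in practice this is the training set.
train_texts = ["you are an idiot", "have a great day", "I will find you"]
train_labels = ["toxic", "benign", "threat"]
train_emb = encoder.encode(train_texts, convert_to_tensor=True)

def augment(text: str) -> str:
    # Retrieve the most similar labeled example and append it to the input.
    query_emb = encoder.encode(text, convert_to_tensor=True)
    hit = util.semantic_search(query_emb, train_emb, top_k=1)[0][0]
    idx = hit["corpus_id"]
    return f"{text} [NN: {train_texts[idx]} ({train_labels[idx]})]"

# The augmented string is then fed to the downstream classifier unchanged.
print(augment("you are a moron"))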
- Learning List-Level Domain-Invariant Representations for Ranking [59.3544317373004]
We propose list-level alignment -- learning domain-invariant representations at the higher level of lists.
Among its benefits, it leads to the first domain adaptation generalization bound for ranking, in turn providing theoretical support for the proposed method.
arXiv Detail & Related papers (2022-12-21T04:49:55Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
- Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks [3.5557219875516655]
We perform a sequence of inference steps to predict the category of a document from top to bottom of a given class taxonomy.
With our efficient approaches, we outperform previous studies on two well-known English datasets while using drastically fewer parameters; a toy top-down inference sketch follows this entry.
arXiv Detail & Related papers (2020-05-05T20:22:18Z)
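The top-to-bottom inference described in the entry above can be sketched in isolation: one classifier per taxonomy node, applied level by level. The taxonomy, the classify_at stand-in, and its keyword-count scorer are toy assumptions; the paper's per-level models are learned.

# Top-down taxonomy inference: predict a coarse category first, then
# refine among its children until a leaf is reached.
taxonomy = {
    "root": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Football", "Tennis"],
}

def classify_at(node: str, children: list[str], document: str) -> str:
    # Toy scorer: pick the child whose name occurs most in the document.
    return max(children, key=lambda c: document.lower().count(c.lower()))

def predict_path(document: str) -> list[str]:
    path, node = [], "root"
    while node in taxonomy:  # stop once a leaf category is reached
        node = classify_at(node, taxonomy[node], document)
        path.append(node)
    return path

print(predict_path("The biology of cells is a branch of science."))
# -> ['Science', 'Biology']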
- Joint Embedding of Words and Category Labels for Hierarchical Multi-label Text Classification [4.2750700546937335]
Hierarchical text classification (HTC) has received extensive attention and has broad application prospects.
We propose a joint embedding of text and parent category based on hierarchical fine-tuning ordered neurons LSTM (HFT-ONLSTM) for HTC.
arXiv Detail & Related papers (2020-04-06T11:06:08Z)