Rank over Class: The Untapped Potential of Ranking in Natural Language
Processing
- URL: http://arxiv.org/abs/2009.05160v4
- Date: Fri, 3 Dec 2021 18:26:42 GMT
- Title: Rank over Class: The Untapped Potential of Ranking in Natural Language
Processing
- Authors: Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough
- Abstract summary: We argue that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould.
We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences.
In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification.
- Score: 8.637110868126546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text classification has long been a staple within Natural Language Processing
(NLP) with applications spanning across diverse areas such as sentiment
analysis, recommender systems and spam detection. With such a powerful
solution, it is often tempting to use it as the go-to tool for all NLP problems
since when you are holding a hammer, everything looks like a nail. However, we
argue here that many tasks which are currently addressed using classification
are in fact being shoehorned into a classification mould and that if we instead
address them as a ranking problem, we not only improve the model, but we
achieve better performance. We propose a novel end-to-end ranking approach
consisting of a Transformer network responsible for producing representations
for a pair of text sequences, which are in turn passed into a context
aggregating network outputting ranking scores used to determine an ordering to
the sequences based on some notion of relevance. We perform numerous
experiments on publicly-available datasets and investigate the applications of
ranking in problems often solved using classification. In an experiment on a
heavily-skewed sentiment analysis dataset, converting ranking results to
classification labels yields an approximately 22% improvement over
state-of-the-art text classification, demonstrating the efficacy of text
ranking over text classification in certain scenarios.
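To make the proposed pipeline concrete, the following is a minimal sketch in PyTorch with Hugging Face Transformers, assuming a BERT encoder, a small scoring head standing in for the context aggregating network, and a margin ranking loss; the class name PairwiseRanker, the encoder choice and the head architecture are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class PairwiseRanker(nn.Module):
    """Encodes a text sequence and maps it to a scalar ranking score."""
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Stand-in for the paper's context aggregating network.
        self.score_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def score(self, **encoded) -> torch.Tensor:
        # Use the [CLS] representation as the sequence embedding.
        cls = self.encoder(**encoded).last_hidden_state[:, 0]
        return self.score_head(cls).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = PairwiseRanker()

# A pair of sequences where the first should be ranked above the second.
higher = tokenizer("an absolute joy to read", return_tensors="pt")
lower = tokenizer("dull and forgettable", return_tensors="pt")

s_hi, s_lo = model.score(**higher), model.score(**lower)
# Margin ranking loss: push s_hi above s_lo by at least the margin.
loss = nn.MarginRankingLoss(margin=1.0)(s_hi, s_lo, torch.ones_like(s_hi))
loss.backward()

Turning the learned scores back into classification labels, as in the skewed sentiment experiment, can then be as simple as thresholding or bucketing the scores, though the abstract does not specify the exact conversion rule used.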
Related papers
- Text Classification using Graph Convolutional Networks: A Comprehensive Survey [11.1080224302799]
Graph convolutional network (GCN)-based approaches have gained considerable traction in this domain over the last decade.
This work summarizes and categorizes various GCN-based text classification approaches with regard to architecture and mode of supervision.
arXiv Detail & Related papers (2024-10-12T07:03:42Z)
- AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings [53.78802457488845]
We introduce the idea of any-granularity ranking, which leverages multi-vector embeddings to rank at varying levels of granularity.
We demonstrate the application of proposition-level ranking to post-hoc citation addition in retrieval-augmented generation; a generic multi-vector scoring sketch follows this entry.
arXiv Detail & Related papers (2024-05-23T20:04:54Z)
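The AGRaME summary does not spell out its scoring function; multi-vector ranking is, however, commonly realized with a late-interaction (MaxSim-style) score, sketched below under that assumption. Ranking at a finer granularity then amounts to scoring with only the vectors of the chosen unit.

import torch
import torch.nn.functional as F

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    # query_vecs: (Q, D), one embedding per query token/unit
    # doc_vecs:   (T, D), one embedding per document token
    q = F.normalize(query_vecs, dim=-1)
    d = F.normalize(doc_vecs, dim=-1)
    sim = q @ d.T                       # (Q, T) pairwise cosine similarities
    return sim.max(dim=1).values.sum()  # best document match per query unit

query = torch.randn(8, 128)  # 8 query-token embeddings (toy values)
doc = torch.randn(50, 128)   # 50 document-token embeddings
full_query_score = maxsim_score(query, doc)
unit_score = maxsim_score(query[:4], doc)  # score only one sub-span's vectors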
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems in domains with limited amounts of annotated instances; a minimal prompt sketch follows this entry.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
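A minimal illustration of the in-context learning route from the entry above: the only "training" signal is a handful of labeled examples placed directly in the prompt. The examples, labels, and prompt format here are invented for illustration; the paper's actual prompts may differ.

# Few-shot in-context classification: labeled examples go straight
# into the prompt and no task-specific fine-tuning takes place.
few_shot = [
    ("The delivery was late and the box was crushed.", "negative"),
    ("Wonderful service, will order again!", "positive"),
]

def build_prompt(text: str) -> str:
    lines = ["Classify each review as positive or negative.", ""]
    for example, label in few_shot:
        lines += [f"Review: {example}", f"Label: {label}", ""]
    lines += [f"Review: {text}", "Label:"]
    return "\n".join(lines)

# Any instruction-following language model can complete the final "Label:".
print(build_prompt("The acting was superb."))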
- BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification [0.27195102129095]
We show that classification tasks still suffer from a performance gap when the underlying distribution of topics changes.
We quantify this phenomenon empirically with a large corpus and a large set of topics.
We suggest and successfully test a possible remedy: after augmenting the training dataset with topically-controlled synthetic texts, the F1 score improves by up to 50% for some topics.
arXiv Detail & Related papers (2023-11-27T18:53:31Z)
- Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z)
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
- Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification [66.02091763340094]
Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor.
LaGoNN is effective at flagging undesirable content and at text classification, and it improves the performance of SetFit; a sketch of the nearest-neighbor augmentation idea follows this entry.
arXiv Detail & Related papers (2023-02-17T15:43:29Z)
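A sketch of the nearest-neighbor idea behind LaGoNN, assuming a sentence-transformers encoder: the model gains no new learnable parameters, the input is simply enriched with its most similar labeled example before classification. The augmentation format below is a guess; the paper's exact template may differ.

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# A toy labeled pool; in practice this is the training set.
train_texts = ["you are an idiot", "have a great day", "I will find you"]
train_labels = ["toxic", "benign", "threat"]
train_emb = encoder.encode(train_texts, convert_to_tensor=True)

def augment(text: str) -> str:
    # Retrieve the most similar labeled example and append it to the input.
    query_emb = encoder.encode(text, convert_to_tensor=True)
    hit = util.semantic_search(query_emb, train_emb, top_k=1)[0][0]
    idx = hit["corpus_id"]
    return f"{text} [NN: {train_texts[idx]} ({train_labels[idx]})]"

# The augmented string is then fed to the downstream classifier unchanged.
print(augment("you are a moron"))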
- Learning List-Level Domain-Invariant Representations for Ranking [59.3544317373004]
We propose list-level alignment -- learning domain-invariant representations at the higher level of lists.
Among its benefits, it leads to the first domain adaptation generalization bound for ranking, in turn providing theoretical support for the proposed method.
arXiv Detail & Related papers (2022-12-21T04:49:55Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
- Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks [3.5557219875516655]
We perform a sequence of inference steps to predict the category of a document from top to bottom of a given class taxonomy.
With our efficient approaches, we outperform previous studies on two well-known English datasets while using drastically fewer parameters; a toy top-down inference sketch follows this entry.
arXiv Detail & Related papers (2020-05-05T20:22:18Z)
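The top-to-bottom inference described in the entry above can be sketched in isolation: one classifier per taxonomy node, applied level by level. The taxonomy, the classify_at stand-in, and its keyword-count scorer are toy assumptions; the paper's per-level models are learned.

# Top-down taxonomy inference: predict a coarse category first, then
# refine among its children until a leaf is reached.
taxonomy = {
    "root": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Football", "Tennis"],
}

def classify_at(node: str, children: list[str], document: str) -> str:
    # Toy scorer: pick the child whose name occurs most in the document.
    return max(children, key=lambda c: document.lower().count(c.lower()))

def predict_path(document: str) -> list[str]:
    path, node = [], "root"
    while node in taxonomy:  # stop once a leaf category is reached
        node = classify_at(node, taxonomy[node], document)
        path.append(node)
    return path

print(predict_path("The biology of cells is a branch of science."))
# -> ['Science', 'Biology']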
- Joint Embedding of Words and Category Labels for Hierarchical Multi-label Text Classification [4.2750700546937335]
Hierarchical text classification (HTC) has received extensive attention and has broad application prospects.
We propose a joint embedding of text and parent category based on hierarchical fine-tuning ordered neurons LSTM (HFT-ONLSTM) for HTC.
arXiv Detail & Related papers (2020-04-06T11:06:08Z)