Supervised Text Classification using Text Search
- URL: http://arxiv.org/abs/2011.13832v2
- Date: Mon, 30 Nov 2020 19:53:45 GMT
- Title: Supervised Text Classification using Text Search
- Authors: Nabarun Mondal, Mrunal Lohia
- Abstract summary: The authors describe a class of industrial-standard algorithms that can accurately predict the classification of any text given prior labelled text data.
These algorithms were used to automate routing of issue tickets to the appropriate team.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised text classification is a classical and active area of ML research.
In large enterprises, solutions to this problem have significant importance. This
is especially true in ticketing systems, where predicting the type and
subtype of a new incoming ticket from its text, in order to find the optimal routing,
is a multi-billion-dollar industry.
In this paper, the authors describe a class of industrial-standard algorithms
that can accurately (86% and above) predict the classification of any text
given prior labelled text data, through a novel use of any text search engine.
These algorithms were used to automate routing of issue tickets to the
appropriate team. This class of algorithms has far-reaching consequences for a
wide variety of industrial applications, including IT support, RPA script triggering,
and even the legal domain, where massive sets of pre-labelled data are already available.
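The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "classification by text search": index the labelled texts, retrieve the top-k matches for a new text, and let the retrieved labels vote, weighted by relevance score. The class name `SearchClassifier`, the BM25 scoring, the parameter values, and the sample tickets are all assumptions made for illustration, not the authors' method; a real deployment would presumably delegate retrieval to an actual search engine.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchClassifier:
    """Hypothetical search-backed classifier: retrieve top-k labelled texts, then vote."""

    def __init__(self, k=3):
        self.k = k
        self.docs = []       # (term counts, token length, label) per labelled text
        self.df = Counter()  # number of labelled texts containing each term

    def add(self, text, label):
        tokens = tokenize(text)
        counts = Counter(tokens)
        self.docs.append((counts, len(tokens), label))
        self.df.update(counts.keys())

    def _bm25(self, query_terms, counts, length, avg_len, k1=1.5, b=0.75):
        # Standard BM25 relevance score (assumed here as the search engine's ranking).
        n = len(self.docs)
        score = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / avg_len))
        return score

    def predict(self, text):
        if not self.docs:
            return None
        query_terms = set(tokenize(text))
        avg_len = (sum(length for _, length, _ in self.docs) / len(self.docs)) or 1.0
        hits = sorted(
            ((self._bm25(query_terms, counts, length, avg_len), label)
             for counts, length, label in self.docs),
            reverse=True,
        )[: self.k]
        votes = defaultdict(float)
        for score, label in hits:
            if score > 0:  # ignore texts that share no term with the query
                votes[label] += score
        return max(votes, key=votes.get) if votes else None


# Made-up labelled tickets, used only to exercise the sketch.
clf = SearchClassifier(k=3)
clf.add("cannot connect to vpn from home", "network")
clf.add("wifi keeps dropping in the office", "network")
clf.add("vpn connection times out", "network")
clf.add("password reset link expired", "accounts")
clf.add("locked out of my account after password change", "accounts")
clf.add("invoice shows wrong amount", "billing")

print(clf.predict("vpn will not connect"))
print(clf.predict("forgot my password"))
```

Weighting votes by relevance score rather than counting hits lets a single strong match outweigh several weak ones; returning `None` when nothing matches leaves room for a fallback such as manual routing.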
Related papers
- LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection [87.43727192273772]
It is often hard to tell whether a piece of text was human-written or machine-generated.
We present LLM-DetectAIve, designed for fine-grained detection.
It supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished.
arXiv Detail & Related papers (2024-08-08T07:43:17Z)
- GuideWalk: A Novel Graph-Based Word Embedding for Enhanced Text Classification [0.0]
The processing of text data requires embedding, a method of translating the content of the text to numeric vectors.
A new text embedding approach, namely the Guided Transition Probability Matrix (GTPM) model is proposed.
The proposed method is tested with real-world data sets and eight well-known and successful embedding algorithms.
arXiv Detail & Related papers (2024-04-25T18:48:11Z)
- Identifying Banking Transaction Descriptions via Support Vector Machine Short-Text Classification Based on a Specialized Labelled Corpus [7.046417074932257]
We describe a novel system that combines Natural Language Processing techniques with Machine Learning algorithms to classify banking transaction descriptions.
Motivated by existing solutions in spam detection, we also propose a short text similarity detector to reduce training set size based on the Jaccard distance.
We present a use case with a personal finance application, CoinScrap, which is available at Google Play and App Store.
arXiv Detail & Related papers (2024-03-29T13:15:46Z)
- Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
We incorporate Self-Supervised Learning (SSL) into the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
We propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
We use external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Description-Based Text Similarity [59.552704474862004]
We identify the need to search for texts based on abstract descriptions of their content.
We propose an alternative model that significantly improves when used in standard nearest neighbor search.
arXiv Detail & Related papers (2023-05-21T17:14:31Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Automatic Detection of Industry Sectors in Legal Articles Using Machine Learning Approaches [0.0]
A dataset consisting of over 1,700 annotated legal articles was created for the identification of six industry sectors.
The system achieved promising results with area under the receiver operating characteristic curve scores above 0.90 and F-scores above 0.81 with respect to the six industry sectors.
arXiv Detail & Related papers (2023-03-08T12:41:56Z)
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- Rank over Class: The Untapped Potential of Ranking in Natural Language Processing [8.637110868126546]
We argue that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould.
We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences.
In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification.
arXiv Detail & Related papers (2020-09-10T22:18:57Z)