Hawk: An Industrial-strength Multi-label Document Classifier
- URL: http://arxiv.org/abs/2301.06057v1
- Date: Sun, 15 Jan 2023 09:52:18 GMT
- Title: Hawk: An Industrial-strength Multi-label Document Classifier
- Authors: Arshad Javeed
- Abstract summary: The paper describes the significance of these problems in detail and proposes a unique neural network architecture that addresses the above problems.
A hydranet-like architecture is designed to have granular control over and improve the modularity, coupled with a weighted loss driving task-specific heads.
The experimental results reveal that the proposed model outperforms the existing methods by a substantial margin.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There are a plethora of methods and algorithms that solve the classical
multi-label document classification. However, when it comes to deployment and
usage in an industry setting, most, if not all the contemporary approaches fail
to address some of the vital aspects or requirements of an ideal solution: i.
ability to operate on variable-length texts and rambling documents. ii.
catastrophic forgetting problem. iii. modularity when it comes to online
learning and updating the model. iv. ability to spotlight relevant text while
producing the prediction, i.e. visualizing the predictions. v. ability to
operate on imbalanced or skewed datasets. vi. scalability. The paper describes
the significance of these problems in detail and proposes a unique neural
network architecture that addresses the above problems. The proposed
architecture views documents as a sequence of sentences and leverages
sentence-level embeddings for input representation. A hydranet-like
architecture is designed to have granular control over and improve the
modularity, coupled with a weighted loss driving task-specific heads. In
particular, two specific mechanisms are compared: Bi-LSTM and
Transformer-based. The architecture is benchmarked on some of the popular
benchmarking datasets such as Web of Science - 5763, Web of Science - 11967,
BBC Sports, and BBC News datasets. The experimental results reveal that the
proposed model outperforms the existing methods by a substantial margin. The
ablation study includes comparisons of the impact of the attention mechanism
and the application of weighted loss functions to train the task-specific heads
in the hydranet.
Related papers
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z) - TopoBERT: Plug and Play Toponym Recognition Module Harnessing Fine-tuned
BERT [11.446721140340575]
TopoBERT, a toponym recognition module based on a one dimensional Convolutional Neural Network (CNN1D) and Bidirectional Representation from Transformers (BERT), is proposed and fine-tuned.
TopoBERT achieves state-of-the-art performance compared to the other five baseline models and can be applied to diverse toponym recognition tasks without additional training.
arXiv Detail & Related papers (2023-01-31T13:44:34Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
Main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - Robust Graph Representation Learning via Predictive Coding [46.22695915912123]
Predictive coding is a message-passing framework initially developed to model information processing in the brain.
In this work, we build models that rely on the message-passing rule of predictive coding.
We show that the proposed models are comparable to standard ones in terms of performance in both inductive and transductive tasks.
arXiv Detail & Related papers (2022-12-09T03:58:22Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Online Deep Learning based on Auto-Encoder [4.128388784932455]
We propose a two-phase Online Deep Learning based on Auto-Encoder (ODLAE)
Based on auto-encoder, considering reconstruction loss, we extract abstract hierarchical latent representations of instances.
We devise two fusion strategies: the output-level fusion strategy, which is obtained by fusing the classification results of each hidden layer; and feature-level fusion strategy, which is leveraged self-attention mechanism to fusion every hidden layer output.
arXiv Detail & Related papers (2022-01-19T02:14:57Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic
Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z) - FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z) - Mapping the Internet: Modelling Entity Interactions in Complex
Heterogeneous Networks [0.0]
We propose a versatile, unified framework called HMill' for sample representation, model definition and training.
We show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework.
We solve three different problems from the cybersecurity domain using the framework.
arXiv Detail & Related papers (2021-04-19T21:32:44Z) - TELESTO: A Graph Neural Network Model for Anomaly Classification in
Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied on IT system operation and maintenance.
One direction aims at the recognition of re-occurring anomaly types to enable remediation automation.
We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z) - Generative Counterfactuals for Neural Networks via Attribute-Informed
Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP)
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.