Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL
Tagging
- URL: http://arxiv.org/abs/2306.03723v1
- Date: Tue, 6 Jun 2023 14:41:30 GMT
- Title: Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL
Tagging
- Authors: Soumya Sharma, Subhendu Khatuya, Manjunath Hegde, Afreen Shaikh,
Koustuv Dasgupta, Pawan Goyal, Niloy Ganguly
- Abstract summary: The U.S. Securities and Exchange Commission (SEC) mandates all public companies to file periodic financial statements that should contain numerals annotated with a particular label from a taxonomy.
We formulate the task of automating the assignment of a label to a particular numeral span in a sentence from an extremely large label set.
- Score: 23.01422165679548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The U.S. Securities and Exchange Commission (SEC) mandates all public
companies to file periodic financial statements that should contain numerals
annotated with a particular label from a taxonomy. In this paper, we formulate
the task of automating the assignment of a label to a particular numeral span
in a sentence from an extremely large label set. Towards this task, we release
a dataset, Financial Numeric Extreme Labelling (FNXL), annotated with 2,794
labels. We benchmark the performance of the FNXL dataset by formulating the
task as (a) a sequence labelling problem and (b) a pipeline with span
extraction followed by Extreme Classification. Although the two approaches
perform comparably, the pipeline solution provides a slight edge for the least
frequent labels.
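The two formulations named in the abstract can be illustrated with a minimal sketch. The sentence, the label names, and the helper functions below are hypothetical and are not taken from the FNXL dataset or the authors' code; the regex-based candidate step merely stands in for a learned span extractor.

```python
import re

# Hypothetical sentence and gold annotation (illustrative only, not from FNXL).
SENTENCE = "Revenue was 120.5 million in 2022 and interest expense was 3.2 million ."
GOLD = {2: "Revenues", 10: "InterestExpense"}  # token index -> illustrative label

# Simple pattern for numeral tokens such as 2022, 1,250 or 120.5.
NUMERAL = re.compile(r"^\d[\d,]*(\.\d+)?$")


def sequence_labels(tokens, gold):
    """Formulation (a): one tag per token, 'O' for unlabelled tokens."""
    return [gold.get(i, "O") for i in range(len(tokens))]


def numeral_spans(tokens):
    """Formulation (b), step 1: propose candidate numeral tokens.

    Step 2 (not shown) would pass each candidate to a classifier that
    picks one label from the large label set.
    """
    return [i for i, tok in enumerate(tokens) if NUMERAL.match(tok)]


tokens = SENTENCE.split()
tags = sequence_labels(tokens, GOLD)
candidates = numeral_spans(tokens)

print(list(zip(tokens, tags)))
print(candidates)
```

In this toy example the year token is proposed as a candidate but carries no gold label, so a downstream classifier would also need a way to reject candidates.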
Related papers
- Complementary to Multiple Labels: A Correlation-Aware Correction
Approach [65.59584909436259]
We show theoretically how the estimated transition matrix in multi-class CLL could be distorted in multi-labeled cases.
We propose a two-step method to estimate the transition matrix from candidate labels.
arXiv Detail & Related papers (2023-02-25T04:48:48Z)
- Pairwise Instance Relation Augmentation for Long-tailed Multi-label Text
Classification [38.66674700075432]
We propose a Pairwise Instance Relation Augmentation Network (PIRAN) to augment tailed-label documents for balancing tail labels and head labels.
PIRAN consistently outperforms the SOTA methods, and dramatically improves the performance of tail labels.
arXiv Detail & Related papers (2022-11-19T12:45:54Z)
- Open Vocabulary Extreme Classification Using Generative Models [24.17018785195843]
The extreme multi-label classification (XMC) task aims at tagging content with a subset of labels from an extremely large label set.
We propose GROOV, a fine-tuned seq2seq model for open vocabulary XMC (OXMC) that generates the set of labels as a flat sequence and is trained using a novel loss independent of predicted label order.
We show the efficacy of the approach, experimenting with popular XMC datasets for which GROOV is able to predict meaningful labels outside the given vocabulary while performing on par with state-of-the-art solutions for known labels.
arXiv Detail & Related papers (2022-05-12T00:33:49Z)
- Acknowledging the Unknown for Multi-label Learning with Single Positive
Labels [65.5889334964149]
Traditionally, all unannotated labels are assumed to be negative labels in single positive multi-label learning (SPML).
We propose entropy-maximization (EM) loss to maximize the entropy of predicted probabilities for all unannotated labels.
Considering the positive-negative label imbalance of unannotated labels, we propose asymmetric pseudo-labeling (APL) with asymmetric-tolerance strategies and a self-paced procedure to provide more precise supervision.
arXiv Detail & Related papers (2022-03-30T11:43:59Z)
- GNN-XML: Graph Neural Networks for Extreme Multi-label Text
Classification [23.79498916023468]
Extreme multi-label text classification (XMTC) aims to tag a text instance with the most relevant subset of labels from an extremely large label set.
GNN-XML is a scalable graph neural network framework tailored for XMTC problems.
arXiv Detail & Related papers (2020-12-10T18:18:34Z)
- A Study on the Autoregressive and non-Autoregressive Multi-label
Learning [77.11075863067131]
We propose a self-attention based variational encoder-model to extract the label-label and label-feature dependencies jointly.
Our model can therefore be used to predict all labels in parallel while still including both label-label and label-feature dependencies.
arXiv Detail & Related papers (2020-12-03T05:41:44Z)
- An Empirical Study on Large-Scale Multi-Label Text Classification
Including Few and Zero-Shot Labels [49.036212158261215]
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications.
Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs).
We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs.
We propose a new state-of-the-art method which combines BERT with LWANs.
arXiv Detail & Related papers (2020-10-04T18:55:47Z)
- openXDATA: A Tool for Multi-Target Data Generation and Missing Label
Completion [23.14045574165086]
A common problem in machine learning is to deal with datasets with disjoint label spaces and missing labels.
In this work, we introduce the openXdata tool that completes the missing labels in partially labelled or unlabelled datasets.
We show the ability to estimate both categories and continuous labels for all of the datasets, at rates that approached the ground truth values.
arXiv Detail & Related papers (2020-07-27T22:05:53Z)
- Few-shot Slot Tagging with Collapsed Dependency Transfer and
Label-enhanced Task-adaptive Projection Network [61.94394163309688]
We propose a Label-enhanced Task-Adaptive Projection Network (L-TapNet) based on the state-of-the-art few-shot classification model -- TapNet.
Experimental results show that our model significantly outperforms the strongest few-shot learning baseline by 14.64 F1 scores in the one-shot setting.
arXiv Detail & Related papers (2020-06-10T07:50:44Z)
- General Partial Label Learning via Dual Bipartite Graph Autoencoder [81.78871072599607]
We formulate a practical yet challenging problem: General Partial Label Learning (GPLL).
We propose a novel graph autoencoder called Dual Bipartite Graph Autoencoder (DB-GAE).
arXiv Detail & Related papers (2020-01-05T19:00:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.