Next-Year Bankruptcy Prediction from Textual Data: Benchmark and
Baselines
- URL: http://arxiv.org/abs/2208.11334v1
- Date: Wed, 24 Aug 2022 07:11:49 GMT
- Title: Next-Year Bankruptcy Prediction from Textual Data: Benchmark and
Baselines
- Authors: Henri Arno, Klaas Mulier, Joke Baeck and Thomas Demeester
- Abstract summary: Models for bankruptcy prediction are useful in several real-world scenarios.
The lack of a common benchmark dataset and evaluation strategy impedes the objective comparison between models.
This paper introduces such a benchmark for the unstructured data scenario, based on novel and established datasets.
- Score: 10.944533132358439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models for bankruptcy prediction are useful in several real-world scenarios,
and multiple research contributions have been devoted to the task, based on
structured (numerical) as well as unstructured (textual) data. However, the
lack of a common benchmark dataset and evaluation strategy impedes the
objective comparison between models. This paper introduces such a benchmark for
the unstructured data scenario, based on novel and established datasets, in
order to stimulate further research into the task. We describe and evaluate
several classical and neural baseline models, and discuss benefits and flaws of
different strategies. In particular, we find that a lightweight bag-of-words
model based on static in-domain word representations obtains surprisingly good
results, especially when taking textual data from several years into account.
These results are critically assessed, and discussed in light of particular
aspects of the data and the task. All code to replicate the data and
experimental results will be released.
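To make the headline finding concrete, below is a minimal sketch of such a lightweight model, not the authors' released code: static word vectors are trained in-domain on the filing texts, averaged into a single document vector per firm (texts from several years can simply be concatenated first), and fed to a linear classifier. The toy corpus, the gensim/scikit-learn stack, and all hyperparameters are illustrative assumptions.

```python
# Sketch only: bag-of-words over static in-domain embeddings (not the paper's code).
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical data: tokenized filing text per firm and a next-year bankruptcy label.
docs = [
    ["declining", "revenue", "and", "rising", "debt", "obligations"],
    ["stable", "cash", "flow", "and", "growing", "market", "share"],
]
labels = [1, 0]  # 1 = bankrupt within the following year

# 1) Train static in-domain word vectors on the filing corpus itself.
w2v = Word2Vec(sentences=docs, vector_size=100, window=5, min_count=1, seed=0)

def embed(tokens, model):
    """Average the embeddings of in-vocabulary tokens (zero vector if none)."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

# 2) One averaged document vector per firm; multi-year text is handled by
#    concatenating the tokens of several filings before embedding.
X = np.vstack([embed(d, w2v) for d in docs])
y = np.array(labels)

# 3) Lightweight linear classifier on top of the averaged embeddings.
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)
print(roc_auc_score(y, clf.predict_proba(X)[:, 1]))  # in-sample AUC on toy data
```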
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event sequences are common data structures in various real-world domains such as healthcare, finance, and user interaction logs.
Despite advances in temporal data modeling techniques, there are no standardized benchmarks for evaluating their performance on event sequences.
We introduce EBES, a comprehensive benchmarking tool with standardized evaluation scenarios and protocols.
arXiv Detail & Related papers (2024-10-04T13:03:43Z)
- Numerical Literals in Link Prediction: A Critical Examination of Models and Datasets [2.5999037208435705]
Link Prediction models that incorporate numerical literals have shown minor improvements on existing benchmark datasets.
It is unclear whether a model is actually better at using numerical literals, or simply better at exploiting the graph structure.
We propose a methodology to evaluate LP models that incorporate numerical literals.
arXiv Detail & Related papers (2024-07-25T17:55:33Z)
- DoubleMLDeep: Estimation of Causal Effects with Multimodal Data [7.014959855847738]
This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation.
We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model.
An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation.
arXiv Detail & Related papers (2024-02-01T21:34:34Z)
- Leveraging Data Recasting to Enhance Tabular Reasoning [21.970920861791015]
Prior work has mostly relied on two data generation strategies.
The first is human annotation, which yields linguistically diverse data but is difficult to scale.
The second is synthetic generation, which is scalable and cost-effective but lacks inventiveness.
arXiv Detail & Related papers (2022-11-23T00:04:57Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Data-to-text Generation with Variational Sequential Planning [74.3955521225497]
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input.
We propose a neural model enhanced with a planning component responsible for organizing high-level information in a coherent and meaningful way.
We infer latent plans sequentially with a structured variational model, while interleaving the steps of planning and generation.
arXiv Detail & Related papers (2022-02-28T13:17:59Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class; a minimal oversampling sketch appears after this list.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation strategies.
arXiv Detail & Related papers (2020-10-11T02:19:15Z)
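For the imbalanced-classification entry above ("Selecting the suitable resampling strategy..."), here is a minimal oversampling sketch. It is not taken from that paper; the use of the imbalanced-learn library and the synthetic toy dataset are assumptions, and the paper itself compares many resampling strategies beyond this one.

```python
# Sketch only: rebalance a skewed binary dataset by random oversampling (assumed imbalanced-learn API).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler

# Toy imbalanced dataset: roughly 5% minority class.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print("before:", Counter(y))

# Duplicate minority-class examples until both classes have equal counts,
# then fit any classifier on the resampled data.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```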
This list is automatically generated from the titles and abstracts of the papers on this site.