Related papers: Company classification using machine learning

URL: http://arxiv.org/abs/2004.01496v2
Date: Wed, 20 May 2020 08:41:06 GMT
Title: Company classification using machine learning
Authors: Sven Husmann, Antoniya Shivarova, Rick Steinert
Abstract summary: We show that unsupervised machine learning algorithms can be used to visualize and classify company data. We implement the data-driven dimension reduction and visualization tool t-SNE in combination with spectral clustering. We show that the application of t-SNE and spectral clustering improves the overall portfolio performance.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent advancements in computational power and machine learning algorithms have led to vast improvements in manifold areas of research. Especially in finance, the application of machine learning enables both researchers and practitioners to gain new insights into financial data and well-studied areas such as company classification. In our paper, we demonstrate that unsupervised machine learning algorithms can be used to visualize and classify company data in an economically meaningful and effective way. In particular, we implement the data-driven dimension reduction and visualization tool t-distributed stochastic neighbor embedding (t-SNE) in combination with spectral clustering. The resulting company groups can then be utilized by experts in the field for empirical analysis and optimal decision making. By providing an exemplary out-of-sample study within a portfolio optimization framework, we show that the application of t-SNE and spectral clustering improves the overall portfolio performance. Therefore, we introduce our approach to the financial community as a valuable technique in the context of data analysis and company classification.
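The two-step idea in the abstract (embed with t-SNE, then group with spectral clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the synthetic feature matrix, the helper name cluster_companies, and all parameter values (perplexity, number of neighbours, number of groups) are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.manifold import TSNE


def cluster_companies(features, n_groups, perplexity=15.0, seed=0):
    """Hypothetical helper: 2-D t-SNE embedding followed by spectral clustering."""
    # Step 1: reduce the company feature matrix to two dimensions.
    embedding = TSNE(
        n_components=2,
        perplexity=perplexity,  # must be smaller than the number of companies
        init="pca",
        random_state=seed,
    ).fit_transform(features)

    # Step 2: group the embedded points using a nearest-neighbour affinity graph.
    labels = SpectralClustering(
        n_clusters=n_groups,
        affinity="nearest_neighbors",
        n_neighbors=10,
        random_state=seed,
    ).fit_predict(embedding)
    return embedding, labels


# Synthetic stand-in for company data (assumption): 3 groups of 40 companies,
# each described by 8 numeric features.
rng = np.random.default_rng(0)
centers = rng.normal(scale=6.0, size=(3, 8))
features = np.vstack([c + rng.normal(size=(40, 8)) for c in centers])

embedding, labels = cluster_companies(features, n_groups=3)
print(embedding.shape, np.bincount(labels))
```

The resulting labels play the role of the data-driven company groups; the paper's portfolio optimization step is not covered by this sketch.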

Related papers

Machine learning-based cloud resource allocation algorithms: a comprehensive comparative review [0.0]
Cloud resource allocation has emerged as a major challenge in modern computing environments. Traditional approaches prove inadequate for handling the multi-objective optimization demands of existing cloud infrastructures. This paper presents a comparative analysis of state-of-the-art artificial intelligence and machine learning algorithms for resource allocation.
arXiv Detail & Related papers (2025-10-31T20:30:21Z)
Analytical Survey of Learning with Low-Resource Data: From Analysis to Investigation [192.53529928861818]
Learning with high-resource data has demonstrated substantial success in artificial intelligence (AI). However, the costs associated with data annotation and model training remain significant. This survey employs active sampling theory to analyze the generalization error and label complexity associated with learning from low-resource data.
arXiv Detail & Related papers (2025-10-10T03:15:42Z)
A Survey of Optimization Modeling Meets LLMs: Progress and Future Directions [27.77977859998504]
With the advent of large language models (LLMs), new opportunities have emerged to automate the procedure of mathematical modeling. This survey presents a comprehensive review of recent advancements that cover the entire technical stack.
arXiv Detail & Related papers (2025-08-12T06:55:33Z)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems. This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Visualizing Machine Learning Models for Enhanced Financial Decision-Making and Risk Management [0.0]
This study emphasizes how crucial it is to visualize machine learning models, especially for the banking industry, in order to improve interpretability and support predictions. Visual tools enable performance improvements and support the creation of innovative financial models.
arXiv Detail & Related papers (2025-02-20T22:10:02Z)
A Comprehensive Survey on Imbalanced Data Learning [56.65067795190842]
Imbalanced data is prevalent in various types of raw data and hinders the performance of machine learning. This survey systematically analyzes various real-world data formats. It groups existing research on different data formats into four categories: data re-balancing, feature representation, training strategy, and ensemble learning.
arXiv Detail & Related papers (2025-02-13T04:53:17Z)
A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation. Deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency. This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
Clustering Time Series Data with Gaussian Mixture Embeddings in a Graph Autoencoder Framework [10.33711719777708]
Time series data analysis is prevalent across various domains, including finance, healthcare, and environmental monitoring. Traditional time series clustering methods often struggle to capture the complex temporal dependencies inherent in such data. We propose the Variational Mixture Graph Autoencoder (VMGAE), a graph-based approach for time series clustering.
arXiv Detail & Related papers (2024-11-25T22:49:01Z)
Structure Learning via Mutual Information [0.8702432681310399]
We propose a framework for learning and representing functional relationships in data using mutual information (MI) features. Our method aims to capture the underlying structure of information in datasets, enabling more efficient and generalizable learning algorithms.
arXiv Detail & Related papers (2024-09-21T19:33:56Z)
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis [0.0]
This study proposes a methodology for extracting tasks, machine learning methods, and dataset names from scientific papers. The proposed method's expression extraction performance, when using Llama3, achieves an F-score exceeding 0.8 across various categories. Benchmarking results on financial domain papers have demonstrated the effectiveness of this method.
arXiv Detail & Related papers (2024-08-22T03:10:52Z)
Generalizing Machine Learning Evaluation through the Integration of Shannon Entropy and Rough Set Theory [0.0]
We introduce a comprehensive framework that synergizes the granularity of rough set theory with the uncertainty quantification of Shannon entropy. Our methodology is rigorously tested on various datasets, showcasing its capability to not only assess predictive performance but also to illuminate the underlying data complexity and model robustness.
arXiv Detail & Related papers (2024-04-18T21:22:42Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problems. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data. The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees. In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets. It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
A Field Guide to Federated Optimization [161.3779046812383]
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms.
arXiv Detail & Related papers (2021-07-14T18:09:08Z)
Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn. We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions. Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
Machine Learning Algorithms for Financial Asset Price Forecasting [0.0]
This study directly compares and contrasts state-of-the-art implementations of modern Machine Learning algorithms on high performance computing infrastructures. The implemented Machine Learning models - trained on time series data for an entire stock universe - significantly outperform the CAPM on out-of-sample (OOS) test data.
arXiv Detail & Related papers (2020-03-31T18:14:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.