Learning Mutual Fund Categorization using Natural Language Processing
- URL: http://arxiv.org/abs/2207.04959v1
- Date: Mon, 11 Jul 2022 15:40:18 GMT
- Title: Learning Mutual Fund Categorization using Natural Language Processing
- Authors: Dimitrios Vamvourellis, Mate Attila Toth, Dhruv Desai, Dhagash Mehta,
Stefano Pasquali
- Abstract summary: We learn the categorization system directly from the unstructured data reported in the forms using natural language processing (NLP).
We show that the categorization system can indeed be learned with high accuracy.
- Score: 0.5249805590164901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Categorization of mutual funds and Exchange-Traded Funds (ETFs) has
long served financial analysts in performing peer analysis for purposes ranging
from competitor analysis to quantifying portfolio diversification. The
categorization methodology usually relies on fund composition data in the
structured format extracted from the Form N-1A. Here, we initiate a study to
learn the categorization system directly from the unstructured data as reported
in the forms using natural language processing (NLP). Posing this as a
multi-class classification problem, with the input being only the investment
strategy description as reported in the form and the target variable being the
Lipper Global categories, we show, using various NLP models, that the
categorization system can indeed be learned with high accuracy. We discuss the
implications and applications of our findings, as well as the limitations of
existing pre-trained architectures when applied to learning fund
categorization.
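The listing does not include the authors' code, but the abstract fully specifies the task setup: a multi-class text classifier whose input is a fund's investment strategy description and whose target is its Lipper Global category. The sketch below is a minimal illustrative baseline of that setup, not the paper's implementation; the example descriptions and category labels are invented placeholders, and the TF-IDF + logistic regression pipeline stands in for the "various NLP models" the abstract mentions.

```python
# Minimal illustrative sketch (not the authors' implementation): fund
# categorization treated as multi-class text classification, with the
# investment strategy description as input and the Lipper Global category
# as the target. All data below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Placeholder strategy descriptions and illustrative category labels.
descriptions = [
    "The fund invests primarily in large-capitalization U.S. equities.",
    "The fund seeks income by investing in investment-grade corporate bonds.",
    "The fund invests in equity securities of emerging-market companies.",
    "The fund invests in short-duration government debt instruments.",
]
categories = [
    "Equity US",
    "Bond USD Corporates",
    "Equity Emerging Markets Global",
    "Bond USD Short Term",
]

# TF-IDF features followed by a linear multi-class classifier form a simple
# baseline for mapping description text to a category label.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(descriptions, categories)

# Predict the category of an unseen strategy description.
print(model.predict(["The fund invests mainly in European small-cap stocks."]))
```

A bag-of-words baseline like this is only one point of comparison; the paper itself evaluates a range of NLP models and discusses the limitations of existing pre-trained architectures for this task.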
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- Classifying Organizations for Food System Ontologies using Natural Language Processing [9.462188694526134]
We have created NLP models that can automatically classify organizations associated with environmental issues.
As input, the NLP models are provided with text snippets retrieved by the Google search engine for each organization.
We believe NLP models represent a promising approach for harvesting information to populate knowledge graphs.
arXiv Detail & Related papers (2023-09-19T19:07:48Z)
- Company classification using zero-shot learning [0.0]
We propose an approach for company classification using NLP and zero-shot learning.
We evaluate our approach on a dataset obtained through the Wharton Research Data Services (WRDS).
arXiv Detail & Related papers (2023-05-01T18:36:06Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- On Language Clustering: A Non-parametric Statistical Approach [0.0]
This study presents statistical approaches that may be employed in nonparametric nonhomogeneous data frameworks.
It also examines their application in the field of natural language processing and language clustering.
arXiv Detail & Related papers (2022-09-14T15:27:41Z)
- An Evolutionary Approach for Creating of Diverse Classifier Ensembles [11.540822622379176]
We propose a framework for classifier selection and fusion based on a four-step protocol called CIF-E.
We implement and evaluate 24 varied ensemble approaches following the proposed CIF-E protocol.
Experiments show that the proposed evolutionary approach can outperform the state-of-the-art literature approaches in many well-known UCI datasets.
arXiv Detail & Related papers (2022-08-23T14:23:27Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Federated Learning Aggregation: New Robust Algorithms with Guarantees [63.96013144017572]
Federated learning has been recently proposed for distributed model training at the edge.
This paper presents a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework.
We derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.
arXiv Detail & Related papers (2022-05-22T16:37:53Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Pareto-wise Ranking Classifier for Multi-objective Evolutionary Neural Architecture Search [15.454709248397208]
This study focuses on how to find feasible deep models under diverse design objectives.
We propose a classification-wise Pareto evolution approach for one-shot NAS, where an online classifier is trained to predict the dominance relationship between the candidate and constructed reference architectures.
We find a number of neural architectures with different model sizes ranging from 2M to 6M under diverse objectives and constraints.
arXiv Detail & Related papers (2021-09-14T13:28:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.