Learning Mutual Fund Categorization using Natural Language Processing
- URL: http://arxiv.org/abs/2207.04959v1
- Date: Mon, 11 Jul 2022 15:40:18 GMT
- Title: Learning Mutual Fund Categorization using Natural Language Processing
- Authors: Dimitrios Vamvourellis, Mate Attila Toth, Dhruv Desai, Dhagash Mehta,
Stefano Pasquali
- Abstract summary: We learn the categorization system directly from the unstructured data as depicted in the forms using natural language processing (NLP).
We show that the categorization system can indeed be learned with high accuracy.
- Score: 0.5249805590164901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Categorization of mutual funds or exchange-traded funds (ETFs) has long
served financial analysts in performing peer analysis for various purposes,
ranging from competitor analysis to quantifying portfolio diversification.
The categorization methodology usually relies on fund composition data in
structured format extracted from Form N-1A. Here, we initiate a study to
learn the categorization system directly from the unstructured data as reported
in the forms using natural language processing (NLP). Posing this as a multi-class
classification problem, with the input being only the investment strategy
description as reported in the form and the target variable being the Lipper
Global categories, and using various NLP models, we show that the
categorization system can indeed be learned with high accuracy. We discuss
implications and applications of our findings, as well as limitations of
existing pre-trained architectures when applying them to learn fund
categorization.
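The setup described in the abstract (strategy description in, Lipper Global category out) can be sketched as a minimal text-classification baseline. The in-line corpus, labels, and TF-IDF plus logistic-regression model below are illustrative assumptions for demonstration only; the paper itself evaluates various NLP models, including pre-trained architectures, on actual Form N-1A strategy descriptions.

```python
# Minimal multi-class text-classification sketch: strategy text -> category.
# Corpus and labels are hypothetical stand-ins for Form N-1A data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical investment strategy descriptions and category labels.
texts = [
    "The fund invests primarily in large-cap US equities.",
    "Seeks income by holding investment-grade corporate bonds.",
    "Tracks a broad index of emerging-market stocks.",
    "Invests in short-duration government debt securities.",
]
labels = ["Equity US", "Bond Corporate", "Equity Emerging Mkts", "Bond Government"]

# TF-IDF features feeding a multinomial logistic-regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Predict a category for an unseen strategy description.
pred = clf.predict(["A portfolio of high-quality corporate bonds for income."])[0]
print(pred)
```

In practice, replacing the TF-IDF features with embeddings from a pre-trained language model is the kind of variation the paper compares.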
Related papers
- Demystifying Domain-adaptive Post-training for Financial LLMs [79.581577578952]
FINDAP is a systematic and fine-grained investigation into domain adaptive post-training of large language models (LLMs)
Our approach consists of four key components: FinCap, FinRec, FinTrain and FinEval.
The resulting model, Llama-Fin, achieves state-of-the-art performance across a wide range of financial tasks.
arXiv Detail & Related papers (2025-01-09T04:26:15Z)
- Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C$2$SRT) framework to explore the semantic correlation.
The proposed framework consists of two complementary modules, i.e., intra-category semantic refinement (ISR) module and inter-category semantic transfer (IST) module.
Experiments on OV-MLR benchmarks clearly demonstrate that the proposed C$2$SRT framework outperforms current state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-09T04:00:18Z)
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- Structure-aware Domain Knowledge Injection for Large Language Models [38.08691252042949]
StructTuning is a methodology to transform Large Language Models (LLMs) into domain specialists.
It significantly reduces the required training corpus to a mere 5% while achieving 100% of the performance of traditional knowledge injection.
arXiv Detail & Related papers (2024-07-23T12:38:48Z)
- Classifying Organizations for Food System Ontologies using Natural Language Processing [9.462188694526134]
We have created NLP models that can automatically classify organizations associated with environmental issues.
As input, the NLP models are provided with text snippets retrieved by the Google search engine for each organization.
We believe NLP models represent a promising approach for harvesting information to populate knowledge graphs.
arXiv Detail & Related papers (2023-09-19T19:07:48Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- On Language Clustering: A Non-parametric Statistical Approach [0.0]
This study presents statistical approaches that may be employed in nonparametric nonhomogeneous data frameworks.
It also examines their application in the field of natural language processing and language clustering.
arXiv Detail & Related papers (2022-09-14T15:27:41Z)
- An Evolutionary Approach for Creating of Diverse Classifier Ensembles [11.540822622379176]
We propose a framework for classifier selection and fusion based on a four-step protocol called CIF-E.
We implement and evaluate 24 varied ensemble approaches following the proposed CIF-E protocol.
Experiments show that the proposed evolutionary approach can outperform the state-of-the-art literature approaches in many well-known UCI datasets.
arXiv Detail & Related papers (2022-08-23T14:23:27Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Pareto-wise Ranking Classifier for Multi-objective Evolutionary Neural Architecture Search [15.454709248397208]
This study focuses on how to find feasible deep models under diverse design objectives.
We propose a classification-wise Pareto evolution approach for one-shot NAS, where an online classifier is trained to predict the dominance relationship between the candidate and constructed reference architectures.
We find a number of neural architectures with different model sizes ranging from 2M to 6M under diverse objectives and constraints.
arXiv Detail & Related papers (2021-09-14T13:28:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.