Learning Mutual Fund Categorization using Natural Language Processing
- URL: http://arxiv.org/abs/2207.04959v1
- Date: Mon, 11 Jul 2022 15:40:18 GMT
- Title: Learning Mutual Fund Categorization using Natural Language Processing
- Authors: Dimitrios Vamvourellis, Mate Attila Toth, Dhruv Desai, Dhagash Mehta,
Stefano Pasquali
- Abstract summary: We learn the categorization system directly from the unstructured data reported in the forms using natural language processing (NLP).
We show that the categorization system can indeed be learned with high accuracy.
- Score: 0.5249805590164901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Categorization of mutual funds and Exchange-Traded Funds (ETFs) has
long served financial analysts in performing peer analysis for purposes ranging
from competitor analysis to quantifying portfolio diversification. The
categorization methodology usually relies on fund composition data in the
structured format extracted from the Form N-1A. Here, we initiate a study to
learn the categorization system directly from the unstructured data as reported
in the forms using natural language processing (NLP). Posing this as a
multi-class classification problem, with the input being only the investment
strategy description as reported in the form and the target variable being the
Lipper Global categories, we show, using various NLP models, that the
categorization system can indeed be learned with high accuracy. We discuss the
implications and applications of our findings, as well as the limitations of
existing pre-trained architectures when applied to learning fund
categorization.
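The listing does not include the authors' code, but the abstract fully specifies the task setup: a multi-class text classifier whose input is a fund's investment strategy description and whose target is its Lipper Global category. The sketch below is a minimal illustrative baseline of that setup, not the paper's implementation; the example descriptions and category labels are invented placeholders, and the TF-IDF + logistic regression pipeline stands in for the "various NLP models" the abstract mentions.

```python
# Minimal illustrative sketch (not the authors' implementation): fund
# categorization treated as multi-class text classification, with the
# investment strategy description as input and the Lipper Global category
# as the target. All data below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Placeholder strategy descriptions and illustrative category labels.
descriptions = [
    "The fund invests primarily in large-capitalization U.S. equities.",
    "The fund seeks income by investing in investment-grade corporate bonds.",
    "The fund invests in equity securities of emerging-market companies.",
    "The fund invests in short-duration government debt instruments.",
]
categories = [
    "Equity US",
    "Bond USD Corporates",
    "Equity Emerging Markets Global",
    "Bond USD Short Term",
]

# TF-IDF features followed by a linear multi-class classifier form a simple
# baseline for mapping description text to a category label.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(descriptions, categories)

# Predict the category of an unseen strategy description.
print(model.predict(["The fund invests mainly in European small-cap stocks."]))
```

A bag-of-words baseline like this is only one point of comparison; the paper itself evaluates a range of NLP models and discusses the limitations of existing pre-trained architectures for this task.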
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- Classifying Organizations for Food System Ontologies using Natural Language Processing [9.462188694526134]
We have created NLP models that can automatically classify organizations associated with environmental issues.
As input, the NLP models are provided with text snippets retrieved by the Google search engine for each organization.
We believe NLP models represent a promising approach for harvesting information to populate knowledge graphs.
arXiv Detail & Related papers (2023-09-19T19:07:48Z)
- Company classification using zero-shot learning [0.0]
We propose an approach for company classification using NLP and zero-shot learning.
We evaluate our approach on a dataset obtained through the Wharton Research Data Services (WRDS).
arXiv Detail & Related papers (2023-05-01T18:36:06Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- On Language Clustering: A Non-parametric Statistical Approach [0.0]
This study presents statistical approaches that may be employed in nonparametric nonhomogeneous data frameworks.
It also examines their application in the field of natural language processing and language clustering.
arXiv Detail & Related papers (2022-09-14T15:27:41Z)
- An Evolutionary Approach for Creating of Diverse Classifier Ensembles [11.540822622379176]
We propose a framework for classifier selection and fusion based on a four-step protocol called CIF-E.
We implement and evaluate 24 varied ensemble approaches following the proposed CIF-E protocol.
Experiments show that the proposed evolutionary approach can outperform the state-of-the-art literature approaches in many well-known UCI datasets.
arXiv Detail & Related papers (2022-08-23T14:23:27Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Federated Learning Aggregation: New Robust Algorithms with Guarantees [63.96013144017572]
Federated learning has been recently proposed for distributed model training at the edge.
This paper presents a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework.
We derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.
arXiv Detail & Related papers (2022-05-22T16:37:53Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Pareto-wise Ranking Classifier for Multi-objective Evolutionary Neural Architecture Search [15.454709248397208]
This study focuses on how to find feasible deep models under diverse design objectives.
We propose a classification-wise Pareto evolution approach for one-shot NAS, where an online classifier is trained to predict the dominance relationship between the candidate and constructed reference architectures.
We find a number of neural architectures with different model sizes ranging from 2M to 6M under diverse objectives and constraints.
arXiv Detail & Related papers (2021-09-14T13:28:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.