Quantifying Outlierness of Funds from their Categories using Supervised
Similarity
- URL: http://arxiv.org/abs/2308.06882v1
- Date: Mon, 14 Aug 2023 01:28:19 GMT
- Title: Quantifying Outlierness of Funds from their Categories using Supervised
Similarity
- Authors: Dhruv Desai, Ashmita Dhiman, Tushar Sharma, Deepika Sharma, Dhagash
Mehta, Stefano Pasquali
- Abstract summary: We aim to quantify the effect of miscategorization of funds utilizing a machine learning based approach.
We implement and employ a Random Forest (RF) based method of distance metric learning, and compute the so-called class-wise outlier measures for each data-point to identify outliers in the data.
We show that there is a strong relationship between the outlier measures of the funds and their future returns and discuss the implications of our findings.
- Score: 6.060757543617328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mutual fund categorization has become a standard tool for the investment
management industry and is extensively used by allocators for portfolio
construction and manager selection, as well as by fund managers for peer
analysis and competitive positioning. As a result, an (unintended)
miscategorization or lack of precision can significantly impact allocation
decisions and investment fund managers. Here, we aim to quantify the effect of
miscategorization of funds utilizing a machine learning based approach. We
formulate the problem of miscategorization of funds as a distance-based outlier
detection problem, where the outliers are the data-points that are far from the
rest of the data-points in the given feature space. We implement and employ a
Random Forest (RF) based method of distance metric learning, and compute the
so-called class-wise outlier measures for each data-point to identify outliers
in the data. We test our implementation on various publicly available data
sets, and then apply it to mutual fund data. We show that there is a strong
relationship between the outlier measures of the funds and their future returns
and discuss the implications of our findings.
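The pipeline described in the abstract (an RF-based supervised similarity, followed by class-wise outlier measures) can be sketched along the lines of Breiman's classic random-forest proximity and outlier measure. The following is an illustrative reconstruction on a public dataset, not the paper's exact implementation; the normalization by median and median absolute deviation is one common convention:

```python
# Sketch of RF-proximity-based class-wise outlier measures
# (Breiman-style proximities: fraction of trees in which two
# samples fall in the same leaf).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Leaf indices per sample, shape (n_samples, n_trees)
leaves = rf.apply(X)
# Proximity matrix: fraction of trees where samples i and j share a leaf
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Class-wise raw outlier measure: n_class / sum of squared proximities
# to same-class samples (large when a point is far from its own class)
out = np.empty(len(y))
for c in np.unique(y):
    idx = np.where(y == c)[0]
    p2 = (prox[np.ix_(idx, idx)] ** 2).sum(axis=1)
    raw = len(idx) / p2
    # Normalize within class by median and median absolute deviation
    med = np.median(raw)
    mad = np.median(np.abs(raw - med)) or 1.0
    out[idx] = (raw - med) / mad

print("Top-5 outliers:", np.argsort(out)[-5:])
```

Points with large normalized measures are the candidates for "miscategorized" members of their class; in the paper's setting the classes are fund categories rather than iris species.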
Related papers
- The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models [22.75594773147521]
We introduce Rank-Allocation-Based Bias Index (RABBI), a model-agnostic bias measure that assesses potential allocational harms arising from biases in large language models (LLMs).
Our results reveal that commonly-used bias metrics based on average performance gap and distribution distance fail to reliably capture group disparities in allocation outcomes.
Our work highlights the need to account for how models are used in resource-constrained contexts.
arXiv Detail & Related papers (2024-08-02T14:13:06Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Data-Driven Knowledge Transfer in Batch $Q^*$ Learning [5.6665432569907646]
We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments.
We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation.
We show that the final learning error of the $Q^*$ function is significantly improved over the single-task rate.
arXiv Detail & Related papers (2024-04-01T02:20:09Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
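The Maximum Mean Discrepancy loss mentioned above can be illustrated with the standard biased RBF-kernel estimator; this is a generic textbook sketch, not DaC's memory-bank variant, and the `gamma` bandwidth and sample sets are illustrative:

```python
# Biased RBF-kernel MMD^2 estimator between two sample sets.
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2), all pairs of rows of a and b
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    # MMD^2 = E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')]
    return (rbf_kernel(x, x, gamma).mean()
            - 2 * rbf_kernel(x, y, gamma).mean()
            + rbf_kernel(y, y, gamma).mean())

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(100, 8))  # "source-like" features
tgt = rng.normal(0.5, 1.0, size=(100, 8))  # shifted "target" features

print(mmd2(src, tgt))  # shifted distributions -> positive MMD^2
print(mmd2(src, src))  # identical sample sets -> exactly 0
```

Minimizing this quantity between source-like and target-specific feature batches pulls the two empirical distributions together, which is the role the MMD loss plays in the summary above.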
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
- IT-RUDA: Information Theory Assisted Robust Unsupervised Domain Adaptation [7.225445443960775]
Distribution shift between train (source) and test (target) datasets is a common problem encountered in machine learning applications.
UDA techniques carry out knowledge transfer from a label-rich source domain to an unlabeled target domain.
Outliers that exist in either source or target datasets can introduce additional challenges when using UDA in practice.
arXiv Detail & Related papers (2022-10-24T04:33:52Z)
- Domain Adaptative Causality Encoder [52.779274858332656]
We leverage the characteristics of dependency trees and adversarial learning to address the tasks of adaptive causality identification and localisation.
We present a new causality dataset, namely MedCaus, which integrates all types of causality in the text.
arXiv Detail & Related papers (2020-11-27T04:14:55Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We derive the minimum-variance (i.e., efficient) OPE estimator for multiple loggers, valid for any instance.
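The multiple-logger setting can be illustrated with plain inverse-propensity scoring pooled over equal-size strata; this is a textbook baseline for the stratified-sampling setup described above, not the paper's efficient estimator, and the two-action bandit below is a made-up example:

```python
# Inverse-propensity-scoring (IPS) off-policy value estimate,
# pooled over multiple logging policies (one fixed-size stratum each).
import numpy as np

rng = np.random.default_rng(1)

def ips_value(rewards, logging_probs, target_probs):
    # V_hat = mean( pi_target(a|x) / pi_log(a|x) * r )
    return np.mean(target_probs / logging_probs * rewards)

# Two loggers; each logs 500 rounds of a two-action bandit where
# action 1 always pays reward 1 and action 0 pays 0.
strata = []
for p_a0 in (0.8, 0.3):                  # each logger's prob of action 0
    a = rng.binomial(1, 1 - p_a0, size=500)
    r = np.where(a == 1, 1.0, 0.0)
    p_log = np.where(a == 0, p_a0, 1 - p_a0)
    strata.append((a, r, p_log))

# Target policy: always choose action 1 -> true value is 1.0
est = np.mean([ips_value(r, p, np.where(a == 1, 1.0, 0.0))
               for a, r, p in strata])
print(est)  # close to 1.0
```

Averaging the per-stratum IPS estimates with equal weights is unbiased here but generally not minimum-variance, which is exactly the gap the efficient estimator in the paper addresses.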
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Deep Adversarial Domain Adaptation Based on Multi-layer Joint Kernelized Distance [30.452492118887182]
Domain adaptation refers to the learning scenario in which a model learned from source data is applied to target data.
The distribution discrepancy between source data and target data can substantially affect the adaptation performance.
A deep adversarial domain adaptation model based on a multi-layer joint kernelized distance metric is proposed.
arXiv Detail & Related papers (2020-10-09T02:32:48Z)
- Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach [150.8920602230832]
We propose a framework for learning calibrated uncertainties under domain shifts.
In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution.
We show that our proposed method generates calibrated uncertainties that benefit downstream tasks.
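The density-ratio idea mentioned above is commonly implemented by training a probabilistic classifier to distinguish source from target samples; the sketch below is that standard recipe on synthetic data, not the paper's exact estimator:

```python
# Density-ratio estimation via a probabilistic classifier: p(source | x)
# is high when x looks like training data, low under domain shift.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
source = rng.normal(0.0, 1.0, size=(1000, 2))  # training distribution
target = rng.normal(1.0, 1.0, size=(1000, 2))  # shifted test distribution

X = np.vstack([source, target])
z = np.r_[np.zeros(len(source)), np.ones(len(target))]
clf = LogisticRegression().fit(X, z)

def source_closeness(x):
    # p(source | x); the density ratio p_t(x)/p_s(x) is recoverable
    # from these posteriors and the class counts
    return clf.predict_proba(x)[:, 0]

print(source_closeness(np.array([[0.0, 0.0], [3.0, 3.0]])))
# the source-like point scores higher than the far-shifted one
```

A calibrated version of this closeness score is what lets the downstream model report larger uncertainty on samples far from the training distribution.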
arXiv Detail & Related papers (2020-10-08T02:10:54Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Machine Learning Fund Categorizations [2.7930955543692817]
We establish that an industry wide well-regarded categorization system is learnable using machine learning and largely reproducible.
We discuss the intellectual challenges in learning this man-made system, our results and their implications.
arXiv Detail & Related papers (2020-05-29T23:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.