Mutual Information Scoring: Increasing Interpretability in Categorical
Clustering Tasks with Applications to Child Welfare Data
- URL: http://arxiv.org/abs/2208.01802v1
- Date: Wed, 3 Aug 2022 01:11:09 GMT
- Title: Mutual Information Scoring: Increasing Interpretability in Categorical
Clustering Tasks with Applications to Child Welfare Data
- Authors: Pranav Sankhe, Seventy F. Hall, Melanie Sage, Maria Y. Rodriquez,
Varun Chandola, Kenneth Joseph
- Abstract summary: Youth in the American foster care system are significantly more likely than their peers to face a number of negative life outcomes.
Data on these youth have the potential to provide insights that can help identify ways to improve their path towards a better life.
The present work proposes a novel, prescriptive approach to using these data to provide insights about both data biases and the systems and youth they track.
- Score: 6.651036327739043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Youth in the American foster care system are significantly more likely than
their peers to face a number of negative life outcomes, from homelessness to
incarceration. Administrative data on these youth have the potential to provide
insights that can help identify ways to improve their path towards a better
life. However, such data also suffer from a variety of biases, from missing
data to reflections of systemic inequality. The present work proposes a novel,
prescriptive approach to using these data to provide insights about both data
biases and the systems and youth they track. Specifically, we develop a novel
categorical clustering and cluster summarization methodology that allows us to
gain insights into subtle biases in existing data on foster youth, and to
provide insight into where further (often qualitative) research is needed to
identify potential ways of assisting youth.
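The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of the general idea suggested by the title: score each categorical feature by its empirical mutual information with a cluster assignment, so that the features most associated with the clusters can be used to summarize them. The function names, the toy records, and the cluster labels are all hypothetical and are not taken from the paper; the authors' actual method may differ.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two categorical sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def score_features(records, cluster_labels):
    """Rank each categorical feature by its mutual information with the clusters."""
    features = list(records[0].keys())
    scores = {
        f: mutual_information([r[f] for r in records], cluster_labels)
        for f in features
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records and cluster labels, for illustration only.
records = [
    {"placement": "kinship", "region": "north"},
    {"placement": "kinship", "region": "south"},
    {"placement": "group_home", "region": "north"},
    {"placement": "group_home", "region": "south"},
]
cluster_labels = [0, 0, 1, 1]

ranking = score_features(records, cluster_labels)
for feature, score in ranking:
    print(f"{feature}: {score:.3f} bits")
```

In this toy case the hypothetical `placement` feature fully determines the cluster and ranks first, while `region` is independent of the clusters and scores zero. A ranking of this kind is one plausible way to make a categorical clustering easier to read.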
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- The Generation Gap: Exploring Age Bias in the Value Systems of Large Language Models [26.485974783643464]
We find a general inclination of Large Language Model (LLM) values towards younger demographics, especially when compared to the US population.
Although this inclination is observed in general, its strength varies across value categories.
arXiv Detail & Related papers (2024-04-12T18:36:20Z)
- Deep Metric Learning for Computer Vision: A Brief Overview [4.980117530293724]
Objective functions that optimize deep neural networks play a vital role in creating an enhanced feature representation of the input data.
Deep Metric Learning seeks to develop methods that aim to measure the similarity between data samples.
We will provide an overview of recent progress in this area and discuss state-of-the-art Deep Metric Learning approaches.
arXiv Detail & Related papers (2023-12-01T21:53:36Z)
- Unveiling the Potential of Probabilistic Embeddings in Self-Supervised Learning [4.124934010794795]
Self-supervised learning has played a pivotal role in advancing machine learning by allowing models to acquire meaningful representations from unlabeled data.
We investigate the impact of probabilistic modeling on the information bottleneck, shedding light on a trade-off between compression and preservation of information in both representation and loss space.
Our findings suggest that introducing an additional bottleneck in the loss space can significantly enhance the ability to detect out-of-distribution examples.
arXiv Detail & Related papers (2023-10-27T12:01:16Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Novel Class Discovery without Forgetting [72.52222295216062]
We identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting.
We propose a machine learning model to incrementally discover novel categories of instances from unlabeled data.
We introduce experimental protocols based on CIFAR-10, CIFAR-100 and ImageNet-1000 to measure the trade-off between knowledge retention and novel class discovery.
arXiv Detail & Related papers (2022-07-21T17:54:36Z)
- Data Representativeness in Accessibility Datasets: A Meta-Analysis [7.6597163467929805]
We review datasets sourced by people with disabilities and older adults.
We find that accessibility datasets represent diverse ages, but have gender and race representation gaps.
We hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
arXiv Detail & Related papers (2022-07-16T23:32:19Z)
- Imagining new futures beyond predictive systems in child welfare: A qualitative study with impacted stakeholders [89.6319385008397]
We conducted a set of seven design workshops with 35 stakeholders who have been impacted by the child welfare system.
We found that participants worried current PRMs perpetuate or exacerbate existing problems in child welfare.
Participants suggested new ways to use data and data-driven tools to better support impacted communities.
arXiv Detail & Related papers (2022-05-18T13:49:55Z)
- Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
Face datasets are typically biased in terms of attributes such as gender, age, and race.
We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns.
We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv Detail & Related papers (2020-06-06T21:53:44Z)
- Inclusive GAN: Improving Data and Minority Coverage in Generative Models [101.67587566218928]
We formalize the problem of minority inclusion as one of data coverage.
We then propose to improve data coverage by harmonizing adversarial training with reconstructive generation.
We develop an extension that allows explicit control over the minority subgroups that the model should ensure to include.
arXiv Detail & Related papers (2020-04-07T13:31:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.