A Data Fusion Framework for Multi-Domain Morality Learning
- URL: http://arxiv.org/abs/2304.02144v1
- Date: Tue, 4 Apr 2023 22:05:02 GMT
- Title: A Data Fusion Framework for Multi-Domain Morality Learning
- Authors: Siyi Guo, Negar Mokhberian, Kristina Lerman
- Abstract summary: We describe a data fusion framework for training on multiple heterogeneous datasets.
The proposed framework achieves state-of-the-art performance on multiple datasets compared to prior work in morality inference.
- Score: 3.0671872389903547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models can be trained to recognize the moral sentiment of text,
creating new opportunities to study the role of morality in human life. As
interest in language and morality has grown, several ground truth datasets with
moral annotations have been released. However, these datasets vary in the
method of data collection, domain, topics, instructions for annotators, etc.
Simply aggregating such heterogeneous datasets during training can yield models
that fail to generalize well. We describe a data fusion framework for training
on multiple heterogeneous datasets that improves performance and
generalizability. The model uses domain adversarial training to align the
datasets in feature space and a weighted loss function to deal with label
shift. We show that the proposed framework achieves state-of-the-art
performance on multiple datasets compared to prior work in morality
inference.
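The two mechanisms the abstract names, gradient reversal for domain-adversarial feature alignment and a class-weighted loss for label shift, can be sketched in a minimal NumPy training step. This is an illustrative sketch only, not the authors' implementation: the linear encoder, binary task and domain heads, the function name `dann_step`, and the `class_weights` vector are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dann_step(x, y_task, y_dom, w_enc, w_task, w_dom,
              class_weights, lam=1.0, lr=0.1):
    """One manual SGD step of domain-adversarial training with a
    weighted task loss. Returns updated (w_enc, w_task, w_dom)."""
    n = x.shape[0]
    h = x @ w_enc                  # shared features, shape (n, k)
    p_task = sigmoid(h @ w_task)   # task probabilities, shape (n,)
    p_dom = sigmoid(h @ w_dom)     # domain probabilities, shape (n,)

    # Per-sample weights on the task loss counteract label shift:
    # labels that are rare in the pooled data can be upweighted so
    # naive aggregation does not bias the model.
    w = class_weights[y_task]

    # Gradients of the (weighted) binary cross-entropy w.r.t. the logits.
    g_task = w * (p_task - y_task) / n
    g_dom = (p_dom - y_dom) / n

    # Both heads are updated normally (each minimizes its own loss).
    dw_task = h.T @ g_task
    dw_dom = h.T @ g_dom

    # Gradient reversal: the domain gradient reaching the encoder is
    # flipped and scaled by -lam, so the encoder learns features the
    # domain classifier cannot separate (dataset alignment in feature space).
    dh = np.outer(g_task, w_task) - lam * np.outer(g_dom, w_dom)
    dw_enc = x.T @ dh

    return (w_enc - lr * dw_enc,
            w_task - lr * dw_task,
            w_dom - lr * dw_dom)
```

Note the single sign flip: the domain head itself descends its own loss, while the encoder ascends it, which is the adversarial alignment the abstract refers to.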
Related papers
- Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs [48.406728896785296]
We propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks.
Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation.
arXiv Detail & Related papers (2024-07-15T08:42:10Z)
- Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation [66.6546668043249]
ALIA (Automated Language-guided Image Augmentation) is a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains.
To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information.
We show that ALIA surpasses traditional data augmentation and text-to-image generated data on fine-grained classification tasks.
arXiv Detail & Related papers (2023-05-25T17:43:05Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Cross-Domain Generalization and Knowledge Transfer in Transformers Trained on Legal Data [0.0]
We analyze the ability of pre-trained language models to transfer knowledge among datasets annotated with different type systems.
Prediction of the rhetorical role a sentence plays in a case decision is an important and often studied task in AI & Law.
arXiv Detail & Related papers (2021-12-15T04:23:14Z)
- Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data [85.43008636875345]
We show that diverse representation in training data is key to improving subgroup performance and achieving population-level objectives.
Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design.
arXiv Detail & Related papers (2021-03-05T00:27:08Z)
- Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation [14.92157586545743]
This paper presents a number of techniques for making models more robust in the domain of causal reasoning.
We show a statistically significant improvement in performance on both datasets, even with only a small number of additionally generated data points.
arXiv Detail & Related papers (2021-01-13T09:55:29Z)
- A Note on Data Biases in Generative Models [16.86600007830682]
We investigate the impact of dataset quality on the performance of generative models.
We show how societal biases of datasets are replicated by generative models.
We present creative applications through unpaired transfer between diverse datasets such as photographs, oil portraits, and anime drawings.
arXiv Detail & Related papers (2020-12-04T10:46:37Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.