Unsupervised Cross-Lingual Transfer of Structured Predictors without Source Data
- URL: http://arxiv.org/abs/2110.03866v1
- Date: Fri, 8 Oct 2021 02:46:34 GMT
- Title: Unsupervised Cross-Lingual Transfer of Structured Predictors without Source Data
- Authors: Kemal Kurniawan, Lea Frermann, Philip Schulz and Trevor Cohn
- Abstract summary: We show that the means of aggregating over the input models is critical, and that multiplying marginal probabilities of substructures to obtain high-probability structures for distant supervision is substantially better than taking the union of such structures over the input models.
Testing on 18 languages, we demonstrate that the method works in a cross-lingual setting, considering both dependency parsing and part-of-speech structured prediction problems.
Our analyses show that the proposed method produces less noisy labels for the distant supervision.
- Score: 37.1075911292287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Providing technologies to communities or domains where training data is
scarce or protected, e.g., for privacy reasons, is becoming increasingly
important. To that end, we generalise methods for unsupervised transfer from
multiple input models for structured prediction. We show that the means of
aggregating over the input models is critical, and that multiplying marginal
probabilities of substructures to obtain high-probability structures for
distant supervision is substantially better than taking the union of such
structures over the input models, as done in prior work. Testing on 18
languages, we demonstrate that the method works in a cross-lingual setting,
considering both dependency parsing and part-of-speech structured prediction
problems. Our analyses show that the proposed method produces less noisy labels
for the distant supervision.
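To make the aggregation step concrete, here is a minimal sketch of the multiplicative approach for the POS-tagging case, where the substructures are per-token label marginals. The shapes, names, and argmax decoding below are illustrative assumptions, not the paper's implementation; for dependency parsing the same idea applies to arc marginals, with tree decoding (e.g., maximum spanning tree) in place of the per-token argmax.

```python
import numpy as np

def aggregate_marginals(marginals: np.ndarray) -> np.ndarray:
    """Combine per-token label marginals from several source models.

    marginals: array of shape (n_models, n_tokens, n_labels), where
    marginals[m, t, l] is model m's probability that token t has label l.
    Multiplies marginals across models (by summing log-probabilities)
    and takes the argmax as the silver label, rather than taking the
    union of each model's best structures.
    """
    log_marginals = np.log(marginals + 1e-12)  # avoid log(0)
    combined = log_marginals.sum(axis=0)       # product of marginals
    return combined.argmax(axis=-1)            # highest-probability labels

# Toy example: two source models, three tokens, two labels.
m = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]],   # model 1
    [[0.8, 0.2], [0.3, 0.7], [0.2, 0.8]],   # model 2
])
print(aggregate_marginals(m))  # silver labels for distant supervision
```

The product rewards substructures that all models agree on, which is consistent with the abstract's claim that it yields less noisy distant-supervision labels than the union.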
Related papers
- Causal Unsupervised Semantic Segmentation [60.178274138753174]
Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations.
We propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference.
arXiv Detail & Related papers (2023-10-11T10:54:44Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework that provides trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models [62.23255433487586]
We propose an unsupervised framework for fine-tuning the model or the prompt on unlabeled target data.
We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data.
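As a rough illustration of what aligning predicted distributions on unlabeled data can look like, the sketch below pushes the model's batch-averaged class distribution toward a prompt-derived prior. This is a deliberately simplified stand-in, not POUF's actual objective, and all names and values are hypothetical.

```python
import torch
import torch.nn.functional as F

def alignment_loss(target_logits, prompt_class_prior):
    """KL divergence between a prompt-derived class prior and the
    batch-averaged predicted class distribution on unlabeled target data.
    A simplified stand-in for POUF's distribution-alignment objective."""
    probs = F.softmax(target_logits, dim=-1).mean(dim=0)  # marginal prediction
    return F.kl_div(probs.log(), prompt_class_prior, reduction="sum")

# Hypothetical usage: a batch of 4 unlabeled examples, 3 classes.
logits = torch.randn(4, 3, requires_grad=True)
prior = torch.tensor([0.5, 0.3, 0.2])       # e.g., inferred from the prompts
alignment_loss(logits, prior).backward()    # gradients reach model or prompt
```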
arXiv Detail & Related papers (2023-04-29T22:05:22Z)
- An Operational Perspective to Fairness Interventions: Where and How to Intervene [9.833760837977222]
We present a holistic framework for evaluating and contextualizing fairness interventions.
We demonstrate our framework with a case study on predictive parity.
We find predictive parity is difficult to achieve without using group data.
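For context, predictive parity requires equal positive predictive value (PPV) across groups. A minimal check (variable names are illustrative, not from the paper) makes the group-data dependence in that finding explicit:

```python
import numpy as np

def positive_predictive_value(y_true, y_pred, group, g):
    """PPV for group g: P(Y = 1 | Y_hat = 1, A = g)."""
    mask = (group == g) & (y_pred == 1)
    return y_true[mask].mean() if mask.any() else float("nan")

def predictive_parity_gap(y_true, y_pred, group):
    """Largest pairwise PPV difference across groups; 0 means parity holds."""
    ppvs = [positive_predictive_value(y_true, y_pred, group, g)
            for g in np.unique(group)]
    return max(ppvs) - min(ppvs)

# Toy example with two groups.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 1, 0, 1, 1])
group  = np.array([0, 0, 0, 1, 1, 1])
print(predictive_parity_gap(y_true, y_pred, group))  # ~0.167
```

Note that the criterion cannot even be evaluated, let alone enforced, without the group attribute, which is the crux of the quoted finding.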
arXiv Detail & Related papers (2023-02-03T07:04:33Z)
- Query-Adaptive Predictive Inference with Partial Labels [0.0]
We propose a new methodology to construct predictive sets using only partially labeled data on top of black-box predictive models.
Our experiments highlight the validity of our predictive set construction as well as the attractiveness of a more flexible user-dependent loss framework.
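The summary does not spell out the construction, but a standard split-conformal baseline on top of a black-box classifier gives the flavor of predictive sets; the sketch below is this generic baseline, not the paper's query-adaptive, partially labeled method.

```python
import numpy as np

def conformal_prediction_set(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal prediction set from a black-box classifier.

    cal_probs:  (n_cal, n_classes) predicted probabilities on labeled
                calibration data; cal_labels: (n_cal,) true labels.
    test_probs: (n_classes,) predicted probabilities for one test query.
    Returns labels whose nonconformity score clears the calibrated
    threshold, giving ~(1 - alpha) coverage under exchangeability.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]      # nonconformity
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)    # finite-n correction
    q = np.quantile(scores, level)
    return np.where(1.0 - test_probs <= q)[0]

# Toy example: 5 calibration points, 3 classes.
cal_p = np.array([[.7, .2, .1], [.1, .8, .1], [.3, .4, .3],
                  [.6, .3, .1], [.2, .2, .6]])
cal_y = np.array([0, 1, 1, 0, 2])
print(conformal_prediction_set(cal_p, cal_y, np.array([.5, .3, .2])))  # [0]
```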
arXiv Detail & Related papers (2022-06-15T01:48:42Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
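To give a sense of the blocking idea: when the decoder emits a token that also occurs in the source, its immediate successor in the source can be blocked at the next step, so the output cannot copy the source verbatim. The sketch below is a simplified, deterministic version of this mechanism; the full Dynamic Blocking algorithm is more involved, and the function name and setup here are illustrative.

```python
import math

def dynamic_blocking_mask(source_ids, generated_ids, vocab_size):
    """Simplified Dynamic Blocking: if the last generated token appears
    in the source sequence, block each source successor of that token,
    discouraging verbatim copying of the source surface form.

    Returns an additive logit mask (0.0 or -inf) of length vocab_size.
    """
    mask = [0.0] * vocab_size
    if not generated_ids:
        return mask
    last = generated_ids[-1]
    for i, tok in enumerate(source_ids[:-1]):
        if tok == last:                          # we just copied source token i
            mask[source_ids[i + 1]] = -math.inf  # block its successor
    return mask

# Toy example: source is [1, 2, 3, 4]; we just generated token 2,
# so token 3 is blocked at the next decoding step.
print(dynamic_blocking_mask([1, 2, 3, 4], [5, 2], vocab_size=6))
```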
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Learning Output Embeddings in Structured Prediction [73.99064151691597]
A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension.
A prediction in the original space is computed by solving a pre-image problem.
In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function into the new feature space.
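A minimal sketch of the regress-then-solve-the-pre-image pattern this line of work builds on appears below. It uses a fixed output embedding and a ridge regressor, whereas the paper learns a finite output embedding jointly with the regression; all names and data are illustrative.

```python
import numpy as np

def fit_output_regression(X, Y_emb, reg=1e-3):
    """Ridge regression from input features to output embeddings.
    X: (n, d) inputs; Y_emb: (n, p) embeddings psi(y) of training
    outputs. Returns weights W such that X @ W approximates Y_emb."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y_emb)

def predict_by_preimage(x, W, candidates_emb):
    """Pre-image step: map x into the embedding space, then return the
    index of the candidate structure whose embedding is closest."""
    z = x @ W
    dists = np.linalg.norm(candidates_emb - z, axis=1)
    return int(dists.argmin())

# Toy example: 4 training pairs, 2 candidate output structures.
X = np.array([[1., 0.], [0., 1.], [1., 1.], [2., 0.]])
Y_emb = X.copy()                         # identity embedding for illustration
W = fit_output_regression(X, Y_emb)
cands = np.array([[1., 0.], [0., 1.]])   # embeddings of candidate outputs
print(predict_by_preimage(np.array([0.9, 0.1]), W, cands))  # -> 0
```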
arXiv Detail & Related papers (2020-07-29T09:32:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.