Systematic Investigation of Strategies Tailored for Low-Resource
Settings for Sanskrit Dependency Parsing
- URL: http://arxiv.org/abs/2201.11374v1
- Date: Thu, 27 Jan 2022 08:24:53 GMT
- Title: Systematic Investigation of Strategies Tailored for Low-Resource
Settings for Sanskrit Dependency Parsing
- Authors: Jivnesh Sandhan, Laxmidhar Behera and Pawan Goyal
- Abstract summary: Existing state-of-the-art approaches for Sanskrit Dependency Parsing (SDP) are hybrid in nature.
Purely data-driven approaches do not match the performance of hybrid approaches due to labelled data sparsity.
We experiment with five strategies, namely, data augmentation, sequential transfer learning, cross-lingual/mono-lingual pretraining, multi-task learning and self-training.
Our proposed ensembled system outperforms the purely data-driven state-of-the-art system by 2.8/3.9 points (Unlabelled Attachment Score (UAS)/Labelled Attachment Score (LAS)) absolute gain.
- Score: 14.416855042499945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing state-of-the-art approaches for Sanskrit Dependency Parsing (SDP)
are hybrid in nature, and rely on a lexicon-driven shallow parser for
linguistically motivated feature engineering. However, these methods fail to
handle out-of-vocabulary (OOV) words, which limits their applicability in
realistic scenarios. On the other hand, purely data-driven approaches do not
match the performance of hybrid approaches due to labelled data sparsity.
Thus, in this work, we investigate the following question: How far can we push
a purely data-driven approach using recently proposed strategies for
low-resource settings? We experiment with five strategies, namely, data
augmentation, sequential transfer learning, cross-lingual/mono-lingual
pretraining, multi-task learning and self-training. Our proposed ensembled
system outperforms the purely data-driven state-of-the-art system by 2.8/3.9
points (Unlabelled Attachment Score (UAS)/Labelled Attachment Score (LAS))
absolute gain. Interestingly, it also surpasses the performance of the
state-of-the-art hybrid system by 1.2 points (UAS) absolute gain and shows
comparable performance in terms of LAS. Code and data will be publicly
available at: https://github.com/Jivnesh/SanDP.
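The five strategies are combined into an ensemble, which yields the reported gains. As a rough illustration only (a minimal sketch, not the authors' ensembling method; the toy data and function names are hypothetical), a majority-vote ensemble over per-token head/label predictions, together with the UAS/LAS metrics the abstract cites, can look like this:

```python
# Hypothetical sketch of vote-based ensembling for dependency parsing:
# each model votes for a (head, label) pair per token, the majority wins.
from collections import Counter

def ensemble_heads(predictions):
    """predictions: list of per-model outputs, each a list of
    (head_index, relation_label) tuples, one tuple per token."""
    n_tokens = len(predictions[0])
    combined = []
    for i in range(n_tokens):
        votes = Counter(model[i] for model in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

def attachment_scores(pred, gold):
    """UAS counts correct heads; LAS also requires the correct label."""
    uas = sum(p[0] == g[0] for p, g in zip(pred, gold)) / len(gold)
    las = sum(p == g for p, g in zip(pred, gold)) / len(gold)
    return uas, las

# Toy example: three models parsing a 3-token sentence.
model_outputs = [
    [(2, "nsubj"), (0, "root"), (2, "obj")],
    [(2, "nsubj"), (0, "root"), (1, "obj")],
    [(2, "nmod"),  (0, "root"), (2, "obj")],
]
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = ensemble_heads(model_outputs)
print(attachment_scores(pred, gold))  # (1.0, 1.0)
```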
Related papers
- Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing [6.074150063191985]
Cross-Lingual Back-Parsing is a novel data augmentation methodology designed to enhance cross-lingual transfer for semantic parsing.
Our methodology effectively performs cross-lingual data augmentation in challenging zero-resource settings.
arXiv Detail & Related papers (2024-10-01T08:53:38Z)
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
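The weighting idea in the co-training paper above is easy to illustrate. Below is a minimal sketch that scales each example's loss by an importance weight; treating the weights as given confidences is an assumption here, while the paper derives them from training dynamics:

```python
# Hedged sketch of importance-weighted training on distantly supervised
# labels: dubious automatic labels are down-weighted, not discarded.
import torch
import torch.nn.functional as F

def weighted_loss(logits, noisy_labels, weights):
    """Per-example cross-entropy scaled by an importance weight in [0, 1]."""
    per_example = F.cross_entropy(logits, noisy_labels, reduction="none")
    return (weights * per_example).sum() / weights.sum()

# Toy usage: 4 examples, 3 classes; in the paper the weights would come
# from how consistently each label was learned, not from fixed numbers.
logits = torch.randn(4, 3)
noisy_labels = torch.tensor([0, 2, 1, 1])
weights = torch.tensor([0.9, 0.2, 0.7, 1.0])
print(weighted_loss(logits, noisy_labels, weights))
```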
- Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or new tasks using a few samples as demonstrations within the prompt.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach, "vocabulary-defined semantics".
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
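As a loose illustration of working in a latent space to stabilise demonstration choice (a generic clustering sketch, not the paper's vocabulary-defined semantics procedure):

```python
# Cluster candidate examples' embeddings and take the example nearest each
# centroid as a demonstration: diverse, representative prompts by design.
import numpy as np
from sklearn.cluster import KMeans

def pick_demonstrations(embeddings, k):
    """embeddings: (n, d) array of candidate-example vectors."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    demos = []
    for c in range(k):
        dists = np.linalg.norm(embeddings - km.cluster_centers_[c], axis=1)
        demos.append(int(np.argmin(dists)))
    return demos  # indices of k demonstrations to place in the prompt

rng = np.random.default_rng(0)
candidates = rng.normal(size=(100, 16))  # stand-in for sentence embeddings
print(pick_demonstrations(candidates, k=4))
```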
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
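The optimal-transport objective in the paper above can be sketched with a textbook entropic Sinkhorn routine (an illustration of the idea, not the paper's implementation):

```python
# Measure (and, during training, minimize) the entropic-OT divergence
# between two sets of latent vectors with uniform marginals.
import numpy as np

def sinkhorn_cost(X, Y, eps=0.5, n_iters=200):
    """Entropic-regularized OT cost between point clouds X (n,d), Y (m,d)."""
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) ** 2
    K = np.exp(-C / eps)
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    u = np.ones_like(a)
    for _ in range(n_iters):          # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return float((P * C).sum())

src = np.random.default_rng(0).normal(size=(8, 4))  # e.g., English latents
tgt = np.random.default_rng(1).normal(size=(8, 4))  # e.g., low-resource latents
print(sinkhorn_cost(src, tgt))
```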
- Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information [100.03188187735624]
We introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model.
Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents.
Our method is thus able to leverage the expressive power of large language models to produce diverse training data.
arXiv Detail & Related papers (2023-02-10T07:37:49Z)
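The PVI criterion itself is compact. A hedged sketch, using the standard definition PVI(x → y) = −log p_∅(y) + log p_x(y), where p_x comes from a model finetuned with inputs and p_∅ from a model finetuned with the input withheld (the scoring functions below are placeholders for those models):

```python
# Keep a synthesized utterance only if it makes its intent label more
# predictable than the label prior alone.
import math

def pvi(log_p_given_x, log_p_given_null):
    """Positive PVI: the input x genuinely helps predict the label y."""
    return (log_p_given_x - log_p_given_null) / math.log(2)  # in bits

def keep_synthetic_example(log_p_given_x, log_p_given_null, threshold=0.0):
    return pvi(log_p_given_x, log_p_given_null) > threshold

# Toy numbers: the model assigns the intent probability 0.9 given the
# utterance vs. 0.2 from the empty input, so the example carries signal.
print(pvi(math.log(0.9), math.log(0.2)))                      # ~2.17 bits
print(keep_synthetic_example(math.log(0.9), math.log(0.2)))   # True
```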
- SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations [2.535399238341164]
End-to-end Speech Translation is hindered by a lack of available data resources.
We propose a new data augmentation strategy, SegAugment, to address this issue.
We show that the proposed method can also successfully augment sentence-level datasets.
arXiv Detail & Related papers (2022-12-19T18:29:31Z)
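A toy sketch of segmentation-based augmentation (assumed mechanics, not the SegAugment system): merging consecutive aligned sentences under different length budgets yields new training pairs from the same document.

```python
# Re-segment a document of aligned sentence pairs into alternative
# granularities, producing extra source/target training pairs.
def resegment(pairs, max_src_words):
    """pairs: list of (src_sentence, tgt_sentence) strings."""
    out, src_buf, tgt_buf, n = [], [], [], 0
    for src, tgt in pairs:
        words = len(src.split())
        if src_buf and n + words > max_src_words:
            out.append((" ".join(src_buf), " ".join(tgt_buf)))
            src_buf, tgt_buf, n = [], [], 0
        src_buf.append(src); tgt_buf.append(tgt); n += words
    if src_buf:
        out.append((" ".join(src_buf), " ".join(tgt_buf)))
    return out

doc = [("hello there", "hallo"), ("how are you", "wie geht es"),
       ("fine thanks", "gut danke")]
# Different budgets give different segmentations of the same document.
print(resegment(doc, max_src_words=5))
print(resegment(doc, max_src_words=10))
```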
- A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is an increasingly essential task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
arXiv Detail & Related papers (2022-05-31T05:54:18Z)
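A minimal supervised contrastive loss conveys the core of such frameworks (single-level only; MultiSCL's multi-level design is not reproduced here):

```python
# Embeddings with the same label are pulled together; all others are
# pushed apart, relative to a temperature-scaled cosine similarity.
import torch
import torch.nn.functional as F

def sup_con_loss(embeddings, labels, temperature=0.1):
    z = F.normalize(embeddings, dim=1)
    sim = (z @ z.T) / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))   # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # Average log-probability over each anchor's same-label positives.
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    return -(pos_log_prob / pos_mask.sum(1).clamp(min=1)).mean()

emb = torch.randn(6, 8)
labels = torch.tensor([0, 0, 1, 1, 2, 2])  # e.g., three NLI classes
print(sup_con_loss(emb, labels))
```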
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
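The self-training recipe these papers build on can be sketched generically (the confidence threshold and the stub model below are illustrative assumptions, not SFLM's prompt-based components):

```python
# Train on labelled data, pseudo-label unlabelled text, keep only
# confident predictions, and retrain on the enlarged set.
def self_train(train_fn, predict_fn, labelled, unlabelled,
               rounds=3, threshold=0.9):
    """train_fn(data) -> model; predict_fn(model, x) -> (label, confidence)."""
    data = list(labelled)
    model = train_fn(data)
    for _ in range(rounds):
        pseudo = []
        for x in unlabelled:
            label, conf = predict_fn(model, x)
            if conf >= threshold:          # confidence filter
                pseudo.append((x, label))
        if not pseudo:
            break
        model = train_fn(data + pseudo)    # retrain on the enlarged set
    return model

# Toy stubs so the sketch runs end to end: the "model" is a lookup table.
train_fn = lambda data: {x: y for x, y in data}
predict_fn = lambda m, x: (m.get(x, "NOUN"), 1.0 if x in m else 0.5)
print(self_train(train_fn, predict_fn,
                 [("gacchati", "VERB")], ["gacchati", "vanam"]))
```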
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
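A one-layer graph convolution over parse arcs illustrates how such graph encoders inject structure into finetuning (hypothetical sizes, not the paper's architecture):

```python
# Token vectors are updated by averaging over their neighbours in the
# semantic/dependency graph, then projected and passed through a ReLU.
import torch

def graph_conv(H, edges, W):
    """H: (n, d) token states; edges: (head, dependent) index pairs;
    W: (d, d) learned projection. One mean-aggregation GCN layer."""
    n = H.size(0)
    A = torch.eye(n)                      # self-loops
    for h, d in edges:
        A[h, d] = A[d, h] = 1.0           # undirected arc between head and dep
    A = A / A.sum(dim=1, keepdim=True)    # row-normalised adjacency
    return torch.relu(A @ H @ W)

tokens = torch.randn(4, 8)                # 4 tokens, 8-dim states
arcs = [(1, 0), (1, 2), (2, 3)]           # a tiny parse tree
W = torch.randn(8, 8) * 0.1
print(graph_conv(tokens, arcs, W).shape)  # torch.Size([4, 8])
```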
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.