Related papers: FIG: Forward-Inverse Generation for Low-Resource Domain-specific Event Detection

URL: http://arxiv.org/abs/2502.17394v1
Date: Mon, 24 Feb 2025 18:20:42 GMT
Title: FIG: Forward-Inverse Generation for Low-Resource Domain-specific Event Detection
Authors: Tanmay Parekh, Yuxuan Dong, Lucas Bandarkar, Artin Kim, I-Hung Hsu, Kai-Wei Chang, Nanyun Peng
Abstract summary: Event Detection (ED) is the task of identifying typed event mentions of interest from natural language text. We introduce FIG, a hybrid approach that leverages inverse generation for high-quality data synthesis. Experimentation on three ED datasets from diverse domains reveals that FIG outperforms the best baseline.
Score: 84.82139313614255
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Event Detection (ED) is the task of identifying typed event mentions of interest from natural language text, which benefits domain-specific reasoning in biomedical, legal, and epidemiological domains. However, procuring supervised data for thousands of events for various domains is a laborious and expensive task. To this end, existing works have explored synthetic data generation via forward (generating labels for unlabeled sentences) and inverse (generating sentences from generated labels) generations. However, forward generation often produces noisy labels, while inverse generation struggles with domain drift and incomplete event annotations. To address these challenges, we introduce FIG, a hybrid approach that leverages inverse generation for high-quality data synthesis while anchoring it to domain-specific cues extracted via forward generation on unlabeled target data. FIG further enhances its synthetic data by adding missing annotations through forward generation-based refinement. Experimentation on three ED datasets from diverse domains reveals that FIG outperforms the best baseline achieving average gains of 3.3% F1 and 5.4% F1 in the zero-shot and few-shot settings respectively. Analyzing the generated trigger hit rate and human evaluation substantiates FIG's superior domain alignment and data quality compared to existing baselines.

Related papers

TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text [11.417612899344697]
Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. Existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines. We propose TechniqueRAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs.
arXiv Detail & Related papers (2025-05-17T12:46:10Z)
Concept-Aware LoRA for Domain-Aligned Segmentation Dataset Generation [66.66243874361103]
Dataset generation faces two key challenges: 1) aligning generated samples with the target domain and 2) producing informative samples beyond the training data. We propose Concept-Aware LoRA, a novel fine-tuning approach that selectively identifies and updates only the weights associated with necessary concepts for domain alignment. We demonstrate its effectiveness in generating datasets for urban-scene segmentation, outperforming baseline and state-of-the-art methods in in-domain settings.
arXiv Detail & Related papers (2025-03-28T06:23:29Z)
Data-Efficient CLIP-Powered Dual-Branch Networks for Source-Free Unsupervised Domain Adaptation [4.7589762171821715]
Source-free Unsupervised Domain Adaptation (SF-UDA) aims to transfer a model's performance from a labeled source domain to an unlabeled target domain without direct access to source samples. We introduce a data-efficient, CLIP-powered dual-branch network (CDBN) to address the dual challenges of limited source data and privacy concerns. CDBN achieves near state-of-the-art performance with far fewer source domain samples than existing methods across 31 transfer tasks on seven datasets.
arXiv Detail & Related papers (2024-10-21T09:25:49Z)
Gradual Source Domain Expansion for Unsupervised Domain Adaptation [45.207132297204424]
Unsupervised domain adaptation (UDA) tries to overcome the need for a large labeled dataset by transferring knowledge from a source dataset to a target dataset. We propose a gradual source domain expansion (GSDE) algorithm to overcome this problem. GSDE trains the UDA task several times from scratch, each time reinitializing the network weights, but each time expands the source dataset with target data.
arXiv Detail & Related papers (2023-11-16T06:18:35Z)
Multi-scale Feature Alignment for Continual Learning of Unlabeled Domains [3.9498537297431167]
Generative feature-driven image replay in conjunction with a dual-purpose discriminator enables the generation of images with realistic features for replay. We present detailed ablation experiments studying our proposed method components and demonstrate a possible use-case of our continual UDA method for an unsupervised patch-based segmentation task.
arXiv Detail & Related papers (2023-02-02T18:19:01Z)
Cyclically Disentangled Feature Translation for Face Anti-spoofing [61.70377630461084]
We propose a novel domain adaptation method called cyclically disentangled feature translation network (CDFTN). CDFTN generates pseudo-labeled samples that possess: 1) source domain-invariant liveness features and 2) target domain-specific content features, which are disentangled through domain adversarial training. A robust classifier is trained based on the synthetic pseudo-labeled images under the supervision of source domain labels.
arXiv Detail & Related papers (2022-12-07T14:12:34Z)
Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives [16.68091981866261]
Unsupervised domain adaptation (UDA) is proposed to counter the performance drop on data in a target domain. UDA has yielded promising results on natural image processing, video analysis, natural language processing, time-series data analysis, medical image analysis, etc.
arXiv Detail & Related papers (2022-08-15T20:05:07Z)
Domain-Agnostic Prior for Transfer Semantic Segmentation [197.9378107222422]
Unsupervised domain adaptation (UDA) is an important topic in the computer vision community. We present a mechanism that regularizes cross-domain representation learning with a domain-agnostic prior (DAP). Our research reveals that UDA benefits much from better proxies, possibly from other data modalities.
arXiv Detail & Related papers (2022-04-06T09:13:25Z)
Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance. Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models. In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency [90.71745178767203]
Deep learning-based 3D object detection has achieved unprecedented success with the advent of large-scale autonomous driving datasets. Existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world. We study a more realistic setting, unsupervised 3D domain adaptive detection, which only utilizes source domain annotations.
arXiv Detail & Related papers (2021-07-23T17:19:23Z)
A Curriculum-style Self-training Approach for Source-Free Semantic Segmentation [91.13472029666312]
We propose a curriculum-style self-training approach for source-free domain adaptive semantic segmentation. Our method yields state-of-the-art performance on source-free semantic segmentation tasks for both synthetic-to-real and adverse conditions.
arXiv Detail & Related papers (2021-06-22T10:21:39Z)
Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification [87.72851934197936]
Unsupervised domain adaptive (UDA) person re-identification (ReID) aims to transfer the knowledge from the labeled source domain to the unlabeled target domain for person matching. One challenge is how to generate target domain samples with reliable labels for training. We propose a Disentanglement-based Cross-Domain Feature Augmentation strategy.
arXiv Detail & Related papers (2021-03-25T15:28:41Z)
Generation for adaption: a Gan-based approach for 3D Domain Adaption in Point Cloud [10.614067060304919]
Unsupervised domain adaptation (UDA) seeks to overcome such a problem without target domain labels. We propose a method that uses a generative adversarial network to generate synthetic data from the source domain. Experiments show that our approach performs better than other state-of-the-art UDA methods on three popular 3D object/scene datasets.
arXiv Detail & Related papers (2021-02-15T07:24:10Z)
Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources [68.31273535702256]
We propose a novel instance-level MDA framework, named curriculum cycle-consistent generative adversarial network (C-CycleGAN). C-CycleGAN consists of three components: (1) a pre-trained text encoder which encodes textual input from different domains into a continuous representation space, (2) an intermediate domain generator with curriculum instance-level adaptation which bridges the gap across source and target domains, and (3) a task classifier trained on the intermediate domain for final sentiment classification. We conduct extensive experiments on three benchmark datasets and achieve substantial gains over state-of-the-art DA approaches.
arXiv Detail & Related papers (2020-11-17T14:50:55Z)
Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG). It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains. Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
Unsupervised Domain Adaptation for Person Re-Identification through Source-Guided Pseudo-Labeling [2.449909275410288]
Person Re-Identification (re-ID) aims at retrieving images of the same person taken by different cameras. Unsupervised Domain Adaptation (UDA) is an interesting research direction for this challenge as it avoids a costly annotation of the target data. We introduce a framework which relies on a two-branch architecture optimizing classification and triplet loss based metric learning in source and target domains.
arXiv Detail & Related papers (2020-09-20T14:54:42Z)
Inductive Unsupervised Domain Adaptation for Few-Shot Classification via Clustering [16.39667909141402]
Few-shot classification tends to struggle when it needs to adapt to diverse domains. We introduce a framework, DaFeC, to improve Domain adaptation performance for Few-shot classification via Clustering. Our approach outperforms previous work with absolute gains (in classification accuracy) of 4.95%, 9.55%, 3.99% and 11.62%, respectively.
arXiv Detail & Related papers (2020-06-23T08:17:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.