Document Set Expansion with Positive-Unlabeled Learning: A Density
Estimation-based Approach
- URL: http://arxiv.org/abs/2401.11145v1
- Date: Sat, 20 Jan 2024 06:52:14 GMT
- Title: Document Set Expansion with Positive-Unlabeled Learning: A Density
Estimation-based Approach
- Authors: Haiyang Zhang, Qiuyi Chen, Yuanjie Zou, Yushan Pan, Jia Wang, Mark
Stevenson
- Abstract summary: Document set expansion aims to identify relevant documents from a large collection based on a small set of documents that are on a fine-grained topic.
Previous work shows that PU learning is a promising method for this task.
We propose a novel PU learning framework based on density estimation, called puDE, that can handle the above issues.
- Score: 18.923476312831394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document set expansion aims to identify relevant documents from a large
collection based on a small set of documents that are on a fine-grained topic.
Previous work shows that PU learning is a promising method for this task.
However, some serious issues remain unresolved, i.e. typical challenges that PU
methods suffer such as unknown class prior and imbalanced data, and the need
for transductive experimental settings. In this paper, we propose a novel PU
learning framework based on density estimation, called puDE, that can handle
the above issues. The advantage of puDE is that it neither constrained to the
SCAR assumption and nor require any class prior knowledge. We demonstrate the
effectiveness of the proposed method using a series of real-world datasets and
conclude that our method is a better alternative for the DSE task.
Related papers
- Uncertainty-guided Open-Set Source-Free Unsupervised Domain Adaptation with Target-private Class Segregation [22.474866164542302]
UDA approaches commonly assume that source and target domains share the same labels space.
This paper considers the more challenging Source-Free Open-set Domain Adaptation (SF-OSDA) setting.
We propose a novel approach for SF-OSDA that exploits the granularity of target-private categories by segregating their samples into multiple unknown classes.
arXiv Detail & Related papers (2024-04-16T13:52:00Z) - ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous
Driving [96.92499034935466]
End-to-end differentiable learning for autonomous driving has recently become a prominent paradigm.
One main bottleneck lies in its voracious appetite for high-quality labeled data.
We propose a planning-oriented active learning method which progressively annotates part of collected raw data.
arXiv Detail & Related papers (2024-03-05T11:39:07Z) - Contrastive Approach to Prior Free Positive Unlabeled Learning [15.269090018352875]
We propose a novel PU learning framework, that starts by learning a feature space through pretext-invariant representation learning.
Our proposed approach handily outperforms state-of-the-art PU learning methods across several standard PU benchmark datasets.
arXiv Detail & Related papers (2024-02-08T20:20:54Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery [76.63807209414789]
We challenge the status quo in class-iNCD and propose a learning paradigm where class discovery occurs continuously and truly unsupervisedly.
We propose simple baselines, composed of a frozen PTM backbone and a learnable linear classifier, that are not only simple to implement but also resilient under longer learning scenarios.
arXiv Detail & Related papers (2023-03-28T13:47:16Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded
Markov Decision Processes [93.61202366677526]
We study the offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with the finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - OTExtSum: Extractive Text Summarisation with Optimal Transport [45.78604902572955]
We propose a novel non-learning-based method by for the first time formulating text summarisation as an Optimal Transport (OT) problem.
Our proposed method outperforms the state-of-the-art non-learning-based methods and several recent learning-based methods in terms of the ROUGE metric.
arXiv Detail & Related papers (2022-04-21T13:25:34Z) - Variational Learning for Unsupervised Knowledge Grounded Dialogs [6.761874595503588]
Recent methods for knowledge grounded dialogs generate responses by incorporating information from an external textual document.
We develop a variational approach to the above technique wherein, we instead maximize the Evidence Lower bound (ELBO)
To the best of our knowledge we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems.
arXiv Detail & Related papers (2021-11-23T13:41:03Z) - Boosting Weakly Supervised Object Detection via Learning Bounding Box
Adjusters [76.36104006511684]
Weakly-supervised object detection (WSOD) has emerged as an inspiring recent topic to avoid expensive instance-level object annotations.
We defend the problem setting for improving localization performance by leveraging the bounding box regression knowledge from a well-annotated auxiliary dataset.
Our method performs favorably against state-of-the-art WSOD methods and knowledge transfer model with similar problem setting.
arXiv Detail & Related papers (2021-08-03T13:38:20Z) - Positive-Unlabeled Classification under Class-Prior Shift: A
Prior-invariant Approach Based on Density Ratio Estimation [85.75352990739154]
We propose a novel PU classification method based on density ratio estimation.
A notable advantage of our proposed method is that it does not require the class-priors in the training phase.
arXiv Detail & Related papers (2021-07-11T13:36:53Z) - Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.