Reduction of Supervision for Biomedical Knowledge Discovery
- URL: http://arxiv.org/abs/2504.09582v1
- Date: Sun, 13 Apr 2025 14:05:40 GMT
- Title: Reduction of Supervision for Biomedical Knowledge Discovery
- Authors: Christos Theodoropoulos, Andrei Catalin Coman, James Henderson, Marie-Francine Moens
- Abstract summary: It is essential to employ automated methods for knowledge extraction and processing. Finding the right balance between the level of supervision and the effectiveness of models poses a significant challenge. Our study addresses the challenge of identifying semantic relationships between biomedical entities in unstructured text.
- Score: 28.68816381566995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge discovery is hindered by the increasing volume of publications and the scarcity of extensive annotated data. To tackle the challenge of information overload, it is essential to employ automated methods for knowledge extraction and processing. Finding the right balance between the level of supervision and the effectiveness of models poses a significant challenge. While supervised techniques generally result in better performance, they have the major drawback of demanding labeled data. This requirement is labor-intensive and time-consuming and hinders scalability when exploring new domains. In this context, our study addresses the challenge of identifying semantic relationships between biomedical entities (e.g., diseases, proteins) in unstructured text while minimizing dependency on supervision. We introduce a suite of unsupervised algorithms based on dependency trees and attention mechanisms and employ a range of pointwise binary classification methods. Transitioning from weakly supervised to fully unsupervised settings, we assess the methods' ability to learn from data with noisy labels. The evaluation on biomedical benchmark datasets explores the effectiveness of the methods. Our approach tackles a central issue in knowledge discovery: balancing performance with minimal supervision. By gradually decreasing supervision, we assess the robustness of pointwise binary classification techniques in handling noisy labels, revealing their capability to shift from weakly supervised to entirely unsupervised scenarios. Comprehensive benchmarking offers insights into the effectiveness of these techniques, suggesting an encouraging direction toward adaptable knowledge discovery systems, representing progress in creating data-efficient methodologies for extracting useful insights when annotated data is limited.
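The abstract does not specify the dependency-tree algorithms, but a common building block for such unsupervised relation signals is the shortest path between two entity heads in a sentence's dependency tree. The sketch below is illustrative only: the sentence, the hand-specified parse edges, and the function name are assumptions, not taken from the paper.

```python
from collections import deque

def shortest_dependency_path(edges, start, end):
    """BFS over an undirected view of a dependency tree to find the
    shortest path between two entity head tokens."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hand-specified (head, dependent) edges for the toy sentence
# "Mutations in BRCA1 increase breast cancer risk."
edges = [("increase", "Mutations"), ("Mutations", "in"), ("in", "BRCA1"),
         ("increase", "risk"), ("risk", "cancer"), ("cancer", "breast")]

path = shortest_dependency_path(edges, "BRCA1", "cancer")
print(path)  # -> ['BRCA1', 'in', 'Mutations', 'increase', 'risk', 'cancer']
```

The tokens along such a path (here, the trigger "increase") are what path-based methods typically use as evidence of a semantic relation between the two entities.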
Related papers
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z) - Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and Classification [9.67209046726903]
We introduce the S4MI pipeline, a novel approach that leverages advancements in self-supervised and semi-supervised learning.
Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks.
Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully-supervised methods while using 50% fewer labels across all datasets.
arXiv Detail & Related papers (2023-11-17T04:04:29Z) - InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems [38.57333125315448]
InstructBio is a semi-supervised learning algorithm to take better advantage of unlabeled examples.
InstructBio substantially improves the generalization ability of molecular models.
arXiv Detail & Related papers (2023-04-08T04:19:22Z) - Label Propagation with Weak Supervision [47.52032178837098]
We introduce a novel analysis of the classical label propagation algorithm (LPA) (Zhu & Ghahramani, 2002)
We provide an error bound that exploits both the local geometric properties of the underlying graph and the quality of the prior information.
We demonstrate the ability of our approach on multiple benchmark weakly supervised classification tasks, showing improvements upon existing semi-supervised and weakly supervised methods.
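As context for the entry above, the classical LPA of Zhu & Ghahramani (2002) can be sketched in a few lines: label distributions diffuse over a row-normalised affinity graph while the labelled nodes stay clamped. The toy chain graph below is an illustration, not the paper's setup.

```python
import numpy as np

def label_propagation(W, labels, mask, iters=100):
    """Classical label propagation (Zhu & Ghahramani, 2002): repeatedly
    average neighbour label distributions over a row-normalised affinity
    matrix, clamping the labelled nodes after every step."""
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    F = labels.astype(float).copy()        # one-hot rows for labelled nodes
    for _ in range(iters):
        F = P @ F
        F[mask] = labels[mask]             # clamp the known labels
    return F.argmax(axis=1)

# Toy chain graph over nodes 0..4: node 0 is labelled class 0, node 4 class 1.
W = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
labels = np.zeros((5, 2))
labels[0, 0] = 1.0
labels[4, 1] = 1.0
mask = np.array([True, False, False, False, True])

preds = label_propagation(W, labels, mask)
print(preds)  # nodes near each labelled seed adopt its class
```

The error bound discussed in the entry refines exactly this picture: how far such diffused labels can drift depends on the graph's local geometry and on the quality of the seed labels.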
arXiv Detail & Related papers (2022-10-07T14:53:02Z) - Unsupervised deep learning techniques for powdery mildew recognition based on multispectral imaging [63.62764375279861]
This paper presents a deep learning approach to automatically recognize powdery mildew on cucumber leaves.
We focus on unsupervised deep learning techniques applied to multispectral imaging data.
We propose the use of autoencoder architectures to investigate two strategies for disease detection.
arXiv Detail & Related papers (2021-12-20T13:29:13Z) - Uncertainty-Aware Deep Co-training for Semi-supervised Medical Image Segmentation [4.935055133266873]
We propose a novel uncertainty-aware scheme to make models learn regions purposefully.
Specifically, we employ Monte Carlo Sampling as an estimation method to attain an uncertainty map.
In the backward process, we jointly optimize unsupervised and supervised losses to accelerate the convergence of the network.
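A minimal sketch of the Monte Carlo uncertainty estimation described above: run several stochastic forward passes and take the entropy of the mean prediction as the uncertainty map. Here a noisy sigmoid over a toy logit map stands in for a dropout-enabled segmentation network; the noise model and map size are assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_uncertainty(forward, x, T=20):
    """Monte Carlo estimate of predictive uncertainty: average T
    stochastic forward passes, then compute the binary entropy of
    the mean foreground probability at every pixel."""
    probs = np.stack([forward(x) for _ in range(T)])      # (T, H, W)
    p = probs.mean(axis=0)                                # mean prediction
    eps = 1e-12                                           # avoid log(0)
    entropy = -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))
    return p, entropy

# Stand-in for a dropout-enabled network: a sigmoid over a toy 4x4 logit
# map, with Gaussian noise playing the role of dropout stochasticity.
logits = np.linspace(-3, 3, 16).reshape(4, 4)
forward = lambda x: 1 / (1 + np.exp(-(x + rng.normal(0, 0.5, x.shape))))

p, u = mc_uncertainty(forward, logits)
# u peaks where the mean prediction sits near 0.5, i.e. ambiguous regions.
```

Such an uncertainty map is what lets the co-training scheme weight or select regions for the models to "learn purposefully".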
arXiv Detail & Related papers (2021-11-23T03:26:24Z) - WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD)
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, achieving performance comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z) - Self-supervised driven consistency training for annotation efficient histopathology image analysis [13.005873872821066]
Training a neural network with a large labeled dataset is still a dominant paradigm in computational histopathology.
We propose a self-supervised pretext task that harnesses the underlying multi-resolution contextual cues in histology whole-slide images to learn a powerful supervisory signal for unsupervised representation learning.
We also propose a new teacher-student semi-supervised consistency paradigm that learns to effectively transfer the pretrained representations to downstream tasks based on prediction consistency with the task-specific unlabeled data.
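Teacher-student consistency schemes like the one above are commonly realised with an exponential-moving-average (EMA) teacher and a consistency loss on unlabeled inputs. The sketch below shows only these two ingredients on a toy linear model; the function names, the linear model, and the random perturbation standing in for a gradient update are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def ema_update(teacher_w, student_w, decay=0.99):
    """Teacher weights track an exponential moving average of the student."""
    return decay * teacher_w + (1 - decay) * student_w

def consistency_loss(student_pred, teacher_pred):
    """Penalise disagreement between student and teacher predictions
    on unlabeled inputs (mean squared error here)."""
    return np.mean((student_pred - teacher_pred) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 3))          # a batch of unlabeled examples
student_w = rng.normal(size=3)       # toy linear "model": prediction = x @ w
teacher_w = student_w.copy()

for step in range(5):
    # Stand-in for a real gradient update on the student.
    student_w = student_w + 0.05 * rng.normal(size=3)
    loss = consistency_loss(x @ student_w, x @ teacher_w)
    teacher_w = ema_update(teacher_w, student_w)
```

In a full pipeline this consistency term is added to the supervised loss, so the unlabeled data regularises the student toward the slowly moving teacher.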
arXiv Detail & Related papers (2021-02-07T19:46:21Z) - Disambiguation of weak supervision with exponential convergence rates [88.99819200562784]
In weakly supervised learning, data are annotated with incomplete yet discriminative information.
In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets.
We propose an empirical disambiguation algorithm to recover full supervision from weak supervision.
arXiv Detail & Related papers (2021-02-04T18:14:32Z) - A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
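A rough sketch of the aggregation described above: entropic-regularised Sinkhorn iterations compute a transport plan between a variable-size input set and a fixed-size reference, and each reference slot then aggregates a weighted average of input elements. The reference here is random rather than trained, and the cost scaling and iteration count are assumptions, not the paper's choices.

```python
import numpy as np

def sinkhorn(K, iters=100):
    """Sinkhorn iterations on a Gibbs kernel K with uniform marginals;
    returns the (approximate) entropic-regularised transport plan."""
    n, m = K.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_aggregate(X, ref):
    """Embed a variable-size set X into a fixed-size representation by
    transporting its elements onto a reference set (trainable in the
    paper, random here)."""
    C = ((X[:, None, :] - ref[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    P = sinkhorn(np.exp(-C / C.mean()))   # mean-scaled cost for stability
    # Each reference slot aggregates a transport-weighted average of inputs.
    return (P.T @ X) / P.sum(axis=0, keepdims=True).T

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))      # a set of 7 elements, dimension 4
ref = rng.normal(size=(3, 4))    # fixed-size reference of 3 slots
Z = ot_aggregate(X, ref)         # fixed-size (3, 4) embedding
```

The paper's observed relationship to attention is visible here: the transport plan plays the role of an attention matrix between the input set and the reference.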
arXiv Detail & Related papers (2020-06-22T08:35:58Z) - Semi-supervised and Unsupervised Methods for Heart Sounds Classification in Restricted Data Environments [4.712158833534046]
This study uses various supervised, semi-supervised and unsupervised approaches on the PhysioNet/CinC 2016 Challenge dataset.
A GAN based semi-supervised method is proposed, which allows the usage of unlabelled data samples to boost the learning of data distribution.
In particular, the unsupervised feature extraction using 1D CNN Autoencoder coupled with one-class SVM obtains good performance without any data labelling.
arXiv Detail & Related papers (2020-06-04T02:07:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.