Large-scale investigation of weakly-supervised deep learning for the
fine-grained semantic indexing of biomedical literature
- URL: http://arxiv.org/abs/2301.09350v2
- Date: Thu, 5 Oct 2023 14:17:39 GMT
- Title: Large-scale investigation of weakly-supervised deep learning for the
fine-grained semantic indexing of biomedical literature
- Authors: Anastasios Nentidis, Thomas Chatzopoulos, Anastasia Krithara,
Grigorios Tsoumakas, Georgios Paliouras
- Abstract summary: This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts.
The new method is evaluated on a large-scale retrospective scenario, based on concepts promoted to descriptors.
- Score: 7.171698704686836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objective: Semantic indexing of biomedical literature is usually done at the
level of MeSH descriptors with several related but distinct biomedical concepts
often grouped together and treated as a single topic. This study proposes a new
method for the automated refinement of subject annotations at the level of MeSH
concepts. Methods: Lacking labelled data, we rely on weak supervision based on
concept occurrence in the abstract of an article, which is also enhanced by
dictionary-based heuristics. In addition, we investigate deep learning
approaches, making design choices to tackle the particular challenges of this
task. The new method is evaluated on a large-scale retrospective scenario,
based on concepts that have been promoted to descriptors. Results: In our
experiments, concept occurrence was the strongest heuristic, achieving a macro-F1
score of about 0.63 across several labels. The proposed method improved on it
by more than 4 percentage points. Conclusion: The results suggest that concept
occurrence is a strong heuristic for refining the coarse-grained labels at the
level of MeSH concepts and the proposed method improves it further.
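The concept-occurrence heuristic and the macro-F1 metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy concept labels, and the whole-phrase matching rule are assumptions made for illustration, and the abstract does not specify how occurrence is detected or which dictionary-based heuristics are applied.

```python
import re
from typing import Dict, List


def occurrence_labels(abstract: str, concept_terms: Dict[str, List[str]]) -> Dict[str, int]:
    """Weak labels: 1 if any term of a concept occurs in the abstract, else 0.

    Matching is case-insensitive on whole phrases (an assumed rule).
    """
    text = abstract.lower()
    labels = {}
    for concept, terms in concept_terms.items():
        hit = any(
            re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
            for term in terms
        )
        labels[concept] = int(hit)
    return labels


def f1(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for one label; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Dict[str, List[int]], pred: Dict[str, List[int]]) -> float:
    """Unweighted mean of per-label F1 scores."""
    return sum(f1(gold[c], pred[c]) for c in gold) / len(gold)


# Hypothetical concepts and terms, not taken from MeSH.
toy_terms = {
    "concept_a": ["alpha protein", "protein a"],
    "concept_b": ["beta cell"],
}
weak = occurrence_labels("We study Alpha protein in mice.", toy_terms)
```

Because macro-F1 averages the per-label scores without weighting, a rare concept counts as much as a frequent one, which matters when many fine-grained labels have few positive articles.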
Related papers
- NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark, NovoBench, for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and π-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z)
- Entangled Relations: Leveraging NLI and Meta-analysis to Enhance Biomedical Relation Extraction [35.320291731292286]
We introduce MetaEntail-RE, a novel adaptation method that harnesses NLI principles to enhance relation extraction.
Our approach follows past works by verbalizing relation classes into class-indicative hypotheses.
Our experimental results underscore the versatility of MetaEntail-RE, demonstrating performance gains across both biomedical and general domains.
arXiv Detail & Related papers (2024-05-31T23:05:04Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with few labelled data.
The proposed approach comprises a fusion of a segmentation network that acts as an attention module and classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
- A reproducible experimental survey on biomedical sentence similarity: a string-based method sets the state of the art [0.0]
This report introduces the largest and, for the first time, reproducible experimental survey on biomedical sentence similarity.
Our aim is to elucidate the state of the art of the problem and to solve some problems preventing the evaluation of most current methods.
Our experiments confirm that the pre-processing stages, and the choice of the NER tool, have a significant impact on the performance of the sentence similarity methods.
arXiv Detail & Related papers (2022-05-18T06:20:42Z)
- Self-Supervised Detection of Contextual Synonyms in a Multi-Class Setting: Phenotype Annotation Use Case [11.912581294872767]
Contextualised word embeddings are a powerful tool for detecting contextual synonyms.
We propose a self-supervised pre-training approach that is able to detect contextual synonyms of concepts by being trained on data created by shallow matching.
arXiv Detail & Related papers (2021-09-04T21:35:01Z)
- Exemplar Auditing for Multi-Label Biomedical Text Classification [0.4873362301533824]
We generalize a recently proposed zero-shot sequence labeling method, "supervised labeling via a convolutional decomposition".
The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors.
Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.
arXiv Detail & Related papers (2020-04-07T02:54:20Z)
- Panoptic Feature Fusion Net: A Novel Instance Segmentation Paradigm for Biomedical and Biological Images [91.41909587856104]
We present a Panoptic Feature Fusion Net (PFFNet) that unifies the semantic and instance features in this work.
Our proposed PFFNet contains a residual attention feature fusion mechanism to incorporate the instance prediction with the semantic features.
It outperforms several state-of-the-art methods on various biomedical and biological datasets.
arXiv Detail & Related papers (2020-02-15T09:19:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.