BI-LAVA: Biocuration with Hierarchical Image Labeling through Active
Learning and Visual Analysis
- URL: http://arxiv.org/abs/2308.08003v1
- Date: Tue, 15 Aug 2023 19:36:19 GMT
- Title: BI-LAVA: Biocuration with Hierarchical Image Labeling through Active
Learning and Visual Analysis
- Authors: Juan Trelles and Andrew Wentzel and William Berrios and G. Elisabeta
Marai
- Abstract summary: BI-LAVA is a system for organizing scientific images in hierarchical structures.
It uses a small set of image labels, a hierarchical set of image classifiers, and active learning to help model builders deal with incomplete ground-truth labels.
An evaluation shows that our mixed human-machine approach successfully supports domain experts in understanding the characteristics of classes within the taxonomy.
- Score: 2.859324824091085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the biomedical domain, taxonomies organize the acquisition modalities of
scientific images in hierarchical structures. Such taxonomies leverage large
sets of correct image labels and provide essential information about the
importance of a scientific publication, which could then be used in biocuration
tasks. However, the hierarchical nature of the labels, the overhead of
processing images, the absence or incompleteness of labeled data, and the
expertise required to label this type of data impede the creation of useful
datasets for biocuration. From a multi-year collaboration with biocurators and
text-mining researchers, we derive an iterative visual analytics and active
learning strategy to address these challenges. We implement this strategy in a
system called BI-LAVA (Biocuration with Hierarchical Image Labeling through
Active Learning and Visual Analysis). BI-LAVA leverages a small set of image
labels, a hierarchical set of image classifiers, and active learning to help
model builders deal with incomplete ground-truth labels, target a hierarchical
taxonomy of image modalities, and classify a large pool of unlabeled images.
BI-LAVA's front end uses custom encodings to represent data distributions,
taxonomies, image projections, and neighborhoods of image thumbnails, which
help model builders explore an unfamiliar image dataset and taxonomy and
correct and generate labels. An evaluation with machine learning practitioners
shows that our mixed human-machine approach successfully supports domain
experts in understanding the characteristics of classes within the taxonomy, as
well as validating and improving data quality in labeled and unlabeled
collections.
Related papers
- A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect
Dataset [18.211840156134784]
This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment.
The dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community.
arXiv Detail & Related papers (2023-07-19T20:54:08Z)
- Graph Attention Transformer Network for Multi-Label Image Classification [50.0297353509294]
We propose a general framework for multi-label image classification that can effectively mine complex inter-label relationships.
Our proposed methods can achieve state-of-the-art performance on three datasets.
arXiv Detail & Related papers (2022-03-08T12:39:05Z)
- Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph [96.95815946327079]
It is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities.
We propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities.
arXiv Detail & Related papers (2021-07-26T05:50:41Z)
- Self-Ensembling Contrastive Learning for Semi-Supervised Medical Image
Segmentation [6.889911520730388]
We aim to boost the performance of semi-supervised learning for medical image segmentation with limited labels.
We learn latent representations directly at feature-level by imposing contrastive loss on unlabeled images.
We conduct experiments on an MRI and a CT segmentation dataset and demonstrate that the proposed method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-05-27T03:27:58Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning
and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer [140.72439827136085]
We propose a graph reasoning and transfer learning framework named "Graphonomy".
It incorporates human knowledge and label taxonomy into the intermediate graph representation learning beyond local convolutions.
It learns the global and structured semantic coherency in multiple domains via semantic-aware graph reasoning and transfer.
arXiv Detail & Related papers (2021-01-26T08:19:03Z)
- Knowledge-Guided Multi-Label Few-Shot Learning for General Image
Recognition [75.44233392355711]
The KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces the label semantics to guide learning semantic-specific features.
It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
- Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
- Learning Representations For Images With Hierarchical Labels [1.3579420996461438]
We present a set of methods to leverage information about the semantic hierarchy induced by class labels.
We show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
Both the CNN classifiers injected with hierarchical information and the embedding-based models outperform a hierarchy-agnostic model on the newly presented, real-world ETH Entomological Collection image dataset.
arXiv Detail & Related papers (2020-04-02T09:56:03Z)
- Collaborative Learning of Semi-Supervised Clustering and Classification
for Labeling Uncurated Data [6.871887763122593]
Domain-specific image collections present potential value in various areas of science and business.
To employ contemporary supervised image analysis methods on such image data, they must first be cleaned and organized, and then manually labeled for the nomenclature employed in the specific domain.
We designed and implemented the Plud system to minimize the effort spent by an expert and to handle realistic large collections of images.
arXiv Detail & Related papers (2020-03-09T17:03:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.