BioimageAIpub: a toolbox for AI-ready bioimaging data publishing
- URL: http://arxiv.org/abs/2512.15820v1
- Date: Wed, 17 Dec 2025 15:12:29 GMT
- Title: BioimageAIpub: a toolbox for AI-ready bioimaging data publishing
- Authors: Stefan Dvoretskii, Anwai Archit, Constantin Pape, Josh Moore, Marco Nolden,
- Abstract summary: BioimageAIpub is a workflow that streamlines bioimaging data conversion. It enables a seamless upload to HuggingFace, a widely used platform for sharing datasets and models.
- Score: 1.149497648076115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern bioimage analysis approaches are data hungry, making it necessary for researchers to scavenge data beyond those collected within their (bio)imaging facilities. In addition to scale, bioimaging datasets must be accompanied with suitable, high-quality annotations and metadata. Although established data repositories such as the Image Data Resource (IDR) and BioImage Archive offer rich metadata, their contents typically cannot be directly consumed by image analysis tools without substantial data wrangling. Such a tedious assembly and conversion of (meta)data can account for a dedicated amount of time investment for researchers, hindering the development of more powerful analysis tools. Here, we introduce BioimageAIpub, a workflow that streamlines bioimaging data conversion, enabling a seamless upload to HuggingFace, a widely used platform for sharing machine learning datasets and models.
Related papers
- Flexible metadata harvesting for ecology using large language models [3.4117490081172774]
We develop a large language model (LLM)-based metadata harvester. It flexibly extracts metadata from any dataset's landing page. It converts these to a user-defined, unified format using existing metadata standards.
arXiv Detail & Related papers (2025-08-21T10:10:29Z)
- DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models [48.347550000332866]
DRAGON is a comprehensive dataset comprising images from 25 diffusion models. The dataset contains a broad variety of images representing diverse subjects. DRAGON is designed to support the forensic community in developing and evaluating detection and attribution techniques for synthetic content.
arXiv Detail & Related papers (2025-05-16T13:50:34Z)
- MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize data for training segmentation models for underrepresented modalities. We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- AEye: A Visualization Tool for Image Datasets [18.95453617434051]
AEye is a semantically meaningful visualization tool tailored to image datasets.
AEye embeds images into semantically meaningful high-dimensional representations, facilitating data clustering and organization.
AEye facilitates semantic search functionalities for both text and image queries, enabling users to search for content.
arXiv Detail & Related papers (2024-08-07T20:19:20Z)
- An Innovative Tool for Uploading/Scraping Large Image Datasets on Social Networks [9.27070946719462]
We propose an automated approach by means of a digital tool that we created on purpose.
The tool is capable of automatically uploading an entire image dataset to the desired digital platform and then downloading all the uploaded pictures.
arXiv Detail & Related papers (2023-11-01T23:27:37Z)
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Data privacy protection in microscopic image analysis for material data mining [8.266759895003279]
In this study, a material microstructure image feature extraction algorithm FedTransfer based on data privacy protection is proposed.
The core contributions are as follows: 1) the federated learning algorithm is introduced into the polycrystalline microstructure image segmentation task to make full use of different user data to carry out machine learning, break the data island and improve the model generalization ability under the condition of ensuring the privacy and security of user data.
By sharing style information of images that is not urgent for user confidentiality, it can reduce the performance penalty caused by the distribution difference of data among different users.
arXiv Detail & Related papers (2021-11-09T11:16:33Z)
- Development of Semantic Web-based Imaging Database for Biological Morphome [0.0]
We introduce the RIKEN Microstructural Imaging MetaDatabase.
It is a semantic web-based imaging database in which image metadata are described.
We discuss advanced utilisation of morphological imaging data that can be promoted by this database.
arXiv Detail & Related papers (2021-10-20T15:59:35Z)
- Data Augmentation for Meta-Learning [58.47185740820304]
Meta-learning algorithms sample data, query data, and tasks on each training step.
Data augmentation can be used not only to expand the number of images available per class, but also to generate entirely new classes/tasks.
Our proposed meta-specific data augmentation significantly improves the performance of meta-learners on few-shot classification benchmarks.
arXiv Detail & Related papers (2020-10-14T13:48:22Z)
- From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.