BioFors: A Large Biomedical Image Forensics Dataset
- URL: http://arxiv.org/abs/2108.12961v1
- Date: Mon, 30 Aug 2021 02:39:13 GMT
- Title: BioFors: A Large Biomedical Image Forensics Dataset
- Authors: Ekraam Sabir, Soumyaroop Nandi, Wael AbdAlmageed, Prem Natarajan
- Abstract summary: We present BioFors -- the first dataset for benchmarking common biomedical image manipulations.
BioFors comprises 47,805 images extracted from 1,031 open-source research papers.
We benchmark BioFors on all tasks with suitable state-of-the-art algorithms.
- Score: 22.32517325828983
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Research in media forensics has gained traction to combat the spread of
misinformation. However, most of this research has been directed towards
content generated on social media. Biomedical image forensics is a related
problem, where manipulation or misuse of images reported in biomedical research
documents is of serious concern. The problem has failed to gain momentum beyond
an academic discussion due to an absence of benchmark datasets and standardized
tasks. In this paper we present BioFors -- the first dataset for benchmarking
common biomedical image manipulations. BioFors comprises 47,805 images
extracted from 1,031 open-source research papers. Images in BioFors are divided
into four categories -- Microscopy, Blot/Gel, FACS and Macroscopy. We also
propose three tasks for forensic analysis -- external duplication detection,
internal duplication detection and cut/sharp-transition detection. We benchmark
BioFors on all tasks with suitable state-of-the-art algorithms. Our results and
analysis show that existing algorithms developed on common computer vision
datasets are not robust when applied to biomedical images, validating that more
research is required to address the unique challenges of biomedical image
forensics.
Related papers
- MultiOrg: A Multi-rater Organoid-detection Dataset [1.29058164565662]
This dataset comprises over 400 high-resolution 2d microscopy images and curated annotations of more than 60,000 organoids.
We additionally provide a benchmark for organoid detection, and make the best model available through an easily installable, interactive plugin for the popular image visualization tool Napari.
arXiv Detail & Related papers (2024-10-18T17:05:03Z)
- BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once [58.41069132627823]
Holistic image analysis comprises subtasks such as segmentation, detection, and recognition of relevant objects.
Here, we propose BiomedParse, a biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition for 82 object types across 9 imaging modalities.
Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting all relevant objects in a noisy image through a text prompt.
arXiv Detail & Related papers (2024-05-21T17:54:06Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
- BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs [48.376109878173956]
We present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets.
PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles.
Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing.
arXiv Detail & Related papers (2023-03-02T02:20:04Z)
- MONet: Multi-scale Overlap Network for Duplication Detection in Biomedical Images [20.533739598331646]
We propose a multi-scale overlap detection model to detect duplicated image regions.
It achieves state-of-the-art performance overall and on multiple biomedical image categories.
arXiv Detail & Related papers (2022-07-19T07:25:43Z)
- Anomaly Detection in Medical Imaging -- A Mini Review [0.8122270502556374]
This paper uses a semi-exhaustive literature review of relevant anomaly detection papers in medical imaging to cluster them by application.
The main results showed that the current research is mostly motivated by reducing the need for labelled data.
Also, the successful and substantial amount of research in the brain MRI domain shows the potential for applications in further domains like OCT and chest X-ray.
arXiv Detail & Related papers (2021-08-25T11:45:40Z)
- Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature [67.4680600632232]
Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck.
We propose a general approach for vertical search based on domain-specific pretraining.
Our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search.
arXiv Detail & Related papers (2021-06-25T01:02:55Z)
- A comparative study of semi- and self-supervised semantic segmentation of biomedical microscopy data [0.13701366534590495]
Convolutional Neural Networks (CNNs) have become the state-of-the-art method for biomedical image analysis.
These networks are usually trained in a supervised manner, requiring large amounts of labelled training data.
In this work, we validate alternative ways to train CNNs with fewer labels for biomedical image segmentation.
arXiv Detail & Related papers (2020-11-11T20:57:10Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.