Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow
- URL: http://arxiv.org/abs/2505.02780v1
- Date: Mon, 05 May 2025 16:46:53 GMT
- Title: Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow
- Authors: Jai Prakash Veerla, Partha Sai Guttikonda, Helen H. Shang, Mohammad Sadegh Nasr, Cesar Torres, Jacob M. Luber
- Abstract summary: Pathologists rely on gigapixel whole-slide images (WSIs) to diagnose diseases like cancer. Current digital pathology tools hinder diagnosis. PathVis is a mixed-reality visualization platform for the Apple Vision Pro.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pathologists rely on gigapixel whole-slide images (WSIs) to diagnose diseases like cancer, yet current digital pathology tools hinder diagnosis. The immense scale of WSIs, often exceeding 100,000 × 100,000 pixels, clashes with the limited views traditional monitors offer. This mismatch forces constant panning and zooming, increasing pathologist cognitive load, causing diagnostic fatigue, and slowing pathologists' adoption of digital methods. PathVis, our mixed-reality visualization platform for Apple Vision Pro, addresses these challenges. It transforms the pathologist's interaction with data, replacing cumbersome mouse-and-monitor navigation with intuitive exploration using natural hand gestures, eye gaze, and voice commands in an immersive workspace. PathVis integrates AI to enhance diagnosis. An AI-driven search function instantly retrieves and displays the top five similar patient cases side-by-side, improving diagnostic precision and efficiency through rapid comparison. Additionally, a multimodal conversational AI assistant offers real-time image interpretation support and aids collaboration among pathologists across multiple Apple devices. By merging the directness of traditional pathology with advanced mixed-reality visualization and AI, PathVis improves diagnostic workflows, reduces cognitive strain, and makes pathology practice more effective and engaging. The PathVis source code and a demo video are publicly available at: https://github.com/jaiprakash1824/Path_Vis
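The abstract does not specify how the AI-driven search ranks similar cases. A common approach for "top five similar cases" retrieval is nearest-neighbor search over learned image embeddings; the sketch below is a minimal, hypothetical illustration of that idea (the function name, embedding size, and cosine-similarity choice are assumptions, not PathVis internals).

```python
import numpy as np

def top_k_similar_cases(query_emb, case_embs, k=5):
    """Return indices of the k most similar archived cases by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    c = case_embs / np.linalg.norm(case_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every archived case
    return np.argsort(-sims)[:k]      # indices of the k best matches

rng = np.random.default_rng(0)
library = rng.normal(size=(100, 128))               # embeddings of 100 archived cases
query = library[42] + 0.01 * rng.normal(size=128)   # a query very close to case 42
matches = top_k_similar_cases(query, library, k=5)
print(matches[0])  # case 42 ranks first
```

In practice the embeddings would come from a pathology-specific vision encoder, and an approximate-nearest-neighbor index would replace the brute-force scan for large archives.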
Related papers
- Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction [66.71402249062777]
We present ScanDiff, a novel architecture that combines diffusion models with Vision Transformers to generate diverse and realistic scanpaths. Our method explicitly models scanpath variability by leveraging the nature of diffusion models, producing a wide range of plausible gaze trajectories. Experiments on benchmark datasets show that ScanDiff surpasses state-of-the-art methods in both free-viewing and task-driven scenarios.
arXiv Detail & Related papers (2025-07-30T18:36:09Z) - Evidence-based diagnostic reasoning with multi-agent copilot for human pathology [7.976907866539546]
Current multimodal large language models (MLLMs) in computational pathology face limitations. We introduce PathChat+, a new MLLM specifically designed for human pathology, trained on over 1 million diverse, pathology-specific instruction samples. We also present SlideSeek, a reasoning-enabled multi-agent AI system leveraging PathChat+ to autonomously evaluate gigapixel whole-slide images.
arXiv Detail & Related papers (2025-06-26T03:02:16Z) - Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models [51.900488744931785]
We introduce the Visual Graph Arena (VGA) to evaluate and improve AI systems' capacity for visual abstraction. Humans achieve near-perfect accuracy across tasks, while models fail entirely on isomorphism detection and show only limited success in path/cycle tasks. By isolating the challenge of representation-invariant reasoning, the VGA provides a framework to drive progress toward human-like conceptualization in AI visual models.
arXiv Detail & Related papers (2025-06-06T17:06:25Z) - From Pixels to Histopathology: A Graph-Based Framework for Interpretable Whole Slide Image Analysis [81.19923502845441]
We develop a graph-based framework that constructs WSI graph representations. We build tissue representations (nodes) that follow biological boundaries rather than arbitrary patches. In our method's final step, we solve the diagnostic task through a graph attention network.
arXiv Detail & Related papers (2025-03-14T20:15:04Z) - A Deep Learning Approach for Augmenting Perceptional Understanding of Histopathology Images [0.1813006808606333]
This paper presents a novel approach to enhancing the analysis of histopathology images: a multi-modal model that combines Vision Transformers (ViT) with GPT-2 for image captioning.
arXiv Detail & Related papers (2025-03-10T03:50:25Z) - PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology [11.366456922386263]
Diagnosing diseases through whole-slide images is fundamental in modern pathology. Traditional AI approaches fall short of such a holistic, iterative, multi-scale diagnostic procedure. We introduce PathFinder, a multi-modal, multi-agent framework that emulates the decision-making process of expert pathologists.
arXiv Detail & Related papers (2025-02-13T03:08:02Z) - A Sanity Check for AI-generated Image Detection [49.08585395873425]
We propose AIDE (AI-generated Image DEtector with Hybrid Features) to detect AI-generated images. AIDE achieves +3.5% and +4.6% improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-06-27T17:59:49Z) - A Foundational Multimodal Vision Language AI Assistant for Human Pathology [6.759775793033743]
We present PathChat, a vision-language generalist AI assistant for human pathology using an in-house developed vision encoder pretrained on 100 million histology images.
PathChat achieved a diagnostic accuracy of 87% on multiple-choice questions based on publicly available cases of diverse tissue origins and disease models.
arXiv Detail & Related papers (2023-12-13T00:24:37Z) - Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of giga-pixel whole-slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources than benchmark methods while achieving better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z) - Cross-Modality Neuroimage Synthesis: A Survey [71.27193056354741]
Multi-modality imaging improves disease diagnosis and reveals distinct deviations in tissues with anatomical properties.
Completely aligned and paired multi-modality neuroimaging data has proven effective in brain research, but such data is often unavailable.
An alternative solution is to explore unsupervised or weakly supervised learning methods to synthesize the absent neuroimaging data.
arXiv Detail & Related papers (2022-02-14T19:29:08Z) - CheXstray: Real-time Multi-Modal Data Concordance for Drift Detection in Medical Imaging AI [1.359138408203412]
We build and test a medical imaging AI drift monitoring workflow that tracks data and model drift without contemporaneous ground truth.
Key contributions include (1) a proof-of-concept for medical imaging drift detection using VAEs and domain-specific statistical methods.
This work has important implications for addressing the translation gap related to continuous medical imaging AI model monitoring in dynamic healthcare environments.
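The summary mentions drift detection via domain-specific statistical methods without ground truth. One standard statistic for comparing a live feature distribution against a training-time baseline is the two-sample Kolmogorov-Smirnov statistic; the sketch below is a generic illustration of that idea (the function name and thresholds are assumptions, not CheXstray's actual pipeline).

```python
import numpy as np

def ks_statistic(ref, new):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    all_vals = np.sort(np.concatenate([ref, new]))
    cdf_ref = np.searchsorted(np.sort(ref), all_vals, side="right") / len(ref)
    cdf_new = np.searchsorted(np.sort(new), all_vals, side="right") / len(new)
    return np.abs(cdf_ref - cdf_new).max()

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, size=5000)   # a feature from the training window
same_dist = rng.normal(0.0, 1.0, size=5000)  # new data, no drift
shifted = rng.normal(0.8, 1.0, size=5000)    # simulated acquisition drift
print(ks_statistic(baseline, same_dist))     # small when distributions match
print(ks_statistic(baseline, shifted))       # large under drift
```

In a monitoring workflow, such a statistic would be computed per feature (e.g. over VAE latent dimensions or image metadata) on a rolling window and alarmed against a calibrated threshold.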
arXiv Detail & Related papers (2022-02-06T18:58:35Z) - Remote Pathological Gait Classification System [3.8098589140053862]
Gait analysis can be used to detect impairments and help diagnose illnesses and assess patient recovery.
The largest publicly available pathological gait dataset contains only 10 subjects, simulating 4 gait pathologies.
This paper presents a new dataset called GAIT-IT, captured from 21 subjects simulating 4 gait pathologies, each at 2 severity levels, in addition to normal gait.
arXiv Detail & Related papers (2021-05-04T17:21:29Z) - AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder [93.0013343535411]
This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNNs).
The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses.
We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
arXiv Detail & Related papers (2020-10-22T17:20:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.