KonfAI: A Modular and Fully Configurable Framework for Deep Learning in Medical Imaging
- URL: http://arxiv.org/abs/2508.09823v2
- Date: Tue, 14 Oct 2025 17:17:59 GMT
- Title: KonfAI: A Modular and Fully Configurable Framework for Deep Learning in Medical Imaging
- Authors: Valentin Boussot, Jean-Louis Dillenseger
- Abstract summary: KonfAI is a modular and fully configurable deep learning framework specifically designed for medical imaging tasks. It enables users to define complete training, inference, and evaluation workflows through structured YAML configuration files. KonfAI is open source and available at https://github.com/vboussot/KonfAI.
- Score: 1.2556993688873865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: KonfAI is a modular, extensible, and fully configurable deep learning framework specifically designed for medical imaging tasks. It enables users to define complete training, inference, and evaluation workflows through structured YAML configuration files, without modifying the underlying code. This declarative approach enhances reproducibility, transparency, and experimental traceability while reducing development time. Beyond the capabilities of standard pipelines, KonfAI provides native abstractions for advanced strategies including patch-based learning, test-time augmentation, model ensembling, and direct access to intermediate feature representations for deep supervision. It also supports complex multi-model training setups such as generative adversarial architectures. Thanks to its modular and extensible architecture, KonfAI can easily accommodate custom models, loss functions, and data processing components. The framework has been successfully applied to segmentation, registration, and image synthesis tasks, and has contributed to top-ranking results in several international medical imaging challenges. KonfAI is open source and available at https://github.com/vboussot/KonfAI.
Related papers
- Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance [55.32799307123252]
We introduce a scalable data generation pipeline that transforms existing video editing pairs into high-fidelity training quadruplets. We propose a unified editing architecture, Kiwi-Edit, that synergizes learnable queries and latent visual features for reference semantic guidance.
arXiv Detail & Related papers (2026-03-02T18:46:28Z) - AR-MOT: Autoregressive Multi-object Tracking [56.09738000988466]
We propose a novel autoregressive paradigm that formulates MOT as a sequence generation task within a large language model (LLM) framework. This design enables the model to output structured results through flexible sequence construction, without requiring any task-specific heads. To enhance region-level visual perception, we introduce an Object Tokenizer based on a pretrained detector.
arXiv Detail & Related papers (2026-01-05T09:17:28Z) - VSA:Visual-Structural Alignment for UI-to-Code [29.15071743239679]
We propose Visual-Structural Alignment (VSA), a multi-stage paradigm designed to synthesize organized assets through visual-text alignment. Our framework yields a substantial improvement in code modularity and architectural consistency over state-of-the-art benchmarks.
arXiv Detail & Related papers (2025-12-23T03:55:45Z) - Everything is Context: Agentic File System Abstraction for Context Engineering [11.63011212134865]
This paper proposes a file-system abstraction for context engineering. The abstraction offers a persistent, governed infrastructure for managing heterogeneous context artefacts. As GenAI becomes an active collaborator in decision support, humans play a central role as curators, verifiers, and co-reasoners.
arXiv Detail & Related papers (2025-12-05T06:56:45Z) - A Fuzzy Logic Prompting Framework for Large Language Models in Adaptive and Uncertain Tasks [2.1756081703276]
We introduce a modular prompting framework that supports safer and more adaptive use of large language models (LLMs) across dynamic, user-centered tasks. Our method combines a natural language boundary prompt with a control schema encoded with fuzzy scaffolding logic and adaptation rules. In a simulated intelligent tutoring setting, the framework improves scaffolding quality, adaptivity, and instructional alignment across multiple models, outperforming standard prompting baselines.
arXiv Detail & Related papers (2025-08-08T23:50:48Z) - ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents [40.697759330690815]
ScreenCoder is a modular multi-agent framework that decomposes the task into three interpretable stages: grounding, planning, and generation. By assigning these distinct responsibilities to specialized agents, our framework achieves significantly higher robustness and fidelity than end-to-end approaches. Our approach achieves state-of-the-art performance in layout accuracy, structural coherence, and code correctness.
arXiv Detail & Related papers (2025-07-30T16:41:21Z) - Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate [0.0]
This paper explores an alternative, constructive approach to model development, built upon the foundation of non-trainable, deterministic input embeddings. We show that specialist models trained on disparate datasets can be merged into a single, more capable Mixture-of-Experts model. We introduce a layer-wise constructive training methodology, where a deep Transformer is "grown" by progressively stacking and training one layer at a time.
arXiv Detail & Related papers (2025-07-08T20:01:15Z) - TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [88.31117598044725]
We explore cross-architecture training to transfer the knowledge in existing Transformer models to the alternative architecture Mamba, an approach termed TransMamba. Our approach employs a two-stage strategy to expedite training new Mamba models, ensuring effectiveness across uni-modal and cross-modal tasks. For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture.
arXiv Detail & Related papers (2025-02-21T01:22:01Z) - ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships. Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands. We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z) - EpiCoder: Encompassing Diversity and Complexity in Code Generation [49.170195362149386]
Existing methods for code generation use code snippets as seed data. We introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features. Our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios.
arXiv Detail & Related papers (2025-01-08T18:58:15Z) - Flemme: A Flexible and Modular Learning Platform for Medical Images [5.086862917025204]
Flemme is a FLExible and Modular learning platform for MEdical images. We construct encoders using building blocks based on convolution, transformer, and state-space model (SSM) to process both 2D and 3D image patches.
arXiv Detail & Related papers (2024-08-18T05:47:33Z) - Interfacing Foundation Models' Embeddings [131.0352288172788]
We present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity.
In light of the interleaved embedding space, we introduce FIND-Bench, which introduces new training and evaluation annotations to the COCO dataset for interleaved segmentation and retrieval.
arXiv Detail & Related papers (2023-12-12T18:58:02Z) - Towards More Unified In-context Visual Understanding [74.55332581979292]
We present a new ICL framework for visual understanding with multi-modal output enabled.
First, we quantize and embed both text and visual prompt into a unified representational space.
Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them.
arXiv Detail & Related papers (2023-12-05T06:02:21Z) - i-Code Studio: A Configurable and Composable Framework for Integrative AI [93.74891865028867]
We propose the i-Code Studio, a flexible and composable framework for Integrative AI.
The i-Code Studio orchestrates multiple pre-trained models in a fine-tuning-free fashion to conduct complex multimodal tasks.
The i-Code Studio achieves impressive results on a variety of zero-shot multimodal tasks, such as video-to-text retrieval, speech-to-speech translation, and visual question answering.
arXiv Detail & Related papers (2023-05-23T06:45:55Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The proposed two modules are light-weighted and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z) - CM-GAN: Image Inpainting with Cascaded Modulation GAN and Object-Aware Training [112.96224800952724]
We propose cascaded modulation GAN (CM-GAN) to generate plausible image structures when dealing with large holes in complex images.
In each decoder block, global modulation is first applied to perform coarse, semantic-aware structure synthesis; spatial modulation is then applied to the output of global modulation to further adjust the feature map in a spatially adaptive fashion.
In addition, we design an object-aware training scheme to prevent the network from hallucinating new objects inside holes, fulfilling the needs of object removal tasks in real-world scenarios.
arXiv Detail & Related papers (2022-03-22T16:13:27Z) - Deploying deep learning in OpenFOAM with TensorFlow [2.1874189959020423]
This module is constructed with the C API and is integrated into OpenFOAM as an application that may be linked at run time.
Notably, our formulation precludes any restrictions related to the type of neural network architecture.
In addition, the proposed module outlines a path towards an open-source, unified and transparent framework for computational fluid dynamics and machine learning.
arXiv Detail & Related papers (2020-12-01T23:59:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.