Related papers: Volkit: A Performance-Portable Computer Vision Library for 3D Volumetric Data

Related papers

Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting [52.18697134979677]
Recent advancements in computer vision have successfully extended Open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Existing methods employ codebooks or feature compression, causing information loss and thereby degrading segmentation quality. We introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Our framework outperforms state-of-the-art methods while enabling real-time rendering with an approximately 43.7x speedup on 512-D feature maps.
arXiv Detail & Related papers (2025-12-24T04:16:18Z)
Language-guided 3D scene synthesis for fine-grained functionality understanding [64.148891566272]
We introduce SynthFun3D, the first method for task-based 3D scene synthesis. It generates a 3D indoor environment using a furniture asset database with part-level annotation. It reasons about the action to automatically identify and retrieve the 3D mask of the correct functional element.
arXiv Detail & Related papers (2025-11-28T14:40:03Z)
Advancing Annotat3D with Harpia: A CUDA-Accelerated Library For Large-Scale Volumetric Data Segmentation [0.1499944454332829]
This work introduces new capabilities to Annotat3D through Harpia. The library is designed to support scalable, interactive segmentation for large 3D datasets in high-performance computing. The system's interactive, human-in-the-loop interface, combined with efficient GPU resource management, makes it particularly suitable for collaborative scientific imaging.
arXiv Detail & Related papers (2025-11-14T21:45:02Z)
cubic: CUDA-accelerated 3D Bioimage Computing [42.83541173560835]
We introduce cubic, an open-source Python library that augments the widely used SciPy and scikit-image APIs with GPU-accelerated alternatives. cubic's API is device-agnostic: it dispatches operations to the GPU when the data reside on the device and otherwise executes on the CPU. We evaluate cubic both by benchmarking individual operations and by reproducing existing deconvolution and segmentation pipelines.
arXiv Detail & Related papers (2025-10-15T22:22:06Z)
WHAR Datasets: An Open Source Library for Wearable Human Activity Recognition [5.46517570496579]
We introduce WHAR datasets, an open-source library designed to simplify WHAR data handling. The library currently supports 9 widely used datasets, integrates with PyTorch, and is easily extensible to new datasets.
arXiv Detail & Related papers (2025-08-12T08:43:30Z)
Harnessing LLMs for Document-Guided Fuzzing of OpenCV Library [14.337352597473911]
VISTAFUZZ is a novel technique for harnessing large language models for document-guided fuzzing of the OpenCV library. VISTAFUZZ extracts constraints on individual input parameters and dependencies between these. We evaluate the effectiveness of VISTAFUZZ in testing 330 APIs in the OpenCV library, and the results show that VISTAFUZZ detected 17 new bugs, where 10 bugs have been confirmed, and 5 of these have been fixed.
arXiv Detail & Related papers (2025-07-19T09:44:01Z)
R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation [78.26308457952636]
This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome limitations in autonomous driving simulation. It enables realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects, such as shadows and consistent lighting, in real time. We show that R3D2 significantly enhances the realism of inserted assets, enabling use cases like text-to-3D asset insertion and cross-scene/dataset object transfer.
arXiv Detail & Related papers (2025-06-09T14:50:19Z)
Kornia-rs: A Low-Level 3D Computer Vision Library In Rust [6.567185366423734]
kornia-rs is a high-performance 3D computer vision library written entirely in native Rust. kornia-rs adopts a statically-typed tensor system and a modular set of crates, providing efficient image I/O, image processing and 3D operations.
arXiv Detail & Related papers (2025-05-18T13:50:00Z)
Real-Time Semantic Segmentation of Aerial Images Using an Embedded U-Net: A Comparison of CPU, GPU, and FPGA Workflows [0.0]
This study introduces a lightweight U-Net model optimized for real-time semantic segmentation of aerial images. We maintain the accuracy of the U-Net on a real-world dataset while significantly reducing the model's parameters and Multiply-Accumulate (MAC) operations by a factor of 16.
arXiv Detail & Related papers (2025-03-07T08:33:28Z)
ConvMesh: Reimagining Mesh Quality Through Convex Optimization [55.2480439325792]
This research introduces a convex optimization approach, disciplined convex programming, to enhance existing meshes. By focusing on a sparse set of point clouds from both the original and target meshes, this method demonstrates significant improvements in mesh quality with minimal data requirements.
arXiv Detail & Related papers (2024-12-11T15:48:25Z)
Open-Vocabulary High-Resolution 3D (OVHR3D) Data Segmentation and Annotation Framework [1.1280113914145702]
This research aims to design and develop a comprehensive and efficient framework for 3D segmentation tasks. The framework integrates Grounding DINO and the Segment Anything Model, augmented by an enhancement in 2D image rendering via 3D mesh.
arXiv Detail & Related papers (2024-12-09T07:39:39Z)
Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation [7.204881999658682]
Inference for Large Language Models (LLMs) is computationally demanding. To reduce the cost of auto-regressive decoding, Key-Value (KV) caching is used to store intermediate activations. The memory required for KV caching grows rapidly, often exceeding the capacity of GPU memory. A cost-effective alternative is to offload KV cache to CPU memory, which alleviates GPU memory pressure but shifts the bottleneck to the limited bandwidth of the PCIe connection between the CPU and GPU.
arXiv Detail & Related papers (2024-11-26T04:03:14Z)
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset, requiring thousands of GPU training days, that facilitates research and development in this field. We showcase two immediate benefits, as it enables us to: (1) learn token locations for transformer models; (2) directly regress 3D camera poses of 2D images with respect to NeRF models. This in turn leads to improved performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
KerasCV and KerasNLP: Vision and Language Power-Ups [9.395199188271254]
KerasCV and KerasNLP are extensions of the Keras API for Computer Vision and Natural Language Processing. These domain packages are designed to enable fast experimentation, with a focus on ease-of-use and performance. The libraries are fully open-source (Apache 2.0 license) and available on GitHub.
arXiv Detail & Related papers (2024-05-30T16:58:34Z)
InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction [11.33083039877258]
InverseMatrixVT3D is an efficient method for transforming multi-view image features into 3D feature volumes for semantic occupancy prediction. We introduce a sparse matrix handling technique for the projection matrices to optimize GPU memory usage. Our approach achieves the top performance in detecting vulnerable road users (VRU), crucial for autonomous driving and road safety.
arXiv Detail & Related papers (2024-01-23T01:11:10Z)
AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
Mesh Convolution with Continuous Filters for 3D Surface Parsing [101.25796935464648]
We propose a series of modular operations for effective geometric feature learning from 3D triangle meshes. Our mesh convolutions exploit spherical harmonics as orthonormal bases to create continuous convolutional filters. We further contribute a novel hierarchical neural network for perceptual parsing of 3D surfaces, named PicassoNet++.
arXiv Detail & Related papers (2021-12-03T09:16:49Z)
Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume Excitation [65.83008812026635]
We construct Guided Cost volume Excitation (GCE) and show that simple channel excitation of the cost volume, guided by the image, can improve performance considerably. We present an end-to-end network that we call Correlate-and-Excite (CoEx).
arXiv Detail & Related papers (2021-08-12T14:32:26Z)
NViSII: A Scriptable Tool for Photorealistic Image Generation [21.453677837017462]
We present a Python-based tool built on NVIDIA's OptiX ray tracing engine and the OptiX AI denoiser, designed to generate high-quality synthetic images. Our tool enables the description and manipulation of complex dynamic 3D scenes.
arXiv Detail & Related papers (2021-05-28T16:35:32Z)
Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single precision and up to 452x using half precision compared to conventional CPU algorithms. We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the resulting summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem. We employ a Neural Message Passing network for data association that is fully trainable. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
RUHSNet: 3D Object Detection Using Lidar Data in Real Time [0.0]
We propose a novel neural network architecture for detecting 3D objects in point cloud data. Our work surpasses the state of the art in this domain in terms of both average precision and speed, running at more than 30 FPS. This makes it a feasible option for deployment in real-time applications, including self-driving cars.
arXiv Detail & Related papers (2020-05-09T09:41:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.