MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation
- URL: http://arxiv.org/abs/2601.06883v1
- Date: Sun, 11 Jan 2026 12:12:08 GMT
- Title: MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation
- Authors: Xinhang Liu, Jiawei Shi, Zheng Dang, Yuchao Dai
- Abstract summary: We present MixRI, a lightweight network that solves the CAD-based novel object pose estimation problem in RGB images. We design our network to meet the demands of real-world applications, emphasizing reduced memory requirements and fast inference time.
- Score: 51.065981526165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MixRI, a lightweight network that solves the CAD-based novel object pose estimation problem in RGB images. It can be instantly applied to a novel object at test time without finetuning. We design our network to meet the demands of real-world applications, emphasizing reduced memory requirements and fast inference time. Unlike existing works that utilize many reference images and have large network parameters, we directly match points based on the multi-view information between the query and reference images with a lightweight network. Thanks to our reference image fusion strategy, we significantly decrease the number of reference images, thus decreasing the time needed to process these images and the memory required to store them. Furthermore, with our lightweight network, our method requires less inference time. Despite using fewer reference images, experiments on seven core datasets in the BOP challenge show that our method achieves results comparable to other methods that require more reference images and larger network parameters.
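The abstract describes matching points between the query image and the reference images. As a rough illustration only, and not the MixRI network or its fusion strategy, the sketch below shows a generic mutual nearest-neighbour match between query descriptors and reference descriptors that carry 3D coordinates (as rendered CAD views could provide). All names (`cosine`, `mutual_matches`, `correspondences_2d3d`, `min_sim`) and the toy descriptors are hypothetical; a real pipeline would pass the resulting 2D-3D pairs to a PnP solver to recover the pose.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity of two descriptors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mutual_matches(
    query_desc: Sequence[Vec], ref_desc: Sequence[Vec], min_sim: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (query index, reference index) pairs that are each other's
    best match and whose similarity is at least min_sim (assumed threshold)."""
    if not query_desc or not ref_desc:
        return []
    sim = [[cosine(q, r) for r in ref_desc] for q in query_desc]
    # Best reference for every query descriptor.
    best_ref = [max(range(len(ref_desc)), key=lambda j: row[j]) for row in sim]
    # Best query for every reference descriptor.
    best_query = [
        max(range(len(query_desc)), key=lambda i: sim[i][j])
        for j in range(len(ref_desc))
    ]
    return [
        (i, j)
        for i, j in enumerate(best_ref)
        if best_query[j] == i and sim[i][j] >= min_sim
    ]


def correspondences_2d3d(query_px, query_desc, ref_xyz, ref_desc, min_sim=0.5):
    """Pair each matched query pixel with the 3D point of its reference match."""
    pairs = mutual_matches(query_desc, ref_desc, min_sim)
    return [(query_px[i], ref_xyz[j]) for i, j in pairs]


if __name__ == "__main__":
    # Toy data: three query pixels, two reference points with known 3D positions.
    query_px = [(10, 20), (30, 40), (50, 60)]
    query_desc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ref_xyz = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
    ref_desc = [[0.0, 1.0], [1.0, 0.0]]
    print(correspondences_2d3d(query_px, query_desc, ref_xyz, ref_desc))
```

The mutual check discards the ambiguous third query descriptor, which is equally similar to both reference points. Fewer reference descriptors mean a smaller similarity matrix, which is the kind of time and memory saving the abstract attributes to using fewer reference images.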
Related papers
- Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception [9.145558382187524]
We present a novel object-centric representation that can be used to detect, segment and estimate the 6DoF pose of objects unseen during training using RGB images. Our method consists of an encoder that requires only a few RGB template views depicting an object to generate a sparse object-like point cloud. Next, a decoder takes the object encoding together with a query image to generate a variety of dense predictions.
arXiv Detail & Related papers (2026-02-04T09:12:05Z)
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z)
- Raw Image Reconstruction with Learned Compact Metadata [61.62454853089346]
We propose a novel framework to learn a compact representation in the latent space serving as the metadata in an end-to-end manner.
We show how the proposed raw image compression scheme can adaptively allocate more bits to image regions that are important from a global perspective.
arXiv Detail & Related papers (2023-02-25T05:29:45Z)
- Correlation Verification for Image Retrieval [15.823918683848877]
We propose a novel image retrieval re-ranking network named Correlation Verification Networks (CVNet).
CVNet compresses dense feature correlation into image similarity while learning diverse geometric matching patterns from various image pairs.
Our proposed network shows state-of-the-art performance on several retrieval benchmarks with a significant margin.
arXiv Detail & Related papers (2022-04-04T13:18:49Z)
- Deep Image Deblurring: A Survey [165.32391279761006]
Deblurring is a classic problem in low-level computer vision, which aims to recover a sharp image from a blurred input image.
Recent advances in deep learning have led to significant progress in solving this problem.
arXiv Detail & Related papers (2022-01-26T01:31:30Z)
- Searching for Controllable Image Restoration Networks [57.23583915884236]
Existing methods require separate inference through the entire network per each output.
We propose a novel framework based on a neural architecture search technique that enables efficient generation of multiple imagery effects.
arXiv Detail & Related papers (2020-12-21T10:08:18Z)
- High Quality Remote Sensing Image Super-Resolution Using Deep Memory Connected Network [21.977093907114217]
Single image super-resolution is crucial for many applications such as target detection and image classification.
We propose a novel method named deep memory connected network (DMCN) based on a convolutional neural network to reconstruct high-quality super-resolution images.
arXiv Detail & Related papers (2020-10-01T15:06:02Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.