Captioning Images with Novel Objects via Online Vocabulary Expansion
- URL: http://arxiv.org/abs/2003.03305v1
- Date: Fri, 6 Mar 2020 16:34:15 GMT
- Title: Captioning Images with Novel Objects via Online Vocabulary Expansion
- Authors: Mikihiro Tanaka, Tatsuya Harada
- Abstract summary: We introduce a low-cost method for generating descriptions of images containing novel objects.
We propose a method that can describe images with novel objects, without retraining, by using word embeddings of the objects estimated from only a small number of their image features.
- Score: 62.525165808406626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we introduce a low-cost method for generating
descriptions of images containing novel objects. Generally, constructing a
model that can describe images with novel objects is costly because it
requires (1) collecting a large amount of data for each category and (2)
retraining the entire system. When humans see a small number of novel objects,
they can estimate the objects' properties by associating their appearance with
that of known objects. Accordingly, we propose a method that can describe
images with novel objects without retraining, by using word embeddings of the
objects estimated from only a small number of their image features. The method
can be integrated with general image-captioning models. The experimental
results show the effectiveness of our approach.
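The core idea of the abstract, estimating a novel object's word embedding by associating its appearance with known objects, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual implementation: the novel word's embedding is taken as a similarity-weighted average of known objects' embeddings, with similarities computed between a few image features of the novel object and the known objects' image features.

```python
import numpy as np

def estimate_embedding(novel_feats, known_feats, known_embeds, temperature=0.1):
    """Estimate a word embedding for a novel object from a few image features.

    novel_feats:  (k, d_img)  a small number of image features of the novel object
    known_feats:  (n, d_img)  one representative image feature per known object
    known_embeds: (n, d_emb)  word embeddings of the known objects
    Returns a (d_emb,) embedding: a softmax-weighted mix of known embeddings.
    """
    # Average the few observations of the novel object into one query feature.
    q = novel_feats.mean(axis=0)
    q = q / np.linalg.norm(q)
    keys = known_feats / np.linalg.norm(known_feats, axis=1, keepdims=True)
    # Cosine similarity to each known object, sharpened by a softmax.
    sims = keys @ q
    w = np.exp(sims / temperature)
    w = w / w.sum()
    # Known objects that look similar contribute most to the estimate.
    return w @ known_embeds
```

The estimated embedding can then be appended to the vocabulary of a pretrained captioning model, which is the sense in which the method avoids retraining.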
Related papers
- Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding [7.893308498886083]
Our proposed method aims to address the challenges of generalizability and fidelity in an object-driven way.
A prototypical embedding, based on the object's appearance and its class, is constructed before fine-tuning the diffusion model.
Our method outperforms several existing works.
arXiv Detail & Related papers (2024-01-28T17:11:42Z)
- ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning [25.033615513933192]
We introduce ObjectComposer for generating compositions of multiple objects that resemble user-specified images.
Our approach is training-free, leveraging the abilities of preexisting models.
arXiv Detail & Related papers (2023-10-10T19:46:58Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render-and-compare strategy that can be applied to novel objects.
Second, we introduce a novel approach to coarse pose estimation that leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- Automatic dataset generation for specific object detection [6.346581421948067]
We present a method to synthesize object-in-scene images, which can preserve the objects' detailed features without bringing irrelevant information.
Our results show that, in the synthesized images, the boundaries of objects blend well with the background.
arXiv Detail & Related papers (2022-07-16T07:44:33Z)
- ShaRF: Shape-conditioned Radiance Fields from a Single View [54.39347002226309]
We present a method for estimating neural scene representations of objects given only a single image.
The core of our method is the estimation of a geometric scaffold for the object.
We demonstrate in several experiments the effectiveness of our approach in both synthetic and real images.
arXiv Detail & Related papers (2021-02-17T16:40:28Z)
- Continuous Surface Embeddings [76.86259029442624]
We focus on the task of learning and representing dense correspondences in deformable object categories.
We propose a new, learnable image-based representation of dense correspondences.
We demonstrate that the proposed approach performs on par or better than the state-of-the-art methods for dense pose estimation for humans.
arXiv Detail & Related papers (2020-11-24T22:52:15Z)
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
- Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images [30.240630713652035]
Recent methods for 6D pose estimation of objects assume either textured 3D models or real images that cover the entire range of target poses.
This paper proposes a method, Neural Object Learning (NOL), that creates synthetic images of objects in arbitrary poses by combining only a few observations from cluttered images.
arXiv Detail & Related papers (2020-05-07T19:33:06Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which leads to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
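For background on the SceneFID metric mentioned above: the standard Fréchet Inception Distance compares Gaussian fits of two sets of deep image features, and an object-centric variant applies the same distance to features of object crops rather than whole images. Below is a minimal sketch of the underlying Fréchet distance computation only; the Inception feature extractor and the object cropping are omitted, and this is not the cited paper's implementation.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits of two (n, d) feature sets:
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    # sqrtm can return tiny imaginary parts from numerical error.
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    mean_term = float(((mu_a - mu_b) ** 2).sum())
    return mean_term + float(np.trace(cov_a + cov_b - 2.0 * covmean))
```

In an object-centric setting, `feats_a` and `feats_b` would hold features extracted from matched object crops of the reference and generated images, rather than from the full images.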
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences of its use.