Bridging the Gap Between Object Detection and User Intent via
Query-Modulation
- URL: http://arxiv.org/abs/2106.10258v1
- Date: Fri, 18 Jun 2021 17:47:53 GMT
- Title: Bridging the Gap Between Object Detection and User Intent via
Query-Modulation
- Authors: Marco Fornoni, Chaochao Yan, Liangchen Luo, Kimberly Wilber, Alex
Stark, Yin Cui, Boqing Gong, Andrew Howard
- Abstract summary: Query-modulated detectors show superior performance at detecting objects for a given label of interest.
They can be simultaneously trained to solve for both query-modulated detection and standard object detection.
- Score: 33.967176965675264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When interacting with objects through cameras, or pictures, users often have
a specific intent. For example, they may want to perform a visual search.
However, most object detection models ignore the user intent, relying on image
pixels as their only input. This often leads to incorrect results, such as lack
of a high-confidence detection on the object of interest, or detection with a
wrong class label. In this paper we investigate techniques to modulate standard
object detectors to explicitly account for the user intent, expressed as an
embedding of a simple query. Compared to standard object detectors,
query-modulated detectors show superior performance at detecting objects for a
given label of interest. Thanks to large-scale training data synthesized from
standard object detection annotations, query-modulated detectors can also
outperform specialized referring expression recognition systems. Furthermore,
they can be simultaneously trained to solve for both query-modulated detection
and standard object detection.
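The abstract does not say how the query-modulated training data is synthesized from standard detection annotations. As an illustrative sketch only, and not the authors' pipeline, one simple reading is to treat each class label present in an image as a query and keep only the boxes carrying that label as targets. The function name and the annotation fields ("image_id", "label", "box") are hypothetical.

```python
from collections import defaultdict


def synthesize_query_examples(annotations):
    """Group detection annotations into per-query training examples.

    `annotations` is a hypothetical list of dicts with the keys
    "image_id", "label", and "box" (x_min, y_min, x_max, y_max).
    This is an assumed data layout, not one taken from the paper.
    """
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[(ann["image_id"], ann["label"])].append(ann["box"])

    # One example per (image, label) pair: the label plays the role of
    # the query, and only boxes with that label are kept as targets.
    return [
        {"image_id": image_id, "query": label, "boxes": boxes}
        for (image_id, label), boxes in sorted(grouped.items())
    ]


if __name__ == "__main__":
    demo = [
        {"image_id": 1, "label": "dog", "box": (0, 0, 10, 10)},
        {"image_id": 1, "label": "cat", "box": (5, 5, 20, 20)},
        {"image_id": 1, "label": "dog", "box": (30, 30, 40, 40)},
    ]
    for example in synthesize_query_examples(demo):
        print(example)
```

In a real system the query label would still need to be mapped to an embedding before it could modulate the detector; that step is left out here because the abstract gives no detail on it.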
Related papers
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we require only sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- Detect Only What You Specify: Object Detection with Linguistic Target [0.0]
We propose Language-Targeted Detector (LTD) for the targeted detection based on a recently proposed Transformer-based detector.
LTD is an encoder-decoder architecture, and our conditional decoder allows the model to reason about the encoded image with the textual input as the linguistic context.
arXiv Detail & Related papers (2022-11-18T07:28:47Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Exploiting Multi-Object Relationships for Detecting Adversarial Attacks
in Complex Scenes [51.65308857232767]
Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples.
Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks.
We develop a novel approach to perform context consistency checks using language models.
arXiv Detail & Related papers (2021-08-19T00:52:10Z)
- Self-supervised object detection from audio-visual correspondence [101.46794879729453]
We tackle the problem of learning object detectors without supervision.
We do not assume image-level class labels; instead, we extract a supervisory signal from audio-visual data.
We show that our method can learn to detect generic objects that go beyond instruments, such as airplanes and cats.
arXiv Detail & Related papers (2021-04-13T17:59:03Z)
- Class-agnostic Object Detection [16.97782147401037]
We propose class-agnostic object detection as a new problem that focuses on detecting objects irrespective of their object-classes.
Specifically, the goal is to predict bounding boxes for all objects in an image but not their object-classes.
We propose training and evaluation protocols for benchmarking class-agnostic detectors to advance future research in this domain.
arXiv Detail & Related papers (2020-11-28T19:22:38Z)
- Slender Object Detection: Diagnoses and Improvements [74.40792217534]
In this paper, we are concerned with the detection of a particular type of object with extreme aspect ratios, namely slender objects.
For a classical object detection method, a drastic drop of 18.9% mAP on COCO is observed when it is evaluated solely on slender objects.
arXiv Detail & Related papers (2020-11-17T09:39:42Z)
- Few-shot Object Detection with Self-adaptive Attention Network for
Remote Sensing Images [11.938537194408669]
We propose a few-shot object detector which is designed for detecting novel objects provided with only a few examples.
In order to fit the object detection settings, our proposed few-shot detector concentrates on the relations that lie in the level of objects instead of the full image.
The experiments demonstrate the effectiveness of the proposed method in few-shot scenes.
arXiv Detail & Related papers (2020-09-26T13:44:58Z)
- Few-shot Object Detection with Feature Attention Highlight Module in
Remote Sensing Images [10.92844145381214]
We propose a few-shot object detector which is designed for detecting novel objects based on only a few examples.
Our model is composed of a feature-extractor, a feature attention highlight module as well as a two-stage detection backend.
Experiments demonstrate the effectiveness of the proposed method for few-shot cases.
arXiv Detail & Related papers (2020-09-03T12:38:49Z)
- Black-box Explanation of Object Detectors via Saliency Maps [66.745167677293]
We propose D-RISE, a method for generating visual explanations for the predictions of object detectors.
We show that D-RISE can be easily applied to different object detectors including one-stage detectors such as YOLOv3 and two-stage detectors such as Faster-RCNN.
arXiv Detail & Related papers (2020-06-05T02:13:35Z)