RTGen: Real-Time Generative Detection Transformer
- URL: http://arxiv.org/abs/2502.20622v1
- Date: Fri, 28 Feb 2025 01:01:56 GMT
- Title: RTGen: Real-Time Generative Detection Transformer
- Authors: Chi Ruan,
- Abstract summary: We propose a real-time generative object detector with a succinct encoder-decoder architecture.<n>Specifically, we introduce a novel Region-Language Decoder (RL-Decoder), which innovatively integrates a non-autoregressive language model into the detection decoder.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While open-vocabulary object detectors require predefined categories during inference, generative object detectors overcome this limitation by endowing the model with text generation capabilities. However, existing generative object detection methods directly append an autoregressive language model to an object detector to generate texts for each detected object. This straightforward design leads to structural redundancy and increased processing time. In this paper, we propose a Real-Time GENerative Detection Transformer (RTGen), a real-time generative object detector with a succinct encoder-decoder architecture. Specifically, we introduce a novel Region-Language Decoder (RL-Decoder), which innovatively integrates a non-autoregressive language model into the detection decoder, enabling concurrent processing of object and text information. With these efficient designs, RTGen achieves a remarkable inference speed of 60.41 FPS. Moreover, RTGen obtains 18.6 mAP on the LVIS dataset, outperforming the previous SOTA method by 3.5 mAP.
Related papers
- InTraGen: Trajectory-controlled Video Generation for Object Interactions [100.79494904451246]
InTraGen is a pipeline for improved trajectory-based generation of object interaction scenarios.
Our results demonstrate improvements in both visual fidelity and quantitative performance.
arXiv Detail & Related papers (2024-11-25T14:27:50Z) - DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios [38.952481877244644]
We present a new benchmark, DetectRL, highlighting that even state-of-the-art (SOTA) detection techniques still underperformed in this task.
Our development of DetectRL reveals the strengths and limitations of current SOTA detectors.
We believe DetectRL could serve as an effective benchmark for assessing detectors in real-world scenarios.
arXiv Detail & Related papers (2024-10-31T09:01:25Z) - OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer [63.141027246418]
We propose Open-Vocabulary Light-Weighted Detection Transformer (OVLW-DETR), a deployment friendly open-vocabulary detector with strong performance and low latency.
We provide an end-to-end training recipe that transferring knowledge from vision-language model (VLM) to object detector with simple alignment.
Experimental results demonstrate that the proposed approach is superior over existing real-time open-vocabulary detectors on standard Zero-Shot LVIS benchmark.
arXiv Detail & Related papers (2024-07-15T12:15:27Z) - Applying Ensemble Methods to Model-Agnostic Machine-Generated Text Detection [0.0]
We study the problem of detecting machine-generated text when the large language model it is possibly derived from is unknown.
We use a zero-shot model for machine-generated text detection which is highly accurate when the generative (or base) language model is the same as the discriminative (or scoring) language model.
arXiv Detail & Related papers (2024-06-18T12:58:01Z) - EAGLE: A Domain Generalization Framework for AI-generated Text Detection [15.254775341371364]
We propose a domain generalization framework for the detection of AI-generated text from unseen target generators.
We demonstrate how our framework effectively achieves impressive performance in detecting text generated by unseen target generators.
arXiv Detail & Related papers (2024-03-23T02:44:20Z) - Embedded Named Entity Recognition using Probing Classifiers [10.573861741540853]
EMBER enables streaming named entity recognition in decoder-only language models without fine-tuning them.
We show that EMBER maintains high token generation rates, with only a negligible decrease in speed of around 1%.
We make our code and data available online, including a toolkit for training, testing, and deploying efficient token classification models.
arXiv Detail & Related papers (2024-03-18T12:58:16Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z) - Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution to improve network inference speed and precision at the same time by implementing a fast dynamic voxelizer.
In addition, we propose a lightweight detection sub-head model for classifying predicted objects and filter out false detected objects.
arXiv Detail & Related papers (2023-01-10T16:22:04Z) - RTMDet: An Empirical Study of Designing Real-Time Object Detectors [13.09100888887757]
We develop an efficient real-time object detector that exceeds the YOLO series and is easily for many object recognition tasks.
Together with better training techniques, the resulting object detector achieves, named RTMDet, 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU.
We hope the experimental results can provide new insights into designing versatile real-time object detectors for many object recognition tasks.
arXiv Detail & Related papers (2022-12-14T18:50:20Z) - Towards Generating Real-World Time Series Data [52.51620668470388]
We propose a novel generative framework for time series data generation - RTSGAN.
RTSGAN learns an encoder-decoder module which provides a mapping between a time series instance and a fixed-dimension latent vector.
To generate time series with missing values, we further equip RTSGAN with an observation embedding layer and a decide-and-generate decoder.
arXiv Detail & Related papers (2021-11-16T11:31:37Z) - Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device [53.323878851563414]
We propose a compiler-aware unified framework incorporating network enhancement and pruning search with the reinforcement learning techniques.
Specifically, a generator Recurrent Neural Network (RNN) is employed to provide the unified scheme for both network enhancement and pruning search automatically.
The proposed framework achieves real-time 3D object detection on mobile devices with competitive detection performance.
arXiv Detail & Related papers (2020-12-26T19:41:15Z) - LiDAR-based Online 3D Video Object Detection with Graph-based Message
Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on the single-frame detection, while ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.