DeepMark++: Real-time Clothing Detection at the Edge
- URL: http://arxiv.org/abs/2006.00710v3
- Date: Tue, 10 Nov 2020 07:47:43 GMT
- Title: DeepMark++: Real-time Clothing Detection at the Edge
- Authors: Alexey Sidnev, Alexander Krapivin, Alexey Trushkov, Ekaterina
  Krasikova, Maxim Kazakov, Mikhail Viryasov
- Abstract summary: We propose a single-stage approach to deliver rapid clothing detection and keypoint estimation.
Our solution is based on a multi-target network CenterNet, and we introduce several powerful post-processing techniques to enhance performance.
Our most accurate model achieves results comparable to state-of-the-art solutions on the DeepFashion2 dataset.
- Score: 55.41644538483948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clothing recognition is the most fundamental AI application challenge within
the fashion domain. While existing solutions offer decent recognition accuracy,
they are generally slow and require significant computational resources. In
this paper we propose a single-stage approach to overcome this obstacle and
deliver rapid clothing detection and keypoint estimation. Our solution is based
on a multi-target network CenterNet, and we introduce several powerful
post-processing techniques to enhance performance. Our most accurate model
achieves results comparable to state-of-the-art solutions on the DeepFashion2
dataset, and our light and fast model runs at 17 FPS on the Huawei P40 Pro
smartphone. In addition, we achieved second place in the DeepFashion2 Landmark
Estimation Challenge 2020 with 0.582 mAP on the test dataset.
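
As a rough illustration of the single-stage, CenterNet-style detection the abstract refers to, the sketch below extracts object centers as local maxima of a per-class center heatmap. It is a minimal, generic example and not the authors' post-processing: the function name `decode_centers`, the 3x3 neighbourhood, and the `threshold` and `top_k` defaults are all assumptions made for illustration.

```python
from typing import List, Tuple


def decode_centers(
    heatmap: List[List[float]],
    threshold: float = 0.3,  # assumed cut-off, not taken from the paper
    top_k: int = 10,         # assumed cap on the number of detections
) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for local maxima of a 2D center heatmap."""
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0
    peaks = []
    for r in range(height):
        for c in range(width):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Compare against the 3x3 neighbourhood, clipped at the borders.
            neighbourhood = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            # Ties are kept, so a flat plateau yields several peaks.
            if score >= max(neighbourhood):
                peaks.append((r, c, score))
    # Highest-scoring centers first.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks[:top_k]


if __name__ == "__main__":
    demo = [
        [0.8, 0.1, 0.0, 0.0],
        [0.1, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.1, 0.2],
        [0.0, 0.0, 0.2, 0.6],
    ]
    print(decode_centers(demo))
```

In a full detector, each retained center would then be paired with regressed box sizes and keypoint offsets; those heads are omitted here.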
 
      
Related papers
        - FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution [50.55876151973996]
 A versatile video depth estimation model should (1) be accurate across frames, (2) produce high-resolution depth maps, and (3) support real-time streaming.
We propose FlashDepth, a method that satisfies all three requirements, performing depth estimation on a 2044x1148 streaming video at 24 FPS.
 arXiv (2025-04-09T17:59:31Z)
- First Place Solution to the ECCV 2024 ROAD++ Challenge @ ROAD++ Spatiotemporal Agent Detection 2024 [12.952512012601874]
 The task of Track 1 is agent detection, which aims to construct an "agent tube" for agents in consecutive video frames.
Our solutions focus on the challenges in this task including extreme-size objects, low-light, imbalance and fine-grained classification.
We rank first in the test set of Track 1 for the ROAD++ Challenge 2024, and achieve 30.82% average video-mAP.
 arXiv (2024-10-30T14:52:43Z)
- Image2PCI -- A Multitask Learning Framework for Estimating Pavement
  Condition Indices Directly from Images [8.64316207086894]
 This study develops a unified multi-tasking model that predicts the Pavement Condition Index directly from a top-down pavement image.
By multitasking, we are able to extract features from the detection and segmentation heads for automatically estimating the PCI directly from the images.
The model performs very well on our benchmarked and open pavement distress dataset.
 arXiv (2023-10-12T17:28:06Z)
- Efficient Single Object Detection on Image Patches with Early Exit
  Enhanced High-Precision CNNs [0.0]
 This paper proposes a novel approach for detecting objects using mobile robots in the context of the RoboCup Standard Platform League.
The challenge lies in detecting a dynamic object in varying lighting conditions and blurred images caused by fast movements.
To address this challenge, the paper presents a convolutional neural network architecture designed specifically for computationally constrained robotic platforms.
 arXiv (2023-09-07T07:23:55Z)
- Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI &
  AIM 2022 Challenge: Report [108.88637766066759]
 Deep learning-based single image depth estimation solutions can show a real-time performance on IoT platforms and smartphones.
Models developed in the challenge are also compatible with any Android or Linux-based mobile devices.
 arXiv (2022-11-07T22:20:07Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as
  Objects for Multi-Person Human Pose Estimation [79.78017059539526]
 We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
 arXiv (2021-11-16T15:36:44Z)
- Projected GANs Converge Faster [50.23237734403834]
 Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train.
We make significant headway on these issues by projecting generated and real samples into a fixed, pretrained feature space.
Our Projected GAN improves image quality, sample efficiency, and convergence speed.
 arXiv (2021-11-01T15:11:01Z)
- 2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D
  Object Detection [26.086623067939605]
 In this report, we introduce a real-time method to detect the 2D objects from images.
We leverage TensorRT to optimize the inference time of our detection pipeline.
Our framework achieves the latency of 45.8ms/frame on an Nvidia Tesla V100 GPU.
 arXiv (2021-06-16T11:32:03Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
 We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
 arXiv (2021-05-27T13:51:42Z)
- Fast and Accurate Single-Image Depth Estimation on Mobile Devices,
  Mobile AI 2021 Challenge: Report [105.32612705754605]
 We introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based depth estimation solution.
The proposed solutions can generate VGA resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high fidelity results.
 arXiv (2021-05-17T13:49:57Z)
- FastHand: Fast Hand Pose Estimation From A Monocular Camera [12.790733588554588]
 We propose a fast and accurate framework for hand pose estimation, dubbed "FastHand".
FastHand offers high accuracy scores while reaching a speed of 25 frames per second on an NVIDIA Jetson TX2 graphics processing unit.
 arXiv (2021-02-14T04:12:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     