AI Online Filters to Real World Image Recognition
- URL: http://arxiv.org/abs/2002.08242v1
- Date: Tue, 11 Feb 2020 08:23:14 GMT
- Title: AI Online Filters to Real World Image Recognition
- Authors: Hai Xiao, Jin Shang and Mengyuan Huang
- Abstract summary: We study a novel approach to add reinforcement controls onto image recognition reflex models to attain better overall performance.
Following a common infrastructure with environment sensing and AI-based modeling of self-adaptive agents, we implement multiple types of AI control agents.
- Score: 4.874719076317905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep artificial neural networks trained with labeled data sets are widely
used in numerous vision and robotics applications today. In AI terms, these
are called reflex models, referring to the fact that they do not self-evolve or
actively adapt to environmental changes. As demand for intelligent robot
control expands to many high-level tasks, reinforcement learning and state-based
models play an increasingly important role. Here, in the computer vision
and robotics domain, we study a novel approach that adds reinforcement controls
onto image recognition reflex models to attain better overall performance,
specifically over a wider environment range than the task reflex models were
designed for. Following a common infrastructure with environment sensing and
AI-based modeling of self-adaptive agents, we implement multiple types of AI
control agents. Finally, we provide comparative results of these agents against
a baseline, together with an insightful analysis of their benefit for improving
overall image recognition performance in the real world.
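The abstract outlines the control loop (sense the environment, let a self-adaptive agent act, then run the frozen reflex model) but gives no code. Below is a minimal sketch of that loop under stated assumptions: a tabular Q-learning agent senses mean image brightness and picks one of a few online filters before a recognition model runs. All names here (FILTERS, sense_environment, recognition_confidence) and the reward signal are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch: an RL control agent that picks an online image
# filter (identity / brighten / gamma-correct) before a frozen "reflex"
# recognition model runs. All names and the reward are assumptions.

FILTERS = {
    0: lambda img: img,                           # identity (no filtering)
    1: lambda img: np.clip(img * 1.5, 0.0, 1.0),  # brighten
    2: lambda img: img ** 0.5,                    # gamma correction
}

def sense_environment(img):
    """Discretize a simple environment observation: mean-brightness bucket."""
    return int(np.mean(img) * 10)  # states 0..10

def recognition_confidence(img):
    """Stand-in for the frozen reflex model's top-class confidence."""
    return float(np.mean(img))     # placeholder: rewards well-exposed images

class FilterAgent:
    """Tabular, bandit-style Q-learning over (brightness state, filter)."""
    def __init__(self, n_states=11, n_actions=len(FILTERS), lr=0.1, eps=0.1):
        self.q = np.zeros((n_states, n_actions))
        self.lr, self.eps = lr, eps

    def act(self, state):
        if np.random.rand() < self.eps:            # explore
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[state]))       # exploit

    def update(self, state, action, reward):
        # One-step update: each image is treated as its own episode.
        self.q[state, action] += self.lr * (reward - self.q[state, action])

agent = FilterAgent()
for _ in range(1000):
    img = np.random.rand(32, 32) * np.random.uniform(0.1, 1.0)  # varying exposure
    s = sense_environment(img)
    a = agent.act(s)
    r = recognition_confidence(FILTERS[a](img))    # reward from model confidence
    agent.update(s, a, r)
```

In the paper's actual setting, the reward would come from the recognition model's accuracy or confidence on the filtered image, and the agent could be any of the multiple types of AI control agents the authors compare; the bandit-style update above is only the simplest instance.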
Related papers
- A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Vision-language-action models (VLAs) have become a foundational element in robot learning.
Various methods have been proposed to enhance traits such as versatility, dexterity, and generalizability.
VLAs serve as high-level task planners capable of decomposing long-horizon tasks into executable subtasks.
arXiv Detail & Related papers (2024-05-23T01:43:54Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm to train AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
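As a rough illustration of what unifying these pre-training strategies could look like (an assumed toy architecture, not the authors' model), the sketch below sums a masked-reconstruction loss, a language-modeling loss, and a next-action loss over one shared transformer trunk; all dimensions and targets are stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: one shared trunk with three heads for masked visual
# reconstruction, language modeling, and next-action prediction.

class UnifiedAgentModel(nn.Module):
    def __init__(self, d=256, vocab=1000, n_actions=16, patch_dim=48):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.vision_in = nn.Linear(patch_dim, d)    # image patches -> tokens
        self.text_in = nn.Embedding(vocab, d)       # text ids -> tokens
        self.recon_head = nn.Linear(d, patch_dim)   # masked-patch reconstruction
        self.lm_head = nn.Linear(d, vocab)          # token prediction
        self.action_head = nn.Linear(d, n_actions)  # next-action prediction

    def forward(self, patches, text):
        tokens = torch.cat([self.vision_in(patches), self.text_in(text)], dim=1)
        h = self.trunk(tokens)
        n_patch = patches.shape[1]
        return (self.recon_head(h[:, :n_patch]),    # per-patch reconstruction
                self.lm_head(h[:, n_patch:]),       # per-text-token logits
                self.action_head(h[:, -1]))         # action from last token

model = UnifiedAgentModel()
patches = torch.randn(2, 10, 48)                    # masked patch inputs (toy)
text = torch.randint(0, 1000, (2, 8))               # text token ids (toy)
recon, lm_logits, act_logits = model(patches, text)
loss = (F.mse_loss(recon, torch.randn_like(recon))            # stand-in targets
        + F.cross_entropy(lm_logits.transpose(1, 2), text)
        + F.cross_entropy(act_logits, torch.randint(0, 16, (2,))))
loss.backward()
```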
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- Neural architecture impact on identifying temporally extended Reinforcement Learning tasks [0.0]
We present attention-based architectures in the reinforcement learning (RL) domain, capable of performing well on the OpenAI Gym Atari 2600 game suite.
In attention-based models, extracting the attention map and overlaying it onto the input images allows direct observation of the information the agent uses to select actions.
In addition, motivated by recent developments in attention-based video-classification models using the Vision Transformer, we propose a Vision Transformer based architecture for the image-based RL domain as well.
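A minimal sketch of that overlay step, assuming per-patch attention weights from some query token (e.g. a CLS or action token) are already extracted; the normalization and blending choices are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: average attention over heads, upsample the patch-grid
# map to frame resolution, and blend it with the frame the agent observed.

def attention_overlay(frame, attn, grid=(7, 7)):
    """frame: (H, W) grayscale in [0, 1]; attn: (heads, n_patches) weights."""
    amap = attn.mean(dim=0).reshape(1, 1, *grid)           # average the heads
    amap = F.interpolate(amap, size=frame.shape,
                         mode="bilinear", align_corners=False)[0, 0]
    amap = (amap - amap.min()) / (amap.max() - amap.min() + 1e-8)
    return 0.5 * frame + 0.5 * amap                        # simple 50/50 blend

frame = torch.rand(84, 84)                          # e.g. a preprocessed Atari frame
attn = torch.softmax(torch.randn(4, 49), dim=-1)    # 4 heads over a 7x7 patch grid
overlay = attention_overlay(frame, attn)            # ready to visualize
```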
arXiv Detail & Related papers (2023-10-04T21:09:19Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
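A toy sketch of the decoupling idea under stated assumptions (flat observation vectors and tiny MLPs rather than the paper's convolutional features): stage one trains a masked autoencoder on observations alone; stage two freezes it and fits a latent dynamics model on top.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: masked-autoencoder representation learning (stage 1),
# then latent dynamics learning on the frozen encoder (stage 2).

class MaskedAE(nn.Module):
    def __init__(self, d_in=64, d_z=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_z))
        self.dec = nn.Sequential(nn.Linear(d_z, 64), nn.ReLU(), nn.Linear(64, d_in))

    def forward(self, x, mask_ratio=0.75):
        mask = (torch.rand_like(x) > mask_ratio).float()   # drop ~75% of inputs
        return self.dec(self.enc(x * mask))                # reconstruct everything

ae = MaskedAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(100):                                       # stage 1: representation
    x = torch.randn(32, 64)                                # stand-in observations
    loss = F.mse_loss(ae(x), x)
    opt.zero_grad()
    loss.backward()
    opt.step()

dynamics = nn.Sequential(nn.Linear(16 + 4, 64), nn.ReLU(), nn.Linear(64, 16))
opt_dyn = torch.optim.Adam(dynamics.parameters(), lr=1e-3)
for _ in range(100):                                       # stage 2: dynamics only
    x, a, x_next = torch.randn(32, 64), torch.randn(32, 4), torch.randn(32, 64)
    with torch.no_grad():                                  # encoder stays frozen
        z, z_next = ae.enc(x), ae.enc(x_next)
    loss = F.mse_loss(dynamics(torch.cat([z, a], dim=-1)), z_next)
    opt_dyn.zero_grad()
    loss.backward()
    opt_dyn.step()
```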
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- Low Dimensional State Representation Learning with Robotics Priors in Continuous Action Spaces [8.692025477306212]
Reinforcement learning algorithms have proven to be capable of solving complicated robotics tasks in an end-to-end fashion.
We propose a framework combining the learning of a low-dimensional state representation, from high-dimensional observations coming from the robot's raw sensory readings, with the learning of the optimal policy.
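One common form of such a framework couples the encoder to "robotics priors" losses on the latent space; the sketch below is a toy version with a temporal-coherence term (consecutive observations map to nearby latents) and a causality-style term (pairs with the same action but different rewards are pushed apart). Shapes, weights, and the pair selection are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: learn a 5-D state representation from 128-D raw
# sensory readings using two robotics-priors loss terms.

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 5))

def priors_loss(obs_t, obs_t1, pair_idx):
    z_t, z_t1 = encoder(obs_t), encoder(obs_t1)
    temporal = ((z_t1 - z_t) ** 2).sum(dim=1).mean()       # smooth transitions
    i, j = pair_idx                                        # same action, different reward
    causality = torch.exp(-((z_t[i] - z_t[j]) ** 2).sum(dim=1)).mean()
    return temporal + causality

obs_t, obs_t1 = torch.randn(64, 128), torch.randn(64, 128) # stand-in observations
pairs = (torch.randint(0, 64, (16,)), torch.randint(0, 64, (16,)))
loss = priors_loss(obs_t, obs_t1, pairs)
loss.backward()                                            # trains the encoder
```

The resulting low-dimensional latents would then feed the policy-learning stage the summary mentions.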
arXiv Detail & Related papers (2021-07-04T15:42:01Z)
- The AI Arena: A Framework for Distributed Multi-Agent Reinforcement Learning [0.3437656066916039]
We introduce the AI Arena: a scalable framework with flexible abstractions for distributed multi-agent reinforcement learning.
We show performance gains due to a distributed multi-agent learning approach over commonly-used RL techniques in several different learning environments.
arXiv Detail & Related papers (2021-03-09T22:16:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.