Abstract: Intelligent robots offer a new way to improve efficiency in
industrial and service scenarios by replacing human labor. However, these
scenarios include dense and dynamic obstacles that make motion planning of
robots challenging. Traditional algorithms like A* can plan collision-free
trajectories in static environments, but their performance degrades and their
computational cost increases steeply in dense and dynamic scenarios.
Optimal-value reinforcement learning (RL) algorithms can address these problems
but suffer from slow and unstable network convergence. Policy-gradient RL
networks converge quickly in Atari games, where the action space is discrete
and finite, but little work has addressed problems that require continuous
actions and large action spaces. In this paper, we modify the existing advantage
actor-critic algorithm and adapt it to complex motion planning, so that it
generates optimal speeds and directions for the robot. Experimental results
demonstrate that our algorithm converges faster and more stably than
optimal-value RL. It also achieves a higher success rate in motion planning and
requires less processing time for the robot to reach its goal.