Designing a Recurrent Neural Network to Learn a Motion Planner for High-Dimensional Inputs

Johnathan Chiu
University of California, Berkeley

May 2022

Abstract

The use of machine learning in the self-driving industry has driven a number of recent advancements. In particular, the use of large deep learning models in the perception and prediction stacks has proved quite successful, but there is still little literature on the use of machine learning in the planning stack. The current state of the art in the planning stack often relies on fast constrained optimization or rule-based approaches. Both of these techniques fail to address a number of fundamental problems that would allow the vehicle to operate more like a human driver. In this paper, we attempt to design a basic deep learning system to approach this problem. Furthermore, the main underlying goal of this paper is to demonstrate the potential uses of machine learning in the planning stack for autonomous vehicles (AVs) and to provide a baseline for ongoing and future research.
1 Introduction

Unsafe driving is often characterized by actions including, but not limited to, violating speed limits, making spontaneous lane changes in heavy traffic, and disregarding traffic lights and signs.
Given these hazards in driving, motion planning systems in self-driving vehicles should be able to operate under all the aforementioned situations and more.
Traditional rule-based and constrained-optimization methods require a significant number of edge cases to be accounted for and are not truly feasible for tackling the long-tail problem in the self-driving space. Likewise, other methods such as imitation learning have their own limitations, namely the reliance on massive amounts of data generated by real drivers. Finally, reinforcement learning methods require training on very specific maneuvers and rely on a master policy to decide which maneuvers to take.
We turn to deep learning methods to attempt to tackle the problem.
Specifically, we use a Recurrent Neural Network (RNN) to handle the controls over a series of states.
We attempt to simplify the dimensionality of this problem by considering only important input parameters and relying on other parts of the AV stack that have proven to perform well.
2 Related Works

A survey of the ongoing work from leading companies in the self-driving industry suggests that the motion planning problem in self-driving is far from being solved.
Every researcher has their own separate approach to tackling the problem.
In this section, we highlight a few related works from both industry and academia.
End-to-End Planning

Some of the previous literature suggests the use of an end-to-end method.
In particular, researchers at NVIDIA designed a system that generates a set of controls directly from the provided images taken by onboard camera sensors [1].
The majority of work nowadays has moved away from these ideas, with emphasis placed on isolating different parts of the self-driving stack and tackling each component individually.
Reinforcement Learning

Reinforcement learning (RL) techniques often require an agent to explore an environment and develop optimal policies that enable it to navigate its surroundings.
But, again, this work relies on training very specific maneuvers, which creates a complex system and may not generalize to edge cases that appear more often than anticipated.
Imitation Learning

In Ashesh et al. [5], the basis of the work relies on imitation learning – using human demonstrations to train a model.
This method is promising, but it requires a significant number of training samples from real-world data.
This is costly and somewhat infeasible as it may limit the ability to scale for production.
In addition, this methodology requires models with large numbers of parameters, which could cause issues in situations where quick reaction times are required.
Constrained Optimization

Model Predictive Control (MPC) has been a longstanding solution to the motion planning problem.
The ubiquity and power of current CPUs and GPUs have enabled this method to work well in some situations.
While this is true, optimizing for a real-time solution can often fail when considering the significant number of input parameters found in real-world driving scenarios.
Search Algorithms

Traditional search algorithms such as A* introduce the idea of using a discrete map and exploring all possible nodes before making a specific move.
Though the exhaustive search is safest in theory, it is too slow for practical uses.
On the other hand, Tesla’s current path planning system uses Monte Carlo Tree Search (MCTS) which has proven to work well in a number of use cases [6].
3 Overview of System

3.1 Generalizing a Coordinate Frame
We use a continuous two-dimensional Cartesian coordinate system to represent the position of the ego vehicle and the world around us.
The ego vehicle is always considered to be the centerpoint in our coordinate system.
This implies that all objects in the scene are always shifted about the ego vehicle.
This coordinate frame allows the model to better learn and generalize: since the initial position is fixed, it is a feature that the model does not need to learn.
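A minimal sketch of this convention (in NumPy; the helper name and example values are illustrative assumptions, not our actual implementation):

    import numpy as np

    def to_ego_frame(points, ego_xy):
        # Shift world-frame (x, y) points so the ego vehicle sits at the
        # origin of the coordinate system; points is an (N, 2) array.
        return np.asarray(points, dtype=float) - np.asarray(ego_xy, dtype=float)

    # Ego at (10, 5) in the world frame; two obstacles nearby.
    obstacles = np.array([[12.0, 7.0], [8.0, 5.0]])
    print(to_ego_frame(obstacles, [10.0, 5.0]))  # [[ 2.  2.] [-2.  0.]]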
We assume a high-detail navigation system; by high detail, we mean that the vehicle's navigation system specifies which particular lane the vehicle should take on any given road.
We further rely on the perception module to provide the polyline coordinates for the desired centerline (of the lane provided by the navigation) and road boundaries in question.
v_0 and h_0 are the initial velocity and initial heading, respectively.
v_d is the desired velocity, which we assume to be determined by a black-box system that selects the safest driving speed from the speed limit and road condition(s) at the time of operation.
We consider two additional pieces of information from the surrounding environment – road lanes and objects/obstacles.
We describe these in detail in the following sections.

4.1.1 Polyline Inputs
Our system only considers three pieces of road structure information: the centerline for the navigation system’s desired lane to stay in, the road boundary on the left, and the road boundary on the right.
The number of polyline coordinates, n, can be chosen depending on how granular the road input should be.
As an example of our lane information, if we consider a 3-lane road and our navigation system suggests the vehicle take the rightmost lane, the provided information is the left and right road boundaries and the centerline of the rightmost lane.
We designed our motion planner in this specific way to minimize the amount of information we need to consider and still operate under safety constraints.
We further elaborate on this design methodology in Section 6, and this specific example is illustrated in Figure 1.
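An illustrative sketch of assembling these polyline inputs (NumPy; the flattening scheme and the 1.75 m lane half-width in the example are assumptions):

    import numpy as np

    def road_input(centerline, left_boundary, right_boundary):
        # Each polyline is an (n, 2) array of ego-frame (x, y) coordinates;
        # n controls how granular the road representation is.
        assert centerline.shape == left_boundary.shape == right_boundary.shape
        return np.concatenate(
            [centerline.ravel(), left_boundary.ravel(), right_boundary.ravel()]
        )

    # n = 4 points per polyline -> a 3 * 4 * 2 = 24-dimensional road input.
    n = 4
    centerline = np.stack([np.linspace(0.0, 30.0, n), np.zeros(n)], axis=1)
    left = centerline + np.array([0.0, 1.75])   # assumed lane half-width
    right = centerline - np.array([0.0, 1.75])
    print(road_input(centerline, left, right).shape)  # (24,)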
4.1.2 Object Representation

We represent the surrounding environment by encoding the positions of the k-nearest objects, O, around the ego vehicle.
We choose k arbitrarily for testing but, in reality, this value should be large enough to encapsulate the number of obstacles in any true driving scenario.

Figure 1: (Top) Complete road information. (Bottom) Model input.
The objects surrounding the ego vehicle are represented by a vector of (x, y)-coordinate pairs, identical to the polyline representation described above.
In sparse driving conditions, such as at night, the vehicle may not encounter k objects and does not need a vector of size k for the surrounding objects/obstacles.
In such situations, all leftover slots in the vector are filled with a (0, 0) pair.
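A minimal sketch of this encoding, including the (0, 0) padding for leftover slots (NumPy; the helper name is hypothetical):

    import numpy as np

    def object_input(objects, k):
        # objects: (m, 2) ego-frame positions of detected objects; m may be
        # smaller than k. Keep the k nearest to the ego (the origin) and
        # fill any leftover slots with the (0, 0) placeholder pair.
        objects = np.asarray(objects, dtype=float).reshape(-1, 2)
        order = np.argsort(np.linalg.norm(objects, axis=1))
        nearest = objects[order][:k]
        padded = np.zeros((k, 2))
        padded[: len(nearest)] = nearest
        return padded.ravel()

    # Two detected objects, k = 4 -> the last two pairs stay (0, 0).
    print(object_input([[3.0, 1.0], [10.0, -2.0]], k=4))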
With this representation of objects, we can only input static features.
We suggest that all objects be encoded with motion vectors dependent on the output from the motion prediction stack.
We consider dynamic obstacles to be out of scope for this project and experiment with only static objects.
4.2 Model Architecture
Our system comprises three separate networks.
The first two of the networks are multi-layer perceptrons (MLPs) used for embedding the road information and the object instances; the third is the RNN that outputs the controls over the series of states.
This allows us to explicitly define the acceleration and steering constraints outside of the model by multiplying the control value by its constraint value.
For example, if the vehicle has an acceleration constraint of 3 m/s^2 and a turning constraint of 40°, we multiply each a_i by 3 and each θ̇_i by 40 to impose the constraints.
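A sketch of this constraint mechanism (PyTorch; the hidden size and the tanh squashing are assumptions, as the text only specifies the multiplication by the constraint values):

    import torch
    import torch.nn as nn

    A_MAX = 3.0        # acceleration constraint, m/s^2
    STEER_MAX = 40.0   # turning constraint, degrees

    class ControlHead(nn.Module):
        # Maps an RNN hidden state to a constrained (a, theta_dot) pair.
        def __init__(self, hidden_dim=128):
            super().__init__()
            self.fc = nn.Linear(hidden_dim, 2)

        def forward(self, h):
            # tanh bounds each raw control to [-1, 1]; multiplying by the
            # constraint values then imposes |a| <= 3 and |theta_dot| <= 40.
            raw = torch.tanh(self.fc(h))
            return raw * torch.tensor([A_MAX, STEER_MAX])

    controls = ControlHead()(torch.zeros(1, 128))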
4.3 Loss Function

We define our loss to consider several factors to optimize over.
The motion planner should avoid all objects surrounding the vehicle, drive at the desired velocity, avoid the road boundaries, and follow the navigation system's centerline (the vehicle's guiding path to get to a final destination).
Combining these terms, with each term at time t scaled by t, gives the loss

    L = Σ_t t · (E_cte(t) + E_he(t) + E_ve(t) + E_collision(t))    (3)

where E_cte is a function that describes the vehicle's shortest distance from the centerline at time t, E_he is a function that returns the heading error between the vehicle's heading and the heading of the road at time t, and E_ve is the velocity error at time t.
Additionally, E_collision returns a summed penalty over the distances from the ego to all k-nearest neighbors at time t.
The formula describing E_collision is

    E_collision(t) = Σ_{o ∈ O} e^{5 − o_d}

where o_d is the distance between the ego and object o at time t.
We use a shifted and scaled exponential function so that a small distance to objects is classified as "reckless" driving and results in a larger loss.
The shifting and scaling coefficients were determined empirically.
Note that we do not consider the distance of any object at position (0, 0) (the same as the ego), since, as mentioned above, these are just placeholders in the vector and not true objects.
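A sketch of the collision term in code (PyTorch; variable names are our own), including the masking of the (0, 0) placeholder slots described above:

    import torch

    def collision_loss(ego_xy, objects):
        # objects: (k, 2) padded positions; rows equal to the placeholder
        # (0, 0) are masked out so they contribute no loss.
        d = torch.linalg.norm(objects - ego_xy, dim=1)
        mask = (objects.abs().sum(dim=1) > 0).float()
        # Shifted, scaled exponential: small distances dominate the loss.
        return (torch.exp(5.0 - d) * mask).sum()

    ego = torch.zeros(2)
    objs = torch.tensor([[3.0, 1.0], [10.0, -2.0], [0.0, 0.0], [0.0, 0.0]])
    print(collision_loss(ego, objs))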
    E_cte(t) = min_i dist(p_ego(t), midpoint(p_i, p_{i+1}))    (5)

where dist and midpoint are the formulas to compute the Euclidean distance between two points and the midpoint between two points, respectively, and the p_i are consecutive points of the polyline.
In our implementation, we compute this by looping through each consecutive pair of polyline points, calculating the distance of the vehicle from the segment's midpoint, and returning the smallest of the distances to the vehicle's position.
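A sketch of this loop, vectorized (NumPy; the helper name is our own):

    import numpy as np

    def min_midpoint_distance(vehicle_xy, polyline):
        # polyline: (n, 2) array of consecutive points; compute the midpoint
        # of each segment and return the distance to the nearest one.
        midpoints = 0.5 * (polyline[:-1] + polyline[1:])
        return np.linalg.norm(midpoints - vehicle_xy, axis=1).min()

    line = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
    print(min_midpoint_distance(np.array([4.0, 3.0]), line))  # ~3.16, to (5, 0)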
Additionally, semi-supervised learning enables the model to learn directly from examples of human driving and attempt to mimic the actions and "thought processes" behind deciding which path to take while driving.
We found that the model tended to converge quickly, within 300 to 400 iterations, even in complex situations.
Additionally, we observed that the model made interesting decisions in unusual scenarios.
For instance, when we pointed the vehicle’s heading almost orthogonal to the road’s heading and near the boundary line, it learned to first reverse towards the centerline to adjust its heading before proceeding to move forward.
We found this observation most interesting given that our loss function accounts for velocity error.
The model was able to foresee that the loss would be minimized by taking this action first.
We believe this is a result of scaling each term at time t by the value of t.
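A sketch of this weighting (PyTorch; assuming the per-step losses were collected during the rollout):

    import torch

    def t_scaled_loss(per_step_losses):
        # Weight the loss at time t by t, so errors later in the horizon
        # (the downstream consequences of early actions) dominate.
        weights = torch.arange(1, len(per_step_losses) + 1, dtype=torch.float32)
        return (weights * torch.stack(per_step_losses)).sum()

    steps = [torch.tensor(1.0), torch.tensor(0.5), torch.tensor(0.25)]
    print(t_scaled_loss(steps))  # 1*1.0 + 2*0.5 + 3*0.25 = 2.75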
6 Design Decision Making

Now that we have introduced our loss function in Section 4.3, we continue our discussion of incorporating only a single centerline and disregarding other lane information.
The main goal of this design methodology was to simplify the inputs to the model while enabling us to completely remove rule-based operations for the vehicle.
In our design, when the desired velocity is not achieved, the vehicle will automatically trigger a safe lane change if there is enough space on its left or right.
The vehicle passes the slow vehicle, being sure to avoid other impediments, and returns to its original lane to minimize the distance to the centerline described in the loss function.
Another example would be automatically changing lanes from the rightmost lane to the left lane in preparation for a left turn.
In this situation, the navigation system relays a new centerline, the left lane's centerline, to the model.
Rather than explicitly telling the vehicle to make two consecutive lane changes from a rule-based planner, our system should determine when and how to make safe lane changes to minimize the vehicle's distance to the provided centerline.
Making the left turn works similarly – we map a centerline for the turn lane, and the vehicle should make a turn that minimizes the distance between itself and that centerline.
The model will slow down and not cross the "obstacles", since there is no path around them that the vehicle can safely take.
This is another example of why our model can be more efficient than rule-based decision-making: engineers need not handcraft a rule for every edge case.
7 Results

We perform a small case study by simulating a few complex scenarios (including some of those mentioned in the previous section) and examining our model's output trajectories in response to the environment it is placed in.
Additionally, our model is able to learn maneuvers in situations that are often dictated by rule-based algorithms, suggesting that rule-based models could be eliminated entirely.