This work considers identifying parameters characterizing a physical system's
dynamic motion directly from a video whose rendering configurations are
inaccessible. Existing solutions require massive training data or lack
generalizability to unknown rendering configurations. We propose a novel
approach that marries domain randomization and differentiable rendering
gradients to address this problem. Our core idea is to train a
rendering-invariant state-prediction (RISP) network that transforms image
differences into state differences independent of rendering configurations,
e.g., lighting, shadows, or material reflectance. To train this predictor, we
formulate a new loss on rendering variances using gradients from differentiable
rendering. Moreover, we present an efficient, second-order method to compute
the gradients of this loss, allowing it to be integrated seamlessly into modern
deep learning frameworks. We evaluate our method in rigid-body and
deformable-body simulation environments using four tasks: state estimation,
system identification, imitation learning, and visuomotor control. We further
demonstrate the efficacy of our approach on a real-world example: inferring the
state and action sequences of a quadrotor from a video of its motion sequences.
Compared with existing methods, our approach achieves significantly lower
reconstruction errors and has better generalizability among unknown rendering
configurations.
Joshua B. Tenenbaum MIT BCS, CBMM, CSAIL jbt@mit.edu
Wojciech Matusik MIT CSAIL wojciech@csail.mit.edu
Chuang Gan MIT-IBM Watson AI Lab ganchuang@csail.mit.edu
1 INTRODUCTION

Reconstructing dynamic information about a physical system directly from a video has received considerable attention in the robotics, machine learning, computer vision, and graphics communities.
This problem is fundamentally challenging because of the deep coupling among the physics, geometry, and perception of the system.
Traditional solutions like motion capture systems (Vicon; OptiTrack; Qualisys) can provide high-quality results but require prohibitively expensive external hardware platforms.
More recent developments in differentiable simulation and rendering provide an inexpensive and attractive alternative to motion capture systems and have shown promising proof-of-concept results (Murthy et al., 2020).
However, existing methods in this direction typically assume the videos come from a known renderer.
Such an assumption limits their usefulness in inferring dynamic information from an unknown rendering domain, a common situation in real-world applications due to the discrepancy between rendered and real-world videos.
Existing techniques for aligning different rendering domains, e.g., CycleGAN (Zhu et al., 2017), may help alleviate this issue, but they typically require massive data from the target domain, which is not always available.
To the best of our knowledge, inferring dynamic parameters of a physical system directly from videos under unknown rendering conditions remains far from being solved, and our work aims to fill this gap.
Figure 1 (caption, partial): Each environment aims to find proper system and control parameters to simulate and render the physical system (middle) so that it matches the dynamic motion of a reference video (bottom) with unknown rendering configurations.
Domain randomization is a classic technique for transferring knowledge between domains by generating massive samples whose variances can cover the discrepancy between domains.
We upgrade it with two key innovations: First, we notice that image differences are sensitive to changes in rendering configurations, which overshadows the rendering-invariant, dynamics-related parameters that we genuinely aim to infer.
This observation motivates us to propose a rendering-invariant state predictor (RISP) that extracts state information of a physical system from videos.
Our second innovation is to leverage rendering gradients from a differentiable renderer.
Essentially, requiring the output of RISP to be agnostic to rendering configurations amounts to enforcing that its gradients with respect to the rendering parameters are zero.
Putting all these ideas together, we develop a powerful pipeline that effectively infers parameters of a physical system directly from video input under random rendering configurations.
We demonstrate the efficacy of our approach on a variety of challenging tasks evaluated in four environments (Sec. 4 and Fig. 1) as well as in a real-world application (Fig. 4).
The experimental results show that our approach outperforms the state-of-the-art techniques by a large margin in most of these tasks due to the inclusion of rendering gradients in the training process.
In summary, our work makes the following contributions:
• We investigate and identify the bottleneck in inferring state, system, and control parameters of a physical system from videos with unknown rendering configurations.
2 RELATED WORK

Differentiable simulation Differentiable simulators provide gradients of simulation outcomes with respect to physical and control parameters; such additional gradient information connects simulation tasks with numerical optimization techniques.
Previous works have demonstrated the power of gradients from differentiable simulators in rigid-body dynamics (Geilinger et al., 2020; Degrave et al., 2019; de Avila Belbute-Peres et al., 2018; Xu et al., 2021; Hong et al., 2021; Qiao et al., 2021a), deformable-body dynamics (Du et al., 2021b; Huang et al., 2020; Du et al., 2021a; Hu et al., 2019b; Gan et al., 2021; Hahn et al., 2019; Ma et al., 2021; Qiao et al., 2021b), fluids (Du et al., 2020; McNamara et al., 2004; Hu et al., 2019a), and co-dimensional objects (Qiao et al., 2020; Liang et al., 2019).

Figure 2: An overview of our method (Sec. 3). We first train RISP using images rendered with random states and rendering parameters (top). We then append RISP to the output of a differentiable renderer, leading to a fully differentiable pipeline from system and control parameters to states predicted from images (middle). Given reference images generated from unknown parameters (dashed gray boxes) in the target domain (bottom), we feed them to RISP and minimize the discrepancies between predicted states (rightmost green-gray box) to reconstruct the underlying system parameters, states, or actions.
We make heavy use of differentiable simulators in this work but our contribution is orthogonal to them: we treat differentiable simulation as a black box, and our proposed approach is agnostic to the choice of simulators.
Differentiable rendering Differentiable rendering offers gradients for rendering inputs, e.g., lighting, materials, or shapes (Ramamoorthi et al., 2007; Li et al., 2015; Jarosz et al., 2012).
The state-of-the-art differentiable renderers (Li et al., 2018; Nimier-David et al., 2019) are powerful in handling gradients even with global illumination or occlusion.
Our work leverages these renderers but with a different focus on using their gradients as a physics prior in a learning pipeline.
Domain randomization The intuition behind domain randomization (Tobin et al., 2017; Peng et al., 2018; Andrychowicz et al., 2020; Sadeghi & Levine, 2017; Tan et al., 2018) is that a model can hopefully cross the domain discrepancy by seeing a large amount of random data in the source domain.
This often leads to robust but conservative performance in the target domain.
The generalizability of domain randomization comes from a more robust model that attempts to absorb domain discrepancies by behaving conservatively, while the generalizability of our method comes from a more accurate model that aims to match first-order gradient information.
3 METHOD

Given a video showing the dynamic motion of a physical system, our goal is to infer the unknown state, system, or control parameters directly from the video with partial knowledge about the physics model and rendering conditions.
Specifically, we assume we know the governing equations of the physical system (e.g., Newton's law for rigid-body systems) and the camera position, but the exact system, control, or rendering parameters are not exposed.
Next, the RISP network learns to reconstruct the state information from these generated images.
Putting these two components together, we have a pipeline that can faithfully recover dynamic information of a physical system from a new video with unseen rendering configurations.
3.1 DIFFERENTIABLE SIMULATION AND RENDERING ENGINE

Given a physical system with known dynamic model M, we first use a differentiable simulator to simulate its states based on action inputs at each time step after time discretization:

s_{i+1} = M_φ(s_i, a_i),  ∀ i = 0, 1, ..., N − 1,   (1)

where N is the number of time steps in a rollout of physics simulation, and s_i, s_{i+1}, and a_i represent the state and action vectors at the corresponding time steps, respectively. The φ vector encodes the system parameters in the model, e.g., mass, inertia, and elasticity. Next, we apply a differentiable renderer R to generate an image I_i for each state s_i:

I_i = R_ψ(s_i),  ∀ i = 0, 1, ..., N,   (2)

Here, ψ is a vector encoding rendering parameters whose gradients are available in the renderer R. Examples of ψ include light intensity, material reflectance, or background color. In other words, given an initial state s_0 and a sequence of actions {a_i}, we generate a sequence of states {s_i} from simulation and render the corresponding image sequence {I_i}, which we summarize as

{I_i} = R_ψ[M_φ(s_0, {a_i})].   (3)

The task of recovering unknown information from a reference video {I_i^ref} can be formulated as follows:

min_{s_0, {a_i}, φ, ψ}  L({I_i^ref}, {I_i}),   (4)
s.t.  {I_i} = R_ψ[M_φ(s_0, {a_i})],   (5)

where L is a loss function penalizing the difference between the generated images and their references.
Assuming that the simulator M and the renderer R are differentiable with respect to their inputs, we can run gradient-based optimization algorithms to solve Eqn. (4). This is essentially the idea proposed in ∇Sim, the state-of-the-art method for identifying parameters directly from video inputs (Murthy et al., 2020).
Specifically, ∇Sim defines L as a norm on pixelwise differences.
One major limitation of Eqn. (4) is that it expects reasonably similar initial images {I_i} and references {I_i^ref} to successfully solve the optimization problem. Indeed, since the optimization problem is highly nonlinear due to the coupling between simulation and rendering, local optimization techniques like gradient descent can easily be trapped in local minima if {I_i} and {I_i^ref} are not close enough. While ∇Sim has reported promising results when {I_i} and {I_i^ref} are rendered with moderately different ψ, we found in our experiments that directly optimizing L defined on the image space rarely works when the two rendering domains are vastly different (Fig. 1). Therefore, we believe it requires a fundamentally different solution, motivating us to propose RISP in our method.
3.2 THE RISP NETWORK
The difficulty of generalizing Eqn. (4) across different rendering domains is partially explained by the fact that the loss L is defined on differences in the image space, which is sensitive to changes in rendering configurations. To address this issue, we notice from many differentiable simulation papers that a loss function in the state space is fairly robust to random initialization (Du et al., 2020; Liang et al., 2019), inspiring us to redefine L in a state-like space. More concretely, we introduce the RISP network N that takes as input an image I and outputs a state prediction ŝ = N(I).
We then redefine the optimization problem in Eqn. (4) as follows (Fig. 2):

min_{s_0, {a_i}, φ, ψ}  L(N_θ({I_i^ref}), N_θ({I_i})),   (6)
s.t.  {I_i} = R_ψ[M_φ(s_0, {a_i})].   (7)

Note that the network N_θ, parametrized by θ, is pre-trained and fixed in this optimization problem. Essentially, Eqn. (6) maps the two image sequences to the predicted state space, after which the standard gradient-descent optimization follows. A well-trained network N can be interpreted as an "inverse renderer" R^{-1} that recovers the rendering-invariant state vector regardless of the choice of rendering parameters ψ, allowing Eqn. (6) to match the information behind two image sequences {I_i} and {I_i^ref} even when they are generated from different renderers R_ψ.
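As an illustration of how Eqns. (6)-(7) can be optimized in practice, the following is a minimal sketch assuming hypothetical differentiable `simulate` and `render` callables and a pre-trained, frozen RISP network `risp`; the optimizer settings are illustrative and the actual implementation details are described in the Appendix.

```python
import torch

# Minimal sketch of solving Eqns. (6)-(7) by gradient descent. `simulate` (M_phi),
# `render` (R_psi), and `risp` (N_theta) are hypothetical, differentiable callables.
def recover_parameters(simulate, render, risp, ref_images, s0, actions, phi, psi, iters=100):
    for q in risp.parameters():       # RISP is pre-trained and kept fixed
        q.requires_grad_(False)
    params = [s0, actions, phi, psi]
    for p in params:
        p.requires_grad_(True)
    opt = torch.optim.RMSprop(params, lr=3e-3, momentum=0.5)
    with torch.no_grad():
        ref_states = torch.stack([risp(I) for I in ref_images])       # N_theta({I^ref})
    for _ in range(iters):
        opt.zero_grad()
        states = simulate(s0, actions, phi)                           # {s_i}, Eq. (7)
        pred = torch.stack([risp(render(s, psi)) for s in states])    # N_theta({I_i})
        loss = (pred - ref_states).abs().mean()                       # state-space L, Eq. (6)
        loss.backward()        # gradients flow through RISP, renderer, and simulator
        opt.step()
    return s0, actions, phi, psi
```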
Below, we present two ideas to train the network N :
The first idea: domain randomization Our first idea is to massively sample state-rendering pairs (s_j, ψ_j) and render the corresponding image I_j = R_{ψ_j}(s_j), giving us a training set D = {(s_j, ψ_j, I_j)}. We then train N to minimize the prediction error:

Lerror(θ, D) = Σ_{(s_j, ψ_j, I_j) ∈ D} ‖s_j − N_θ(I_j)‖_1,   (8)

where we denote the j-th summand by Lerror_j = ‖s_j − N_θ(I_j)‖_1. The intuition is straightforward: N_θ learns to generalize over rendering configurations because it sees images generated with various rendering parameters ψ. This is exactly the domain randomization idea (Tobin et al., 2017), which we borrow to solve our problem across different rendering domains.
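A minimal sketch of this domain-randomized data generation and the prediction error of Eq. (8) is given below; `sample_state`, `sample_rendering_params`, and `render` are hypothetical stand-ins for the samplers and the differentiable renderer used to build D.

```python
import torch

# Minimal sketch of building the domain-randomized training set and evaluating Eq. (8).
# `sample_state`, `sample_rendering_params`, and `render` are hypothetical callables.
def build_dataset(sample_state, sample_rendering_params, render, num_samples):
    data = []
    for _ in range(num_samples):
        s, psi = sample_state(), sample_rendering_params()
        data.append((s, psi, render(s, psi)))      # (s_j, psi_j, I_j)
    return data

def l_error(net, batch):
    # Eq. (8): sum of per-sample L1 errors ||s_j - N_theta(I_j)||_1
    return torch.stack([(s - net(I)).abs().sum() for s, _, I in batch]).sum()
```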
The second idea: rendering gradients One major bottleneck in domain randomization is its need for massive training data that spans the whole distribution of rendering parameters ψ. Our second idea reduces this need by penalizing the sensitivity of the network output to the rendering parameters:

Lreg(θ, D) = Σ_{(s_j, ψ_j, I_j) ∈ D} ‖ ∂N_θ(R_{ψ_j}(s_j)) / ∂ψ_j ‖²_F,   (9)

and by training N with the combined loss

Ltrain(θ, D) = Lerror(θ, D) + γ Lreg(θ, D),   (10)

where γ is a weighting coefficient.
The intuition is that by suppressing this Jacobian to zero, we encourage the network N to flatten out its landscape along the dimension of rendering parameters ψ, and invariance across rendering configurations follows.
To implement this loss, we apply the chain rule:

∂N_θ(R_{ψ_j}(s_j)) / ∂ψ_j = (∂N_θ(I_j) / ∂I_j) · (∂I_j / ∂ψ_j),   (11)

where the first term ∂N_θ(I_j)/∂I_j is available in any modern deep learning framework and the second term ∂I_j/∂ψ_j can be obtained from the state-of-the-art differentiable renderer (Nimier-David et al., 2019). We can now see more clearly the intuition behind RISP: it requires the network's sensitivity to the input image to be orthogonal to the directions along which the rendering parameters can influence the image, leading to a rendering-invariant prediction.
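As a sketch of how Eq. (11) can be assembled in practice, the snippet below contracts reverse-mode network gradients with pre-computed renderer gradients; the helper name `rendering_jacobian` and the tensor shapes are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def rendering_jacobian(net, image, dI_dpsi):
    """Assemble G = dN_theta(I)/dpsi via the chain rule of Eq. (11).

    image:    (C, H, W) rendered image I_j.
    dI_dpsi:  (P, C, H, W) pre-computed renderer gradients dI_j/dpsi_j.
    Returns:  (S, P) Jacobian, where S is the state dimension.
    """
    image = image.requires_grad_(True)
    state = net(image)                                      # N_theta(I_j), shape (S,)
    rows = []
    for i in range(state.shape[0]):
        # reverse mode: dN_i/dI, an image-shaped gradient
        g_i, = torch.autograd.grad(state[i], image, retain_graph=True)
        # contract with each renderer gradient dI/dpsi_k to get row i of G
        rows.append((g_i.unsqueeze(0) * dI_dpsi).flatten(1).sum(1))
    return torch.stack(rows)                                # (S, P)
```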
We stress that the design of this new loss in Eqn. (10) is non-trivial.
In fact, both Lerror and Lreg have their unique purposes and must be combined: Lerror encourages N to fit its output to individually different states, and Lreg attempts to smooth out its output along the ψ dimension.
Specifically, Lreg cannot be optimized as a standalone loss because it leads to a trivial solution of N always predicting constant states.
Putting Lerror and Lreg together forces them to strike a balance between predicting accurate states and ignoring noise from rendering conditions, leading to a network N that truly learns the "inverse renderer" R^{-1}.
It remains to show how to compute the gradient of the regularizer Lreg with respect to the network parameters θ, which is required by gradient-based optimizers to minimize this new loss.
As the loss definition now includes first-order derivatives, computing its gradients involves second-order partial derivatives, which can be time-consuming if implemented carelessly with multiple loops.
Our last contribution is to provide an efficient method for computing this gradient, which can be fully implemented with existing frameworks (PyTorch and mitsuba-2 in our experiments):

Theorem 1 Assuming forward-mode differentiation is available in the renderer R and reverse-mode differentiation is available in the network N, we can compute a stochastic gradient ∂Lreg/∂θ in O(|s||θ|) time per image using pre-computed data occupying O(Σ_j |ψ_j||I_j|) space.
In particular, we stress that computing the gradients of Lreg does not require second-order gradients in the renderer R, which would exceed the capability of all existing differentiable renderers we are aware of.
Therefore, we use a slightly different regularizer in our implementation:
Ltrain(θ, D) = Lerror + γ Σ_{(s_j, ψ_j, I_j) ∈ D} ‖ ∂Lerror_j / ∂ψ_j ‖.   (12)
In other words, we instead encourage the state prediction error to be rendering-invariant.
It can be seen from the proof of Theorem 1 that this new regularizer requires only O(|θ|) time to compute its gradients, and we have found empirically that the performance of this new regularizer is comparable to Eqn. (10).
We leave a theoretical analysis of the two regularizers to future work.
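For concreteness, a minimal sketch of the training loss in Eq. (12) for a single sample is shown below; it reuses the pre-computed renderer gradients dI/dψ and double backpropagation (create_graph=True), and the L2 norm over the ψ dimension is an assumption since the choice of norm is not spelled out here.

```python
import torch

# Minimal sketch of Eq. (12) for a single sample (s_j, psi_j, I_j).
# dI_dpsi holds the pre-computed renderer gradients dI_j/dpsi_j, shape (P, C, H, W).
def train_loss(net, image, dI_dpsi, state_gt, gamma):
    image = image.requires_grad_(True)
    err = (state_gt - net(image)).abs().sum()           # Lerror_j, Eq. (8)
    # dLerror_j/dI with a graph attached, so the regularizer stays differentiable in theta
    g_img, = torch.autograd.grad(err, image, create_graph=True)
    # chain rule with the renderer gradients: dLerror_j/dpsi_j, shape (P,)
    g_psi = (g_img.unsqueeze(0) * dI_dpsi).flatten(1).sum(1)
    return err + gamma * g_psi.norm()                    # Eq. (12), L2 norm assumed
```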
4 EXPERIMENTS

In this section, we conduct various experiments to study the following questions:
• Q1: Is pixelwise loss on videos across rendering domains sufficient for parameter prediction?
• Q2: If pixelwise loss is not good enough, are there other competitive alternatives to the state-prediction loss in our approach?
• Q3: Is the regularizer on rendering gradients in our loss necessary?
• Q4: How does our approach compare with directly optimizing state discrepancies?
• Q5: Is the proposed method applicable to real-world scenarios?
We address the first four questions using the simulation environment described in Sec. 4.1 and answer the last question using a real-world application at the end of this section.
More details about the experiments, including an ablation study, can be found in the Appendix.
4.1 EXPERIMENTAL SETUP

Environments We implement four environments (Fig. 1): a rigid-body environment without contact (quadrotor), a rigid-body environment with contact (cube), an articulated body (hand), and a deformable-body environment (rod).
Each environment contains a differentiable simulator (Xu et al., 2021; Du et al., 2021b) and a differentiable renderer (Li et al., 2018).
We deliberately generated the training set in Sec. 3 using a different renderer (Nimier-David et al., 2019) and used different distributions when sampling rendering configurations in the training set and the environments.
Tasks We consider four types of tasks defined on the physical systems in all environments: state estimation, system identification, imitation learning, and visuomotor control.
The state estimation task (Sec. 4.2) requires a model to predict the state of the physical system from a given image and serves as a prerequisite for the other downstream tasks.
The system identification (Sec. 4.3) and imitation learning (Sec. 4.4) tasks aim to recover the system parameters and control signals of a physical system from the video, respectively.
Finally, in the visuomotor control task (Appendix), we replace the video with a target image showing the desired state of the physical system and aim to discover the proper control signals that steer the system to the desired state.
In all tasks, we use a photorealistic renderer (Pharr et al., 2016) to generate the target video or image.
Baselines We consider two strong baselines: The pixelwise-loss baseline is used by ∇Sim (Murthy et al., 2020), which is the state-of-the-art method for identifying system parameters directly from video inputs.
We implement ∇Sim by removing RISP from our method and backpropagating the pixelwise loss on images through differentiable rendering and simulation.
We run this baseline to analyze the limit of pixelwise loss in downstream tasks (Q1).
The second baseline is perceptual-loss (Johnson et al., 2016), which replaces the pixelwise loss in ∇Sim with loss functions based on high-level features extracted by a pre-trained CNN.
By comparing this baseline with our method, we can justify why we choose to let RISP predict states instead of other perceptual features (Q2). We also include two weak baselines used by ∇Sim: The average baseline is a deterministic method that always returns the average quantity observed from the training set, and the random baseline returns a guess randomly drawn from the data distribution used to generate the training set.
We use these two weak baselines to avoid designing environments and tasks that are too trivial to solve.
Our methods We consider two versions of our methods in Sec. 3: ours-no-grad implements the domain randomization idea without using the proposed regularizers, and ours is the full approach that includes the regularizer using rendering gradients.
Oracle Throughout our experiments, we also consider an oracle method that directly minimizes the state differences obtained from simulation outputs without further rendering.
In particular, this oracle sees the ground-truth states for each image in the target video or image, which is inaccessible to all baselines and our methods.
We consider this approach to be an oracle because it is a subset of our approach that involves differentiable physics only, but it needs a perfect state-prediction network.
This oracle can give us an upper bound for the performance of our method (Q4).
Training We build our RISP network and baselines upon a modified version of ResNet-18 (He et al., 2016) and train them with the Adam optimizer (Kingma & Ba, 2014).
We report more details of our network architecture, training strategies, and hyperparameters in the Appendix.
Table 1 (caption, partial): Each entry in the table reports the mean and standard deviation of the state estimation error computed from 800 images under 4 rendering configurations.

4.2 STATE ESTIMATION

In this task, we predict the states of the physical system from randomly generated images and report the mean and standard deviation of the state prediction errors in Table 1.
The two weak baselines perform poorly, confirming that this state-estimation task cannot be solved trivially.
We highlight that our method with the rendering-gradient loss predicts the most stable and accurate state of the physical system across the board, which strongly demonstrates that RISP learns to make predictions independent of various rendering configurations.
4.3 SYSTEM IDENTIFICATION

Our system identification task aims to predict the system parameters of a physical system, e.g., mass, density, stiffness, or elasticity, by watching a reference video with known action sequences.
We repeat this experiment 4 times with randomly generated rendering conditions and initial guesses and report the mean and standard deviation of each system parameter in Table 2.
The near-perfect performance of the oracle suggests that these system identification tasks are feasible to solve as long as a reliable state estimation is available.
Both of our methods outperform almost all baselines by a large margin, sometimes even by orders of magnitude.
This is as expected, since the previous task already suggests that our methods can predict states from a target video much more accurately than the baselines, which is a crucial prerequisite for solving system identification.
Figure 3 (caption, partial): We show the motions generated using a randomly chosen initial guess of the actions (top row) and optimized actions using our method with rendering gradients (middle row).

4.4 IMITATION LEARNING

Our imitation learning tasks consider the problem of emulating the dynamic motion of a reference video.
The experiment setup is similar to the system identification task except that we swap the known and unknown variables in the environment: The system parameters are now known, and the goal is to infer the unknown sequence of actions from the reference video.
We report the results in Table 3 and find that our method with rendering gradients (ours in the table) achieves much lower errors, indicating that we resemble the motions in the video much more accurately.
The errors from pixelwise-loss have smaller variations across rendering domains but are larger than ours, indicating that it struggles to solve this task under all four rendering configurations.
In addition, the oracle finds a sequence of actions leading to more similar motions than our method, but it requires full and accurate knowledge of the state information which is rarely accessible from a video.
We visualize our results in the hand environment in Fig. 3.
4.5 A REAL-WORLD APPLICATION

Finally, we evaluate the efficacy of our approach on a real-world application: Given a video of a flying quadrotor, we aim to build its digital twin that replicates the video motion by inferring a reasonable sequence of actions.
This real-to-sim transfer problem is challenging due to a few reasons: First, the real-world video contains complex textures, materials, and lighting conditions unknown to us and unseen by our differentiable renderer.
Second, the quadrotor's real-world dynamics is polluted by environmental noise and intricate aerodynamic effects, which are ignored in our differentiable simulation environment. Despite these challenges, our method achieved a qualitatively good result with a standard differentiable rigid-body simulator and renderer, showing its generalizability across rendering domains and dynamics gaps.
Table 3 (caption, partial): Each entry reports the mean and standard deviation of the state discrepancy computed from 4 randomly generated initial guesses and rendering conditions.
Figure 4 (caption, partial): We illustrate the motions reconstructed using our method (bottom row).
5 CONCLUSIONS, LIMITATIONS, AND FUTURE WORK
We proposed a framework that integrates rendering-invariant state-prediction into differentiable simulation and rendering for cross-domain parameter estimation.
The experiments in simulation and in real world have shown that our method is more robust than pixel-wise or perceptual losses on unseen rendering configurations.
Despite its promising results, RISP still has a few limitations.
Firstly, we require a differentiable simulator that can capture the dynamic model of the object, which may not be available for real-world scenes with intricate dynamics, e.g., fluids.
Lastly, we assume a moderately accurate object geometry is available, which can potentially be relaxed by combining NeRF (Mildenhall et al., 2020) with our approach to infer the geometry.
ACKNOWLEDGMENTS

We thank Sai Praveen Bangaru for our discussions on differentiable rendering.
This work was supported by MIT-IBM Watson AI Lab and its member company Nexplore, ONR MURI, DARPA Machine Common Sense program, ONR (N00014-18-1-2847), and Mitsubishi Electric.
REFERENCES

OpenAI: Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020.

Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J Zico Kolter. End-to-end differentiable physics for learning and control. Advances in Neural Information Processing Systems, 31, 2018.

Jonas Degrave, Michiel Hermans, Joni Dambre, et al. A differentiable physics engine for deep learning in robotics. Frontiers in Neurorobotics, pp. 6, 2019.

Tao Du, Kui Wu, Andrew Spielberg, Wojciech Matusik, Bo Zhu, and Eftychios Sifakis. Functional optimization of fluidic devices with differentiable Stokes flow. ACM Trans. Graph., 39(6), nov 2020. doi: 10.1145/3414685.3417795. URL https://doi.org/10.1145/3414685.3417795.

Tao Du, Josie Hughes, Sebastien Wah, Wojciech Matusik, and Daniela Rus. Underwater soft robot modeling and control with differentiable simulation. IEEE Robotics and Automation Letters, 6(3):4994–5001, 2021a. doi: 10.1109/LRA.2021.3070305.

Tao Du, Kui Wu, Pingchuan Ma, Sebastien Wah, Andrew Spielberg, Daniela Rus, and Wojciech Matusik. DiffPD: Differentiable projective dynamics. ACM Trans. Graph., 41(2), nov 2021b. doi: 10.1145/3490168. URL https://doi.org/10.1145/3490168.

Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, et al. ThreeDWorld: A platform for interactive multi-modal physical simulation. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021.

Moritz Geilinger, David Hahn, Jonas Zehnder, Moritz Bächer, Bernhard Thomaszewski, and Stelian Coros. ADD: Analytically differentiable dynamics for multi-body systems with frictional contact. ACM Trans. Graph., 39(6), nov 2020. doi: 10.1145/3414685.3417766. URL https://doi.org/10.1145/3414685.3417766.

David Hahn, Pol Banzet, James M. Bern, and Stelian Coros. Real2Sim: Visco-elastic parameter estimation from dynamic motion. ACM Trans. Graph., 38(6), nov 2019. doi: 10.1145/3355089.3356548. URL https://doi.org/10.1145/3355089.3356548.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

Yining Hong, Li Yi, Joshua B Tenenbaum, Antonio Torralba, and Chuang Gan. PTR: A benchmark for part-based conceptual, relational, and physical reasoning. In Advances in Neural Information Processing Systems, 2021.

Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Fredo Durand. DiffTaichi: Differentiable programming for physical simulation. In International Conference on Learning Representations, 2019a.

Yuanming Hu, Jiancheng Liu, Andrew Spielberg, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu, Daniela Rus, and Wojciech Matusik. ChainQueen: A real-time differentiable physical simulator for soft robotics. In 2019 International Conference on Robotics and Automation (ICRA), pp. 6265–6271, 2019b. doi: 10.1109/ICRA.2019.8794333.

Zhiao Huang, Yuanming Hu, Tao Du, Siyuan Zhou, Hao Su, Joshua B Tenenbaum, and Chuang Gan. PlasticineLab: A soft-body manipulation benchmark with differentiable physics. In International Conference on Learning Representations, 2020.

Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, and Konstantinos Bousmalis. Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.

Tzu-Mao Li, Miikka Aittala, Frédo Durand, and Jaakko Lehtinen. Differentiable Monte Carlo ray tracing through edge sampling. ACM Trans. Graph., 37(6), dec 2018. doi: 10.1145/3272127.3275109. URL https://doi.org/10.1145/3272127.3275109.

Junbang Liang, Ming Lin, and Vladlen Koltun. Differentiable cloth simulation for inverse problems. Advances in Neural Information Processing Systems, 32, 2019.

Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

Pingchuan Ma, Tao Du, John Z. Zhang, Kui Wu, Andrew Spielberg, Robert K. Katzschmann, and Wojciech Matusik. DiffAqua: A differentiable computational design pipeline for soft underwater swimmers with shape interpolation. ACM Trans. Graph., 40(4), jul 2021. doi: 10.1145/3450626.3459832. URL https://doi.org/10.1145/3450626.3459832.

Antoine McNamara, Adrien Treuille, Zoran Popović, and Jos Stam. Fluid control using the adjoint method. In ACM SIGGRAPH 2004 Papers, SIGGRAPH '04, pp. 449–456, New York, NY, USA, 2004. Association for Computing Machinery.

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision, pp. 405–421. Springer, 2020.

J Krishna Murthy, Miles Macklin, Florian Golemo, Vikram Voleti, Linda Petrini, Martin Weiss, Breandan Considine, Jérôme Parent-Lévesque, Kevin Xie, Kenny Erleben, et al. gradSim: Differentiable simulation for system identification and visuomotor control. In International Conference on Learning Representations, 2020.

Merlin Nimier-David, Delio Vicini, Tizian Zeltner, and Wenzel Jakob. Mitsuba 2: A retargetable forward and inverse renderer. ACM Trans. Graph., 38(6), nov 2019. doi: 10.1145/3355089.3356498. URL https://doi.org/10.1145/3355089.3356498.

OptiTrack. OptiTrack motion capture systems. https://optitrack.com/. Accessed: 2021-10-05.

Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 3803–3810, 2018. doi: 10.1109/ICRA.2018.8460528.

Matt Pharr, Wenzel Jakob, and Greg Humphreys. Physically Based Rendering: From Theory to Implementation. Morgan Kaufmann, 2016.

Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, and Ming Lin.

Jie Xu, Tao Chen, Lara Zlokapa, Michael Foshey, Wojciech Matusik, Shinjiro Sueda, and Pulkit Agrawal. An end-to-end differentiable framework for contact-aware robot design. In Robotics: Science and Systems, 2021.

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2242–2251, 2017. doi: 10.1109/ICCV.2017.244.
A PROOF OF THE THEOREM

For brevity, we remove the summation in Lreg and drop the index j to focus on deriving the gradients of the Frobenius term with respect to the network parameters θ. Let G be the Jacobian matrix inside the Frobenius norm, and let i and j be its row and column indices. We now derive the gradient of Lreg with respect to the k-th parameter in θ as follows:

∂Lreg/∂θ_k = Σ_{ij} 2 G_{ij} ∂G_{ij}/∂θ_k   (13)
           = Σ_{ij} 2 G_{ij} ∂/∂θ_k [ (∂N_θ(I)_i/∂I) : (∂I/∂ψ_j) ]   (14)
           = Σ_{ij} 2 G_{ij} (∂²N_θ(I)_i/∂I∂θ_k) : (∂I/∂ψ_j)   (15)
           = Σ_i (∂²N_θ(I)_i/∂I∂θ_k) : ( Σ_j 2 G_{ij} ∂I/∂ψ_j )   (16)
           = Σ_i ∂/∂θ_k [ ∂N_θ(I)_i/∂I ] : ( 2 Σ_j G_{ij} ∂I/∂ψ_j ).   (17)

The derivation suggests that we can loop over the state dimension (indexed by i) and run backpropagation to obtain ∂Lreg/∂θ. Moreover, the rendering gradients ∂I/∂ψ_j depend on the training data only and remain constant throughout the whole optimization process, so we can pre-compute them.
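A minimal PyTorch sketch of this procedure is given below: it loops over the state dimension, uses double backpropagation through the network only, and treats the pre-computed renderer gradients dI/dψ as constants, so no second-order renderer gradients are needed; the function name and tensor shapes are illustrative assumptions.

```python
import torch

def lreg_grad(net, image, dI_dpsi):
    """Accumulate dLreg/dtheta into net's parameter gradients (Eqs. (13)-(17)).

    dI_dpsi: (P, C, H, W) pre-computed renderer gradients dI/dpsi, held constant.
    """
    image = image.requires_grad_(True)
    state = net(image)                                        # N_theta(I), shape (S,)
    total = 0.0
    for i in range(state.shape[0]):                           # loop over state dimension
        # dN_i/dI with a graph attached, so a second backward reaches theta
        g_i, = torch.autograd.grad(state[i], image, create_graph=True)
        # row i of G: contract dN_i/dI with each dI/dpsi_j
        G_i = (g_i.unsqueeze(0) * dI_dpsi).flatten(1).sum(1)  # (P,)
        # pair dN_i/dI with the constant tensor 2 * sum_j G_ij dI/dpsi_j (Eq. (17))
        w_i = (2.0 * G_i.detach().view(-1, 1, 1, 1) * dI_dpsi).sum(0)
        total = total + (g_i * w_i).sum()
    total.backward()      # writes dLreg/dtheta into net.parameters() gradients
```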
Training We use the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-2 and a weight decay of 1e-6 for training.
The learning rate follows a cosine decay schedule (Loshchilov & Hutter, 2016).
We train the network for 100 epochs.
We set the batch size to 16 by default.
To avoid model collapse in the early stage of training, we linearly increase the magnitude coefficient of the rendering gradients from 0 to an environment-specific value: we set it to 1 for quadrotor, 20 for cube, 100 for hand, and 30 for rod.
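The configuration above can be assembled roughly as follows; `risp`, `train_loader`, `risp_loss` (the combined loss with coefficient gamma, as in Eq. (12)), and the warm-up duration are assumptions for illustration rather than the exact implementation.

```python
import torch

# Minimal sketch of the RISP training setup described above.
def train_risp(risp, train_loader, risp_loss, epochs=100, gamma_max=20.0):
    opt = torch.optim.Adam(risp.parameters(), lr=1e-2, weight_decay=1e-6)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    for epoch in range(epochs):
        # linear warm-up of the rendering-gradient coefficient to avoid early collapse
        gamma = gamma_max * min(1.0, epoch / max(1, epochs - 1))
        for state, psi, image, dI_dpsi in train_loader:   # batch size 16 by default
            opt.zero_grad()
            loss = risp_loss(risp, image, dI_dpsi, state, gamma)
            loss.backward()
            opt.step()
        sched.step()
```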
Table caption (partial): Each entry reports the mean and standard deviation of the state discrepancy computed from 4 randomly generated initial guesses and rendering conditions.
B.3 TRAINING DETAILS

We share the same experiment settings across the imitation learning and visuomotor control experiments.
We use the RMSProp optimizer with a learning rate of 3e-3 and a momentum of 0.5. We optimize the system parameters or the action parameters for 100 iterations.
C MORE EXPERIMENTS

C.1 VISUOMOTOR CONTROL
We consider a visuomotor control task defined as follows: given a target image displaying the desired state of the physical system, we optimize a sequence of actions that steer the physical system to the target state from a randomly generated initial state.
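A minimal sketch of this task, under the same assumptions as before (hypothetical differentiable `simulate` and `render` callables and a frozen RISP network `risp`), optimizes the action sequence so that the RISP prediction of the final rendered frame matches the RISP prediction of the target image:

```python
import torch

# Minimal sketch of the visuomotor control task; only the actions are optimized,
# and `risp` is assumed to be pre-trained and frozen.
def visuomotor_control(simulate, render, risp, target_image, s0, actions, phi, psi, iters=100):
    actions.requires_grad_(True)
    opt = torch.optim.RMSprop([actions], lr=3e-3, momentum=0.5)
    with torch.no_grad():
        target_state = risp(target_image)            # desired state predicted by RISP
    for _ in range(iters):
        opt.zero_grad()
        final_state = simulate(s0, actions, phi)[-1]
        loss = (risp(render(final_state, psi)) - target_state).abs().mean()
        loss.backward()
        opt.step()
    return actions
```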
We consider an imitation learning task for the real-world quadrotor.
The imitation learning takes as input the intrinsic and extrinsic parameters of the camera, a real-world quadrotor video clip recorded by the camera, empirical values of the quadrotor’s system parameters, and the geometry of the quadrotor represented as a mesh file.
During the whole recording procedure, the camera remains at a fixed position and pose.
After we obtain the original video clip, we trim out the leading and trailing frames with little motion and generate the pre-processed video clip with a total length of 14.4s and a frequency of 10Hz.
We manually control the quadrotor so that it follows a smooth trajectory.
Motion capture (MoCap) system To evaluate the performance of various methods, we also build a MoCap system to log the position and rotation of the quadrotor.
Note that we use the MoCap data for evaluation only; it is not a step in our pipeline, and none of our methods uses the MoCap data as a dependency.
Next, we minimize a loss function defined as the difference between the target trajectory and a simulated trajectory from a differentiable quadrotor simulator, with the action at each video frame as the decision variables.
Figure 5 (caption, partial): We illustrate the motions reconstructed using our method (upper middle), the pixelwise loss (lower middle), and its enhanced variant (bottom).
C.2.4 RESULTS

We show the results of the real-world experiment in Fig. 5 by visualizing a representative subset of frames.
Figure caption (partial): The solid and dashed curves represent the training curves of ours and ours-no-grad, respectively. The color of the lines indicates the number of rendering configurations in the training set, where green and orange correspond to 1 and 10, and the red curve samples a different rendering configuration for every training datum. The red vertical line shows where the ground-truth Young's modulus lies.
RISP To validate the robustness of our method, we aggressively randomized the rendering configuration so that it differs significantly from the input video.
We observe that the pixelwise loss does not provide valid guidance to the optimization and steers the quadrotor out of the screen immediately.
We presume the major reason behind the degeneration of pixelwise loss is the assumption of a known rendering configuration in the target domain.
However, such information is generally non-trivial to obtain from a real-world video.
GradSim-Enhanced According to our presumption above, we propose two modifications for improvement: First, we manually tune the rendering configuration until it appears as close as possible to the real-world video; second, we provide a good initial guess of the action sequence computed from the ground-truth trajectory recorded by the motion capture system. Note that such a good initial guess from the motion capture system is generally impossible to access in real-world applications and is not used in RISP or GradSim. Even with these strong advantages, GradSim-Enhanced only recovers the very beginning of the action sequence and ends up drifting uncontrollably out of the screen.
We then train both of our methods for 100 epochs and report their performance on a test set consisting of 200 randomly sampled states, each of which is augmented by 10 unseen rendering configurations. The right inset summarizes the performance of ours (solid lines) and ours-no-grad (dashed lines) under a varying number of rendering configurations, with the green, orange, and red colors corresponding to results trained on 1, 10, and randomly sampled rendering configurations. All solid lines reach a lower state estimation loss than their dashed counterparts, indicating that our rendering gradients extract more information from the same number of rendering configurations.
Moreover, our method trained with a limited number of rendering configurations even achieves a lower loss than ours-no-grad trained with randomly sampled rendering configurations (red dashed line), which reflects its better data efficiency.
By comparing ours and ours-no-grad in Tables 1, 2, 3, and 4, we can see that having the rendering gradients in our approach is crucial to its substantially better performance.
We stress that having a rendering-invariant state estimation is the core source of generalizability in our approach and the key to success in many downstream tasks.
We study the impact of temporal sampling rate on the performance of RISP using four different sampling strategies: sampling densely on every frame, sampling sparsely on every 5th frame, sampling only on the first and last frames, and sampling only on the first and middle frames.
We observe that RISP is robust against different sampling rates, as all curves share the same global optimum.
Additionally, except for the extreme case with substantial information loss (first and last frames), most of the curves are unimodal with one local minimum, indicating an easier optimization problem.