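Fugu-MT: Translations of arXiv Papers

This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).

The papers below have a publication date of 2020-12-17.

Title	Authors	Abstract	Publication date / translation date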
# SU($N$)フェルミオンの3次元気体におけるボゾン化の証拠 Evidence for Bosonization in a three-dimensional gas of SU($N$) fermions ( http://arxiv.org/abs/1912.12105v3 ) ライセンス: Link先を確認	Bo Song, Yangqian Yan, Chengdong He, Zejian Ren, Qi Zhou and Gyu-Boong Jo	(参考訳) ボソンとフェルミオンの境界をぼやかすことは、凝縮系物理学から原子・分子・光学物理学、高エネルギー物理学まで、様々な分野の興味深い量子現象の中心にある。そのような例の1つは、SU($N$)対称性を持つ多成分フェルミ気体で、大きな$N$の極限においてスピンレスボソンのように振る舞うことが期待され、多数の内部状態がパウリの排他原理による制約を弱める。しかし、SU($N$)フェルミオンのボゾン化は、厳密解が存在しない高次元においてはこれまで確立されていない。ここでは、$N$を調整可能なSU($N$)フェルミオンイッテルビウム気体における3次元(3D)でのボゾン化の直接的証拠を報告する。我々は、希薄な量子気体を制御する中心的な量である接触(コンタクト)を運動量分布から測定し、スピン当たりの接触が低フガシティ領域において1/$N$のスケーリングで定数に近づくことを発見した。これは我々の理論的予測と一致する。このスケーリングは熱力学におけるフェルミオン統計の役割の消失を意味し、単一の物理量を測定することによってボゾン化を検証することを可能にする。我々の研究は、任意の次元において内部自由度を調整することでボソン統計とフェルミオン統計を切り替える、高度に制御可能な量子シミュレータを提供する。また、接触を用いて多成分量子系とその基礎となる対称性を探索する新たな経路も示唆している。 Blurring the boundary between bosons and fermions lies at the heart of a wide range of intriguing quantum phenomena in multiple disciplines, ranging from condensed matter physics and atomic, molecular and optical physics to high energy physics. One such example is a multi-component Fermi gas with SU($N$) symmetry that is expected to behave like spinless bosons in the large $N$ limit, where the large number of internal states weakens constraints from the Pauli exclusion principle. However, bosonization in SU($N$) fermions has never been established in high dimensions where exact solutions are absent. Here, we report direct evidence for bosonization in an SU($N$) fermionic ytterbium gas with tunable $N$ in three dimensions (3D). We measure contacts, the central quantity controlling dilute quantum gases, from the momentum distribution, and find that the contact per spin approaches a constant with a 1/$N$ scaling in the low fugacity regime consistent with our theoretical prediction. This scaling signifies the vanishing role of the fermionic statistics in thermodynamics, and allows us to verify bosonization through measuring a single physical quantity. 
Our work delivers a highly controllable quantum simulator to exchange the bosonic and fermionic statistics through tuning the internal degrees of freedom in any generic dimensions. It also suggests a new route towards exploring multi-component quantum systems and their underlying symmetries with contacts.	翻訳日:2023-06-09 23:36:28 公開日:2020-12-17
# IEEE 7010:人工知能の持つ意味を評価するための新しい標準 IEEE 7010: A New Standard for Assessing the Well-being Implications of Artificial Intelligence ( http://arxiv.org/abs/2005.06620v3 ) ライセンス: Link先を確認	Daniel S. Schiff, Aladdin Ayesh, Laura Musikanski, John C. Havens	(参考訳) 人工知能(AI)によって実現された製品やサービスは、日々の生活の基盤になりつつある。政府や企業はAIイノベーションの恩恵を享受したいと熱心に考えているが、これらの自律的でインテリジェントなシステムが人間の幸福に与える影響は、ますます深刻な問題になっている。本稿では、AIの社会的・倫理的意味に焦点をあてた最初の国際標準として、電気・電子工学研究所(IEEE)標準(Std)7010-2020 自律・知能システムの人間福祉への影響を評価するための推奨プラクティスを紹介する。 AIのライフサイクルを通じて幸福な要素を組み込むことは困難かつ緊急であり、IEEE 7010はこれらの技術を設計、デプロイ、調達する人々にとって重要なガイダンスを提供する。まず、ウェルビーイングを中心としたAIのアプローチの利点と、ウェルビーイングデータの計測から始める。次に、IEEE 7010の概要を紹介し、その鍵となる原則と、標準がAIコミュニティにおけるアプローチと視点にどのように関係しているかについて説明する。最後に、今後の取り組みがどこに必要かを示します。 Artificial intelligence (AI) enabled products and services are becoming a staple of everyday life. While governments and businesses are eager to enjoy the benefits of AI innovations, the mixed impact of these autonomous and intelligent systems on human well-being has become a pressing issue. This article introduces one of the first international standards focused on the social and ethical implications of AI: The Institute of Electrical and Electronics Engineering (IEEE) Standard (Std) 7010-2020 Recommended Practice for Assessing the Impact of Autonomous and Intelligent Systems on Human Well-being. Incorporating well-being factors throughout the lifecycle of AI is both challenging and urgent and IEEE 7010 provides key guidance for those who design, deploy, and procure these technologies. We begin by articulating the benefits of an approach for AI centered around well-being and the measurement of well-being data. Next, we provide an overview of IEEE 7010, including its key principles and how the standard relates to approaches and perspectives in place in the AI community. Finally, we indicate where future efforts are needed.	翻訳日:2023-05-20 22:18:30 公開日:2020-12-17
# コンテクストと非互換性 Contextuality versus Incompatibility ( http://arxiv.org/abs/2005.05124v3 ) ライセンス: Link先を確認	Andrei Khrennikov	(参考訳) 我々の目標は、量子物理学の基本概念である文脈性と非互換性を比較することである。文脈性という2つの異なる概念、すなわち Bohr-contextuality と Bell-contextuality を区別しなければならない。後者は非文脈性(ベル型)不等式の破れによって操作的に定義される。この種の文脈性を非互換性と比較する。量子可観測量に対して、非互換性のない文脈性は存在しないことを示すのは容易である。ここで自然な疑問が生じる。非互換性を除いた文脈性とは何か? (「dry-residue」とは何か?) 一般にこれは非常に複雑な問題である。我々は4つの量子可観測量の文脈性に集中した。CHSHシナリオ(「自然な量子可観測量」の場合)では文脈性が非互換性に還元されることを示す。しかし、一般に、非互換性を除いた文脈性は何らかの物理的内容を持つかもしれない。非互換性から文脈性成分を抽出する数学的制約を見出した。しかし、この制約の物理的意味は明確ではない。付録1では、ボーアの相補性原理に基づく別の種類の文脈性について簡潔に論じ、これは文脈性-相補性原理として扱われる。ボーア文脈性は量子基礎論において重要な役割を果たす。非互換性は、実のところ、ボーア文脈性の帰結である。最後に、認知心理学や意思決定など物理学以外の分野において、非互換性を取り除いたベル文脈性が重要な役割を果たしうることを述べる。 Our aim is to compare the fundamental notions of quantum physics - contextuality vs. incompatibility. One has to distinguish two different notions of contextuality, {\it Bohr-contextuality} and {\it Bell-contextuality}. The latter is defined operationally via violation of noncontextuality (Bell type) inequalities. This sort of contextuality will be compared with incompatibility. It is easy to show that, for quantum observables, there is {\it no contextuality without incompatibility.} The natural question arises: What is contextuality without incompatibility? (What is "dry-residue"?) Generally, this is a very complex question. We concentrated on contextuality for four quantum observables. We show that in the CHSH-scenarios (for "natural quantum observables") {\it contextuality is reduced to incompatibility.} However, generally contextuality without incompatibility may have some physical content. We found a mathematical constraint extracting the contextuality component from incompatibility. However, the physical meaning of this constraint is not clear. In appendix 1, we briefly discuss another sort of contextuality based on Bohr's complementarity principle, which is treated as the {\it contextuality-incompatibility principle}. 
Bohr-contextuality plays a crucial role in quantum foundations. Incompatibility is, in fact, a consequence of Bohr-contextuality. Finally, we remark that outside of physics, e.g., in cognitive psychology and decision making, Bell-contextuality cleaned of incompatibility can play an important role.	翻訳日:2023-05-20 19:58:24 公開日:2020-12-17
# 新型コロナウイルス(covid-19)の接触追跡とプライバシ - 意見と選好の研究 COVID-19 Contact Tracing and Privacy: Studying Opinion and Preferences ( http://arxiv.org/abs/2005.06056v2 ) ライセンス: Link先を確認	Lucy Simko (1, 2 and 3), Ryan Calo (2 and 4), Franziska Roesner (1, 2 and 3), Tadayoshi Kohno (1, 2 and 3) ((1) Security and Privacy Research Lab, University of Washington, (2) Tech Policy Lab, University of Washington, (3) Paul G. Allen School of Computer Science & Engineering, University of Washington, (4) School of Law, University of Washington)	(参考訳) 新型コロナウイルス(COVID-19)感染の可能性がある患者を、感染した人の接触を全て通知することで、特定するプロセスだ。政府、技術系企業、研究グループは、スマートフォン、iotデバイス、ウェアラブル技術が自動的に「密接な接触」を追跡し、個人のポジティブなテストの際の事前連絡先を識別する可能性を認識している。しかし、現在、効果的な技術ベースの接触追跡と個人のプライバシーの間の緊張関係について重要な公的な議論がある。そこで本研究では,接触者追跡とプライバシに焦点をあてたオンライン調査の結果について報告する。第1回調査は4月1日と3日に実施され,第2回調査を中心に報告した。結果は、世論の多様性を示し、新型コロナウイルスの感染拡大を抑えるためにテクノロジーをどのように活用するかについて、公衆の議論に伝えることができる。引き続き縦断測定を行っており、2020年5月8日のレポートバージョン1.0を参照して、このレポートを時間とともに更新する。 NOTE: 2020年12月4日現在、このレポートはarXiv:2012.01553で発見されたReport Version 2.0に取って代わられている。 Report Version 2.0を読み、引用してください。 There is growing interest in technology-enabled contact tracing, the process of identifying potentially infected COVID-19 patients by notifying all recent contacts of an infected person. Governments, technology companies, and research groups alike recognize the potential for smartphones, IoT devices, and wearable technology to automatically track "close contacts" and identify prior contacts in the event of an individual's positive test. However, there is currently significant public discussion about the tensions between effective technology-based contact tracing and the privacy of individuals. To inform this discussion, we present the results of a sequence of online surveys focused on contact tracing and privacy, each with 100 participants. Our first surveys were on April 1 and 3, and we report primarily on those first two surveys, though we present initial findings from later survey dates as well. 
Our results present the diversity of public opinion and can inform the public discussion on whether and how to leverage technology to reduce the spread of COVID-19. We are continuing to conduct longitudinal measurements, and will update this report over time; citations to this version of the report should reference Report Version 1.0, May 8, 2020. NOTE: As of December 4, 2020, this report has been superseded by Report Version 2.0, found at arXiv:2012.01553. Please read and cite Report Version 2.0 instead.	翻訳日:2023-05-20 11:40:03 公開日:2020-12-17
# 格子対称性を持つアーベルトポロジカル相の結晶ゲージ場と量子化離散幾何応答 Crystalline gauge fields and quantized discrete geometric response for Abelian topological phases with lattice symmetry ( http://arxiv.org/abs/2005.10265v3 ) ライセンス: Link先を確認	Naren Manjunath, Maissam Barkeshli	(参考訳) 連続体におけるクリーンで等方的な量子ホール流体は、ホール伝導率、シフト、ホール粘性などの対称性で保護された量子化不変量を多数持つ。ここでは、格子上で定義されるトポロジカル相に対する対称性保護量子化不変量の理論を展開する。格子上では連続体に対応物を持たない量子化不変量が生じうる。離散結晶ゲージ場を用いたトポロジカル場の理論を構築し、対称群 $G = U(1) \times G_{\text{space}}$ を持つ(2+1)次元アーベルトポロジカル秩序の量子化不変量を完全に特徴付ける。ここで $G_{\text{space}}$ は格子上の向きを保つ空間群対称性からなる。離散回転および並進対称性の分数化は、離散スピンベクトル、連続体や格子回転対称性がない場合には対応物を持たない離散トーションベクトル、そして同じく連続体に対応物を持たない面積ベクトルによって特徴づけられることを示す。離散トーションベクトルは結晶運動量分数化の一種を意味し、これは$2$, $3$, $4$回回転対称性に対してのみ非自明である。量子化トポロジカル応答理論は、回位(disclination)と角に分数電荷を束縛するシフトの離散版、回位の分数量子化された角運動量、回転対称な分数電荷分極とその角運動量版、単位セルあたりの電荷と角運動量に対する制約、および転位と面積の単位に束縛された量子化運動量を含む。分数量子化された電荷分極は、$2$, $3$, $4$回回転対称性を持つ格子上でのみ非自明であり、格子の転位に束縛された分数電荷と、境界に沿った単位長さ当たりの分数電荷を意味する。重要な役割は、格子の点群対称性に依存するバーガースベクトル上の有限群次数付けによって演じられる。 Clean isotropic quantum Hall fluids in the continuum possess a host of symmetry-protected quantized invariants, such as the Hall conductivity, shift and Hall viscosity. Here we develop a theory of symmetry-protected quantized invariants for topological phases defined on a lattice, where quantized invariants with no continuum analog can arise. We develop topological field theories using discrete crystalline gauge fields to fully characterize quantized invariants of (2+1)D Abelian topological orders with symmetry group $G = U(1) \times G_{\text{space}}$, where $G_{\text{space}}$ consists of orientation-preserving space group symmetries on the lattice. We show how discrete rotational and translational symmetry fractionalization can be characterized by a discrete spin vector, a discrete torsion vector which has no analog in the continuum or in the absence of lattice rotation symmetry, and an area vector, which also has no analog in the continuum. 
The discrete torsion vector implies a type of crystal momentum fractionalization that is only non-trivial for $2$, $3$, and $4$-fold rotation symmetry. The quantized topological response theory includes a discrete version of the shift, which binds fractional charge to disclinations and corners, a fractionally quantized angular momentum of disclinations, rotationally symmetric fractional charge polarization and its angular momentum counterpart, constraints on charge and angular momentum per unit cell, and quantized momentum bound to dislocations and units of area. The fractionally quantized charge polarization, which is non-trivial only on a lattice with $2$, $3$, and $4$-fold rotation symmetry, implies a fractional charge bound to lattice dislocations and a fractional charge per unit length along the boundary. An important role is played by a finite group grading on Burgers vectors, which depends on the point group symmetry of the lattice.	翻訳日:2023-05-19 05:41:18 公開日:2020-12-17
# 非マルコフ開量子系による高忠実テレポーテーションの実験的実現 Experimental realization of high-fidelity teleportation via non-Markovian open quantum system ( http://arxiv.org/abs/2007.01318v2 ) ライセンス: Link先を確認	Zhao-Di Liu, Yong-Nan Sun, Bi-Heng Liu, Chuan-Feng Li, Guang-Can Guo, Sina Hamedani Raja, Henri Lyyra, Jyrki Piilo	(参考訳) 開放量子系とデコヒーレンスの研究は、量子物理現象の基本的な理解にとって重要である。実用上は、量子資源(例えばエンタングルメント)を利用する多くの量子プロトコルが存在し、古典的な手段で達成できることを超えることを可能にする。我々は、開放量子系と量子情報科学の概念を組み合わせ、テレポーテーションを用いて、非マルコフ開放系を介して量子プロトコルを効率的に実装できることを示す原理実証実験を行う。結果は、プロトコルの実行時点で、基本プロトコルに用いる自由度に量子資源が存在する必要はなく、有用な資源を含む他の自由度、すなわち開放系の環境が存在すれば十分であることを示している。実験は光子対に基づいており、偏光が開放系の量子ビット、周波数がその環境として働く。一方、光子の一方の経路自由度が、ボブの偏光量子ビットにテレポートされるアリスの量子ビットの状態を表す。 Open quantum systems and study of decoherence are important for our fundamental understanding of quantum physical phenomena. For practical purposes, there exists a large number of quantum protocols exploiting quantum resources, e.g. entanglement, which allow one to go beyond what can be achieved by classical means. We combine concepts from open quantum systems and quantum information science, and give a proof-of-principle experimental demonstration -- with teleportation -- that it is possible to efficiently implement a quantum protocol via a non-Markovian open system. The results show that, at the time of implementation of the protocol, it is not necessary to have the quantum resource in the degree of freedom used for the basic protocol -- as long as there exists some other degree of freedom, or environment of an open system, which contains useful resources. 
The experiment is based on a pair of photons, where their polarizations act as open system qubits and frequencies as their environments -- while the path degree of freedom of one of the photons represents the state of Alice's qubit to be teleported to Bob's polarization qubit.	翻訳日:2023-05-11 20:38:39 公開日:2020-12-17
# グリーンアルゴリズム:計算の炭素フットプリントの定量化 Green Algorithms: Quantifying the carbon footprint of computation ( http://arxiv.org/abs/2007.07610v5 ) ライセンス: Link先を確認	Loïc Lannelongue, Jason Grealey and Michael Inouye	(参考訳) 気候変動は、人間社会、経済、健康など、地球上の生命のほぼ全ての側面に大きな影響を与えている。データセンターやその他の大規模計算を含む様々な人間の活動が、大量の温室効果ガス排出の原因となっている。高性能コンピューティングの発展により、多くの重要な科学的マイルストーンが達成されているが、その環境への影響は過小評価されてきた。本稿では,処理時間,計算コアの種類,利用可能なメモリ,計算施設の効率と位置に基づいて,任意の計算タスクの炭素フットプリントを標準化された信頼性の高い方法で推定するための方法論的枠組みを提案する。温室効果ガスの排出を解釈し文脈化するための指標が定義されており、車や飛行機による同等の移動距離や、炭素の隔離に必要な木月数(tree-months)が含まれる。我々は、ユーザが自身の計算の炭素フットプリントを見積り、報告できる無料のオンラインツールである Green Algorithms (www.green-algorithms.org) を開発した。 Green Algorithmsツールは、最小限の情報しか必要とせず、既存のコードに干渉しないため計算処理と容易に統合でき、幅広いCPU、GPU、クラウドコンピューティング、ローカルサーバー、デスクトップコンピュータも考慮している。最後に,Green Algorithmsを適用し,粒子物理シミュレーション,天気予報,自然言語処理に用いるアルゴリズムの温室効果ガス排出量を定量化する。総合すると、本研究は、ほぼ任意の計算の炭素フットプリントを定量化するための、単純で一般化可能なフレームワークと自由に利用可能なツールを開発する。不要なCO2排出量を最小化するための一連の勧告と組み合わせることで、認識を高め、よりグリーンな計算を促進したいと考えている。 Climate change is profoundly affecting nearly all aspects of life on earth, including human societies, economies and health. Various human activities are responsible for significant greenhouse gas emissions, including data centres and other sources of large-scale computation. Although many important scientific milestones have been achieved thanks to the development of high-performance computing, the resultant environmental impact has been underappreciated. In this paper, we present a methodological framework to estimate the carbon footprint of any computational task in a standardised and reliable way, based on the processing time, type of computing cores, memory available and the efficiency and location of the computing facility. Metrics to interpret and contextualise greenhouse gas emissions are defined, including the equivalent distance travelled by car or plane as well as the number of tree-months necessary for carbon sequestration. 
We develop a freely available online tool, Green Algorithms (www.green-algorithms.org), which enables a user to estimate and report the carbon footprint of their computation. The Green Algorithms tool easily integrates with computational processes as it requires minimal information and does not interfere with existing code, while also accounting for a broad range of CPUs, GPUs, cloud computing, local servers and desktop computers. Finally, by applying Green Algorithms, we quantify the greenhouse gas emissions of algorithms used for particle physics simulations, weather forecasts and natural language processing. Taken together, this study develops a simple generalisable framework and freely available tool to quantify the carbon footprint of nearly any computation. Combined with a series of recommendations to minimise unnecessary CO2 emissions, we hope to raise awareness and facilitate greener computation.	翻訳日:2023-05-09 09:13:12 公開日:2020-12-17
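A minimal sketch of how an estimate of this kind could be assembled from the inputs the abstract names (processing time, computing cores, memory, and the efficiency and location of the facility). The formula, the function name `estimate_carbon_g`, and every numeric value below are illustrative assumptions, not the Green Algorithms tool's actual model or coefficients. Facility efficiency is represented here by a PUE-style multiplier and location by a carbon-intensity factor.

```python
def estimate_carbon_g(runtime_h, n_cores, watts_per_core, core_usage,
                      memory_gb, watts_per_gb, pue, carbon_intensity_g_per_kwh):
    """Return a rough carbon estimate in grams of CO2-equivalent.

    All parameters are hypothetical inputs chosen for illustration:
    runtime_h                  -- wall-clock runtime in hours
    n_cores, watts_per_core    -- number of cores and assumed power draw per core
    core_usage                 -- average core utilisation, between 0 and 1
    memory_gb, watts_per_gb    -- memory size and assumed power draw per GB
    pue                        -- facility overhead multiplier (>= 1)
    carbon_intensity_g_per_kwh -- grid carbon intensity at the facility location
    """
    if runtime_h < 0 or not 0.0 <= core_usage <= 1.0 or pue < 1.0:
        raise ValueError("runtime must be >= 0, usage in [0, 1], PUE >= 1")
    # Power drawn by the compute cores plus the memory, in watts.
    power_w = n_cores * watts_per_core * core_usage + memory_gb * watts_per_gb
    # Energy in kWh, scaled by the facility overhead.
    energy_kwh = runtime_h * power_w * pue / 1000.0
    # Convert energy to emissions using the local carbon intensity.
    return energy_kwh * carbon_intensity_g_per_kwh


if __name__ == "__main__":
    # Made-up job: 10 h on 4 fully used cores with 16 GB of memory.
    grams = estimate_carbon_g(10, 4, 10.0, 1.0, 16, 0.5, 1.5, 400.0)
    print(f"{grams:.1f} gCO2e")
```

In this sketch the estimate scales linearly in runtime, overhead and carbon intensity, so the effect of moving a job to a more efficient facility or a lower-carbon location can be read off directly.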
# 一般量子エラーに対する十分条件と制約 Sufficient Conditions and Constraints for Reversing General Quantum Errors ( http://arxiv.org/abs/2007.11083v2 ) ライセンス: Link先を確認	Alvin Gonzales, Daniel Dilley, Mark S. Byrd	(参考訳) 例えば誤り訂正のように、量子進化の効果を逆転することは、信頼できる量子デバイスを生成するために量子システムを制御する上で重要なタスクである。進化が完全正の写像によって制御されるとき、量子誤差補正符号条件(quantum error correcting code conditions)と呼ばれる可逆性条件が存在し、これは部分空間、すなわち符号空間上の量子演算の可逆性に必要かつ十分な条件である。しかし、進化が完全に正の写像によって説明されないと仮定すると、必要十分条件は分かっていない。ここでは、必ずしも完全正の写像に対応しない進化を考える。我々は、完全に正のマップ誤り訂正符号条件が、マップの領域にない符号空間につながり得ることを証明し、つまり、マップの出力は正でないことを示す。我々の定理の補足は関連する例のクラスを提供する。最後に、正当性を確保しつつ、量子エラー訂正符号条件の使用を可能にする十分な条件のセットを提供する。 Reversing the effects of a quantum evolution, for example as is done in error correction, is an important task for controlling quantum systems in order to produce reliable quantum devices. When the evolution is governed by a completely positive map, there exist reversibility conditions, known as the quantum error correcting code conditions, which are necessary and sufficient conditions for the reversibility of a quantum operation on a subspace, the code space. However, if we suppose that the evolution is not described by a completely positive map, necessary and sufficient conditions are not known. Here we consider evolutions that do not necessarily correspond to a completely positive map. We prove that the completely positive map error correcting code conditions can lead to a code space that is not in the domain of the map, meaning that the output of the map is not positive. A corollary to our theorem provides a class of relevant examples. Finally, we provide a set of sufficient conditions that will enable the use of quantum error correcting code conditions while ensuring positivity.	翻訳日:2023-05-08 20:39:25 公開日:2020-12-17
# 非古典光による光電電流の増大 Enhancing photoelectric current by nonclassical light ( http://arxiv.org/abs/2008.03876v2 ) ライセンス: Link先を確認	Hai-Yan Yao, Sheng-Wen Li	(参考訳) 非古典的な光子統計を持つ駆動光によって生成される光電電流を研究する。非古典的な入力光子統計のため、駆動光を古典物理学のように平面波として扱うだけでは不十分である。我々はこのような問題を量子的アプローチで扱い、次のことを見出した。駆動光がコヒーレント状態を初期状態とする場合、我々の量子的取り扱いは準古典的な駆動の記述をよく再現する。駆動光がある P 関数を持つ一般の状態である場合、系全体のダイナミクスは多数の「分枝」の P 関数平均に帰着できる。各分枝では駆動光はコヒーレント状態から出発するため、系のダイナミクスはやはり上記の準古典的な方法で得られる。この量子的アプローチに基づいて、異なる光子統計が光電電流に違いをもたらすことが判明した。同じ光強度を持つ全ての古典光状態のうち、ポアソン統計を持つ入力光が最大の光電電流を生成することを証明し、一方で非古典的なサブポアソン光はこの古典的上界を超えうる。 We study the photoelectric current generated by a driving light with nonclassical photon statistics. Due to the nonclassical input photon statistics, it is no longer enough to treat the driving light as a planar wave as in classical physics. We make a quantum approach to study such problems, and find that: when the driving light starts from a coherent state as the initial state, our quantum treatment well returns the quasi-classical driving description; when the driving light is a generic state with a certain P function, the full system dynamics can be reduced as the P function average of many "branches" -- in each dynamics branch, the driving light starts from a coherent state, thus again the system dynamics can be obtained in the above quasi-classical way. Based on this quantum approach, it turns out that different photon statistics do make a difference to the photoelectric current. 
Among all the classical light states with the same light intensity, we prove that the input light with Poisson statistics generates the largest photoelectric current, while a nonclassical sub-Poisson light could exceed this classical upper bound.	翻訳日:2023-05-06 16:15:59 公開日:2020-12-17
# フィデリティ感受率による非エルミート例外点の探索 Hunting for the non-Hermitian exceptional points with fidelity susceptibility ( http://arxiv.org/abs/2009.07070v2 ) ライセンス: Link先を確認	Yu-Chin Tzeng, Chia-Yi Ju, Guang-Yin Chen, Wen-Min Huang	(参考訳) フィデリティ感受率は10年以上にわたってエルミート量子多体系の量子相転移を検出するために使われており、そこではフィデリティ感受率密度が熱力学極限で$+\infty$に近づく。ここでは、ヒルベルト空間の幾何学的構造を考慮することで、フィデリティ感受率$\chi$を非エルミート量子系に一般化する。計量の運動方程式を一から解く代わりに、フィデリティが双直交固有状態で構成され、例外点(EP)上でなければ代数的または数値的に計算できるゲージを選んだ。EPにおけるヒルベルト空間幾何学の性質により、$\chi$が$-\infty$に近づくところでEPを見つけられることがわかった。例として、単一の調整パラメータを持つ最も単純な$\mathcal{PT}$対称$2\times2$ハミルトニアンと、非エルミートSu-Schrieffer-Heegerモデルを調べる。 The fidelity susceptibility has been used to detect quantum phase transitions in the Hermitian quantum many-body systems for over a decade, where the fidelity susceptibility density approaches $+\infty$ in the thermodynamic limit. Here the fidelity susceptibility $\chi$ is generalized to non-Hermitian quantum systems by taking the geometric structure of the Hilbert space into consideration. Instead of solving the metric equation of motion from scratch, we chose a gauge where the fidelities are composed of biorthogonal eigenstates and can be worked out algebraically or numerically when not on the exceptional point (EP). Due to the properties of the Hilbert space geometry at EP, we found that EP can be found when $\chi$ approaches $-\infty$. As examples, we investigate the simplest $\mathcal{PT}$ symmetric $2\times2$ Hamiltonian with a single tuning parameter and the non-Hermitian Su-Schrieffer-Heeger model.	翻訳日:2023-05-02 04:27:50 公開日:2020-12-17
# 同時メッセージパッシングモデルにおける光量子通信複雑性 Optical quantum communication complexity in the simultaneous message passing model ( http://arxiv.org/abs/2010.03195v2 ) ライセンス: Link先を確認	Ashutosh Marwah and Dave Touchette	(参考訳) 古典的なプロトコルの通信コストは通常、通信されるビット数によって測定され、プロトコルの通信に要する時間を決定する。同様に、有限次元の量子状態を使用する量子通信プロトコルの場合、通信コストは、通信される量子ビットの数で測定される。しかし、量子物理学では、通信プロトコルには光学量子状態のような無限次元の状態も使用できる。通信中に送信される(等価な)キュービット数のカウントに基づく通信コスト測定は、無限次元状態を使用するプロトコルのコストを直接測定することはできない。さらに、そのような量子ビットベースの通信コストを用いて無限次元プロトコルの物理的性質を推測することはできない。本稿では,無限次元プロトコルにおける物理資源の成長を理解するための枠組みを提供する。具体性のために光学プロトコルに焦点をあてる。通信に必要な時間と通信中のエネルギーは、そのようなプロトコルの重要な物理資源として識別される。光プロトコルでは、通信に必要な時間は、ある相手から別の相手へ送信されるタイムビンモードの数によって決定される。送信されるメッセージの平均光子数は、プロトコルにおける通信に必要なエネルギーを決定する。この2つの量の成長と問題の大きさの増大とのトレードオフが低いことを証明している。このようなトレードオフ関係を光量子通信複雑性関係と呼ぶ。 The communication cost of a classical protocol is typically measured in terms of the number of bits communicated for this determines the time required for communication during the protocol. Similarly, for quantum communication protocols, which use finite-dimensional quantum states, the communication cost is measured in terms of the number of qubits communicated. However, in quantum physics, one can also use infinite-dimensional states, like optical quantum states, for communication protocols. Communication cost measures based on counting the (equivalent) number of qubits transmitted during communication cannot be directly used to measure the cost of such protocols, which use infinite-dimensional states. Moreover, one cannot infer any physical property of infinite-dimensional protocols using such qubit based communication costs. In this paper, we provide a framework to understand the growth of physical resources in infinite-dimensional protocols. We focus on optical protocols for the sake of concreteness. The time required for communication and the energy expended during communication are identified as the important physical resources of such protocols. 
In an optical protocol, the time required for communication is determined by the number of time-bin modes that are transmitted from one party to another. The mean photon number of the messages sent determines the energy required during communication in the protocol. We prove a lower bound on the tradeoff between the growth of these two quantities with the growth of the problem size. We call such tradeoff relations optical quantum communication complexity relations.	翻訳日:2023-04-29 18:06:19 公開日:2020-12-17
# 安定化形式における自己テスト的極大次元真に絡み合った部分空間 Self-testing maximally-dimensional genuinely entangled subspaces within the stabilizer formalism ( http://arxiv.org/abs/2012.01164v2 ) ライセンス: Link先を確認	Owidiusz Makuta and Remigiusz Augusiak	(参考訳) 自己テストはもともと、絡み合った量子状態とそれらの上で実行される局所測定をデバイスに依存せずに認証する方法として導入された。近年、[F. Baccari \textit{et al.}, arXiv:2003.02285]において、状態自己テストの概念は絡み合った部分空間に一般化され、例となる真に絡み合った部分空間に対する最初の自己テスト戦略が与えられた。本研究の主な目的は、この一連の研究を推し進め、自己テスト可能な真に絡み合った部分空間が(次元の観点から)どれほど「大きく」なりうるかという問題に、多量子ビット安定化形式に焦点を当てて取り組むことである。この目的のために、まず、与えられた安定化部分空間が真に絡み合っているかどうかを効率的にチェックできるフレームワークを導入する。その上で、安定化部分空間内で構成できる真に絡み合った部分空間の最大次元を決定し、そのような極大次元部分空間の例を任意の量子ビット数に対して構成する。第3に、それらの部分空間に属する任意の絡み合った状態、ひいてはそれらに台を持つ任意の混合状態によって最大に破られるベル不等式を構成し、これらの不等式が自己テストに有用であることを示す。興味深いことに、我々のベル不等式は、全ての観測者が2つの2値測定を行う最も単純な多者ベルシナリオにおいて、量子相関の集合の境界における高次元の面構造を識別することを可能にする。 Self-testing was originally introduced as a device-independent method of certification of entangled quantum states and local measurements performed on them. Recently, in [F. Baccari \textit{et al.}, arXiv:2003.02285] the notion of state self-testing has been generalized to entangled subspaces and the first self-testing strategies for exemplary genuinely entangled subspaces have been given. The main aim of our work is to pursue this line of research and to address the question how "large" (in terms of dimension) are genuinely entangled subspaces that can be self-tested, concentrating on the multiqubit stabilizer formalism. To this end, we first introduce a framework allowing to efficiently check whether a given stabilizer subspace is genuinely entangled. Building on it, we then determine the maximal dimension of genuinely entangled subspaces that can be constructed within the stabilizer subspaces and provide an exemplary construction of such maximally-dimensional subspaces for any number of qubits. 
Third, we construct Bell inequalities that are maximally violated by any entangled state from those subspaces and thus also any mixed states supported on them, and we show these inequalities to be useful for self-testing. Interestingly, our Bell inequalities allow for identification of higher-dimensional face structures in the boundaries of the sets of quantum correlations in the simplest multipartite Bell scenarios in which every observer performs two dichotomic measurements.	翻訳日:2023-04-22 07:56:15 公開日:2020-12-17
# 2つの並列光学ナノファイバーの誘導正規モードにおける磁場の空間分布 Spatial distributions of the fields in guided normal modes of two coupled parallel optical nanofibers ( http://arxiv.org/abs/2012.06078v2 ) ライセンス: Link先を確認	Fam Le Kien, Lewis Ruks, Sile Nic Chormaic, and Thomas Busch	(参考訳) 2つの並列光ナノファイバーの誘導正規モードにおけるフィールドの断面形状と空間分布について検討した。 2つの同一ナノファイバーの誘導正規モードにおける磁場成分の分布は、繊維の断面面における半径主軸と接主軸に対して対称または非対称であることを示す。主軸に対する磁場成分の対称性は電場成分の対称性とは反対である。例えば、$\mathcal{e}_z$-cosineモードでさえ、繊維間の電気的強度分布が支配的であり、2ファイバー中心にサドル点があることを示している。一方、奇数$\mathcal{E}_z$-sineモードの場合、二つのファイバー中心における電気強度分布は、ちょうど0の局所最小値に達する。その結果,ファイバー分離距離が小さく,ファイバー半径が小さいか,光波長が大きい場合,結合モード理論と正確なモード理論の差が大きいことがわかった。 2つのナノファイバーが同一でない場合、その強度分布は半径主軸に対して対称であり、接する主軸に対して非対称であることを示す。 We study the cross-sectional profiles and spatial distributions of the fields in guided normal modes of two coupled parallel optical nanofibers. We show that the distributions of the components of the field in a guided normal mode of two identical nanofibers are either symmetric or antisymmetric with respect to the radial principal axis and the tangential principal axis in the cross-sectional plane of the fibers. The symmetry of the magnetic field components with respect to the principal axes is opposite to that of the electric field components. We show that, in the case of even $\mathcal{E}_z$-cosine modes, the electric intensity distribution is dominant in the area between the fibers, with a saddle point at the two-fiber center. Meanwhile, in the case of odd $\mathcal{E}_z$-sine modes, the electric intensity distribution at the two-fiber center attains a local minimum of exactly zero. We find that the differences between the results of the coupled mode theory and the exact mode theory are large when the fiber separation distance is small and either the fiber radius is small or the light wavelength is large. 
We show that, in the case where the two nanofibers are not identical, the intensity distribution is symmetric about the radial principal axis and asymmetric about the tangential principal axis.	翻訳日:2023-04-21 03:33:49 公開日:2020-12-17
# 高スピン核の浴と結合した量子ドット電子スピンの駆動動力学 Driven dynamics of a quantum dot electron spin coupled to bath of higher-spin nuclei ( http://arxiv.org/abs/2012.07227v2 ) ライセンス: Link先を確認	Arian Vezvaee, Girish Sharma, Sophia E. Economou, and Edwin Barnes	(参考訳) 量子ドットに閉じ込められた電子とその周囲の核スピン環境の間の光駆動と超微細な相互作用の相互作用は、モード同期のような興味深い物理学を生み出す。本研究では、核スピンのユビキタススピン1/2近似を超えて、任意のスピンの核スピン浴に結合した自己集合量子ドットにおける光駆動電子スピンの包括的な理論的枠組みを示す。動的平均場法を用いて、四極子カップリングの有無にかかわらず核スピン分極分布を計算する。超微細相互作用は動的核分極とモードロックを促進するが、四極結合はこれらの効果に反する。これらの機構間の張力は定常状態の電子スピン発展にインプリントされ、量子ドットにおける四極子相互作用の重要性を測定する方法を提供する。その結果、四極子相互作用のような高スピン効果は、動的核偏極の発生とそれが電子スピンの進化に与える影響に大きな影響を与えることが示された。 The interplay of optical driving and hyperfine interaction between an electron confined in a quantum dot and its surrounding nuclear spin environment produces a range of interesting physics such as mode-locking. In this work, we go beyond the ubiquitous spin 1/2 approximation for nuclear spins and present a comprehensive theoretical framework for an optically driven electron spin in a self-assembled quantum dot coupled to a nuclear spin bath of arbitrary spin. Using a dynamical mean-field approach, we compute the nuclear spin polarization distribution with and without the quadrupolar coupling. We find that while hyperfine interactions drive dynamic nuclear polarization and mode-locking, quadrupolar couplings counteract these effects. The tension between these mechanisms is imprinted on the steady-state electron spin evolution, providing a way to measure the importance of quadrupolar interactions in a quantum dot. Our results show that higher-spin effects such as quadrupolar interactions can have a significant impact on the generation of dynamic nuclear polarization and how it influences the electron spin evolution.	翻訳日:2023-04-20 21:31:16 公開日:2020-12-17
# リング共振器を用いた超伝導量子プロセッサの長距離接続 Long-range connectivity in a superconducting quantum processor using a ring resonator ( http://arxiv.org/abs/2012.09463v1 ) ライセンス: Link先を確認	Sumeru Hazra, Anirban Bhattacharjee, Madhavi Chand, Kishor V. Salunkhe, Sriram Gopalakrishnan, Meghan P. Patankar and R. Vijay	(参考訳) 量子コヒーレンスとゲート忠実度は通常、量子プロセッサを特徴づける上で最も重要な2つの指標とみなされる。同様に重要なメトリックは、ゲート数を最小限に抑え、エラーの少ないアルゴリズムを効率的に実装できるため、ビット間接続である。しかし、超伝導プロセッサの量子ビット間接続は、物理的実現の実際的な制約のため、近隣に限られる傾向にある。本稿では,リング共振器を多経路結合素子とし,その周囲に一様分布する量子ビットを持つ新しい超伝導構造を提案する。我々の平面設計は、さらなる製造の複雑さを伴わずに、 art超伝導プロセッサの状態の接続性を大幅に向上させる。理論的には、量子ビット接続を解析し、各量子ビットが他の9つの量子ビットに接続可能な最大12個の量子ビットをサポートする装置で実験的に検証する。我々の概念はスケーラブルで、他のプラットフォームに適用可能であり、量子コンピューティング、アニール、シミュレーション、エラー修正の進歩を著しく加速する可能性がある。 Qubit coherence and gate fidelity are typically considered the two most important metrics for characterizing a quantum processor. An equally important metric is inter-qubit connectivity as it minimizes gate count and allows implementing algorithms efficiently with reduced error. However, inter-qubit connectivity in superconducting processors tends to be limited to nearest neighbour due to practical constraints in the physical realization. Here, we introduce a novel superconducting architecture that uses a ring resonator as a multi-path coupling element with the qubits uniformly distributed throughout its circumference. Our planar design provides significant enhancement in connectivity over state of the art superconducting processors without any additional fabrication complexity. We theoretically analyse the qubit connectivity and experimentally verify it in a device capable of supporting up to twelve qubits where each qubit can be connected to nine other qubits. Our concept is scalable, adaptable to other platforms and has the potential to significantly accelerate progress in quantum computing, annealing, simulations and error correction.	翻訳日:2023-04-20 08:46:33 公開日:2020-12-17
# Deterministic quantum correlation in an interferometric scheme ( http://arxiv.org/abs/2012.09387v1 ) License: see linked page	Byoung S. Ham	Over the last several decades, entangled photon pairs generated from $\chi^{(2)}$ nonlinear optical materials via spontaneous parametric down-conversion processes have been intensively studied for various quantum correlations such as Bell inequality violation and anticorrelation. In a Mach-Zehnder interferometer, the photonic de Broglie wavelength has also been studied for quantum sensing with an enhanced phase resolution overcoming the standard quantum limit. Here, the fundamental principles of quantumness are investigated in an interferometric scheme for controllable quantum correlation not only for bipartite entangled photon pairs in a microscopic regime, but also for macroscopic coherence entanglement generation.	Translated: 2023-04-20 08:45:10 Published: 2020-12-17
# KHOVID: Interoperable Privacy Preserving Digital Contact Tracing ( http://arxiv.org/abs/2012.09375v1 ) License: see linked page	Xiang Cheng, Hanchao Yang, Archanaa S Krishnan, Patrick Schaumont and Yaling Yang	During a pandemic, contact tracing is an essential tool to drive down the infection rate within a population. To accelerate the laborious manual contact tracing process, digital contact tracing (DCT) tools can track contact events transparently and privately by using the sensing and signaling capabilities of the ubiquitous cell phone. However, an effective DCT must not only preserve user privacy but also augment the existing manual contact tracing process. Indeed, not every member of a population may own a cell phone or have a DCT app installed and enabled. We present KHOVID to fulfill the combined goal of manual contact-tracing interoperability and DCT user privacy. At KHOVID's core is a privacy-friendly mechanism to encode user trajectories using geolocation data. Manual contact tracing data can be integrated through the same geolocation format. The accuracy of the geolocation data from DCT is improved using Bluetooth proximity detection, and we propose a novel method to encode Bluetooth ephemeral IDs. This contribution describes the detailed design of KHOVID, presents a prototype implementation including an app and server software, and presents a validation based on simulation and field experiments. We also compare the strengths of KHOVID with other, earlier DCT proposals.	Translated: 2023-04-20 08:44:27 Published: 2020-12-17
# Mid-infrared spectrally-uncorrelated biphotons generation from doped PPLN: a theoretical investigation ( http://arxiv.org/abs/2012.09352v1 ) License: see linked page	Bei Wei, Wu-Hao Cai, Chunling Ding, Guang-Wei Deng, Ryosuke Shimizu, Qiang Zhou, Rui-Bo Jin	We theoretically investigate the preparation of mid-infrared (MIR) spectrally-uncorrelated biphotons from a spontaneous parametric down-conversion process using doped LN crystals, including MgO-doped LN, ZnO-doped LN, and In2O3-doped ZnLN with doping ratios from 0 to 7 mol%. The tilt angle of the phase-matching function and the corresponding poling period are calculated under type-II, type-I, and type-0 phase-matching conditions. We also calculate the thermal properties of the doped LN crystals and their performance in Hong-Ou-Mandel interference. It is found that the doping ratio has a substantial impact on the group-velocity-matching (GVM) wavelengths. In particular, the GVM2 wavelength of the co-doped InZnLN crystal has a tunable range of 678.7 nm, which is much broader than the tunable range of less than 100 nm achieved by the conventional method of adjusting the temperature. It can be concluded that the doping ratio can be utilized as a degree of freedom to manipulate the biphoton state. The spectrally uncorrelated biphotons can be used to prepare pure single-photon sources and entangled photon sources, which may have promising applications for quantum-enhanced sensing, imaging, and communications in the MIR range.	Translated: 2023-04-20 08:43:53 Published: 2020-12-17
# A New Concept for the Momentum of a Quantum Mechanical Particle in a Box ( http://arxiv.org/abs/2012.09596v1 ) License: see linked page	M. H. Al-Hashimi and U.-J. Wiese	For a particle in a box, the operator $- i \partial_x$ is not Hermitean. We provide an alternative construction of a momentum operator $p = p_R + i p_I$, which has a Hermitean component $p_R$ that can be extended to a self-adjoint operator, as well as an anti-Hermitean component $i p_I$. This leads to a description of momentum measurements performed on a particle that is strictly limited to the interior of a box.	Translated: 2023-04-20 08:35:24 Published: 2020-12-17
# Numerically "exact" simulations of entropy production in the fully quantum regime: Boltzmann entropy versus von Neumann entropy ( http://arxiv.org/abs/2012.09546v1 ) License: see linked page	Souichi Sakamoto and Yoshitaka Tanimura	We present a scheme to evaluate thermodynamic variables for a system coupled to a heat bath under a time-dependent external force using the quasi-static Helmholtz energy from the numerically "exact" hierarchical equations of motion (HEOM). We computed the entropy produced by a spin system strongly coupled to a non-Markovian heat bath for various temperatures. We showed that when changes to the external perturbation occurred sufficiently slowly, the system always reached thermal equilibrium. Thus, we calculated the Boltzmann entropy and the von Neumann entropy for an isothermal process, as well as various thermodynamic variables, such as changes of internal energies, heat, and work, for a system in quasi-static equilibrium based on the HEOM. We found that, although the characteristic features of the system entropies in the Boltzmann and von Neumann cases as a function of the system-bath coupling strength are similar, those for the total entropy production are completely different. The total entropy production in the Boltzmann case is always positive, whereas that in the von Neumann case becomes negative if we choose a thermal equilibrium state of the total system (an unfactorized thermal equilibrium state) as the initial state. This is because the total entropy production in the von Neumann case does not properly take into account the contribution of the entropy from the system-bath interaction. Thus, the Boltzmann entropy must be used to investigate entropy production in the fully quantum regime. Finally, we examined the applicability of the Jarzynski equality.	Translated: 2023-04-20 08:34:52 Published: 2020-12-17
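As a reminder for readers of the entry above, the von Neumann entropy and the Jarzynski equality are usually written in the following textbook forms. The symbols ($\rho$ for the density operator, $W$ for the work, $\Delta F$ for the free-energy difference, $\beta$ for the inverse temperature) are assumed notation and are not taken from the paper, whose precise definitions may differ.

```latex
S_{\mathrm{vN}} = -k_{\mathrm{B}}\,\mathrm{Tr}\left(\rho \ln \rho\right),
\qquad
\left\langle e^{-\beta W} \right\rangle = e^{-\beta \Delta F},
\qquad
\beta = \frac{1}{k_{\mathrm{B}} T}.
```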
# A note on diagonal gates in SZX-calculus ( http://arxiv.org/abs/2012.09540v1 ) License: see linked page	Titouan Carette	This note describes how the scalable ZXH calculus can be used to represent in a compact way the quantum gates that are diagonal in the computational basis. This includes controlled and multi-controlled Z gates, their generalizations (graph and hypergraph operators, respectively), and also phase gadgets.	Translated: 2023-04-20 08:34:27 Published: 2020-12-17
# PURE: A Framework for Analyzing Proximity-based Contact Tracing Protocols ( http://arxiv.org/abs/2012.09520v1 ) License: see linked page	Fabrizio Cicala, Weicheng Wang, Tianhao Wang, Ninghui Li, Elisa Bertino, Faming Liang, Yang Yang	Many proximity-based contact tracing (PCT) protocols have been proposed and deployed to combat the spread of COVID-19. In this paper, we take a systematic approach to analyzing PCT protocols. We identify a list of desired properties of a contact tracing design from the four aspects of Privacy, Utility, Resiliency, and Efficiency (PURE). We also identify two main design choices for PCT protocols: what information patients report to the server, and which party performs the matching. These two choices determine most of the PURE properties and enable us to conduct a comprehensive analysis and comparison of the existing protocols.	Translated: 2023-04-20 08:34:08 Published: 2020-12-17
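The two design choices named in the PURE abstract (what patients report, and which party performs the matching) can be illustrated with a minimal sketch. Everything below is hypothetical: the key derivation, the function names, and the ID length are assumptions chosen for illustration and do not reproduce any specific deployed protocol.

```python
import hashlib
import hmac


def ephemeral_ids(daily_key, slots=4):
    """Derive short broadcast identifiers from a daily key (hypothetical scheme)."""
    return [
        hmac.new(daily_key, slot.to_bytes(2, "big"), hashlib.sha256).digest()[:16]
        for slot in range(slots)
    ]


def match_on_client(reported_keys, heard_ids):
    """Patients report daily keys; each phone re-derives the IDs and matches locally."""
    return any(
        eid in heard_ids
        for key in reported_keys
        for eid in ephemeral_ids(key)
    )


def match_on_server(reported_ids, uploaded_heard_ids):
    """Patients report IDs; the server sees both sets and intersects them."""
    return not set(reported_ids).isdisjoint(uploaded_heard_ids)


alice_key = b"alice-daily-key"                        # made-up key of a diagnosed user
bob_heard = {ephemeral_ids(alice_key)[2], b"x" * 16}  # Bob was near Alice in slot 2
carol_heard = {b"y" * 16}                             # Carol never met Alice
```

In the client-side variant the server learns nothing about who heard which identifier, while the server-side variant reveals contact sets to the server. This is the kind of trade-off that an analysis along the PURE axes is meant to make explicit.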
# Topological input-output theory for directional amplification ( http://arxiv.org/abs/2012.09488v1 ) License: see linked page	Tomás Ramos, Juan José García-Ripoll, and Diego Porras	We present a topological approach to the input-output relations of photonic driven-dissipative lattices acting as directional amplifiers. Our theory relies on a mapping from the optical non-Hermitian coupling matrix to an effective topological insulator Hamiltonian. This mapping is based on the singular value decomposition of non-Hermitian coupling matrices, whose inverse matrix determines the linear input-output response of the system. In topologically non-trivial regimes, the input-output response of the lattice is dominated by singular vectors with zero singular values that are the equivalent of zero-energy states in topological insulators, leading to directional amplification of a coherent input signal. In such a topological amplification regime, our theoretical framework allows us to fully characterize the amplification properties of the quantum device such as gain, bandwidth, added noise, and noise-to-signal ratio. We exemplify our ideas in a one-dimensional non-reciprocal photonic lattice, for which we derive fully analytical predictions. We show that the directional amplification is near quantum-limited with a gain growing exponentially with system size, $N$, while the noise-to-signal ratio is suppressed as $1/\sqrt{N}$. This points to interesting applications of our theory for quantum signal amplification and single-photon detection.	Translated: 2023-04-20 08:33:39 Published: 2020-12-17
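The link between the inverse of a non-reciprocal coupling matrix and directional gain can be seen in a toy model. The sketch below assumes a hypothetical N-site chain whose coupling matrix H is lower bidiagonal, with on-site term `a` and one-way hopping `b`. It is not the lattice studied in the paper. For |b| > |a| the symbol a + b e^{ik} winds around the origin, and the end-to-end entry of G = H^{-1} grows exponentially with N. Since the smallest singular value of H is the inverse of the largest singular value of G, it becomes exponentially small in the same regime.

```python
def green_function(a, b, n):
    """Invert the n x n lower-bidiagonal matrix H with H[j][j] = a and H[j+1][j] = b.

    Column k of G = H^{-1} is obtained by forward substitution on H g = e_k.
    """
    cols = []
    for k in range(n):
        g = [0.0] * n
        for j in range(n):
            rhs = 1.0 if j == k else 0.0
            prev = b * g[j - 1] if j > 0 else 0.0
            g[j] = (rhs - prev) / a
        cols.append(g)
    # Transpose so that G[j][k] is the response at site j to an input at site k.
    return [[cols[k][j] for k in range(n)] for j in range(n)]


def end_to_end_gain(a, b, n):
    """Return (forward, backward) transmission magnitudes across the chain."""
    g = green_function(a, b, n)
    forward = abs(g[n - 1][0])   # input at the first site, output at the last site
    backward = abs(g[0][n - 1])  # input at the last site, output at the first site
    return forward, backward
```

With `a = 1` and `b = 2` the forward entry equals 2^(N-1) while the backward entry vanishes, which is the directional behaviour the abstract describes. Swapping the values so that |a| > |b| makes the forward entry decay instead.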
# Ultimate quantum limit for amplification: a single atom in front of a mirror ( http://arxiv.org/abs/2012.09800v1 ) License: see linked page	Emely Wiegand, Ping-Yi Wen, Per Delsing, Io-Chun Hoi, Anton Frisk Kockum	We investigate three types of amplification processes for light fields coupling to an atom near the end of a one-dimensional semi-infinite waveguide. We consider two setups where a drive creates population inversion in the bare or dressed basis of a three-level atom and one setup where the amplification is due to higher-order processes in a driven two-level atom. In all cases, the end of the waveguide acts as a mirror for the light. We find that this enhances the amplification in two ways compared to the same setups in an open waveguide. Firstly, the mirror forces all output from the atom to travel in one direction instead of being split up into two output channels. Secondly, interference due to the mirror enables tuning of the ratio of relaxation rates for different transitions in the atom to increase population inversion. We quantify the enhancement in amplification due to these factors and show that it can be demonstrated for standard parameters in experiments with superconducting quantum circuits.	Translated: 2023-04-20 08:26:52 Published: 2020-12-17
# On the performance of deep learning for numerical optimization: an application to protein structure prediction ( http://arxiv.org/abs/2012.09741v1 ) License: see linked page	Hojjat Rakhshani, Lhassane Idoumghar, Soheila Ghambari, Julien Lepagnot, Mathieu Brévilliers	Deep neural networks have recently attracted considerable attention for building and evaluating artificial learning models for perceptual tasks. Here, we present a study on the performance of deep learning models in dealing with global optimization problems. The proposed approach adopts the idea of neural architecture search (NAS) to generate efficient neural networks for solving the problem at hand. The space of network architectures is represented using a directed acyclic graph, and the goal is to find the best architecture to optimize the objective function for a new, previously unknown task. Rather than proposing very large networks with a heavy GPU computational burden and long training times, we focus on searching for lightweight implementations to find the best architecture. The performance of NAS is first analyzed through empirical experiments on the CEC 2017 benchmark suite. Thereafter, it is applied to a set of protein structure prediction (PSP) problems. The experiments reveal that the generated learning models can achieve competitive results when compared to hand-designed algorithms, given enough computational budget.	Translated: 2023-04-20 08:26:37 Published: 2020-12-17
# Compiler Design for Distributed Quantum Computing ( http://arxiv.org/abs/2012.09680v1 ) License: see linked page	Davide Ferrari, Angela Sara Cacciapuoti, Michele Amoretti and Marcello Caleffi	In distributed quantum computing architectures, with the network and communications functionalities provided by the Quantum Internet, remote quantum processing units (QPUs) can communicate and cooperate to execute computational tasks that single NISQ devices cannot handle by themselves. To this aim, distributed quantum computing requires a new generation of quantum compilers for mapping any quantum algorithm to any distributed quantum computing architecture. With this perspective, in this paper, we first discuss the main challenges arising with compiler design for distributed quantum computing. Then, we analytically derive an upper bound of the overhead induced by quantum compilation for distributed quantum computing. The derived bound accounts for the overhead induced by the underlying computing architecture as well as the additional overhead induced by the sub-optimal quantum compiler, which is expressly designed throughout the paper to achieve three key features: being general-purpose, efficient, and effective. Finally, we validate the analytical results and confirm the validity of the compiler design through an extensive performance analysis.	Translated: 2023-04-20 08:24:56 Published: 2020-12-17
# Topologically inequivalent quantizations ( http://arxiv.org/abs/2012.09929v1 ) License: see linked page	Giovanni Acquaviva, Alfredo Iorio, Luca Smaldone	We discuss the representations of the algebra of quantization, the canonical commutation relations, in a scalar quantum field theory with spontaneously broken U(1) internal symmetry, when a topological defect of the vortex type is formed via the condensation of Nambu-Goldstone particles. We find that the usual thermodynamic limit is not necessary in order to have the inequivalent representations needed for the existence of physically disjoint phases of the system. This is a new type of inequivalence, due to the nontrivial topological structure of the phase space, that appears at finite volume. We regard this as a first step towards a unifying view of topological and thermodynamic phases, and offer here comments on the possible application of this scenario to quantum gravity.	Translated: 2023-04-20 08:16:58 Published: 2020-12-17
# Equivalence between classical epidemic model and non-dissipative and dissipative quantum tight-binding model ( http://arxiv.org/abs/2012.09923v1 ) License: see linked page	Krzysztof Pomorski	The equivalence between the classical epidemic model and non-dissipative and dissipative quantum tight-binding models is derived. The classical epidemic model can reproduce the quantum entanglement, as described by the von Neumann entropy, that emerges for electrostatically coupled qubits in both the non-dissipative and dissipative cases. The obtained results show that quantum mechanical phenomena might be almost entirely simulated by a classical statistical model, including quantum-like entanglement and superposition of states. Therefore, coupled epidemic models expressed by classical systems in terms of classical physics can be the basis for possible incorporation of quantum technologies, in particular quantum-like computation and quantum-like communication. The classical density matrix is derived and described by an equation of motion in terms of an anticommutator. The existence of Rabi-like oscillations is pointed out in the classical epidemic model. Furthermore, the existence of the Aharonov-Bohm effect in quantum systems can also be reproduced by the classical epidemic model. Every quantum system made from quantum dots and described by a simplistic tight-binding model using position-based qubits can be effectively described by a classical model with a very specific structure of the S matrix, which is twice the size of the quantum Hamiltonian matrix. The obtained results partly question the fundamental and unique character of quantum mechanics and place the ontology of quantum mechanics largely within the framework of classical statistical physics, which may motivate the emergence of other fundamental theories suggesting that quantum mechanics is only an effective and phenomenological, but not fundamental, picture of reality.	Translated: 2023-04-20 08:16:45 Published: 2020-12-17
# Physics of the Inverted Harmonic Oscillator: From the lowest Landau level to event horizons ( http://arxiv.org/abs/2012.09875v1 ) License: see linked page	Varsha Subramanyan, Suraj S. Hegde, Smitha Vishveshwara and Barry Bradlyn	In this work, we present the inverted harmonic oscillator (IHO) Hamiltonian as a paradigm to understand the quantum mechanics of scattering and time-decay in a diverse set of physical systems. As one of the generators of area-preserving transformations, the IHO Hamiltonian can be studied as a dilatation generator, a squeeze generator, a Lorentz boost generator, or a scattering potential. In establishing these different forms, we demonstrate the physics of the IHO that underlies phenomena as disparate as the Hawking-Unruh effect and scattering in the lowest Landau level (LLL) in quantum Hall systems. We derive the emergence of the IHO Hamiltonian in the LLL in a gauge-invariant way and show its exact parallels with the Rindler Hamiltonian that describes quantum mechanics near event horizons. This approach of studying distinct physical systems with symmetries described by isomorphic Lie algebras through the emergent IHO Hamiltonian enables us to reinterpret geometric response in the lowest Landau level in terms of relativistic effects such as Wigner rotation. Further, the analytic scattering matrix of the IHO points to the existence of quasinormal modes (QNMs) in the spectrum, which have quantized time-decay rates. We present a way to access these QNMs through wave packet scattering, thus proposing a novel effect in quantum Hall point contact geometries that parallels those found in black holes.	Translated: 2023-04-20 08:16:06 Published: 2020-12-17
# Dynamics of Ultracold Bosons in Artificial Gauge Fields: Angular Momentum, Fragmentation, and the Variance of Entropy ( http://arxiv.org/abs/2012.09870v1 ) License: see linked page	Axel U.J. Lode, Sunayana Dutta, Camille Lévêque	We consider the dynamics of two-dimensional interacting ultracold bosons triggered by suddenly switching on an artificial gauge field. The system is initialized in the ground state of a harmonic trapping potential. As a function of the strength of the applied artificial gauge field, we analyze the emergent dynamics by monitoring the angular momentum, the fragmentation, as well as the entropy and the variance of the entropy of absorption or single-shot images. We solve the underlying time-dependent many-boson Schrödinger equation using the multiconfigurational time-dependent Hartree method for indistinguishable particles (MCTDH-X). We find that the artificial gauge field implants angular momentum in the system. Fragmentation (multiple macroscopic eigenvalues of the reduced one-body density matrix) emerges in sync with the dynamics of angular momentum: the bosons in the many-body state develop non-trivial correlations. Fragmentation and angular momentum are experimentally difficult to assess; here, we demonstrate that they can be probed by statistically analyzing the variance of the image entropy of single-shot images, which are the standard projective measurement of the state of ultracold atomic systems.	Translated: 2023-04-20 08:15:40 Published: 2020-12-17
# In search of a many-body mobility edge with matrix product states in a Generalized Aubry-André model with interactions ( http://arxiv.org/abs/2012.09853v1 ) License: see linked page	Nicholas Pomata, Sriram Ganeshan, Tzu-Chieh Wei	We investigate the possibility of a many-body mobility edge in the Generalized Aubry-André (GAA) model with interactions using the Shift-Invert Matrix Product States (SIMPS) algorithm (Phys. Rev. Lett. 118, 017201 (2017)). The non-interacting GAA model is a one-dimensional quasiperiodic model with a self-duality-induced mobility edge. The advantage of SIMPS is that it targets many-body states in an energy-resolved fashion and does not require all many-body states to be localized for convergence, which allows us to test whether the interacting GAA model manifests a many-body mobility edge. Our analysis indicates that the targeted states in the presence of the single-particle mobility edge match neither "MBL-like" fully converged localized states nor the fully delocalized case where SIMPS fails to converge. An entanglement-scaling analysis as a function of the finite bond dimension indicates that the many-body states in the vicinity of a single-particle mobility edge behave closer to how delocalized states manifest within the SIMPS method.	Translated: 2023-04-20 08:15:14 Published: 2020-12-17
# Implicit regularization and momentum algorithms in nonlinear adaptive control and prediction ( http://arxiv.org/abs/1912.13154v6 ) License: see linked page	Nicholas M. Boffi, Jean-Jacques E. Slotine	Stable concurrent learning and control of dynamical systems is the subject of adaptive control. Despite being an established field with many practical applications and a rich theory, much of the development in adaptive control for nonlinear systems revolves around a few key algorithms. By exploiting strong connections between classical adaptive nonlinear control techniques and recent progress in optimization and machine learning, we show that there exists considerable untapped potential in algorithm development for both adaptive nonlinear control and adaptive dynamics prediction. We first introduce first-order adaptation laws inspired by natural gradient descent and mirror descent. We prove that when there are multiple dynamics consistent with the data, these non-Euclidean adaptation laws implicitly regularize the learned model. Local geometry imposed during learning may thus be used to select parameter vectors, out of the many that will achieve perfect tracking or prediction, for desired properties such as sparsity. We apply this result to regularized dynamics predictor and observer design, and as concrete examples consider Hamiltonian systems, Lagrangian systems, and recurrent neural networks. We subsequently develop a variational formalism based on the Bregman Lagrangian to define adaptation laws with momentum applicable to linearly parameterized systems and to nonlinearly parameterized systems satisfying monotonicity or convexity requirements. We show that the Euler-Lagrange equations for the Bregman Lagrangian lead to natural gradient and mirror descent-like adaptation laws with momentum, and we recover their first-order analogues in the infinite friction limit. We illustrate our analyses with simulations demonstrating our theoretical results.	Translated: 2023-01-16 21:18:35 Published: 2020-12-17
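The implicit-regularization statement above has a familiar discrete-time analogue in optimization, shown here as a minimal sketch. It covers only the Euclidean special case on a made-up underdetermined problem (one equation, two unknowns) and is not the continuous-time adaptation law of the paper. Plain gradient descent started from zero converges to the minimum-norm solution among the infinitely many parameter vectors that fit the data exactly.

```python
def gradient_descent_min_norm(steps=2000, lr=0.05):
    """Fit a1 + 2*a2 = 5 by gradient descent on 0.5 * (a1 + 2*a2 - 5)**2."""
    a1, a2 = 0.0, 0.0              # zero initialization
    for _ in range(steps):
        err = a1 + 2.0 * a2 - 5.0  # residual of the single data equation
        a1 -= lr * err * 1.0       # gradient with respect to a1
        a2 -= lr * err * 2.0       # gradient with respect to a2
    return a1, a2


a1, a2 = gradient_descent_min_norm()
```

The iterates never leave the direction (1, 2), so the limit is the solution of smallest Euclidean norm, (1, 2). A mirror-descent-style update with a different potential would stay on a different curve and select a different exact fit, which is the sense in which the geometry of the update acts as a regularizer.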
# Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology ( http://arxiv.org/abs/2002.07867v3 ) License: see linked page	Quynh Nguyen and Marco Mondelli	Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with $N$ ($N$ being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width $N$ following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths and form a pyramidal topology. We show an application of our result to the widely used LeCun initialization and obtain an over-parameterization requirement for the single wide layer of order $N^2$.	Translated: 2022-12-30 19:33:36 Published: 2020-12-17
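The architectural condition in the entry above can be written as a small check. The sketch rests on two assumptions that the abstract does not spell out: "pyramidal" is read as hidden-layer widths that never increase with depth, and "width $N$" is read as at least $N$. The function name and the example widths are hypothetical.

```python
def satisfies_width_condition(widths, n_samples):
    """Check a list of hidden-layer widths, ordered from the layer after the input.

    Returns True when the first hidden layer has at least n_samples units and
    the widths never increase afterwards (the assumed pyramidal shape).
    """
    if not widths:
        return False
    first_is_wide = widths[0] >= n_samples
    pyramidal = all(nxt <= cur for cur, nxt in zip(widths, widths[1:]))
    return first_is_wide and pyramidal
```

For example, with 1000 training samples the widths `[1000, 64, 64, 10]` pass the check, while `[1000, 64, 128]` fails because a later layer widens again.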
# 選好からの高速ベイズ逆流推論による安全な模倣学習 Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences ( http://arxiv.org/abs/2002.09089v4 ) ライセンス: Link先を確認	Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum	(参考訳) デモンストレーションによるベイズ報酬学習は、模倣学習を行う際の厳密な安全性と不確実性分析を可能にする。しかし、ベイジアン報酬学習法は一般に複雑な制御問題に対して計算的に難解である。ベイジアン・リワード補間法(Bayesian Reward Extrapolation, Bayesian REX)を提案する。ベイジアン・リワード学習アルゴリズムは, 自己教師付きタスクによる低次元特徴符号化を事前学習し, 実演よりも好みを生かして高速なベイジアン推定を行う。 Bayesian REXはデモからAtariゲームを学ぶことができ、ゲームスコアにアクセスすることなく、パーソナルラップトップでわずか5分で後部報酬関数から10万のサンプルを生成することができる。ベイジアンREXはまた、報酬関数の点推定のみを学習する最先端の手法と競合するか、それ以上の模倣学習性能をもたらす。最後に、ベイジアンREXは報酬関数のサンプルにアクセスすることなく、効率的な高信頼度ポリシー評価を可能にする。これらの信頼性の高いパフォーマンス境界は、さまざまな評価ポリシーのパフォーマンスとリスクをランク付けし、報酬ハッキング行動を検出する手段を提供するために使用できる。 Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose Bayesian Reward Extrapolation (Bayesian REX), a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation learning problems by pre-training a low-dimensional feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference. Bayesian REX can learn to play Atari games from demonstrations, without access to the game score and can generate 100,000 samples from the posterior over reward functions in only 5 minutes on a personal laptop. Bayesian REX also results in imitation learning performance that is competitive with or better than state-of-the-art methods that only learn point estimates of the reward function. Finally, Bayesian REX enables efficient high-confidence policy evaluation without having access to samples of the reward function. 
These high-confidence performance bounds can be used to rank the performance and risk of a variety of evaluation policies and provide a way to detect reward hacking behaviors.	翻訳日:2022-12-30 00:33:58 公開日:2020-12-17
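The preference-based inference described in the Bayesian REX abstract can be illustrated with a small sketch. It assumes a reward that is linear in precomputed per-demonstration feature sums, a Bradley-Terry style preference likelihood, and a random-walk Metropolis sampler over unit-norm weights. The function names, the inverse temperature `beta`, the step size, and the uniform prior are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def log_likelihood(w, features, prefs, beta=1.0):
    """Log-likelihood of pairwise preferences under a linear reward.

    features[i] is the summed feature vector of demonstration i.
    prefs holds (worse, better) index pairs.
    The Bradley-Terry form used here is an assumption of this sketch.
    """
    returns = [beta * sum(wk * fk for wk, fk in zip(w, f)) for f in features]
    total = 0.0
    for worse, better in prefs:
        # Numerically stable log-softmax of the preferred return over the pair.
        m = max(returns[worse], returns[better])
        log_z = m + math.log(
            math.exp(returns[worse] - m) + math.exp(returns[better] - m)
        )
        total += returns[better] - log_z
    return total


def sample_posterior(features, prefs, n_samples=2000, step=0.3, beta=1.0, seed=0):
    """Random-walk Metropolis over unit-norm reward weights (uniform prior)."""
    rng = random.Random(seed)
    dim = len(features[0])
    w = [1.0 / math.sqrt(dim)] * dim
    ll = log_likelihood(w, features, prefs, beta)
    samples = []
    for _ in range(n_samples):
        # Perturb, then project back onto the unit sphere.
        proposal = [wk + rng.gauss(0.0, step) for wk in w]
        norm = math.sqrt(sum(p * p for p in proposal)) or 1.0
        proposal = [p / norm for p in proposal]
        ll_new = log_likelihood(proposal, features, prefs, beta)
        # Accept with probability min(1, likelihood ratio).
        if rng.random() < math.exp(min(0.0, ll_new - ll)):
            w, ll = proposal, ll_new
        samples.append(w)
    return samples
```

Because only the weight vector changes between samples, each likelihood evaluation is a handful of dot products, which is the kind of cost profile that makes drawing many posterior samples cheap once the features are fixed.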
# グラフ注意ニューラルネットワークを用いた目標知覚分類のための型付き構文依存性の検討 Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network ( http://arxiv.org/abs/2002.09685v3 ) ライセンス: Link先を確認	Xuefeng Bai, Pengbo Liu and Yue Zhang	(参考訳) 目標感情分類は、入力テキスト中の特定の目標言及に対する感情極性を予測する。支配的手法はニューラルネットワークを用いて入力文を符号化し、ターゲット参照とそれらのコンテキストの関係を抽出する。近年,タスクの依存関係構文を統合するためにグラフニューラルネットワークが研究され,最先端の結果が得られた。しかし、既存の手法では依存ラベル情報を考慮せず、直感的に有用である。そこで本研究では,型付き構文依存情報を統合する新しい関係グラフアテンションネットワークについて検討する。標準ベンチマークの結果,提案手法は感情分類性能を向上させるためにラベル情報を効果的に活用できることがわかった。最終的なモデルは最先端の構文ベースのアプローチを大幅に上回っています。 Targeted sentiment classification predicts the sentiment polarity on given target mentions in input texts. Dominant methods employ neural networks for encoding the input sentence and extracting relations between target mentions and their contexts. Recently, graph neural network has been investigated for integrating dependency syntax for the task, achieving the state-of-the-art results. However, existing methods do not consider dependency label information, which can be intuitively useful. To solve the problem, we investigate a novel relational graph attention network that integrates typed syntactic dependency information. Results on standard benchmarks show that our method can effectively leverage label information for improving targeted sentiment classification performances. Our final model significantly outperforms state-of-the-art syntax-based approaches.	翻訳日:2022-12-29 19:28:28 公開日:2020-12-17
# Hölder クラスによってインデックス付けされた経験過程の上限の期待値の境界 Bounding the expectation of the supremum of empirical processes indexed by Hölder classes ( http://arxiv.org/abs/2003.13530v3 ) ライセンス: Link先を確認	Nicolas Schreuder	(参考訳) 本稿では、任意の滑らかさの Hölder クラスと、$\mathbb R^d$ の有界集合上に台を持つ任意の分布について、そのクラスでインデックス付けされた経験過程の上限の期待値に対する上界を与える。これらの結果は、$n$ 個の独立観測に基づく経験分布で未知の分布を推定し、その推定誤差を積分確率計量(IPM)によって定量化する場合の、非漸近的リスク境界と見なすこともできる。特に、Hölder クラスによってインデックス付けされた IPM を考慮し、対応するレートを導出する。これらの結果は、Wasserstein-1 距離に対応するレート $n^{-1/d}$(最も滑らかでない場合)と、非常に滑らかな関数(例えば、有界核で定義される RKHS の関数)に対応する高速レート $n^{-1/2}$ という、2つのよく知られた極端な場合の間を補間する。 In this note, we provide upper bounds on the expectation of the supremum of empirical processes indexed by Hölder classes of any smoothness and for any distribution supported on a bounded set in $\mathbb R^d$. These results can be alternatively seen as non-asymptotic risk bounds, when the unknown distribution is estimated by its empirical counterpart, based on $n$ independent observations, and the error of estimation is quantified by an integral probability metric (IPM). In particular, the IPM indexed by a Hölder class is considered and the corresponding rates are derived. These results interpolate between the two well-known extreme cases: the rate $n^{-1/d}$ corresponding to the Wasserstein-1 distance (the least smooth case) and the fast rate $n^{-1/2}$ corresponding to very smooth functions (for instance, functions from an RKHS defined by a bounded kernel).	翻訳日:2022-12-18 08:40:20 公開日:2020-12-17
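For reference, the integral probability metric named in the abstract has the standard definition below, together with the empirical-process quantity it reduces to when one argument is the empirical distribution. The symbols $\mathcal F$, $P$, $Q$, $\hat P_n$, and $X_i$ are notation chosen here for illustration and are not taken from the paper.

```latex
d_{\mathcal F}(P, Q)
  = \sup_{f \in \mathcal F}
    \left| \mathbb E_{X \sim P} f(X) - \mathbb E_{Y \sim Q} f(Y) \right|,
\qquad
d_{\mathcal F}(\hat P_n, P)
  = \sup_{f \in \mathcal F}
    \left| \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \mathbb E_{X \sim P} f(X) \right| .
```

Taking $\mathcal F$ to be the class of 1-Lipschitz functions recovers the Wasserstein-1 distance by Kantorovich-Rubinstein duality, which is the least smooth of the two extreme cases the abstract mentions.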
# 異種ネットワーク表現学習:調査とベンチマークによる統一フレームワーク Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark ( http://arxiv.org/abs/2004.00216v3 ) ライセンス: Link先を確認	Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, Jiawei Han	(参考訳) 実世界のオブジェクトとその相互作用はしばしばマルチモーダルでマルチタイプであるため、ヘテロジニアスネットワークは伝統的な均質ネットワーク(グラフ)のより強力で現実的な、汎用的なスーパークラスとして広く使われてきた。一方,表現学習(いわゆる埋め込み)は近年,様々なネットワークマイニングや分析タスクにおいて集中的に研究され,有効性が示されている。本研究では,既存のヘテロジニアスネットワーク埋め込み(HNE)に関する研究を深く要約し,評価するための統一的なフレームワークを提供することを目的としており,これは通常のサーベイを含みつつもそれを超えるものである。HNEアルゴリズムはすでに幅広く存在するため、この研究の最初の貢献として、既存の各種HNEアルゴリズムの利点に関する体系的な分類と分析のための一般的なパラダイムを提供する。さらに、既存のHNEアルゴリズムは概ね汎用的であると主張されているが、しばしば異なるデータセットで評価される。 HNEの応用志向を考えれば理解できるものの、このような間接的な比較は、特に実世界のアプリケーションデータから異種ネットワークを構築する方法が多様であることを考えると、タスク性能の向上を効果的なデータ前処理と新しい技術設計のどちらに帰するかの適切な判断を大きく妨げる。したがって、第2の貢献として、スケール、構造、属性/ラベルの可用性などに関してさまざまな特性を持つ4つのベンチマークデータセットを異なる情報源から作成し、HNEアルゴリズムの手軽で公正な評価を可能にする。第3の貢献として、13の人気のあるHNEアルゴリズムについて実装を慎重にリファクタリングおよび修正し、使いやすいインターフェースを作成したうえで、複数のタスクと実験設定にわたる全面的な比較を提供する。 Since real-world objects and their interactions are often multi-modal and multi-typed, heterogeneous networks have been widely used as a more powerful, realistic, and generic superclass of traditional homogeneous networks (graphs). Meanwhile, representation learning (a.k.a. embedding) has recently been intensively studied and shown effective for various network mining and analytical tasks. In this work, we aim to provide a unified framework to deeply summarize and evaluate existing research on heterogeneous network embedding (HNE), which includes but goes beyond a normal survey. Since there is already a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms. Moreover, existing HNE algorithms, though mostly claimed to be generic, are often evaluated on different datasets. 
Although understandable given the application-driven nature of HNE, such indirect comparisons largely hinder the proper attribution of improved task performance to effective data preprocessing and novel technical design, especially considering the many possible ways to construct a heterogeneous network from real-world application data. Therefore, as the second contribution, we create four benchmark datasets with various properties regarding scale, structure, attribute/label availability, etc., from different sources, to support handy and fair evaluations of HNE algorithms. As the third contribution, we carefully refactor and amend the implementations and create friendly interfaces for 13 popular HNE algorithms, and provide all-around comparisons among them over multiple tasks and experimental settings.	翻訳日:2022-12-17 19:32:19 公開日:2020-12-17
# エネルギーモデルを用いた構成的視覚生成と推論 Compositional Visual Generation and Inference with Energy Based Models ( http://arxiv.org/abs/2004.06030v3 ) ライセンス: Link先を確認	Yilun Du, Shuang Li, Igor Mordatch	(参考訳) 人間の知能の重要な側面は、より単純なアイデアからますます複雑な概念を組み立て、迅速な学習と知識の適応を可能にする能力である。本稿では, 確率分布を直接組み合わせることで, エネルギーモデルでこの能力を発揮できることを示す。複合分布からのサンプルは概念の構成に対応する。例えば、笑顔の顔の分布と男性の顔の分布を考えると、笑顔の顔を生成するためにそれらを組み合わせることができる。これにより、コンビネーション、切断、概念の否定を同時に満足する自然画像を生成することができます。我々は,自然顔のCelebAデータセットと合成3Dシーン画像を用いて,モデルの構成生成能力を評価する。また、新たな概念を継続的に学習し、組み込む機能や、画像の基盤となる概念特性の合成を推論する機能など、我々のモデルに特有の利点も示しています。 A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this ability by directly combining probability distributions. Samples from the combined distribution correspond to compositions of concepts. For example, given a distribution for smiling faces, and another for male faces, we can combine them to generate smiling male faces. This allows us to generate natural images that simultaneously satisfy conjunctions, disjunctions, and negations of concepts. We evaluate compositional generation abilities of our model on the CelebA dataset of natural faces and synthetic 3D scene images. We also demonstrate other unique advantages of our model, such as the ability to continually learn and incorporate new concepts, or infer compositions of concept properties underlying an image.	翻訳日:2022-12-13 23:07:56 公開日:2020-12-17
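The composition operators named in the abstract (conjunction, disjunction, negation) can be written down directly for unnormalized densities of the form p(x) ∝ exp(-E(x)). The sketch below operates on energy values already evaluated at a single input; the function names and the `alpha` weighting in `negation` are assumptions of this illustration, and sampling from the composed energy (for example with Langevin dynamics) is not shown.

```python
import math


def conjunction(energies):
    """Product of densities: the energies simply add."""
    return sum(energies)


def disjunction(energies):
    """Sum of densities: negative log-sum-exp of the negated energies."""
    m = min(energies)  # shift for numerical stability
    return m - math.log(sum(math.exp(-(e - m)) for e in energies))


def negation(e_keep, e_avoid, alpha=1.0):
    """Density ratio p_keep / p_avoid**alpha: subtract the scaled energy."""
    return e_keep - alpha * e_avoid
```

With energies for "smiling" and "male" evaluated on the same image, `conjunction` is low only when both are low, while `disjunction` is low when either one is.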
# pfnn:複素幾何学上の二階境界値問題のクラスを解くペナルティフリーニューラルネットワーク法 PFNN: A Penalty-Free Neural Network Method for Solving a Class of Second-Order Boundary-Value Problems on Complex Geometries ( http://arxiv.org/abs/2004.06490v2 ) ライセンス: Link先を確認	Hailong Sheng and Chao Yang	(参考訳) 複素測地における2階境界値問題のクラスを効率的に解くために, ペナルティのないニューラルネットワーク手法であるPFNNを提案する。滑らかさの要求を減らすため、元の問題は弱形式に再構成され、高階導関数の評価は避けられる。 1つではなく2つのニューラルネットワークを用いて近似解を構築し、1つのネットワークが必須境界条件を満たし、もう1つはドメインの残りの部分を処理している。このように、制約付き最適化ではなく、制約付き最適化問題は、ペナルティ項を追加することなく解決される。 2つのネットワークの絡み合いは、スケール不変で複雑なジオメトリに適応できる長さ係数関数の助けを借りて解消される。本稿では,pfnn法の収束を証明し,線形および非線形の2次境界値問題に対する数値実験を行い,pfnnが既存の手法よりも精度,柔軟性,頑健性において優れていることを示す。 We present PFNN, a penalty-free neural network method, to efficiently solve a class of second-order boundary-value problems on complex geometries. To reduce the smoothness requirement, the original problem is reformulated to a weak form so that the evaluations of high-order derivatives are avoided. Two neural networks, rather than just one, are employed to construct the approximate solution, with one network satisfying the essential boundary conditions and the other handling the rest part of the domain. In this way, an unconstrained optimization problem, instead of a constrained one, is solved without adding any penalty terms. The entanglement of the two networks is eliminated with the help of a length factor function that is scale invariant and can adapt with complex geometries. We prove the convergence of the PFNN method and conduct numerical experiments on a series of linear and nonlinear second-order boundary-value problems to demonstrate that PFNN is superior to several existing approaches in terms of accuracy, flexibility and robustness.	翻訳日:2022-12-13 10:15:52 公開日:2020-12-17
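One way to read the two-network construction in the PFNN abstract is as an ansatz u(x) = g(x) + l(x) f(x), where g matches the essential boundary condition, l is a length factor that vanishes on that boundary, and f is free elsewhere. The sketch below checks this on a toy 1D interval. The exact combination form, the names `make_solution`, `g`, `length_factor`, and `f`, and the closed-form stand-ins for the trained networks are all assumptions made for illustration.

```python
import math


def make_solution(g, length_factor, f):
    """Compose u(x) = g(x) + l(x) * f(x).

    Wherever l(x) == 0, u equals g regardless of f, so the essential
    boundary condition holds without any penalty term.
    """
    def u(x):
        return g(x) + length_factor(x) * f(x)
    return u


# Toy problem on [0, 1] with the essential condition u(0) = 2.
def g(x):
    return 2.0  # stand-in for the boundary network


def length_factor(x):
    return x  # vanishes exactly at the essential boundary x = 0


def f(x):
    return math.sin(3.0 * x) + 0.5  # stand-in for the interior network


u = make_solution(g, length_factor, f)
```

Training would then adjust only the parameters of `f` against the weak-form loss, since the boundary value is fixed by construction.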
# MangaGAN:マンガ図面の方法論に基づく未完成のフォト・ツー・マンガ翻訳 MangaGAN: Unpaired Photo-to-Manga Translation Based on The Methodology of Manga Drawing ( http://arxiv.org/abs/2004.10634v2 ) ライセンス: Link先を確認	Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Jiahe Cui, and Ji Wan	(参考訳) マンガ(manga)は、主に白黒のストローク線や幾何学的な誇張を用いて人間の容姿、ポーズ、行動などを表現した、日本発祥の世界的人気漫画である。本稿では, 生成的逆ネットワーク(gan, generative adversarial network)を基盤としたマンガガン(mangagan)を提案する。マンガアーティストがいかにマンガを描くかに触発されたMangaGANは、デザインされたGANモデルによってマンガの幾何学的特徴を生成し、カスタマイズされたマルチGANアーキテクチャにより、各顔領域をマンガドメインに微妙に翻訳する。 MangaGANのトレーニングのために,マンガの顔の特徴,ランドマーク,身体などを含む,人気マンガ作品から収集された新しいデータセットを構築した。さらに,高品質なマンガ面を作成するために,スムースなストロークラインに対する構造的平滑化損失とノイズ画素の回避,およびフォトとマンガのドメイン間の類似性を向上させるための類似性保持モジュールを提案する。広汎な実験により、マンガガンは、顔の類似性と人気マンガスタイルの両方を保ち、他の関連する最先端の手法よりも優れた高品質なマンガフェイスを生成できることが示されている。 Manga is a world popular comic form originated in Japan, which typically employs black-and-white stroke lines and geometric exaggeration to describe humans' appearances, poses, and actions. In this paper, we propose MangaGAN, the first method based on Generative Adversarial Network (GAN) for unpaired photo-to-manga translation. Inspired by how experienced manga artists draw manga, MangaGAN generates the geometric features of manga face by a designed GAN model and delicately translates each facial region into the manga domain by a tailored multi-GANs architecture. For training MangaGAN, we construct a new dataset collected from a popular manga work, containing manga facial features, landmarks, bodies, and so on. Moreover, to produce high-quality manga faces, we further propose a structural smoothing loss to smooth stroke-lines and avoid noisy pixels, and a similarity preserving module to improve the similarity between domains of photo and manga. Extensive experiments show that MangaGAN can produce high-quality manga faces which preserve both the facial similarity and a popular manga style, and outperforms other related state-of-the-art methods.	翻訳日:2022-12-10 18:33:26 公開日:2020-12-17
# ディープニューラルネットワークと対向ロバストネスを用いたfMRIデコーディングの解釈可能性の向上 Improving the Interpretability of fMRI Decoding using Deep Neural Networks and Adversarial Robustness ( http://arxiv.org/abs/2004.11114v3 ) ライセンス: Link先を確認	Patrick McClure, Dustin Moraczewski, Ka Chun Lam, Adam Thomas, Francisco Pereira	(参考訳) 機能的磁気共鳴画像(fMRI)データから予測するために、ディープニューラルネットワーク(DNN)がますます使われている。しかし、それらは広く解釈不能な「ブラックボックス」と見なされており、その過程でdnnがどの入力情報が使われているかを知ることは困難であり、認知神経科学と臨床応用の両方において重要なものである。サリエンシマップは、入力特徴の相対的重要性の解釈可能な可視化を作成するための一般的なアプローチである。しかし、DNNが入力ノイズに敏感であることや、入力に過度に集中し、モデルに過少なため、マップを作成する方法は失敗することが多い。また,正当性マップが真に関連した入力情報にどの程度対応しているかを評価することも困難である。本稿では,勾配に基づく塩分濃度分布図を作成するための様々な手法を概説し,DNNを入力雑音に頑健にするために開発した新しい逆方向学習法について述べる。本稿では,DNNや線形モデルを用いて画像データから情報を復号化するための訓練を行う場合,fMRIにおける2つの定量評価手法を提案する。我々は,複雑なアクティベーション構造が知られている合成データセットと,DNNで生成されるサリエンシマップとHuman Connectome Project(HCP)データセットにおけるタスクデコーディングのための線形モデルを用いて,その手順を評価する。我々の重要な発見は、合成fMRIデータとHCP fMRIデータの両方において、異なる方法で生成されるサリエンシマップが、解釈可能性において大きく異なることである。驚くべきことに、dnnと線形モデルが同等のパフォーマンスレベルでデコードする場合であっても、dnn saliency mapは(重みや勾配から派生した)線形モデルsaliency mapsよりも解釈可能性が高い。最後に,我々の対人訓練法で作成したサリエンシマップは,他の方法よりも優れていた。 Deep neural networks (DNNs) are being increasingly used to make predictions from functional magnetic resonance imaging (fMRI) data. However, they are widely seen as uninterpretable "black boxes", as it can be difficult to discover what input information is used by the DNN in the process, something important in both cognitive neuroscience and clinical applications. A saliency map is a common approach for producing interpretable visualizations of the relative importance of input features for a prediction. However, methods for creating maps often fail due to DNNs being sensitive to input noise, or by focusing too much on the input and too little on the model. It is also challenging to evaluate how well saliency maps correspond to the truly relevant input information, as ground truth is not always available. 
In this paper, we review a variety of methods for producing gradient-based saliency maps, and present a new adversarial training method we developed to make DNNs robust to input noise, with the goal of improving interpretability. We introduce two quantitative evaluation procedures for saliency map methods in fMRI, applicable whenever a DNN or linear model is being trained to decode some information from imaging data. We evaluate the procedures using a synthetic dataset where the complex activation structure is known, and on saliency maps produced for DNN and linear models for task decoding in the Human Connectome Project (HCP) dataset. Our key finding is that saliency maps produced with different methods vary widely in interpretability, in both synthetic and HCP fMRI data. Strikingly, even when DNN and linear models decode at comparable levels of performance, DNN saliency maps score higher on interpretability than linear model saliency maps (derived via weights or gradient). Finally, saliency maps produced with our adversarial training method outperform those from other methods.	翻訳日:2022-12-10 08:54:21 公開日:2020-12-17
# 半語彙言語 ---コンピュータビジョンにおける機械学習と記号推論の統合のための公式な基礎 Semi-Lexical Languages -- A Formal Basis for Unifying Machine Learning and Symbolic Reasoning in Computer Vision ( http://arxiv.org/abs/2004.12152v2 ) ライセンス: Link先を確認	Briti Gangopadhyay, Somnath Hazra and Pallab Dasgupta	(参考訳) 人間の視覚は、世界に関する事前の知識に基づいて推論することで、現実世界からの感覚入力の不完全性を補うことができる。しかし、ドメイン知識に基づく推論フレームワークが存在しないことで、複雑なシナリオを解釈する能力は制限されている。実世界の不完全なトークンを扱うための公式な基礎として半語彙言語を提案する。機械学習のパワーは不完全なトークンを言語のアルファベットにマッピングするために使用され、シンボリック推論は言語の入力のメンバーシップを決定するために使用される。半語彙言語はまた、入力の異なる部分で半語彙のトークンが解釈されるバリエーションを防ぐバインディングを持ち、それによって推論に頼り、個々のトークンの認識の質を高める。本稿では、純粋な機械学習と純粋にシンボリックな手法よりも、このようなフレームワークを使うことの利点を示すケーススタディを紹介する。 Human vision is able to compensate imperfections in sensory inputs from the real world by reasoning based on prior knowledge about the world. Machine learning has had a significant impact on computer vision due to its inherent ability in handling imprecision, but the absence of a reasoning framework based on domain knowledge limits its ability to interpret complex scenarios. We propose semi-lexical languages as a formal basis for dealing with imperfect tokens provided by the real world. The power of machine learning is used to map the imperfect tokens into the alphabet of the language and symbolic reasoning is used to determine the membership of input in the language. Semi-lexical languages also have bindings that prevent the variations in which a semi-lexical token is interpreted in different parts of the input, thereby leaning on deduction to enhance the quality of recognition of individual tokens. We present case studies that demonstrate the advantage of using such a framework over pure machine learning and pure symbolic methods.	翻訳日:2022-12-09 21:16:04 公開日:2020-12-17
# 文書品質予測のための構造タグの改善 Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction ( http://arxiv.org/abs/2005.00129v2 ) ライセンス: Link先を確認	Gideon Maillette de Buy Wenniger, Thomas van Dongen, Eleri Aedmaa, Herbert Teun Kruitbosch, Edwin A. Valentijn, and Lambert Schomaker	(参考訳) 長いテキスト、特に学術文書でのリカレントニューラルネットワークのトレーニングは、学習に問題を引き起こす。階層的注意ネットワーク(HAN)はこれらの問題を解決するのに有効であるが、テキストの構造に関する重要な情報を失う。これらの問題に対処するために、文書中の文の役割を示す構造タグとHANの使用を提案する。文にタグを追加し、タイトル、抽象的、あるいは本文に対応するマークを付けると、学術的な文書品質予測のための最先端技術よりも改善される。提案システムは,PeerReadデータセット上でのアクセプション/リジェクト予測のタスクに適用し,最近のBiLSTMモデルと共同テキスト+視覚モデル,および平易なHANとの比較を行った。通常のHANと比較すると、3つの領域で精度が向上する。計算と言語領域では、新しいモデルは全体として最もよく機能し、最良の文献結果よりも4.7%精度が向上します。また,allen ai s2orcデータセットから集計した88kの学術論文に対して,引用数予測用のタグを導入することで,改良を行った。構造タグを持つHANシステムでは,28.5%の分散が説明され,BiLSTMモデルの再実装よりも1.8%,通常のHANよりも1.0%向上した。 Training recurrent neural networks on long texts, in particular scholarly documents, causes problems for learning. While hierarchical attention networks (HANs) are effective in solving these problems, they still lose important information about the structure of the text. To tackle these problems, we propose the use of HANs combined with structure-tags which mark the role of sentences in the document. Adding tags to sentences, marking them as corresponding to title, abstract or main body text, yields improvements over the state-of-the-art for scholarly document quality prediction. The proposed system is applied to the task of accept/reject prediction on the PeerRead dataset and compared against a recent BiLSTM-based model and joint textual+visual model as well as against plain HANs. Compared to plain HANs, accuracy increases on all three domains. On the computation and language domain our new model works best overall, and increases accuracy 4.7% over the best literature result. We also obtain improvements when introducing the tags for prediction of the number of citations for 88k scientific publications that we compiled from the Allen AI S2ORC dataset. 
For our HAN-system with structure-tags we reach 28.5% explained variance, an improvement of 1.8% over our reimplementation of the BiLSTM-based model as well as 1.0% improvement over plain HANs.	翻訳日:2022-12-08 03:12:23 公開日:2020-12-17
# DramaQA:階層型QAによる文字中心のビデオストーリー理解 DramaQA: Character-Centered Video Story Understanding with Hierarchical QA ( http://arxiv.org/abs/2005.03356v2 ) ライセンス: Link先を確認	Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang	(参考訳) 近年のコンピュータビジョンと自然言語処理の進歩にもかかわらず、ビデオストーリーの本質的な難しさのため、ビデオストーリーを理解できる機械の開発はいまだに困難である。また,人間の認知過程に基づく映像理解の程度を評価する方法については,まだ研究が進んでいない。本稿では,ビデオストーリーを包括的に理解するために,新しいビデオ質問応答(ビデオQA)タスクであるDramaQAを提案する。 DramaQAは2つの視点に焦点を当てている。 1)人間の知能の認知発達段階に基づく評価指標としての階層的QA。 2) ストーリーの局所的コヒーレンスをモデル化するための文字中心のビデオアノテーション。我々のデータセットは、テレビドラマ『Another Miss Oh』の上に構築されており、17,983対のQAビデオクリップが23,928本あり、各QAペアは4つの難易度のうちの1つに属している。我々は217,308個のアノテーション付き画像を提供し,視覚境界ボックスや主要文字の動作や感情,解決されたスクリプトの同時参照など,文字中心のアノテーションを充実させた。さらに,ビデオの文字中心表現を階層的に理解し,質問に答えるマルチレベルコンテキストマッチングモデルを提案する。我々は,研究目的のためにデータセットとモデルを公開し,ビデオストーリー理解研究の新しい視点を提供することを期待している。 Despite recent progress on computer vision and natural language processing, developing a machine that can understand video story is still hard to achieve due to the intrinsic difficulty of video story. Moreover, researches on how to evaluate the degree of video understanding based on human cognitive process have not progressed as yet. In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for a comprehensive understanding of the video story. The DramaQA focuses on two perspectives: 1) Hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence. 2) Character-centered video annotations to model local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and it contains 17,983 QA pairs from 23,928 various length video clips, with each QA pair belonging to one of four difficulty levels. We provide 217,308 annotated images with rich character-centered annotations, including visual bounding boxes, behaviors and emotions of main characters, and coreference resolved scripts. 
Additionally, we propose a Multi-level Context Matching model that hierarchically understands character-centered representations of video to answer questions. We release our dataset and model publicly for research purposes, and we expect our work to provide a new perspective on video story understanding research.	翻訳日:2022-12-05 22:22:25 公開日:2020-12-17
# 多視点協調ネットワーク埋め込み Multi-View Collaborative Network Embedding ( http://arxiv.org/abs/2005.08189v2 ) ライセンス: Link先を確認	Sezin Kircali Ata, Yuan Fang, Min Wu, Jiaqi Shi, Chee Keong Kwoh and Xiaoli Li	(参考訳) 現実世界のネットワークは複数のビューを持つことが多く、各ビューは共通のノード間の1つのタイプの相互作用を記述する。例えば、ビデオ共有ネットワークでは、2つのユーザーノードが1つのビューに共通のお気に入りのビデオがある場合リンクされるが、共通のサブスクライバーを共有する場合は別のビューでリンクすることもできる。従来のシングルビューネットワークとは異なり、複数のビューは互いに補完するために異なるセマンティクスを維持する。本稿では,低次元表現を学習するためのマルチビューネットワーク埋め込み手法MANEを提案する。多様性はビューを個々のセマンティクスを維持するのを可能にし、コラボレーションはビューを協調させる。また,これまで検討されていない新たな2次コラボレーション形態を発見し,優れたノード表現を実現するためのフレームワークに統合した。さらに,各ビューが異なるノードを持つ場合が多いので, mane+ というノード毎のビュー重要度をモデル化するために mane の注意に基づく拡張を提案する。最後に,実世界の3つのパブリックマルチビューネットワーク上で総合的な実験を行い,本モデルが最先端のアプローチを一貫して上回っていることを示す。 Real-world networks often exist with multiple views, where each view describes one type of interaction among a common set of nodes. For example, on a video-sharing network, while two user nodes are linked if they have common favorite videos in one view, they can also be linked in another view if they share common subscribers. Unlike traditional single-view networks, multiple views maintain different semantics to complement each other. In this paper, we propose MANE, a multi-view network embedding approach to learn low-dimensional representations. Similar to existing studies, MANE hinges on diversity and collaboration - while diversity enables views to maintain their individual semantics, collaboration enables views to work together. However, we also discover a novel form of second-order collaboration that has not been explored previously, and further unify it into our framework to attain superior node representations. Furthermore, as each view often has varying importance w.r.t. different nodes, we propose MANE+, an attention-based extension of MANE to model node-wise view importance. 
Finally, we conduct comprehensive experiments on three public, real-world multi-view networks, and the results demonstrate that our models consistently outperform state-of-the-art approaches.	翻訳日:2022-12-02 05:18:12 公開日:2020-12-17
# 会話型質問応答のためのfluent response生成 Fluent Response Generation for Conversational Question Answering ( http://arxiv.org/abs/2005.10464v2 ) ライセンス: Link先を確認	Ashutosh Baheti, Alan Ritter, Kevin Small	(参考訳) 質問応答(QA)はオープンドメイン会話エージェントの重要な側面であり、会話QA(ConvQA)サブタスクにおける特定の研究の焦点を定めている。最近のConvQAの取り組みの特筆すべき制限は、応答がターゲットコーパスから抽出されることであり、高品質な会話エージェントの自然言語生成(NLG)の側面を無視していることである。そこで本研究では,seq2seq nlg法を用いて,正確性を維持しつつ流麗な文法的応答を生成する手法を提案する。技術的な観点からは、エンドツーエンドシステムのトレーニングデータを生成するためにデータ拡張を使用します。具体的には,Syntactic Transformations(STs)を開発し,質問固有候補応答を生成し,BERTに基づく分類器(Devlin et al., 2019)を用いてランク付けする。 SQuAD 2.0データに対する人間による評価(Rajpurkar et al., 2018)は、提案モデルが会話応答の生成においてベースラインのCoQAおよびQuACモデルより優れていることを示す。さらに、CoQAデータセット上でテストを実行することで、モデルのスケーラビリティを示す。コードとデータはhttps://github.com/abaheti95/QADialogSystemで入手できる。 Question answering (QA) is an important aspect of open-domain conversational agents, garnering specific research focus in the conversational QA (ConvQA) subtask. One notable limitation of recent ConvQA efforts is the response being answer span extraction from the target corpus, thus ignoring the natural language generation (NLG) aspect of high-quality conversational agents. In this work, we propose a method for situating QA responses within a SEQ2SEQ NLG approach to generate fluent grammatical answer responses while maintaining correctness. From a technical perspective, we use data augmentation to generate training data for an end-to-end system. Specifically, we develop Syntactic Transformations (STs) to produce question-specific candidate answer responses and rank them using a BERT-based classifier (Devlin et al., 2019). Human evaluation on SQuAD 2.0 data (Rajpurkar et al., 2018) demonstrate that the proposed model outperforms baseline CoQA and QuAC models in generating conversational responses. We further show our model's scalability by conducting tests on the CoQA dataset. The code and data are available at https://github.com/abaheti95/QADialogSystem.	翻訳日:2022-11-30 23:30:13 公開日:2020-12-17
# BERTを用いた名前付きエンティティ認識のためのクロスセンテンスコンテキストの探索 Exploring Cross-sentence Contexts for Named Entity Recognition with BERT ( http://arxiv.org/abs/2006.01563v2 ) ライセンス: Link先を確認	Jouni Luoma, Sampo Pyysalo	(参考訳) 名前付きエンティティ認識(NER)はしばしば、各入力が1文のテキストからなるシーケンス分類タスクとして扱われる。にもかかわらず、タスクに有用な情報が単一文コンテキストの範囲外にあることがしばしばあることは明らかである。最近提案されたBERTのような自己注意モデルは、入力中の長距離関係を効率的に捉えるとともに、複数の文からなる入力を表現できるため、自然言語処理タスクにクロスセンテンス情報を組み込むアプローチに新たな機会をもたらす。本稿では, BERT モデルを用いた NER におけるクロスセンテンス情報の利用を5言語で体系的に検討する。 BERT入力に追加文の形でコンテキストを追加することで、テストしたすべての言語とモデルでNER性能が体系的に向上することがわかった。各入力に複数の文を含めることで、異なる文脈における同じ文の予測を調べることもできる。そこで本稿では,文に対する異なる予測を組み合わせる簡単な手法であるCMV(Contextual Majority Voting)を提案し,これがBERTによるNER性能をさらに向上させることを示す。我々のアプローチは、基盤となるBERTアーキテクチャへの変更を必要とせず、トレーニングと予測のための事例の再構成のみに依存する。 CoNLL'02とCoNLL'03 NERベンチマークを含む確立されたデータセットでの評価は、我々の提案手法が、英語、オランダ語、フィンランド語における最先端のNER結果を改善し、ドイツ語ではBERTベースで報告された最良の結果を達成し、スペイン語では他のBERTベースの手法で報告された性能と同等であることを示す。この作業で実装されたすべてのメソッドをオープンライセンスでリリースします。 Named entity recognition (NER) is frequently addressed as a sequence classification task where each input consists of one sentence of text. It is nevertheless clear that useful information for the task can often be found outside of the scope of a single-sentence context. Recently proposed self-attention models such as BERT can both efficiently capture long-distance relationships in input as well as represent inputs consisting of several sentences, creating new opportunities for approaches that incorporate cross-sentence information in natural language processing tasks. In this paper, we present a systematic study exploring the use of cross-sentence information for NER using BERT models in five languages. We find that adding context in the form of additional sentences to BERT input systematically increases NER performance on all of the tested languages and models. Including multiple sentences in each input also allows us to study the predictions of the same sentences in different contexts. 
We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate this to further increase NER performance with BERT. Our approach does not require any changes to the underlying BERT architecture, rather relying on restructuring examples for training and prediction. Evaluation on established datasets, including the CoNLL'02 and CoNLL'03 NER benchmarks, demonstrates that our proposed approach can improve on the state-of-the-art NER results on English, Dutch, and Finnish, achieves the best reported BERT-based results on German, and is on par with performance reported with other BERT-based approaches in Spanish. We release all methods implemented in this work under open licenses.	翻訳日:2022-11-26 00:31:39 公開日:2020-12-17
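The voting step behind Contextual Majority Voting can be sketched as a per-token majority vote over the tag sequences that the same sentence receives in different context windows. The input layout, the function name, and the tie-breaking rule (the tag seen first wins) are assumptions of this sketch; the abstract does not specify them.

```python
from collections import Counter


def contextual_majority_vote(predictions):
    """Combine per-token tag predictions for sentences seen in several contexts.

    predictions: iterable of (sentence_id, tags), where tags is the list of
    predicted labels for that sentence's tokens in one context window.
    Returns a dict mapping sentence_id to the voted tag list.
    """
    votes = {}
    for sent_id, tags in predictions:
        # One Counter per token position, created on first sight of the sentence.
        slots = votes.setdefault(sent_id, [Counter() for _ in tags])
        for counter, tag in zip(slots, tags):
            counter[tag] += 1
    # most_common keeps first-seen order among equal counts, so ties go to
    # the tag that was predicted first.
    return {
        sent_id: [counter.most_common(1)[0][0] for counter in slots]
        for sent_id, slots in votes.items()
    }
```

Because the vote only post-processes predictions, it fits the abstract's point that nothing in the underlying BERT architecture has to change.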
# ニューラルパワーユニット Neural Power Units ( http://arxiv.org/abs/2006.01681v4 ) ライセンス: Link先を確認	Niklas Heim, Tomáš Pevný, Václav Šmídl	(参考訳) 従来のニューラルネットワークは、単純な算術演算を近似できるが、訓練中に見られた数の範囲を超えて一般化できない。ニューラル演算ユニットは、この困難を克服することを目指しているが、現在の演算ユニットは正の数しか扱えないか、算術演算のサブセットしか表現できない。実数の全領域で動作し、任意のべき関数を単一層で学習できるニューラルパワーユニット(NPU)を導入する。したがって、NPUは既存の演算ユニットの欠点を修正し、その表現力を拡張する。我々は、ネットワークを複素数に変換することなく、複素演算を用いることでこれを実現する。 このユニットをRealNPUへ単純化すると、非常に透明性の高いモデルが得られる。我々は,NPUが人工的な算術データセットにおいて精度とスパース性の面で競合手法より優れており,RealNPUはデータのみから動的システムの支配方程式を発見できることを示す。 Conventional Neural Networks can approximate simple arithmetic operations, but fail to generalize beyond the range of numbers that were seen during training. Neural Arithmetic Units aim to overcome this difficulty, but current arithmetic units are either limited to operating on positive numbers or can only represent a subset of arithmetic operations. We introduce the Neural Power Unit (NPU) that operates on the full domain of real numbers and is capable of learning arbitrary power functions in a single layer. The NPU thus fixes the shortcomings of existing arithmetic units and extends their expressivity. We achieve this by using complex arithmetic without requiring a conversion of the network to complex numbers. A simplification of the unit to the RealNPU yields a highly transparent model. We show that the NPUs outperform their competitors in terms of accuracy and sparsity on artificial arithmetic datasets, and that the RealNPU can discover the governing equations of a dynamical system only from data.	翻訳日:2022-11-25 23:09:56 公開日:2020-12-17
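To make the idea of a power-function layer over all real inputs concrete, here is a minimal forward-pass sketch. It assumes the real-valued form y_j = exp(sum_i W_ji log|x_i|) * cos(pi * sum_i W_ji k_i), with k_i = 1 for negative inputs and 0 otherwise, and it leaves out any gating or training logic. The function name, the `eps` guard, and this exact formula are this sketch's reading of the construction, not code from the paper.

```python
import math


def real_npu(x, weights, eps=1e-12):
    """Forward pass of a RealNPU-style layer (simplified, no gating).

    x: list of real inputs.
    weights: list of rows, one row of exponents per output.
    """
    log_mag = [math.log(abs(xi) + eps) for xi in x]  # log-magnitudes
    neg = [1.0 if xi < 0 else 0.0 for xi in x]       # sign indicators
    out = []
    for row in weights:
        # Product of powers, computed in log space.
        magnitude = math.exp(sum(w * l for w, l in zip(row, log_mag)))
        # Real part of (-1)**(sum of exponents on negative inputs).
        sign = math.cos(math.pi * sum(w * k for w, k in zip(row, neg)))
        out.append(magnitude * sign)
    return out


# Rows encode x0*x1, x0**2, x1**2 and 1/x0 for the input below.
outputs = real_npu([2.0, -3.0], [[1, 1], [2, 0], [0, 2], [-1, 0]])
```

Each weight row reads directly as a list of exponents, which is one way to see why the abstract calls the simplified unit transparent.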
# Weakly-supervised Temporal Action Localization by Uncertainty Modeling ( http://arxiv.org/abs/2006.07006v3 ) License: see link	Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun	Weakly-supervised temporal action localization aims to learn to detect temporal intervals of action classes using only video-level labels. To this end, it is crucial to separate frames of action classes from the background frames (i.e., frames not belonging to any action class). In this paper, we present a new perspective on background frames, in which they are modeled as out-of-distribution samples because of their inconsistency. Background frames can then be detected by estimating the probability of each frame being out-of-distribution, known as uncertainty, but it is infeasible to learn uncertainty directly without frame-level labels. To realize uncertainty learning in the weakly-supervised setting, we leverage the multiple instance learning formulation. Moreover, we introduce a background entropy loss to better discriminate background frames by encouraging their in-distribution (action) probabilities to be uniformly distributed over all action classes. Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles. We demonstrate that our model significantly outperforms state-of-the-art methods on the benchmarks THUMOS'14 and ActivityNet (1.2 & 1.3). Our code is available at https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling.	Translated: 2022-11-22 03:17:14 Published: 2020-12-17
# UWSpeech: Speech to Speech Translation for Unwritten Languages ( http://arxiv.org/abs/2006.07926v2 ) License: see link	Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu	Existing speech to speech translation systems rely heavily on the text of the target language: they usually either translate the source language to target text and then synthesize target speech from the text, or translate directly to target speech with target text for auxiliary training. However, those methods cannot be applied to unwritten target languages, which have no written text or phonemes available. In this paper, we develop a translation system for unwritten languages, named UWSpeech, which converts target unwritten speech into discrete tokens with a converter, then translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from target discrete tokens with an inverter. We propose a method called XL-VAE, which enhances the vector quantized variational autoencoder (VQ-VAE) with cross-lingual (XL) speech recognition, to train the converter and inverter of UWSpeech jointly. Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and the VQ-VAE baseline by about 16 and 10 BLEU points respectively, which demonstrates the advantages and potential of UWSpeech.	Translated: 2022-11-21 13:23:22 Published: 2020-12-17
# A General Class of Transfer Learning Regression without Implementation Cost ( http://arxiv.org/abs/2006.13228v2 ) License: see link	Shunya Minami, Song Liu, Stephen Wu, Kenji Fukumizu, Ryo Yoshida	We propose a novel framework that unifies and extends existing methods of transfer learning (TL) for regression. To bridge a pretrained source model to the model on a target task, we introduce a density-ratio reweighting function, which is estimated through the Bayesian framework with a specific prior distribution. By changing two intrinsic hyperparameters and the choice of the density-ratio model, the proposed method can integrate three popular methods of TL: TL based on cross-domain similarity regularization, a probabilistic TL using density-ratio estimation, and fine-tuning of pretrained neural networks. Moreover, the proposed method benefits from a simple implementation without any additional cost; the regression model can be fully trained using off-the-shelf libraries for supervised learning, in which the original output variable is simply transformed to a new output variable. We demonstrate its simplicity, generality, and applicability using various real data applications.	Translated: 2022-11-17 22:16:30 Published: 2020-12-17
# Second Order PAC-Bayesian Bounds for the Weighted Majority Vote ( http://arxiv.org/abs/2007.13532v2 ) License: see link	Andrés R. Masegosa and Stephan S. Lorenzen and Christian Igel and Yevgeny Seldin	We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes the correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows additional unlabeled data to be exploited for tighter risk estimation. In experiments, we apply the bound to improve the weighting of trees in random forests and show that, in contrast to the commonly used first order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble.	Translated: 2022-11-14 22:27:55 Published: 2020-12-17
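The object that the bound above is about, a weighted majority vote over ensemble members, can be sketched in a few lines. This is only an illustration of the voting rule, not the paper's bound or its weight optimization; the function name and the tie-breaking rule are assumptions made for the example.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting ensemble members carry the most weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Assumption for this sketch: ties go to the smallest label, so the
    # result is deterministic.
    return max(sorted(totals), key=lambda label: totals[label])

# Three trees vote; the single heavily weighted tree wins over the other two.
print(weighted_majority_vote(["cat", "dog", "dog"], [0.6, 0.2, 0.2]))
```

Improving the weighting, as the abstract describes, means choosing `weights` so that the vote errs less often; uniform weights recover the plain majority vote.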
# To BAN or not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection ( http://arxiv.org/abs/2007.05304v7 ) License: see link	Kristian Miok, Blaz Skrlj, Daniela Zaharie and Marko Robnik-Sikonja	Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify their output in terms of reliability. We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test whether affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions. We also observed that affective dimensions extracted using sentic computing methods can provide insights toward the interpretation of emotions involved in hate speech. Our approach not only improves the classification performance of the state-of-the-art multilingual BERT model, but the computed reliability scores also significantly reduce the workload in the inspection of offending cases and in reannotation campaigns. The provided visualization helps to understand the borderline outcomes.	Translated: 2022-11-11 22:26:06 Published: 2020-12-17
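The reliability estimate described above rests on Monte Carlo dropout: dropout stays active at prediction time, the same input is passed through the model many times, and the spread of the outputs is read as uncertainty. The sketch below shows that idea on a toy logistic model. It is an assumption-laden stand-in, not the paper's method, which applies dropout inside the attention layers of a BERT model; the function name, weights and default values are hypothetical.

```python
import math
import random

def mc_dropout_predict(features, weights, drop_rate=0.3, passes=200, seed=0):
    """Mean and variance of a toy classifier's output over stochastic passes."""
    rng = random.Random(seed)
    keep = 1.0 - drop_rate
    probs = []
    for _ in range(passes):
        # Draw a fresh dropout mask on every pass; kept inputs are rescaled
        # by 1/keep (inverted dropout) so the expected pre-activation is unchanged.
        z = sum(w * x / keep
                for w, x in zip(weights, features)
                if rng.random() < keep)
        probs.append(1.0 / (1.0 + math.exp(-z)))
    mean = sum(probs) / passes
    variance = sum((p - mean) ** 2 for p in probs) / passes
    return mean, variance

mean, variance = mc_dropout_predict([1.0, 1.0, 1.0], [2.0, -1.0, 0.5])
print(mean, variance)  # a larger variance marks a less trusted prediction
```

In a moderation workflow of the kind the abstract mentions, predictions with high variance would be the ones routed to human inspection.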
# Deep Contextual Clinical Prediction with Reverse Distillation ( http://arxiv.org/abs/2007.05611v2 ) License: see link	Rohan S. Kodialam, Rebecca Boiarsky, Justin Lim, Neil Dixit, Aditya Sai, David Sontag	Healthcare providers are increasingly using machine learning to predict patient outcomes in order to make meaningful interventions. However, despite innovations in this area, deep learning models often struggle to match the performance of shallow linear models in predicting these outcomes, making it difficult to leverage such techniques in practice. In this work, motivated by the task of clinical prediction from insurance claims, we present a new technique called Reverse Distillation, which pretrains deep models by using high-performing linear models for initialization. We make use of the longitudinal structure of insurance claims datasets to develop Self Attention with Reverse Distillation, or SARD, an architecture that combines contextual embedding, temporal embedding and self-attention mechanisms and, most critically, is trained via reverse distillation. SARD outperforms state-of-the-art methods on multiple clinical prediction outcomes, with ablation studies revealing that reverse distillation is a primary driver of these improvements. Code is available at https://github.com/clinicalml/omop-learn.	Translated: 2022-11-11 20:58:49 Published: 2020-12-17
# Recent Advances in Network-based Methods for Disease Gene Prediction ( http://arxiv.org/abs/2007.10848v2 ) License: see link	Sezin Kircali Ata, Min Wu, Yuan Fang, Le Ou-Yang, Chee Keong Kwoh and Xiao-Li Li	Establishing disease-gene associations through genome-wide association studies (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms (SNPs) that correlate with specific diseases requires statistical analysis of associations. Considering the huge number of possible mutations, another important drawback of GWAS analysis, in addition to its high cost, is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide researchers with alternative low-cost disease-gene association evidence, computational approaches come into play. Since molecular networks are able to capture the complex interplay among molecules in diseases, they have become one of the most extensively used data sources for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis of 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features, and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.	Translated: 2022-11-09 00:49:31 Published: 2020-12-17
# Machine-learned Regularization and Polygonization of Building Segmentation Masks ( http://arxiv.org/abs/2007.12587v3 ) License: see link	Stefano Zorzi, Ksenia Bittner, Friedrich Fraundorfer	We propose a machine learning based approach for automatic regularization and polygonization of building segmentation masks. Taking an image as input, we first predict building segmentation maps using a generic fully convolutional network (FCN). A generative adversarial network (GAN) is then used to regularize building boundaries to make them more realistic, i.e., to give them more rectilinear outlines that form right angles where required. This is achieved through the interplay between the discriminator, which gives the probability of an input image being real, and the generator, which learns from the discriminator's response to create more realistic images. Finally, we train a backbone convolutional neural network (CNN) that is adapted to predict sparse outcomes corresponding to building corners from the regularized building segmentation results. Experiments on three building segmentation datasets demonstrate that the proposed method is capable not only of obtaining accurate results, but also of producing visually pleasing building outlines parameterized as polygons.	Translated: 2022-11-07 06:58:14 Published: 2020-12-17
# Chance Constrained Policy Optimization for Process Control and Optimization ( http://arxiv.org/abs/2008.00030v2 ) License: see link	Panagiotis Petsagkourakis, Ilya Orson Sandoval, Eric Bradford, Federico Galvanin, Dongda Zhang and Ehecatl Antonio del Rio-Chanona	Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation. Reinforcement learning by policy optimization would be a natural way to solve this problem because of its ability to address stochasticity and plant-model mismatch, and to account directly for the effect of future uncertainty and its feedback in a proper closed-loop manner, all without the need for an inner optimization loop. One of the main reasons why reinforcement learning has not been considered for industrial processes (or almost any engineering application) is that it lacks a framework to deal with safety-critical constraints. Present algorithms for policy optimization use difficult-to-tune penalty parameters, fail to reliably satisfy state constraints, or present guarantees only in expectation. We propose a chance constrained policy optimization (CCPO) algorithm that guarantees the satisfaction of joint chance constraints with high probability, which is crucial for safety-critical tasks. This is achieved by the introduction of constraint tightening (backoffs), which are computed simultaneously with the feedback policy. Backoffs are adjusted with Bayesian optimization using the empirical cumulative distribution function of the probabilistic constraints, and are therefore self-tuned. This results in a general methodology that can be incorporated into present policy optimization algorithms to enable them to satisfy joint chance constraints with high probability. We present case studies that analyze the performance of the proposed approach.	Translated: 2022-11-05 15:06:58 Published: 2020-12-17
# Enhancing Fiber Orientation Distributions using convolutional Neural Networks ( http://arxiv.org/abs/2008.05409v2 ) License: see link	Oeslle Lucena, Sjoerd B. Vos, Vejay Vakharia, John Duncan, Keyoumars Ashkan, Rachel Sparks, Sebastien Ourselin	Accurate local fiber orientation distribution (FOD) modeling based on diffusion magnetic resonance imaging (dMRI) that is capable of resolving complex fiber configurations benefits from specific acquisition protocols that sample a high number of gradient directions (b-vecs), a high maximum b-value (b-vals), and multiple b-values (multi-shell). However, acquisition time is limited in a clinical setting, and commercial scanners may not provide such dMRI sequences. Therefore, dMRI is often acquired as single-shell (single b-value). In this work, we learn improved FODs for commercially acquired MRI. We evaluate patch-based 3D convolutional neural networks (CNNs) on their ability to regress multi-shell FOD representations from single-shell representations, where the representation consists of spherical harmonics obtained from constrained spherical deconvolution (CSD) to model FODs. We evaluate U-Net and HighResNet 3D CNN architectures on data from the Human Connectome Project and an in-house dataset. We evaluate how well each CNN model can resolve local fiber orientation 1) when training and testing on datasets with the same dMRI acquisition protocol; 2) when testing on a dataset with a different dMRI acquisition protocol from the one used to train the CNN models; and 3) when testing on a dataset with fewer gradient directions than were used to train the CNN models. Our approach may enable robust CSD model estimation on single-shell dMRI acquisition protocols with few gradient directions, reducing acquisition times and facilitating the translation of improved FOD estimation to time-limited clinical environments.	Translated: 2022-10-31 06:07:23 Published: 2020-12-17
# Automating the assessment of biofouling in images using expert agreement as a gold standard ( http://arxiv.org/abs/2008.09289v2 ) License: see link	Nathaniel J. Bloomfield and Susan Wei and Bartholomew Woodham and Peter Wilkinson and Andrew Robinson	Biofouling is the accumulation of organisms on surfaces immersed in water. It is of particular concern to the international shipping industry because it increases fuel costs and presents a biosecurity risk by providing a pathway for non-indigenous marine species to establish in new areas. There is growing interest within jurisdictions in strengthening biofouling risk-management regulations, but it is expensive to conduct in-water inspections and assess the collected data to determine the biofouling state of vessel hulls. Machine learning is well suited to tackling the latter challenge, and here we apply deep learning to automate the classification of images from in-water inspections to identify the presence and severity of fouling. We combined several datasets to obtain over 10,000 images collected from in-water surveys, which were annotated by a group of biofouling experts. We compared the annotations from three experts on a 120-sample subset of these images, and found that they showed 89% agreement (95% CI: 87-92%). Subsequent labelling of the whole dataset by one of these experts achieved similar levels of agreement with this group of experts, which we defined as performing at most 5% worse (p=0.009-0.054). Using these expert labels, we were able to train a deep learning model that also agreed similarly with the group of experts (p=0.001-0.014), demonstrating that automated analysis of biofouling in images is feasible and effective using this method.	Translated: 2022-10-26 20:42:38 Published: 2020-12-17
# Probabilistic Deep Learning for Instance Segmentation ( http://arxiv.org/abs/2008.10678v2 ) License: see link	Josef Lorenz Rumberger, Lisa Mais, Dagmar Kainmueller	Probabilistic convolutional neural networks, which predict distributions of predictions instead of point estimates, have led to recent advances in many areas of computer vision, from image reconstruction to semantic segmentation. Besides state-of-the-art benchmark results, these networks have made it possible to quantify local uncertainties in the predictions. These have been used in active learning frameworks to target the labeling efforts of specialist annotators, or to assess the quality of a prediction in a safety-critical environment. However, for instance segmentation problems these methods have so far not been used frequently. We seek to close this gap by proposing a generic method to obtain model-inherent uncertainty estimates within proposal-free instance segmentation models. Furthermore, we analyze the quality of the uncertainty estimates with a metric adapted from semantic segmentation. We evaluate our method on the BBBC010 C. elegans dataset, where it yields competitive performance while also predicting uncertainty estimates that carry information about object-level inaccuracies such as false splits and false merges. We perform a simulation to show the potential use of such uncertainty estimates in guided proofreading.	Translated: 2022-10-25 09:15:06 Published: 2020-12-17
# Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices ( http://arxiv.org/abs/2009.02553v2 ) License: see link	Luo Luo, Cheng Chen, Guangzeng Xie, Haishan Ye	We study the streaming model for approximate matrix multiplication (AMM). We are interested in the scenario in which the algorithm can take only one pass over the data with limited memory. The state-of-the-art deterministic sketching algorithm for streaming AMM is co-occurring directions (COD), which has much smaller approximation errors than randomized algorithms and outperforms other deterministic sketching methods empirically. In this paper, we provide a tighter error bound for COD whose leading term takes into account the potential approximate low-rank structure and the correlation of the input matrices. We prove that COD is space optimal with respect to our improved error bound. We also propose a variant of COD for sparse matrices with theoretical guarantees. Experiments on real-world sparse datasets show that the proposed algorithm is more efficient than baseline methods.	Translated: 2022-10-21 20:53:41 Published: 2020-12-17
# Critical Thinking for Language Models ( http://arxiv.org/abs/2009.07185v2 ) License: see link	Gregor Betz and Christian Voigt and Kyle Richardson	This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models. We introduce a synthetic corpus of deductively valid arguments, and generate artificial argumentative texts to train and evaluate GPT-2. Significant transfer learning effects can be observed: training a model on three simple core schemes allows it to accurately complete conclusions of different, and more complex, types of arguments as well. The language models generalize the core argument schemes in a correct way. Moreover, we obtain consistent and promising results for NLU benchmarks. In particular, pre-training on the argument schemes raises zero-shot accuracy on the GLUE diagnostics by up to 15 percentage points. The findings suggest that intermediary pre-training on texts that exemplify basic reasoning abilities (such as those typically covered in critical thinking textbooks) might help language models to acquire a broad range of reasoning skills. The synthetic argumentative texts presented in this paper are a promising starting point for building such a "critical thinking curriculum for language models."	Translated: 2022-10-18 05:20:45 Published: 2020-12-17
# Group Fairness by Probabilistic Modeling with Latent Fair Decisions ( http://arxiv.org/abs/2009.09031v2 ) License: see link	YooJung Choi, Meihua Dang, Guy Van den Broeck	Machine learning systems are increasingly being used to make impactful decisions such as loan applications and criminal justice risk assessments, and as such, ensuring the fairness of these systems is critical. This is often challenging because the labels in the data are biased. This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label. In particular, we aim to achieve demographic parity by enforcing certain independencies in the learned model. We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data. In order to closely model the data distribution, we employ probabilistic circuits, an expressive and tractable probabilistic model, and propose an algorithm to learn them from incomplete data. We evaluate our approach on a synthetic dataset in which observed labels indeed come from fair labels but with added bias, and demonstrate that the fair labels are successfully retrieved. Moreover, we show on real-world datasets that our approach not only is a better model of how the data was generated than existing methods, but also achieves competitive accuracy.	Translated: 2022-10-17 02:05:24 Published: 2020-12-17
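Demographic parity, the fairness criterion named in the abstract above, asks that the rate of positive decisions be the same in every group defined by the sensitive attribute. The sketch below measures how far a set of decisions is from that target. It only evaluates the criterion and does not reproduce the paper's probabilistic-circuit model; the function name and the toy data are hypothetical.

```python
def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    rates = {}
    for group in set(groups):
        members = [d for d, g in zip(decisions, groups) if g == group]
        rates[group] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values()), rates

# Toy data: 1 = favourable decision, groups "a" and "b" of three people each.
gap, rates = demographic_parity_gap([1, 0, 1, 1, 0, 0],
                                    ["a", "a", "a", "b", "b", "b"])
print(gap, rates)
```

A gap of zero means the decisions are statistically independent of group membership on this sample, which is the independence the abstract says is enforced in the learned model.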
# Evolution of Coordination in Pairwise and Multi-player Interactions via Prior Commitments ( http://arxiv.org/abs/2009.11727v2 ) License: see link	Ogbo Ndidi Bianca, Aiman Elgarig, The Anh Han	Upon starting a collective endeavour, it is important to understand your partners' preferences and how strongly they commit to a common goal. Establishing a prior commitment or agreement in terms of posterior benefits and consequences from those engaging in it provides an important mechanism for securing cooperation. Resorting to methods from Evolutionary Game Theory (EGT), here we analyse how prior commitments can also be adopted as a tool for enhancing coordination when its outcomes exhibit an asymmetric payoff structure, in both pairwise and multiparty interactions. Arguably, coordination is more complex to achieve than cooperation, since there might be several desirable collective outcomes in a coordination problem (compared to mutual cooperation, the only desirable collective outcome in cooperation dilemmas). Our analysis, carried out both analytically and via numerical simulations, shows that whether prior commitment is a viable evolutionary mechanism for enhancing coordination and the overall social welfare of the population strongly depends on the collective benefit and the severity of competition and, more importantly, on how asymmetric benefits are resolved in a commitment deal. Moreover, in multiparty interactions, prior commitments prove to be crucial when a high level of group diversity is required for optimal coordination. The results are robust for different selection intensities. Overall, our analysis provides new insights into the complexity and beauty of behavioral evolution driven by humans' capacity for commitment, as well as for the design of self-organised and distributed multi-agent systems for ensuring coordination among autonomous agents.	Translated: 2022-10-15 05:25:25 Published: 2020-12-17
# Cloud Cover Nowcasting with Deep Learning ( http://arxiv.org/abs/2009.11577v3 ) License: see link	Léa Berthomier, Bruno Pradel and Lior Perez	Nowcasting is a field of meteorology that aims at forecasting weather over a short term of up to a few hours. In the meteorology landscape, this field is rather specific because it requires particular techniques, such as data extrapolation, whereas conventional meteorology is generally based on physical modeling. In this paper, we focus on cloud cover nowcasting, which has various application areas such as satellite shot optimisation and photovoltaic energy production forecasting. Following recent deep learning successes on multiple imagery tasks, we applied deep convolutional neural networks to Meteosat satellite images for cloud cover nowcasting. We present the results of several architectures specialized in image segmentation and time series prediction. We selected the best models according to machine learning metrics as well as meteorological metrics. All selected architectures showed significant improvements over persistence, and the well-known U-Net surpasses the AROME physical model.	Translated: 2022-10-15 04:06:02 Published: 2020-12-17
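The abstract above measures its networks against persistence, the usual nowcasting baseline that simply repeats the latest observation for every future step. The sketch below shows that baseline and one error metric on a single scalar series. It is only an illustration: the paper works on satellite images and its own metrics, and the function names and cloud-fraction values here are made up.

```python
def persistence_forecast(history, horizon):
    """Repeat the most recent observation for each of the next `horizon` steps."""
    return [history[-1]] * horizon

def mean_absolute_error(forecast, observed):
    """Average absolute difference between forecast and observation."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)

# Hypothetical cloud fraction over one pixel, sampled at regular intervals.
history = [0.2, 0.4, 0.5]
observed_next = [0.5, 0.6, 0.8]
baseline = persistence_forecast(history, horizon=len(observed_next))
print(mean_absolute_error(baseline, observed_next))
```

A learned model is only worth deploying if its error over the same horizon is lower than this baseline's, which is the comparison the abstract reports.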
# トピック対応マルチターン対話モデリング Topic-Aware Multi-turn Dialogue Modeling ( http://arxiv.org/abs/2009.12539v2 ) ライセンス: Link先を確認	Yi Xu, Hai Zhao, Zhuosheng Zhang	(参考訳) 検索に基づくマルチターン対話モデルでは,コンテキスト発話中の有意な特徴を抽出することで,最も適切な応答を選択することが課題となっている。会話が進むにつれて、談話レベルのトピックシフトは、連続したマルチターン対話コンテキストを通じて自然に起こる。しかし,すべての検索ベースシステムは,文脈発話表現のための局所的な話題語の利用に満足しているが,会話レベルでのこのような重要なグローバルな話題認識の手がかりを捉えられなかった。本稿では,既存のシステムにおいて,トピックに依存しないn-gram発話を処理単位として扱う代わりに,トピック認識発話を教師なしの方法でセグメント抽出し,対話レベルでの健全なトピックシフトを把握し,マルチターン対話中のトピックフローを効果的に追跡する,マルチターン対話モデリングのための新しいトピック認識ソリューションを提案する。トピック認識モデリングは,新たに提案したトピック認識セグメンテーションアルゴリズムとトピック認識デュアルアテンションマッチング(TADAM)ネットワークによって実現されている。 3つの公開データセットの実験結果は、TADAMが最先端の手法、特に明らかなトピックシフトを持つEコマースデータセットの3.3%を上回っていることを示している。 In retrieval-based multi-turn dialogue modeling, it remains a challenge to select the most appropriate response by extracting salient features from context utterances. As a conversation goes on, topic shifts at the discourse level naturally happen throughout the continuous multi-turn dialogue context. However, all known retrieval-based systems are satisfied with exploiting local topic words for context utterance representation but fail to capture such essential global topic-aware clues at the discourse level. Instead of taking topic-agnostic n-gram utterances as the processing unit for matching, as existing systems do, this paper presents a novel topic-aware solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way, so that the resulting model can capture salient topic shifts at the discourse level as needed and thus effectively track topic flow during multi-turn conversation. Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and a Topic-Aware Dual-attention Matching (TADAM) Network, which matches each topic segment with the response in a dual cross-attention way. 
Experimental results on three public datasets show that TADAM outperforms the state-of-the-art method, most notably by 3.3% on the E-commerce dataset, which has obvious topic shifts.	翻訳日:2022-10-14 08:43:58 公開日:2020-12-17
# (教師なし)エンティティアライメントのためのビジュアルPivoting Visual Pivoting for (Unsupervised) Entity Alignment ( http://arxiv.org/abs/2009.13603v2 ) ライセンス: Link先を確認	Fangyu Liu, Muhao Chen, Dan Roth, Nigel Collier	(参考訳) この研究は、異種知識グラフ(KG)の実体を整列させる視覚的意味表現の使用を研究する。画像は多くの既存のkgの自然な構成要素です。視覚知識を他の補助情報と組み合わせることで,提案する新しいアプローチであるevaが,クロスグラフエンティティアライメントに強いシグナルを与える包括的エンティティ表現を生成することを示す。さらに、以前のエンティティアライメント手法では、可用性を制限するために、人間のラベル付きシードアライメントが必要となる。 EVAは、エンティティの視覚的類似性を活用して、初期シード辞書(視覚的なピボット)を作成する、完全に教師なしのソリューションを提供する。ベンチマークデータセットDBP15kとDWY15kの実験は、EVAがモノリンガルとクロスリンガルの両方のエンティティアライメントタスクに対して最先端のパフォーマンスを提供することを示している。さらに、画像は特に長い尾のKGエンティティの整列に有用であり、通信を捉えるのに必要な構造的コンテキストが本質的に欠如していることが判明した。 This work studies the use of visual semantic representations to align entities in heterogeneous knowledge graphs (KGs). Images are natural components of many existing KGs. By combining visual knowledge with other auxiliary information, we show that the proposed new approach, EVA, creates a holistic entity representation that provides strong signals for cross-graph entity alignment. Besides, previous entity alignment methods require human labelled seed alignment, restricting availability. EVA provides a completely unsupervised solution by leveraging the visual similarity of entities to create an initial seed dictionary (visual pivots). Experiments on benchmark data sets DBP15k and DWY15k show that EVA offers state-of-the-art performance on both monolingual and cross-lingual entity alignment tasks. Furthermore, we discover that images are particularly useful to align long-tail KG entities, which inherently lack the structural contexts necessary for capturing the correspondences.	翻訳日:2022-10-13 20:46:56 公開日:2020-12-17
# モデル共有ゲーム:自発参加下での連合学習の分析 Model-sharing Games: Analyzing Federated Learning Under Voluntary Participation ( http://arxiv.org/abs/2010.00753v3 ) ライセンス: Link先を確認	Kate Donahue and Jon Kleinberg	(参考訳) フェデレーション学習(federated learning)は、エージェントがそれぞれのデータソースにアクセスし、ローカルデータからモデルを組み合わせてグローバルモデルを作成するための設定である。しかし、エージェントが異なる分布からデータを引き出している場合、連合学習はそれぞれのエージェントに最適ではない偏りのあるグローバルモデルを生成するかもしれない。つまりエージェントは,グローバルモデルやローカルモデルを選択するべきか,という根本的な問題に直面しているのです。この状況は連立ゲーム理論の枠組みによって自然に分析できることを示す。異なるモデルパラメータを持つ不均質なプレイヤーが、彼らのデータ分布と、彼らが自分たちの分散から異常に引き出した異なる量のデータを支配している。各プレイヤーのゴールは、最小限の期待平均二乗誤差(MSE)を持つモデルを得ることである。彼らは自身のデータのみに基づいたモデルに適合するか、学習したパラメータと他のプレイヤーのサブセットのパラメータを組み合わせるかを選択できる。モデルを組み合わせることで、より多くのデータにアクセスすることでエラーの分散成分が減少するが、分布の不均一性のためにバイアスが増加する。ここでは線形回帰と平均推定における問題に対する正確なMSE値を導出する。次に, ヘドニックゲーム理論(hedonic game theory)の枠組みを用いて, 結果ゲームの解析を行い, プレイヤーが連立モデル(s)を構成する各プレイヤー群にどのように分割するかを検討した。異なるカスタマイズ度をモデル化する3つのフェデレーションの手法を分析した。統一連合では、エージェントは集合的に単一のモデルを生成する。粒度の粗いフェデレーションでは、各エージェントはローカルモデルとともにグローバルモデルを重み付けすることができる。微細なフェデレーションでは、各エージェントは、フェデレーション内の他のすべてのエージェントのモデルを柔軟に組み合わせることができる。各方法について,プレイヤーの安定な分割を連立に分析する。 Federated learning is a setting where agents, each with access to their own data source, combine models from local data to create a global model. If agents are drawing their data from different distributions, though, federated learning might produce a biased global model that is not optimal for each agent. This means that agents face a fundamental question: should they choose the global model or their local model? We show how this situation can be naturally analyzed through the framework of coalitional game theory. We propose the following game: there are heterogeneous players with different model parameters governing their data distribution and different amounts of data they have noisily drawn from their own distribution. Each player's goal is to obtain a model with minimal expected mean squared error (MSE) on their own distribution. 
They have a choice of fitting a model based solely on their own data, or combining their learned parameters with those of some subset of the other players. Combining models reduces the variance component of their error through access to more data, but increases the bias because of the heterogeneity of distributions. Here, we derive exact expected MSE values for problems in linear regression and mean estimation. We then analyze the resulting game in the framework of hedonic game theory; we study how players might divide into coalitions, where each set of players within a coalition jointly construct model(s). We analyze three methods of federation, modeling differing degrees of customization. In uniform federation, the agents collectively produce a single model. In coarse-grained federation, each agent can weight the global model together with their local model. In fine-grained federation, each agent can flexibly combine models from all other agents in the federation. For each method, we analyze the stable partitions of players into coalitions.	翻訳日:2022-10-12 02:26:17 公開日:2020-12-17
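The bias-variance trade-off in the abstract above can be illustrated with a deliberately simplified two-player mean-estimation sketch. It assumes fixed true means, a shared noise variance, and a sample-size-weighted pooled mean; the function names and the closed form are illustrative assumptions, not the exact expressions derived in the paper.

```python
def local_mse(noise_var: float, n_own: int) -> float:
    """Expected squared error of a player's own sample mean (unbiased)."""
    return noise_var / n_own


def pooled_mse(noise_var: float, n_own: int, n_other: int, mean_gap: float) -> float:
    """Expected squared error, for one player, of the pooled sample mean.

    Assumption: both players share the same noise variance and the pooled
    estimate weights each player by sample count. `mean_gap` is the
    difference between the two players' true means.
    """
    total = n_own + n_other
    variance = noise_var / total           # more data -> lower variance
    bias = (n_other / total) * mean_gap    # other player's data pulls the mean away
    return variance + bias ** 2


def prefers_federation(noise_var: float, n_own: int, n_other: int, mean_gap: float) -> bool:
    """True if pooling gives this player a lower expected error than going alone."""
    return pooled_mse(noise_var, n_own, n_other, mean_gap) < local_mse(noise_var, n_own)
```

Under these assumptions, a player with identical data distribution to its partner always gains from pooling, while a large gap between true means makes the local model preferable.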
# スパイクニューラルネットワークのニューロモルフィックハードウェアへのサーマルアウェアコンパイル Thermal-Aware Compilation of Spiking Neural Networks to Neuromorphic Hardware ( http://arxiv.org/abs/2010.04773v2 ) ライセンス: Link先を確認	Twisha Titirsha and Anup Das	(参考訳) ニューロモルフィックコンピューティングのハードウェア実装は、スパイクニューラルネットワーク(SNN)で実装された機械学習タスクのパフォーマンスとエネルギー効率を大幅に向上させ、これらのハードウェアプラットフォームは組み込みシステムや他のエネルギー制約のある環境に特に適している。ハードウェアのクロスバーの長いビット線とワード線は、通常非揮発性メモリ(NVM)で設計されるシナプス要素を介してスパイクを伝播する際に、大きな電流変化を生じさせる。このような変化は、ハードウェアの各クロスバー内で、機械学習のワークロードと、これらのクロスバーへの負荷のニューロンとシナプスのマッピングに依存する熱勾配を生成する。この温度勾配は、スケールされた技術ノードにおいて重要となり、ハードウェアのリーク電力を増加させ、エネルギー消費を増加させる。ニューロモルフィックハードウェアにSNNベースの機械学習ワークロードのニューロンとシナプスをマッピングする新しい手法を提案する。我々は2つの新しい貢献をした。まず, 各NVM系シナプス細胞の温度を計算し, 隣接するセルの熱的寄与を考慮し, 負荷依存性を取り入れたニューロモルフィックハードウェアにおけるクロスバーの詳細な熱モデルを構築した。第2に、この熱モデルを、丘登りヒューリスティックを用いてSNNベースのワークロードのニューロンとシナプスのマッピングに組み込む。クロスバーの熱勾配を減少させることが目的である。我々は、最先端のニューロモルフィックハードウェアのための10の機械学習ワークロードを用いて、ニューロンとシナプスマッピング手法を評価する。ハードウェアの各クロスバーの平均温度を平均11.4K削減し,性能指向SNNマッピング技術と比較して,リーク電力消費量(総エネルギー消費率11%)を52%削減した。 Hardware implementation of neuromorphic computing can significantly improve performance and energy efficiency of machine learning tasks implemented with spiking neural networks (SNNs), making these hardware platforms particularly suitable for embedded systems and other energy-constrained environments. We observe that the long bitlines and wordlines in a crossbar of the hardware create significant current variations when propagating spikes through its synaptic elements, which are typically designed with non-volatile memory (NVM). Such current variations create a thermal gradient within each crossbar of the hardware, depending on the machine learning workload and the mapping of neurons and synapses of the workload to these crossbars. 
This thermal gradient becomes significant at scaled technology nodes and increases the leakage power in the hardware, leading to an increase in energy consumption. We propose a novel technique to map neurons and synapses of SNN-based machine learning workloads to neuromorphic hardware. We make two novel contributions. First, we formulate a detailed thermal model for a crossbar in neuromorphic hardware incorporating workload dependency, where the temperature of each NVM-based synaptic cell is computed considering the thermal contributions from its neighboring cells. Second, we incorporate this thermal model in the mapping of neurons and synapses of SNN-based workloads using a hill-climbing heuristic. The objective is to reduce the thermal gradient in crossbars. We evaluate our neuron and synapse mapping technique using 10 machine learning workloads for state-of-the-art neuromorphic hardware. We demonstrate an average 11.4K reduction in the average temperature of each crossbar in the hardware, leading to a 52% reduction in the leakage power consumption (11% lower total energy consumption) compared to a performance-oriented SNN mapping technique.	翻訳日:2022-10-09 05:40:33 公開日:2020-12-17
# セルオートマタの挙動類似性の測定 Measuring Behavioural Similarity of Cellular Automata ( http://arxiv.org/abs/2010.08431v2 ) ライセンス: Link先を確認	Peter D. Turney	(参考訳) コンウェイのゲーム・オブ・ライフは最も有名なセル・オートマトンである。出現と自己組織化の古典的なモデルであり、チューリング完全であり、普遍的なコンストラクタをシミュレートすることができる。ゲーム・オブ・ライフ(game of life)は262,144人のメンバーを持つ半トータル的なセル・オートマトンに属する。これらのオートマトンの多くは、ゲーム・オブ・ライフほど注目に値するかもしれない。ここでの課題は、この大きな家族を組織化し、興味深いオートマトンを見つけやすくし、オートマトン間の関係を理解するための構造を提供することです。 Packard and Wolfram (1985) は、規則の観察された振る舞いに基づいて、家族を4つのクラスに分けた。 eppstein (2010) は規則の形式に基づいた代替の4クラスシステムを提案した。クラスベースの組織の代わりに、各オートマトンが空間内の点によって表現される連続的な高次元ベクトル空間を提案する。この空間における2つのオートマトン間の距離は、その行動特性の差に対応する。この空間に最も近い近隣の地域も同様の行動をとる。この空間は、研究者が半トータル主義的な規則の家族の構造を観察し、家族の中に隠れた宝石を見つけるのが容易になる。 Conway's Game of Life is the best-known cellular automaton. It is a classic model of emergence and self-organization, it is Turing-complete, and it can simulate a universal constructor. The Game of Life belongs to the set of semi-totalistic cellular automata, a family with 262,144 members. Many of these automata may deserve as much attention as the Game of Life, if not more. The challenge we address here is to provide a structure for organizing this large family, to make it easier to find interesting automata, and to understand the relations between automata. Packard and Wolfram (1985) divided the family into four classes, based on the observed behaviours of the rules. Eppstein (2010) proposed an alternative four-class system, based on the forms of the rules. Instead of a class-based organization, we propose a continuous high-dimensional vector space, where each automaton is represented by a point in the space. The distance between two automata in this space corresponds to the differences in their behavioural characteristics. Nearest neighbours in the space have similar behaviours. This space should make it easier for researchers to see the structure of the family of semi-totalistic rules and to find the hidden gems in the family.	翻訳日:2022-10-06 21:15:25 公開日:2020-12-17
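The family size quoted in the abstract above follows from simple counting: on a Moore neighbourhood a cell has 0 to 8 live neighbours, so a semi-totalistic rule is one bit per neighbour count for birth and one for survival, giving 2^(9+9) = 262,144 rules. The sketch below is a minimal illustration of one update step for such a rule; the set-based grid representation and the function names are assumptions made here, not the paper's implementation.

```python
from itertools import product


def rule_count() -> int:
    """Number of semi-totalistic rules: 9 birth bits plus 9 survival bits."""
    return 2 ** (9 + 9)


def step(live: set, birth: set, survive: set) -> set:
    """Apply one generation of a semi-totalistic rule on an unbounded grid.

    `live` is a set of (row, col) cells. `birth` and `survive` are sets of
    neighbour counts. Rules with birth on 0 neighbours are not supported
    here, since they would fill an unbounded grid.
    """
    counts = {cell: 0 for cell in live}  # so isolated live cells are still visited
    for (r, c) in live:
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) != (0, 0):
                key = (r + dr, c + dc)
                counts[key] = counts.get(key, 0) + 1
    new_live = set()
    for cell, n in counts.items():
        if cell in live:
            if n in survive:
                new_live.add(cell)
        elif n in birth:
            new_live.add(cell)
    return new_live


# Conway's Game of Life is the rule B3/S23 in this family.
LIFE_BIRTH, LIFE_SURVIVE = {3}, {2, 3}
```

For example, calling `step` on a horizontal row of three live cells with the Game of Life rule yields a vertical row of three, the familiar "blinker" oscillator.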
# 非定常MDPの安全政策改善に向けて Towards Safe Policy Improvement for Non-Stationary MDPs ( http://arxiv.org/abs/2010.12645v2 ) ライセンス: Link先を確認	Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas	(参考訳) 現実世界のシーケンシャルな意思決定には、金融リスクと人命リスクを伴う重要なシステムが含まれる。過去にいくつかの研究がデプロイに安全な方法を提案しているが、根底にある問題は静止していると仮定している。しかし、多くの実世界の利害問題は非定常性を示し、利害関係が高ければ、偽の定常性仮定に関連するコストは受け入れがたい。我々は、スムーズに変化する非定常的な意思決定問題に対して、安全を確実にする第一歩を踏み出します。提案手法は,時系列解析を用いたモデルフリー強化学習の合成により,セルドンアルゴリズムと呼ばれる安全なアルゴリズムを拡張した。ポリシーの予測性能の逐次仮説テストを用いて安全性を保証し、ワイルドブートストラップを用いて信頼区間を求める。 Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy's forecasted performance, and confidence intervals are obtained using wild bootstrap.	翻訳日:2022-10-03 21:39:59 公開日:2020-12-17
# 離散化ランジュバンmcmcのRényi divergence解析による高速微分プライベートサンプラー Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC ( http://arxiv.org/abs/2010.14658v2 ) ライセンス: Link先を確認	Arun Ganesh, Kunal Talwar	(参考訳) 様々な微分プライベートアルゴリズムは指数関数機構をインスタンス化し、適切な関数に対して$\exp(-f)$からサンプリングする必要がある。分布領域が高次元である場合、このサンプリングは計算的に困難である。ギブスサンプリングのようなヒューリスティックサンプリングスキームを使用すると、必ずしも証明可能なプライバシーにつながるとは限らない。 f$が凸であるとき、対数凹サンプリングの技術は多項式時間アルゴリズムに導かれる。ランゲヴィン力学に基づくアルゴリズムは、統計距離などの距離測度の下ではるかに高速な代替手段を提供する。本研究では,差分プライバシーに適合する距離尺度を用いて,これらのアルゴリズムの高速収束を実現する。滑らかで強凸な$f$ に対して、Rényi divergence の収束を証明する最初の結果を与える。これにより、そのような$f$の高速な微分プライベートアルゴリズムが得られます。我々の技術と単純で汎用的で、アンダーダムドランゲヴィン力学にも応用できる。 Various differentially private algorithms instantiate the exponential mechanism, and require sampling from the distribution $\exp(-f)$ for a suitable function $f$. When the domain of the distribution is high-dimensional, this sampling can be computationally challenging. Using heuristic sampling schemes such as Gibbs sampling does not necessarily lead to provable privacy. When $f$ is convex, techniques from log-concave sampling lead to polynomial-time algorithms, albeit with large polynomials. Langevin dynamics-based algorithms offer much faster alternatives under some distance measures such as statistical distance. In this work, we establish rapid convergence for these algorithms under distance measures more suitable for differential privacy. For smooth, strongly-convex $f$, we give the first results proving convergence in Rényi divergence. This gives us fast differentially private algorithms for such $f$. Our techniques are simple and generic and apply also to underdamped Langevin dynamics.	翻訳日:2022-10-02 13:25:45 公開日:2020-12-17
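As a rough illustration of what "discretized Langevin MCMC" means for sampling from $\exp(-f)$, the sketch below runs the plain unadjusted (overdamped) Langevin update in one dimension. It is a textbook sampler only: it includes no privacy accounting and none of the paper's Rényi divergence analysis, and the function name and parameters are assumptions made for this example.

```python
import math
import random


def langevin_samples(grad_f, x0: float, step: float, n_steps: int, rng: random.Random) -> list:
    """Unadjusted Langevin algorithm targeting a density proportional to exp(-f).

    Each iteration takes a gradient step on f and adds Gaussian noise with
    variance 2 * step. The discretization introduces a small bias that
    shrinks as `step` decreases.
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_f(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Example target: f(x) = x^2 / 2, so grad_f(x) = x and exp(-f) is a standard normal.
draws = langevin_samples(lambda x: x, x0=0.0, step=0.1, n_steps=50000, rng=random.Random(0))
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

For this quadratic $f$ the chain is a linear autoregression whose stationary variance is $1/(1 - \text{step}/2)$ rather than exactly 1, which makes the discretization bias visible in a simple case.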
# SATベースのAI計画の形式的検証 Formally Verified SAT-Based AI Planning ( http://arxiv.org/abs/2010.14648v4 ) ライセンス: Link先を確認	Mohammad Abdulaziz and Friedrich Kurz	(参考訳) 本稿では,従来のAI計画のSAT符号化について述べる。定理証明器 Isabelle/HOL を用いて検証を行う。検証された符号化を実験的に検証し、合理的な大きさの標準計画ベンチマークに使用できることを示す。我々はまた、最先端のSATベースのプランナーをテストするための参照として使用し、時には問題に一定の長さの解がないと主張する。 We present an executable formally verified SAT encoding of classical AI planning. We use the theorem prover Isabelle/HOL to perform the verification. We experimentally test the verified encoding and show that it can be used for reasonably sized standard planning benchmarks. We also use it as a reference to test a state-of-the-art SAT-based planner, showing that it sometimes falsely claims that problems have no solutions of certain lengths.	翻訳日:2022-10-02 12:41:53 公開日:2020-12-17
# Deep Probabilistic Imaging:Computational Imagingのための不確かさの定量化とマルチモーダルソリューション評価 Deep Probabilistic Imaging: Uncertainty Quantification and Multi-modal Solution Characterization for Computational Imaging ( http://arxiv.org/abs/2010.14462v2 ) ライセンス: Link先を確認	He Sun, Katherine L. Bouman	(参考訳) 計算画像再構成アルゴリズムは一般に、不確実性や信頼性の尺度なしに単一の画像を生成する。 RML(Regularized Maximum Likelihood)と逆問題に対するフィードフォワード深層学習(Feed-forward Deep Learning)アプローチは通常、点推定の回復に重点を置いている。これは、未決定の撮像システムで作業する場合に深刻な制限であり、複数の画像モードが測定されたデータと一致することが考えられる。したがって、観測データを説明する確率的な画像の空間を特徴付けることが重要である。本稿では,再構成の不確かさを定量化するために,変分深い確率的イメージング手法を提案する。深部確率イメージング(Deep Probabilistic Imaging, DPI)は、未観測画像の後部分布を推定するために、訓練されていない深部生成モデルを用いる。このアプローチではトレーニングデータを必要としない。代わりに、ニューラルネットワークの重みを最適化して、特定の測定データセットに適合するイメージサンプルを生成する。ネットワークウェイトが学習されると、後方分布を効率的にサンプリングすることができる。このアプローチは、イベントホライズン望遠鏡によるブラックホールイメージングや、mri(compressed sensing magnetic resonance imaging)で用いられるインターフェロメトリ・ラジオイメージング(interferometric radio imaging)という文脈で実証されている。 Computational image reconstruction algorithms generally produce a single image without any measure of uncertainty or confidence. Regularized Maximum Likelihood (RML) and feed-forward deep learning approaches for inverse problems typically focus on recovering a point estimate. This is a serious limitation when working with underdetermined imaging systems, where it is conceivable that multiple image modes would be consistent with the measured data. Characterizing the space of probable images that explain the observational data is therefore crucial. In this paper, we propose a variational deep probabilistic imaging approach to quantify reconstruction uncertainty. Deep Probabilistic Imaging (DPI) employs an untrained deep generative model to estimate a posterior distribution of an unobserved image. This approach does not require any training data; instead, it optimizes the weights of a neural network to generate image samples that fit a particular measurement dataset. 
Once the network weights have been learned, the posterior distribution can be efficiently sampled. We demonstrate this approach in the context of interferometric radio imaging, which is used for black hole imaging with the Event Horizon Telescope, and compressed sensing Magnetic Resonance Imaging (MRI).	翻訳日:2022-10-02 11:58:56 公開日:2020-12-17
# ページ数は? メタデータからの紙長予測 How Many Pages? Paper Length Prediction from the Metadata ( http://arxiv.org/abs/2010.15924v2 ) ライセンス: Link先を確認	Erion Çano and Ondřej Bojar	(参考訳) 科学論文の長さを予測することは、多くの状況で役立つかもしれない。本研究は,紙長予測タスクを回帰問題として定義し,一般的な機械学習モデルを用いて実験結果を報告する。また、出版メタデータと各ページの長さの巨大なデータセットを作成します。データセットは無償で提供され、この分野の研究を促進することを意図している。今後の取り組みとして、ニューラルネットワークと大きな事前学習された言語モデルに基づいた、より高度なレグレッシャを探求したいと思います。 Being able to predict the length of a scientific paper may be helpful in numerous situations. This work defines the paper length prediction task as a regression problem and reports several experimental results using popular machine learning models. We also create a huge dataset of publication metadata and the respective lengths in number of pages. The dataset will be freely available and is intended to foster research in this domain. As future work, we would like to explore more advanced regressors based on neural networks and big pretrained language models.	翻訳日:2022-10-01 22:26:57 公開日:2020-12-17
# 道路損傷検出のための効率的かつスケーラブルな深層学習手法 An Efficient and Scalable Deep Learning Approach for Road Damage Detection ( http://arxiv.org/abs/2011.09577v3 ) ライセンス: Link先を確認	Sadra Naddaf-Sh, M-Mahdi Naddaf-Sh, Amir R. Kashani and Hassan Zargarzadeh	(参考訳) 舗装条件の評価は予防的又はリハビリテーション的行動の時間と救難伝播の制御に不可欠である。タイムリーな評価ができないと、インフラの深刻な構造的・財政的損失と完全な再建につながる可能性がある。自動コンピュータ支援測量手法は、道路損傷パターンとその位置のデータベースを提供することができる。このデータベースは、メンテナンスの最小コストとアスファルトの最大耐久性を得るために、タイムリーな道路修理に利用できる。本稿では,画像に基づく難易度データをリアルタイムに解析する深層学習に基づく調査手法を提案する。携帯端末を用いて撮影した縦・横・アリゲータ亀裂などの亀裂の多様な集団からなるデータベースを用いる。次に、舗装き裂検出用に調整された効率的でスケーラブルなモデル群を訓練し、様々な補強策を検討する。提案されたモデルでは、F1スコアは52%から56%まで、平均推測時間は毎秒178-10枚だった。最後に、物体検出器の性能を調べ、様々な画像に対して誤差解析を報告する。ソースコードはhttps://github.com/mahdi65/roaddamagedetection2020で入手できる。 Pavement condition evaluation is essential to time the preventative or rehabilitative actions and control distress propagation. Failing to conduct timely evaluations can lead to severe structural and financial loss of the infrastructure and complete reconstructions. Automated computer-aided surveying measures can provide a database of road damage patterns and their locations. This database can be utilized for timely road repairs to achieve the minimum cost of maintenance and the asphalt's maximum durability. This paper introduces a deep learning-based surveying scheme to analyze image-based distress data in real time. A database consisting of a diverse population of crack distress types, such as longitudinal, transverse, and alligator cracks, photographed using mobile devices is used. Then, a family of efficient and scalable models tuned for pavement crack detection is trained, and various augmentation policies are explored. The proposed models achieved F1-scores ranging from 52% to 56% and average inference speeds ranging from 178 to 10 images per second. Finally, the performance of the object detectors is examined, and error analysis is reported against various images. 
The source code is available at https://github.com/mahdi65/roadDamageDetection2020.	翻訳日:2022-09-24 04:39:45 公開日:2020-12-17
# 注意による分類:事前知識を用いたシーングラフ分類 Classification by Attention: Scene Graph Classification with Prior Knowledge ( http://arxiv.org/abs/2011.10084v2 ) ライセンス: Link先を確認	Sahand Sharifzadeh, Sina Moayed Baharlou, Volker Tresp	(参考訳) シーングラフ分類における大きな課題は、オブジェクトと関係の出現が、ある画像から別の画像に大きく異なる可能性があることである。以前の研究では、画像内のすべてのオブジェクトをリレーショナル推論したり、事前の知識を分類に組み込んだりすることでこの問題に対処してきた。先行研究とは異なり、知覚と事前知識について異なるモデルを検討することはない。代わりに、マルチタスク学習アプローチを採用し、注意層として分類を実装します。これにより、事前の知識が知覚モデル内に出現し、伝播することができる。モデルも前者を表現するように強制することで、強い帰納バイアスを達成できる。本モデルでは,この知識をシーン表現に反復的に注入することで,より高度な分類性能が得られることを示す。さらに、我々のモデルはトリプルとして与えられる外部知識に基づいて微調整することができる。自己教師付き学習と1%の注釈付き画像のみを組み合わせた場合、3%以上のオブジェクト分類の改善、26%のシーングラフ分類、36%の述語予測精度が得られる。 A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for perception and prior knowledge. Instead, we take a multi-task learning approach, where we implement the classification as an attention layer. This allows for the prior knowledge to emerge and propagate within the perception model. By enforcing the model also to represent the prior, we achieve a strong inductive bias. We show that our model can accurately generate commonsense knowledge and that the iterative injection of this knowledge to scene representations leads to significantly higher classification performance. Additionally, our model can be fine-tuned on external knowledge given as triples. When combined with self-supervised learning and with 1% of annotated images only, this gives more than 3% improvement in object classification, 26% in scene graph classification, and 36% in predicate prediction accuracy.	翻訳日:2022-09-23 20:43:33 公開日:2020-12-17
# XTQA: 教科書質問回答のSpan-Level説明 XTQA: Span-Level Explanations of the Textbook Question Answering ( http://arxiv.org/abs/2011.12662v3 ) ライセンス: Link先を確認	Jie Ma, Jun Liu, Junjun Li, Qinghua Zheng, Qingyu Yin, Jianlong Zhou, Yi Huang	(参考訳) 教科書質問応答 (tqa) は、豊富なエッセイと図からなる大きなマルチモーダルな文脈において、ダイアグラム/非ダイアグラムの質問に答えるべきタスクである。この課題の説明は学生を考慮すべき重要な側面として位置づけるべきである。この問題に対処するために,提案する粗粒粒度アルゴリズムに基づいて,tqa(span-level descriptions of the tqa)のスパンレベル説明に向けて,新たなアーキテクチャを考案する。このアルゴリズムはまずTF-IDF法を用いて質問に関する上位M$段落を粗末に選択し、各質問に対する情報ゲインを計算することにより、これらの段落内のすべての候補から上位K$段落を微妙に選択する。実験結果から,XTQAはベースラインに比べて最先端性能を著しく向上することがわかった。ソースコードはhttps://github.com/keep-smile-001/opentqaで入手できる。 Textbook Question Answering (TQA) is a task in which one must answer a diagram or non-diagram question given a large multi-modal context consisting of abundant essays and diagrams. We argue that explainability for this task should treat students as a key consideration. To address this issue, we devise a novel architecture towards span-level eXplanations of the TQA (XTQA) based on our proposed coarse-to-fine grained algorithm, which can provide students not only with the answers but also with the span-level evidence for choosing them. This algorithm first coarsely chooses the top $M$ paragraphs relevant to a question using the TF-IDF method, and then finely chooses the top $K$ evidence spans from all candidate spans within these paragraphs by computing the information gain of each span with respect to the question. Experimental results show that XTQA significantly improves on the state-of-the-art performance compared with baselines. The source code is available at https://github.com/keep-smile-001/opentqa	翻訳日:2022-09-21 01:42:50 公開日:2020-12-17
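The coarse stage described in the abstract above (choosing the top $M$ paragraphs by TF-IDF relevance to the question) can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace tokenization, a smoothed IDF, and the function names are choices made here, not taken from the XTQA code, and the fine stage based on information gain is not shown.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def top_m_paragraphs(question: str, paragraphs: list, m: int) -> list:
    """Return indices of the m paragraphs with the highest TF-IDF overlap score."""
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # count paragraphs containing each term

    def idf(term: str) -> float:
        # Smoothed inverse document frequency (always positive).
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0

    query_terms = set(tokenize(question))
    scored = []
    for index, doc in enumerate(docs):
        length = sum(doc.values()) or 1
        score = sum((doc[t] / length) * idf(t) for t in query_terms if t in doc)
        scored.append((score, index))
    # Highest score first; ties broken by original paragraph order.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in ranked[:m]]
```

A second pass would then score candidate spans inside the returned paragraphs; how the paper computes information gain for that step is not specified in the abstract, so it is left out here.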
# 映像から映像へ視覚効果を伝達する学習 Learning to Transfer Visual Effects from Videos to Images ( http://arxiv.org/abs/2012.01642v2 ) ライセンス: Link先を確認	Christopher Thomas, Yale Song, Adriana Kovashka	(参考訳) 本研究では,ビデオのコレクションから時空間的効果(溶融など)を伝達することで,画像のアニメーション化の問題を研究する。視覚効果伝達における主な課題は, 1) 蒸留したい効果を捉える方法,2) 内容や芸術的スタイルではなく, 効果のみをソースビデオから入力画像に移す方法,の2つである。最初の課題に対処するために、我々は5つの損失関数を評価し、最も有望なものは、生成したアニメーションが、ソースビデオと似た光学的流れとテクスチャ運動を持つことを奨励する。第2の課題に対処するために、制約のないピクセル値を予測するのではなく、既存の画像ピクセルを以前のフレームから移動させることしかできない。これにより、入力画像のピクセルを使って視覚効果を発生させ、ソースビデオからの不要な芸術的スタイルや内容が出力に現れるのを防ぐ。提案手法を客観的および主観的設定で評価し,顔の融解や鹿の開花などの非定型的変換対象を示す興味深い定性的な結果を示す。 We study the problem of animating images by transferring spatio-temporal visual effects (such as melting) from a collection of videos. We tackle two primary challenges in visual effect transfer: 1) how to capture the effect we wish to distill; and 2) how to ensure that only the effect, rather than content or artistic style, is transferred from the source videos to the input image. To address the first challenge, we evaluate five loss functions; the most promising one encourages the generated animations to have similar optical flow and texture motions as the source videos. To address the second challenge, we only allow our model to move existing image pixels from the previous frame, rather than predicting unconstrained pixel values. This forces any visual effects to occur using the input image's pixels, preventing unwanted artistic style or content from the source video from appearing in the output. We evaluate our method in objective and subjective settings, and show interesting qualitative results which demonstrate objects undergoing atypical transformations, such as making a face melt or a deer bloom.	翻訳日:2021-05-23 14:58:35 公開日:2020-12-17
# SAFCAR:構成行動認識のための構造化注意融合 SAFCAR: Structured Attention Fusion for Compositional Action Recognition ( http://arxiv.org/abs/2012.02109v2 ) ライセンス: Link先を確認	Tae Soo Kim, Gregory D. Hager	(参考訳) 構成的行動認識のための一般的な枠組みを提示する。アクション認識では、ラベルはサブジェクトやアトミックアクション、オブジェクトといった単純なコンポーネントで構成されている。構成的行動認識の最大の課題は、基本的なコンポーネントを使って構成できる、組み合わせ可能なアクションのセットが多数存在することである。しかし、構成性はまた、利用可能な構造を提供する。そこで我々は,アクションの時系列構造をキャプチャする物体検出情報と,文脈情報をキャプチャする視覚手がかりとを組み合わせた,新しい構造化注意融合(saf)自己照準機構を開発し,検証する。提案手法は,新しい動詞句の合成を,現在の技術システムよりも効果的に認識し,いくつかのラベル付き例から非常に効率的なアクションカテゴリーに一般化することを示す。我々は,Something-V2データセットの課題であるSomesing-Elseタスクに対するアプローチを検証する。さらに、当社のフレームワークはフレキシブルで、Charades-Fewshotデータセット上で競合する結果を示すことによって、新たなドメインに一般化可能であることを示す。 We present a general framework for compositional action recognition -- i.e. action recognition where the labels are composed out of simpler components such as subjects, atomic-actions and objects. The main challenge in compositional action recognition is that there is a combinatorially large set of possible actions that can be composed using basic components. However, compositionality also provides a structure that can be exploited. To do so, we develop and test a novel Structured Attention Fusion (SAF) self-attention mechanism to combine information from object detections, which capture the time-series structure of an action, with visual cues that capture contextual information. We show that our approach recognizes novel verb-noun compositions more effectively than current state of the art systems, and it generalizes to unseen action categories quite efficiently from only a few labeled examples. We validate our approach on the challenging Something-Else tasks from the Something-Something-V2 dataset. We further show that our framework is flexible and can generalize to a new domain by showing competitive results on the Charades-Fewshot dataset.	翻訳日:2021-05-23 14:52:24 公開日:2020-12-17
# 自律運転のためのコンピュータステレオビジョン Computer Stereo Vision for Autonomous Driving ( http://arxiv.org/abs/2012.03194v2 ) ライセンス: Link先を確認	Rui Fan, Li Wang, Mohammud Junaid Bocus, Ioannis Pitas	(参考訳) 自律システムの重要なコンポーネントとして、自律的な自動車認識は、最近の並列コンピューティングアーキテクチャの進歩で大きな飛躍を遂げた。小型だがフル機能の組み込みスーパーコンピュータを使用することで、コンピュータステレオビジョンは自動運転車の奥行き認識に広く採用されている。コンピュータステレオビジョンの2つの重要な側面は、スピードと精度である。これらはどちらも望ましいが相反する性質であり、より精度のよいアルゴリズムは計算の複雑さが高い。したがって、リソース制限ハードウェアのためのコンピュータステレオビジョンアルゴリズムを開発する主な目的は、速度と精度のトレードオフを改善することである。本章では,自律走行車システムにおけるコンピュータステレオビジョンのハードウェアとソフトウェアの両方について紹介する。次に, 視覚的特徴検出, 説明とマッチング, 2) 3D情報取得, 3) 物体検出/認識, 4) セマンティックイメージセグメンテーションの4つの自律車認識タスクについて議論する。次に、マルチスレッドCPUおよびGPUアーキテクチャにおけるコンピュータステレオビジョンと並列コンピューティングの原理を詳述する。 As an important component of autonomous systems, autonomous car perception has had a big leap with recent advances in parallel computing architectures. With the use of tiny but full-feature embedded supercomputers, computer stereo vision has been prevalently applied in autonomous cars for depth perception. The two key aspects of computer stereo vision are speed and accuracy. They are both desirable but conflicting properties, as the algorithms with better disparity accuracy usually have higher computational complexity. Therefore, the main aim of developing a computer stereo vision algorithm for resource-limited hardware is to improve the trade-off between speed and accuracy. In this chapter, we introduce both the hardware and software aspects of computer stereo vision for autonomous car systems. Then, we discuss four autonomous car perception tasks, including 1) visual feature detection, description and matching, 2) 3D information acquisition, 3) object detection/recognition and 4) semantic image segmentation. The principles of computer stereo vision and parallel computing on multi-threading CPU and GPU architectures are then detailed.	翻訳日:2021-05-21 13:58:25 公開日:2020-12-17
# (参考訳) スーパーマーケット記録を用いた季節インフルエンザ予測 Predicting seasonal influenza using supermarket retail records ( http://arxiv.org/abs/2012.04651v2 ) ライセンス: CC BY 4.0	Ioanna Miliou, Xinyue Xiong, Salvatore Rinzivillo, Qian Zhang, Giulio Rossetti, Fosca Giannotti, Dino Pedreschi, Alessandro Vespignani	(参考訳) 疫学データの可用性の向上、新しいデジタルデータストリーム、強力な機械学習アプローチの台頭により、リアルタイム流行予測システムの研究活動が急増している。本稿では,インフルエンザの季節予測を改善するために,新しいデータソース,すなわち小売市場データの利用を提案する。具体的には、スーパーマーケットの小売データを、選択された顧客の集団が一緒に購入したセンチネルバスケットの識別を通じて、インフルエンザの代理信号として捉えている。我々は、イタリアでインフルエンザの発生率を最大4週間前に見積もる nowcasting and forecasting framework を開発した。我々は,svrモデルを用いて季節性インフルエンザの発生予測を行う。我々の予測は,製品購入に基づくベースライン自己回帰モデルと第2ベースラインの両方を上回っている。その結果,疫病のリアルタイム分析に有効なプロキシとして,予測モデルに小売市場データを組み込むことの価値が定量的に示された。 Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates for influenza incidence in Italy up to 4 weeks ahead. We make use of the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results show quantitatively the value of incorporating retail market data in forecasting models, acting as a proxy that can be used for the real-time analysis of epidemics.	翻訳日:2021-05-17 03:56:56 公開日:2020-12-17
# Quality-Diversity Optimization: a novel branch of stochastic optimization ( http://arxiv.org/abs/2012.04322v2 ) License: see link	Konstantinos Chatzilygeroudis, Antoine Cully, Vassilis Vassiliades and Jean-Baptiste Mouret	Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space, of which there can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not only search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences with multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), and not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even if the niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization, discuss the main representative algorithms, and the main current topics under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including deep learning, robotics, and reinforcement learning.	Translated: 2021-05-16 21:37:17 Published: 2020-12-17
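The idea in the Quality-Diversity entry above, filling the whole behavior space with the best solution found per niche, can be sketched with a MAP-Elites-style loop. MAP-Elites is a well-known Quality-Diversity algorithm, but the abstract does not name it, so treat its use here as an assumption. The toy fitness function, the descriptor, and all parameter values are hypothetical.

```python
import random


def map_elites(fitness, descriptor, n_bins=10, iterations=2000, seed=0):
    """Minimal MAP-Elites-style search on a 1-D genome in [0, 1].

    Keeps the best solution found in each behaviour bin (niche).
    """
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness, genome)

    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite and clip to [0, 1].
            _, parent = rng.choice(list(archive.values()))
            child = min(1.0, max(0.0, parent + rng.gauss(0.0, 0.1)))
        else:
            # Occasionally sample a fresh random genome.
            child = rng.random()

        f = fitness(child)
        b = min(n_bins - 1, int(descriptor(child) * n_bins))
        # Store the child if its niche is empty or it beats the current elite.
        if b not in archive or f > archive[b][0]:
            archive[b] = (f, child)
    return archive


# Toy problem (hypothetical): fitness peaks at x = 0.5, behaviour is x itself.
archive = map_elites(lambda x: -(x - 0.5) ** 2, lambda x: x)
print(len(archive), "of 10 niches filled")
```

Note that niches far from the fitness peak are still kept, which is the contrast with multimodal optimization that the abstract draws in point (2).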
# COVID-19 Detection in Chest X-Ray Images using a New Channel Boosted CNN ( http://arxiv.org/abs/2012.05073v2 ) License: see link	Saddam Hussain Khan, Anabia Sohail, and Asifullah Khan	COVID-19 is a highly contagious respiratory infection that has affected a large population across the world and continues with its devastating consequences. It is imperative to detect COVID-19 as early as possible to limit the span of infection. In this work, a new classification technique CB-STM-RENet based on deep Convolutional Neural Network (CNN) and Channel Boosting is proposed for the screening of COVID-19 in chest X-Rays. In this connection, to learn the COVID-19 specific radiographic patterns, a new convolution block based on split-transform-merge (STM) is developed. This new block systematically incorporates region and edge-based operations at each branch to capture the diverse set of features at various levels, especially those related to region homogeneity, textural variations, and boundaries of the infected region. The learning and discrimination capability of the proposed CNN architecture is enhanced by exploiting the Channel Boosting idea that concatenates the auxiliary channels along with the original channels. The auxiliary channels are generated from the pre-trained CNNs using Transfer Learning. The effectiveness of the proposed technique CB-STM-RENet is evaluated on three different datasets of chest X-Rays, namely CoV-Healthy-6k, CoV-NonCoV-10k, and CoV-NonCoV-15k. The performance comparison of the proposed CB-STM-RENet with the existing techniques exhibits high performance in discriminating COVID-19 chest infections both from Healthy and from other types of chest infections. CB-STM-RENet provides the highest performance on all three datasets, especially on the stringent CoV-NonCoV-15k dataset. The good detection rate (97%) and high precision (93%) of the proposed technique suggest that it can be adapted for the diagnosis of COVID-19 infected patients. The test code is available at https://github.com/PRLAB21/COVID-19-Detection-System-using-Chest-X-Ray-Images.	Translated: 2021-05-16 20:56:53 Published: 2020-12-17
# Automatic Generation of Interpretable Lung Cancer Scoring Models from Chest X-Ray Images ( http://arxiv.org/abs/2012.05447v2 ) License: CC BY 4.0	Michael J. Horry, Subrata Chakraborty, Biswajeet Pradhan, Manoranjan Paul, Douglas P. S. Gomes, Anwaar Ul-Haq	Lung cancer is the leading cause of cancer death worldwide, with early detection being the key to a positive patient prognosis. Although a multitude of studies have demonstrated that machine learning, and particularly deep learning, techniques are effective at automatically diagnosing lung cancer, these techniques have yet to be clinically approved and adopted by the medical community. Most research in this field is focused on the narrow task of nodule detection to provide an artificial radiological second reading. We instead focus on extracting, from chest X-ray images, a wider range of pathologies associated with lung cancer using a computer vision model trained on a large dataset. We then find the set of best fit decision trees against an independent, smaller dataset for which lung cancer malignancy metadata is provided. For this small inferencing dataset, our best model achieves sensitivity and specificity of 85% and 75% respectively, with a positive predictive value of 85%, which is comparable to the performance of human radiologists. Furthermore, the decision trees created by this method may be considered as a starting point for refinement by medical experts into clinically usable multi-variate lung cancer scoring and diagnostic models.	Translated: 2021-05-15 23:10:10 Published: 2020-12-17
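The sensitivity, specificity, and positive predictive value quoted in the lung cancer entry above are all ratios over a binary confusion matrix. The sketch below shows how the three are computed. The counts are hypothetical and were chosen only so that the output matches the quoted percentages; they are not taken from the paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and positive predictive value."""
    sensitivity = tp / (tp + fn)  # true positives among all diseased cases
    specificity = tn / (tn + fp)  # true negatives among all healthy cases
    ppv = tp / (tp + fp)          # true positives among all positive calls
    return sensitivity, specificity, ppv


# Hypothetical counts, NOT taken from the paper.
sens, spec, ppv = diagnostic_metrics(tp=85, fp=15, tn=45, fn=15)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f}")
```

Unlike sensitivity and specificity, the positive predictive value depends on the share of positive cases in the evaluation set, so it does not carry over directly to a population with a different prevalence.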
# Learning Graphons via Structured Gromov-Wasserstein Barycenters ( http://arxiv.org/abs/2012.05644v2 ) License: CC BY 4.0	Hongteng Xu, Dixin Luo, Lawrence Carin, Hongyuan Zha	We propose a novel and principled method to learn a nonparametric graph model called graphon, which is defined in an infinite-dimensional space and represents arbitrary-size graphs. Based on the weak regularity lemma from the theory of graphons, we leverage a step function to approximate a graphon. We show that the cut distance of graphons can be relaxed to the Gromov-Wasserstein distance of their step functions. Accordingly, given a set of graphs generated by an underlying graphon, we learn the corresponding step function as the Gromov-Wasserstein barycenter of the given graphs. Furthermore, we develop several enhancements and extensions of the basic algorithm, e.g., the smoothed Gromov-Wasserstein barycenter for guaranteeing the continuity of the learned graphons and the mixed Gromov-Wasserstein barycenters for learning multiple structured graphons. The proposed approach overcomes drawbacks of prior state-of-the-art methods, and outperforms them on both synthetic and real-world data. The code is available at https://github.com/HongtengXu/SGWB-Graphon.	Translated: 2021-05-15 16:18:58 Published: 2020-12-17
# Hindsight and Sequential Rationality of Correlated Play ( http://arxiv.org/abs/2012.05874v2 ) License: see link	Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling	Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.	Translated: 2021-05-15 06:14:23 Published: 2020-12-17
# Street-view Panoramic Video Synthesis from a Single Satellite Image ( http://arxiv.org/abs/2012.06628v2 ) License: CC BY 4.0	Zuoyue Li, Zhaopeng Cui, Martin R. Oswald, Marc Pollefeys	We present a novel method for synthesizing both temporally and geometrically consistent street-view panoramic video from a given single satellite image and camera trajectory. Existing cross-view synthesis approaches focus more on images, while video synthesis in such a case has not yet received enough attention. Single image synthesis approaches are not well suited for video synthesis since they lack temporal consistency, which is a crucial property of videos. To this end, our approach explicitly creates a 3D point cloud representation of the scene and maintains dense 3D-2D correspondences across frames that reflect the geometric scene configuration inferred from the satellite view. We implement a cascaded network architecture with two hourglass modules for successive coarse and fine generation for colorizing the point cloud from the semantics and per-class latent vectors. By leveraging computed correspondences, the produced street-view video frames adhere to the 3D geometric scene structure and maintain temporal consistency. Qualitative and quantitative experiments demonstrate superior results compared to other state-of-the-art cross-view synthesis approaches that lack either temporal or geometric consistency. To the best of our knowledge, our work is the first to synthesize cross-view images to video.	Translated: 2021-05-11 04:35:59 Published: 2020-12-17
# D$^2$IM-Net: Learning Detail Disentangled Implicit Fields from Single Images ( http://arxiv.org/abs/2012.06650v2 ) License: see link	Manyi Li, Hao Zhang	We present the first single-view 3D reconstruction network aimed at recovering geometric details from an input image, which encompass both topological shape structures and surface features. Our key idea is to train the network to learn a detail disentangled reconstruction consisting of two functions, one implicit field representing the coarse 3D shape and the other capturing the details. Given an input image, our network, coined D$^2$IM-Net, encodes it into global and local features which are respectively fed into two decoders. The base decoder uses the global features to reconstruct a coarse implicit field, while the detail decoder reconstructs, from the local features, two displacement maps, defined over the front and back sides of the captured object. The final 3D reconstruction is a fusion between the base shape and the displacement maps, with three losses enforcing the recovery of coarse shape, overall structure, and surface details via a novel Laplacian term.	Translated: 2021-05-11 02:56:45 Published: 2020-12-17
# A learning perspective on the emergence of abstractions: the curious case of phonemes ( http://arxiv.org/abs/2012.07499v3 ) License: see link	Petar Milin, Benjamin V. Tucker, and Dagmar Divjak	In the present paper we use a range of modeling techniques to investigate whether an abstract phone could emerge from exposure to speech sounds. We test two opposing principles regarding the development of language knowledge in linguistically untrained language users: Memory-Based Learning (MBL) and Error-Correction Learning (ECL). A process of generalization underlies the abstractions linguists operate with, and we probed whether MBL and ECL could give rise to a type of language knowledge that resembles linguistic abstractions. Each model was presented with a significant amount of pre-processed speech produced by one speaker. We assessed the consistency or stability of what the models have learned and their ability to give rise to abstract categories. The two types of models fare differently with regard to these tests. We show that ECL learning models can learn abstractions and that at least part of the phone inventory can be reliably identified from the input.	Translated: 2021-05-08 14:45:50 Published: 2020-12-17
# Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation ( http://arxiv.org/abs/2012.08226v2 ) License: CC BY 4.0	Minsu Kim, Sunghun Joung, Seungryong Kim, JungIn Park, Ig-Jae Kim, Kwanghoon Sohn	Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) deal with all the samples from the two domains in a global or category-aware manner. They do not consider inter-class variation within the target domain itself or within an estimated category, which limits their ability to encode domains that have a multi-modal data distribution. To overcome this limitation, we introduce a learnable clustering module and a novel domain adaptation framework called cross-domain grouping and alignment. To cluster the samples across domains with the aim of maximizing the domain alignment without forgetting precise segmentation ability on the source domain, we present two loss functions, in particular, for encouraging semantic consistency and orthogonality among the clusters. We also present a loss to solve the class imbalance problem, which is the other limitation of the previous methods. Our experiments show that our method consistently boosts the adaptation performance in semantic segmentation, outperforming the state of the art on various domain adaptation settings.	Translated: 2021-05-08 06:13:23 Published: 2020-12-17
# SimpleChrome: Encoding of Combinatorial Effects for Predicting Gene Expression ( http://arxiv.org/abs/2012.08671v2 ) License: CC BY 4.0	Wei Cheng, Ghulam Murtaza, Aaron Wang	Due to recent breakthroughs in state-of-the-art DNA sequencing technology, genomics data sets have become ubiquitous. The emergence of large-scale data sets provides great opportunities for better understanding of genomics, especially gene regulation. Although each cell in the human body contains the same set of DNA information, gene expression controls the functions of these cells by turning genes on or off; the degree of activity is known as the gene expression level. There are two important factors that control the expression level of each gene: (1) gene regulation such as histone modifications can directly regulate gene expression, and (2) neighboring genes that are functionally related to or interact with each other can also affect gene expression levels. Previous efforts have tried to address the former using attention-based models. However, addressing the second problem requires the incorporation of all potentially related gene information into the model. Though modern machine learning and deep learning models have been able to capture gene expression signals when applied to moderately sized data, they have struggled to recover the underlying signals of the data because of its higher dimensionality. To remedy this issue, we present SimpleChrome, a deep learning model that learns the latent histone modification representations of genes. The features learned from the model allow us to better understand the combinatorial effects of cross-gene interactions and direct gene regulation on the target gene expression. The results of this paper show outstanding improvements in the predictive capabilities of downstream models and greatly relax the need for a large data set to learn a robust, generalized neural network. These results have immediate downstream effects in epigenomics research and drug development.	Translated: 2021-05-07 06:50:20 Published: 2020-12-17
# Are we Forgetting about Compositional Optimisers in Bayesian Optimisation? ( http://arxiv.org/abs/2012.08240v2 ) License: see link	Antoine Grosnit, Alexander I. Cowen-Rivers, Rasul Tutunov, Ryan-Rhys Griffiths, Jun Wang, Haitham Bou-Ammar	Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.	Translated: 2021-05-07 05:34:30 Published: 2020-12-17
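To make concrete what "maximising the acquisition function" in the Bayesian optimisation entry above refers to, here is a minimal sketch of one widely used acquisition function, Expected Improvement, in its standard closed form for a Gaussian posterior. The abstract speaks only of "popular acquisition functions", so the choice of Expected Improvement is an assumption, and this is the textbook form, not the compositional form derived in the paper.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement for maximisation under a Gaussian posterior.

    mu, sigma: posterior mean and standard deviation at a candidate point.
    best: best objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


# Hypothetical candidates: same mean, different posterior uncertainty.
print(expected_improvement(0.0, 1.0, 0.0))
print(expected_improvement(0.0, 2.0, 0.0))
```

In a full optimiser this function is evaluated over the search space through the surrogate model's mean and standard deviation, and the resulting surface is typically non-convex, which is the difficulty the abstract points to.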
# Grounding Artificial Intelligence in the Origins of Human Behavior ( http://arxiv.org/abs/2012.08564v2 ) License: see link	Eleni Nisioti and Clément Moulin-Frier	Recent advances in Artificial Intelligence (AI) have revived the quest for agents able to acquire an open-ended repertoire of skills. However, although this ability is fundamentally related to the characteristics of human intelligence, research in this field rarely considers the processes that may have guided the emergence of complex cognitive capacities during the evolution of the species. Research in Human Behavioral Ecology (HBE) seeks to understand how the behaviors characterizing human nature can be conceived as adaptive responses to major changes in the structure of our ecological niche. In this paper, we propose a framework highlighting the role of environmental complexity in open-ended skill acquisition, grounded in major hypotheses from HBE and recent contributions in Reinforcement Learning (RL). We use this framework to highlight fundamental links between the two disciplines, as well as to identify feedback loops that bootstrap ecological complexity and create promising research directions for AI researchers.	Translated: 2021-05-07 05:26:54 Published: 2020-12-17
# Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification ( http://arxiv.org/abs/2012.08733v2 ) License: CC BY 4.0	Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang and Zheng-Jun Zha	Many unsupervised domain adaptive (UDA) person re-identification (ReID) approaches combine clustering-based pseudo-label prediction with feature fine-tuning. However, because of the domain gap, the pseudo-labels are not always reliable and there are noisy/incorrect labels. This would mislead the feature representation learning and deteriorate the performance. In this paper, we propose to estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels, by suppressing the contribution of noisy samples. We build our baseline framework using the mean teacher method together with an additional contrastive loss. We have observed that a sample with a wrong pseudo-label through clustering in general has a weaker consistency between the output of the mean teacher model and the student model. Based on this finding, we propose to exploit the uncertainty (measured by consistency levels) to evaluate the reliability of the pseudo-label of a sample and incorporate the uncertainty to re-weight its contribution within various ReID losses, including the identity (ID) classification loss per sample, the triplet loss, and the contrastive loss. Our uncertainty-guided optimization brings significant improvement and achieves state-of-the-art performance on benchmark datasets.	Translated: 2021-05-06 11:49:34 Published: 2020-12-17
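The re-weighting idea in the person re-identification entry above, where samples whose teacher and student outputs disagree contribute less to the loss, can be sketched as follows. The abstract says only that uncertainty is "measured by consistency levels", so the use of KL divergence as the uncertainty and of `exp(-uncertainty)` as the weight are assumptions made for illustration, as are all function names and numbers.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def reweighted_loss(losses, teacher_probs, student_probs):
    """Average per-sample losses, down-weighting inconsistent samples.

    Uncertainty is the KL divergence between teacher and student outputs;
    the weight exp(-uncertainty) is an illustrative choice.
    """
    weights = [
        math.exp(-kl_divergence(t, s))
        for t, s in zip(teacher_probs, student_probs)
    ]
    total = sum(w * l for w, l in zip(weights, losses))
    return total / sum(weights), weights


# Hypothetical batch: sample 0 is consistent, sample 1 is not.
teacher = [[0.9, 0.1], [0.9, 0.1]]
student = [[0.9, 0.1], [0.1, 0.9]]
loss, weights = reweighted_loss([1.0, 3.0], teacher, student)
print(loss, weights)
```

The inconsistent sample receives a much smaller weight, so the batch loss moves toward the loss of the consistent sample instead of the plain mean.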
# Exploring Thematic Coherence in Fake News ( http://arxiv.org/abs/2012.09118v2 ) License: see link	Martins Samuel Dogo, Deepak P, Anna Jurek-Loughrey	The spread of fake news remains a serious global issue; understanding and curtailing it is paramount. One way of differentiating between deceptive and truthful stories is by analyzing their coherence. This study explores the use of topic models to analyze the coherence of cross-domain news shared online. Experimental results on seven cross-domain datasets demonstrate that fake news shows a greater thematic deviation between its opening sentences and its remainder.	Translated: 2021-05-03 03:00:42 Published: 2020-12-17
# Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization ( http://arxiv.org/abs/2012.09355v1 ) License: CC BY 4.0	Jiho Noh and Ramakanth Kavuluru	Information retrieval (IR) for precision medicine (PM) often involves looking for multiple pieces of evidence that characterize a patient case. This typically includes at least the name of a condition and a genetic variation that applies to the patient. Other factors such as demographic attributes, comorbidities, and social determinants may also be pertinent. As such, the retrieval problem is often formulated as ad hoc search but with multiple facets (e.g., disease, mutation) that may need to be incorporated. In this paper, we present a document reranking approach that combines neural query-document matching and text summarization toward such retrieval scenarios. Our architecture builds on the basic BERT model with three specific components for reranking: (a) document-query matching, (b) keyword extraction, and (c) facet-conditioned abstractive summarization. The outcomes of (b) and (c) are used to essentially transform a candidate document into a concise summary that can be compared with the query at hand to compute a relevance score. Component (a) directly generates a matching score of a candidate document for a query. The full architecture benefits from the complementary potential of document-query matching and the novel document transformation approach based on summarization along PM facets. Evaluations using NIST's TREC-PM track datasets (2017--2019) show that our model achieves state-of-the-art performance. To foster reproducibility, our code is made available here: https://github.com/bionlproc/text-summ-for-doc-retrieval.	Translated: 2021-05-03 00:34:49 Published: 2020-12-17
# Assessing COVID-19 Impacts on College Students via Automated Processing of Free-form Text ( http://arxiv.org/abs/2012.09369v1 ) License: CC BY 4.0	Ravi Sharma, Sri Divya Pagadala, Pratool Bharti, Sriram Chellappan, Trine Schmidt and Raj Goyal	In this paper, we report experimental results on assessing the impact of COVID-19 on college students by processing free-form texts generated by them. By free-form texts, we mean textual entries posted by college students (enrolled in a four-year US college) via an app specifically designed to assess and improve their mental health. Using a dataset comprising more than 9000 textual entries from 1451 students collected over four months (split between pre and post COVID-19), and established NLP techniques, a) we assess how the topics of most interest to students change between pre and post COVID-19, and b) we assess the sentiments that students exhibit in each topic between pre and post COVID-19. Our analysis reveals that topics like Education became noticeably less important to students post COVID-19, while Health became much more prominent. We also found that across all topics, negative sentiment among students post COVID-19 was much higher compared to pre-COVID-19. We expect our study to have an impact on policy-makers in higher education across several spectra, including college administrators, teachers, parents, and mental health counselors.	Translated: 2021-05-03 00:17:27 Published: 2020-12-17
# (参考訳) masker: 信頼できるテキスト分類のためのマスク付きキーワード正規化 MASKER: Masked Keyword Regularization for Reliable Text Classification ( http://arxiv.org/abs/2012.09392v1 ) ライセンス: CC BY 4.0	Seung Jun Moon, Sangwoo Mo, Kimin Lee, Jaeho Lee, Jinwoo Shin	(参考訳) 事前訓練された言語モデルは、感情分析、自然言語推論、意味的なテキスト類似性など、様々なテキスト分類タスクにおいて最先端の精度を達成した。しかし、微調整テキスト分類器の信頼性は、しばしば見当たらない性能基準である。例えば、オフ・オブ・ディストリビューション(OOD)サンプル(トレーニング分布から遠く離れている)を検出したり、ドメインシフトに対して堅牢なモデルが欲しい場合もあります。信頼性に対する1つの大きな障害は、コンテキスト全体を見るのではなく、限られた数のキーワードでモデルの過度な信頼関係にあると主張する。特に, (a) OOD サンプルは分布内キーワードを含むことが多いが, (b) クロスドメインサンプルは必ずしもキーワードを含むとは限らない。そこで本研究では,文脈に基づく予測を容易にする簡易かつ効果的な微調整手法であるマスク付きキーワード正規化(MASKER)を提案する。 maskerはモデルを規則化し、他の単語からキーワードを再構築し、十分な文脈なしに低信頼の予測を行う。各種事前学習言語モデル(BERT,RoBERTa,ALBERT)に適用した場合,MASKERは分類精度を低下させることなくOODの検出とドメイン間一般化を改善する。コードはhttps://github.com/alinlab/MASKERで入手できる。 Pre-trained language models have achieved state-of-the-art accuracies on various text classification tasks, e.g., sentiment analysis, natural language inference, and semantic textual similarity. However, the reliability of the fine-tuned text classifiers is an often underlooked performance criterion. For instance, one may desire a model that can detect out-of-distribution (OOD) samples (drawn far from training distribution) or be robust against domain shifts. We claim that one central obstacle to the reliability is the over-reliance of the model on a limited number of keywords, instead of looking at the whole context. In particular, we find that (a) OOD samples often contain in-distribution keywords, while (b) cross-domain samples may not always contain keywords; over-relying on the keywords can be problematic for both cases. In light of this observation, we propose a simple yet effective fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction. MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context. 
When applied to various pre-trained language models (e.g., BERT, RoBERTa, and ALBERT), we demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy. Code is available at https://github.com/alinlab/MASKER.	翻訳日:2021-05-03 00:06:30 公開日:2020-12-17
# (参考訳) オンラインマシン学習アドバイスを用いた計量タスクシステム Metrical Task Systems with Online Machine Learned Advice ( http://arxiv.org/abs/2012.09394v1 ) ライセンス: CC BY 4.0	Kevin Rao	(参考訳) 機械学習アルゴリズムは、既存のデータに基づいて、将来の正確な予測を行うように設計されているが、オンラインアルゴリズムは、将来を知らずに、いくつかのパフォーマンス指標(通常、競争比率)に縛り付けようとしている。 LykourisとVassilvitskiiは、オンラインアルゴリズムを機械学習予測器で拡張することで、予測器が適当に正確である限り、競争比が確実に低下することを示した。そこで本稿では,Borodin, Linial, Saks らによって提起されたオンライン計量タスクシステム問題に対して,動的システム処理タスクの汎用モデルとして,この概念を適用した。我々は、$n$タスク上の一様タスクシステムの特定のクラスに焦点を当て、最良の決定論的アルゴリズムは$O(n)$競争であり、最良のランダム化アルゴリズムは$O(\log n)$競争である。オンラインのアルゴリズムで学習したオラクルに絶対的な予測誤差を$\eta_0$で有界でアクセスすることで、メートル法タスクシステムの一様問題に対して$\Theta(\min(\sqrt{\eta_0}, \log n))$の競合アルゴリズムを構築する。また、任意のランダム化アルゴリズムの競合比に対して、$\Theta(\log \sqrt{\eta_0})$低い境界を与える。 Machine learning algorithms are designed to make accurate predictions of the future based on existing data, while online algorithms seek to bound some performance measure (typically the competitive ratio) without knowledge of the future. Lykouris and Vassilvitskii demonstrated that augmenting online algorithms with a machine learned predictor can provably decrease the competitive ratio as long as the predictor is suitably accurate. In this work we apply this idea to the Online Metrical Task System problem, which was put forth by Borodin, Linial, and Saks as a general model for dynamic systems processing tasks in an online fashion. We focus on the specific class of uniform task systems on $n$ tasks, for which the best deterministic algorithm is $O(n)$ competitive and the best randomized algorithm is $O(\log n)$ competitive. By giving an online algorithm access to a machine learned oracle with absolute predictive error bounded above by $\eta_0$, we construct a $\Theta(\min(\sqrt{\eta_0}, \log n))$ competitive algorithm for the uniform case of the metrical task systems problem. We also give a $\Theta(\log \sqrt{\eta_0})$ lower bound on the competitive ratio of any randomized algorithm.	翻訳日:2021-05-02 23:48:21 公開日:2020-12-17
# (参考訳) 組成制約下での確率的組成勾配降下 Stochastic Compositional Gradient Descent under Compositional constraints ( http://arxiv.org/abs/2012.09400v1 ) ライセンス: CC BY 4.0	Srujan Teja Thomdapu, Harshvardhan, Ketan Rajawat	(参考訳) 本研究は、目的関数と制約関数が凸であり、確率関数の合成として表現される確率的最適化問題を制約した。この問題は、公正な分類、公平な回帰、およびキューシステムの設計という文脈で生じる。特に興味深いのは、オラクルが構成関数の確率的勾配を提供する大規模な設定であり、その目的は、オラクルへの最小限の呼び出しで問題を解決することである。この問題は、公平な分類/回帰とキューシステムの設計に生じる。構成形式により、オラクルによって提供される確率勾配は、目的あるいは制約勾配の偏りのない見積もりを生じさせない。代わりに, 内関数評価を追跡することで近似勾配を構築し, 準次saddle pointアルゴリズムを導出する。提案アルゴリズムは最適かつ実現可能な解をほぼ確実に見つけることが保証されている。さらに、提案アルゴリズムでは、制約違反をゼロにしつつ、$\epsilon$-approximate の最適点を得るために$\mathcal{o}(1/\epsilon^4)$ データサンプルが必要であることも確認する。その結果、制約のない問題に対する確率的組成勾配降下法のサンプル複雑性が一致し、制約付き設定の最もよく知られたサンプル複雑性結果が改善される。提案アルゴリズムの有効性は、公平な分類と公平な回帰問題の両方で検証される。数値計算の結果,提案アルゴリズムは収束率の観点から最先端のアルゴリズムよりも優れていた。 This work studies constrained stochastic optimization problems where the objective and constraint functions are convex and expressed as compositions of stochastic functions. The problem arises in the context of fair classification, fair regression, and the design of queuing systems. Of particular interest is the large-scale setting where an oracle provides the stochastic gradients of the constituent functions, and the goal is to solve the problem with a minimal number of calls to the oracle. The problem arises in fair classification/regression and in the design of queuing systems. Owing to the compositional form, the stochastic gradients provided by the oracle do not yield unbiased estimates of the objective or constraint gradients. Instead, we construct approximate gradients by tracking the inner function evaluations, resulting in a quasi-gradient saddle point algorithm. We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely. 
We further establish that the proposed algorithm requires $\mathcal{O}(1/\epsilon^4)$ data samples in order to obtain an $\epsilon$-approximate optimal point while also ensuring zero constraint violation. The result matches the sample complexity of the stochastic compositional gradient descent method for unconstrained problems and improves upon the best-known sample complexity results for the constrained settings. The efficacy of the proposed algorithm is tested on both fair classification and fair regression problems. The numerical results show that the proposed algorithm outperforms the state-of-the-art algorithms in terms of the convergence rate.	翻訳日:2021-05-02 23:35:09 公開日:2020-12-17
# (参考訳) 人工知能による3D頂点重要度の順序付け Artificial Intelligence ordered 3D vertex importance ( http://arxiv.org/abs/2012.10232v1 ) ライセンス: CC BY 4.0	Iva Vasic, Bata Vasic, and Zorica Nikolic	(参考訳) 多次元ネットワークのランキング頂点は、決定の重要性の選択と決定を含む多くの研究分野において重要である。いくつかの決定は他の決定よりも著しく重要であり、その重みの分類もまた重要である。本稿では,3次元ネットワーク頂点の重み付けのための人工知能を用いた重み付け決定手法を新たに定義し,量子化インデックス(qim)と誤り訂正符号の変調に基づいて,既存の順序統計頂点抽出追跡アルゴリズム(osveta)を改善した。本稿では,最新のニューラルネットワークの正確な予測手法をヒューリスティック手法に置き換え,統計的OSVETA基準によるネットワーク頂点の重要度決定の効率を大幅に向上させる手法を提案する。新しい人工知能技術により、3dメッシュの定義が大幅に改善され、トポロジカルな特徴をより良く評価できる。新たな手法により,安定頂点の定義精度が向上し,メッシュ頂点の削除確率が大幅に低下する。 Ranking vertices of multidimensional networks is crucial in many areas of research, including selecting and determining the importance of decisions. Some decisions are significantly more important than others, and their weight categorization is also important. This paper defines a completely new method for determining the weight decisions using artificial intelligence for importance ranking of three-dimensional network vertices, improving the existing Ordered Statistics Vertex Extraction and Tracking Algorithm (OSVETA) based on modulation of quantized indices (QIM) and error correction codes. The technique we propose in this paper significantly improves the efficiency of determining the importance of network vertices in relation to statistical OSVETA criteria, replacing heuristic methods with methods of precise prediction of modern neural networks. The new artificial intelligence technique enables a significantly better definition of the 3D meshes and a better assessment of their topological features. The new method's contributions result in a greater precision in defining stable vertices, significantly reducing the probability of deleting mesh vertices.	翻訳日:2021-05-02 22:15:42 公開日:2020-12-17
# (参考訳) モーメントの変分法 The Variational Method of Moments ( http://arxiv.org/abs/2012.09422v1 ) ライセンス: CC BY 4.0	Andrew Bennett, Nathan Kallus	(参考訳) 条件モーメント問題は、可観測性の観点から構造因果パラメータを記述するための強力な定式化である。標準的なアプローチは、問題を限界モーメント条件の有限集合に還元し、最適に重み付けされたモーメントの一般化法(OWGMM)を適用することであるが、これは有限個のモーメントの特定を知っていなければならない。 OWGMMの変分極小修正により、条件モーメント問題に対する非常に一般的な推定器のクラスを定義し、このクラスはモーメントの変分法(VMM)と呼ばれ、無限個のモーメントを自然に制御できる。我々は、カーネル法やニューラルネットワークに基づく複数のVMM推定器の詳細な理論的解析を行い、これらの推定器が完全条件モーメントモデルにおいて一貫性があり、漸近的に正常であり、半パラメトリック的に効率的である適切な条件を提供する。これは、最適重み付けを組み込まず、漸近正規性を確立せず、半パラメトリック的に効率が良くない逆機械学習に基づく条件モーメント問題を解決する他の方法とは対照的である。 The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. A standard approach is to reduce the problem to a finite set of marginal moment conditions and apply the optimally weighted generalized method of moments (OWGMM), but this requires we know a finite set of identifying moments, can still be inefficient even if identifying, or can be unwieldy and impractical if we use a growing sieve of moments. Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem, which we term the variational method of moments (VMM) and which naturally enables controlling infinitely-many moments. We provide a detailed theoretical analysis of multiple VMM estimators, including based on kernel methods and neural networks, and provide appropriate conditions under which these estimators are consistent, asymptotically normal, and semiparametrically efficient in the full conditional moment model. This is in contrast to other recently proposed methods for solving conditional moment problems based on adversarial machine learning, which do not incorporate optimal weighting, do not establish asymptotic normality, and are not semiparametrically efficient.	翻訳日:2021-05-02 20:43:15 公開日:2020-12-17
# (参考訳) Maximum EntropyはMaximum Likelihoodと競合する Maximum Entropy competes with Maximum Likelihood ( http://arxiv.org/abs/2012.09430v1 ) ライセンス: CC BY 4.0	A.E. Allahverdyan and N.H. Martirosyan	(参考訳) 最大エントロピー(MAXENT)法は、未知の確率を推定するための便利な非パラメトリックツールを提供するため、理論的および応用機械学習に多くの応用がある。この方法は確率的推論に対する統計物理学の大きな貢献である。しかし、その妥当性の限界に対する体系的なアプローチは現在欠落している。ここでは、ベイズ決定論においてMAXENTを研究する。未知の確率に対してよく定義されたディリクレ密度が存在すると仮定し、様々な推定器の品質と適用性を決定するために平均カルバック・リーブラー距離(KL)を用いることができる。これらは、様々なMAXENT制約の関連性を評価し、その一般的な適用性を確認し、MAXENTを事前分布への依存度が異なる推定器、すなわち正規化された最尤(ML)推定器やベイズ推定器と比較することができる。 MAXENTはスパースデータレジームに適用されるが、特定の種類の事前情報を必要とする。特にMAXENTは、推定されたランダム量とその確率の間に事前のランク相関が存在することを仮定して、最適に正規化されたMLより優れている。 The maximum entropy (MAXENT) method has a large number of applications in theoretical and applied machine learning, since it provides a convenient non-parametric tool for estimating unknown probabilities. The method is a major contribution of statistical physics to probabilistic inference. However, a systematic approach towards its validity limits is currently missing. Here we study MAXENT in a Bayesian decision theory set-up, i.e. assuming that there exists a well-defined prior Dirichlet density for unknown probabilities, and that the average Kullback-Leibler (KL) distance can be employed for deciding on the quality and applicability of various estimators. These allow us to evaluate the relevance of various MAXENT constraints, check its general applicability, and compare MAXENT with estimators having various degrees of dependence on the prior, viz. the regularized maximum likelihood (ML) and the Bayesian estimators. We show that MAXENT applies in sparse data regimes, but needs specific types of prior information. In particular, MAXENT can outperform the optimally regularized ML provided that there are prior rank correlations between the estimated random quantity and its probabilities.	翻訳日:2021-05-02 20:26:52 公開日:2020-12-17
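The MAXENT entry above compares estimators by their Kullback-Leibler distance to the true probabilities. As a minimal sketch of that kind of comparison (not the authors' code), the following contrasts plain maximum likelihood with an additively regularized variant on sparse counts. The function names and the regularizer `b` are assumptions introduced here for illustration; the MAXENT estimator itself is not implemented.

```python
import math


def regularized_ml(counts, b=0.0):
    """Estimate probabilities from counts; b > 0 adds a pseudo-count to every outcome.

    b = 0 gives the plain maximum-likelihood frequencies (assumes sum(counts) > 0).
    """
    n = sum(counts)
    k = len(counts)
    return [(c + b) / (n + k * b) for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if q assigns zero to an outcome that p allows."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


true_probs = [0.5, 0.3, 0.2]
counts = [3, 1, 0]  # sparse sample: the third outcome was never observed

plain = regularized_ml(counts)            # assigns zero to the unseen outcome
smoothed = regularized_ml(counts, b=1.0)  # hypothetical choice of regularizer
```

With these counts the unregularized estimate has infinite KL distance from `true_probs`, while the smoothed one stays finite, which is one reason regularization matters in sparse-data regimes.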
# (参考訳) 機械学習による航空の環境影響低減を支援する Helping Reduce Environmental Impact of Aviation with Machine Learning ( http://arxiv.org/abs/2012.09433v1 ) ライセンス: CC BY 4.0	Ashish Kapoor	(参考訳) 商業航空は気候変動への最大の貢献の1つである。本稿では,飛行時間を短縮する解決策を検討することで,航空の環境への影響を低減することを提案する。具体的には、まず風速予測の改善を検討し、飛行計画立案者がより効率的なルートを見つけるためにより良い情報を利用できるようにした。第2に,風速予測の不確実性を考慮し,探索と搾取を最適に切り替えることで,目的地への最高速経路を探索する航空機のルーティング手法を提案する。 Commercial aviation is one of the biggest contributors towards climate change. We propose to reduce environmental impact of aviation by considering solutions that would reduce the flight time. Specifically, we first consider improving winds aloft forecast so that flight planners could use better information to find routes that are efficient. Secondly, we propose an aircraft routing method that seeks to find the fastest route to the destination by considering uncertainty in the wind forecasts and then optimally trading-off between exploration and exploitation.	翻訳日:2021-05-02 20:11:38 公開日:2020-12-17
# (参考訳) FG-Net: Correlated Feature MiningとGeometric-Aware Modelingを活用した高速大規模LiDARポイントクラウド FG-Net: Fast Large-Scale LiDAR Point Clouds Understanding Network Leveraging Correlated Feature Mining and Geometric-Aware Modelling ( http://arxiv.org/abs/2012.09439v1 ) ライセンス: CC BY-SA 4.0	Kangcheng Liu, Zhi Gao, Feng Lin, and Ben M. Chen	(参考訳) FG-Netは、1つのNVIDIA GTX 1080 GPUで正確かつリアルタイムなパフォーマンスを実現する、大規模なポイントクラウド理解のための一般的なディープラーニングフレームワークである。まず,後続の高レベルタスクを容易にするために,新しいノイズ・アウトリアーフィルタリング法を考案した。そこで本研究では,局所的特徴関係と幾何学的パターンを十分に活用できる,特徴マイニングと変形可能な畳み込みに基づく幾何認識モデルを用いた深層畳み込みニューラルネットワークを提案する。効率の面では,計算コストとメモリ消費をそれぞれ削減するために,逆密度サンプリング操作と特徴ピラミッドに基づく残差学習戦略を提案する。実世界の挑戦的データセットに関する大規模な実験は、我々のアプローチが精度と効率の点で最先端のアプローチより優れていることを示した。また,本手法の一般化能力を示すために,弱教師付き転送学習も行った。 This work presents FG-Net, a general deep learning framework for large-scale point cloud understanding without voxelizations, which achieves accurate and real-time performance with a single NVIDIA GTX 1080 GPU. First, a novel noise and outlier filtering method is designed to facilitate subsequent high-level tasks. For effective understanding, we propose a deep convolutional neural network leveraging correlated feature mining and deformable convolution based geometric-aware modelling, in which the local feature relationships and geometric patterns can be fully exploited. For the efficiency issue, we put forward an inverse density sampling operation and a feature pyramid based residual learning strategy to save the computational cost and memory consumption respectively. Extensive experiments on real-world challenging datasets demonstrated that our approaches outperform state-of-the-art approaches in terms of accuracy and efficiency. Moreover, weakly supervised transfer learning is also conducted to demonstrate the generalization capacity of our method.	翻訳日:2021-05-02 19:49:07 公開日:2020-12-17
# (参考訳) 樹木オートエンコーダを用いた談話構造の教師なし学習 Unsupervised Learning of Discourse Structures using a Tree Autoencoder ( http://arxiv.org/abs/2012.09446v1 ) ライセンス: CC BY 4.0	Patrick Huber and Giuseppe Carenini	(参考訳) RSTやPDTBのような一般的な談話理論によって仮定された談話情報は、下流のNLPタスクの増加を改善し、重要な現実世界の応用と対話の肯定的な効果と相乗効果を示すことが示されている。言論を取り入れる手法はますます洗練されていくが、強固で一般的な言論構造の必要性は、通常、厳密な数のドメインで小さなデータセットで訓練された現在の言論パーサーによって十分に満たされていない。これにより、任意のタスクの予測がうるさいし、信頼できない。結果として生じる、高品質で高品質な談話ツリーの欠如は、さらなる進歩に深刻な制限をもたらす。この欠点を解消するために,潜在木誘導フレームワークを自動エンコーディング目的に拡張することにより,タスクに依存しない教師なし方式で木構造を生成する新しい手法を提案する。提案手法は,構文解析,談話解析などの木構造的目的に適用可能である。しかし,談話木を生成するのに特に難しいアノテーションプロセスのため,まず,より大きく多様な談話木バンクを生成する方法を開発した。本稿では,複数の領域における自然文の一般的な木構造を推定し,様々なタスクで有望な結果を示す。 Discourse information, as postulated by popular discourse theories, such as RST and PDTB, has been shown to improve an increasing number of downstream NLP tasks, showing positive effects and synergies of discourse with important real-world applications. While methods for incorporating discourse become more and more sophisticated, the growing need for robust and general discourse structures has not been sufficiently met by current discourse parsers, usually trained on small scale datasets in a strictly limited number of domains. This makes the prediction for arbitrary tasks noisy and unreliable. The overall resulting lack of high-quality, high-quantity discourse trees poses a severe limitation to further progress. In order to alleviate this shortcoming, we propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective. The proposed approach can be applied to any tree-structured objective, such as syntactic parsing, discourse parsing and others. However, due to the especially difficult annotation process to generate discourse trees, we initially develop a method to generate larger and more diverse discourse treebanks. 
In this paper, we infer general tree structures of natural text in multiple domains, showing promising results on a diverse set of tasks.	翻訳日:2021-05-02 19:19:40 公開日:2020-12-17
# (参考訳) 効率的な局所探索によるバランスの取れたグラフエッジ分割の強化 Enhancing Balanced Graph Edge Partition with Effective Local Search ( http://arxiv.org/abs/2012.09451v1 ) ライセンス: CC0 1.0	Zhenyu Guo, Mingyu Xiao, Yi Zhou, Dongxiang Zhang, Kian-Lee Tan	(参考訳) グラフパーティションは、並列グラフ処理システムにおいて、ワークロードのバランスを達成し、ジョブ完了時間を短縮するための重要なコンポーネントである。様々なパーティション戦略の中で、エッジパーティションは頂点パーティションよりもパワーローグラフの方が有望な性能を示しており、既存のグラフシステムではデフォルトパーティション戦略として広く採用されている。エッジセットを複数のバランスのとれた部分に分割することで、コピーされた頂点の総数を最小化するグラフエッジ分割問題は、最適化とアルゴリズムの観点から広く研究されている。本稿では,既存の手法による分割結果を改善するために,局所探索アルゴリズムについて検討する。具体的には,2つの新しい概念,すなわち調整可能なエッジとブロックを提案する。これらの結果をもとに,max-flowモデルの特性を生かした検索アルゴリズムを改良し,欲張りなヒューリスティックを開発した。アルゴリズムの性能を評価するため,まず近似品質の観点から適切な理論的解析を行う。この問題に対する既知の近似比を大幅に改善する。そして、多数のベンチマークデータセットと最先端のエッジパーティション戦略に関する広範な実験を行う。その結果,提案する局所探索フレームワークは,グラフ分割のクオリティをさらに向上させることができることがわかった。 Graph partition is a key component to achieve workload balance and reduce job completion time in parallel graph processing systems. Among the various partition strategies, edge partition has demonstrated more promising performance in power-law graphs than vertex partition and thereby has been more widely adopted as the default partition strategy by existing graph systems. The graph edge partition problem, which is to split the edge set into multiple balanced parts to minimize the total number of copied vertices, has been widely studied from the view of optimization and algorithms. In this paper, we study local search algorithms for this problem to further improve the partition results from existing methods. More specifically, we propose two novel concepts, namely adjustable edges and blocks. Based on these, we develop a greedy heuristic as well as an improved search algorithm utilizing the property of the max-flow model. To evaluate the performance of our algorithms, we first provide adequate theoretical analysis in terms of the approximation quality. We significantly improve the previously known approximation ratio for this problem. 
Then we conduct extensive experiments on a large number of benchmark datasets and state-of-the-art edge partition strategies. The results show that our proposed local search framework can further improve the quality of graph partition by a wide margin.	翻訳日:2021-05-02 19:01:54 公開日:2020-12-17
# (参考訳) 肺がん予測のための半教師付き自己訓練法 A new semi-supervised self-training method for lung cancer prediction ( http://arxiv.org/abs/2012.09472v1 ) ライセンス: CC0 1.0	Kelvin Shak, Mundher Al-Shabi, Andrea Liew, Boon Leong Lan, Wai Yee Chan, Kwan Hoong Ng, Maxine Tan	(参考訳) 背景と目的:早期肺がんの発見は,ステージ3以上の患者に対して高い死亡率を示すため重要である。 CT(Computed Tomography)スキャンから同時に結節を検出し分類する手法は比較的少ない。さらに、肺がん予測に半教師付き学習を用いた研究はほとんどない。本研究では,約4,000個のCTスキャンの総合的CT肺検診データセットを用いて,Self-training with Noisy Student法を用いて肺結節の検出と分類を行う。方法:本研究では,LUNA16,LIDC,NLSTの3つのデータセットを用いた。まず,3次元深層畳み込みニューラルネットワークモデルを用いて肺結節の検出を行った。 Maxout Local-Global Networkとして知られる分類モデルは、非ローカルネットワークを使用して、形状特徴、残留ブロック、結節テクスチャを含む局所的特徴の検出、結節変動を検出するMaxoutレイヤを含むグローバルな特徴を検出する。我々は,NLSTデータセットを用いた肺がん予測のために,Noisy Studentモデルを用いた最初のセルフトレーニングを訓練した。次に,Mixup正則化を行い,提案手法を強化し,誤ラベルに対する堅牢性を実現した。結果と結論: 我々の新しいMixup Maxout Local-Globalネットワークは、NLSTデータセットから2,005個の完全に独立したテストスキャンに対して0.87のAUCを達成する。提案手法はDeLong検定 (p = 0.0001) を用いて5%の有意水準において, 次の最高性能法を有意に上回った。本研究では,Noisy StudentとMixup正則化を組み合わせた自己学習による肺がん予測手法を提案する。 2,005個のスキャンの完全な独立データセット上で,他の手法に比べて画像数が多くても最先端の性能を達成できた。 Background and Objective: Early detection of lung cancer is crucial as it has a high mortality rate, with patients commonly presenting with the disease at stage 3 and above. There are only relatively few methods that simultaneously detect and classify nodules from computed tomography (CT) scans. Furthermore, very few studies have used semi-supervised learning for lung cancer prediction. This study presents a complete end-to-end scheme to detect and classify lung nodules using the state-of-the-art Self-training with Noisy Student method on a comprehensive CT lung screening dataset of around 4,000 CT scans. Methods: We used three datasets, namely LUNA16, LIDC and NLST, for this study. We first utilise a three-dimensional deep convolutional neural network model to detect lung nodules in the detection stage. 
The classification model known as Maxout Local-Global Network uses non-local networks to detect global features including shape features, residual blocks to detect local features including nodule texture, and a Maxout layer to detect nodule variations. We trained the first Self-training with Noisy Student model to predict lung cancer on the unlabelled NLST datasets. Then, we performed Mixup regularization to enhance our scheme and provide robustness to erroneous labels. Results and Conclusions: Our new Mixup Maxout Local-Global network achieves an AUC of 0.87 on 2,005 completely independent testing scans from the NLST dataset. Our new scheme significantly outperformed the next highest performing method at the 5% significance level using DeLong's test (p = 0.0001). This study presents a new complete end-to-end scheme to predict lung cancer using Self-training with Noisy Student combined with Mixup regularization. On a completely independent dataset of 2,005 scans, we achieved state-of-the-art performance even with more images as compared to other methods.	翻訳日:2021-05-02 18:47:30 公開日:2020-12-17
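The lung cancer entry above relies on Mixup regularization for robustness to erroneous labels. A minimal sketch of the general Mixup idea follows: two training examples and their one-hot labels are blended with a weight drawn from a Beta distribution. This is not the authors' implementation; the function name, the `alpha` default, and the toy inputs are assumptions made for illustration.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (features, one-hot label) pairs with a Beta(alpha, alpha) weight.

    Returns the mixed features, the mixed (soft) label, and the mixing weight.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Toy usage: two 2-dimensional "images" with opposite one-hot labels.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
mixed_x, mixed_y, lam = mixup([0.0, 0.0], [1.0, 0.0],
                              [1.0, 1.0], [0.0, 1.0], rng=rng)
```

Because the label becomes a soft mixture, a single mislabelled example contributes only a fraction of its wrong label to each training target, which is the intuition behind the robustness claim.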
# (参考訳) 人工知能の計算原理:ニューラルネットワークによる学習と推論 Computational principles of intelligence: learning and reasoning with neural networks ( http://arxiv.org/abs/2012.09477v1 ) ライセンス: CC BY 4.0	Abel Torres Montoya	(参考訳) 機械学習と人工知能に対する大きな成果と現在の関心にもかかわらず、汎用的で効率的な問題解決を可能にする知性理論の探求はほとんど進歩していない。この研究は、3つの原則に基づいた新しい知能の枠組みを提案し、この方向性に貢献しようとするものである。まず、学習した入力表現の生成とミラーリングの性質。第二に、学習、問題解決、想像力のための基礎的で本質的で反復的なプロセスです。第3に、抑制規則を用いた因果合成表現に対する推論機構のアドホックチューニング。これらの原則は、解釈可能性、継続的な学習、常識などを提供するシステムアプローチを生み出します。一般的な問題解決手法として、人間指向のツールとして、そして最後に、脳の情報処理のモデルとして、このフレームワークが開発されている。 Despite significant achievements and current interest in machine learning and artificial intelligence, the quest for a theory of intelligence, allowing general and efficient problem solving, has made little progress. This work tries to contribute in this direction by proposing a novel framework of intelligence based on three principles. First, the generative and mirroring nature of learned representations of inputs. Second, a grounded, intrinsically motivated and iterative process for learning, problem solving and imagination. Third, an ad hoc tuning of the reasoning mechanism over causal compositional representations using inhibition rules. Together, those principles create a systems approach offering interpretability, continuous learning, common sense and more. This framework is being developed from the following perspectives: as a general problem solving method, as a human oriented tool and finally, as a model of information processing in the brain.	翻訳日:2021-05-02 18:30:13 公開日:2020-12-17
# (参考訳) 3次元CNNのグローバルローカルアテンションを用いた弱改善された行動局在と行動認識 Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN ( http://arxiv.org/abs/2012.09542v1 ) ライセンス: CC BY 4.0	Novanto Yudistira, Muthu Subash Kavitha, Takio Kurita	(参考訳) 3D畳み込みニューラルネットワーク(3D CNN)は、ビデオシーケンスなどの3Dデータに関する空間的および時間的情報をキャプチャする。しかし,畳み込み・プーリング機構により,情報損失は避けられないように思われる。 3d cnnの視覚的な説明と分類を改善するために,(1)学習した3dresnextネットワークを用いて,局所的(グローバル局所)離散勾配を階層的に集約し,(2)注意ゲーティングネットワークを実装し,動作認識の精度を向上させる手法を提案する。提案手法は,3d cnnにおけるグローバル・ローカル・アテンション (global-local attention) と呼ばれる各層の有用性を示すことを目的としている。まず、3dresnextを訓練し、最大予測クラスに関するバックプロパゲーションを用いたアクション分類に適用する。各層の勾配と活性化はアップサンプリングされる。その後、アグリゲーションはよりニュアンス的な注意を喚起するために使われ、予測されたクラスの入力ビデオの最も重要な部分を指し示している。我々は最終位置決めに最終注意の輪郭閾値を用いる。 3dcamによる細粒度映像によるトリミング映像の空間的および時間的動作の定位評価を行った。実験の結果,提案手法は視覚的な説明と識別的注意を生じさせることがわかった。さらに,各層における注意ゲーティングによる行動認識は,ベースラインモデルよりも優れた分類結果が得られる。 3D Convolutional Neural Network (3D CNN) captures spatial and temporal information on 3D data such as video sequences. However, due to the convolution and pooling mechanism, the information loss seems unavoidable. To improve the visual explanations and classification in 3D CNN, we propose two approaches; i) aggregate layer-wise global to local (global-local) discrete gradients using trained 3DResNext network, and ii) implement attention gating network to improve the accuracy of the action recognition. The proposed approach intends to show the usefulness of every layer termed as global-local attention in 3D CNN via visual attribution, weakly-supervised action localization, and action recognition. Firstly, the 3DResNext is trained and applied for action classification using backpropagation concerning the maximum predicted class. The gradients and activations of every layer are then up-sampled. Later, aggregation is used to produce more nuanced attention, which points out the most critical part of the predicted class's input videos. We use contour thresholding of final attention for final localization. 
We evaluate spatial and temporal action localization in trimmed videos using fine-grained visual explanation via 3DCam. Experimental results show that the proposed approach produces informative visual explanations and discriminative attention. Furthermore, the action recognition via attention gating on each layer produces better classification results than the baseline model.	翻訳日:2021-05-02 18:10:53 公開日:2020-12-17
# (参考訳) 変圧器を用いた少数ショットシーケンス学習 Few-shot Sequence Learning with Transformers ( http://arxiv.org/abs/2012.09543v1 ) ライセンス: CC BY 4.0	Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam	(参考訳) 少数のトレーニング例でのみ提供される新しいタスクの学習を目的としている。本研究では,データポイントがトークン列である設定において,少数ショット学習を行い,トランスフォーマーに基づく効率的な学習アルゴリズムを提案する。最も簡単な設定では、実行すべき特定のタスクを表す入力シーケンスにトークンを付加し、ラベル付き例が少ないため、このトークンの埋め込みをオンザフライで最適化できることを示す。当社のアプローチでは,メタラーニングや少ショットラーニングの文献で現在普及しているアダプタ層や第2次微分計算といったモデルアーキテクチャの複雑な変更は必要としない。様々なタスクに対する我々のアプローチを実証し、いくつかのモデル変種およびベースラインアプローチの一般化特性を解析する。特に,構成的タスク記述子により性能が向上することを示す。実験により、我々のアプローチは、計算効率が向上しつつ、少なくとも他の手法と同様に動作することが示された。 Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples. Our approach does not require complicated changes to the model architecture such as adapter layers nor computing second order derivatives as is currently popular in the meta-learning and few-shot learning literature. We demonstrate our approach on a variety of tasks, and analyze the generalization properties of several model variants and baseline approaches. In particular, we show that compositional task descriptors can improve performance. Experiments show that our approach works at least as well as other methods, while being more computationally efficient.	翻訳日:2021-05-02 18:02:20 公開日:2020-12-17
# (参考訳) 発展途上国の疾病発生に備えるツールとしてのCOVID-19感情モニタリング COVID-19 Emotion Monitoring as a Tool to Increase Preparedness for Disease Outbreaks in Developing Regions ( http://arxiv.org/abs/2012.12184v1 ) ライセンス: CC BY 4.0	Santiago Cortes and Juan Muñoz and David Betancur and Mauricio Toro	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックは、病院の入院管理から不安やうつ病などの精神疾患の緩和など、多くの課題を引き起こした。本稿では,最先端自然言語処理モデルに基づくTwitter感情監視システムを開発することにより,後者の問題に対する解決策を提案する。このシステムは、都市のアカウント上の6つの異なる感情をモニタし、政治家や保健当局のTwitterアカウントも監視する。感情モニターを匿名で使用することで、保健当局と民間の健康保険会社は、自殺や臨床抑うつなどの問題に取り組む戦略を開発することができる。そのようなタスクのために選択されたモデルは、スペインコーパス(BETO)で事前訓練された変換器(BERT)からの双方向エンコーダ表現である。モデルは検証データセットでうまく機能した。このシステムは、コロンビアのCOVID-19のシミュレーションとデータ分析のためのwebアプリケーションの一部として、https://epidemiologia-matematica.orgで公開されている。 The COVID-19 pandemic brought many challenges, from hospital-occupation management to lock-down mental-health repercussions such as anxiety or depression. In this work, we present a solution for the latter problem by developing a Twitter emotion-monitor system based on a state-of-the-art natural-language processing model. The system monitors six different emotions on accounts in cities, as well as politicians and health-authorities Twitter accounts. With an anonymous use of the emotion monitor, health authorities and private health-insurance companies can develop strategies to tackle problems such as suicide and clinical depression. The model chosen for such a task is a Bidirectional-Encoder Representations from Transformers (BERT) pre-trained on a Spanish corpus (BETO). The model performed well on a validation dataset. The system is deployed online as part of a web application for simulation and data analysis of COVID-19, in Colombia, available at https://epidemiologia-matematica.org.	翻訳日:2021-05-02 17:46:11 公開日:2020-12-17
# (参考訳) トランスフォーマーを用いた事象連鎖の自己回帰推論 Autoregressive Reasoning over Chains of Facts with Transformers ( http://arxiv.org/abs/2012.11321v1 ) ライセンス: CC BY 4.0	Ruben Cartuyvels, Graham Spinks and Marie-Francine Moens	(参考訳) 本稿では,テキストスニペットの形で関連する事実を検索し,自然言語による質問とその答えを求めるマルチホップ説明再生のための反復推論アルゴリズムを提案する。マルチホップ推論のための複数の証拠や事実の組み合わせは、推論に必要な情報源の数が増えるとますます難しくなる。提案アルゴリズムは, コーパスからの事象の選択を自己回帰的に分解し, 以前に選択した事実に対して次の繰り返しを条件にすることで, この問題に対処する。これにより、ペアワイズな学習とランクの損失が利用できます。本手法は,TextGraphs 2019 および 2020 Shared Tasks のデータセットを用いて,説明再生のための検証を行う。このタスクの既存の作業は、独立して事実を評価するか、事実の連鎖を人工的に制限する。本手法は, 事前学習したトランスフォーマーモデルを用いて, 精度, トレーニング時間, 推論効率の面では, 従来よりも優れていることを示す。 This paper proposes an iterative inference algorithm for multi-hop explanation regeneration, that retrieves relevant factual evidence in the form of text snippets, given a natural language question and its answer. Combining multiple sources of evidence or facts for multi-hop reasoning becomes increasingly hard when the number of sources needed to make an inference grows. Our algorithm copes with this by decomposing the selection of facts from a corpus autoregressively, conditioning the next iteration on previously selected facts. This allows us to use a pairwise learning-to-rank loss. We validate our method on datasets of the TextGraphs 2019 and 2020 Shared Tasks for explanation regeneration. Existing work on this task either evaluates facts in isolation or artificially limits the possible chains of facts, thus limiting multi-hop inference. We demonstrate that our algorithm, when used with a pre-trained transformer model, outperforms the previous state-of-the-art in terms of precision, training time and inference efficiency.	翻訳日:2021-05-02 17:41:44 公開日:2020-12-17
# (参考訳) リカレントオートエンコーダからの一貫性指向潜在符号を用いた軌道塩分検出 Trajectory saliency detection using consistency-oriented latent codes from a recurrent auto-encoder ( http://arxiv.org/abs/2012.09573v1 ) ライセンス: CC BY 4.0	L. Maczyta, P. Bouthemy and O. Le Meur	(参考訳) 本稿では,ビデオシーケンスから進行動的サリエンシを検出することに関心がある。より正確には、私たちは動きに関連する給与に興味があり、時間とともに徐々に現れる可能性が高い。アラームの起動、追加処理の献身、特定のイベントの検出に関連がある。軌道は、進行的な動的塩分検出をサポートする最善の方法である。そのため、トラジェクティブ・サリエンシーについて論じる。与えられた文脈に関連する共通の動きパターンを共有する通常の軌跡から逸脱した場合、軌跡は有能である。まず、軌跡のコンパクトかつ識別的な表現が必要である。ほぼ)教師なしの学習ベースのアプローチを採用しています。再帰オートエンコーダによって推定される潜在コードは、所望の表現を提供する。さらに、オートエンコーダ損失関数を用いて、通常の(類似した)軌道の整合性を強制する。軌道コードから正規性を考慮したプロトタイプコードまでの距離は、健全な軌道を検出する手段である。我々は,合成および実軌道データセット上での軌道塩分検出手法を検証し,その異なる成分の寄与を強調する。本手法は,駅で取得した歩行者軌跡の公開データセット(alahi 2014)から得られた複数のシナリオにおいて,既存の手法に勝ることを示す。 In this paper, we are concerned with the detection of progressive dynamic saliency from video sequences. More precisely, we are interested in saliency related to motion and likely to appear progressively over time. It can be relevant to trigger alarms, to dedicate additional processing or to detect specific events. Trajectories represent the best way to support progressive dynamic saliency detection. Accordingly, we will talk about trajectory saliency. A trajectory will be qualified as salient if it deviates from normal trajectories that share a common motion pattern related to a given context. First, we need a compact while discriminative representation of trajectories. We adopt a (nearly) unsupervised learning-based approach. The latent code estimated by a recurrent auto-encoder provides the desired representation. In addition, we enforce consistency for normal (similar) trajectories through the auto-encoder loss function. The distance of the trajectory code to a prototype code accounting for normality is the means to detect salient trajectories. We validate our trajectory saliency detection method on synthetic and real trajectory datasets, and highlight the contributions of its different components. 
We show that our method outperforms existing methods on several scenarios drawn from the publicly available dataset of pedestrian trajectories acquired in a railway station (Alahi 2014).	翻訳日:2021-05-02 17:23:17 公開日:2020-12-17
# (参考訳) 非対称マルチタスク特徴学習におけるタスク不確かさ損失の負の移動 Task Uncertainty Loss Reduce Negative Transfer in Asymmetric Multi-task Feature Learning ( http://arxiv.org/abs/2012.09575v1 ) ライセンス: CC BY 4.0	Rafael Peres da Silva, Chayaporn Suphavilai, Niranjan Nagarajan	(参考訳) マルチタスク学習(MTL)は、限られた訓練データに基づいて目標タスクを学習しなければならない設定で頻繁に使用されるが、関連する補助タスクから知識を活用できる。 mtlはシングルタスク学習(stl)と比較して全体的なタスクパフォーマンスを向上させることができるが、これらの改善は負の転送(nt)を隠すことができる。非対称マルチタスク特徴学習(AMTFL)は、損失値の高いタスクが他のタスクを学習するための特徴表現に与える影響を小さくすることで、この問題に対処しようとするアプローチである。タスク損失値は必ずしも特定のタスクのモデルの信頼性を示すものではない。本稿では,2つの直交データセット(画像認識と薬理ゲノミクス)にNTの例を示し,課題間の相対的信頼度を把握し,タスク損失の重みを設定することで,この課題に対処する。提案手法は,堅牢なMTLを実現するための新しいアプローチを提供するNTを削減できることを示す。 Multi-task learning (MTL) is frequently used in settings where a target task has to be learnt based on limited training data, but knowledge can be leveraged from related auxiliary tasks. While MTL can improve task performance overall relative to single-task learning (STL), these improvements can hide negative transfer (NT), where STL may deliver better performance for many individual tasks. Asymmetric multitask feature learning (AMTFL) is an approach that tries to address this by allowing tasks with higher loss values to have smaller influence on feature representations for learning other tasks. Task loss values do not necessarily indicate reliability of models for a specific task. We present examples of NT in two orthogonal datasets (image recognition and pharmacogenomics) and tackle this challenge by using aleatoric homoscedastic uncertainty to capture the relative confidence between tasks, and set weights for task loss. Our results show that this approach reduces NT providing a new approach to enable robust MTL.	翻訳日:2021-05-02 17:22:17 公開日:2020-12-17
# (参考訳) 金融機関向け高出力ニューラルネットワークモデルによる感性データ検出 Sensitive Data Detection with High-Throughput Neural Network Models for Financial Institutions ( http://arxiv.org/abs/2012.09597v1 ) ライセンス: CC0 1.0	Anh Truong, Austin Walters, Jeremy Goodsitt	(参考訳) 名前付きエンティティ認識は多くの分野で広く研究されている。しかし, ラベル付きデータセットが公開されていないため, 金融機関における生産システムへのセンシティブな実体検出の適用は十分に検討されていない。本稿では、内部および合成データセットを用いて、非構造化データフォーマットと構造化データフォーマットの両方において、金融機関内で一般的に見られるNPI(Nonpublic Personally Identibility)情報を検出する様々な方法を評価する。 CNN,LSTM,BiLSTM-CRF,CNN-CRFといった文字レベルのニューラルネットワークモデルは,複数のデータフォーマット上でのエンティティ検出と,表付きデータセット上でのカラム単位のエンティティ予測という2つの予測タスクについて検討した。これらのモデルを,f1-score,精度,リコール,スループットに関して,実データと合成データの両方における他の標準的なアプローチと比較した。実際のデータセットには、内部構造化データと、手動タグ付きラベル付き公開eメールデータが含まれる。実験の結果,CNNモデルは精度とスループットにおいてシンプルだが有効であり,本運用環境に展開する最も適した候補モデルであることが示唆された。最後に、データ制限、データラベリング、データエンティティの固有の重複について学んだ教訓をいくつか提供する。 Named Entity Recognition has been extensively investigated in many fields. However, the application of sensitive entity detection for production systems in financial institutions has not been well explored due to the lack of publicly available, labeled datasets. In this paper, we use internal and synthetic datasets to evaluate various methods of detecting NPI (Nonpublic Personally Identifiable) information commonly found within financial institutions, in both unstructured and structured data formats. Character-level neural network models including CNN, LSTM, BiLSTM-CRF, and CNN-CRF are investigated on two prediction tasks: (i) entity detection on multiple data formats, and (ii) column-wise entity prediction on tabular datasets. We compare these models with other standard approaches on both real and synthetic data, with respect to F1-score, precision, recall, and throughput. The real datasets include internal structured data and public email data with manually tagged labels. 
Our experimental results show that the CNN model is simple yet effective with respect to accuracy and throughput and thus, is the most suitable candidate model to be deployed in the production environment(s). Finally, we provide several lessons learned on data limitations, data labelling and the intrinsic overlap of data entities.	翻訳日:2021-05-02 17:18:59 公開日:2020-12-17
# (参考訳) XAI-P-T: 説明可能な人工知能の実践から理論へ XAI-P-T: A Brief Review of Explainable Artificial Intelligence from Practice to Theory ( http://arxiv.org/abs/2012.09636v1 ) ライセンス: CC BY 4.0	Nazanin Fouladgar and Kary Främling	(参考訳) 本稿では,いくつかの基礎文献で確認された説明可能なAI(XAI)の実践的・理論的側面について報告する。 XAIの背景の表現には膨大な作業があるが、コーパスの多くは思考の個別の方向を指し示している。実践と理論の同時に文学に洞察を与えることは、この分野ではまだギャップである。これは、初期のXAI研究者の学習プロセスを促進し、経験豊富なXAI学者に明るい立場を与えるためである。ここではまずブラックボックスの説明のカテゴリに注目し,実例を示す。その後、多分野の体に理論的な説明が根拠となっているかについて議論する。最後に、今後の研究の方向性を示す。 In this work, we report the practical and theoretical aspects of Explainable AI (XAI) identified in some fundamental literature. Although there is a vast body of work on representing the XAI backgrounds, most of the corpora pinpoint a discrete direction of thought. Providing insights into the literature in practice and theory concurrently is still a gap in this field. This is important, as such a connection facilitates the learning process for early-stage XAI researchers and gives experienced XAI scholars a clear standpoint. Accordingly, we first focus on the categories of black-box explanation and give a practical example. Later, we discuss how explanation has been theoretically grounded in a body of multidisciplinary fields. Finally, some directions for future work are presented.	翻訳日:2021-05-02 16:35:38 公開日:2020-12-17
# (参考訳) 映画脚本とストーリーに応用する概念的ソフトウェア工学 Conceptual Software Engineering Applied to Movie Scripts and Stories ( http://arxiv.org/abs/2012.11319v1 ) ライセンス: CC BY 4.0	Sabah Al-Fedaghi	(参考訳) 本研究は,他の研究分野に適用可能な,ソフトウェア工学ツール,概念モデリングの別の応用について紹介する。ソフトウェア工学と他の分野との関係を強化する一つの方法は、これらの分野の特異性に対処できる概念モデリングを行う良い方法を開発することである。この研究は人文科学と社会科学に焦点を合わせ、通常は抽象機械や(抽象的)機械から離れて、より柔らかいと考えられる。具体的には、ストーリーや映画の脚本の領域におけるソフトウェア工学ツール(UMLなど)としての概念モデリングに焦点を当てます。人文科学と社会科学の研究者たちは、エンジニアが行うような形式化は使っていないかもしれないが、概念モデリングは有用だと考えている。現在のモデリング技術(UMLなど)はこのタスクで失敗する。同様の概念モデリング言語(ConMLなど)は、人文科学や社会科学を念頭に置いて提案され、あらゆるものをモデル化することができる。この研究は、ソフトウェアモデリング技術であるthinging machine(tm)が映画脚本やストーリーに適用されるこの方向のベンチャーである。本稿では,映画脚本や物語の図形的静的・動的モデルを開発するための新しいアプローチを提案する。 tmモデルダイアグラムはナラティブな談話の中立的で独立した表現であり、参加者間のコミュニケーション手段として使用できる。提示された例は、プロップの妖精のモデルによる例で、鉄道児童と実際の映画の脚本は、アプローチの可能性を示唆しているようである。 This study introduces another application of software engineering tools, conceptual modeling, which can be applied to other fields of research. One way to strengthen the relationship between software engineering and other fields is to develop a good way to perform conceptual modeling that is capable of addressing the peculiarities of these fields of study. This study concentrates on humanities and social sciences, which are usually considered softer and further away from abstractions and (abstract) machines. Specifically, we focus on conceptual modeling as a software engineering tool (e.g., UML) in the area of stories and movie scripts. Researchers in the humanities and social sciences might not use the same degree of formalization that engineers do, but they still find conceptual modeling useful. Current modeling techniques (e.g., UML) fail in this task because they are geared toward the creation of software systems. Similar Conceptual Modeling Language (e.g., ConML) has been proposed with the humanities and social sciences in mind and, as claimed, can be used to model anything. 
This study is a venture in this direction, where a software modeling technique, Thinging Machine (TM), is applied to movie scripts and stories. The paper presents a novel approach to developing diagrammatic static/dynamic models of movie scripts and stories. The TM model diagram serves as a neutral and independent representation for narrative discourse and can be used as a communication instrument among participants. The examples presented, which include examples from Propp's model of fairytales, The Railway Children, and an actual movie script, seem to point to the viability of the approach.	翻訳日:2021-05-02 16:28:45 公開日:2020-12-17
# (参考訳) RainBench: 衛星画像による世界の降水予測に向けて RainBench: Towards Global Precipitation Forecasting from Satellite Imagery ( http://arxiv.org/abs/2012.09670v1 ) ライセンス: CC BY 4.0	Christian Schroeder de Witt, Catherine Tong, Valentina Zantedeschi, Daniele De Martini, Freddie Kalaitzis, Matthew Chantry, Duncan Watson-Parris, Piotr Bilinski	(参考訳) 激しい降雨や暴風雨のような極端な降雨は、発展途上国の経済や生活を日常的に破壊する。気候変動はこの問題をさらに悪化させる。データ駆動型ディープラーニングアプローチは、そのようなイベントを緩和するために、正確な複数日予測へのアクセスを広げる可能性がある。しかし、世界の降水量予測の研究に特化したベンチマークデータセットは今のところ存在しない。本稿では,データ駆動降水予測のための新しいマルチモーダルベンチマークデータセットである RainBench を紹介する。これには、シミュレーションされた衛星データ、ERA5の再分析製品からの関連する気象データの選択、およびIMERGの降水データが含まれる。また、大規模な降水データセットを効率的に処理するライブラリである PyRain もリリースしています。本研究では,提案するデータセットを広範囲に分析し,中規模降水予測タスクのベースラインを2つ確立する。最後に,既存の気象予報手法について考察し,今後の研究方法を提案する。 Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the developing world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global precipitation forecasts. In this paper, we introduce RainBench, a new multi-modal benchmark dataset for data-driven precipitation forecasting. It includes simulated satellite data, a selection of relevant meteorological data from the ERA5 reanalysis product, and IMERG precipitation data. We also release PyRain, a library to process large precipitation datasets efficiently. We present an extensive analysis of our novel dataset and establish baseline results for two benchmark medium-range precipitation forecasting tasks. Finally, we discuss existing data-driven weather forecasting methodologies and suggest future research avenues.	翻訳日:2021-05-02 16:16:18 公開日:2020-12-17
# (参考訳) GANトレーニングにおける燃焼モード崩壊:ヘッセン固有値を用いた実証分析 Combating Mode Collapse in GAN training: An Empirical Analysis using Hessian Eigenvalues ( http://arxiv.org/abs/2012.09673v1 ) ライセンス: CC BY 4.0	Ricard Durall, Avraam Chatzimichailidis, Peter Labus and Janis Keuper	(参考訳) generative adversarial networks (gans) は最先端の成果を画像生成に提供します。しかし、非常に強力であるにもかかわらず、訓練は非常に困難である。これは特に、非常に非凸な最適化空間が多くの不安定性をもたらすために引き起こされる。中でもモード崩壊は、最も厄介なもののひとつとして際立っている。この望ましくないイベントは、モデルがデータ分散のいくつかのモードのみに適合できる場合に発生するが、その大半は無視される。本研究では,2次勾配情報を用いてモード崩壊と戦う。そのため、Hessian固有値を通して損失曲面を解析し、モード崩壊が鋭い最小値への収束と関連していることを示す。特に、$G$の固有値がモード崩壊の発生とどのように直接相関するかを観察する。最後に,これらの知見に動機づけられて,スペクトル情報を用いてモード崩壊を克服し,経験的により安定な収束特性を実現する,nudged-adam(nugan)と呼ばれる新しい最適化アルゴリズムを設計した。 Generative adversarial networks (GANs) provide state-of-the-art results in image generation. However, despite being so powerful, they still remain very challenging to train. This is in particular caused by their highly non-convex optimization space leading to a number of instabilities. Among them, mode collapse stands out as one of the most daunting ones. This undesirable event occurs when the model can only fit a few modes of the data distribution, while ignoring the majority of them. In this work, we combat mode collapse using second-order gradient information. To do so, we analyse the loss surface through its Hessian eigenvalues, and show that mode collapse is related to the convergence towards sharp minima. In particular, we observe how the eigenvalues of the $G$ are directly correlated with the occurrence of mode collapse. Finally, motivated by these findings, we design a new optimization algorithm called nudged-Adam (NuGAN) that uses spectral information to overcome mode collapse, leading to empirically more stable convergence properties.	翻訳日:2021-05-02 15:56:03 公開日:2020-12-17
# (参考訳) ベンガル語におけるヘイトスピーチ検出:データセットとそのベースライン評価 Hate Speech detection in the Bengali language: A dataset and its baseline evaluation ( http://arxiv.org/abs/2012.09686v1 ) ライセンス: CC BY 4.0	Nauros Romim, Mosahed Ahmed, Hriteshwar Talukder, Md Saiful Islam	(参考訳) YouTubeやFacebookといったソーシャルメディアサイトは、あらゆる人の生活に欠かせない存在となり、ここ数年、ソーシャルメディアのコメント欄でヘイトスピーチが急速に増えている。ソーシャルメディアwebサイトにおけるヘイトスピーチの検出は、小さな不均衡データセット、適切なモデルの発見、特徴分析方法の選択など、さまざまな課題に直面している。さらに、この問題は、金の標準ラベル付きデータセットがないため、ベンガル語話者コミュニティにとってより厳しいものである。本稿では,クラウドソーシングによってタグ付けされ,専門家によって検証された3万のユーザコメントのデータセットを提案する。コメントはすべてYouTubeとFacebookのコメントセクションから収集され、スポーツ、エンターテイメント、宗教、政治、犯罪、有名人、TikTok & Memeの7つのカテゴリーに分類される。合計50人のアノテーターが各コメントを3回アノテートし、過半数の投票が最終注釈とされた。それでも我々は,Word2VecやFastText,BengFastTextといった事前学習済みのベンガル語単語埋め込みとともに、ベースライン実験やいくつかの深層学習モデルをこのデータセット上で実施して,今後の研究機会の確保に努めてきた。実験の結果、すべてのディープラーニングモデルはうまく動作したが、SVMは87.5%の精度で最高の結果を得た。私たちの中心となる貢献は、ベンチマークデータセットを利用可能にして、ベンガルヘイトスピーチ検出の分野におけるさらなる研究を容易にすることです。 Social media sites such as YouTube and Facebook have become an integral part of everyone's life and in the last few years, hate speech in the social media comment section has increased rapidly. Detection of hate speech on social media websites faces a variety of challenges including small, imbalanced datasets, finding an appropriate model, and the choice of feature analysis method. Furthermore, this problem is more severe for the Bengali-speaking community due to the lack of gold-standard labelled datasets. This paper presents a new dataset of 30,000 user comments tagged by crowdsourcing and verified by experts. All the comments are collected from YouTube and Facebook comment sections and classified into seven categories: sports, entertainment, religion, politics, crime, celebrity and TikTok & meme. A total of 50 annotators annotated each comment three times and the majority vote was taken as the final annotation. 
Nevertheless, we have conducted baseline experiments and evaluated several deep learning models, along with extensive pre-trained Bengali word embeddings such as Word2Vec, FastText and BengFastText, on this dataset to facilitate future research opportunities. The experiments showed that although all deep learning models performed well, SVM achieved the best result with 87.5% accuracy. Our core contribution is to make this benchmark dataset available and accessible to facilitate further research in the field of Bengali hate speech detection.	翻訳日:2021-05-02 15:45:13 公開日:2020-12-17
# (参考訳) sroll3: プランク高周波楽器マップにおける大規模系統効果低減のためのニューラルネットワークアプローチ SRoll3: A neural network approach to reduce large-scale systematic effects in the Planck High Frequency Instrument maps ( http://arxiv.org/abs/2012.09702v1 ) ライセンス: CC BY 4.0	Manuel L\'opez-Radcenco, Jean-Marc Delouis and Laurent Vibert	(参考訳) 本研究では,Planck High Frequency Instrument(Planck-HFI)データに対するマップ作成と,生成したスカイマップ内の大規模な系統的効果の除去に着目し,構造化汚染源の削減を目的としたニューラルネットワークに基づくデータインバージョン手法を提案する。汚染源の除去は、異なる時空間スケール間のカップリングを生み出す局所時空間相互作用によって特徴づけられるこれらの源の構造的性質によって可能となる。これらの結合を利用して最適な低次元表現を学習し、汚染源除去と地図作成の目的に最適化し、堅牢で効果的なデータインバージョンを実現する手段として、ニューラルネットワークの探索に焦点をあてる。提案手法の多種多様な変種を開発し,物理学的インフォームド制約とトランスファー学習技術の導入を検討する。さらに、専門家の知識を教師なしのネットワークトレーニングアプローチに統合するために、データ拡張技術を活用することに注力する。提案手法をPlanck-HFI 545 GHz Far Side Lobe シミュレーションデータに適用し,部分的,ギャップ満載,一貫性のないデータセットを含む理想的,非理想的事例を考察し,ニューラルネットワークに基づく次元性低減の可能性を示す。また,本論文では,実プランクhfi 857 ghzデータに適用し,汚染除去性能の面で最大1桁の利益を報告し,構造的汚染源を正確にモデル化・捕捉するための提案手法の妥当性を示す。本研究で開発された手法は,SRollアルゴリズムの新バージョン(SRoll3)に統合され,SRoll3 857 GHz検出器マップをコミュニティに公開する。 In the present work, we propose a neural network based data inversion approach to reduce structured contamination sources, with a particular focus on the mapmaking for Planck High Frequency Instrument (Planck-HFI) data and the removal of large-scale systematic effects within the produced sky maps. The removal of contamination sources is rendered possible by the structured nature of these sources, which is characterized by local spatiotemporal interactions producing couplings between different spatiotemporal scales. We focus on exploring neural networks as a means of exploiting these couplings to learn optimal low-dimensional representations, optimized with respect to the contamination source removal and mapmaking objectives, to achieve robust and effective data inversion. We develop multiple variants of the proposed approach, and consider the inclusion of physics informed constraints and transfer learning techniques. 
Additionally, we focus on exploiting data augmentation techniques to integrate expert knowledge into an otherwise unsupervised network training approach. We validate the proposed method on Planck-HFI 545 GHz Far Side Lobe simulation data, considering ideal and non-ideal cases involving partial, gap-filled and inconsistent datasets, and demonstrate the potential of the neural network based dimensionality reduction to accurately model and remove large-scale systematic effects. We also present an application to real Planck-HFI 857 GHz data, which illustrates the relevance of the proposed method to accurately model and capture structured contamination sources, with reported gains of up to one order of magnitude in terms of contamination removal performance. Importantly, the methods developed in this work are to be integrated in a new version of the SRoll algorithm (SRoll3), and we describe here SRoll3 857 GHz detector maps that will be released to the community.	翻訳日:2021-05-02 15:17:55 公開日:2020-12-17
# (参考訳) Deep Molecular Dreaming: Inverse Machine Learning for De-novo Molecular Design and Interpretability with surjective representations Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations ( http://arxiv.org/abs/2012.09712v1 ) ライセンス: CC BY 4.0	Cynthia Shen, Mario Krenn, Sagi Eppel, Alan Aspuru-Guzik	(参考訳) コンピュータによる機能分子のデノボ設計は、今日の化学情報学における最も顕著な課題の1つである。その結果、人工知能の分野からの生成的および進化的逆設計が急速に発展し、特定の化学的性質のために分子を最適化することを目指している。これらのモデルは「間接的に」化学空間を探索し、潜伏空間、政策、分布を学習したり、分子の集団に突然変異を施すことで探索する。しかし、SMILESの代替である分子のSELFIES文字列表現の最近の発展により、他の潜在的な技術が考えられるようになった。そこで本研究では,SELFIESに基づく直進勾配に基づく分子最適化手法PASITHEAを提案する。 PASITHEAは、ニューラルネットワークの学習プロセスを直接反転させることで勾配の利用を利用する。効果的に、これはある性質に最適化された分子変種を生成することができる逆回帰モデルを形成する。結果は予備的ではあるが,パシテアの生存可能性を明確に示し,逆訓練中の選択された属性の分布の変化を観察した。インセプション主義の驚くべき特性は、モデルがトレーニングした化学空間に対する理解を直接調査できることである。 PASITHEAをより大きなデータセット、分子、さらに複雑な性質に拡張することは、新しい機能分子の設計と機械学習モデルの解釈と説明につながると期待している。 Computer-based de-novo design of functional molecules is one of the most prominent challenges in cheminformatics today. As a result, generative and evolutionary inverse designs from the field of artificial intelligence have emerged at a rapid pace, with aims to optimize molecules for a particular chemical property. These models 'indirectly' explore the chemical space; by learning latent spaces, policies, distributions or by applying mutations on populations of molecules. However, the recent development of the SELFIES string representation of molecules, a surjective alternative to SMILES, have made possible other potential techniques. Based on SELFIES, we therefore propose PASITHEA, a direct gradient-based molecule optimization that applies inceptionism techniques from computer vision. PASITHEA exploits the use of gradients by directly reversing the learning process of a neural network, which is trained to predict real-valued chemical properties. 
Effectively, this forms an inverse regression model, which is capable of generating molecular variants optimized for a certain property. Although our results are preliminary, we observe a shift in distribution of a chosen property during inverse-training, a clear indication of PASITHEA's viability. A striking property of inceptionism is that we can directly probe the model's understanding of the chemical space it was trained on. We expect that extending PASITHEA to larger datasets, molecules and more complex properties will lead to advances in the design of new functional molecules as well as the interpretation and explanation of machine learning models.	翻訳日:2021-05-02 14:39:40 公開日:2020-12-17
# (参考訳) FERMI FELを用いた粒子加速器制御のためのモデルフリー・ベイズ組立モデルに基づく深部強化学習 Model-free and Bayesian Ensembling Model-based Deep Reinforcement Learning for Particle Accelerator Control Demonstrated on the FERMI FEL ( http://arxiv.org/abs/2012.09737v1 ) ライセンス: CC BY 4.0	Simon Hirlaender, Niky Bruchon	(参考訳) 強化学習は加速器制御において大きな可能性を秘めている。本研究の主な目的は, 加速器物理問題に対する運用レベルで, このアプローチをどのように活用できるかを示すことである。モデルなし強化学習がいくつかの領域で成功したにもかかわらず、サンプル効率は依然としてボトルネックであり、モデルベース手法によって包含される可能性がある。 ferMI FELシステムの強度最適化に応用したモデルベースとモデルフリー強化学習を比較した。モデルベースアプローチは,高い表現力とサンプル効率を示すが,モデルフリー手法の漸近的な性能は若干優れている。モデルベースアルゴリズムは不確実性認識モデルを用いてDYNA形式で実装され、モデルフリーアルゴリズムはカスタマイズされた深層Q-ラーニングに基づいている。いずれの場合もアルゴリズムが実装され、加速器制御問題におけるノイズロバスト性が増大する。コードはhttps://github.com/MathPhysSim/FERMI_RL_Paperで公開されている。 Reinforcement learning holds tremendous promise in accelerator controls. The primary goal of this paper is to show how this approach can be utilised on an operational level on accelerator physics problems. Despite the success of model-free reinforcement learning in several domains, sample-efficiency still is a bottle-neck, which might be encompassed by model-based methods. We compare well-suited purely model-based to model-free reinforcement learning applied to the intensity optimisation on the FERMI FEL system. We find that the model-based approach demonstrates higher representational power and sample-efficiency, while the asymptotic performance of the model-free method is slightly superior. The model-based algorithm is implemented in a DYNA-style using an uncertainty aware model, and the model-free algorithm is based on tailored deep Q-learning. In both cases, the algorithms were implemented in a way, which presents increased noise robustness as omnipresent in accelerator control problems. Code is released in https://github.com/MathPhysSim/FERMI_RL_Paper.	翻訳日:2021-05-02 14:30:30 公開日:2020-12-17
# (参考訳) MAGNet:ディープマルチエージェント強化学習のためのマルチエージェントグラフネットワーク MAGNet: Multi-agent Graph Network for Deep Multi-agent Reinforcement Learning ( http://arxiv.org/abs/2012.09762v1 ) ライセンス: CC BY 4.0	Aleksandra Malysheva, Daniel Kudenko, Aleksei Shpilman	(参考訳) 近年、深層強化学習は複雑な単一エージェントタスクにおいて強い成功をおさめており、最近ではマルチエージェントドメインにもこのアプローチが適用されている。本稿では,自己注意機構によって得られた環境の関連性グラフ表現とメッセージ生成手法を用いたマルチエージェント強化学習のための新しい手法であるMAGNetを提案する。 MAGNetのアプローチを合成の捕食者-被食者マルチエージェント環境とポンマーマンゲームに適用し、マルチエージェントディープQ-Networks(MADQN)、マルチエージェントディープ決定論的ポリシーグラディエント(MADDPG)、QMIXなど、最先端のMARLソリューションを著しく上回っていることを示す。 Over recent years, deep reinforcement learning has shown strong successes in complex single-agent tasks, and more recently this approach has also been applied to multi-agent domains. In this paper, we propose a novel approach, called MAGNet, to multi-agent reinforcement learning that utilizes a relevance graph representation of the environment obtained by a self-attention mechanism, and a message-generation technique. We applied our MAGNet approach to the synthetic predator-prey multi-agent environment and the Pommerman game and the results show that it significantly outperforms state-of-the-art MARL solutions, including Multi-agent Deep Q-Networks (MADQN), Multi-agent Deep Deterministic Policy Gradient (MADDPG), and QMIX.	翻訳日:2021-05-02 13:41:29 公開日:2020-12-17
# (参考訳) 化学空間を探究する好奇心 -深層分子強化学習への内在的報酬- Curiosity in exploring chemical space: Intrinsic rewards for deep molecular reinforcement learning ( http://arxiv.org/abs/2012.11293v1 ) ライセンス: CC BY 4.0	Luca A. Thiede, Mario Krenn, AkshatKumar Nigam, Alan Aspuru-Guzik	(参考訳) コンピュータ支援による分子の設計は、薬物や物質発見の分野をディスラプトする可能性がある。機械学習、特にディープラーニングは、この分野が急速に発展しているトピックである。強化学習は、事前知識なしで分子設計を可能にするため、特に有望なアプローチである。しかし,強化学習エージェントを用いた場合,検索空間は広く,効率的な探索が望ましい。本研究では,効率的な探索を支援するアルゴリズムを提案する。このアルゴリズムは、キュリオシティとして知られる概念にインスパイアされている。興味のあるエージェントがより優れた分子を見つけるための3つのベンチマークを示す。これは、自身のモチベーションから化学空間を探索できる強化学習エージェントのための、エキサイティングな新しい研究方向を示している。これは、人類がこれまで考えていなかった予期せぬ新しい分子を生み出す可能性がある。 Computer-aided design of molecules has the potential to disrupt the field of drug and material discovery. Machine learning, and deep learning, in particular, have been topics where the field has been developing at a rapid pace. Reinforcement learning is a particularly promising approach since it allows for molecular design without prior knowledge. However, the search space is vast and efficient exploration is desirable when using reinforcement learning agents. In this study, we propose an algorithm to aid efficient exploration. The algorithm is inspired by a concept known in the literature as curiosity. We show on three benchmarks that a curious agent finds better performing molecules. This indicates an exciting new research direction for reinforcement learning agents that can explore the chemical space out of their own motivation. This has the potential to eventually lead to unexpected new molecules that no human has thought about so far.	翻訳日:2021-05-02 13:32:13 公開日:2020-12-17
# (参考訳) 回転バウンディングボックスの円形損失関数を用いた終端物体追跡 End-to-end Deep Object Tracking with Circular Loss Function for Rotated Bounding Box ( http://arxiv.org/abs/2012.09771v1 ) ライセンス: CC BY 4.0	Vladislav Belyaev, Aleksandra Malysheva, Aleksei Shpilman	(参考訳) オブジェクト追跡のタスクは、自動運転、インテリジェントな監視、ロボット工学など、多くのアプリケーションで不可欠です。このタスクは、ビデオストリーム内のオブジェクトへのバウンディングボックスの割り当てを伴い、最初のフレームのオブジェクトのバウンディングボックスのみを与えられる。 2015年、軸に沿ったものの拡張として回転バウンディングボックスを導入した新しいタイプのビデオオブジェクト追跡(VOT)データセットが作成された。本研究では,Transformer Multi-Head Attentionアーキテクチャに基づくエンドツーエンドのディープラーニング手法を提案する。また,境界ボックスの重なりと向きを考慮に入れた新しいタイプの損失関数を提案する。円形損失関数(DOTCL)を用いたDeep Object Trackingモデルでは,現在の最先端のディープラーニングモデルよりも堅牢性が大幅に向上している。また、期待平均オーバーラップ(EAO)メトリックの観点から、VOT2018データセットの最先端のオブジェクトトラッキング手法よりも優れています。 The task of object tracking is vital in numerous applications such as autonomous driving, intelligent surveillance, robotics, etc. This task entails assigning a bounding box to an object in a video stream, given only the bounding box for that object on the first frame. In 2015, a new type of video object tracking (VOT) dataset was created that introduced rotated bounding boxes as an extension of axis-aligned ones. In this work, we introduce a novel end-to-end deep learning method based on the Transformer Multi-Head Attention architecture. We also present a new type of loss function, which takes into account the bounding box overlap and orientation. Our Deep Object Tracking model with Circular Loss Function (DOTCL) shows a considerable improvement in terms of robustness over current state-of-the-art end-to-end deep learning models. It also outperforms state-of-the-art object tracking methods on the VOT2018 dataset in terms of the expected average overlap (EAO) metric.	翻訳日:2021-05-02 13:23:16 公開日:2020-12-17
# (参考訳) 野生におけるハンドオブジェクトインタラクションの再構築 Reconstructing Hand-Object Interactions in the Wild ( http://arxiv.org/abs/2012.09856v1 ) ライセンス: CC BY 4.0	Zhe Cao, Ilija Radosavovic, Angjoo Kanazawa, Jitendra Malik	(参考訳) 本研究では,野生におけるハンドオブジェクトインタラクションの再構築について検討する。この問題の主な課題は、適切な3Dラベル付きデータの欠如である。この問題を解決するために,直接3D監視を必要としない最適化手法を提案する。私たちが採用する一般的な戦略は,利用可能なすべての関連データ(2dバウンディングボックス,2dハンドキーポイント,2dインスタンスマスク,3dオブジェクトモデル,3d in-the-lab mocap)を活用して,3d再構成の制約を提供することです。手と物体を個別に最適化するのではなく、手オブジェクトの接触、衝突、閉塞に基づく追加の制約を課すことができるように、それらを共同で最適化する。提案手法は,EPIC Kitchens と 100 Days of Hands のデータセットから,様々な対象カテゴリにまたがる挑戦的なデータに対して,魅力的な再構築を行う。定量的に,我々のアプローチは,ground truth 3d アノテーションが利用可能なラボ環境における既存のアプローチと好適に比較できることを実証する。 In this work we explore reconstructing hand-object interactions in the wild. The core challenge of this problem is the lack of appropriate 3D labeled data. To overcome this issue, we propose an optimization-based procedure which does not require direct 3D supervision. The general strategy we adopt is to exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D object models, 3D in-the-lab MoCap) to provide constraints for the 3D reconstruction. Rather than optimizing the hand and object individually, we optimize them jointly which allows us to impose additional constraints based on hand-object contact, collision, and occlusion. Our method produces compelling reconstructions on the challenging in-the-wild data from the EPIC Kitchens and the 100 Days of Hands datasets, across a range of object categories. Quantitatively, we demonstrate that our approach compares favorably to existing approaches in the lab settings where ground truth 3D annotations are available.	翻訳日:2021-05-02 11:18:20 公開日:2020-12-17
# (参考訳) FantastIC4: 4bit-Compact Multilayer Perceptronの効率的な動作のためのハードウェアソフトウェア共同設計手法 FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons ( http://arxiv.org/abs/2012.11331v1 ) ライセンス: CC BY 4.0	Simon Wiedemann, Suhas Shivapakash, Pablo Wiedemann, Daniel Becking, Wojciech Samek, Friedel Gerfers, Thomas Wiegand	(参考訳) ディープラーニングモデルを"エッジ"にデプロイする需要が高まっているため、非常に厳密で限られたリソース制約の中で最先端のモデルを実行できる技術を開発することが最重要である。本研究では,完全接続層に基づくディープニューラルネットワーク(DNN)の高効率実行エンジンを実現するためのソフトウェアハードウェア最適化パラダイムを提案する。提案手法は,高い予測性能を有する多層パーセプトロン(MLP)の面積削減と電力要求の低減を目的とした圧縮を中心にしている。まず、ファンタスティック4と呼ばれる新しいハードウェアアーキテクチャを設計し、(1)完全連結層の複数のコンパクト表現の効率的なオンチップ実行をサポートし、(2)推論に必要な乗算器の数をわずか4(名前)まで最小化する。さらに、ファンタスティック4上での効率的な実行のためにモデルを改善可能にするため、4ビット量子化に頑健で、同時に圧縮性が高い新しいエントロピー拘束トレーニング手法を提案する。実験結果から,仮想超音速FPGA XCVU440デバイス実装において,総消費電力3.6Wの2.45TOPSのスループットを実現し,22nmプロセスASIC版では20.17TOPS/Wのスループットを実現することができた。 Google Speech Command(GSC)データセット用に設計された他の最先端アクセラレータと比較すると、スループットに関しては51$\times$、面積効率(GOPS/W)では145$\times$がよい。 With the growing demand for deploying deep learning models to the "edge", it is paramount to develop techniques that allow to execute state-of-the-art models within very tight and limited resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine of deep neural networks (DNNs) that are based on fully-connected layers. Our approach is centred around compression as a means for reducing the area as well as power requirements of, concretely, multilayer perceptrons (MLPs) with high predictive performances. Firstly, we design a novel hardware architecture named FantastIC4, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the required number of multipliers for inference down to only 4 (thus the name). 
Moreover, in order to make the models amenable for efficient execution on FantastIC4, we introduce a novel entropy-constrained training method that renders them to be robust to 4bit quantization and highly compressible in size simultaneously. The experimental results show that we can achieve throughputs of 2.45 TOPS with a total power consumption of 3.6W on a Virtual Ultrascale FPGA XCVU440 device implementation, and achieve a total power efficiency of 20.17 TOPS/W on a 22nm process ASIC version. When compared to the other state-of-the-art accelerators designed for the Google Speech Command (GSC) dataset, FantastIC4 is better by 51$\times$ in terms of throughput and 145$\times$ in terms of area efficiency (GOPS/W).	翻訳日:2021-05-02 11:03:29 公開日:2020-12-17
# (参考訳) 注意に基づくイメージアップサンプリング Attention-based Image Upsampling ( http://arxiv.org/abs/2012.09904v1 ) ライセンス: CC BY 4.0	Souvik Kundu, Hesham Mostafa, Sharath Nittur Sridhar, Sairam Sundaresan	(参考訳) 畳み込み層は、コンピュータビジョンにおける多くのディープニューラルネットワークソリューションの不可欠な部分である。近年の研究では、標準畳み込み操作を自己注意に基づくメカニズムに置き換えることで、画像分類や物体検出タスクの性能が改善されている。本稿では,別の正準演算であるstrided transposed convolutionをアテンション機構で置き換える方法について述べる。特徴写像の空間的次元を増加/上昇させるので,新しい注意に基づく操作注意に基づくアップサンプリングと呼ぶ。単一画像の超解像とジョイント画像のアップサンプリングタスクの実験を通じて,従来のアップサンプリング手法よりも,より少ないパラメータを用いて,ストレート変換畳み込みや適応フィルタを基本としたアテンションベースアップサンプリングを一貫して上回っていることを示す。注意係数と注意目標の計算に別個のソースを使用できるアテンション機構の固有の柔軟性は、複数の画像モダリティからの情報を融合する際に、アテンションベースアップサンプリングが自然な選択であることを示す。 Convolutional layers are an integral part of many deep neural network solutions in computer vision. Recent work shows that replacing the standard convolution operation with mechanisms based on self-attention leads to improved performance on image classification and object detection tasks. In this work, we show how attention mechanisms can be used to replace another canonical operation: strided transposed convolution. We term our novel attention-based operation attention-based upsampling since it increases/upsamples the spatial dimensions of the feature maps. Through experiments on single image super-resolution and joint-image upsampling tasks, we show that attention-based upsampling consistently outperforms traditional upsampling methods based on strided transposed convolution or based on adaptive filters while using fewer parameters. We show that the inherent flexibility of the attention mechanism, which allows it to use separate sources for calculating the attention coefficients and the attention targets, makes attention-based upsampling a natural choice when fusing information from multiple image modalities.	翻訳日:2021-05-02 10:34:33 公開日:2020-12-17
# (参考訳) 病理的特徴を用いた不確実性処理 : 高リスク癌生存法開発のためのプライマリケアデータの活用の可能性 Handling uncertainty using features from pathology: opportunities in primary care data for developing high risk cancer survival methods ( http://arxiv.org/abs/2012.09976v1 ) ライセンス: CC BY 4.0	Goce Ristanoski, Jon Emery, Javiera Martinez-Gutierrez, Damien Mccarthy, Uwe Aickelin	(参考訳) 2019年、オーストラリア人14万4千人以上ががんと診断された。大多数は、スクリーニングプログラムが存在する癌であっても、まず症状を伴ってGPを受診する。プライマリケアにおけるがんの診断は、がん症状の非特異的な性質と頻度が低いため困難である。がんの症状の疫学と,プライマリケアデータから患者の医療史の提示パターンを理解することは,早期発見とがん予後を改善する上で重要であると考えられる。過去の患者の医療データは不完全、不規則、または欠如である可能性があるため、新しい診断に患者の歴史を使おうとする際、さらなる課題が生じる。本研究の目的は,GPが利用できる患者の病理検査歴の機会を探ることであり,まずは頻繁に注文される全血球計算の結果に焦点をあて,将来の高リスク癌の予後および治療成績との関連性を検討することである。 2年以内に癌を生存しないリスクのある患者に焦点をあてて,過去の病理検査結果が癌の予後を予測するのに利用できる特徴の導出につながるかを検討した。この最初の研究は肺癌患者に焦点を当てているが、その方法論は他の種類のがんや他の医療記録に応用できる。血液学的指標は,患者歴が不完全あるいは不明瞭な症例においても,癌リスクと生存率の予測に関連性のある特徴を生じさせるのに有用であると考えられた。以上の結果から,高リスク癌診断のための病理検査データの利用が強く示唆され,同様の目的で,新たな病理指標や他のプライマリケアデータセットの利用がさらに促進された。 More than 144 000 Australians were diagnosed with cancer in 2019. The majority will first present to their GP symptomatically, even for cancer for which screening programs exist. Diagnosing cancer in primary care is challenging due to the non-specific nature of cancer symptoms and its low prevalence. Understanding the epidemiology of cancer symptoms and patterns of presentation in patients' medical histories from primary care data could be important to improve earlier detection and cancer outcomes. As past medical data about a patient can be incomplete, irregular or missing, this creates additional challenges when attempting to use the patient's history for any new diagnosis. Our research aims to investigate the opportunities in a patient's pathology history available to a GP, initially focused on the results within the frequently ordered full blood count to determine relevance to a future high-risk cancer prognosis, and treatment outcome. 
We investigated how past pathology test results can lead to deriving features that can be used to predict cancer outcomes, with emphasis on patients at risk of not surviving the cancer within a 2-year period. This initial work focuses on patients with lung cancer, although the methodology can be applied to other types of cancer and other data within the medical record. Our findings indicate that even in cases of incomplete or obscure patient history, hematological measures can be useful in generating features relevant for predicting cancer risk and survival. The results strongly support using pathology test data for potential high-risk cancer diagnosis, and making greater use of additional pathology metrics or other primary care datasets for similar purposes.	翻訳日:2021-05-02 09:14:50 公開日:2020-12-17
# (参考訳) コミュニティ分析のための二項尾 Binomial Tails for Community Analysis ( http://arxiv.org/abs/2012.09968v1 ) ライセンス: CC BY 4.0	Omid Madani, Thanh Ngo, Weifei Zeng, Sai Ankith Averine, Sasidhar Evuru, Varun Malhotra, Shashidhar Gandham, Navindra Yadav	(参考訳) ネットワークにおけるコミュニティ発見の重要な課題は、結果の重要性と、生成した候補グループのロバストなランキングを評価することである。多くの場合、多くの候補コミュニティが発見され、アナリストの時間を最も有望で有望な発見に集中することが重要です。二項モデルを用いて,末尾確率から導出した簡便なグループスコアリング関数を開発した。合成および多数の実世界のデータに関する実験は、二項スコアリングがコンダクタンスのような他の安価なスコアリング関数よりも堅牢なランク付けにつながることを示す。さらに、検出されたグループをフィルタリングしラベル付けするために使用できる信頼値(p$-values)を得る。我々の分析はアプローチの様々な特性に光を当てた。二項尾は単純で汎用的であり、コミュニティ分析の他の2つの応用として、コミュニティメンバーシップの度合い(それがグループスコア機能をもたらす)と、コミュニティが引き起こすグラフにおける重要なエッジの発見について述べる。 An important task of community discovery in networks is assessing significance of the results and robust ranking of the generated candidate groups. Often in practice, numerous candidate communities are discovered, and focusing the analyst's time on the most salient and promising findings is crucial. We develop simple efficient group scoring functions derived from tail probabilities using binomial models. Experiments on synthetic and numerous real-world data provides evidence that binomial scoring leads to a more robust ranking than other inexpensive scoring functions, such as conductance. Furthermore, we obtain confidence values ($p$-values) that can be used for filtering and labeling the discovered groups. Our analyses shed light on various properties of the approach. The binomial tail is simple and versatile, and we describe two other applications for community analysis: degree of community membership (which in turn yields group-scoring functions), and the discovery of significant edges in the community-induced graph.	翻訳日:2021-05-02 08:34:14 公開日:2020-12-17
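A minimal sketch of how a binomial tail probability could score a candidate group. It assumes a simple null model that is not taken from the paper: each possible node pair inside the group is an edge independently with probability equal to the overall edge density of the network. The function names `binomial_tail` and `group_score` are illustrative only.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P[X >= k] for X ~ Binomial(n, p), summed exactly."""
    if k <= 0:
        return 1.0
    # Sum the probability mass from k up to n (empty sum if k > n).
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )


def group_score(internal_edges: int, group_size: int, edge_density: float) -> float:
    """Score a group by -log10 of its tail probability (assumed null model).

    A larger score means the observed number of internal edges is more
    surprising under the independent-edge assumption.
    """
    n_pairs = group_size * (group_size - 1) // 2
    tail = binomial_tail(internal_edges, n_pairs, edge_density)
    return -math.log10(tail) if tail > 0.0 else math.inf
```

Under these assumptions the tail probability itself plays the role of a $p$-value for filtering, and the score gives a ranking in which denser-than-expected groups come first.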
# 自然言語処理における持続的生涯学習 : 調査 Continual Lifelong Learning in Natural Language Processing: A Survey ( http://arxiv.org/abs/2012.09823v1 ) ライセンス: Link先を確認	Magdalena Biesialska and Katarzyna Biesialska and Marta R. Costa-juss\`a	(参考訳) 連続学習(continual learning, cl)は,情報システムが時間を越えた連続的なデータストリームから学ぶことを可能にする。しかし,既存のディープラーニングアーキテクチャでは,従来の知識を忘れずに新しいタスクを学習することは困難である。さらに、CLは言語学習において特に困難であり、自然言語は曖昧である:それは離散的で構成的であり、その意味は文脈に依存している。本研究では,様々なNLPタスクのレンズを通してCLの問題を考察する。本調査では,CLにおける主な課題とニューラルネットワークモデルに適用された現在の手法について論じる。また,NLPにおける既存のCL評価手法とデータセットの批判的レビューを行う。最後に,今後の研究方向性について概観する。 Continual learning (CL) aims to enable information systems to learn from a continuous data stream across time. However, it is difficult for existing deep learning architectures to learn a new task without largely forgetting previously acquired knowledge. Furthermore, CL is particularly challenging for language learning, as natural language is ambiguous: it is discrete, compositional, and its meaning is context-dependent. In this work, we look at the problem of CL through the lens of various NLP tasks. Our survey discusses major challenges in CL and current methods applied in neural network models. We also provide a critical review of the existing CL evaluation methods and datasets in NLP. Finally, we present our outlook on future research directions.	翻訳日:2021-05-02 07:42:34 公開日:2020-12-17
# 定数メモリによる極端に長いシーケンスの分類とマルウェア検出への応用 Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection ( http://arxiv.org/abs/2012.09390v1 ) ライセンス: Link先を確認	Edward Raff, William Fleshman, Richard Zak, Hyrum S. Anderson, Bobby Filar, Mark McLean	(参考訳) 機械学習における最近の研究は、ますます大きな入力に取り組んでおり、サイバーセキュリティは特に極端な長さのシーケンス分類問題を提示している。 Windows実行可能マルウェア検出の場合、入力は$100$ MBを超えることがあり、これは$T=100,000,000$ステップの時系列に対応する。現在、そのようなタスクを処理するための最も近いアプローチは、最大$T=2,000,000$ステップを処理できる畳み込みニューラルネットワークであるMalConvである。 CNNの$\mathcal{O}(T)$メモリは、CNNのマルウェアへのさらなる適用を妨げている。本研究では,時間的最大値プーリングに対する新たなアプローチを開発し,必要なメモリを列長$T$に不変にする。これにより、MalConvのメモリ効率が$116\times$向上し、元のデータセットでのトレーニングが最大$25.8\times$高速になり、MalConvへの入力長制限が取り除かれた。我々は,MalConvアーキテクチャを改良するために,新たなGlobal Channel Gating設計を導入し,従来のMalConv CNNに欠ける機能である1億のタイムステップにわたる機能インタラクションを効率的に学習する機構について検討した。私たちの実装はhttps://github.com/NeuromorphicComputationResearchProgram/MalConv2で確認できます。 Recent works within machine learning have been tackling inputs of ever-increasing size, with cybersecurity presenting sequence classification problems of particularly extreme lengths. In the case of Windows executable malware detection, inputs may exceed $100$ MB, which corresponds to a time series with $T=100,000,000$ steps. To date, the closest approach to handling such a task is MalConv, a convolutional neural network capable of processing up to $T=2,000,000$ steps. The $\mathcal{O}(T)$ memory of CNNs has prevented further application of CNNs to malware. In this work, we develop a new approach to temporal max pooling that makes the required memory invariant to the sequence length $T$. This makes MalConv $116\times$ more memory efficient, and up to $25.8\times$ faster to train on its original dataset, while removing the input length restrictions to MalConv. 
We re-invest these gains into improving the MalConv architecture by developing a new Global Channel Gating design, giving us an attention mechanism capable of learning feature interactions across 100 million time steps in an efficient manner, a capability the original MalConv CNN lacks. Our implementation can be found at https://github.com/NeuromorphicComputationResearchProgram/MalConv2	翻訳日:2021-05-02 07:42:25 公開日:2020-12-17
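To illustrate why a temporal max over a very long sequence need not hold the whole sequence in memory, here is a minimal sketch of a running per-channel maximum computed chunk by chunk. It covers only this forward-pass idea and is not the authors' implementation; how the paper handles training is not reproduced. The name `streaming_max_pool` and the chunked input layout are assumptions made for illustration.

```python
from typing import Iterable, List, Sequence


def streaming_max_pool(chunks: Iterable[Sequence[Sequence[float]]]) -> List[float]:
    """Per-channel maximum over time, consuming one chunk at a time.

    Each chunk is a sequence of time steps; each time step is a sequence of
    channel activations. Only the running maximum (one value per channel) is
    kept, so memory does not grow with the total number of time steps.
    """
    best: List[float] = []
    for chunk in chunks:
        for step in chunk:
            if not best:
                # First time step seen: initialise the running maximum.
                best = list(step)
            else:
                best = [max(b, v) for b, v in zip(best, step)]
    return best
```

Passing a generator that yields chunks lazily (for example, windows read from a file) keeps peak memory bounded by the chunk size and the number of channels.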
# マルコフ等価DAGのカウントとサンプリングのための多項式時間アルゴリズム Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs ( http://arxiv.org/abs/2012.09679v1 ) ライセンス: Link先を確認	Marcel Wien\"obst and Max Bannach and Maciej Li\'skiewicz	(参考訳) マルコフ同値類からの有向非巡回グラフ(DAG)の計数と一様サンプリングは、グラフィカル因果解析の基本的な課題である。本稿では,これらの課題を多項式時間で実行可能であることを示し,この領域における長年のオープン問題を解く。我々のアルゴリズムは効果的で容易に実装できる。実験結果から, アルゴリズムは最先端手法よりも優れていた。 Counting and uniform sampling of directed acyclic graphs (DAGs) from a Markov equivalence class are fundamental tasks in graphical causal analysis. In this paper, we show that these tasks can be performed in polynomial time, solving a long-standing open problem in this area. Our algorithms are effective and easily implementable. Experimental results show that the algorithms significantly outperform state-of-the-art methods.	翻訳日:2021-05-02 07:42:03 公開日:2020-12-17
# 生存分析としての研究の再現性 Research Reproducibility as a Survival Analysis ( http://arxiv.org/abs/2012.09932v1 ) ライセンス: Link先を確認	Edward Raff	(参考訳) 機械学習コミュニティでは、再現性危機に直面しているという懸念が高まっています。多くの人がこの問題に取り組み始めていますが、私たちは、再現性の問題を本質的なバイナリプロパティとして扱うことに気付いています。そこで我々は,論文の再現可能性のモデル化を生存分析問題として検討する。我々は、この視点が再現可能な研究のメタ科学的疑問のより正確なモデルであることを論じ、生存分析がいかにして、先行する縦断的なデータを説明するための新たな洞察を引き出すかを示す。データとコードはhttps://github.com/EdwardRaff/Research-Reproducibility-Survival-Analysisで確認できる。 There has been increasing concern within the machine learning community that we are in a reproducibility crisis. As many have begun to work on this problem, all work we are aware of treat the issue of reproducibility as an intrinsic binary property: a paper is or is not reproducible. Instead, we consider modeling the reproducibility of a paper as a survival analysis problem. We argue that this perspective represents a more accurate model of the underlying meta-science question of reproducible research, and we show how a survival analysis allows us to draw new insights that better explain prior longitudinal data. The data and code can be found at https://github.com/EdwardRaff/Research-Reproducibility-Survival-Analysis	翻訳日:2021-05-02 07:41:57 公開日:2020-12-17
# 変圧器に基づく物体検出に向けて Toward Transformer-Based Object Detection ( http://arxiv.org/abs/2012.09958v1 ) ライセンス: Link先を確認	Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk	(参考訳) トランスフォーマーは、大量のデータに基づいて事前訓練を行い、微調整によってより小さな特定のタスクに移行する能力のため、自然言語処理において支配的なモデルとなっている。 Vision Transformerは、純粋なトランスフォーマーモデルを直接入力として画像に適用する最初の主要な試みであり、畳み込みネットワークと比較して、トランスフォーマーベースのアーキテクチャはベンチマーク分類タスクにおいて競合的な結果が得られることを示した。しかしながら、注意演算子の計算複雑性は、低解像度入力に制限されることを意味する。検出やセグメンテーションのようなより複雑なタスクでは、高いインプット解像度を維持することが、モデルがアウトプットの細部を適切に識別し、反映できるように不可欠である。これにより、Vision Transformerのようなトランスフォーマーベースのアーキテクチャが、分類以外のタスクを実行できるかどうかという疑問が自然に持ち上がる。本稿では、共通検出タスクヘッドによって、視覚変換器をバックボーンとして使用し、競合するCOCO結果を生成する。提案するモデルであるViT-FRCNNは,事前学習能力と高速な微調整性能を含む,変圧器に関連するいくつかの既知の特性を示す。また、ドメイン外画像の性能の向上、大規模オブジェクトの性能向上、非最大抑圧への依存の低減など、標準的な検出バックボーンの改善についても検討した。我々は、ViT-FRCNNを、オブジェクト検出などの複雑な視覚タスクの純粋変換器ソリューションに向けた重要なステップストーンであると考えている。 Transformers have become the dominant model in natural language processing, owing to their ability to pretrain on massive amounts of data, then transfer to smaller, more specific tasks via fine-tuning. The Vision Transformer was the first major attempt to apply a pure transformer model directly to images as input, demonstrating that as compared to convolutional networks, transformer-based architectures can achieve competitive results on benchmark classification tasks. However, the computational complexity of the attention operator means that we are limited to low-resolution inputs. For more complex tasks such as detection or segmentation, maintaining a high input resolution is crucial to ensure that models can properly identify and reflect fine details in their output. This naturally raises the question of whether or not transformer-based architectures such as the Vision Transformer are capable of performing tasks other than classification. In this paper, we determine that Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results. 
The model that we propose, ViT-FRCNN, demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance. We also investigate improvements over a standard detection backbone, including superior performance on out-of-domain images, better performance on large objects, and a lessened reliance on non-maximum suppression. We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.	翻訳日:2021-05-02 07:41:47 公開日:2020-12-17
# トランスフォーマーはアクションの効果を判断できるのか? Can Transformers Reason About Effects of Actions? ( http://arxiv.org/abs/2012.09938v1 ) ライセンス: Link先を確認	Pratyay Banerjee, Chitta Baral, Man Luo, Arindam Mitra, Kuntal Pal, Tran C. Son, Neeraj Varshney	(参考訳) 最近の研究では、トランスフォーマーは、ルールが結論を暗示する条件の結合の自然言語表現である限定された環境で、事実とルールを「合理化」することができることが示されている。これは、トランスフォーマーが自然言語で与えられた知識を推論するために使われることを示唆するので、我々は、共通の知識の形式とその対応する推論、すなわち行動の影響に関する推論に関して、厳密な評価を行う。行動と変化に関する推論は、AIの初期からAIの知識表現サブフィールドにおける最重要課題であり、最近では常識的質問応答において目立った側面となっている。我々は、自然言語で4つのアクションドメイン(Blocks World、Logistics、Dock-Worker-Robots、Generic Domain)を検討し、これらのドメインにおけるアクションの効果を推論するQAデータセットを作成します。 a)これらの領域における推論を学習するトランスフォーマーの能力について検討し、(b)一般的なドメインから他のドメインへの学習を伝達する。 A recent work has shown that transformers are able to "reason" with facts and rules in a limited setting where the rules are natural language expressions of conjunctions of conditions implying a conclusion. Since this suggests that transformers may be used for reasoning with knowledge given in natural language, we do a rigorous evaluation of this with respect to a common form of knowledge and its corresponding reasoning -- the reasoning about effects of actions. Reasoning about action and change has been a top focus in the knowledge representation subfield of AI from the early days of AI and more recently it has been a highlight aspect in common sense question answering. We consider four action domains (Blocks World, Logistics, Dock-Worker-Robots and a Generic Domain) in natural language and create QA datasets that involve reasoning about the effects of actions in these domains. We investigate the ability of transformers to (a) learn to reason in these domains and (b) transfer that learning from the generic domains to the other domains.	翻訳日:2021-05-02 07:40:49 公開日:2020-12-17
# 畳み込みニューラルネットワークを用いたマルチモーダル深さ推定 Multi-Modal Depth Estimation Using Convolutional Neural Networks ( http://arxiv.org/abs/2012.09667v1 ) ライセンス: Link先を確認	Sadique Adnan Siddiqui, Axel Vierling and Karsten Berns	(参考訳) 本稿では,厳密な距離センサデータと単一カメラ画像から,厳密な奥行き予測の問題点について考察する。本研究は,Deep Learning アプローチの適用による深度推定における,カメラ,レーダー,ライダーなどのセンサモードの重要性について検討する。リダーはレーダよりも深度感知能力が高く、多くの過去の研究でカメラ画像と統合されているが、ロバストなレーダ距離データとカメラ画像の融合に基づくCNNの深度推定はあまり研究されていない。本研究では,高密度特徴抽出のための初期化のために高パフォーマンス事前学習モデルを用いたエンコーダと,所望の深さをアップサンプリングし予測するデコーダとからなる,転置学習手法を用いて深層回帰ネットワークを提案する。これらの結果は,CARLAシミュレータを用いて作成したNuscenes,KITTI,およびSyntheticデータセットで実証された。また、建設現場でクレーンから撮影したトップビューのズームカメラ画像を評価し、地上からの重荷を積んだクレーンブームの距離を推定し、安全クリティカルな用途のユーザビリティを示す。 This paper addresses the problem of dense depth predictions from sparse distance sensor data and a single camera image on challenging weather conditions. This work explores the significance of different sensor modalities such as camera, Radar, and Lidar for estimating depth by applying Deep Learning approaches. Although Lidar has higher depth-sensing abilities than Radar and has been integrated with camera images in lots of previous works, depth estimation using CNN's on the fusion of robust Radar distance data and camera images has not been explored much. In this work, a deep regression network is proposed utilizing a transfer learning approach consisting of an encoder where a high performing pre-trained model has been used to initialize it for extracting dense features and a decoder for upsampling and predicting desired depth. The results are demonstrated on Nuscenes, KITTI, and a Synthetic dataset which was created using the CARLA simulator. Also, top-view zoom-camera images captured from the crane on a construction site are evaluated to estimate the distance of the crane boom carrying heavy loads from the ground to show the usability in safety-critical applications.	翻訳日:2021-05-02 07:40:31 公開日:2020-12-17
# ニューラルネットワーク圧縮を用いた効率的なCNN-LSTM画像キャプション Efficient CNN-LSTM based Image Captioning using Neural Network Compression ( http://arxiv.org/abs/2012.09708v1 ) ライセンス: Link先を確認	Harshit Rampal, Aman Mohanty	(参考訳) 現代のニューラルネットワークは、コンピュータビジョン、自然言語処理および関連する分野のタスクにおけるアートパフォーマンスの状態を達成している。しかし、彼らは、リソース制限されたエッジデバイスへのデプロイをさらに阻害する、猛烈なメモリと計算の食欲で悪名高い。エッジデプロイメントを実現するために、研究者はネットワークの有効性を損なうことなく圧縮するプラニングと量子化アルゴリズムを開発した。このような圧縮アルゴリズムはスタンドアロンのCNNおよびRNNアーキテクチャで広く実験されているが、本研究では、CNN-LSTMベースの画像キャプチャーモデルの非従来型エンドツーエンド圧縮パイプラインを示す。このモデルは、flickr8kデータセット上のエンコーダとLSTMデコーダとしてVGG16またはResNet50を使用してトレーニングされる。次に,異なる圧縮アーキテクチャがモデルに与える影響を調べ,モデルサイズを73.1%削減し,推論時間を71.3%削減し,非圧縮アーキテクチャに比べてbleuスコアを7.7%向上させる圧縮アーキテクチャを設計する。 Modern Neural Networks are eminent in achieving state of the art performance on tasks under Computer Vision, Natural Language Processing and related verticals. However, they are notorious for their voracious memory and compute appetite which further obstructs their deployment on resource limited edge devices. In order to achieve edge deployment, researchers have developed pruning and quantization algorithms to compress such networks without compromising their efficacy. Such compression algorithms are broadly experimented on standalone CNN and RNN architectures while in this work, we present an unconventional end to end compression pipeline of a CNN-LSTM based Image Captioning model. The model is trained using VGG16 or ResNet50 as an encoder and an LSTM decoder on the flickr8k dataset. We then examine the effects of different compression architectures on the model and design a compression architecture that achieves a 73.1% reduction in model size, 71.3% reduction in inference time and a 7.7% increase in BLEU score as compared to its uncompressed counterpart.	翻訳日:2021-05-02 07:40:14 公開日:2020-12-17
# ReferentialGym: (Visual) Referential Gamesにおける言語創発と接地のための命名と枠組み ReferentialGym: A Nomenclature and Framework for Language Emergence & Grounding in (Visual) Referential Games ( http://arxiv.org/abs/2012.09486v1 ) ライセンス: Link先を確認	Kevin Denamgana\"i and James Alfred Walker	(参考訳) 自然言語は、人間が情報を伝達し、共通の目標に向けて協力するための強力なツールである。彼らの値はコンポジション性、階層性、リカレント構文といったいくつかの主要な特性に関係しており、計算言語学者は言語ゲームによって引き起こされる人工言語における出現を研究している。ごく最近になって、AIコミュニティは、より良いヒューマンマシンインターフェースに向けた言語出現と基盤の研究を開始した。例えば、対話型/会話型AIアシスタントは、自身のビジョンと進行中の会話を関連付けることができる。本稿では,本研究への2つの貢献について述べる。第一に, 言語創発と接地の研究における主なイニシアティブを理解するための命名法を提案し, 仮定と制約のバリエーションを考察した。次に、PyTorchベースのディープラーニングフレームワークReferentialGymを紹介します。主要なアルゴリズムとメトリクスのベースライン実装を提供することで、多くの異なる機能やアプローチに加えて、referentialgymはフィールドへの参入障壁を緩和し、コミュニティに共通の実装を提供する。 Natural languages are powerful tools wielded by human beings to communicate information and co-operate towards common goals. Their values lie in some main properties like compositionality, hierarchy and recurrent syntax, which computational linguists have been researching the emergence of in artificial languages induced by language games. Only relatively recently, the AI community has started to investigate language emergence and grounding working towards better human-machine interfaces. For instance, interactive/conversational AI assistants that are able to relate their vision to the ongoing conversation. This paper provides two contributions to this research field. Firstly, a nomenclature is proposed to understand the main initiatives in studying language emergence and grounding, accounting for the variations in assumptions and constraints. Secondly, a PyTorch based deep learning framework is introduced, entitled ReferentialGym, which is dedicated to furthering the exploration of language emergence and grounding. 
By providing baseline implementations of major algorithms and metrics, in addition to many different features and approaches, ReferentialGym attempts to ease the entry barrier to the field and provide the community with common implementations.	翻訳日:2021-05-02 07:39:56 公開日:2020-12-17
# 高出力同期深部RL High-Throughput Synchronous Deep RL ( http://arxiv.org/abs/2012.09849v1 ) ライセンス: Link先を確認	Iou-Jen Liu and Raymond A. Yeh and Alexander G. Schwing	(参考訳) 深層強化学習(RL)は計算的に要求され、多くのデータポイントの処理を必要とする。同期メソッドは、データスループットを低くしながらトレーニングの安定性を楽しむ。対照的に、非同期メソッドは高いスループットを実現するが、安定性の問題や'スタックポリシー'によるサンプル効率の低下に悩まされる。両手法の利点を組み合わせるために,HTS-RL(High-Throughput Synchronous Deep Reinforcement Learning)を提案する。 HTS-RLでは,学習とロールアウトを同時に実施し,「安定ポリシー」を回避するシステム設計を考案し,アクターが完全な決定性を維持しつつ,非同期で環境レプリカと対話することを保証する。我々は,アタリゲームとGoogle Research Football環境に対するアプローチを評価した。同期ベースラインと比較して、HTS-RLは2-6$\times$高速である。最先端の非同期手法と比較して、HTS-RLは競争力があり、平均的なエピソード報酬を一貫して達成する。 Deep reinforcement learning (RL) is computationally demanding and requires processing of many data points. Synchronous methods enjoy training stability while having lower data throughput. In contrast, asynchronous methods achieve high throughput but suffer from stability issues and lower sample efficiency due to `stale policies.' To combine the advantages of both methods we propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL). In HTS-RL, we perform learning and rollouts concurrently, devise a system design which avoids `stale policies' and ensure that actors interact with environment replicas in an asynchronous manner while maintaining full determinism. We evaluate our approach on Atari games and the Google Research Football environment. Compared to synchronous baselines, HTS-RL is 2-6$\times$ faster. Compared to state-of-the-art asynchronous methods, HTS-RL has competitive throughput and consistently achieves higher average episode rewards.	翻訳日:2021-05-02 07:39:12 公開日:2020-12-17
# 低境界の損失フィードバックの専門家:統一フレームワーク Experts with Lower-Bounded Loss Feedback: A Unifying Framework ( http://arxiv.org/abs/2012.09537v1 ) ライセンス: Link先を確認	Eyal Gofer and Guy Gilboa	(参考訳) 最高の専門家問題の最も顕著なフィードバックモデルは、完全な情報とバンディットモデルである。本研究では,各ラウンドにおいて,バンディットフィードバックに加えて,各専門家の損失率を低く抑えるために,双方を一般化した単純なフィードバックモデルを検討する。このような低い境界は、例えば株式取引や特定の測定装置の誤差を評価する際の様々なシナリオで得られる。このモデルでは、Exp3の修正版に対する最適後悔境界(対数係数まで)を証明し、バンディットと全情報設定の両方に対してアルゴリズムと境界を一般化する。我々の2段階の統合的後悔分析は、2段階の損失更新をシミュレートし、3つのヘッセン語やヘッセン語のような表現を強調します。この結果から,各ラウンドにおける専門家の任意のサブセットからのフィードバックを,グラフ構造化されたフィードバックで受けられるようにした。しかし,本モデルでは,各損失に対する非自明な下限を許容することで,単者レベルでの部分的なフィードバックを許容する。 The most prominent feedback models for the best expert problem are the full information and bandit models. In this work we consider a simple feedback model that generalizes both, where on every round, in addition to a bandit feedback, the adversary provides a lower bound on the loss of each expert. Such lower bounds may be obtained in various scenarios, for instance, in stock trading or in assessing errors of certain measurement devices. For this model we prove optimal regret bounds (up to logarithmic factors) for modified versions of Exp3, generalizing algorithms and bounds both for the bandit and the full-information settings. Our second-order unified regret analysis simulates a two-step loss update and highlights three Hessian or Hessian-like expressions, which map to the full-information regret, bandit regret, and a hybrid of both. Our results intersect with those for bandits with graph-structured feedback, in that both settings can accommodate feedback from an arbitrary subset of experts on each round. However, our model also accommodates partial feedback at the single-expert level, by allowing non-trivial lower bounds on each loss.	翻訳日:2021-05-02 07:38:25 公開日:2020-12-17
# 対称ラプラシアン逆行列を用いた混合メンバーシップの推定 Estimating mixed-memberships using the Symmetric Laplacian Inverse Matrix ( http://arxiv.org/abs/2012.09561v1 ) ライセンス: Link先を確認	Huan Qing and Jingli Wang	(参考訳) コミュニティ検出はネットワーク分析においてよく研究されており、与えられたネットワークのクラスタを検出するための高速で統計的に分析可能なスペクトルクラスタリングが人気である。しかし、混合メンバーシップコミュニティ検出のより現実的なケースは依然として課題である。本稿では,混合メンバーシップコミュニティ検出のためのスペクトルクラスタリング手法Mixed-SLIMを提案する。Mixed-SLIMは、次数補正混合メンバーシップ(DCMM)モデルの下で、対称化ラプラシアン逆行列(SLIM)(Jing et al. 2021)に基づいて設計されている。このアルゴリズムとその正規化バージョン Mixed-SLIM {\tau} は、温和な条件下で漸近的に整合していることを示す。一方,大規模ネットワークを扱う場合のために,SLIM行列を近似することで,Mixed-SLIM approとその正規化バージョンであるMixed-SLIM {\tau}approを提供する。これら4つのMixed-SLIM法は,シミュレーションおよび実データセットにおいて,コミュニティ検出問題と混合メンバーシップコミュニティ検出問題の両方で最先端の手法を上回る。 Community detection has been well studied in network analysis, and one popular technique is spectral clustering, which is fast and statistically analyzable for detecting clusters in given networks. But the more realistic case of mixed membership community detection remains a challenge. In this paper, we propose a new spectral clustering method, Mixed-SLIM, for mixed membership community detection. Mixed-SLIM is designed based on the symmetrized Laplacian inverse matrix (SLIM) (Jing et al. 2021) under the degree-corrected mixed membership (DCMM) model. We show that this algorithm and its regularized version Mixed-SLIM {\tau} are asymptotically consistent under mild conditions. Meanwhile, we provide Mixed-SLIM appro and its regularized version Mixed-SLIM {\tau}appro by approximating the SLIM matrix when dealing with large networks in practice. These four Mixed-SLIM methods outperform state-of-the-art methods in simulations and substantial empirical datasets for both community detection and mixed membership community detection problems.	翻訳日:2021-05-02 07:38:06 公開日:2020-12-17
# DenseHMM:Dense表現の学習による隠れマルコフモデル学習 DenseHMM: Learning Hidden Markov Models by Learning Dense Representations ( http://arxiv.org/abs/2012.09783v1 ) ライセンス: Link先を確認	Joachim Sicking, Maximilian Pintz, Maram Akila, Tim Wirtz	(参考訳) 本研究では,隠れマルコフモデル(hidden markov model:hmms)の修正法である densehmm を提案する。標準的なHMMと比較して、遷移確率は原子ではなく、カーネル化によるこれらの表現で構成されている。本手法は制約なしおよび勾配ベース最適化を可能にする。本稿では,baum-welchアルゴリズムの改良と直接共起最適化という2つの最適化手法を提案する。後者は高度にスケーラブルで、標準的なhmmと比べて経験上パフォーマンスが損なわれない。カーネル化の非線形性は表現の表現性に不可欠であることを示す。 DenseHMMの学習された共起物やログのような性質は、合成および生医学的なデータセットで経験的に研究されている。 We propose DenseHMM - a modification of Hidden Markov Models (HMMs) that allows to learn dense representations of both the hidden states and the observables. Compared to the standard HMM, transition probabilities are not atomic but composed of these representations via kernelization. Our approach enables constraint-free and gradient-based optimization. We propose two optimization schemes that make use of this: a modification of the Baum-Welch algorithm and a direct co-occurrence optimization. The latter one is highly scalable and comes empirically without loss of performance compared to standard HMMs. We show that the non-linearity of the kernelization is crucial for the expressiveness of the representations. The properties of the DenseHMM like learned co-occurrences and log-likelihoods are studied empirically on synthetic and biomedical datasets.	翻訳日:2021-05-02 07:37:51 公開日:2020-12-17
# Marginal Likelihood Maximizationによるニューラルネットワークの初期化誘導 Guiding Neural Network Initialization via Marginal Likelihood Maximization ( http://arxiv.org/abs/2012.09943v1 ) ライセンス: Link先を確認	Anthony S. Tai, Chunfeng Huang	(参考訳) 本稿では,ハイパーパラメータ選択をニューラルネットワークの初期化に導くための簡易なデータ駆動手法を提案する。モデル初期化に望ましいハイパーパラメータ値を推定するために、対応する活性化関数と共分散関数を持つガウス過程モデルとニューラルネットワークの関係を利用する。実験の結果,実験条件下でのmnist分類タスクの最適に近い予測性能が得られた。さらに,提案手法の整合性を示す実験結果から,より少ないトレーニングセットで計算コストを大幅に削減できることが示唆された。 We propose a simple, data-driven approach to help guide hyperparameter selection for neural network initialization. We leverage the relationship between neural network and Gaussian process models having corresponding activation and covariance functions to infer the hyperparameter values desirable for model initialization. Our experiment shows that marginal likelihood maximization provides recommendations that yield near-optimal prediction performance on MNIST classification task under experiment constraints. Furthermore, our empirical results indicate consistency in the proposed technique, suggesting that computation cost for the procedure could be significantly reduced with smaller training sets.	翻訳日:2021-05-02 07:37:39 公開日:2020-12-17
# ベイズニューラルネットワークを用いた高次元レベルセット推定 High Dimensional Level Set Estimation with Bayesian Neural Network ( http://arxiv.org/abs/2012.09973v1 ) ライセンス: Link先を確認	Huong Ha, Sunil Gupta, Santu Rana, Svetha Venkatesh	(参考訳) レベルセット推定(LSE)は、材料設計、バイオテクノロジー、機械操作テストなど様々な分野の応用において重要な問題である。既存の技術ではスケーラビリティの問題、すなわちこれらの手法は高次元入力ではうまく動作しない。本稿では,ベイズニューラルネットワークを用いた高次元LSE問題の解法を提案する。特に, (1) しきい値レベルが固定ユーザ指定値である場合の \textit{explicit} lse問題, (2) 目標関数の(未知)最大値の割合として閾値が定義される場合の \textit{implicit} lse問題である。各問題に対して対応する理論情報に基づく取得関数を導出してデータポイントをサンプリングし、レベル設定精度を最大に向上させる。さらに,提案する取得関数の理論的時間複雑性を解析し,ネットワークハイパーパラメータを効率的に調整し,高いモデル精度を達成するための実用的な手法を提案する。合成データと実世界のデータの両方における数値実験により,提案手法が従来の最先端手法よりも優れた結果が得られることを示した。 Level Set Estimation (LSE) is an important problem with applications in various fields such as material design, biotechnology, machine operational testing, etc. Existing techniques suffer from the scalability issue, that is, these methods do not work well with high dimensional inputs. This paper proposes novel methods to solve the high dimensional LSE problems using Bayesian Neural Networks. In particular, we consider two types of LSE problems: (1) \textit{explicit} LSE problem where the threshold level is a fixed user-specified value, and, (2) \textit{implicit} LSE problem where the threshold level is defined as a percentage of the (unknown) maximum of the objective function. For each problem, we derive the corresponding theoretic information based acquisition function to sample the data points so as to maximally increase the level set accuracy. Furthermore, we also analyse the theoretical time complexity of our proposed acquisition functions, and suggest a practical methodology to efficiently tune the network hyper-parameters to achieve high model accuracy. Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.	翻訳日:2021-05-02 07:37:29 公開日:2020-12-17
# 敵防衛としてのDenoising Strategieの限界について On the Limitations of Denoising Strategies as Adversarial Defenses ( http://arxiv.org/abs/2012.09384v1 ) ライセンス: Link先を確認	Zhonghan Niu, Zhaoxi Chen, Linyi Li, Yubin Yang, Bo Li, Jinfeng Yi	(参考訳) 機械学習モデルに対する敵対的な攻撃が懸念を増す中、多くのデノワズベースの防御アプローチが提案されている。本稿では,データのデノイジングと再構成($f+$逆$f$,$f-if$フレームワーク)による対称変換という形で防衛戦略を要約・分析する。特に、これらの認知戦略を3つの側面(すなわち)から分類する。空間領域、周波数領域、潜在空間においてそれぞれ雑音化される)。通常、対向的な例で防御が行われ、画像と摂動の両方が修正され、摂動に対してどのように防御するかを判断することは困難である。直感的にこれらの難読化戦略の頑健さを評価するため、敵の雑音自体を防御するために直接適用し、良識を犠牲にするのを防ぎます。意外なことに、実験の結果、各次元の摂動の大部分を排除しても、満足な堅牢性を得るのは難しいことが示されている。以上の結果と解析に基づき,ロバスト性を改善するため,特徴領域の異なる周波数帯域に対する適応圧縮戦略を提案する。実験の結果,適応圧縮戦略は,既存手法と比較して,逆摂動の抑制やロバスト性の向上を可能にした。 As adversarial attacks against machine learning models have raised increasing concerns, many denoising-based defense approaches have been proposed. In this paper, we summarize and analyze the defense strategies in the form of symmetric transformation via data denoising and reconstruction (denoted as $F+$ inverse $F$, $F-IF$ Framework). In particular, we categorize these denoising strategies from three aspects (i.e. denoising in the spatial domain, frequency domain, and latent space, respectively). Typically, defense is performed on the entire adversarial example, both image and perturbation are modified, making it difficult to tell how it defends against the perturbations. To evaluate the robustness of these denoising strategies intuitively, we directly apply them to defend against adversarial noise itself (assuming we have obtained all of it), which saving us from sacrificing benign accuracy. Surprisingly, our experimental results show that even if most of the perturbations in each dimension is eliminated, it is still difficult to obtain satisfactory robustness. Based on the above findings and analyses, we propose the adaptive compression strategy for different frequency bands in the feature domain to improve the robustness. 
Our experimental results show that the adaptive compression strategies enable the model to better suppress adversarial perturbations and improve robustness compared with existing denoising strategies.	翻訳日:2021-05-02 07:37:07 公開日:2020-12-17
# 自律走行のための時間ライダーフレーム予測 Temporal LiDAR Frame Prediction for Autonomous Driving ( http://arxiv.org/abs/2012.09409v1 ) ライセンス: Link先を確認	David Deng and Avideh Zakhor	(参考訳) ダイナミックなシーンで未来を予測することは、自律運転やロボット工学など、多くの分野において重要である。本稿では,従来のLiDARフレームを予測するための新しいニューラルネットワークアーキテクチャのクラスを提案する。このアプリケーションの基本的真理は、単にシーケンスの次のフレームであるので、自己教師型でモデルをトレーニングすることができる。提案アーキテクチャはFlowNet3DとDynamic Graph CNNに基づいている。我々は、損失関数と評価指標として、Chamfer Distance (CD) と Earth Mover's Distance (EMD) を用いる。新たにリリースされたnuScenesデータセットを使ってモデルをトレーニングし、評価し、いくつかのベースラインでそれらのパフォーマンスと複雑さを特徴付ける。 FlowNet3Dを直接使用するのに比べ、提案するアーキテクチャはCDとEMDをほぼ1桁小さくする。さらに, ラベル付き監視を使わずに, 合理的なシーンフロー近似を生成できることを示す。 Anticipating the future in a dynamic scene is critical for many fields such as autonomous driving and robotics. In this paper we propose a class of novel neural network architectures to predict future LiDAR frames given previous ones. Since the ground truth in this application is simply the next frame in the sequence, we can train our models in a self-supervised fashion. Our proposed architectures are based on FlowNet3D and Dynamic Graph CNN. We use Chamfer Distance (CD) and Earth Mover's Distance (EMD) as loss functions and evaluation metrics. We train and evaluate our models using the newly released nuScenes dataset, and characterize their performance and complexity with several baselines. Compared to directly using FlowNet3D, our proposed architectures achieve CD and EMD nearly an order of magnitude lower. In addition, we show that our predictions generate reasonable scene flow approximations without using any labelled supervision.	翻訳日:2021-05-02 07:36:32 公開日:2020-12-17
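The Chamfer Distance named in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes one common convention (mean squared nearest-neighbour distance, summed over both directions), and the function name `chamfer_distance` is hypothetical. The abstract does not state which normalization the paper uses.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    a, b: non-empty lists of equal-dimension coordinate tuples.
    Assumed convention: mean squared nearest-neighbour distance,
    added over both directions (a -> b and b -> a).
    """
    def one_way(src, dst):
        total = 0.0
        for s in src:
            # Squared Euclidean distance from s to its nearest point in dst.
            total += min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
        return total / len(src)

    return one_way(a, b) + one_way(b, a)
```

This brute-force version costs O(|a| * |b|) and is only meant to make the definition concrete; real LiDAR frames would need a spatial index or a batched GPU implementation.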
# エピソード, 原型的ネットワーク, 数少ない学習について On Episodes, Prototypical Networks, and Few-shot Learning ( http://arxiv.org/abs/2012.09831v1 ) ライセンス: Link先を確認	Steinar Laenen and Luca Bertinetto	(参考訳) エピソディクス学習は、少数の学習に興味を持つ研究者や実践者の間で人気のある実践である。一連の学習問題のトレーニングを組織化し、それぞれが小さな"サポート"セットと"クエリ"セットに依存して、評価中に遭遇する数少ない状況を模倣する。本稿では,この手法を応用したアルゴリズムの2つである,プロトタイプネットワークとマッチングネットワークにおけるエピソード学習の有用性について検討する。驚くべきことに、私たちの実験では、プロトタイプネットワークとマッチングネットワークでは、トレーニングサンプルをサポートとクエリセットに分離するエピソディクス学習戦略を使うのは、トレーニングバッチを利用するデータ非効率な方法である、ということが分かりました。古典的な近傍成分分析と密接に関連しているこれらの「非エピソジック」変種は、複数のデータセットにおけるエピソジックな特徴よりも確実に改善され、非常に単純なにもかかわらず(プロトタイプネットワークの場合)最先端技術と競合する正確性を達成する。 Episodic learning is a popular practice among researchers and practitioners interested in few-shot learning. It consists of organising training in a series of learning problems, each relying on small "support" and "query" sets to mimic the few-shot circumstances encountered during evaluation. In this paper, we investigate the usefulness of episodic learning in Prototypical Networks and Matching Networks, two of the most popular algorithms making use of this practice. Surprisingly, in our experiments we found that, for Prototypical and Matching Networks, it is detrimental to use the episodic learning strategy of separating training samples between support and query set, as it is a data-inefficient way to exploit training batches. These "non-episodic" variants, which are closely related to the classic Neighbourhood Component Analysis, reliably improve over their episodic counterparts in multiple datasets, achieving an accuracy that (in the case of Prototypical Networks) is competitive with the state-of-the-art, despite being extremely simple.	翻訳日:2021-05-02 07:36:19 公開日:2020-12-17
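For readers unfamiliar with the support/query split described in the abstract above, the following is a minimal sketch of a single episode in the style of Prototypical Networks. It assumes embeddings are already computed and uses squared Euclidean distance. The function names `prototypes` and `classify` are hypothetical and nothing here is taken from the paper's code.

```python
def prototypes(support):
    """Average the support embeddings of each class.

    support: dict mapping a class label to a non-empty list of
    embedding vectors (lists of floats of equal length).
    """
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos


def classify(query, protos):
    """Assign a query embedding to the class with the nearest prototype."""
    def sqdist(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

    return min(protos, key=lambda label: sqdist(query, protos[label]))
```

In episodic training the query points only ever interact with the class means of the support points. The abstract's argument is that this split uses each training batch less efficiently than comparing all samples with one another.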
# ビデオ分類と推薦のための平滑化ガウス混合モデル Smoothed Gaussian Mixture Models for Video Classification and Recommendation ( http://arxiv.org/abs/2012.11673v1 ) ライセンス: Link先を確認	Sirjan Kafle, Aman Gupta, Xue Xia, Ananth Sankar, Xi Chen, Di Wen, Liang Zhang	(参考訳) VLAD(Vector of Locally Aggregated Descriptors)のようなクラスタ・アンド・アグリゲート技術や、NetVLADのようなエンドツーエンドの差別的に訓練された同等品は、最近ビデオ分類やアクション認識タスクで人気がある。これらの手法は、ビデオフレームをクラスタに割り当て、各クラスタの平均に関するフレームの残余を集約することで、ビデオを表現する。一部のクラスタはビデオ特有のデータが少ないため、これらの機能は騒がしい。本稿では,sugmented gaussian mixture model (sgmm) と呼ばれる新しいクラスタ・アンド・アグリゲーション法と,そのエンドツーエンドの識別訓練された等価値である deep smoothed gaussian mixture model (dsgmm) を提案する。 SGMMは、そのビデオのために訓練されたガウス混合モデル(GMM)のパラメータによって、各ビデオを表す。ローカウントクラスタは、多数のビデオでトレーニングされたユニバーサルバックグラウンドモデル(UBM)を用いて、ビデオ固有の見積をスムースにすることで対処される。 VLADに対するSGMMの主な利点はスムージングであり、少数のトレーニングサンプルに対する感度が低下する。 youtube-8m分類タスクの広範な実験を通じて、sgmm/dsgmmはvlad/netvladよりも小さいが統計的に有意なマージンで一貫して優れていることを示した。また、LinkedInで作成されたデータセットを使って、メンバーがアップロードされたビデオを見るかどうかを予測する。 Cluster-and-aggregate techniques such as Vector of Locally Aggregated Descriptors (VLAD), and their end-to-end discriminatively trained equivalents like NetVLAD have recently been popular for video classification and action recognition tasks. These techniques operate by assigning video frames to clusters and then representing the video by aggregating residuals of frames with respect to the mean of each cluster. Since some clusters may see very little video-specific data, these features can be noisy. In this paper, we propose a new cluster-and-aggregate method which we call smoothed Gaussian mixture model (SGMM), and its end-to-end discriminatively trained equivalent, which we call deep smoothed Gaussian mixture model (DSGMM). SGMM represents each video by the parameters of a Gaussian mixture model (GMM) trained for that video. Low-count clusters are addressed by smoothing the video-specific estimates with a universal background model (UBM) trained on a large number of videos. 
The primary benefit of SGMM over VLAD is smoothing, which makes it less sensitive to a small number of training samples. We show, through extensive experiments on the YouTube-8M classification task, that SGMM/DSGMM is consistently better than VLAD/NetVLAD by a small but statistically significant margin. We also show results using a dataset created at LinkedIn to predict if a member will watch an uploaded video.	翻訳日:2021-05-02 07:36:00 公開日:2020-12-17
# ポインタージェネレータネットワークを用いた法域における名前付きエンティティ認識 Named Entity Recognition in the Legal Domain using a Pointer Generator Network ( http://arxiv.org/abs/2012.09936v1 ) ライセンス: Link先を確認	Stavroula Skylaki, Ali Oskooei, Omar Bari, Nadja Herger, Zac Kriegman (Thomson Reuters Labs)	(参考訳) 名前付きエンティティ認識(NER)は、名前付きエンティティを非構造化テキストで識別し分類するタスクである。法領域において,利害関係者は,当事者,裁判官,裁判所の名称,事件番号,法律への言及を含むことができる。我々は, 訴訟のPDFファイルからノイズテキストを抽出し, 法的NERの問題点を米国裁判所から調査した。 NERシステムの「ゴールドスタンダード」トレーニングデータは、テキストの各トークンに対応するエンティティまたは非エンティティラベルのアノテーションを提供する。文章中のエンティティの正確な位置が不明で、エンティティがタイプミスやocrミスを含む可能性があるという点で、gold標準nerデータとは異なる部分的な完全なトレーニングデータのみを扱う。ノイズの多いトレーニングデータの課題を克服するためですテキスト抽出エラーおよび/またはタイプミスおよび未知ラベルインデックスは、nerタスクをテキストからテキストへのシーケンス生成タスクとして定式化し、ポインタ生成ネットワークを訓練して文書内のエンティティを生成する。金標準データがない場合、ポインタジェネレータはNERに有効であり、長い法律文書において一般的なNERニューラルネットワークアーキテクチャよりも優れていることを示す。 Named Entity Recognition (NER) is the task of identifying and classifying named entities in unstructured text. In the legal domain, named entities of interest may include the case parties, judges, names of courts, case numbers, references to laws etc. We study the problem of legal NER with noisy text extracted from PDF files of filed court cases from US courts. The "gold standard" training data for NER systems provide annotation for each token of the text with the corresponding entity or non-entity label. We work with only partially complete training data, which differ from the gold standard NER data in that the exact location of the entities in the text is unknown and the entities may contain typos and/or OCR mistakes. To overcome the challenges of our noisy training data, e.g. text extraction errors and/or typos and unknown label indices, we formulate the NER task as a text-to-text sequence generation task and train a pointer generator network to generate the entities in the document rather than label them. 
We show that the pointer generator can be effective for NER in the absence of gold standard data and outperforms the common NER neural network architectures in long legal documents.	翻訳日:2021-05-02 07:35:32 公開日:2020-12-17
# 機械学習による量子状態再構成の実験的実現可能性について On the experimental feasibility of quantum state reconstruction via machine learning ( http://arxiv.org/abs/2012.09432v1 ) ライセンス: Link先を確認	Sanjaya Lohani, Thomas A. Searles, Brian T. Kirby, and Ryan T. Glasser	(参考訳) 最大4量子ビットのシステムに対して、推論とトレーニングの両方の観点から機械学習に基づく量子状態再構成手法のリソーススケーリングを決定する。さらに,高次元システムのトモグラフィーで発生する可能性のある低カウント状態におけるシステム性能について検討した。最後に、IBM Q量子コンピュータに量子状態再構成法を実装し、その結果を確認した。 We determine the resource scaling of machine learning-based quantum state reconstruction methods, in terms of both inference and training, for systems of up to four qubits. Further, we examine system performance in the low-count regime, likely to be encountered in the tomography of high-dimensional systems. Finally, we implement our quantum state reconstruction method on an IBM Q quantum computer and confirm our results.	翻訳日:2021-05-02 07:35:13 公開日:2020-12-17
# 幾何と密度のバランス:高次元データを用いた経路距離 Balancing Geometry and Density: Path Distances on High-Dimensional Data ( http://arxiv.org/abs/2012.09385v1 ) ライセンス: Link先を確認	Anna Little, Daniel McKenzie and James Murphy	(参考訳) pwspds(power-weighted shortest-path distances)の新しい幾何学的および計算的解析を行った。これらの指標が基礎となるデータにおける密度と幾何のバランスをとる方法を明らかにすることで、それらの重要なパラメータを明確にし、実際にどのように選択されるかについて議論する。カーネルベースの教師なしおよび半教師付き機械学習における密度の広範な役割を示す、関連するデータ駆動メトリクスと比較する。計算学的には、完全重み付きグラフ上のPWSPDと、重み付き隣接グラフ上の類似点を関連付け、ほぼ最適である同値性に対する高い確率保証を提供する。パーコレーション理論との結びつきは、有限標本設定におけるPWSPDのバイアスと分散を推定するために展開される。理論的結果は、幅広いデータ設定に対するPWSPDの汎用性を実証する実証実験によって裏付けられている。論文全体では、基礎となるデータは低次元多様体からサンプリングされ、その周囲の次元ではなく、この多様体の固有次元に決定的に依存することが求められている。 New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results require only that the underlying data is sampled from a low-dimensional manifold, and depend crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.	翻訳日:2021-05-02 07:35:07 公開日:2020-12-17
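A power-weighted shortest-path distance as discussed in the abstract above can be sketched with Dijkstra's algorithm on the complete graph over the samples. The definition used here is an assumption based on the usual form of these metrics: the minimum over paths of the sum of Euclidean edge lengths raised to the power p, followed by a 1/p root. The function name `pwspd` is hypothetical.

```python
import heapq
import math


def pwspd(points, src, dst, p=2.0):
    """Power-weighted shortest-path distance between points[src] and points[dst].

    Assumed definition: (min over paths of sum ||x_i - x_{i+1}||^p)^(1/p),
    computed on the complete graph over `points` with p >= 1.
    """
    n = len(points)
    dist = [math.inf] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # Stale heap entry.
        if u == dst:
            break  # The popped distance to dst is final.
        for v in range(n):
            if v == u:
                continue
            w = math.dist(points[u], points[v]) ** p
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist[dst] ** (1.0 / p)
```

With p = 1 this reduces to the ordinary Euclidean distance. Larger p penalizes long hops, so paths through densely sampled regions become shorter, which is the density and geometry trade-off the abstract refers to. The abstract's nearest-neighbour-graph result suggests restricting the inner loop to each point's k nearest neighbours instead of all n points.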
# 深層学習におけるアンサンブル,知識蒸留,自己蒸留の理解に向けて Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning ( http://arxiv.org/abs/2012.09816v1 ) ライセンス: Link先を確認	Zeyuan Allen-Zhu and Yuanzhi Li	(参考訳) 深層学習モデルのアンサンブルがテスト精度を向上させる方法と、知識蒸留を用いた単一モデルにアンサンブルの優れた性能を蒸留する方法を正式に研究する。我々は,このアンサンブルが,一意に訓練された数個のニューラルネットワークのパットアーキテクチャによる出力の平均であり,パットデータセット上で,パットアルゴリズムを用いてトレーニングされている場合,初期化に使用するランダムなシードによってのみ異なる場合を考える。深層学習におけるアンサンブル・ナレッジ蒸留は従来の学習理論とは全く異なる働きをしており、特にランダム特徴マッピングやニューラルネットワーク-タンジェント-カーネル特徴マッピングとは異なっている。そこで, 深層学習におけるアンサンブルと知識蒸留を適切に理解するために, データが「マルチビュー」と呼ばれる構造を持つ場合, 独立に訓練されたニューラルネットワークのアンサンブルがテスト精度を向上し, 真のラベルの代わりにアンサンブルの出力に適合するように単一のモデルを訓練することにより, 優れたテスト精度を1つのモデルに証明可能とする理論を開発した。その結果、従来の定理とは全く異なる方法で、アンサンブルがディープラーニングでどのように機能するか、そして、真のデータラベルと比較して、知識蒸留に使用できるアンサンブルのアウトプットに「ダーク知識」がどのように隠されているかに光を当てている。最後に, 自己蒸留は, アンサンブルと知識蒸留を暗黙的に組み合わせて, 試験精度を向上させることができることを示した。 We formally study how Ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using Knowledge Distillation. We consider the challenging case where the ensemble is simply an average of the outputs of a few independently trained neural networks with the SAME architecture, trained using the SAME algorithm on the SAME data set, and they only differ by the random seeds used in the initialization. We empirically show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory, especially differently from ensemble of random feature mappings or the neural-tangent-kernel feature mappings, and is potentially out of the scope of existing theorems. 
Thus, to properly understand ensemble and knowledge distillation in deep learning, we develop a theory showing that when data has a structure we refer to as "multi-view", then ensemble of independently trained neural networks can provably improve test accuracy, and such superior test accuracy can also be provably distilled into a single model by training a single model to match the output of the ensemble instead of the true label. Our result sheds light on how ensemble works in deep learning in a way that is completely different from traditional theorems, and how the "dark knowledge" is hidden in the outputs of the ensemble -- that can be used in knowledge distillation -- comparing to the true data labels. In the end, we prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.	翻訳日:2021-05-02 07:34:34 公開日:2020-12-17
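The two operations the abstract above describes, averaging the outputs of independently trained networks and training a single model to match that average, can be sketched as follows. This is an illustration only: averaging softmax probabilities (rather than logits) and using cross-entropy as the matching loss are assumptions, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_average(member_logits):
    """Average the predicted class probabilities of the ensemble members."""
    probs = [softmax(logits) for logits in member_logits]
    k = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(k)]


def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy of the student's prediction against the soft teacher target."""
    student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student))
```

A student would be trained by minimizing `distillation_loss` against `ensemble_average(...)` for each input, instead of the usual cross-entropy against the one-hot true label.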
# 畳み込みニューラルネットワークを用いたコントラスト合成視床核セグメンテーション法 A Contrast Synthesized Thalamic Nuclei Segmentation Scheme using Convolutional Neural Networks ( http://arxiv.org/abs/2012.09386v1 ) ライセンス: Link先を確認	Lavanya Umapathy, Mahesh Bharath Keerthivasan, Natalie M. Zahr, Ali Bilgin, Manojkumar Saranathan	(参考訳) 視床核はいくつかの神経疾患に関係している。 WMn-MPRAGE画像は従来のMPRAGE画像と比較して視床内核コントラストが良いことが示されているが、追加の取得は検査時間の増加をもたらす。本研究では,3次元畳み込みニューラルネットワーク(cnn)を用いた従来型mprage画像からの視床核パーセレーション手法について検討した。 MPRAGE画像から合成したWMn-MPRAGE画像を用いて, 合成コントラストセグメンテーション(NCS)と合成コントラストセグメンテーション(SCS)の2つの3次元CNNを開発した。 mprage image (n=35) とthalamic nuclei labels を用いた2つのセグメンテーションフレームワークをマルチアトラス法を用いて訓練した。健常者とアルコール使用障害(aud)患者(n=45)のコホートを用いて分節精度と臨床的有用性を評価した。 SCSネットワークは、NCSネットワークと比較すると、前腹側核(P=.001)と後腹側核(P=.01)の体積差が低い中間体生成核(P=.003)とセントロメディア核(P=.01)で高Diceスコアを得た。 Bland-Altman 解析により,SCS ネットワークで予測される実数量と実数量の変動係数の低い一致限界が明らかにされた。 scsネットワークは健常年齢対照群 (p=0.01) と比較し, aud患者で有意な後側核萎縮を認めたが, ncsネットワークでは後側核の急激な萎縮を認めた。 CNNによるコントラスト合成は、従来のMPRAGE画像から高速で正確な視床核セグメンテーションを提供することができる。 Thalamic nuclei have been implicated in several neurological diseases. WMn-MPRAGE images have been shown to provide better intra-thalamic nuclear contrast compared to conventional MPRAGE images but the additional acquisition results in increased examination times. In this work, we investigated 3D Convolutional Neural Network (CNN) based techniques for thalamic nuclei parcellation from conventional MPRAGE images. Two 3D CNNs were developed and compared for thalamic nuclei parcellation using MPRAGE images: a) a native contrast segmentation (NCS) and b) a synthesized contrast segmentation (SCS) using WMn-MPRAGE images synthesized from MPRAGE images. We trained the two segmentation frameworks using MPRAGE images (n=35) and thalamic nuclei labels generated on WMn-MPRAGE images using a multi-atlas based parcellation technique. The segmentation accuracy and clinical utility were evaluated on a cohort comprising of healthy subjects and patients with alcohol use disorder (AUD) (n=45). 
The SCS network yielded higher Dice scores in the Medial geniculate nucleus (P=.003) and Centromedian nucleus (P=.01) with lower volume differences for Ventral anterior (P=.001) and Ventral posterior lateral (P=.01) nuclei when compared to the NCS network. A Bland-Altman analysis revealed tighter limits of agreement with lower coefficient of variation between true volumes and those predicted by the SCS network. The SCS network demonstrated a significant atrophy in Ventral lateral posterior nucleus in AUD patients compared to healthy age-matched controls (P=0.01), agreeing with previous studies on thalamic atrophy in alcoholism, whereas the NCS network showed spurious atrophy of the Ventral posterior lateral nucleus. CNN-based contrast synthesis prior to segmentation can provide fast and accurate thalamic nuclei segmentation from conventional MPRAGE images.	翻訳日:2021-05-02 07:33:41 公開日:2020-12-17
# 縦型空中画像を用いた栄養不足ストレスの検出と予測 Detection and Prediction of Nutrient Deficiency Stress using Longitudinal Aerial Imagery ( http://arxiv.org/abs/2012.09654v1 ) ライセンス: Link先を確認	Saba Dadsetan, Gisele Rose, Naira Hovakimyan, Jennifer Hobbs	(参考訳) 早期に、栄養不足ストレス(NDS)の正確な検出は、環境への影響だけでなく、経済的にも重要であり、毛布の塗布に代えて化学物質の精密適用は、栽培者の運用コストを削減し、環境に不必要に侵入する化学物質の量を削減している。さらに、早期の処理は損失の量を減らすため、特定の季節に作物の生産を増加させる。このことを念頭に,高分解能空中画像のシーケンスを収集し,セマンティクスセグメンテーションモデルを構築し,フィールド全体のndsの検出と予測を行う。私たちの仕事は農業、リモートセンシング、現代のコンピュータビジョンとディープラーニングの交差点にあります。まず,NDSのフルフィールド検出のためのベースラインを構築し,事前学習,バックボーンアーキテクチャ,入力表現,サンプリング戦略の影響を定量化する。次に、unetに基づくシングルタイムスタンプモデルを構築して、シーズンの異なるポイントで利用可能な情報量を定量化する。次に,NDSを示すフィールドの領域を正確に検出するために,UNetと畳み込みLSTM層を組み合わせた時空間アーキテクチャを構築した。最後に, このアーキテクチャは, 後続飛行(将来3週間以上)でNDSを示すと予測されるフィールドの領域を予測するために, 予報までの距離に応じて, IOUスコア0.47-0.51を維持することができることを示す。私たちはまた、コンピュータビジョン、リモートセンシング、農業分野にメリットがあると信じているデータセットもリリースします。この研究は、リモートセンシングと農業の深層学習の発展に寄与し、経済と持続可能性に関する重要な社会的課題に対処している。 Early, precise detection of nutrient deficiency stress (NDS) has key economic as well as environmental impact; precision application of chemicals in place of blanket application reduces operational costs for the growers while reducing the amount of chemicals which may enter the environment unnecessarily. Furthermore, earlier treatment reduces the amount of loss and therefore boosts crop production during a given season. With this in mind, we collect sequences of high-resolution aerial imagery and construct semantic segmentation models to detect and predict NDS across the field. Our work sits at the intersection of agriculture, remote sensing, and modern computer vision and deep learning. First, we establish a baseline for full-field detection of NDS and quantify the impact of pretraining, backbone architecture, input representation, and sampling strategy. We then quantify the amount of information available at different points in the season by building a single-timestamp model based on a UNet. 
Next, we construct our proposed spatiotemporal architecture, which combines a UNet with a convolutional LSTM layer, to accurately detect regions of the field showing NDS; this approach has an impressive IOU score of 0.53. Finally, we show that this architecture can be trained to predict regions of the field which are expected to show NDS in a later flight -- potentially more than three weeks in the future -- maintaining an IOU score of 0.47-0.51 depending on how far in advance the prediction is made. We will also release a dataset which we believe will benefit the computer vision, remote sensing, as well as agriculture fields. This work contributes to the recent developments in deep learning for remote sensing and agriculture, while addressing a key social challenge with implications for economics and sustainability.	翻訳日:2021-05-02 07:33:04 公開日:2020-12-17
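The IOU scores quoted in the abstract above measure overlap between a predicted and a reference segmentation mask. A minimal sketch for binary masks stored as nested lists follows. The function name `iou` and the choice to return 1.0 when both masks are empty are assumptions, and the paper may aggregate scores differently (for example per class or per field).

```python
def iou(pred, target):
    """Intersection over union of two binary masks of the same shape.

    pred, target: nested lists (rows of 0/1 or False/True values).
    Returns 1.0 when both masks are empty (assumed convention).
    """
    inter = 0
    union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter / union if union else 1.0
```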
# 人工知能を用いた緑内障視神経頭の構造表現型記述 Describing the Structural Phenotype of the Glaucomatous Optic Nerve Head Using Artificial Intelligence ( http://arxiv.org/abs/2012.09755v1 ) ライセンス: Link先を確認	Satish K. Panda, Haris Cheong, Tin A. Tun, Sripad K. Devella, Ramaswami Krishnadas, Martin L. Buist, Shamira Perera, Ching-Yu Cheng, Tin Aung, Alexandre H. Thi\'ery, and Micha\"el J. A. Girard	(参考訳) 視神経頭(ONH)は通常、緑内障の発生と進行に伴う神経・結合組織構造の変化を経験し、これらの変化を監視することは緑内障クリニックの診断と予後の改善に重要である。 onhの構造変化を臨床的に評価するための金標準技術は光コヒーレンストモグラフィ(oct)である。しかし、octは、網膜神経線維層(rnfl)の厚みなどのいくつかの手工学パラメータの測定に限定されており、まだ緑内障の診断と予後診断のための単独の装置として認定されていない。これは、ONHの3D OCTスキャンで利用できる膨大な情報が十分に活用されていないためである。そこで本研究では, onh の oct スキャンからの情報を十分に活用できる深層学習手法を提案し, 緑内障診断ツールとして \textbf{(3)} を使用できることを提案する。具体的には,本アルゴリズムで同定された構造的特徴は緑内障の臨床観察と関係があることが判明した。これらの構造的特徴の診断精度は92.0 \pm 2.3 \%$であり、感度は90.0 \pm 2.4 \%$(95 \%$)である。ステップで等級を変えることで、オンの形状が'非グラコマ'から'グラコマ'状態へ遷移するにつれてどのように変化するかを明らかにすることができた。我々の研究は緑内障の病態の理解に強い臨床的意味を持ち、将来は視力喪失を予測できるように改善できると考えている。 The optic nerve head (ONH) typically experiences complex neural- and connective-tissue structural changes with the development and progression of glaucoma, and monitoring these changes could be critical for improved diagnosis and prognosis in the glaucoma clinic. The gold-standard technique to assess structural changes of the ONH clinically is optical coherence tomography (OCT). However, OCT is limited to the measurement of a few hand-engineered parameters, such as the thickness of the retinal nerve fiber layer (RNFL), and has not yet been qualified as a stand-alone device for glaucoma diagnosis and prognosis applications. We argue this is because the vast amount of information available in a 3D OCT scan of the ONH has not been fully exploited. In this study we propose a deep learning approach that can: \textbf{(1)} fully exploit information from an OCT scan of the ONH; \textbf{(2)} describe the structural phenotype of the glaucomatous ONH; and that can \textbf{(3)} be used as a robust glaucoma diagnosis tool. 
Specifically, the structural features identified by our algorithm were found to be related to clinical observations of glaucoma. The diagnostic accuracy from these structural features was $92.0 \pm 2.3 \%$ with a sensitivity of $90.0 \pm 2.4 \%$ (at $95 \%$ specificity). By changing their magnitudes in steps, we were able to reveal how the morphology of the ONH changes as one transitions from a 'non-glaucoma' to a 'glaucoma' condition. We believe our work may have strong clinical implications for our understanding of glaucoma pathogenesis, and could be improved in the future to also predict future loss of vision.	翻訳日:2021-05-02 07:32:36 公開日:2020-12-17
# 4次元ビュー合成とビデオ処理のためのニューラルラジアンスフロー Neural Radiance Flow for 4D View Synthesis and Video Processing ( http://arxiv.org/abs/2012.09790v1 ) ライセンス: Link先を確認	Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu	(参考訳) 本稿では,rgb画像から動的シーンの4次元空間-時間表現を学ぶためのニューラル・ラミアンス・フロー(nerflow)を提案する。我々のアプローチの鍵は、シーンの3D占有率、放射率、ダイナミックスを捉えることを学習する神経暗黙表現を使用することである。異なるモダリティにまたがる一貫性を強制することにより,水注,ロボットインタラクション,実画像など多様な動的シーンにおける多視点レンダリングが可能となり,空間-時空間ビュー合成における最先端手法を上回っている。私たちのアプローチは、入力画像が1つのカメラでキャプチャされる場合でも機能します。さらに,学習表現が先行して暗黙的なシーンとして機能できることを実証し,画像の超解像やノイズ除去といった映像処理タスクを,追加の監督なしに行えることを示した。 We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images. Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene. By enforcing consistency across different modalities, our representation enables multi-view rendering in diverse dynamic scenes, including water pouring, robotic interaction, and real images, outperforming state-of-the-art methods for spatial-temporal view synthesis. Our approach works even when input images are captured with only one camera. We further demonstrate that the learned representation can serve as an implicit scene prior, enabling video processing tasks such as image super-resolution and de-noising without any additional supervision.	翻訳日:2021-05-02 07:32:10 公開日:2020-12-17
# 動的サイクル整合性を考慮した制御のためのクロスドメイン対応学習 Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency ( http://arxiv.org/abs/2012.09811v1 ) ライセンス: Link先を確認	Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang	(参考訳) 多くのロボティクス問題の核心は、ドメイン間の通信を学習することである。例えば、模倣学習は人間とロボットの対応を得る必要があり、sim-to-realは物理シミュレータと現実世界の対応を必要とする。本稿では,表現(視覚と内部状態),物理パラメータ(質量と摩擦),形態(手足の数)の異なる領域間の対応について学ぶことを目的とした。重要なことに、2つのドメインから無作為かつランダムに収集されたデータを用いて対応を学習する。本稿では,サイクル整合性制約を用いて2つの領域にまたがる動的ロボット動作を協調する「dynamics cycles」を提案する。この対応が見つかると、第2のドメインで追加の微調整を必要とせずに、あるドメインでトレーニングされたポリシーを直接他のドメインに転送できます。我々は,シミュレーションと実ロボットの両方において,様々な問題領域で実験を行う。本フレームワークは,実ロボットアームの無補間単眼映像とシミュレーションアームの動的状態動作軌跡をペアデータなしで一致させることができる。結果のビデオデモは、https://sjtuzq.github.io/cycle_dynamics.htmlで見ることができる。 At the heart of many robotics problems is the challenge of learning correspondences across domains. For instance, imitation learning requires obtaining correspondence between humans and robots; sim-to-real requires correspondence between physics simulators and the real world; transfer learning requires correspondences between different robotics environments. This paper aims to learn correspondence across domains differing in representation (vision vs. internal state), physics parameters (mass and friction), and morphology (number of limbs). Importantly, correspondences are learned using unpaired and randomly collected data from the two domains. We propose dynamics cycles that align dynamic robot behavior across two domains using a cycle-consistency constraint. Once this correspondence is found, we can directly transfer the policy trained on one domain to the other, without needing any additional fine-tuning on the second domain. We perform experiments across a variety of problem domains, both in simulation and on a real robot. Our framework is able to align uncalibrated monocular video of a real robot arm to dynamic state-action trajectories of a simulated arm without paired data. 
Video demonstrations of our results are available at: https://sjtuzq.github.io/cycle_dynamics.html .	翻訳日:2021-05-02 07:31:55 公開日:2020-12-17
# マスアート雑音による半空間学習の難易度 Hardness of Learning Halfspaces with Massart Noise ( http://arxiv.org/abs/2012.09720v1 ) ライセンス: Link先を確認	Ilias Diakonikolas and Daniel M. Kane	(参考訳) マスアートノイズの存在下でのPAC学習ハーフスペースの複雑さについて検討した。具体的には、ラベル付き例 $(x, y)$ が分布 $D$ on $\mathbb{R}^{n} \times \{ \pm 1\}$ から与えられたとき、$x$ の辺分布は任意であり、そのラベルはマッサルトノイズが速度 $\eta<1/2$ で崩壊した未知の半空間によって生成されるので、小さな誤分類誤差で仮説を計算したい。マッサートモデルにおける半空間の効率的な学習可能性の特徴付けは、学習理論における長年の未解決問題である。最近の研究は、この問題の多項式時間学習アルゴリズムをエラー$\eta+\epsilon$で与えた。この誤差上限は、情報理論的に最適な$\mathrm{OPT}+\epsilon$の境界から遠く離れることができる。より最近の研究は、exact learning、すなわちエラー $\mathrm{OPT}+\epsilon$ を達成することは統計クエリ(SQ)モデルでは難しいことを示した。本研究では,情報理論の最適誤差と多項式時間SQアルゴリズムで達成できる最良の誤差との間には指数的ギャップが存在することを示す。特に、我々の下界は、効率的なSQアルゴリズムが任意の多項式係数内で最適誤差を近似できないことを意味する。 We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise. Specifically, given labeled examples $(x, y)$ from a distribution $D$ on $\mathbb{R}^{n} \times \{ \pm 1\}$ such that the marginal distribution on $x$ is arbitrary and the labels are generated by an unknown halfspace corrupted with Massart noise at rate $\eta<1/2$, we want to compute a hypothesis with small misclassification error. Characterizing the efficient learnability of halfspaces in the Massart model has remained a longstanding open problem in learning theory. Recent work gave a polynomial-time learning algorithm for this problem with error $\eta+\epsilon$. This error upper bound can be far from the information-theoretically optimal bound of $\mathrm{OPT}+\epsilon$. More recent work showed that exact learning, i.e., achieving error $\mathrm{OPT}+\epsilon$, is hard in the Statistical Query (SQ) model. In this work, we show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by a polynomial-time SQ algorithm. In particular, our lower bound implies that no efficient SQ algorithm can approximate the optimal error within any polynomial factor.	翻訳日:2021-05-02 07:31:37 公開日:2020-12-17
# insrl: 遠隔教師付き関係抽出のための複数の情報ソースを用いた多視点学習フレームワーク InSRL: A Multi-view Learning Framework Fusing Multiple Information Sources for Distantly-supervised Relation Extraction ( http://arxiv.org/abs/2012.09370v1 ) ライセンス: Link先を確認	Zhendong Chu, Haiyun Jiang, Yanghua Xiao, Wei Wang	(参考訳) 遠隔監視により、知識ベースを利用して関係抽出のための文の袋を自動的にラベル付けすることができるが、狭くうるさいバッグの問題に苦しむ。トレーニングデータを補完し、これらの問題を克服するために、追加の情報ソースが緊急に必要となる。本稿では,知識ベースに広く存在する2つの情報源,すなわちエンティティ記述と多粒体型を導入し,教師付きデータの充実を図る。我々は、情報ソースを複数のビューと見なし、十分な情報を持つ無傷空間を構築するためにそれらを融合させる。 Intact Space Representation Learning (InSRL) による関係抽出のために, エンドツーエンドのマルチビュー学習フレームワークを提案し, 単一ビューの表現を同時に学習する。さらに、インナービューとクロスビューアテンションメカニズムを用いて、異なるレベルの重要な情報をエンティティペアベースで強調する。一般的なベンチマークデータセットの実験結果から,追加の情報ソースの必要性とフレームワークの有効性が示された。匿名化レビューフェーズの後、複数の情報ソースを持つモデルとデータセットの実装をリリースします。 Distant supervision makes it possible to automatically label bags of sentences for relation extraction by leveraging knowledge bases, but suffers from the sparse and noisy bag issues. Additional information sources are urgently needed to supplement the training data and overcome these issues. In this paper, we introduce two widely-existing sources in knowledge bases, namely entity descriptions and multi-grained entity types, to enrich the distantly supervised data. We treat the information sources as multiple views and fuse them to construct an intact space with sufficient information. An end-to-end multi-view learning framework is proposed for relation extraction via Intact Space Representation Learning (InSRL), and the representations of single views are learned jointly. Moreover, inner-view and cross-view attention mechanisms are used to highlight important information on different levels on an entity-pair basis. The experimental results on a popular benchmark dataset demonstrate the necessity of additional information sources and the effectiveness of our framework. 
We will release the implementation of our model and dataset with multiple information sources after the anonymized review phase.	翻訳日:2021-05-02 07:31:11 公開日:2020-12-17
# 強化学習による対話における対話的質問の明確化 Interactive Question Clarification in Dialogue via Reinforcement Learning ( http://arxiv.org/abs/2012.09411v1 ) ライセンス: Link先を確認	Xiang Hu, Zujie Wen, Yafang Wang, Xiaolong Li, Gerard de Melo	(参考訳) あいまいな質問への対処は、現実世界の対話システムにおける長年の問題である。質問による明確化はヒューマンインタラクションの一般的な形態であるが,ユーザからより具体的な意図を引き出すための適切な質問を定義することは困難である。本研究では,元のクエリの改良を提案することにより,あいまいな質問を明確化するための強化モデルを提案する。まず、コレクション分割問題を定式化し、潜在的な曖昧な意図を区別できるラベルのセットを選択する。我々は、選択したラベルをインテントフレーズとしてユーザにリストし、さらなる確認を行う。選択されたラベルと元のユーザクエリは、適切な応答をより容易に識別できる洗練されたクエリとして機能する。このモデルは、深層ポリシーネットワークを用いた強化学習を用いてトレーニングされる。我々は,実世界のユーザクリックに基づいてモデルを評価し,いくつかの実験で有意な改善を示す。 Coping with ambiguous questions has been a perennial problem in real-world dialogue systems. Although clarification by asking questions is a common form of human interaction, it is hard to define appropriate questions to elicit more specific intents from a user. In this work, we propose a reinforcement model to clarify ambiguous questions by suggesting refinements of the original query. We first formulate a collection partitioning problem to select a set of labels enabling us to distinguish potential unambiguous intents. We list the chosen labels as intent phrases to the user for further confirmation. The selected label along with the original user query then serves as a refined query, for which a suitable response can more easily be identified. The model is trained using reinforcement learning with a deep policy network. We evaluate our model based on real-world user clicks and demonstrate significant improvements across several different experiments.	翻訳日:2021-05-02 07:30:54 公開日:2020-12-17
# ルーフGAN:住宅用ルーフ形状と関係性の学習 Roof-GAN: Learning to Generate Roof Geometry and Relations for Residential Houses ( http://arxiv.org/abs/2012.09340v1 ) ライセンス: Link先を確認	Yiming Qian, Hao Zhang, Yasutaka Furukawa	(参考訳) 本稿では, 住宅用屋根構造の構造的幾何を屋根プリミティブの集合として生成する, 新規な対向ネットワークであるRoof-GANについて述べる。プリミティブの数を仮定すると、ジェネレータは、1)各ノードのラスター画像としてのプリミティブ幾何からなり、ファセットセグメンテーションと角度をエンコードするグラフ、2)各エッジにおけるプリミティブコリニア/コプランナ関係、3)新しい微分可能ベクトル化器によって生成された各ノードのベクトル形式におけるプリミティブ幾何からなる構造化屋根モデルを生成する。判別器は、完全なエンドツーエンドアーキテクチャで原始ラスタ幾何学、原始関係、原始ベクトル幾何学を評価するために訓練される。定量的・質的評価は, 構造幾何生成の課題として提案する新しい指標を用いて, 競合する手法よりも多様で現実的な屋根モデルを生成する手法の有効性を示す。私たちはコードとデータを共有します。 This paper presents Roof-GAN, a novel generative adversarial network that generates structured geometry of residential roof structures as a set of roof primitives and their relationships. Given the number of primitives, the generator produces a structured roof model as a graph, which consists of 1) primitive geometry as raster images at each node, encoding facet segmentation and angles; 2) inter-primitive colinear/coplanar relationships at each edge; and 3) primitive geometry in a vector format at each node, generated by a novel differentiable vectorizer while enforcing the relationships. The discriminator is trained to assess the primitive raster geometry, the primitive relationships, and the primitive vector geometry in a fully end-to-end architecture. Qualitative and quantitative evaluations demonstrate the effectiveness of our approach in generating diverse and realistic roof models over the competing methods with a novel metric proposed in this paper for the task of structured geometry generation. We will share our code and data.	翻訳日:2021-05-02 07:29:25 公開日:2020-12-17
# 1枚の画像から3次元シーン形状を復元する学習 Learning to Recover 3D Scene Shape from a Single Image ( http://arxiv.org/abs/2012.09365v1 ) ライセンス: Link先を確認	Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen	(参考訳) 実環境(in the wild)における単眼深度推定の大きな進歩にもかかわらず,最近の最先端手法は,混合データでの深度予測訓練に用いられるシフト不変な再構成損失に起因する未知の深度シフトと,未知の可能性があるカメラ焦点距離のために,正確な3次元シーン形状の復元には使用できない。この問題を詳細に検討し,まず単一の単眼画像から未知のスケールとシフトを除いて深度を予測し,次に3D点群エンコーダを用いて欠落した深度シフトと焦点距離を予測することで,現実的な3Dシーン形状を復元する2段階のフレームワークを提案する。さらに,画像レベルの正規化回帰損失と法線ベースの幾何損失を提案し,混合データセット上で訓練された深度予測モデルを強化する。 9つの未知のデータセットで深度モデルを検証し、ゼロショットデータセットの一般化で最先端のパフォーマンスを達成する。コードは、https://git.io/Depth で入手できる。 Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length. We investigate this problem in detail, and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth	翻訳日:2021-05-02 07:29:05 公開日:2020-12-17
# 半グローバル形状認識ネットワーク Semi-Global Shape-aware Network ( http://arxiv.org/abs/2012.09372v1 ) ライセンス: Link先を確認	Pengju Zhang, Yihong Wu, Jiagang Zhu	(参考訳) ローカルでない操作は、最近各位置へのグローバルコンテキストの集約を通じて、長距離依存関係をキャプチャするために使用される。しかし、ほとんどの手法は、特徴の類似性のみに焦点をあてるだけでオブジェクトの形状を保存できないが、長距離依存を捉えるために中央と他の位置との近接を無視する一方で、形状認識は多くのコンピュータビジョンタスクに有用である。本稿では,長距離依存をモデル化する際のオブジェクト形状の類似性と近接性を考慮したセミ・グローバル形状認識ネットワーク(SGSNet)を提案する。階層的な方法でグローバルなコンテキストを集約する。第1段階では、特徴地図全体における各位置は、類似度と近接度の両方に応じて、縦方向と横方向の文脈情報のみを集約する。そして、結果は第2のレベルに入力され、同じ操作を行います。この階層的な方法では、各中央位置ゲインは、他の全ての位置から支持され、類似性と近接の組み合わせにより、各位置ゲインは、ほとんど同じ意味オブジェクトから支持される。また,特徴マップ内の各行や列を二分木として扱い,類似性計算コストを低減させる,文脈情報集約のための線形時間アルゴリズムを提案する。セマンティックセグメンテーションと画像検索の実験により、既存のネットワークにSGSNetを追加することにより、精度と効率の両面で確固たる改善が得られた。 Non-local operations are usually used to capture long-range dependencies via aggregating global context to each position recently. However, most of the methods cannot preserve object shapes since they only focus on feature similarity but ignore proximity between central and other positions for capturing long-range dependencies, while shape-awareness is beneficial to many computer vision tasks. In this paper, we propose a Semi-Global Shape-aware Network (SGSNet) considering both feature similarity and proximity for preserving object shapes when modeling long-range dependencies. A hierarchical way is taken to aggregate global context. In the first level, each position in the whole feature map only aggregates contextual information in vertical and horizontal directions according to both similarity and proximity. And then the result is input into the second level to do the same operations. By this hierarchical way, each central position gains supports from all other positions, and the combination of similarity and proximity makes each position gain supports mostly from the same semantic object. 
Moreover, we also propose a linear-time algorithm for the aggregation of contextual information, where each row and column in the feature map is treated as a binary tree to reduce the similarity computation cost. Experiments on semantic segmentation and image retrieval show that adding SGSNet to existing networks yields solid improvements in both accuracy and efficiency.	翻訳日:2021-05-02 07:28:47 公開日:2020-12-17
# 非ラベルデータ誘導半教師付き病理組織像分割 Unlabeled Data Guided Semi-supervised Histopathology Image Segmentation ( http://arxiv.org/abs/2012.09373v1 ) ライセンス: Link先を確認	Hongxiao Wang, Hao Zheng, Jianxu Chen, Lin Yang, Yizhe Zhang, Danny Z. Chen	(参考訳) 病理組織像の自動分割は疾患解析に不可欠である。制限付きラベル付きデータは、完全に教師された設定の下で訓練されたモデルの一般化を妨げます。生成法に基づく半教師付き学習(SSL)は多様な画像特性の活用に有効であることが証明されている。しかし、モデルトレーニングやそのような画像の使い方において、どのような生成画像がより有用かは明らかにされていない。本稿では,未ラベルデータ分布を利用した病理組織像分割のための新しいデータガイド生成法を提案する。まず、画像生成モジュールを設計する。画像コンテンツとスタイルは分離され、クラスタリングフレンドリーなスペースに埋め込まれて配布される。新しい画像は、コンテンツやスタイルのサンプリングと相互結合によって合成される。第2に,生成した画像を定量的にサンプリングするための効果的なデータ選択ポリシーを考案する。(1) 生成されたトレーニングセットをデータセットをよりよくカバーするために,(2) トレーニングプロセスをより効果的にするために,アノテーション付きトレーニングデータセットが不足するデータ中の「ハードケース」の画像を特定し,オーバーサンプリングする。本手法は腺および核データセット上で評価される。提案手法は,インダクティブ設定とトランスダクティブ設定の両方において,共通セグメンテーションモデルの性能を一貫して向上させ,最先端の結果を得る。 Automatic histopathology image segmentation is crucial to disease analysis. Limited available labeled data hinders the generalizability of trained models under the fully supervised setting. Semi-supervised learning (SSL) based on generative methods has been proven to be effective in utilizing diverse image characteristics. However, it has not been well explored what kinds of generated images would be more useful for model training and how to use such images. In this paper, we propose a new data guided generative method for histopathology image segmentation by leveraging the unlabeled data distributions. First, we design an image generation module. Image content and style are disentangled and embedded in a clustering-friendly space to utilize their distributions. New images are synthesized by sampling and cross-combining contents and styles. 
Second, we devise an effective data selection policy for judiciously sampling the generated images: (1) to make the generated training set better cover the dataset, the clusters that are underrepresented in the original training set are covered more; (2) to make the training process more effective, we identify and oversample the images of "hard cases" in the data for which annotated training data may be scarce. Our method is evaluated on glands and nuclei datasets. We show that under both the inductive and transductive settings, our SSL method consistently boosts the performance of common segmentation models and attains state-of-the-art results.	翻訳日:2021-05-02 07:28:25 公開日:2020-12-17
# 教師なし3次元姿勢推定のための不変教師と同変学生 Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation ( http://arxiv.org/abs/2012.09398v1 ) ライセンス: Link先を確認	Chenxin Xu, Siheng Chen, Maosen Li, Ya Zhang	(参考訳) 3dアノテーションやサイド情報のない3次元ポーズ推定のための教師・学生学習フレームワークに基づく新しい手法を提案する。教師ネットワークでは,この教師の学習課題を解決するために,ポーズディクショナリーモデルを用いて正規化を行い,物理的に妥当な3dポーズを推定する。教師ネットワークにおける分解のあいまいさに対処するため,教師ネットワークをトレーニングするための3次元回転不変性を促進するサイクル一貫性アーキテクチャを提案する。推定精度をさらに向上するため、学生ネットワークは3D座標を直接推定するフレキシビリティのための新しいグラフ畳み込みネットワークを採用している。 3次元回転同値性を促進するもう一つのサイクル一貫性アーキテクチャは、幾何学的一貫性を活用し、教師ネットワークからの知識蒸留と合わせてポーズ推定性能を向上させる。我々はHuman3.6MとMPI-INF-3DHPについて広範な実験を行った。本手法は,最先端の非教師付き手法と比較して3次元関節予測誤差を11.4%削減し,Human3.6Mの側情報を用いた弱い教師付き手法よりも優れている。コードはhttps://github.com/sjtuxcx/ITESで入手できる。 We propose a novel method based on teacher-student learning framework for 3D human pose estimation without any 3D annotation or side information. To solve this unsupervised-learning problem, the teacher network adopts pose-dictionary-based modeling for regularization to estimate a physically plausible 3D pose. To handle the decomposition ambiguity in the teacher network, we propose a cycle-consistent architecture promoting a 3D rotation-invariant property to train the teacher network. To further improve the estimation accuracy, the student network adopts a novel graph convolution network for flexibility to directly estimate the 3D coordinates. Another cycle-consistent architecture promoting 3D rotation-equivariant property is adopted to exploit geometry consistency, together with knowledge distillation from the teacher network to improve the pose estimation performance. We conduct extensive experiments on Human3.6M and MPI-INF-3DHP. Our method reduces the 3D joint prediction error by 11.4% compared to state-of-the-art unsupervised methods and also outperforms many weakly-supervised methods that use side information on Human3.6M. Code will be available at https://github.com/sjtuxcx/ITES.	翻訳日:2021-05-02 07:27:45 公開日:2020-12-17
# LIGHTEN:ビデオにおけるHOIのためのグラフと階層的テンポラルネットワークとのインタラクションの学習 LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos ( http://arxiv.org/abs/2012.09402v1 ) ライセンス: Link先を確認	Sai Praneeth Reddy Sunkesula, Rishabh Dabral, Ganesh Ramakrishnan	(参考訳) ビデオから人間とオブジェクト間の相互作用を分析することで、人間とビデオに存在するオブジェクトの関係を識別する。これは、物体の1つが人間でなければならない視覚関係検出の特殊なバージョンと考えることができる。従来の手法では,ビデオセグメントのシーケンスの推論として問題を定式化するが,階層的なアプローチであるLIGHTENを用いて視覚的特徴を学習し,ビデオ内の複数の粒度の時空間的手がかりを効果的に捉える。現在のアプローチとは異なり、LIGHTENは深度マップや3D人間のポーズのような地上の真実データの使用を避けるため、RGBD以外のデータセットも一般化される。さらに,手作りの空間的特徴ではなく,視覚的特徴のみを用いて同じことを実現する。本研究では,v-cocoデータセットにおける画像に基づくhoi検出に基づくcad-120のヒューマン・オブジェクト間インタラクション検出(88.9%,92.6%)と期待タスク,および競合結果を用いて,視覚特徴ベースアプローチの新しいベンチマークを設定する。 LIGHTENのコードはhttps://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-Temporal -Networks-for-HOIで公開されている。 Analyzing the interactions between humans and objects from a video includes identification of the relationships between humans and the objects present in the video. It can be thought of as a specialized version of Visual Relationship Detection, wherein one of the objects must be a human. While traditional methods formulate the problem as inference on a sequence of video segments, we present a hierarchical approach, LIGHTEN, to learn visual features to effectively capture spatio-temporal cues at multiple granularities in a video. Unlike current approaches, LIGHTEN avoids using ground truth data like depth maps or 3D human pose, thus increasing generalization across non-RGBD datasets as well. Furthermore, we achieve the same using only the visual features, instead of the commonly used hand-crafted spatial features. We achieve state-of-the-art results in human-object interaction detection (88.9% and 92.6%) and anticipation tasks of CAD-120 and competitive results on image based HOI detection in V-COCO dataset, setting a new benchmark for visual features based approaches. 
Code for LIGHTEN is available at https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI	翻訳日:2021-05-02 07:27:04 公開日:2020-12-17
# 不確かさ認識混合による計算効率の良い知識蒸留 Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup ( http://arxiv.org/abs/2012.09413v1 ) ライセンス: Link先を確認	Guodong Xu, Ziwei Liu, Chen Change Loy	(参考訳) 学生ネットワークの学習を指導するために教師ネットワークから「暗黒知識」を抽出する知識蒸留が,モデル圧縮と伝達学習に不可欠な技術として登場した。学生ネットワークの正確さに焦点をあてた以前の研究とは違って,本研究では,知識蒸留の効率性について研究する。我々のゴールは、訓練中に計算コストの低い従来の知識蒸留に匹敵する性能を達成することである。我々は,Uncertainty-aware mIXup (UNIX) がクリーンで効果的なソリューションであることを示す。不確実性サンプリング戦略は、各トレーニングサンプルの情報性を評価するために使用される。適応混合は不確実なサンプルにコンパクトな知識に適用される。さらに、従来の知識蒸留の冗長性は、簡単なサンプルの過剰な学習にあることを示す。不確実性と混在性を組み合わせることで,提案手法は冗長性を低減し,教師ネットワークに対する各クエリをより活用する。 CIFAR100とImageNetのアプローチを検証する。特に,計算コストがわずか79%のCIFAR100では,従来の知識蒸留よりも優れており,ImageNetでは同等の結果が得られる。 Knowledge distillation, which involves extracting the "dark knowledge" from a teacher network to guide the learning of a student network, has emerged as an essential technique for model compression and transfer learning. Unlike previous works that focus on the accuracy of student network, here we study a little-explored but important question, i.e., knowledge distillation efficiency. Our goal is to achieve a performance comparable to conventional knowledge distillation with a lower computation cost during training. We show that the UNcertainty-aware mIXup (UNIX) can serve as a clean yet effective solution. The uncertainty sampling strategy is used to evaluate the informativeness of each training sample. Adaptive mixup is applied to uncertain samples to compact knowledge. We further show that the redundancy of conventional knowledge distillation lies in the excessive learning of easy samples. By combining uncertainty and mixup, our approach reduces the redundancy and makes better use of each query to the teacher network. We validate our approach on CIFAR100 and ImageNet. Notably, with only 79% computation cost, we outperform conventional knowledge distillation on CIFAR100 and achieve a comparable result on ImageNet.	翻訳日:2021-05-02 07:26:37 公開日:2020-12-17
# PanoNet3D:LiDARPointクラウド検出のための意味的および幾何学的理解の組み合わせ PanoNet3D: Combining Semantic and Geometric Understanding for LiDARPoint Cloud Detection ( http://arxiv.org/abs/2012.09418v1 ) ライセンス: Link先を確認	Xia Chen, Jianren Wang, David Held, Martial Hebert	(参考訳) カメラ画像やLiDAR点雲のような自律走行知覚における視覚データは、意味的特徴と幾何学的構造という2つの側面の混合として解釈できる。意味論は物体の外観と文脈からセンサーにもたらされ、幾何学的構造は点雲の実際の3d形状である。 LiDAR点雲上のほとんどの検出器は、実際の3次元空間における物体の幾何学的構造を分析することのみに焦点を当てている。先行研究とは異なり,多視点統合フレームワークを用いて意味的特徴と幾何学的構造の両方を学ぶことを提案する。提案手法は,2次元範囲画像のlidarスキャンの性質を活用し,よく検討された2次元畳み込みを意味的特徴抽出に適用する。意味的特徴と幾何学的特徴を融合することにより,この手法はすべてのカテゴリにおいて最先端のアプローチを大きなマージンで上回っている。意味的特徴と幾何学的特徴を組み合わせる手法は、実世界の3Dポイントクラウド検出の問題を考察するためのユニークな視点を提供する。 Visual data in autonomous driving perception, such as camera image and LiDAR point cloud, can be interpreted as a mixture of two aspects: semantic feature and geometric structure. Semantics come from the appearance and context of objects to the sensor, while geometric structure is the actual 3D shape of point clouds. Most detectors on LiDAR point clouds focus only on analyzing the geometric structure of objects in real 3D space. Unlike previous works, we propose to learn both semantic feature and geometric structure via a unified multi-view framework. Our method exploits the nature of LiDAR scans -- 2D range images, and applies well-studied 2D convolutions to extract semantic features. By fusing semantic and geometric features, our method outperforms state-of-the-art approaches in all categories by a large margin. The methodology of combining semantic and geometric features provides a unique perspective of looking at the problems in real-world 3D point cloud detection.	翻訳日:2021-05-02 07:26:19 公開日:2020-12-17
# 幾何学的変形と照度変化の分離によるCTフィルムの復元:シミュレーションデータセットと深層モデル CT Film Recovery via Disentangling Geometric Deformation and Illumination Variation: Simulated Datasets and Deep Models ( http://arxiv.org/abs/2012.09491v1 ) ライセンス: Link先を確認	Quan Quan, Qiyuan Wang, Liu Li, Yuanqi Du, S. Kevin Zhou	(参考訳) コンピュータ断層撮影(CT)などの医用画像は病院PACSのDICOM形式で保存されているが, 自己保管やセカンドオピニオンのために, フィルムを持ち運び可能な媒体として印刷することは, 多くの国で日常的に行われている。また、携帯電話カメラの普及により、CTフィルムの写真を撮るのが一般的であるが、そうした写真は残念ながら幾何学的変形や照明変化の影響を受ける。本研究では,我々の知る限り文献における最初の試みとして,CTフィルムの復元問題を検討する。まず,広く使用されているコンピュータグラフィックスソフトウェアであるBlenderを用いて,約2万枚の画像からなる大規模頭部CTフィルムデータベースCTFilm20Kを構築した。また,幾何学的変形(3次元座標,深度,法線,UVマップなど)と照明変化(アルベドマップなど)に関する全ての付随情報を記録した。次に,CTフィルムから抽出した複数のマップを用いて幾何学的変形と照明変化を分離し,復元処理を協調的に導く深層フレームワークを提案する。シミュレーションおよび実画像に対する大規模な実験は、従来のアプローチに対する提案手法の優位性を実証している。我々はCTフィルム復元の研究を促進するため、シミュレーション画像と深層モデルをオープンソース化する予定である(https://anonymous.4open.science/r/e6b1f6e3-9b36-423f-a225-55b7d0b55523/)。 While medical images such as computed tomography (CT) are stored in DICOM format in hospital PACS, it is still quite routine in many countries to print a film as a transferable medium for the purposes of self-storage and secondary consultation. Also, with the ubiquitousness of mobile phone cameras, it is quite common to take pictures of the CT films, which unfortunately suffer from geometric deformation and illumination variation. In this work, we study the problem of recovering a CT film, which marks the first attempt in the literature, to the best of our knowledge. We start with building a large-scale head CT film database CTFilm20K, consisting of approximately 20,000 pictures, using the widely used computer graphics software Blender. We also record all accompanying information related to the geometric deformation (such as 3D coordinate, depth, normal, and UV maps) and illumination variation (such as albedo map). 
Then we propose a deep framework to disentangle geometric deformation and illumination variation using the multiple maps extracted from the CT films to collaboratively guide the recovery process. Extensive experiments on simulated and real images demonstrate the superiority of our approach over the previous approaches. We plan to open source the simulated images and deep models for promoting the research on CT film recovery (https://anonymous.4open.science/r/e6b1f6e3-9b36-423f-a225-55b7d0b55523/).	翻訳日:2021-05-02 07:26:02 公開日:2020-12-17
# 学習可能な関節群を用いた手のポーズ推定 Exploiting Learnable Joint Groups for Hand Pose Estimation ( http://arxiv.org/abs/2012.09496v1 ) ライセンス: Link先を確認	Moran Li, Yuan Gao, Nong Sang	(参考訳) 本稿では, 関節の3次元座標をグループ的に復元し, 低関係の関節が自動的に異なるグループに分類され, 異なる特徴を示す3次元ハンドポーズを推定する。これは、全てのジョイントが階層的に考慮され、同じ特徴を共有する以前の方法とは異なる。提案手法の利点はマルチタスク学習(MTL)の原理,すなわち,低関係の関節を異なるグループ(異なるタスク)に分けて各グループごとに異なる特徴を学習することにより,負の移動を効果的に回避する。提案手法の鍵となるのは, 関連継手を自動的に同一群に選択する新しいバイナリセレクタである。学習可能なパラメータにgumbel softmaxを用いて構築した,具体的分布から確率的にサンプリングされたバイナリ値を持つセレクタを実装した。これにより、ネットワーク全体の差別化可能な特性を保存できます。さらに,これらの非関連グループからの機能を活用し,それらの間の機能融合方式を適用し,より識別的な特徴を学習する。これは、結合した特徴に対して複数の1x1畳み込みを実装することで実現され、各結合群は特徴融合のための1x1畳み込みを含む。いくつかのベンチマークデータセットにおける詳細なアブレーション解析と広範な実験は、最先端(sota)法に対する提案手法の有望な性能を示している。また,提案手法は,最新のfreihandコンペティションにおいて,密集した3d形状ラベルを使用しないすべての手法の中でトップ1を達成した。ソースコードとモデルはhttps://github.com/moranli-aca/learnablegroups-handで入手できる。 In this paper, we propose to estimate 3D hand pose by recovering the 3D coordinates of joints in a group-wise manner, where less-related joints are automatically categorized into different groups and exhibit different features. This is different from the previous methods where all the joints are considered holistically and share the same feature. The benefits of our method are illustrated by the principle of multi-task learning (MTL), i.e., by separating less-related joints into different groups (as different tasks), our method learns different features for each of them, therefore efficiently avoids the negative transfer (among less related tasks/groups of joints). The key of our method is a novel binary selector that automatically selects related joints into the same group. We implement such a selector with binary values stochastically sampled from a Concrete distribution, which is constructed using Gumbel softmax on trainable parameters. This enables us to preserve the differentiable property of the whole network. 
We further exploit features from those less-related groups by carrying out an additional feature fusing scheme among them, to learn more discriminative features. This is realized by implementing multiple 1x1 convolutions on the concatenated features, where each joint group contains a unique 1x1 convolution for feature fusion. The detailed ablation analysis and the extensive experiments on several benchmark datasets demonstrate the promising performance of the proposed method over the state-of-the-art (SOTA) methods. Besides, our method achieves top-1 among all the methods that do not exploit the dense 3D shape labels on the most recently released FreiHAND competition at the submission date. The source code and models are available at https://github.com/moranli-aca/LearnableGroups-Hand.	翻訳日:2021-05-02 07:25:32 公開日:2020-12-17
# カモフラージュによる医療敵の攻撃に対する階層的特徴制約 A Hierarchical Feature Constraint to Camouflage Medical Adversarial Attacks ( http://arxiv.org/abs/2012.09501v1 ) ライセンス: Link先を確認	Qingsong Yao, Zecheng He, Yi Lin, Kai Ma, Yefeng Zheng and S. Kevin Zhou	(参考訳) 医療画像のためのディープニューラルネットワーク(DNN)は、臨床上の意思決定にセキュリティ上の懸念をもたらす敵例(AE)に対して極めて脆弱である。幸いなことに、医療用AEは階層的な特徴空間でも容易に検出できます。この現象をよりよく理解するために、我々は特徴空間における医療用aesの本質的特徴を徹底的に調査し、経験的証拠と理論的説明の両方を提供している。まず,自然画像とは対照的に,医用画像の深部表現の脆弱性を明らかにするためのストレステストを行った。次に,2次疾患診断ネットワークに対する典型的な敵対的攻撃が,脆弱な表現を一定方向に連続的に最適化することにより予測を操作できることを理論的に証明した。しかし、この脆弱性は機能領域にAEを隠すために利用することもできる。本稿では,既存の敵攻撃に対するアドオンとして,新しい階層的特徴制約 (HFC) を提案する。提案手法は,Fundoscopy と Chest X-Ray の2つの公開医用画像データセット上で評価する。実験結果から,攻撃手法よりも先進的対人検知器の配列をバイパスし,医療的特徴の重大な脆弱性により,攻撃者が対人表現を操作できる余地が大きくなることが示唆された。 Deep neural networks (DNNs) for medical images are extremely vulnerable to adversarial examples (AEs), which poses security concerns on clinical decision making. Luckily, medical AEs are also easy to detect in hierarchical feature space per our study herein. To better understand this phenomenon, we thoroughly investigate the intrinsic characteristic of medical AEs in feature space, providing both empirical evidence and theoretical explanations for the question: why are medical adversarial attacks easy to detect? We first perform a stress test to reveal the vulnerability of deep representations of medical images, in contrast to natural images. We then theoretically prove that typical adversarial attacks to binary disease diagnosis network manipulate the prediction by continuously optimizing the vulnerable representations in a fixed direction, resulting in outlier features that make medical AEs easy to detect. However, this vulnerability can also be exploited to hide the AEs in the feature space. We propose a novel hierarchical feature constraint (HFC) as an add-on to existing adversarial attacks, which encourages the hiding of the adversarial representation within the normal feature distribution. 
We evaluate the proposed method on two public medical image datasets, namely Fundoscopy and Chest X-Ray. Experimental results demonstrate the superiority of our adversarial attack method as it bypasses an array of state-of-the-art adversarial detectors more easily than competing attack methods, supporting that the great vulnerability of medical features allows an attacker more room to manipulate the adversarial representations.	翻訳日:2021-05-02 07:25:04 公開日:2020-12-17
# 意味セグメンテーションのための具体化ビジュアルアクティブラーニング Embodied Visual Active Learning for Semantic Segmentation ( http://arxiv.org/abs/2012.09503v1 ) ライセンス: Link先を確認	David Nilsson, Aleksis Pirinen, Erik G\"artner, Cristian Sminchisescu	(参考訳) エージェントが3次元環境を探索し、アノテーションを要求するビューを積極的に選択することで視覚的シーン理解を得ることを目的として、視覚的能動学習の課題について検討する。一部のベンチマークでは正確だが、今日のディープビジュアル認識パイプラインは、特定の現実世界のシナリオや異常な視点ではうまく一般化しない傾向がある。ロボットの知覚は、屋内環境の混乱や照明不足など、モバイルシステムの動作状況の認識能力を洗練する能力を必要としている。これにより,エージェントを視覚認識能力の向上を目的とした新しい環境に配置するタスクが提案される。視覚活動学習の具体化を研究するため,環境に関する知識の異なるエージェント(学習と事前特定の両方)の電池を開発する。エージェントはセマンティックセグメンテーションネットワークを備えており、それらのビューの周辺でアノテーションを広めるために情報的ビューを取得し、移動し、探索し、オンラインリトレーニングによって基礎となるセグメンテーションネットワークを洗練させる。トレーニング可能な方法は、深層強化学習を使用して、2つの競合する目標、すなわち、視覚認識精度として表現されるタスクのパフォーマンスと、アクティブな探索中に要求される必要量のアノテートされたデータとをバランスさせる。本稿では,フォトリアリスティックなMatterport3Dシミュレータを用いて提案手法を広範囲に評価し,より少ないアノテーションを要求しても,完全に学習した手法が比較対象よりも優れていることを示す。 We study the task of embodied visual active learning, where an agent is set to explore a 3d environment with the goal to acquire visual scene understanding by actively selecting views for which to request annotation. While accurate on some benchmarks, today's deep visual recognition pipelines tend to not generalize well in certain real-world scenarios, or for unusual viewpoints. Robotic perception, in turn, requires the capability to refine the recognition capabilities for the conditions where the mobile system operates, including cluttered indoor environments or poor illumination. This motivates the proposed task, where an agent is placed in a novel environment with the objective of improving its visual recognition capability. To study embodied visual active learning, we develop a battery of agents - both learnt and pre-specified - and with different levels of knowledge of the environment. 
The agents are equipped with a semantic segmentation network and seek to acquire informative views, move and explore in order to propagate annotations in the neighbourhood of those views, then refine the underlying segmentation network by online retraining. The trainable method uses deep reinforcement learning with a reward function that balances two competing objectives: task performance, represented as visual recognition accuracy, which requires exploring the environment, and the necessary amount of annotated data requested during active exploration. We extensively evaluate the proposed models using the photorealistic Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts, even when requesting fewer annotations.	翻訳日:2021-05-02 07:24:35 公開日:2020-12-17
# オープンセット映像顔認識における低ラベルストリームデータからのインクリメンタル学習 Incremental Learning from Low-labelled Stream Data in Open-Set Video Face Recognition ( http://arxiv.org/abs/2012.09571v1 ) ライセンス: Link先を確認	Eric Lopez-Lopez, Carlos V. Regueiro, Xose M. Pardo	(参考訳) ディープラーニングアプローチは、豊富な注釈付きデータがトレーニングのために提供される一般的な分類問題に対して、優れたパフォーマンスを備えたソリューションをもたらした。対照的に、非定常クラスの継続学習、特にストリーミングデータを伴う教師なし問題に適用した場合の進歩は少ない。本稿では,ストリーミング顔データから関心対象人物(IoI)を識別する問題に取り組むために,深層特徴エンコーダとSVMのオープンセット動的アンサンブルを組み合わせた新たなインクリメンタル学習手法を提案する。いくつかのビデオフレームで訓練された単純な弱い分類器から出発して、教師なしの運用データを用いて認識を向上させることができる。我々のアプローチは、破滅的な忘却を回避しつつ新しいパターンに適応し、誤適応から部分的に自己回復する。さらに、現実世界の条件に適合するように、システムはオープンセット設定で動作するように設計された。その結果、非適応的な最先端手法に対してF1スコアが最大15%向上することが示された。 Deep Learning approaches have brought solutions, with impressive performance, to general classification problems where a wealth of annotated data is provided for training. In contrast, less progress has been made in continual learning of a set of non-stationary classes, mainly when applied to unsupervised problems with streaming data. Here, we propose a novel incremental learning approach which combines a deep feature encoder with an Open-Set Dynamic Ensembles of SVM, to tackle the problem of identifying individuals of interest (IoI) from streaming face data. From a simple weak classifier trained on a few video frames, our method can use unsupervised operational data to enhance recognition. Our approach adapts to new patterns avoiding catastrophic forgetting and partially heals itself from mis-adaptation. Besides, to better comply with real world conditions, the system was designed to operate in an open-set setting. Results show a benefit of up to 15% F1-score increase with respect to non-adaptive state-of-the-art methods.	翻訳日:2021-05-02 07:24:07 公開日:2020-12-17
# 畳み込みニューラルネットワークによる銃器検出:エンドツーエンドソリューションに対する意味セグメンテーションモデルの比較 Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions ( http://arxiv.org/abs/2012.09662v1 ) ライセンス: Link先を確認	Alexander Egiazarov, Fabio Massimo Zennaro, Vasileios Mavroeidis	(参考訳) ライブビデオからの武器や攻撃的行動の脅威検出は、テロリズムや一般犯罪、家庭内暴力などの致命的になりうる事件の迅速な検出と予防に利用できる。これを実現する1つの方法は、人工知能、特に機械学習による画像解析の利用である。本稿では,従来のモノリシックなエンド・ツー・エンドのディープラーニングモデルと,セマンティクスセグメンテーションによって銃器を検出する単純なニューラルネットワークのアンサンブルに基づく先行モデルの比較を行う。精度,計算量,データ複雑性,柔軟性,信頼性など,異なる観点から両モデルを評価した。その結果,セマンティクスセグメンテーションモデルは,従来の深層モデルと比べ,低データ環境においてかなりの柔軟性とレジリエンスを提供するが,その構成とチューニングはエンドツーエンドモデルと同等の精度を達成する上では困難であることがわかった。 Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents such as terrorism, general criminal offences, or even domestic violence. One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis. In this paper we conduct a comparison between a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting firearms via semantic segmentation. We evaluated both models from different points of view, including accuracy, computational and data complexity, flexibility and reliability. Our results show that a semantic segmentation model provides a considerable amount of flexibility and resilience in low-data environments compared to classical deep models, although its configuration and tuning present a challenge in achieving the same levels of accuracy as an end-to-end model.	翻訳日:2021-05-02 07:23:53 公開日:2020-12-17
# 複数ショットによるヒトメッシュの回復 Human Mesh Recovery from Multiple Shots ( http://arxiv.org/abs/2012.09843v1 ) ライセンス: Link先を確認	Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa	(参考訳) 映画のような編集されたメディアのビデオは、有用だが未調査の情報ソースである。これらの映画において、大きな時間的文脈で描かれた人間同士の多様な外観と相互作用は、貴重なデータ源となり得る。しかし、データの豊かさは、急激なショット変更や、重度のトランケーションを持つアクターのクローズアップといった基本的な課題を犠牲にされ、既存の人間の3D理解方法の適用性が制限される。本稿では,同一シーンのショット変更がフレーム間の不連続を生じさせるが,シーンの3d構造は依然としてスムーズに変化するという考察を加えて,これらの制約について述べる。これにより、撮影前後のフレームをマルチビュー信号として処理し、アクターの3D状態を復元する強力な手がかりを提供する。提案するマルチショット最適化フレームワークは,擬似基底真理3次元メッシュを用いた長周期の3次元再構成とマイニングを改善する。得られたデータは,人間のメッシュ回復モデルのトレーニングにおいて有用であることが示される: 単一画像の場合, 頑健性が向上する; ビデオの場合, 入力フレームのショット変化による観察の欠如を自然に処理できる純粋トランスフォーマーベースのテンポラルエンコーダを提案する。広範な実験を通じて,洞察と提案モデルの重要性を実証する。私たちが開発しているツールは、編集されたメディアの巨大なライブラリから3Dコンテンツを処理・分析するための扉を開きます。プロジェクトページ: https://geopavlakos.github.io/multishot Videos from edited media like movies are a useful, yet under-explored source of information. The rich variety of appearance and interactions between humans depicted over a large temporal context in these films could be a valuable source of data. However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close up shots of actors with heavy truncation, which limits the applicability of existing human 3D understanding methods. In this paper, we address these limitations with an insight that while shot changes of the same scene incur a discontinuity between frames, the 3D structure of the scene still changes smoothly. This allows us to handle frames before and after the shot change as multi-view signal that provide strong cues to recover the 3D state of the actors. We propose a multi-shot optimization framework, which leads to improved 3D reconstruction and mining of long sequences with pseudo ground truth 3D human mesh. 
We show that the resulting data is beneficial in the training of various human mesh recovery models: for single image, we achieve improved robustness; for video we propose a pure transformer-based temporal encoder, which can naturally handle missing observations due to shot changes in the input frames. We demonstrate the importance of the insight and proposed models through extensive experiments. The tools we develop open the door to processing and analyzing in 3D content from a large library of edited media, which could be helpful for many downstream applications. Project page: https://geopavlakos.github.io/multishot	翻訳日:2021-05-02 07:22:33 公開日:2020-12-17
# スマートフォン動画からの再照明可能な3Dヘッドポートレート Relightable 3D Head Portraits from a Smartphone Video ( http://arxiv.org/abs/2012.09963v1 ) ライセンス: Link先を確認	Artem Sevastopolsky, Savva Ignatiev, Gonzalo Ferrer, Evgeny Burnaev, Victor Lempitsky	(参考訳) 本研究は、人間の頭部の再照明可能な3Dポートレートを作成するシステムについて述べる。私たちのニューラルパイプラインは、スマートフォンのカメラがフラッシュを点滅させながら撮影したフレームのシーケンス(フラッシュあり/なしのシーケンス)で動作する。 structure-from-motion ソフトウェアとマルチビューのデノイジングによって再構成された粗い点群は、幾何学的なプロキシとして使われる。その後、深層レンダリングネットワークを訓練して、任意の新しい視点に対して密なアルベド、法線、環境照明マップを回帰する。実質的に、プロキシジオメトリとレンダリングネットワークは、任意の視点から、方向光、点光源、環境マップなどの任意の照明下で合成可能な、再照明可能な3Dポートレートモデルを構成する。このモデルは、アルベドと照明の分解の妥当性を強制する人間の顔特有の事前知識を用いてフレーム列に適合され、対話的なフレームレートで動作する。異なる照明条件および外挿視点下での性能評価を行い,既存の再照明手法との比較を行った。 In this work, a system for creating a relightable 3D portrait of a human head is presented. Our neural pipeline operates on a sequence of frames captured by a smartphone camera with the flash blinking (flash-no flash sequence). A coarse point cloud reconstructed via structure-from-motion software and multi-view denoising is then used as a geometric proxy. Afterwards, a deep rendering network is trained to regress dense albedo, normals, and environmental lighting maps for arbitrary new viewpoints. Effectively, the proxy geometry and the rendering network constitute a relightable 3D portrait model, that can be synthesized from an arbitrary viewpoint and under arbitrary lighting, e.g. directional light, point light, or an environment map. The model is fitted to the sequence of frames with human face-specific priors that enforce the plausibility of albedo-lighting decomposition and operates at the interactive frame rate. We evaluate the performance of the method under varying lighting conditions and at the extrapolated viewpoints and compare with existing relighting methods.	翻訳日:2021-05-02 07:21:50 公開日:2020-12-17
# BERT、買い物へ行く - 製品表現のための分布モデルの比較 BERT Goes Shopping: Comparing Distributional Models for Product Representations ( http://arxiv.org/abs/2012.09807v1 ) ライセンス: Link先を確認	Federico Bianchi and Bingqing Yu and Jacopo Tagliabue	(参考訳) 単語埋め込み(例: word2vec)はprod2vecを通じてeコマース製品にうまく適用されている。コンテキスト化された埋め込みによってもたらされたいくつかのNLPタスクの最近の性能向上に触発されて、我々はBERTのようなアーキテクチャをeコマースに転用することを提案する。我々のモデルProdBERTは、マスク付きセッションモデリングによって製品の表現を生成するよう訓練される。複数のショップ、異なるタスク、様々な設計選択にわたる広範な実験を通じて、ProdBERTとprod2vecの埋め込みの精度を体系的に比較する。 ProdBERTはいくつかのシナリオで従来の手法よりも優れているが、最高性能のモデルにおけるリソースとハイパーパラメータの重要性を強調する。最後に、様々な計算およびデータ制約の下で埋め込みを訓練するためのガイドラインを提供して結論とする。 Word embeddings (e.g., word2vec) have been applied successfully to eCommerce products through prod2vec. Inspired by the recent performance improvements on several NLP tasks brought by contextualized embeddings, we propose to transfer BERT-like architectures to eCommerce: our model -- ProdBERT -- is trained to generate representations of products through masked session modeling. Through extensive experiments over multiple shops, different tasks, and a range of design choices, we systematically compare the accuracy of ProdBERT and prod2vec embeddings: while ProdBERT is found to be superior to traditional methods in several scenarios, we highlight the importance of resources and hyperparameters in the best performing models. Finally, we conclude by providing guidelines for training embeddings under a variety of computational and data constraints.	翻訳日:2021-05-02 07:21:26 公開日:2020-12-17
# DecAug: Decomposed Feature Representation と Semantic Augmentation によるアウト・オブ・ディストリビューションの一般化 DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation ( http://arxiv.org/abs/2012.09382v1 ) ライセンス: Link先を確認	Haoyue Bai, Rui Sun, Lanqing Hong, Fengwei Zhou, Nanyang Ye, Han-Jia Ye, S.-H. Gary Chan, Zhenguo Li	(参考訳) ディープラーニングは、独立で同一に分布した(IID)データを扱う強力な能力を示しているが、テストデータが(訓練データとは)別の分布から来るようなOoD(out-of-distribution)の一般化に悩まされることが多い。一般のOoD一般化フレームワークを広範囲のアプリケーション向けに設計することは、主に現実世界における相関シフトと多様性シフトによって困難である。以前のアプローチのほとんどは、ドメイン間のシフトや相関の外挿など、ひとつの特定の分布シフトのみを解決できる。そこで本研究では,OoD一般化のための新しい分解特徴表現と意味拡張手法であるDecAugを提案する。 DecAugはカテゴリ関連の特徴とコンテキスト関連の特徴を分離する。カテゴリ関連特徴は対象オブジェクトの因果情報を含み、コンテキスト関連特徴は属性、スタイル、背景、シーンを記述し、トレーニングデータとテストデータの間の分布シフトを引き起こす。この分解は、カテゴリラベルとコンテキストラベルを予測する損失の(中間特徴に関する)2つの勾配を直交化することによって達成される。さらに,学習表現のロバスト性を改善するために,コンテキスト関連特徴に対して勾配に基づく拡張を行う。実験結果から、DecAugは様々なOoDデータセット上で他の最先端手法よりも優れており、異なるタイプのOoD一般化課題に対処できる数少ない手法の一つであることが示された。 While deep learning demonstrates its strong ability to handle independent and identically distributed (IID) data, it often suffers from out-of-distribution (OoD) generalization, where the test data come from another distribution (w.r.t. the training one). Designing a general OoD generalization framework for a wide range of applications is challenging, mainly due to possible correlation shift and diversity shift in the real world. Most of the previous approaches can only solve one specific distribution shift, such as shift across domains or the extrapolation of correlation. To address that, we propose DecAug, a novel decomposed feature representation and semantic augmentation approach for OoD generalization. DecAug disentangles the category-related and context-related features. Category-related features contain causal information of the target object, while context-related features describe the attributes, styles, backgrounds, or scenes, causing distribution shifts between training and test data. 
The decomposition is achieved by orthogonalizing the two gradients (w.r.t. intermediate features) of losses for predicting category and context labels. Furthermore, we perform gradient-based augmentation on context-related features to improve the robustness of the learned representations. Experimental results show that DecAug outperforms other state-of-the-art methods on various OoD datasets, which is among the very few methods that can deal with different types of OoD generalization challenges.	翻訳日:2021-05-02 07:21:15 公開日:2020-12-17
# ベイズネットワークモデルを用いた心臓疾患予測のための高速アルゴリズム A Fast Algorithm for Heart Disease Prediction using Bayesian Network Model ( http://arxiv.org/abs/2012.09429v1 ) ライセンス: Link先を確認	Mistura Muibideen and Rajesh Prasad (Department of Computer Science African University of Science and Technology, Abuja, Nigeria)	(参考訳) 心臓血管疾患は世界の死因の第1位である。データマイニングは、医療部門から利用可能なデータから貴重な知識を取得するのに役立つ。これは、臨床実験よりも速く患者の健康状態を予測するモデルをトレーニングするのに役立つ。 Logistic Regression, K-Nearest Neighbor, Naive Bayes (NB), Support Vector Machineなど,さまざまな機械学習アルゴリズムの実装がクリーブランド心臓データセットに適用されているが、ベイジアンネットワーク(BN)を用いたモデリングは限られている。本研究は,UCIレポジトリから収集したクリーブランド心臓データの14の関連属性間の関係を明らかにするためにBNモデリングを適用した。その目的は、属性間の依存性が分類器のパフォーマンスにどう影響するかをチェックすることである。 BNは属性間の信頼性と透過性のあるグラフィカル表現を生成し、新しいシナリオを予測できる。このモデルは85%の精度を持つ。このモデルは、精度80%のNB分類器よりも優れていると結論付けられた。 Cardiovascular disease is the number one cause of death all over the world. Data mining can help to retrieve valuable knowledge from available data from the health sector. It helps to train a model to predict patients' health which will be faster as compared to clinical experimentation. Various implementations of machine learning algorithms such as Logistic Regression, K-Nearest Neighbor, Naive Bayes (NB), Support Vector Machine, etc. have been applied to Cleveland heart datasets, but modeling using Bayesian Network (BN) has been limited. This research applied BN modeling to discover the relationship between 14 relevant attributes of the Cleveland heart data collected from the UCI repository. The aim is to check how the dependency between attributes affects the performance of the classifier. The BN produces a reliable and transparent graphical representation between the attributes with the ability to predict new scenarios. The model has an accuracy of 85%. It was concluded that the model outperformed the NB classifier, which has an accuracy of 80%.	翻訳日:2021-05-02 07:20:38 公開日:2020-12-17
# 一般化保証によるAUUC最大化による治療目標設定 Treatment Targeting by AUUC Maximization with Generalization Guarantees ( http://arxiv.org/abs/2012.09897v1 ) ライセンス: Link先を確認	Artem Betlei, Eustache Diemert, Massih-Reza Amini	(参考訳) 個々の治療効果予測に基づいて治療割り当てを最適化するタスクを検討する。このタスクはパーソナライズされた医療やターゲット広告といった多くのアプリケーションで見られ、近年はアップリフト・モデリング(uplift modeling)という名で関心を集めている。これは、治療が最も有益となる個人に治療を向けることからなる。実生活のシナリオでは、真の(ground-truth)個別治療効果にアクセスできない場合、それを行うモデルの能力は一般にAUUC(Area Under the Uplift Curve)によって測定されるが、この指標は個別治療効果(ITE)モデルの大半の学習目標とは異なる。これらのモデルの学習は、意図せずAUUCを低下させ、準最適な治療割り当てにつながる可能性があると論じる。この問題に対処するために,AUUCに対する一般化境界を提案し,この境界の導出可能なサロゲートを最適化する、AUUC-maxと呼ばれる新しい学習アルゴリズムを提案する。最後に,この一般化境界のタイトさとハイパーパラメータチューニングにおける有効性を実証的に示し,2つの古典的ベンチマークにおいて幅広い競合ベースラインと比較して提案アルゴリズムの効率性を示す。 We consider the task of optimizing treatment assignment based on individual treatment effect prediction. This task is found in many applications such as personalized medicine or targeted advertising and has gained a surge of interest in recent years under the name of Uplift Modeling. It consists in targeting treatment to the individuals for whom it would be the most beneficial. In real life scenarios, when we do not have access to ground-truth individual treatment effect, the capacity of models to do so is generally measured by the Area Under the Uplift Curve (AUUC), a metric that differs from the learning objectives of most of the Individual Treatment Effect (ITE) models. We argue that the learning of these models could inadvertently degrade AUUC and lead to suboptimal treatment assignment. To tackle this issue, we propose a generalization bound on the AUUC and present a novel learning algorithm that optimizes a derivable surrogate of this bound, called AUUC-max. Finally, we empirically demonstrate the tightness of this generalization bound, its effectiveness for hyper-parameter tuning and show the efficiency of the proposed algorithm compared to a wide range of competitive baselines on two classical benchmarks.	翻訳日:2021-05-02 07:19:39 公開日:2020-12-17
# 新型コロナウイルスの音声:感染の音響的相関 The voice of COVID-19: Acoustic correlates of infection ( http://arxiv.org/abs/2012.09478v1 ) ライセンス: Link先を確認	Katrin D. Bartl-Pokorny, Florian B. Pokorny, Anton Batliner, Shahin Amiriparian, Anastasia Semertzidou, Florian Eyben, Elena Kramer, Florian Schmidt, Rainer Schönweiler, Markus Wehler, Björn W. Schuller	(参考訳) 新型コロナウイルス感染症(COVID-19)は世界的な健康危機であり、ここ1年間、私たちの日常生活の多くの側面に影響を与えてきた。COVID-19の症状は多様であり、重症度は連続的に分布する。症状のかなりの割合は発声器官の病理学的変化と関連しており、COVID-19が発声にも影響を及ぼす可能性があると推測される。本研究は,包括的音響パラメータセットに基づいて,COVID-19感染の音声音響相関を初めて検討することを目的とする。 /i:/, /e:/, /o:/, /u:/, /a:/の母音の録音から抽出された88の音響的特徴を,症状のある11人のCOVID-19陽性者と11人のCOVID-19陰性者(いずれもドイツ語話者)の間で比較した。我々はMann-Whitney U検定を採用し、効果量を算出して、最も顕著な群間差を示す特徴を特定する。平均有声セグメント長と1秒あたりの有声セグメント数は、すべての母音にわたって最も重要な差を示し、COVID-19陽性者の発声中の肺気流の不連続を示している。前舌母音 /i:/ と /e:/ の群間差は基本周波数の変動と調和音対雑音比にも反映され、後舌母音 /o:/ と /u:/ の群間差はメル周波数ケプストラム係数とスペクトル傾斜の統計量に反映される。この研究の発見は、COVID-19に感染した個人を将来的に音声で識別する可能性に向けた重要な概念実証として考えられる。 COVID-19 is a global health crisis that has been affecting many aspects of our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. A considerable proportion of symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the very first time, the present study aims to investigate voice acoustic correlates of an infection with COVID-19 on the basis of a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /o:/, /u:/, and /a:/ produced by 11 symptomatic COVID-19 positive and 11 COVID-19 negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify features with the most prominent group differences. 
The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels indicating discontinuities in the pulmonic airstream during phonation in COVID-19 positive participants. Group differences in the front vowels /i:/ and /e:/ are additionally reflected in the variation of the fundamental frequency and the harmonics-to-noise ratio, group differences in back vowels /o:/ and /u:/ in statistics of the Mel-frequency cepstral coefficients and the spectral slope. Findings of this study can be considered an important proof-of-concept contribution for a potential future voice-based identification of individuals infected with COVID-19.	翻訳日:2021-05-02 07:19:19 公開日:2020-12-17
# Clique: 都市規模における時空間物体の再同定 Clique: Spatiotemporal Object Re-identification at the City Scale ( http://arxiv.org/abs/2012.09329v1 ) ライセンス: Link先を確認	Tiantu Xu, Kaiwen Shen, Yang Fu, Humphrey Shi, Felix Xiaozhu Lin	(参考訳) オブジェクト再識別(ReID)は都市規模のカメラの重要な応用である。古典的なReIDタスクは画像検索と見なされることが多いが、我々はこれを対象オブジェクトが現れた場所と時間についての時空間クエリとして扱う。時空間ReIDは、コンピュータビジョンアルゴリズムの精度の限界と、都市カメラからの膨大な動画という課題に直面している。本稿では、2つの新しい技術に基づく実用的なReIDエンジンであるCliqueを提案する。(1) Cliqueは、ReIDアルゴリズムによって抽出されたファジィなオブジェクト特徴をクラスタリングすることで対象の出現を評価する。各クラスタは、入力と照合される個別の物体の全般的な印象を表す。(2) 動画を検索するために,Cliqueは時空間のカバレッジを最大化するようにカメラをサンプリングし,必要に応じて処理対象のカメラを段階的に追加する。 25台のカメラからの25時間の動画による評価で、Cliqueは70のクエリにわたって0.87(recall at 5)の高い精度に達し、高い精度を達成しつつ動画の実時間の830倍の速度で動作した。 Object re-identification (ReID) is a key application of city-scale cameras. While classic ReID tasks are often considered as image retrieval, we treat them as spatiotemporal queries for locations and times in which the target object appeared. Spatiotemporal ReID is challenged by the accuracy limitation in computer vision algorithms and the colossal videos from city cameras. We present Clique, a practical ReID engine that builds upon two new techniques: (1) Clique assesses target occurrences by clustering fuzzy object features extracted by ReID algorithms, with each cluster representing the general impression of a distinct object to be matched against the input; (2) to search in videos, Clique samples cameras to maximize the spatiotemporal coverage and incrementally adds cameras for processing on demand. Through evaluation on 25 hours of videos from 25 cameras, Clique reached a high accuracy of 0.87 (recall at 5) across 70 queries and ran at 830x video realtime while achieving high accuracy.	翻訳日:2021-05-02 07:18:08 公開日:2020-12-17
# ピクセルごとのバイアスドコントラスト閾値のイベントカメラ校正 Event Camera Calibration of Per-pixel Biased Contrast Threshold ( http://arxiv.org/abs/2012.09378v1 ) ライセンス: Link先を確認	Ziwei Wang, Yonhon Ng, Pieter van Goor, Robert Mahony	(参考訳) イベントカメラは、極端な照明条件下でも高い時間分解能で強度変化を表す非同期イベントを出力する。現在、既存の作品のほとんどは、すべてのピクセルの強度変化を推定するために単一のコントラスト閾値を使用している。しかし、複雑な回路バイアスと製造不完全さは、画素間のバイアス付き画素とミスマッチするコントラスト閾値を引き起こし、望ましくない出力に繋がる可能性がある。本稿では,イベント専用カメラとハイブリッドカメラを対象とする新しいイベントカメラモデルと2つのキャリブレーション手法を提案する。また,インテンシティ画像とイベントを同時に提供した場合,時間変動イベントレートに適応するイベントカメラのキャリブレーションを行う効率的なオンライン手法を提案する。提案手法の利点を,複数のイベントカメラデータセットにおける最新技術と比較した。 Event cameras output asynchronous events to represent intensity changes with a high temporal resolution, even under extreme lighting conditions. Currently, most of the existing works use a single contrast threshold to estimate the intensity change of all pixels. However, complex circuit bias and manufacturing imperfections cause biased pixels and mismatch contrast threshold among pixels, which may lead to undesirable outputs. In this paper, we propose a new event camera model and two calibration approaches which cover event-only cameras and hybrid image-event cameras. When intensity images are simultaneously provided along with events, we also propose an efficient online method to calibrate event cameras that adapts to time-varying event rates. We demonstrate the advantages of our proposed methods compared to the state-of-the-art on several different event camera datasets.	翻訳日:2021-05-02 07:17:50 公開日:2020-12-17
# スケール不変な特徴変換キーポイント記述子マッチングのための完全パイプラインFPGAアクセラレータ A fully pipelined FPGA accelerator for scale invariant feature transform keypoint descriptor matching ( http://arxiv.org/abs/2012.09666v1 ) ライセンス: Link先を確認	Luka Daoud, Muhammad Kamran Latif, H S. Jacinto, Nader Rafla	(参考訳) スケール不変特徴変換(SIFT)アルゴリズムはコンピュータビジョンの分野における古典的特徴抽出アルゴリズムであると考えられている。 SIFTのキーポイント記述子マッチングは、消費されるデータ量のために計算集約的なプロセスである。本研究では,SIFTキーポイント記述子マッチングのための新しい完全パイプライン型ハードウェアアクセラレータアーキテクチャを設計した。アクセラレータコアはfield programmable gate array (FPGA) で実装・テストされた。提案するハードウェアアーキテクチャは,完全パイプライン実装に必要なメモリ帯域幅を適切に処理し,ルーフライン性能モデルの上限に到達して,潜在的な最大スループットを実現する。完全パイプラインのマッチングアーキテクチャは、コサイン角距離法に基づいて設計されている。アーキテクチャは16ビットの固定小数点演算に最適化され,Xilinx ZynqベースのFPGA開発ボードを用いてハードウェア上に実装された。提案アーキテクチャは,メモリ帯域幅の制約を緩和して高いスループットを維持しつつ,文献中の同等の手法と比較して,面積リソースの顕著な削減を示す。その結果、消費デバイスリソースはLUTで最大91%、BRAMで79%削減された。私たちのハードウェア実装は、同等のソフトウェアアプローチの15.7倍高速である。 The scale invariant feature transform (SIFT) algorithm is considered a classical feature extraction algorithm within the field of computer vision. SIFT keypoint descriptor matching is a computationally intensive process due to the amount of data consumed. In this work, we designed a novel fully pipelined hardware accelerator architecture for SIFT keypoint descriptor matching. The accelerator core was implemented and tested on a field programmable gate array (FPGA). The proposed hardware architecture is able to properly handle the memory bandwidth necessary for a fully-pipelined implementation and hits the roofline performance model, achieving the potential maximum throughput. The fully pipelined matching architecture was designed based on the cosine angle distance method. Our architecture was optimized for 16-bit fixed-point operations and implemented on hardware using a Xilinx Zynq-based FPGA development board. Our proposed architecture shows a noticeable reduction of area resources compared with its counterparts in literature, while maintaining high throughput by alleviating memory bandwidth restrictions. 
The results show a reduction in consumed device resources of up to 91 percent in LUTs and 79 percent of BRAMs. Our hardware implementation is 15.7 times faster than the comparable software approach.	翻訳日:2021-05-02 07:16:46 公開日:2020-12-17
# OCTAを用いた中心窩無血管域の高速3次元推定 Fast 3-dimensional estimation of the Foveal Avascular Zone from OCTA ( http://arxiv.org/abs/2012.09945v1 ) ライセンス: Link先を確認	Giovanni Ometto, Giovanni Montesano, Usha Chakravarthy, Frank Kee, Ruth E. Hogg and David P. Crabb	(参考訳) 光コヒーレンス断層血管撮影(optical coherence tomography angiography: OCTA)のen face画像から得られる中心窩無血管域(foveal avascular zone: FAZ)の面積は、この技術に基づく最も一般的な測定値の1つである。しかし、診療におけるFAZ面積の利用は正常者間でのFAZ面積のばらつきが大きいことによって制限され、一方でFAZの体積測定の算出はOCTAスキャンを特徴付ける高いノイズによって制限される。本研究では,異なる血管叢の毛細血管は重ならないという仮定の下で、en face画像の高い信号対雑音比を活用して、内網膜の毛細血管網を3次元(3D)で効率的に同定するアルゴリズムを考案した。その後、ネットワークは形態学的操作で処理され、内網膜の境界セグメンテーション内の3D FAZを同定する。 430眼のデータセットについて、異なる血管叢におけるFAZの体積と面積を算出した。次に,線形混合効果モデルを用いて測定値を解析し,健常眼、糖尿病網膜症(DR)のない糖尿病眼、DRのある糖尿病眼の3群間の差を同定した。その結果, FAZ体積は群間で有意差を認めたが, 面積測定では認められなかった。これらの結果から,体積型FAZは平面型FAZよりも優れた診断指標である可能性が示唆された。私たちが導入した効率的な手法は、内網膜の毛細血管ネットワークの3Dセグメンテーションを提供するだけでなく、診療所におけるFAZ体積の高速計算を可能にする可能性がある。 The area of the foveal avascular zone (FAZ) from en face images of optical coherence tomography angiography (OCTA) is one of the most common measurements based on this technology. However, its use in the clinic is limited by the high variation of the FAZ area across normal subjects, while the calculation of the volumetric measurement of the FAZ is limited by the high noise that characterizes OCTA scans. We designed an algorithm that exploits the higher signal-to-noise ratio of en face images to efficiently identify the capillary network of the inner retina in 3-dimensions (3D), under the assumption that the capillaries in separate plexuses do not overlap. The network is then processed with morphological operations to identify the 3D FAZ within the bounding segmentations of the inner retina. The FAZ volume and area in different plexuses were calculated for a dataset of 430 eyes. Then, the measurements were analyzed using linear mixed effect models to identify differences between three groups of eyes: healthy, diabetic without diabetic retinopathy (DR) and diabetic with DR. 
Results showed significant differences in the FAZ volume between the different groups but not in the area measurements. These results suggest that volumetric FAZ could be a better diagnostic detector than the planar FAZ. The efficient methodology that we introduced could allow the fast calculation of the FAZ volume in clinics, as well as providing the 3D segmentation of the capillary network of the inner retina.	翻訳日:2021-05-02 07:16:30 公開日:2020-12-17
# 動的な人間の頭部の合成的放射場の学習 Learning Compositional Radiance Fields of Dynamic Human Heads ( http://arxiv.org/abs/2012.09955v1 ) ライセンス: Link先を確認	Ziyan Wang, Timur Bagautdinov, Stephen Lombardi, Tomas Simon, Jason Saragih, Jessica Hodgins, Michael Zollhöfer	(参考訳) 動的な人間のフォトリアリスティックなレンダリングは、テレプレゼンスシステム、仮想ショッピング、合成データ生成などにとって重要な能力である。近年,コンピュータグラフィックスと機械学習の技法を組み合わせたニューラルレンダリング手法が,人間と物体の高忠実度モデルを作成している。これらの手法のいくつかは、駆動可能な人間モデルに十分な忠実度の結果を生成できず(Neural Volumes)、一方で他の手法はレンダリング時間が非常に長い(NeRF)。本稿では,従来の手法の長所を組み合わせ,より高解像度かつ高速な結果を生成する新しい合成的3次元表現を提案する。アニメーションコードの粗い3次元構造を意識したグリッドと、各位置とその対応する局所アニメーションコードをビュー依存の放射輝度と局所体積密度にマッピングする連続的な学習済みシーン関数を組み合わせることで、離散的なボリューム表現と連続的なボリューム表現のギャップを埋める。微分可能なボリュームレンダリングは、人間の頭部と上半身のフォトリアリスティックな新規ビューを計算するとともに、2次元の監督のみで新しい表現をエンドツーエンドに訓練するために用いられる。さらに,学習した動的放射場を用いて,グローバルなアニメーションコードに基づく新しい未知の表情を合成できることを示す。本手法は,動的な人間の頭部と上半身の新規ビュー合成において最先端の結果を達成する。 Photorealistic rendering of dynamic humans is an important ability for telepresence systems, virtual shopping, synthetic data generation, and more. Recently, neural rendering methods, which combine techniques from computer graphics and machine learning, have created high-fidelity models of humans and objects. Some of these methods do not produce results with high-enough fidelity for driveable human models (Neural Volumes) whereas others have extremely long rendering times (NeRF). We propose a novel compositional 3D representation that combines the best of previous methods to produce both higher-resolution and faster results. Our representation bridges the gap between discrete and continuous volumetric representations by combining a coarse 3D-structure-aware grid of animation codes with a continuous learned scene function that maps every position and its corresponding local animation code to its view-dependent emitted radiance and local volume density. Differentiable volume rendering is employed to compute photo-realistic novel views of the human head and upper body as well as to train our novel representation end-to-end using only 2D supervision. 
In addition, we show that the learned dynamic radiance field can be used to synthesize novel unseen expressions based on a global animation code. Our approach achieves state-of-the-art results for synthesizing novel views of dynamic human heads and the upper body.	翻訳日:2021-05-02 07:16:04 公開日:2020-12-17
# 視覚質問応答のための自己教師付き学習による言語事前の克服 Overcoming Language Priors with Self-supervised Learning for Visual Question Answering ( http://arxiv.org/abs/2012.11528v1 ) ライセンス: Link先を確認	Xi Zhu, Zhendong Mao, Chunxiao Liu, Peng Zhang, Bin Wang, and Yongdong Zhang	(参考訳) ほとんどのVisual Question Answering (VQA)モデルは、固有のデータバイアスによって引き起こされる言語事前(language prior)問題に悩まされている。具体的には、VQAモデルは、画像内容を無視して、高頻度の回答(例えば黄色)に基づいて質問(例えば、バナナは何色か?)に答える傾向がある。既存のアプローチでは、繊細なモデルを作成したり、視覚アノテーションを追加導入したりして、画像依存性を強化しながら質問依存性を減らすことでこの問題に対処している。しかし、データバイアスが緩和すらされていないため、依然として言語事前問題の影響を受ける。本稿では,この問題を解決するための自己教師付き学習フレームワークを提案する。具体的には,まずラベル付きデータを自動生成してバイアスのあるデータのバランスをとり,バランスの取れたデータを活用してベースVQAモデルが言語事前を克服するのを支援する自己教師付き補助タスクを提案する。本手法は,外部アノテーションを導入することなく,バランスの取れたデータを生成することでデータバイアスを補償できる。実験結果から,本手法は最先端手法を大幅に上回り、最も一般的に使用されているベンチマークVQA-CP v2の全体精度を49.50%から57.59%に向上させた。言い換えれば、外部アノテーションを使わずにアノテーションベースのメソッドのパフォーマンスを16%向上させることができる。 Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., what color is the banana?) based on the high-frequency answers (e.g., yellow) ignoring image contents. Existing approaches tackle this problem by creating delicate models or introducing additional visual annotations to reduce question dependency while strengthening image dependency. However, they are still subject to the language prior problem since the data biases have not been even alleviated. In this paper, we introduce a self-supervised learning framework to solve this problem. Concretely, we first automatically generate labeled data to balance the biased data, and propose a self-supervised auxiliary task to utilize the balanced data to assist the base VQA model to overcome language priors. Our method can compensate for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method can significantly outperform the state-of-the-art, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark VQA-CP v2. 
In other words, we can increase the performance of annotation-based methods by 16% without using external annotations.	翻訳日:2021-05-02 07:15:41 公開日:2020-12-17
# 小売延滞の因果学習 The Causal Learning of Retail Delinquency ( http://arxiv.org/abs/2012.09448v1 ) ライセンス: Link先を確認	Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Nanbo Peng, Dongdong Wang, Zhixiang Huang	(参考訳) 本稿では、貸し手の与信判断に変更があった場合の借り手の返済の期待差に焦点を当てる。古典的推定量は交絡効果を見落としているため、推定誤差が非常に大きくなりうる。そこで我々は,誤差を大幅に低減できる推定量を構築するための別の手法を提案する。提案する推定量は, 理論解析と数値実験を組み合わせることで, 不偏で一致性があり, 頑健であることが示されている。さらに,古典的推定量と提案した推定量の因果量の推定能力を比較する。比較は、線形回帰モデル、ツリーベースモデル、ニューラルネットワークベースのモデルなど幅広いモデルにわたり、さまざまなレベルの因果性、異なる程度の非線形性、異なる分布特性を示す複数のシミュレーションデータセットの下で行われる。最も重要なことは、当社のアプローチを、eコマースと融資ビジネスの両方を運営するグローバルテクノロジー企業が提供する大規模な観察データセットに適用することである。因果効果が正しく考慮されれば, 推定誤差の相対的な低減は極めて大きいことがわかった。 This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be very large. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.	翻訳日:2021-05-02 07:14:57 公開日:2020-12-17
# アルゴリズム・暗号共設計によるスケーラブル・プライバシ保全型深層ニューラルネットワーク Towards Scalable and Privacy-Preserving Deep Neural Network via Algorithmic-Cryptographic Co-design ( http://arxiv.org/abs/2012.09364v1 ) ライセンス: Link先を確認	Chaochao Chen, Jun Zhou, Longfei Zheng, Yan Wang, Xiaolin Zheng, Bingzhe Wu, Cen Chen, Li Wang, and Jianwei Yin	(参考訳) ディープニューラルネットワーク(DNN)は、特に豊富なトレーニングデータを提供する場合、様々な現実世界のアプリケーションにおいて顕著な進歩を遂げている。しかし、データ分離は現在深刻な問題となっている。既存の作業は、アルゴリズムの観点からも暗号化の観点からも、DNNモデルをプライバシ保護する。前者は主にデータホルダとデータホルダとサーバでDNN計算グラフを分割するが、スケーラビリティは良好だが、精度の低下と潜在的なプライバシーリスクに悩まされている。対照的に後者は、プライバシーの保証は強いがスケーラビリティは乏しい、時間を要する暗号技術を利用している。本稿では,アルゴリズムと暗号を併用した,スケーラブルでプライバシ保護の深いニューラルネットワーク学習フレームワークSPNNを提案する。アルゴリズムの観点から,dnnモデルの計算グラフを,データホルダが行うプライベートデータ関連計算と,計算能力の高いサーバに委譲されるその他の重い計算の2つの部分に分割する。暗号の観点からは,秘密共有法と準同型暗号法という2種類の暗号手法を用いて,私的および協調的にプライベートデータ関連計算を行う手法を提案する。さらに,SPNNを分散環境で実装し,ユーザフレンドリなAPIを導入する。実世界のデータセットで行った実験結果はspnnの優位を示している。 Deep Neural Networks (DNNs) have achieved remarkable progress in various real-world applications, especially when abundant training data are provided. However, data isolation has become a serious problem currently. Existing works build privacy preserving DNN models from either algorithmic perspective or cryptographic perspective. The former mainly splits the DNN computation graph between data holders or between data holders and server, which demonstrates good scalability but suffers from accuracy loss and potential privacy risks. In contrast, the latter leverages time-consuming cryptographic techniques, which has strong privacy guarantee but poor scalability. In this paper, we propose SPNN - a Scalable and Privacy-preserving deep Neural Network learning framework, from algorithmic-cryptographic co-perspective. From algorithmic perspective, we split the computation graph of DNN models into two parts, i.e., the private data related computations that are performed by data holders and the rest heavy computations that are delegated to a server with high computation ability. 
From cryptographic perspective, we propose using two types of cryptographic techniques, i.e., secret sharing and homomorphic encryption, for the isolated data holders to conduct private data related computations privately and cooperatively. Furthermore, we implement SPNN in a decentralized setting and introduce user-friendly APIs. Experimental results conducted on real-world datasets demonstrate the superiority of SPNN.	翻訳日:2021-05-02 07:14:40 公開日:2020-12-17
# 薬物標的結合親和性予測のための距離対応分子グラフ注意ネットワーク Distance-aware Molecule Graph Attention Network for Drug-Target Binding Affinity Prediction ( http://arxiv.org/abs/2012.09624v1 ) ライセンス: Link先を確認	Jingbo Zhou, Shuangli Li, Liang Huang, Haoyi Xiong, Fan Wang, Tong Xu, Hui Xiong, Dejing Dou	(参考訳) 薬物とタンパク質の結合親和性を正確に予測することは、計算薬物発見の重要なステップである。グラフニューラルネットワーク(gnns)は様々なグラフ関連タスクで顕著な成功を収めているため、gnnは近年、結合親和性予測を改善する有望なツールと見なされている。しかし、既存のGNNアーキテクチャのほとんどは、その原子間の相対的な空間情報を考えることなく、薬物やタンパク質のトポロジカルグラフ構造を符号化することができる。ソーシャルネットワークやコモンセンス知識グラフのような他のグラフデータセットとは異なり、原子間の相対的な空間的位置と化学結合は結合親和性に大きな影響を及ぼす。そこで本研究では,ドラッグターゲット結合親和性予測に適したディスタンス対応分子グラフ注意ネットワーク(S-MAN)を提案する。そこで,我々はまず,構築したポケットリガンドグラフに位相構造と空間位置情報を統合する位置符号化機構を提案する。また,エッジレベルのアグリゲーションとノードレベルのアグリゲーションを有する新しいエッジノード階層型アグリゲーション構造を提案する。階層的注意集約は、原子間の空間的依存関係を捉えるだけでなく、原子間の複数の空間的関係を識別する能力で位置強調情報を融合することができる。最後に、S-MANの有効性を示すために、2つの標準データセットについて広範な実験を行った。 Accurately predicting the binding affinity between drugs and proteins is an essential step for computational drug discovery. Since graph neural networks (GNNs) have demonstrated remarkable success in various graph-related tasks, GNNs have been considered as a promising tool to improve the binding affinity prediction in recent years. However, most of the existing GNN architectures can only encode the topological graph structure of drugs and proteins without considering the relative spatial information among their atoms. Whereas, different from other graph datasets such as social networks and commonsense knowledge graphs, the relative spatial position and chemical bonds among atoms have significant impacts on the binding affinity. To this end, in this paper, we propose a diStance-aware Molecule graph Attention Network (S-MAN) tailored to drug-target binding affinity prediction. As a dedicated solution, we first propose a position encoding mechanism to integrate the topological structure and spatial position information into the constructed pocket-ligand graph. 
Moreover, we propose a novel edge-node hierarchical attentive aggregation structure which has edge-level aggregation and node-level aggregation. The hierarchical attentive aggregation can capture spatial dependencies among atoms, as well as fuse the position-enhanced information with the capability of discriminating multiple spatial relations among atoms. Finally, we conduct extensive experiments on two standard datasets to demonstrate the effectiveness of S-MAN.	翻訳日:2021-05-02 07:14:07 公開日:2020-12-17
# Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? フェアモデルトレーニングにおけるデータサイエンティストの支援 Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? Supporting Data Scientists in Training Fair Models ( http://arxiv.org/abs/2012.09951v1 ) ライセンス: Link先を確認	Brittany Johnson, Jesse Bartola, Rico Angell, Katherine Keith, Sam Witty, Stephen J. Giguere, Yuriy Brun	(参考訳) 現代のソフトウェアはデータと機械学習に大きく依存しており、世界を形成する決定に影響を与える。残念なことに、最近の研究では、データに偏りがあるため、ソフトウェアシステムは、女性の声よりも男性の声に対してより良い字幕の書き起こしを生成することから、有色人種の人々に金融ローンで過剰な請求を行うことまで、その決定にしばしばバイアスを持ち込むことが示されている。機械学習のバイアスに対処するために、データサイエンティストは、特定のデータ領域におけるモデル品質と公平性の間のトレードオフを理解するのに役立つツールを必要としている。その目的に向けて,データサイエンティストが公平性について推論し理解するのを支援するツールキットであるfairkit-learnを提案する。 fairkit-learnは最先端の機械学習ツールと連携して動作し、同じインターフェースを使うことで採用を容易にする。複数の機械学習アルゴリズム、ハイパーパラメータ、データ置換によって生成される何千ものモデルを評価し、公平性と品質の間の最適なトレードオフを記述する小さなパレート最適モデルの集合を計算し視覚化することができる。 54人の学生によるユーザスタディでfairkit-learnを評価した結果,fairkit-learnを利用する学生は,scikit-learnやIBM AI Fairness 360ツールキットを利用する学生よりも,公平性と品質のバランスが良いモデルを作成することが示された。 fairkit-learnでは、ユーザはscikit-learnでトレーニングするであろうモデルよりも、最大67%公平で10%精度の高いモデルを選択することができる。 Modern software relies heavily on data and machine learning, and affects decisions that shape our world. Unfortunately, recent studies have shown that because of biases in data, software systems frequently inject bias into their decisions, from producing better closed caption transcriptions of men's voices than of women's voices to overcharging people of color for financial loans. To address bias in machine learning, data scientists need tools that help them understand the trade-offs between model quality and fairness in their specific data domains. Toward that end, we present fairkit-learn, a toolkit for helping data scientists reason about and understand fairness. Fairkit-learn works with state-of-the-art machine learning tools and uses the same interfaces to ease adoption. 
It can evaluate thousands of models produced by multiple machine learning algorithms, hyperparameters, and data permutations, and compute and visualize a small Pareto-optimal set of models that describe the optimal trade-offs between fairness and quality. We evaluate fairkit-learn via a user study with 54 students, showing that students using fairkit-learn produce models that provide a better balance between fairness and quality than students using scikit-learn and IBM AI Fairness 360 toolkits. With fairkit-learn, users can select models that are up to 67% more fair and 10% more accurate than the models they are likely to train with scikit-learn.	翻訳日:2021-05-02 07:13:21 公開日:2020-12-17
# ビデオゲームにおける超解像の深層学習技術 Deep Learning Techniques for Super-Resolution in Video Games ( http://arxiv.org/abs/2012.09810v1 ) ライセンス: Link先を確認	Alexander Watson	(参考訳) ビデオゲームグラフィックスの計算コストは増加し、グラフィックス処理のハードウェアは追いつくのに苦労している。つまり、コンピュータ科学者はグラフィカル処理ハードウェアの性能を改善する創造的な新しい方法を開発する必要がある。ビデオ超解像のための深層学習技術は、計算コストの大部分を相殺しながら、高品質なグラフィックスを持つことができる。これらの新興技術は、消費者がビデオゲームのパフォーマンスと楽しみを改善し、ゲーム開発業界で標準になる可能性を秘めている。 The computational cost of video game graphics is increasing and hardware for processing graphics is struggling to keep up. This means that computer scientists need to develop creative new ways to improve the performance of graphical processing hardware. Deep learning techniques for video super-resolution can enable video games to have high quality graphics whilst offsetting much of the computational cost. These emerging technologies allow consumers to have improved performance and enjoyment from video games and have the potential to become standard within the game development industry.	翻訳日:2021-05-02 07:12:31 公開日:2020-12-17
# Treadmill Assisted Gait Spoofing (TAGS):ウェアラブルセンサーによる歩行認証への新たな脅威 Treadmill Assisted Gait Spoofing (TAGS): An Emerging Threat to wearable Sensor-based Gait Authentication ( http://arxiv.org/abs/2012.09950v1 ) ライセンス: Link先を確認	Rajesh Kumar and Can Isik and Vir V Phoha	(参考訳) 本研究では,Treadmill Assisted Gait Spoofing (TAGS) がWearable Sensor-based Gait Authentication (WSGait) に与える影響を検討する。我々は,加速度センサと固定された機能のセットのみに焦点を当てた,以前の研究よりも現実的な実装と展開のシナリオを検討する。具体的には、WSGaitの実装が1つ以上のセンサーを現代のスマートフォンに組み込むことができる状況について考察する。さらに、異なる機能セットや異なる分類アルゴリズム、あるいはその両方を使うこともできる。さまざまなセンサー、機能セット(相互情報によってランク付けされる)、および6つの異なる分類アルゴリズムが使用されているにもかかわらず、TAGSは平均FAR(False Accept Rate)を4%から26%に向上することができた。このような平均的なFARの大幅な増加、特に本研究で考慮された厳格な実装とデプロイメントのシナリオの下では、WSGaitの公開デプロイ前の評価設計に関するさらなる調査が求められている。 In this work, we examine the impact of Treadmill Assisted Gait Spoofing (TAGS) on Wearable Sensor-based Gait Authentication (WSGait). We consider more realistic implementation and deployment scenarios than the previous study, which focused only on the accelerometer sensor and a fixed set of features. Specifically, we consider the situations in which the implementation of WSGait could be using one or more sensors embedded into modern smartphones. Besides, it could be using different sets of features or different classification algorithms, or both. Despite the use of a variety of sensors, feature sets (ranked by mutual information), and six different classification algorithms, TAGS was able to increase the average False Accept Rate (FAR) from 4% to 26%. Such a considerable increase in the average FAR, especially under the stringent implementation and deployment scenarios considered in this study, calls for a further investigation into the design of evaluations of WSGait before its deployment for public use.	翻訳日:2021-05-02 07:12:24 公開日:2020-12-17
# ゼロショットモデル選択による音声強調 Speech Enhancement with Zero-Shot Model Selection ( http://arxiv.org/abs/2012.09359v1 ) ライセンス: Link先を確認	Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao	(参考訳) 音声強調(SE)に関する最近の研究は、深層学習に基づく手法の出現を目にしている。多様なテスト条件下でSEの一般化性を高める効果的な方法を決定することは依然として難しい課題である。本稿では、ゼロショット学習とアンサンブル学習を組み合わせることで、SE性能の一般化を促進するためのゼロショットモデル選択(ZMOS)手法を提案する。提案手法はオフラインとオンラインの2つのフェーズで実現されている。オフラインフェーズでは、トレーニングデータのセット全体を複数のサブセットにクラスタリングし、各サブセットで専用のSEモデル(コンポーネントSEモデルと呼ばれる)をトレーニングする。オンラインフェーズでは、強調を行うのに最も適したコンポーネントSEモデルを選択する。品質スコア(QS)に基づく選択と品質埋め込み(QE)に基づく選択の2つの選択戦略が開発されている。QSとQEはいずれも、非侵入的品質評価ネットワークであるQuality-Netによって得られる。オフラインフェーズでは、トレーニングデータをクラスタにグループ化するために、トレーニング発話のQSまたはQEを使用する。オンラインフェーズでは、テスト発話のQSまたはQEを使用して、適切なコンポーネントSEモデルを特定し、テスト発話の強調を行う。実験結果から、ZMOS法は既知のノイズタイプと未知のノイズタイプの両方においてベースラインシステムよりも優れた性能を達成できることが確認され、頑健なSE性能を提供する提案手法の有効性が示された。 Recent research on speech enhancement (SE) has seen the emergence of deep learning-based methods. It is still a challenging task to determine effective ways to increase the generalizability of SE under diverse test conditions. In this paper, we combine zero-shot learning and ensemble learning to propose a zero-shot model selection (ZMOS) approach to increase the generalization of SE performance. The proposed approach is realized in two phases, namely offline and online phases. The offline phase clusters the entire set of training data into multiple subsets, and trains a specialized SE model (termed component SE model) with each subset. The online phase selects the most suitable component SE model to carry out enhancement. Two selection strategies are developed: selection based on quality score (QS) and selection based on quality embedding (QE). Both QS and QE are obtained by a Quality-Net, a non-intrusive quality assessment network. In the offline phase, the QS or QE of a training utterance is used to group the training data into clusters. In the online phase, the QS or QE of the test utterance is used to identify the appropriate component SE model to perform enhancement on the test utterance. Experimental results have confirmed that the proposed ZMOS approach can achieve better performance in both seen and unseen noise types compared to the baseline systems, which indicates the effectiveness of the proposed approach to provide robust SE performance.	翻訳日:2021-05-02 07:12:06 公開日:2020-12-17
# グラスマン層を有する浅部ReLUネットワークを用いた低次モデリング Reduced Order Modeling using Shallow ReLU Networks with Grassmann Layers ( http://arxiv.org/abs/2012.09940v1 ) ライセンス: Link先を確認	Kayla Bollinger and Hayden Schaeffer	(参考訳) 本稿では,ニューラルネットワークを用いた方程式系の非線形モデル削減手法を提案する。ニューラルネットワークは、グラスマン多様体上の第1層と同一性に設定された第1活性化関数を持つ「3層」ネットワークであり、残りのネットワークは標準の2層ReLUニューラルネットワークである。グラスマン層は入力空間の低減基底を決定するが、残りの層は非線形入力出力系を近似する。トレーニングは減弱基底と非線形近似の学習を交互に行い、減弱基底の修正やネットワークのみのトレーニングよりも効果的であることが示されている。このアプローチのさらなる利点は、低次元の部分空間上にあるデータに対して、ネットワーク内のパラメータの数が大きくなる必要はないことである。本稿では,ニューラルネットワークの近似に適さないデータスカース方式の科学的問題に対して,本手法が適用可能であることを示す。例えば、非線形力学系の低次モデリングや、いくつかの航空宇宙工学の問題がある。 This paper presents a nonlinear model reduction method for systems of equations using a structured neural network. The neural network takes the form of a "three-layer" network with the first layer constrained to lie on the Grassmann manifold and the first activation function set to identity, while the remaining network is a standard two-layer ReLU neural network. The Grassmann layer determines the reduced basis for the input space, while the remaining layers approximate the nonlinear input-output system. The training alternates between learning the reduced basis and the nonlinear approximation, and is shown to be more effective than fixing the reduced basis and training the network only. An additional benefit of this approach is, for data that lie on low-dimensional subspaces, that the number of parameters in the network does not need to be large. We show that our method can be applied to scientific problems in the data-scarce regime, which is typically not well-suited for neural network approximations. Examples include reduced order modeling for nonlinear dynamical systems and several aerospace engineering problems.	翻訳日:2021-05-02 07:11:45 公開日:2020-12-17
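
The network structure described in the Grassmann-layer abstract above can be sketched in a few lines. This is only an illustrative forward pass under stated assumptions, not the authors' implementation: the point on the Grassmann manifold is represented by an orthonormal basis obtained with Gram-Schmidt, and the function names (`orthonormalize`, `forward`) and all weights are hypothetical. The alternating training procedure from the abstract is not shown.

```python
import math


def orthonormalize(columns):
    """Gram-Schmidt: turn linearly independent vectors into an orthonormal basis."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def relu(z):
    return [max(0.0, a) for a in z]


def forward(x, basis, W, b, c):
    # First layer with identity activation: reduced coordinates A^T x,
    # where the rows of `basis` are the orthonormal columns of A.
    reduced = [sum(q_i * x_i for q_i, x_i in zip(q, x)) for q in basis]
    # Standard two-layer ReLU network acting on the reduced coordinates.
    hidden = relu([sum(w_ij * r_j for w_ij, r_j in zip(row, reduced)) + b_i
                   for row, b_i in zip(W, b)])
    return sum(c_i * h_i for c_i, h_i in zip(c, hidden))


# Hypothetical example: 3-D inputs reduced to a 2-D subspace.
basis = orthonormalize([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
W = [[1.0, -1.0], [0.5, 2.0]]  # hidden-layer weights (illustrative values)
b = [0.0, -0.1]                # hidden-layer biases (illustrative values)
c = [1.0, 1.0]                 # output weights (illustrative values)
print(forward([0.2, 0.4, 0.6], basis, W, b, c))
```

Because the hidden layer only sees the reduced coordinates, its parameter count depends on the subspace dimension rather than on the input dimension, which is the benefit the abstract mentions for data lying on low-dimensional subspaces.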

# SU($N$)フェルミオンの3次元気体におけるボゾン化の証拠

Evidence for Bosonization in a three-dimensional gas of SU($N$) fermions ( http://arxiv.org/abs/1912.12105v3 )

ライセンス: Link先を確認

Bo Song, Yangqian Yan, Chengdong He, Zejian Ren, Qi Zhou and Gyu-Boong Jo

(参考訳) ボソンとフェルミオンの境界をぼやかすことは、凝縮物物理学から原子、分子、光学物理学、高エネルギー物理学まで、様々な分野の興味深い量子現象の中心にある。そのような例の1つは、SU($N$)対称性を持つ多成分フェルミ気体で、大きな$N$の極限においてスピンレスボソンのように振る舞うことが期待され、多くの内部状態がパウリの排他原理による制約を弱める。しかし、SU($N$)フェルミオンのボゾン化は、正確な解が存在しない高次元においてはこれまで確立されていない。ここでは、$N$を調整可能な3次元(3D)のSU($N$)フェルミオンイッテルビウムガスにおけるボゾン化の直接的証拠を報告する。我々は、希薄な量子ガスを制御する中心的な量である接触(コンタクト)を運動量分布から測定し、低フガシティ領域においてスピン当たりの接触が1/$N$のスケーリングで定数に近づくことを発見した。これは我々の理論的予測と一致する。このスケーリングは熱力学におけるフェルミオン統計の役割が消失することを意味し、単一の物理量を測定することによってボゾン化を検証することを可能にする。我々の研究は、任意の一般次元において内部自由度を調整することでボソン統計とフェルミオン統計を交換する、高度に制御可能な量子シミュレータを提供する。また、多成分量子系とその基礎となる対称性を接触を用いて探索する新たな経路も示唆している。

Blurring the boundary between bosons and fermions lies at the heart of a wide range of intriguing quantum phenomena in multiple disciplines, ranging from condensed matter physics and atomic, molecular and optical physics to high energy physics. One such example is a multi-component Fermi gas with SU($N$) symmetry that is expected to behave like spinless bosons in the large $N$ limit, where the large number of internal states weakens constraints from the Pauli exclusion principle. However, bosonization in SU($N$) fermions has never been established in high dimensions where exact solutions are absent. Here, we report direct evidence for bosonization in a SU($N$) fermionic ytterbium gas with tunable $N$ in three dimensions (3D). We measure contacts, the central quantity controlling dilute quantum gases, from the momentum distribution, and find that the contact per spin approaches a constant with a 1/$N$ scaling in the low fugacity regime consistent with our theoretical prediction. This scaling signifies the vanishing role of the fermionic statistics in thermodynamics, and allows us to verify bosonization through measuring a single physical quantity. Our work delivers a highly controllable quantum simulator to exchange the bosonic and fermionic statistics through tuning the internal degrees of freedom in any generic dimensions. It also suggests a new route towards exploring multi-component quantum systems and their underlying symmetries with contacts.

翻訳日:2023-06-09 23:36:28 公開日:2020-12-17

# IEEE 7010:人工知能のウェルビーイングへの影響を評価するための新しい標準

IEEE 7010: A New Standard for Assessing the Well-being Implications of Artificial Intelligence ( http://arxiv.org/abs/2005.06620v3 )

ライセンス: Link先を確認

Daniel S. Schiff, Aladdin Ayesh, Laura Musikanski, John C. Havens

(参考訳) 人工知能(AI)によって実現された製品やサービスは、日々の生活の基盤になりつつある。政府や企業はAIイノベーションの恩恵を享受したいと熱心に考えているが、これらの自律的でインテリジェントなシステムが人間のウェルビーイングに与える功罪両面の影響は、差し迫った問題になっている。本稿では、AIの社会的・倫理的意味に焦点をあてた最初期の国際標準の一つである、電気電子技術者協会(IEEE)標準(Std)7010-2020「自律・知能システムが人間のウェルビーイングに与える影響を評価するための推奨プラクティス」を紹介する。AIのライフサイクルを通じてウェルビーイングの要素を組み込むことは困難かつ緊急の課題であり、IEEE 7010はこれらの技術を設計、デプロイ、調達する人々にとって重要なガイダンスを提供する。まず、ウェルビーイングを中心としたAIへのアプローチとウェルビーイングデータの計測の利点を明確に述べる。次に、IEEE 7010の概要を紹介し、その鍵となる原則と、標準がAIコミュニティにおける既存のアプローチや視点とどのように関係しているかについて説明する。最後に、今後の取り組みがどこに必要かを示す。

Artificial intelligence (AI) enabled products and services are becoming a staple of everyday life. While governments and businesses are eager to enjoy the benefits of AI innovations, the mixed impact of these autonomous and intelligent systems on human well-being has become a pressing issue. This article introduces one of the first international standards focused on the social and ethical implications of AI: The Institute of Electrical and Electronics Engineers (IEEE) Standard (Std) 7010-2020 Recommended Practice for Assessing the Impact of Autonomous and Intelligent Systems on Human Well-being. Incorporating well-being factors throughout the lifecycle of AI is both challenging and urgent, and IEEE 7010 provides key guidance for those who design, deploy, and procure these technologies. We begin by articulating the benefits of an approach for AI centered around well-being and the measurement of well-being data. Next, we provide an overview of IEEE 7010, including its key principles and how the standard relates to approaches and perspectives in place in the AI community. Finally, we indicate where future efforts are needed.

翻訳日:2023-05-20 22:18:30 公開日:2020-12-17

# コンテクストと非互換性

Contextuality versus Incompatibility ( http://arxiv.org/abs/2005.05124v3 )

ライセンス: Link先を確認

Andrei Khrennikov

(参考訳) 我々の目標は、量子物理学の基本概念である文脈性と非互換性を比較することである。文脈性という2つの異なる概念、すなわち Bohr-contextuality と Bell-contextuality を区別しなければならない。後者は非文脈性(ベル型)不等式の破れによって操作的に定義される。このような文脈性を非互換性と比較する。量子可観測量に対して、非互換性のない文脈性は存在しないことを示すのは容易である。ここで自然な疑問が生じる。非互換性のない文脈性とは何か? (「dry-residue」とは何か?) 一般にこれは非常に複雑な問題である。我々は4つの量子可観測量の文脈性に集中した。 CHSHシナリオ(「自然な量子可観測量」の場合)では文脈性が非互換性に還元されることを示した。しかし、一般に、非互換性のない文脈性は、何らかの物理的内容を持つかもしれない。非互換性から文脈性成分を抽出する数学的制約を見出した。しかし、この制約の物理的意味は明確ではない。付録1では、ボーアの相補性原理に基づく別の種類の文脈性について簡潔に論じ、これは文脈性-非互換性原理として扱われる。ボーア文脈性は量子基礎論において重要な役割を果たす。非互換性は、実のところ、ボーア文脈性の帰結である。最後に、認知心理学や意思決定など物理学以外の分野において、非互換性を取り除いたベル文脈性が重要な役割を果たしうることを述べる。

Our aim is to compare two fundamental notions of quantum physics: contextuality and incompatibility. One has to distinguish two different notions of contextuality, Bohr-contextuality and Bell-contextuality. The latter is defined operationally via violation of noncontextuality (Bell type) inequalities. This sort of contextuality will be compared with incompatibility. It is easy to show that, for quantum observables, there is no contextuality without incompatibility. The natural question arises: What is contextuality without incompatibility? (What is "dry-residue"?) Generally this is a very complex question. We concentrated on contextuality for four quantum observables. We have shown that in the CHSH-scenarios (for "natural quantum observables") contextuality is reduced to incompatibility. However, generally contextuality without incompatibility may have some physical content. We found a mathematical constraint extracting the contextuality component from incompatibility. However, the physical meaning of this constraint is not clear. In appendix 1, we briefly discuss another sort of contextuality based on Bohr's complementarity principle, which is treated as the contextuality-incompatibility principle. Bohr-contextuality plays a crucial role in quantum foundations. Incompatibility is, in fact, a consequence of Bohr-contextuality. Finally, we remark that outside of physics, e.g., in cognitive psychology and decision making, Bell-contextuality cleaned of incompatibility can play an important role.

翻訳日:2023-05-20 19:58:24 公開日:2020-12-17

# 新型コロナウイルス(COVID-19)の接触追跡とプライバシ - 意見と選好の研究

COVID-19 Contact Tracing and Privacy: Studying Opinion and Preferences ( http://arxiv.org/abs/2005.06056v2 )

ライセンス: Link先を確認

Lucy Simko (1, 2 and 3), Ryan Calo (2 and 4), Franziska Roesner (1, 2 and 3), Tadayoshi Kohno (1, 2 and 3) ((1) Security and Privacy Research Lab, University of Washington, (2) Tech Policy Lab, University of Washington, (3) Paul G. Allen School of Computer Science & Engineering, University of Washington, (4) School of Law, University of Washington)

(参考訳) 技術を活用した接触追跡への関心が高まっている。接触追跡とは、感染者の最近の接触者全員に通知することで、新型コロナウイルス(COVID-19)に感染した可能性のある人を特定するプロセスである。政府、技術系企業、研究グループは、スマートフォン、IoTデバイス、ウェアラブル技術が自動的に「密接な接触」を追跡し、個人が陽性と判定された際に過去の接触者を特定できる可能性を認識している。しかし、現在、効果的な技術ベースの接触追跡と個人のプライバシーの間の緊張関係について重要な公的な議論がある。この議論に資するため、本稿では、接触追跡とプライバシに焦点をあてた、各回100人が参加する一連のオンライン調査の結果を報告する。最初の調査は4月1日と3日に実施され、主にこの最初の2回の調査について報告するが、その後の調査日における初期の知見も示す。結果は、世論の多様性を示し、新型コロナウイルスの感染拡大を抑えるためにテクノロジーを活用するかどうか、またどのように活用するかについての公的な議論に役立てることができる。引き続き縦断的な測定を行っており、このレポートを随時更新する。このバージョンのレポートを引用する際は、2020年5月8日のレポートバージョン1.0を参照されたい。 NOTE: 2020年12月4日現在、このレポートはarXiv:2012.01553にあるReport Version 2.0に取って代わられている。代わりにReport Version 2.0を読み、引用してください。

There is growing interest in technology-enabled contact tracing, the process of identifying potentially infected COVID-19 patients by notifying all recent contacts of an infected person. Governments, technology companies, and research groups alike recognize the potential for smartphones, IoT devices, and wearable technology to automatically track "close contacts" and identify prior contacts in the event of an individual's positive test. However, there is currently significant public discussion about the tensions between effective technology-based contact tracing and the privacy of individuals. To inform this discussion, we present the results of a sequence of online surveys focused on contact tracing and privacy, each with 100 participants. Our first surveys were on April 1 and 3, and we report primarily on those first two surveys, though we present initial findings from later survey dates as well. Our results present the diversity of public opinion and can inform the public discussion on whether and how to leverage technology to reduce the spread of COVID-19. We are continuing to conduct longitudinal measurements, and will update this report over time; citations to this version of the report should reference Report Version 1.0, May 8, 2020. NOTE: As of December 4, 2020, this report has been superseded by Report Version 2.0, found at arXiv:2012.01553. Please read and cite Report Version 2.0 instead.

翻訳日:2023-05-20 11:40:03 公開日:2020-12-17

# 格子対称性を持つアーベル型トポロジカル相に対する結晶ゲージ場と量子化された離散幾何応答

Crystalline gauge fields and quantized discrete geometric response for Abelian topological phases with lattice symmetry ( http://arxiv.org/abs/2005.10265v3 )

ライセンス: Link先を確認

Naren Manjunath, Maissam Barkeshli

(参考訳) 連続体におけるクリーンで等方的な量子ホール流体は、ホール伝導率、シフト、ホール粘度など、対称性で保護された量子化不変量を多数持つ。ここでは、格子上で定義されるトポロジカル相に対する対称性保護量子化不変量の理論を展開する。格子上では、連続体に類似物を持たない量子化不変量が生じうる。離散結晶ゲージ場を用いたトポロジカル場の理論を構築し、対称群 $G = U(1) \times G_{\text{space}}$ を持つ(2+1)次元アーベル型トポロジカル秩序の量子化不変量を完全に特徴付ける。ここで $G_{\text{space}}$ は格子上の向きを保つ空間群対称性からなる。離散的な回転および並進対称性の分数化が、離散スピンベクトル、連続体にも格子回転対称性がない場合にも類似物を持たない離散トーションベクトル、そして同じく連続体に類似物を持たない面積ベクトルによって特徴付けられることを示す。離散トーションベクトルは一種の結晶運動量分数化を意味し、これは$2$回、$3$回、$4$回の回転対称性に対してのみ非自明である。量子化されたトポロジカル応答理論には、回位(ディスクリネーション)と角に分数電荷を束縛するシフトの離散版、回位の分数量子化された角運動量、回転対称な分数電荷分極とその角運動量版、単位セルあたりの電荷と角運動量に対する制約、および転位と面積の単位に束縛された量子化運動量が含まれる。分数量子化された電荷分極は、$2$回、$3$回、$4$回の回転対称性を持つ格子上でのみ非自明であり、格子の転位に束縛された分数電荷と、境界に沿った単位長さ当たりの分数電荷を意味する。バーガースベクトル上の有限群による次数付けが重要な役割を果たし、これは格子の点群対称性に依存する。

Clean isotropic quantum Hall fluids in the continuum possess a host of symmetry-protected quantized invariants, such as the Hall conductivity, shift and Hall viscosity. Here we develop a theory of symmetry-protected quantized invariants for topological phases defined on a lattice, where quantized invariants with no continuum analog can arise. We develop topological field theories using discrete crystalline gauge fields to fully characterize quantized invariants of (2+1)D Abelian topological orders with symmetry group $G = U(1) \times G_{\text{space}}$, where $G_{\text{space}}$ consists of orientation-preserving space group symmetries on the lattice. We show how discrete rotational and translational symmetry fractionalization can be characterized by a discrete spin vector, a discrete torsion vector which has no analog in the continuum or in the absence of lattice rotation symmetry, and an area vector, which also has no analog in the continuum. The discrete torsion vector implies a type of crystal momentum fractionalization that is only non-trivial for $2$, $3$, and $4$-fold rotation symmetry. The quantized topological response theory includes a discrete version of the shift, which binds fractional charge to disclinations and corners, a fractionally quantized angular momentum of disclinations, rotationally symmetric fractional charge polarization and its angular momentum counterpart, constraints on charge and angular momentum per unit cell, and quantized momentum bound to dislocations and units of area. The fractionally quantized charge polarization, which is non-trivial only on a lattice with $2$, $3$, and $4$-fold rotation symmetry, implies a fractional charge bound to lattice dislocations and a fractional charge per unit length along the boundary. An important role is played by a finite group grading on Burgers vectors, which depends on the point group symmetry of the lattice.

翻訳日:2023-05-19 05:41:18 公開日:2020-12-17

# 非マルコフ開量子系による高忠実テレポーテーションの実験的実現

Experimental realization of high-fidelity teleportation via non-Markovian open quantum system ( http://arxiv.org/abs/2007.01318v2 )

ライセンス: Link先を確認

Zhao-Di Liu, Yong-Nan Sun, Bi-Heng Liu, Chuan-Feng Li, Guang-Can Guo, Sina Hamedani Raja, Henri Lyyra, Jyrki Piilo

(参考訳) オープン量子系とデコヒーレンスの研究は、量子物理現象の基本的な理解にとって重要である。実用上は、量子資源(例えばエンタングルメント)を利用することで古典的な手段で達成できることを超える量子プロトコルが数多く存在する。我々は、オープン量子系と量子情報科学の概念を組み合わせ、非マルコフ開放系を介して量子プロトコルを効率的に実装できることを、テレポーテーションを用いた原理実証実験によって示す。その結果、プロトコルの実装時点において、基本プロトコルに用いる自由度に量子資源が存在する必要はなく、有用な資源を含む他の自由度、すなわち開放系の環境が存在すれば十分であることが示された。実験は光子対に基づいており、光子の偏光が開放系の量子ビット、周波数がその環境として働き、一方の光子の経路自由度が、ボブの偏光量子ビットへテレポートされるアリスの量子ビットの状態を表す。

Open quantum systems and the study of decoherence are important for our fundamental understanding of quantum physical phenomena. For practical purposes, there exists a large number of quantum protocols exploiting quantum resources, e.g. entanglement, which allow one to go beyond what can be achieved by classical means. We combine concepts from open quantum systems and quantum information science, and give a proof-of-principle experimental demonstration -- with teleportation -- that it is possible to efficiently implement a quantum protocol via a non-Markovian open system. The results show that, at the time of implementation of the protocol, it is not necessary to have the quantum resource in the degree of freedom used for the basic protocol -- as long as there exists some other degree of freedom, or environment of an open system, which contains useful resources. The experiment is based on a pair of photons, where their polarizations act as open system qubits and frequencies as their environments -- while the path degree of freedom of one of the photons represents the state of Alice's qubit to be teleported to Bob's polarization qubit.

翻訳日:2023-05-11 20:38:39 公開日:2020-12-17

# グリーンアルゴリズム:計算の炭素フットプリントの定量化

Green Algorithms: Quantifying the carbon footprint of computation ( http://arxiv.org/abs/2007.07610v5 )

ライセンス: Link先を確認

Loïc Lannelongue, Jason Grealey and Michael Inouye

(参考訳) 気候変動は、人間社会、経済、健康など、地球上の生命のほぼ全ての側面に大きな影響を与えている。様々な人間の活動は、データセンターやその他の大規模計算の源を含む温室効果ガスの排出に責任がある。高性能コンピューティングの発展により、多くの重要な科学的マイルストーンが達成されているが、環境への影響は過小評価されている。本稿では,処理時間,計算コアの種類,利用可能なメモリ,計算施設の効率と位置に基づいて,計算タスクの炭素フットプリントを標準化された信頼性の高い方法で推定するための方法論的枠組みを提案する。温室効果ガスの排出を解釈し、コンテクスト化するための指標が定義されており、車や飛行機が移動する同等の距離や、炭素の隔離に必要な木月数が含まれる。我々は、ユーザが計算のカーボンフットプリントを見積り、報告できる無料のオンラインツールであるGreen Algorithms(www.green-algorithms.org)を開発した。 Green Algorithmsツールは、最小限の情報を必要とするため計算処理と容易に統合でき、既存のコードに干渉せず、幅広いCPU、GPU、クラウドコンピューティング、ローカルサーバー、デスクトップコンピュータも考慮している。最後に,Green Algorithmsを適用し,粒子物理シミュレーション,天気予報,自然言語処理などに用いるアルゴリズムの温室効果ガス排出量を定量化する。本研究は, ほぼ任意の計算の炭素フットプリントを定量化するための, 単純な一般化可能なフレームワークと自由利用可能なツールを開発する。不要なCO2排出量を最小化するための一連の勧告と組み合わさって、認識を高め、よりグリーンな計算を容易にしたいと思っています。

Climate change is profoundly affecting nearly all aspects of life on earth, including human societies, economies and health. Various human activities are responsible for significant greenhouse gas emissions, including data centres and other sources of large-scale computation. Although many important scientific milestones have been achieved thanks to the development of high-performance computing, the resultant environmental impact has been underappreciated. In this paper, we present a methodological framework to estimate the carbon footprint of any computational task in a standardised and reliable way, based on the processing time, type of computing cores, memory available and the efficiency and location of the computing facility. Metrics to interpret and contextualise greenhouse gas emissions are defined, including the equivalent distance travelled by car or plane as well as the number of tree-months necessary for carbon sequestration. We develop a freely available online tool, Green Algorithms (www.green-algorithms.org), which enables a user to estimate and report the carbon footprint of their computation. The Green Algorithms tool easily integrates with computational processes as it requires minimal information and does not interfere with existing code, while also accounting for a broad range of CPUs, GPUs, cloud computing, local servers and desktop computers. Finally, by applying Green Algorithms, we quantify the greenhouse gas emissions of algorithms used for particle physics simulations, weather forecasts and natural language processing. Taken together, this study develops a simple generalisable framework and freely available tool to quantify the carbon footprint of nearly any computation. Combined with a series of recommendations to minimise unnecessary CO2 emissions, we hope to raise awareness and facilitate greener computation.

翻訳日:2023-05-09 09:13:12 公開日:2020-12-17
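
To make the inputs listed in the Green Algorithms abstract concrete, here is a minimal sketch of how such an estimate could be assembled. It is an assumption-laden illustration, not the tool's code: the function name, the parameter names, and every numeric value below are hypothetical, and the abstract itself does not spell out the formula. The sketch multiplies runtime by the power drawn by cores and memory, scales by the facility's power usage effectiveness (PUE, standing in for "efficiency"), and converts energy to emissions with a location-dependent carbon intensity.

```python
def carbon_footprint_g(runtime_h, n_cores, core_power_w, core_usage,
                       memory_gb, memory_power_w_per_gb,
                       pue, carbon_intensity_g_per_kwh):
    """Estimate emissions in grams of CO2-equivalent (illustrative sketch)."""
    # Power drawn by the computing cores and by the available memory, in watts.
    power_w = n_cores * core_power_w * core_usage + memory_gb * memory_power_w_per_gb
    # Energy in kWh, scaled by the facility overhead (PUE).
    energy_kwh = runtime_h * power_w * pue / 1000.0
    # Location enters through the carbon intensity of the local electricity.
    return energy_kwh * carbon_intensity_g_per_kwh


# Hypothetical job: 12 h on 8 fully used cores at 10 W each, 32 GB of memory
# at 0.375 W/GB, PUE of 1.5, and 400 gCO2e per kWh.
estimate = carbon_footprint_g(12, 8, 10.0, 1.0, 32, 0.375, 1.5, 400.0)
print(f"{estimate:.1f} gCO2e")  # 12 h * 92 W * 1.5 = 1.656 kWh, i.e. about 662.4 g
```

Keeping each factor as a separate argument mirrors the abstract's point that only minimal information is required; real values for a given machine and data centre would have to come from the tool or from hardware specifications.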

# 一般量子エラーに対する十分条件と制約

Sufficient Conditions and Constraints for Reversing General Quantum Errors ( http://arxiv.org/abs/2007.11083v2 )

ライセンス: Link先を確認

Alvin Gonzales, Daniel Dilley, Mark S. Byrd

(参考訳) 例えば誤り訂正のように、量子進化の効果を逆転することは、信頼できる量子デバイスを生成するために量子システムを制御する上で重要なタスクである。進化が完全正の写像によって制御されるとき、量子誤差補正符号条件(quantum error correcting code conditions)と呼ばれる可逆性条件が存在し、これは部分空間、すなわち符号空間上の量子演算の可逆性に必要かつ十分な条件である。しかし、進化が完全に正の写像によって説明されないと仮定すると、必要十分条件は分かっていない。ここでは、必ずしも完全正の写像に対応しない進化を考える。我々は、完全に正のマップ誤り訂正符号条件が、マップの領域にない符号空間につながり得ることを証明し、つまり、マップの出力は正でないことを示す。我々の定理の補足は関連する例のクラスを提供する。最後に、正当性を確保しつつ、量子エラー訂正符号条件の使用を可能にする十分な条件のセットを提供する。

Reversing the effects of a quantum evolution, for example as is done in error correction, is an important task for controlling quantum systems in order to produce reliable quantum devices. When the evolution is governed by a completely positive map, there exist reversibility conditions, known as the quantum error correcting code conditions, which are necessary and sufficient conditions for the reversibility of a quantum operation on a subspace, the code space. However, if we suppose that the evolution is not described by a completely positive map, necessary and sufficient conditions are not known. Here we consider evolutions that do not necessarily correspond to a completely positive map. We prove that the completely positive map error correcting code conditions can lead to a code space that is not in the domain of the map, meaning that the output of the map is not positive. A corollary to our theorem provides a class of relevant examples. Finally, we provide a set of sufficient conditions that will enable the use of quantum error correcting code conditions while ensuring positivity.

翻訳日:2023-05-08 20:39:25 公開日:2020-12-17

# 非古典光による光電電流の増大

Enhancing photoelectric current by nonclassical light ( http://arxiv.org/abs/2008.03876v2 )

ライセンス: Link先を確認

Hai-Yan Yao, Sheng-Wen Li

(参考訳) 非古典的な光子統計を持つ駆動光によって生成される光電電流を研究する。入力光子の統計が非古典的であるため、古典物理学のように駆動光を平面波として扱うだけではもはや不十分である。我々はこのような問題を量子論的に取り扱い、次のことを見出した。駆動光が初期状態としてコヒーレント状態から出発する場合、我々の量子的取り扱いは準古典的な駆動の記述を再現する。駆動光がある P 関数を持つ一般の状態である場合、系全体のダイナミクスは多数の「分岐」の P 関数による平均に帰着でき、各分岐では駆動光がコヒーレント状態から出発するため、系のダイナミクスはやはり上記の準古典的な方法で得られる。この量子的アプローチに基づいて、光子統計の違いが光電電流に確かに違いをもたらすことが判明した。同じ光強度を持つ全ての古典的な光の状態の中で、ポアソン統計を持つ入力光が最大の光電電流を生成することを証明し、一方、非古典的なサブポアソン光はこの古典的上界を超えうることを示す。

We study the photoelectric current generated by a driving light with nonclassical photon statistics. Due to the nonclassical input photon statistics, it is no longer enough to treat the driving light as a plane wave as in classical physics. We take a quantum approach to study such problems, and find that: when the driving light starts from a coherent state as the initial state, our quantum treatment recovers the quasi-classical driving description; when the driving light is a generic state with a certain P function, the full system dynamics reduces to the P function average of many "branches" -- in each dynamics branch, the driving light starts from a coherent state, thus again the system dynamics can be obtained in the above quasi-classical way. Based on this quantum approach, it turns out that different photon statistics do make a difference to the photoelectric current. Among all the classical light states with the same light intensity, we prove that the input light with Poisson statistics generates the largest photoelectric current, while a nonclassical sub-Poisson light could exceed this classical upper bound.

翻訳日:2023-05-06 16:15:59 公開日:2020-12-17

# 忠実度感受率による非エルミート例外点の探索

Hunting for the non-Hermitian exceptional points with fidelity susceptibility ( http://arxiv.org/abs/2009.07070v2 )

ライセンス: Link先を確認

Yu-Chin Tzeng, Chia-Yi Ju, Guang-Yin Chen, Wen-Min Huang

(参考訳) 忠実度感受率は10年以上にわたってエルミート量子多体系の量子相転移を検出するために使われており、そこでは忠実度感受率の密度が熱力学的極限で$+\infty$に近づく。ここでは、ヒルベルト空間の幾何学的構造を考慮することで、忠実度感受率$\chi$を非エルミート量子系に一般化する。計量の運動方程式を一から解く代わりに、忠実度が双直交固有状態から構成され、例外点(EP)上でなければ代数的または数値的に計算できるゲージを選んだ。EPにおけるヒルベルト空間幾何学の性質により、$\chi$が$-\infty$に近づくところでEPを見つけられることがわかった。例として、単一の調整パラメータを持つ最も単純な$\mathcal{PT}$対称な$2\times2$ハミルトニアンと、非エルミートSu-Schrieffer-Heegerモデルを調べる。

The fidelity susceptibility has been used to detect quantum phase transitions in Hermitian quantum many-body systems for over a decade, where the fidelity susceptibility density approaches $+\infty$ in the thermodynamic limit. Here the fidelity susceptibility $\chi$ is generalized to non-Hermitian quantum systems by taking the geometric structure of the Hilbert space into consideration. Instead of solving the metric equation of motion from scratch, we chose a gauge where the fidelities are composed of biorthogonal eigenstates and can be worked out algebraically or numerically when not on the exceptional point (EP). Due to the properties of the Hilbert space geometry at EP, we found that EP can be found when $\chi$ approaches $-\infty$. As examples, we investigate the simplest $\mathcal{PT}$ symmetric $2\times2$ Hamiltonian with a single tuning parameter and the non-Hermitian Su-Schrieffer-Heeger model.

翻訳日:2023-05-02 04:27:50 公開日:2020-12-17
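
The behaviour described in the abstract above can be tried numerically on a toy model. Everything specific here is an assumption made for illustration and is not taken from the paper: the Hamiltonian $H(g) = \begin{pmatrix} ig & 1 \\ 1 & -ig \end{pmatrix}$ (one common $\mathcal{PT}$-symmetric $2\times2$ form with an EP at $g=1$), the fidelity convention $F = \langle L(g)|R(g')\rangle\langle L(g')|R(g)\rangle$ with biorthonormal eigenstates, and the finite-difference estimate $\chi \approx (1-F)/\epsilon^2$. Under these conventions a short calculation suggests $\chi(g) = -1/[4(1-g^2)^2]$ for $|g|<1$, which the sketch reproduces.

```python
import cmath


def right_eigvec(g):
    """Right eigenvector of H(g) = [[i g, 1], [1, -i g]] for E = +sqrt(1 - g^2)."""
    energy = cmath.sqrt(1 - g * g)
    return (1.0 + 0j, energy - 1j * g)


def bilinear(u, v):
    # H(g) is complex symmetric, so a left eigenvector is the plain transpose
    # of the right eigenvector (no complex conjugation).
    return u[0] * v[0] + u[1] * v[1]


def fidelity(g1, g2):
    """Biorthogonal fidelity <L(g1)|R(g2)><L(g2)|R(g1)> with <L|R> normalised to 1."""
    a, b = right_eigvec(g1), right_eigvec(g2)
    return bilinear(a, b) ** 2 / (bilinear(a, a) * bilinear(b, b))


def susceptibility(g, eps=1e-4):
    """Finite-difference estimate of chi from F ~ 1 - chi * eps^2 (assumed convention)."""
    return ((1 - fidelity(g, g + eps)) / eps ** 2).real


for g in (0.0, 0.5, 0.9, 0.99):
    print(g, susceptibility(g), -1 / (4 * (1 - g * g) ** 2))
```

In this sketch the estimate is negative everywhere and grows without bound in magnitude as $g$ approaches the assumed EP at $g=1$, which is consistent with the abstract's statement that the EP is signalled by $\chi \to -\infty$.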

# 同時メッセージパッシングモデルにおける光量子通信複雑性

Optical quantum communication complexity in the simultaneous message passing model ( http://arxiv.org/abs/2010.03195v2 )

ライセンス: Link先を確認

Ashutosh Marwah and Dave Touchette

(参考訳) 古典的なプロトコルの通信コストは通常、通信されるビット数によって測定される。これがプロトコル中の通信に要する時間を決定するためである。同様に、有限次元の量子状態を使用する量子通信プロトコルの場合、通信コストは、通信される量子ビットの数で測定される。しかし、量子物理学では、通信プロトコルには光学量子状態のような無限次元の状態も使用できる。通信中に送信される(等価な)キュービット数のカウントに基づく通信コスト測定は、無限次元状態を使用するプロトコルのコストを直接測定することはできない。さらに、そのような量子ビットベースの通信コストを用いて無限次元プロトコルの物理的性質を推測することはできない。本稿では,無限次元プロトコルにおける物理資源の成長を理解するための枠組みを提供する。具体性のために光学プロトコルに焦点をあてる。通信に必要な時間と通信中のエネルギーは、そのようなプロトコルの重要な物理資源として識別される。光プロトコルでは、通信に必要な時間は、ある相手から別の相手へ送信されるタイムビンモードの数によって決定される。送信されるメッセージの平均光子数は、プロトコルにおける通信に必要なエネルギーを決定する。問題サイズの増大に伴うこれら2つの量の増大の間のトレードオフについて下界を証明する。このようなトレードオフ関係を光量子通信複雑性関係と呼ぶ。

The communication cost of a classical protocol is typically measured in terms of the number of bits communicated for this determines the time required for communication during the protocol. Similarly, for quantum communication protocols, which use finite-dimensional quantum states, the communication cost is measured in terms of the number of qubits communicated. However, in quantum physics, one can also use infinite-dimensional states, like optical quantum states, for communication protocols. Communication cost measures based on counting the (equivalent) number of qubits transmitted during communication cannot be directly used to measure the cost of such protocols, which use infinite-dimensional states. Moreover, one cannot infer any physical property of infinite-dimensional protocols using such qubit based communication costs. In this paper, we provide a framework to understand the growth of physical resources in infinite-dimensional protocols. We focus on optical protocols for the sake of concreteness. The time required for communication and the energy expended during communication are identified as the important physical resources of such protocols. In an optical protocol, the time required for communication is determined by the number of time-bin modes that are transmitted from one party to another. The mean photon number of the messages sent determines the energy required during communication in the protocol. We prove a lower bound on the tradeoff between the growth of these two quantities with the growth of the problem size. We call such tradeoff relations optical quantum communication complexity relations.

翻訳日:2023-04-29 18:06:19 公開日:2020-12-17

# 安定化形式における自己テスト的極大次元真に絡み合った部分空間

Self-testing maximally-dimensional genuinely entangled subspaces within the stabilizer formalism ( http://arxiv.org/abs/2012.01164v2 )

ライセンス: Link先を確認

Owidiusz Makuta and Remigiusz Augusiak

(参考訳) 自己テストはもともと、絡み合った量子状態とそれらの上で実行される局所測定をデバイスに依存せずに認証する方法として導入された。近年、[F. Baccari et al., arXiv:2003.02285]において、状態の自己テストの概念が絡み合った部分空間に一般化され、真に絡み合った部分空間の例に対する最初の自己テスト戦略が与えられた。我々の研究の主な目的は、この一連の研究をさらに進め、マルチキュービット安定化形式に焦点を当てて、自己テスト可能な真に絡み合った部分空間が(次元の観点から)どれほど「大きく」なりうるかという問題に取り組むことである。この目的のために、まず、与えられた安定化部分空間が真に絡み合っているかどうかを効率的にチェックできるフレームワークを導入する。その上で、安定化部分空間内で構成できる真に絡み合った部分空間の最大次元を決定し、そのような極大次元部分空間を任意の数の量子ビットに対して例示的に構成する。第三に、それらの部分空間に属する任意の絡み合った状態、したがってそれらに台を持つ任意の混合状態によっても最大に破られるベルの不等式を構成し、これらの不等式が自己テストに有用であることを示す。興味深いことに、我々のベルの不等式により、全ての観測者が2つの二値測定を行う最も単純な多者間ベルシナリオにおいて、量子相関の集合の境界にある高次元の面構造を同定することができる。

Self-testing was originally introduced as a device-independent method of certification of entangled quantum states and local measurements performed on them. Recently, in [F. Baccari et al., arXiv:2003.02285] the notion of state self-testing has been generalized to entangled subspaces and the first self-testing strategies for exemplary genuinely entangled subspaces have been given. The main aim of our work is to pursue this line of research and to address the question how "large" (in terms of dimension) are genuinely entangled subspaces that can be self-tested, concentrating on the multiqubit stabilizer formalism. To this end, we first introduce a framework allowing to efficiently check whether a given stabilizer subspace is genuinely entangled. Building on it, we then determine the maximal dimension of genuinely entangled subspaces that can be constructed within the stabilizer subspaces and provide an exemplary construction of such maximally-dimensional subspaces for any number of qubits. Third, we construct Bell inequalities that are maximally violated by any entangled state from those subspaces and thus also any mixed states supported on them, and we show these inequalities to be useful for self-testing. Interestingly, our Bell inequalities allow for identification of higher-dimensional face structures in the boundaries of the sets of quantum correlations in the simplest multipartite Bell scenarios in which every observer performs two dichotomic measurements.

翻訳日:2023-04-22 07:56:15 公開日:2020-12-17

# 結合した2本の平行な光ナノファイバーの誘導正規モードにおける場の空間分布

Spatial distributions of the fields in guided normal modes of two coupled parallel optical nanofibers ( http://arxiv.org/abs/2012.06078v2 )

ライセンス: Link先を確認

Fam Le Kien, Lewis Ruks, Sile Nic Chormaic, and Thomas Busch

(参考訳) 結合した2本の平行な光ナノファイバーの誘導正規モードにおける場の断面プロファイルと空間分布について検討した。2本の同一ナノファイバーの誘導正規モードにおける場の成分の分布は、ファイバーの断面における動径方向の主軸と接線方向の主軸に対して対称または反対称であることを示す。主軸に対する磁場成分の対称性は電場成分の対称性とは反対である。偶の$\mathcal{E}_z$-cosineモードの場合、電場強度分布は2本のファイバーの間の領域で支配的であり、2ファイバーの中心に鞍点があることを示す。一方、奇の$\mathcal{E}_z$-sineモードの場合、2ファイバーの中心における電場強度分布は、ちょうど0の局所最小値に達する。ファイバー間の距離が小さく、かつファイバー半径が小さいか光の波長が大きい場合に、結合モード理論と厳密なモード理論の結果の差が大きいことがわかった。2本のナノファイバーが同一でない場合、強度分布は動径方向の主軸に関して対称であり、接線方向の主軸に関して非対称であることを示す。

We study the cross-sectional profiles and spatial distributions of the fields in guided normal modes of two coupled parallel optical nanofibers. We show that the distributions of the components of the field in a guided normal mode of two identical nanofibers are either symmetric or antisymmetric with respect to the radial principal axis and the tangential principal axis in the cross-sectional plane of the fibers. The symmetry of the magnetic field components with respect to the principal axes is opposite to that of the electric field components. We show that, in the case of even $\mathcal{E}_z$-cosine modes, the electric intensity distribution is dominant in the area between the fibers, with a saddle point at the two-fiber center. Meanwhile, in the case of odd $\mathcal{E}_z$-sine modes, the electric intensity distribution at the two-fiber center attains a local minimum of exactly zero. We find that the differences between the results of the coupled mode theory and the exact mode theory are large when the fiber separation distance is small and either the fiber radius is small or the light wavelength is large. We show that, in the case where the two nanofibers are not identical, the intensity distribution is symmetric about the radial principal axis and asymmetric about the tangential principal axis.

翻訳日:2023-04-21 03:33:49 公開日:2020-12-17

# 高スピン核の浴と結合した量子ドット電子スピンの駆動動力学

Driven dynamics of a quantum dot electron spin coupled to bath of higher-spin nuclei ( http://arxiv.org/abs/2012.07227v2 )

ライセンス: Link先を確認

Arian Vezvaee, Girish Sharma, Sophia E. Economou, and Edwin Barnes

(参考訳) 量子ドットに閉じ込められた電子とその周囲の核スピン環境の間の光駆動と超微細な相互作用の相互作用は、モード同期のような興味深い物理学を生み出す。本研究では、核スピンのユビキタススピン1/2近似を超えて、任意のスピンの核スピン浴に結合した自己集合量子ドットにおける光駆動電子スピンの包括的な理論的枠組みを示す。動的平均場法を用いて、四極子カップリングの有無にかかわらず核スピン分極分布を計算する。超微細相互作用は動的核分極とモードロックを促進するが、四極結合はこれらの効果に反する。これらの機構間の張力は定常状態の電子スピン発展にインプリントされ、量子ドットにおける四極子相互作用の重要性を測定する方法を提供する。その結果、四極子相互作用のような高スピン効果は、動的核偏極の発生とそれが電子スピンの進化に与える影響に大きな影響を与えることが示された。

The interplay of optical driving and hyperfine interaction between an electron confined in a quantum dot and its surrounding nuclear spin environment produces a range of interesting physics such as mode-locking. In this work, we go beyond the ubiquitous spin 1/2 approximation for nuclear spins and present a comprehensive theoretical framework for an optically driven electron spin in a self-assembled quantum dot coupled to a nuclear spin bath of arbitrary spin. Using a dynamical mean-field approach, we compute the nuclear spin polarization distribution with and without the quadrupolar coupling. We find that while hyperfine interactions drive dynamic nuclear polarization and mode-locking, quadrupolar couplings counteract these effects. The tension between these mechanisms is imprinted on the steady-state electron spin evolution, providing a way to measure the importance of quadrupolar interactions in a quantum dot. Our results show that higher-spin effects such as quadrupolar interactions can have a significant impact on the generation of dynamic nuclear polarization and how it influences the electron spin evolution.

翻訳日:2023-04-20 21:31:16 公開日:2020-12-17

# リング共振器を用いた超伝導量子プロセッサの長距離接続

Long-range connectivity in a superconducting quantum processor using a ring resonator ( http://arxiv.org/abs/2012.09463v1 )

ライセンス: Link先を確認

Sumeru Hazra, Anirban Bhattacharjee, Madhavi Chand, Kishor V. Salunkhe, Sriram Gopalakrishnan, Meghan P. Patankar and R. Vijay

(参考訳) 量子ビットのコヒーレンスとゲート忠実度は通常、量子プロセッサを特徴づける上で最も重要な2つの指標とみなされる。同様に重要なメトリックは、ゲート数を最小限に抑え、エラーの少ないアルゴリズムを効率的に実装できるため、量子ビット間接続である。しかし、超伝導プロセッサの量子ビット間接続は、物理的実現の実際的な制約のため、最近接に限られる傾向にある。本稿では,リング共振器を多経路結合素子とし,その周囲に一様分布する量子ビットを持つ新しい超伝導構造を提案する。我々の平面設計は、さらなる製造の複雑さを伴わずに、最先端の超伝導プロセッサに比べて接続性を大幅に向上させる。理論的には、量子ビット接続を解析し、各量子ビットが他の9つの量子ビットに接続可能な最大12個の量子ビットをサポートする装置で実験的に検証する。我々の概念はスケーラブルで、他のプラットフォームに適用可能であり、量子コンピューティング、アニール、シミュレーション、エラー修正の進歩を著しく加速する可能性がある。

Qubit coherence and gate fidelity are typically considered the two most important metrics for characterizing a quantum processor. An equally important metric is inter-qubit connectivity, as it minimizes gate count and allows implementing algorithms efficiently with reduced error. However, inter-qubit connectivity in superconducting processors tends to be limited to nearest neighbours due to practical constraints in the physical realization. Here, we introduce a novel superconducting architecture that uses a ring resonator as a multi-path coupling element with the qubits uniformly distributed throughout its circumference. Our planar design provides significant enhancement in connectivity over state-of-the-art superconducting processors without any additional fabrication complexity. We theoretically analyse the qubit connectivity and experimentally verify it in a device capable of supporting up to twelve qubits where each qubit can be connected to nine other qubits. Our concept is scalable, adaptable to other platforms and has the potential to significantly accelerate progress in quantum computing, annealing, simulations and error correction.

翻訳日:2023-04-20 08:46:33 公開日:2020-12-17

# インターフェロメトリスキームにおける決定論的量子相関

Deterministic quantum correlation in an interferometric scheme ( http://arxiv.org/abs/2012.09387v1 )

ライセンス: Link先を確認

Byoung S. Ham

(参考訳) 過去数十年にわたり、自発的パラメトリックダウン変換過程による $\chi^{(2)}$ 非線形光学材料から生成される絡み合った光子対は、ベルの不等式違反や反相関といった様々な量子相関に対して集中的に研究されてきた。 Mach-Zehnder干渉計では、フォトニック・ド・ブロイの波長が標準量子限界を超えた位相分解能を持つ量子センシングでも研究されている。ここで, 量子性の基本原理は, 微視的状態における二部共役光子対だけでなく, マクロコヒーレンス絡み生成のための制御可能な量子相関のための干渉計方式で検討される。

Over the last several decades, entangled photon pairs generated from $\chi^{(2)}$ nonlinear optical materials via spontaneous parametric down conversion processes have been intensively studied for various quantum correlations such as Bell inequality violation and anticorrelation. In a Mach-Zehnder interferometer, the photonic de Broglie wavelength has also been studied for quantum sensing with an enhanced phase resolution overcoming the standard quantum limit. Here, the fundamental principles of quantumness are investigated in an interferometric scheme for controllable quantum correlation not only for bipartite entangled photon pairs in a microscopic regime, but also for macroscopic coherence entanglement generation.

翻訳日:2023-04-20 08:45:10 公開日:2020-12-17

# KHOVID:デジタルコンタクトトレーシングを保護した相互運用可能なプライバシー

KHOVID: Interoperable Privacy Preserving Digital Contact Tracing ( http://arxiv.org/abs/2012.09375v1 )

ライセンス: Link先を確認

Xiang Cheng, Hanchao Yang, Archanaa S Krishnan, Patrick Schaumont and Yaling Yang

(参考訳) パンデミックの間、接触追跡は集団内の感染率を下げるための重要な手段である。手間のかかる手動接触追跡処理を加速するために、デジタル接触追跡(DCT)ツールは、ユビキタス携帯電話のセンシングおよび信号機能を用いて、透明かつプライベートに接触イベントを追跡することができる。しかし、効果的なDCTは、ユーザのプライバシーを守るだけでなく、既存の手動接触追跡プロセスを強化する必要がある。実際、人口の全員が携帯電話を所有したり、DCTアプリをインストールして有効にしたりできるわけではない。 KHOVIDは、手動接触追跡相互運用性とDCTユーザのプライバシを両立させる。 KHOVIDのコアは、位置情報データを使用してユーザトラジェクトリをエンコードするプライバシーフレンドリなメカニズムである。手動接触追跡データは、同じ位置情報フォーマットで統合することができる。本稿では,DCTからの位置情報データの精度をBluetooth近接検出により向上させ,Bluetooth短命IDを符号化する新しい手法を提案する。このコントリビューションでは、KHOVIDの詳細な設計、アプリケーションとサーバソフトウェアを含むプロトタイプの実装、シミュレーションとフィールド実験に基づく検証が紹介されている。また,KHOVIDの長所と従来のDCTの長所を比較した。

During a pandemic, contact tracing is an essential tool to drive down the infection rate within a population. To accelerate the laborious manual contact tracing process, digital contact tracing (DCT) tools can track contact events transparently and privately by using the sensing and signaling capabilities of the ubiquitous cell phone. However, an effective DCT must not only preserve user privacy but also augment the existing manual contact tracing process. Indeed, not every member of a population may own a cell phone or have a DCT app installed and enabled. We present KHOVID to fulfill the combined goal of manual contact-tracing interoperability and DCT user privacy. At KHOVID's core is a privacy-friendly mechanism to encode user trajectories using geolocation data. Manual contact tracing data can be integrated through the same geolocation format. The accuracy of the geolocation data from DCT is improved using Bluetooth proximity detection, and we propose a novel method to encode Bluetooth ephemeral IDs. This contribution describes the detailed design of KHOVID; presents a prototype implementation including an app and server software; and presents a validation based on simulation and field experiments. We also compare the strengths of KHOVID with other, earlier proposals of DCT.

翻訳日:2023-04-20 08:44:27 公開日:2020-12-17

# ドープPPLNによる中赤外スペクトル非相関光子生成 : 理論的研究

Mid-infrared spectrally-uncorrelated biphotons generation from doped PPLN: a theoretical investigation ( http://arxiv.org/abs/2012.09352v1 )

ライセンス: Link先を確認

Bei Wei, Wu-Hao Cai, Chunling Ding, Guang-Wei Deng, Ryosuke Shimizu, Qiang Zhou, Rui-Bo Jin

(参考訳) MgOドープLN,ZnOドープLN,In2O3ドープZnLNのドーピング比0～7mol%を含むドーピングLN結晶を用いた自然パラメトリックダウンコンバージョン法により,中赤外(MIR)スペクトル非相関二光子の生成を理論的に検討した。位相整合関数の傾き角と対応するポーリング周期は、タイプII、タイプI、タイプ0の位相整合条件で計算される。また, ドープLN結晶の熱特性と, Hong-Ou-Mandel干渉における特性を計算した。ドーピング比は群速度整合(GVM)波長に大きな影響を与えることがわかった。特に、共ドープしたInZnLN結晶のGVM2波長の可変範囲は678.7nmであり、従来の温度調整法によって達成された100nm未満の可変範囲よりもはるかに広い。ドーピング比は二光子状態を操作する自由度として利用できると結論することができる。スペクトル的に非相関なバイフォトンは、純粋な単一光子源と絡み合った光子源を作るのに使用することができ、これはMIR範囲での量子エンハンスドセンシング、イメージング、通信に有望な応用をもたらす可能性がある。

We theoretically investigate the preparation of mid-infrared (MIR) spectrally-uncorrelated biphotons from a spontaneous parametric down-conversion process using doped LN crystals, including MgO doped LN, ZnO doped LN, and In2O3 doped ZnLN with doping ratios from 0 to 7 mol%. The tilt angle of the phase-matching function and the corresponding poling period are calculated under type-II, type-I, and type-0 phase-matching conditions. We also calculate the thermal properties of the doped LN crystals and their performance in Hong-Ou-Mandel interference. It is found that the doping ratio has a substantial impact on the group-velocity-matching (GVM) wavelengths. In particular, the GVM2 wavelength of co-doped InZnLN crystal has a tunable range of 678.7 nm, which is much broader than the tunable range of less than 100 nm achieved by the conventional method of adjusting the temperature. It can be concluded that the doping ratio can be utilized as a degree of freedom to manipulate the biphoton state. The spectrally uncorrelated biphotons can be used to prepare pure single-photon sources and entangled photon sources, which may have promising applications for quantum-enhanced sensing, imaging, and communications in the MIR range.

翻訳日:2023-04-20 08:43:53 公開日:2020-12-17

# 箱の中の量子力学的粒子の運動量に関する新しい概念

A New Concept for the Momentum of a Quantum Mechanical Particle in a Box ( http://arxiv.org/abs/2012.09596v1 )

ライセンス: Link先を確認

M. H. Al-Hashimi and U.-J. Wiese

(参考訳) 箱の中の粒子に対して、演算子 $- i \partial_x$ はヘルミタンではない。運動量演算子 $p = p_r + i p_i$ は、自己共役作用素に拡張できるエルミート成分 $p_r$ と反エルミート成分 $i p_i$ を持つ。これにより、箱の内部に厳密に制限された粒子上での運動量の測定が記述される。

For a particle in a box, the operator $- i \partial_x$ is not Hermitean. We provide an alternative construction of a momentum operator $p = p_R + i p_I$, which has a Hermitean component $p_R$ that can be extended to a self-adjoint operator, as well as an anti-Hermitean component $i p_I$. This leads to a description of momentum measurements performed on a particle that is strictly limited to the interior of a box.
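The failure of Hermiticity mentioned above can be made explicit by integrating by parts. The following is only a sketch under stated assumptions: the box is taken to be the interval $[-L/2, L/2]$, units are chosen with $\hbar = 1$, and $\phi$, $\psi$ are arbitrary smooth wave functions on the box; none of these conventions is fixed by the abstract.

```latex
\begin{align}
\langle \phi \,|\, (-i\partial_x)\psi \rangle - \langle (-i\partial_x)\phi \,|\, \psi \rangle
  &= -i \int_{-L/2}^{L/2} \left( \phi^* \,\partial_x \psi + (\partial_x \phi^*)\, \psi \right) dx \\
  &= -i \int_{-L/2}^{L/2} \partial_x \left( \phi^* \psi \right) dx
   = -i \left[ \phi^*(x)\, \psi(x) \right]_{-L/2}^{L/2} .
\end{align}
```

The difference is a pure boundary term. It vanishes only if the wave functions are restricted so that $\phi^*\psi$ takes equal values at the two walls, so on generic wave functions in the box $-i\partial_x$ is not Hermitean.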

翻訳日:2023-04-20 08:35:24 公開日:2020-12-17

# 完全量子状態におけるエントロピー生成の数値的「正確な」シミュレーション:ボルツマンエントロピー対フォン・ノイマンエントロピー

Numerically "exact" simulations of entropy production in the fully quantum regime: Boltzmann entropy versus von Neumann entropy ( http://arxiv.org/abs/2012.09546v1 )

ライセンス: Link先を確認

Souichi Sakamoto and Yoshitaka Tanimura

(参考訳) 本研究では, 時間依存外力下で熱浴に結合した系の熱力学変数を, 数値計算による階層的運動方程式 (heom) から準静的ヘルムホルツエネルギーを用いて評価する手法を提案する。種々の温度で非マルコフ熱浴と強く結合したスピン系のエントロピーを計算した。その結果,外乱の変化が十分に緩やかに起こると,系は常に熱平衡に達した。そこで我々は,等温過程におけるボルツマンエントロピーとフォン・ノイマンエントロピーを計算し,HEOMに基づく準静電平衡系における内部エネルギー,熱,仕事などの様々な熱力学的変数を計算した。 We found that, although the characteristic features of the system entropies in the Boltzmann and von Neumann cases as a function of the system--bath coupling strength are similar, those for the total entropy production are completely different. The total entropy production in the Boltzmann case is always positive, whereas that in the von Neumann case becomes negative if we chose a thermal equilibrium state of the total system (an unfactorized thermal equilibrium state) as the initial state. This is because the total entropy production in the von Neumann case does not properly take into account the contribution of the entropy from the system--bath interaction. したがって、ボルツマンエントロピーは全量子状態におけるエントロピー生成を調べるために用いられる必要がある。最後に,jarzynski等式の適用性について検討した。

We present a scheme to evaluate thermodynamic variables for a system coupled to a heat bath under a time-dependent external force using the quasi-static Helmholtz energy from the numerically "exact" hierarchical equations of motion (HEOM). We computed the entropy produced by a spin system strongly coupled to a non-Markovian heat bath for various temperatures. We showed that when changes to the external perturbation occurred sufficiently slowly, the system always reached thermal equilibrium. Thus, we calculated the Boltzmann entropy and the von Neumann entropy for an isothermal process, as well as various thermodynamic variables, such as changes of internal energies, heat, and work, for a system in quasi-static equilibrium based on the HEOM. We found that, although the characteristic features of the system entropies in the Boltzmann and von Neumann cases as a function of the system--bath coupling strength are similar, those for the total entropy production are completely different. The total entropy production in the Boltzmann case is always positive, whereas that in the von Neumann case becomes negative if we choose a thermal equilibrium state of the total system (an unfactorized thermal equilibrium state) as the initial state. This is because the total entropy production in the von Neumann case does not properly take into account the contribution of the entropy from the system--bath interaction. Thus, the Boltzmann entropy must be used to investigate entropy production in the fully quantum regime. Finally, we examined the applicability of the Jarzynski equality.
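For readers unfamiliar with the von Neumann entropy referred to above, the following minimal sketch evaluates $S = -\mathrm{Tr}(\rho \ln \rho)$ for the Gibbs state of a single isolated spin. It is not the HEOM calculation of the paper: the helper names `von_neumann_entropy` and `thermal_state`, the Hamiltonian $\sigma_z$, and the two inverse temperatures are illustrative assumptions, and no bath or system-bath coupling is included.

```python
import numpy as np


def von_neumann_entropy(rho):
    """Return S = -Tr(rho ln rho) for a density matrix rho (natural log)."""
    eigenvalues = np.linalg.eigvalsh(rho)
    # Discard numerically zero eigenvalues, using the convention 0 * ln 0 = 0.
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    return float(-np.sum(eigenvalues * np.log(eigenvalues)))


def thermal_state(hamiltonian, beta):
    """Gibbs state exp(-beta H) / Z of a Hermitian matrix H (illustrative helper)."""
    energies, vectors = np.linalg.eigh(hamiltonian)
    # Shift by the ground-state energy to avoid overflow at large beta.
    weights = np.exp(-beta * (energies - energies.min()))
    weights /= weights.sum()
    # Sum over eigenvectors v_j of weights[j] * |v_j><v_j|.
    return (vectors * weights) @ vectors.conj().T


# Assumed toy Hamiltonian: a single spin with H = sigma_z (no bath).
sigma_z = np.diag([1.0, -1.0])
entropy_hot = von_neumann_entropy(thermal_state(sigma_z, beta=0.0))
entropy_cold = von_neumann_entropy(thermal_state(sigma_z, beta=50.0))
```

In this toy setting the infinite-temperature state is maximally mixed, so `entropy_hot` equals $\ln 2$, while `entropy_cold` is essentially zero because the state is nearly pure.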

翻訳日:2023-04-20 08:34:52 公開日:2020-12-17

# SZX計算における対角ゲートの一考察

A note on diagonal gates in SZX-calculus ( http://arxiv.org/abs/2012.09540v1 )

ライセンス: Link先を確認

Titouan Carette

(参考訳) この注記では、スケーラブルなzxh計算が、計算ベースで対角的な量子ゲートをコンパクトに表現するためにどのように用いられるかを記述する。これには制御および多制御Zゲート、一般化、グラフ演算子とハイパーグラフ演算子、位相ガジェットが含まれる。

This note describes how the scalable ZXH calculus can be used to represent in a compact way the quantum gates that are diagonal in the computational basis. This includes controlled and multi-controlled Z gates, their generalizations, respectively graph and hypergraph operators, and also phase gadgets.
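As a plain matrix-level illustration (not the diagrammatic SZX construction of the note), the sketch below builds multi-controlled Z gates and phase gadgets as explicit matrices and checks that they are diagonal in the computational basis. The function names and the phase-gadget convention $\exp(-i\,\tfrac{\alpha}{2}\, Z \otimes \dots \otimes Z)$ are assumptions made here for illustration; other sources use different phase conventions.

```python
import numpy as np


def multi_controlled_z(n_qubits):
    """Diagonal matrix that flips the sign of |1...1> and leaves the rest unchanged."""
    diagonal = np.ones(2 ** n_qubits, dtype=complex)
    diagonal[-1] = -1.0
    return np.diag(diagonal)


def phase_gadget(n_qubits, alpha):
    """exp(-i * alpha / 2 * Z x ... x Z) under the convention assumed above.

    Z x ... x Z has eigenvalue +1 on even-parity bit strings and -1 on odd ones.
    """
    parities = np.array([bin(b).count("1") % 2 for b in range(2 ** n_qubits)])
    eigenvalues = 1 - 2 * parities
    return np.diag(np.exp(-0.5j * alpha * eigenvalues))


def is_diagonal(matrix):
    """True if all off-diagonal entries vanish (up to numerical tolerance)."""
    return np.allclose(matrix, np.diag(np.diag(matrix)))


cz = multi_controlled_z(2)          # controlled-Z on two qubits
ccz = multi_controlled_z(3)         # doubly controlled Z on three qubits
gadget = phase_gadget(3, np.pi / 4)  # three-qubit phase gadget
```

Because all of these matrices are diagonal in the same basis, they commute with each other, which is one reason a compact common representation is natural.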

翻訳日:2023-04-20 08:34:27 公開日:2020-12-17

# PURE: 近接性に基づく接触追跡プロトコルの分析フレームワーク

PURE: A Framework for Analyzing Proximity-based Contact Tracing Protocols ( http://arxiv.org/abs/2012.09520v1 )

ライセンス: Link先を確認

Fabrizio Cicala, Weicheng Wang, Tianhao Wang, Ninghui Li, Elisa Bertino, Faming Liang, Yang Yang

(参考訳) 多くの近接型トレース(pct)プロトコルが提案され、covid-19の拡散に対抗するためにデプロイされている。本稿では,PCTプロトコルを解析するための体系的なアプローチを提案する。プライバシ,ユーティリティ,レジリエンス,効率(PURE)の4つの側面から,コンタクトトレース設計の望ましい特性のリストを抽出する。また、pctプロトコルの主な設計上の選択として、患者がサーバに報告する情報とマッチングを行う相手の2つを特定した。これら2つの選択肢はPUREプロパティの大部分を決定し、既存のプロトコルの包括的な分析と比較を可能にする。

Many proximity-based tracing (PCT) protocols have been proposed and deployed to combat the spreading of COVID-19. In this paper, we take a systematic approach to analyze PCT protocols. We identify a list of desired properties of a contact tracing design from the four aspects of Privacy, Utility, Resiliency, and Efficiency (PURE). We also identify two main design choices for PCT protocols: what information patients report to the server, and which party performs the matching. These two choices determine most of the PURE properties and enable us to conduct a comprehensive analysis and comparison of the existing protocols.

翻訳日:2023-04-20 08:34:08 公開日:2020-12-17

# 方向増幅のためのトポロジカル入力出力理論

Topological input-output theory for directional amplification ( http://arxiv.org/abs/2012.09488v1 )

ライセンス: Link先を確認

Tomás Ramos, Juan José García-Ripoll, and Diego Porras

(参考訳) 指向性増幅器として機能するフォトニック駆動散逸格子の入出力関係に対する位相的アプローチを提案する。この理論は、光学的非エルミートカップリング行列から有効な位相絶縁体ハミルトニアンへの写像に依存する。この写像は、逆行列が系の線形入力出力応答を決定する非エルミート結合行列の特異値分解に基づいている。位相的に非自明なレジームでは、格子の入出力応答は位相絶縁体における零エネルギー状態と同値の特異値を持つ特異ベクトルによって支配され、コヒーレント入力信号の方向増幅に繋がる。このようなトポロジカル増幅方式では、ゲイン、帯域幅、付加雑音、雑音-信号比などの量子デバイスの増幅特性を完全に特徴付けることができる。我々は1次元の非相互フォトニック格子でアイデアを例示し、完全な解析的予測を導出する。方向増幅は量子制限に近く、利得は指数関数的に増加し、システムサイズは$N$となり、ノイズ-信号比は1/\sqrt{N}$として抑制される。これは、量子信号増幅と単一光子検出に対する我々の理論の興味深い応用を示している。

We present a topological approach to the input-output relations of photonic driven-dissipative lattices acting as directional amplifiers. Our theory relies on a mapping from the optical non-Hermitian coupling matrix to an effective topological insulator Hamiltonian. This mapping is based on the singular value decomposition of non-Hermitian coupling matrices, whose inverse matrix determines the linear input-output response of the system. In topologically non-trivial regimes, the input-output response of the lattice is dominated by singular vectors with zero singular values that are the equivalent of zero-energy states in topological insulators, leading to directional amplification of a coherent input signal. In such a topological amplification regime, our theoretical framework allows us to fully characterize the amplification properties of the quantum device such as gain, bandwidth, added noise, and noise-to-signal ratio. We exemplify our ideas in a one-dimensional non-reciprocal photonic lattice, for which we derive fully analytical predictions. We show that the directional amplification is near quantum-limited with a gain growing exponentially with system size, $N$, while the noise-to-signal ratio is suppressed as $1/\sqrt{N}$. This points to interesting applications of our theory for quantum signal amplification and single-photon detection.
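The link between near-zero singular values and directional gain can be illustrated numerically on a toy matrix. The sketch below is not the authors' model: the tridiagonal matrix, the helper names, and the parameter values (on-site term $-1$, forward hopping $2$ or $0.5$, no backward hopping) are assumptions chosen only so that one case has a nearly vanishing singular value and the other does not.

```python
import numpy as np


def coupling_matrix(n, onsite, hop_forward, hop_backward):
    """Toy n x n tridiagonal coupling matrix (hypothetical, for illustration).

    `onsite` sits on the diagonal, `hop_forward` on the subdiagonal
    (site j feeds site j + 1) and `hop_backward` on the superdiagonal.
    """
    matrix = onsite * np.eye(n)
    matrix += hop_forward * np.eye(n, k=-1)
    matrix += hop_backward * np.eye(n, k=1)
    return matrix


def smallest_singular_value(matrix):
    """Smallest singular value from the singular value decomposition."""
    return np.linalg.svd(matrix, compute_uv=False).min()


n = 10
amplifying = coupling_matrix(n, onsite=-1.0, hop_forward=2.0, hop_backward=0.0)
trivial = coupling_matrix(n, onsite=-1.0, hop_forward=0.5, hop_backward=0.0)

sigma_amplifying = smallest_singular_value(amplifying)
sigma_trivial = smallest_singular_value(trivial)

# The inverse matrix plays the role of the linear response in this toy picture.
response = np.linalg.inv(amplifying)
forward_gain = abs(response[n - 1, 0])   # input at first site, output at last site
backward_gain = abs(response[0, n - 1])  # input at last site, output at first site
```

In the first case the smallest singular value is bounded by $2^{-(n-1)}$ and the forward response grows as $2^{\,n-1}$, while the backward response vanishes; in the second case the smallest singular value stays of order one and there is no large gain.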

翻訳日:2023-04-20 08:33:39 公開日:2020-12-17

# 増幅のための究極の量子限界:鏡の前の1つの原子

Ultimate quantum limit for amplification: a single atom in front of a mirror ( http://arxiv.org/abs/2012.09800v1 )

ライセンス: Link先を確認

Emely Wiegand, Ping-Yi Wen, Per Delsing, Io-Chun Hoi, Anton Frisk Kockum

(参考訳) 1次元半無限導波路の終端付近の原子に結合する光場に対する3種類の増幅過程について検討した。 3レベルアトムの裸または服装ベースでドライブが集団反転を生成する2つのセットアップと、駆動する2レベルアトムの高次プロセスによる増幅による1つのセットアップを考察する。いずれの場合も、導波路の端は光の鏡として機能する。これにより、オープン導波路における同じセットアップに比べて増幅が2つの方法で向上することがわかった。まず、ミラーは原子からの全ての出力を2つの出力チャネルに分割するのではなく、一方向に移動するように強制する。第二に、ミラーによる干渉により、原子内の異なる遷移に対する緩和率の比率の調整が可能となり、集団の反転が増加する。これらの要因により増幅の促進度を定量化し,超伝導量子回路を用いた実験において標準パラメータを示せることを示した。

We investigate three types of amplification processes for light fields coupling to an atom near the end of a one-dimensional semi-infinite waveguide. We consider two setups where a drive creates population inversion in the bare or dressed basis of a three-level atom and one setup where the amplification is due to higher-order processes in a driven two-level atom. In all cases, the end of the waveguide acts as a mirror for the light. We find that this enhances the amplification in two ways compared to the same setups in an open waveguide. Firstly, the mirror forces all output from the atom to travel in one direction instead of being split up into two output channels. Secondly, interference due to the mirror enables tuning of the ratio of relaxation rates for different transitions in the atom to increase population inversion. We quantify the enhancement in amplification due to these factors and show that it can be demonstrated for standard parameters in experiments with superconducting quantum circuits.

翻訳日:2023-04-20 08:26:52 公開日:2020-12-17

# 数値最適化のためのディープラーニングの性能について:タンパク質構造予測への応用

On the performance of deep learning for numerical optimization: an application to protein structure prediction ( http://arxiv.org/abs/2012.09741v1 )

ライセンス: Link先を確認

Hojjat Rakhshani, Lhassane Idoumghar, Soheila Ghambari, Julien Lepagnot, Mathieu Brévilliers

(参考訳) 深層ニューラルネットワークは最近、知覚タスクのための人工知能モデルの構築と評価にかなりの注意を払っている。本稿では,グローバル最適化問題に対処するためのディープラーニングモデルの性能について検討する。提案手法は,ニューラルネットワークを効率的に生成して解決するニューラルネットワーク探索(neural architecture search, nas)の考え方を採用している。ネットワークアーキテクチャの空間は有向非循環グラフを用いて表現され、新しい未知のタスクの目的関数を最適化する最良のアーキテクチャを見つけることを目的としている。 GPU計算負荷と長いトレーニング時間を備えた非常に大きなネットワークの提案とは異なり、私たちは、最高のアーキテクチャを見つけるための軽量な実装を探すことに重点を置いています。 NASの性能は、最初にCEC 2017ベンチマークスイートで実証実験によって分析される。その後、一連のタンパク質構造予測(psp)問題に適用される。実験の結果,手作業で設計したアルゴリズムと比較して,生成した学習モデルは競争力のある結果が得られることがわかった。

Deep neural networks have recently drawn considerable attention for building and evaluating artificial learning models for perceptual tasks. Here, we present a study on the performance of deep learning models in dealing with global optimization problems. The proposed approach adopts the idea of neural architecture search (NAS) to generate efficient neural networks for solving the problem at hand. The space of network architectures is represented using a directed acyclic graph and the goal is to find the best architecture to optimize the objective function for a new, previously unknown task. Rather than proposing very large networks with a heavy GPU computational burden and long training time, we focus on searching for lightweight implementations to find the best architecture. The performance of NAS is first analyzed through empirical experiments on the CEC 2017 benchmark suite. Thereafter, it is applied to a set of protein structure prediction (PSP) problems. The experiments reveal that the generated learning models can achieve competitive results when compared to hand-designed algorithms, given enough computational budget.

翻訳日:2023-04-20 08:26:37 公開日:2020-12-17

# 分散量子コンピューティングのためのコンパイラ設計

Compiler Design for Distributed Quantum Computing ( http://arxiv.org/abs/2012.09680v1 )

ライセンス: Link先を確認

Davide Ferrari, Angela Sara Cacciapuoti, Michele Amoretti and Marcello Caleffi

(参考訳) 分散量子コンピューティングアーキテクチャでは、Quantum Internetが提供するネットワークと通信機能により、単一のNISQデバイスが自分では処理できない計算処理を実行するために、遠隔量子処理ユニット(QPU)が通信や協調を行うことができる。この目的のために、分散量子コンピューティングは、任意の量子アルゴリズムを任意の分散量子コンピューティングアーキテクチャにマッピングするために、新しい世代の量子コンパイラを必要とする。本稿では,まず,分散量子コンピューティングのコンパイラ設計において生じる主な課題について述べる。そして、分散量子コンピューティングのための量子コンパイルによって引き起こされるオーバーヘッドの上限を解析的に導出する。導出された境界は、基礎となるコンピューティングアーキテクチャによって引き起こされるオーバーヘッドと、サブオプティマイズ量子コンパイラによって引き起こされる追加のオーバーヘッド、すなわち汎用性、効率性、効率性という3つの重要な特徴を達成するために、論文を通じて明確に設計されている。最後に,解析結果を検証し,広範な性能解析によりコンパイラ設計の有効性を確認する。

In distributed quantum computing architectures, with the network and communications functionalities provided by the Quantum Internet, remote quantum processing units (QPUs) can communicate and cooperate for executing computational tasks that single NISQ devices cannot handle by themselves. To this aim, distributed quantum computing requires a new generation of quantum compilers, for mapping any quantum algorithm to any distributed quantum computing architecture. With this perspective, in this paper, we first discuss the main challenges arising with compiler design for distributed quantum computing. Then, we analytically derive an upper bound of the overhead induced by quantum compilation for distributed quantum computing. The derived bound accounts for the overhead induced by the underlying computing architecture as well as the additional overhead induced by the sub-optimal quantum compiler -- expressly designed through the paper to achieve three key features, namely, general-purpose, efficient and effective. Finally, we validate the analytical results and we confirm the validity of the compiler design through an extensive performance analysis.

翻訳日:2023-04-20 08:24:56 公開日:2020-12-17

# 位相的に非等価な量子化

Topologically inequivalent quantizations ( http://arxiv.org/abs/2012.09929v1 )

ライセンス: Link先を確認

Giovanni Acquaviva, Alfredo Iorio, Luca Smaldone

(参考訳) 自発的に破れた u(1) 内部対称性を持つスカラー量子場理論において, 量子化の代数, 標準交換関係の表現について, ナムブ・ゴールドストーン粒子の凝縮によって渦型の位相的欠陥が形成される場合に論じる。系の物理的不連続な相の存在に必要な等価でない表現を持つためには、通常の熱力学的極限は不要である。これは新しいタイプの不等式であり、位相空間の非自明な位相構造が有限体積に現れるためである。我々はこれを、位相的および熱力学的位相の統一的な視点への第一歩とみなし、このシナリオの量子重力への応用の可能性についてコメントする。

We discuss the representations of the algebra of quantization, the canonical commutation relations, in a scalar quantum field theory with spontaneously broken U(1) internal symmetry, when a topological defect of the vortex type is formed via the condensation of Nambu-Goldstone particles. We find that the usual thermodynamic limit is not necessary in order to have the inequivalent representations needed for the existence of physically disjoint phases of the system. This is a new type of inequivalence, due to the nontrivial topological structure of the phase space, that appears at finite volume. We regard this as a first step towards a unifying view of topological and thermodynamic phases, and offer here comments on the possible application of this scenario to quantum gravity.

翻訳日:2023-04-20 08:16:58 公開日:2020-12-17

# 古典的流行モデルと非散逸性および散逸性量子タイト結合モデルとの等価性

Equivalence between classical epidemic model and non-dissipative and dissipative quantum tight-binding model ( http://arxiv.org/abs/2012.09923v1 )

ライセンス: Link先を確認

Krzysztof Pomorski

(参考訳) 古典的流行モデルと非散逸性および散逸性量子タイト結合モデルとの等価性が導かれる。古典的な流行モデルは、非散逸性および散逸性の両方でフォン・ノイマンエントロピーによって記述された静電結合量子ビットの場合に現れる量子絡みを再現することができる。その結果、量子力学的現象は古典的統計モデルによってほぼ完全にシミュレートされる可能性が示唆された。量子のような絡み合いと状態の重畳を含む。したがって、古典力学の観点から古典システムによって表現される結合型流行モデルは、量子技術、特に量子のような計算や量子のような通信の基盤となる。古典密度行列は、反可換性の観点から運動方程式によって導かれ、記述される。ラビのような振動の存在は、古典的流行モデルにおいて指摘されている。さらに、量子系におけるアハロノフ・ボーム効果の存在も古典的な流行モデルによって再現できる。量子ドットから作られ、位置ベースの量子ビットを用いて単純化された強結合モデルによって記述された全ての量子系は、量子行列ハミルトンの2倍の大きさを持つS行列の非常に特異な構造を持つ古典的モデルによって効果的に記述することができる。得られた結果は、量子力学の基本的な性質とユニークな性質を部分的に疑問視し、量子力学のオントロジーを古典的な統計物理学の枠組みに置き、量子力学が有効であり、現象学的であり、現実の基本的な図像ではないことを示唆する他の基本的な理論の出現の動機をもたらす可能性がある。

The equivalence between the classical epidemic model and the non-dissipative and dissipative quantum tight-binding models is derived. The classical epidemic model can reproduce the quantum entanglement that emerges in electrostatically coupled qubits, as described by the von Neumann entropy, in both the non-dissipative and dissipative cases. The obtained results show that quantum mechanical phenomena might be almost entirely simulated by a classical statistical model, including quantum-like entanglement and superposition of states. Therefore, coupled epidemic models expressed by classical systems in terms of classical physics can be the basis for a possible incorporation of quantum technologies, in particular for quantum-like computation and quantum-like communication. The classical density matrix is derived and described by an equation of motion in terms of an anticommutator. The existence of Rabi-like oscillations is pointed out in the classical epidemic model. Furthermore, the Aharonov-Bohm effect in quantum systems can also be reproduced by the classical epidemic model. Every quantum system made from quantum dots and described by a simplistic tight-binding model using position-based qubits can be effectively described by a classical model with a very specific structure of the S matrix, which is twice the size of the corresponding quantum Hamiltonian matrix. The obtained results partly question the fundamental and unique character of quantum mechanics and place the ontology of quantum mechanics largely within the framework of classical statistical physics, which can motivate the emergence of other fundamental theories suggesting that quantum mechanics is only an effective and phenomenological, but not fundamental, picture of reality.

翻訳日:2023-04-20 08:16:45 公開日:2020-12-17

# 逆高調波振動子の物理-最下地平線から事象地平線まで-

Physics of the Inverted Harmonic Oscillator: From the lowest Landau level to event horizons ( http://arxiv.org/abs/2012.09875v1 )

ライセンス: Link先を確認

Varsha Subramanyan, Suraj S. Hegde, Smitha Vishveshwara and Barry Bradlyn

(参考訳) 本研究では, 逆調和振動子(IHO)ハミルトニアンを, 様々な物理系における散乱と時間縮退の量子力学を理解するためのパラダイムとして提示する。領域保存変換のジェネレータの1つとして、IHOハミルトニアンは拡張生成器、圧縮生成器、ローレンツ励起発生器、散乱ポテンシャルとして研究することができる。これらの異なる形態を確立するために、量子ホール系におけるホーキング・ウンルー効果と最低ランダウ準位(LLL)における散乱の異なる現象を基礎とするIHOの物理学を実証する。我々は、LLLにおけるIHOハミルトニアンの出現をゲージ不変な方法で導き、事象の地平線付近で量子力学を記述するリンドラー・ハミルトニアンと正確な平行性を示す。 ihoハミルトニアンを通じて同型リー代数によって記述される対称性を持つ特異な物理系を研究するこのアプローチにより、ウィグナー回転のような相対論的効果の観点から最低ランダウレベルの幾何学的応答を再解釈することができる。さらに、IHOの分析散乱行列は、量子化された時間遅延速度を持つスペクトルにおける準正規モード(QNM)の存在を指摘する。我々は、これらのqnmを波束散乱によってアクセスする方法を示し、ブラックホールで見られるものと平行な量子ホールポイントコンタクトジオメトリにおける新しい効果を提案する。

In this work, we present the inverted harmonic oscillator (IHO) Hamiltonian as a paradigm to understand the quantum mechanics of scattering and time-decay in a diverse set of physical systems. As one of the generators of area-preserving transformations, the IHO Hamiltonian can be studied as a dilatation generator, squeeze generator, a Lorentz boost generator, or a scattering potential. In establishing these different forms, we demonstrate the physics of the IHO that underlies phenomena as disparate as the Hawking-Unruh effect and scattering in the lowest Landau level (LLL) in quantum Hall systems. We derive the emergence of the IHO Hamiltonian in the LLL in a gauge invariant way and show its exact parallels with the Rindler Hamiltonian that describes quantum mechanics near event horizons. This approach of studying distinct physical systems with symmetries described by isomorphic Lie algebras through the emergent IHO Hamiltonian enables us to reinterpret geometric response in the lowest Landau level in terms of relativistic effects such as Wigner rotation. Further, the analytic scattering matrix of the IHO points to the existence of quasinormal modes (QNMs) in the spectrum, which have quantized time-decay rates. We present a way to access these QNMs through wave packet scattering, thus proposing a novel effect in quantum Hall point contact geometries that parallels those found in black holes.
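The statement that the IHO Hamiltonian can be read as a dilatation generator follows from a linear canonical change of variables. The lines below are a sketch under assumed conventions that the abstract does not fix: units with $m = \omega = \hbar = 1$, the Hamiltonian $H = \tfrac{1}{2}(p^2 - x^2)$, and the illustrative symbols $u$, $v$.

```latex
\begin{align}
u &= \frac{p + x}{\sqrt{2}}, \qquad v = \frac{p - x}{\sqrt{2}}, \qquad
[u, v] = [x, p] = i, \\
H &= \tfrac{1}{2}\left(p^2 - x^2\right) = \tfrac{1}{2}\left(u v + v u\right), \\
\dot{u} &= i\,[H, u] = u, \qquad \dot{v} = i\,[H, v] = -v
\quad\Longrightarrow\quad u(t) = e^{t}\, u(0), \quad v(t) = e^{-t}\, v(0).
\end{align}
```

In these variables $H$ is the symmetrized product of a canonically conjugate pair, so time evolution stretches $u$ and squeezes $v$ by reciprocal factors, which preserves phase-space area.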

翻訳日:2023-04-20 08:16:06 公開日:2020-12-17

# 人工ゲージ場における超低温ボソンのダイナミクス:角運動量、フラグメンテーション、エントロピーの変動

Dynamics of Ultracold Bosons in Artificial Gauge Fields: Angular Momentum, Fragmentation, and the Variance of Entropy ( http://arxiv.org/abs/2012.09870v1 )

ライセンス: Link先を確認

Axel U.J. Lode, Sunayana Dutta, Camille Lévêque

(参考訳) 人工ゲージ場を突然印加することによって引き起こされる2次元相互作用する超低温ボソンのダイナミクスを考察する。このシステムは、高調波トラップ電位の基底状態において初期化される。応用された人工ゲージの強度の関数として、角運動量、断片化、および吸収画像あるいは単発画像のエントロピーとその分散をモニタリングすることにより、創発的ダイナミクスを解析する。我々は,マルチコンフィグレーション的時間依存ハーツリー法(MCTDH-X)を用いて,時間依存多元ボソンSchrödinger方程式を解く。人工ゲージ場がシステム内に角運動量を埋め込むことが分かる。フラグメンテーション (Fragmentation) - 縮小した一体密度行列の複数のマクロ固有値 - は角運動量の力学と同期して現れる: 多体状態のボソンは非自明な相関を発達させる。断片化と角運動量は実験的に評価が難しいが、本研究では、超低温原子系の状態の標準的な射影測定である単発画像の画像エントロピーの分散を統計的に解析することで、それらを調べられることを示す。

We consider the dynamics of two-dimensional interacting ultracold bosons triggered by suddenly switching on an artificial gauge field. The system is initialized in the ground state of a harmonic trapping potential. As a function of the strength of the applied artificial gauge, we analyze the emergent dynamics by monitoring the angular momentum, the fragmentation, as well as the entropy and variance of the entropy of absorption or single-shot images. We solve the underlying time-dependent many-boson Schrödinger equation using the multiconfigurational time-dependent Hartree method for indistinguishable particles (MCTDH-X). We find that the artificial gauge field implants angular momentum in the system. Fragmentation -- multiple macroscopic eigenvalues of the reduced one-body density matrix -- emerges in sync with the dynamics of angular momentum: the bosons in the many-body state develop non-trivial correlations. Fragmentation and angular momentum are experimentally difficult to assess; here, we demonstrate that they can be probed by statistically analyzing the variance of the image entropy of single-shot images that are the standard projective measurement of the state of ultracold atomic systems.

翻訳日:2023-04-20 08:15:40 公開日:2020-12-17

# 相互作用を持つ一般化オーブリー・アンドルーモデルにおける行列積状態をもつ多体移動エッジの探索

In search of a many-body mobility edge with matrix product states in a Generalized Aubry-André model with interactions ( http://arxiv.org/abs/2012.09853v1 )

ライセンス: Link先を確認

Nicholas Pomata, Sriram Ganeshan, Tzu-Chieh Wei

(参考訳) 一般化された aubry-andr\'e (gaa) モデルにおける多体移動エッジの可能性について,shift-invert matrix product states (simps) アルゴリズム (phys) を用いて検討した。 Rev. Lett. 118, 017201 (2017)). 非相互作用GAAモデルは、自己双対誘導モビリティエッジを持つ1次元準周期モデルである。 SIMPSの利点は、エネルギー分解方式で多体状態をターゲットにしており、収束のために全多体状態を局所化する必要はなく、相互作用するGAAモデルが多体移動エッジを示すかどうかをテストすることができることである。解析の結果, 単一粒子移動エッジの存在下での標的状態は, 「MBL様」完全収束状態とSIMPSが収束しない完全非局在状態に一致しないことがわかった。有限結合次元の関数としての絡み合いスケーリング解析は、単一粒子移動端近傍の多体状態がSIMPS法において非局在状態がどのように現れるかに近い振る舞いを示す。

We investigate the possibility of a many-body mobility edge in the Generalized Aubry-André (GAA) model with interactions using the Shift-Invert Matrix Product States (SIMPS) algorithm (Phys. Rev. Lett. 118, 017201 (2017)). The non-interacting GAA model is a one-dimensional quasiperiodic model with a self-duality induced mobility edge. The advantage of SIMPS is that it targets many-body states in an energy-resolved fashion and does not require all many-body states to be localized for convergence, which allows us to test if the interacting GAA model manifests a many-body mobility edge. Our analysis indicates that the targeted states in the presence of the single-particle mobility edge match neither 'MBL-like' fully-converged localized states nor the fully delocalized case where SIMPS fails to converge. An entanglement-scaling analysis as a function of the finite bond dimension indicates that the many-body states in the vicinity of a single-particle mobility edge behave closer to how delocalized states manifest within the SIMPS method.

翻訳日:2023-04-20 08:15:14 公開日:2020-12-17

# 非線形適応制御と予測における暗黙的正則化と運動量アルゴリズム

Implicit regularization and momentum algorithms in nonlinear adaptive control and prediction ( http://arxiv.org/abs/1912.13154v6 )

ライセンス: Link先を確認

Nicholas M. Boffi, Jean-Jacques E. Slotine

(参考訳) 動的システムの安定した同時学習と制御は適応制御の主題である。多くの実用的応用と豊富な理論を持つ確立された分野であるにもかかわらず、非線形システムの適応制御の開発の多くは、いくつかの重要なアルゴリズムを中心に展開されている。古典的適応非線形制御技術と最近の最適化と機械学習の進歩とを強く結び付けることで,適応非線形制御と適応動的予測の両面において,アルゴリズム開発に未発達の可能性が示された。まず,自然勾配降下とミラー降下に触発された一階適応則を導入する。データに一貫性のある複数のダイナミクスが存在する場合、これらの非ユークリッド適応法則は学習モデルを暗黙的に規則化する。このように学習中に課される局所幾何は、スパーシティのような望ましい性質のために、完全な追跡や予測を達成する多くのパラメータベクトルを選択できる。この結果を正規化力学予測器とオブザーバの設計に適用し、具体的にはハミルトン系、ラグランジュ系、および繰り返しニューラルネットワークを考える。その後、ブレグマン・ラグランジアン(bregman lagrangian)に基づく変分形式を開発し、線形パラメータ化システムや単調性や凸性要件を満たす非線形パラメータ化システムに適用可能な運動量を持つ適応則を定義する。ブレグマン・ラグランジュ方程式のオイラー・ラグランジュ方程式は運動量を持つ自然な勾配やミラー降下のような適応法則を導いており、無限摩擦極限においてそれらの一階の類似を回復する。理論的結果を示すシミュレーションを用いて分析を行った。

Stable concurrent learning and control of dynamical systems is the subject of adaptive control. Despite being an established field with many practical applications and a rich theory, much of the development in adaptive control for nonlinear systems revolves around a few key algorithms. By exploiting strong connections between classical adaptive nonlinear control techniques and recent progress in optimization and machine learning, we show that there exists considerable untapped potential in algorithm development for both adaptive nonlinear control and adaptive dynamics prediction. We first introduce first-order adaptation laws inspired by natural gradient descent and mirror descent. We prove that when there are multiple dynamics consistent with the data, these non-Euclidean adaptation laws implicitly regularize the learned model. Local geometry imposed during learning thus may be used to select parameter vectors - out of the many that will achieve perfect tracking or prediction - for desired properties such as sparsity. We apply this result to regularized dynamics predictor and observer design, and as concrete examples consider Hamiltonian systems, Lagrangian systems, and recurrent neural networks. We subsequently develop a variational formalism based on the Bregman Lagrangian to define adaptation laws with momentum applicable to linearly parameterized systems and to nonlinearly parameterized systems satisfying monotonicity or convexity requirements. We show that the Euler Lagrange equations for the Bregman Lagrangian lead to natural gradient and mirror descent-like adaptation laws with momentum, and we recover their first-order analogues in the infinite friction limit. We illustrate our analyses with simulations demonstrating our theoretical results.

翻訳日:2023-01-16 21:18:35 公開日:2020-12-17
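
The implicit-regularization claim in the abstract above can be illustrated in its simplest Euclidean, discrete-time form. The sketch below is not the paper's adaptation law. It assumes a single linear measurement with two unknown parameters, so infinitely many parameter vectors fit the data exactly, and it runs plain gradient descent from zero. The function names, step size, and toy data are illustrative assumptions.

```python
def fit_by_gradient_descent(a, y, lr=0.1, steps=200):
    """Minimize 0.5 * (a . theta - y)^2 by gradient descent, starting at zero."""
    theta = [0.0] * len(a)
    for _ in range(steps):
        residual = sum(ai * ti for ai, ti in zip(a, theta)) - y
        # The gradient is residual * a, so theta never leaves the span of a.
        theta = [ti - lr * residual * ai for ai, ti in zip(a, theta)]
    return theta


def min_norm_solution(a, y):
    """Closed-form minimum Euclidean norm solution of a . theta = y."""
    scale = y / sum(ai * ai for ai in a)
    return [scale * ai for ai in a]


# One equation, two unknowns: many exact fits exist (toy data, assumed).
a, y = [1.0, 2.0], 5.0
theta_gd = fit_by_gradient_descent(a, y)
theta_min = min_norm_solution(a, y)
print(theta_gd, theta_min)
```

Among all exact fits, gradient descent started at zero settles on the minimum Euclidean norm one. Changing the geometry of the update, as a mirror-descent-style law does, would be expected to select a different exact fit, which is the kind of effect the abstract describes.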

# 1つの広い層をもつ深層ネットワークのグローバル収束とピラミッドトポロジー

Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology ( http://arxiv.org/abs/2002.07867v3 )

ライセンス: Link先を確認

Quynh Nguyen and Marco Mondelli

(参考訳) 最近の研究により、すべての隠れ層の幅が $N$（$N$ はトレーニングサンプル数）に対して多項式的にスケールする過パラメータ化ニューラルネットワークでは、勾配降下がグローバル最小値を見つけられることが示されている。本稿では、深層ネットワークにおいて、入力層に続く幅 $N$ の単一の層があれば、同様の保証を確保するのに十分であることを証明する。特に、残りのすべての層は一定の幅を持つことができ、ピラミッドトポロジーを形成する。さらに、広く使われている LeCun の初期化への本結果の適用を示し、単一の広い層に対してオーダー $N^2$ の過パラメータ化要件を得る。

Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with $N$ ($N$ being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width $N$ following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths, and form a pyramidal topology. We show an application of our result to the widely used LeCun's initialization and obtain an over-parameterization requirement for the single wide layer of order $N^2$.

翻訳日:2022-12-30 19:33:36 公開日:2020-12-17

# 選好からの高速ベイズ報酬推論による安全な模倣学習

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences ( http://arxiv.org/abs/2002.09089v4 )

ライセンス: Link先を確認

Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum

(参考訳) デモンストレーションによるベイズ報酬学習は、模倣学習を行う際の厳密な安全性と不確実性分析を可能にする。しかし、ベイジアン報酬学習法は一般に複雑な制御問題に対して計算的に難解である。ベイジアン・リワード補間法(Bayesian Reward Extrapolation, Bayesian REX)を提案する。ベイジアン・リワード学習アルゴリズムは, 自己教師付きタスクによる低次元特徴符号化を事前学習し, 実演よりも好みを生かして高速なベイジアン推定を行う。 Bayesian REXはデモからAtariゲームを学ぶことができ、ゲームスコアにアクセスすることなく、パーソナルラップトップでわずか5分で後部報酬関数から10万のサンプルを生成することができる。ベイジアンREXはまた、報酬関数の点推定のみを学習する最先端の手法と競合するか、それ以上の模倣学習性能をもたらす。最後に、ベイジアンREXは報酬関数のサンプルにアクセスすることなく、効率的な高信頼度ポリシー評価を可能にする。これらの信頼性の高いパフォーマンス境界は、さまざまな評価ポリシーのパフォーマンスとリスクをランク付けし、報酬ハッキング行動を検出する手段を提供するために使用できる。

Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose Bayesian Reward Extrapolation (Bayesian REX), a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation learning problems by pre-training a low-dimensional feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference. Bayesian REX can learn to play Atari games from demonstrations, without access to the game score and can generate 100,000 samples from the posterior over reward functions in only 5 minutes on a personal laptop. Bayesian REX also results in imitation learning performance that is competitive with or better than state-of-the-art methods that only learn point estimates of the reward function. Finally, Bayesian REX enables efficient high-confidence policy evaluation without having access to samples of the reward function. These high-confidence performance bounds can be used to rank the performance and risk of a variety of evaluation policies and provide a way to detect reward hacking behaviors.

翻訳日:2022-12-30 00:33:58 公開日:2020-12-17
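
The abstract above describes fast Bayesian inference from preferences over demonstrations on top of a pre-trained, low-dimensional feature encoding. The sketch below shows one plausible form of the quantity a sampler would evaluate repeatedly. It assumes a reward that is linear in the encoded trajectory features and a Bradley-Terry (softmax) preference likelihood. The function names, the `beta` parameter, and the toy feature vectors are hypothetical and not taken from the paper's code.

```python
import math


def reward(weights, features):
    """Linear reward: dot product of weights and trajectory features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_log_likelihood(weights, preferences, beta=1.0):
    """Sum of log P(better preferred over worse) under a Bradley-Terry model.

    `preferences` is a list of (worse_features, better_features) pairs.
    """
    total = 0.0
    for worse, better in preferences:
        r_worse = beta * reward(weights, worse)
        r_better = beta * reward(weights, better)
        # log( exp(r_better) / (exp(r_worse) + exp(r_better)) ), computed stably
        m = max(r_worse, r_better)
        log_norm = m + math.log(math.exp(r_worse - m) + math.exp(r_better - m))
        total += r_better - log_norm
    return total


# Toy data (assumed): the second feature is higher in every preferred trajectory.
prefs = [([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.2], [0.1, 0.9])]
agrees = preference_log_likelihood([0.0, 1.0], prefs)
disagrees = preference_log_likelihood([0.0, -1.0], prefs)
print(agrees, disagrees)
```

Because each evaluation only needs dot products with fixed feature vectors, a sampler over `weights` can score many candidate reward functions cheaply, which is in the spirit of the speed claim in the abstract.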

# グラフ注意ニューラルネットワークを用いたターゲット感情分類のための型付き構文依存関係の検討

Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network ( http://arxiv.org/abs/2002.09685v3 )

ライセンス: Link先を確認

Xuefeng Bai, Pengbo Liu and Yue Zhang

(参考訳) 目標感情分類は、入力テキスト中の特定の目標言及に対する感情極性を予測する。支配的手法はニューラルネットワークを用いて入力文を符号化し、ターゲット参照とそれらのコンテキストの関係を抽出する。近年,タスクの依存関係構文を統合するためにグラフニューラルネットワークが研究され,最先端の結果が得られた。しかし、既存の手法では依存ラベル情報を考慮せず、直感的に有用である。そこで本研究では,型付き構文依存情報を統合する新しい関係グラフアテンションネットワークについて検討する。標準ベンチマークの結果,提案手法は感情分類性能を向上させるためにラベル情報を効果的に活用できることがわかった。最終的なモデルは最先端の構文ベースのアプローチを大幅に上回っています。

Targeted sentiment classification predicts the sentiment polarity on given target mentions in input texts. Dominant methods employ neural networks for encoding the input sentence and extracting relations between target mentions and their contexts. Recently, graph neural networks have been investigated for integrating dependency syntax for the task, achieving state-of-the-art results. However, existing methods do not consider dependency label information, which can be intuitively useful. To solve the problem, we investigate a novel relational graph attention network that integrates typed syntactic dependency information. Results on standard benchmarks show that our method can effectively leverage label information for improving targeted sentiment classification performance. Our final model significantly outperforms state-of-the-art syntax-based approaches.

翻訳日:2022-12-29 19:28:28 公開日:2020-12-17

# Hölderクラスによってインデックス付けされた経験過程の上限の期待値の境界

Bounding the expectation of the supremum of empirical processes indexed by Hölder classes ( http://arxiv.org/abs/2003.13530v3 )

ライセンス: Link先を確認

Nicolas Schreuder

(参考訳) 本稿では、任意の滑らかさの Hölder クラスと $\mathbb R^d$ の有界集合上に台を持つ任意の分布について、そのクラスによってインデックス付けされた経験過程の上限の期待値の上界を与える。これらの結果は、$n$ 個の独立観測に基づく経験分布で未知の分布を推定し、その推定誤差を積分確率計量（IPM）によって定量化する場合の、非漸近的リスク境界と見なすこともできる。特に、Hölder クラスによってインデックス付けされた IPM を考慮し、対応するレートを導出する。これらの結果は、Wasserstein-1 距離に対応するレート $n^{-1/d}$（最も滑らかでない場合）と、非常に滑らかな関数（例えば、有界カーネルで定義される RKHS の関数）に対応する高速なレート $n^{-1/2}$ という、2つのよく知られた極端なケースの間を補間する。

In this note, we provide upper bounds on the expectation of the supremum of empirical processes indexed by Hölder classes of any smoothness and for any distribution supported on a bounded set in $\mathbb R^d$. These results can be alternatively seen as non-asymptotic risk bounds, when the unknown distribution is estimated by its empirical counterpart, based on $n$ independent observations, and the error of estimation is quantified by an integral probability metric (IPM). In particular, the IPM indexed by a Hölder class is considered and the corresponding rates are derived. These results interpolate between the two well-known extreme cases: the rate $n^{-1/d}$ corresponding to the Wasserstein-1 distance (the least smooth case) and the fast rate $n^{-1/2}$ corresponding to very smooth functions (for instance, functions from an RKHS defined by a bounded kernel).

翻訳日:2022-12-18 08:40:20 公開日:2020-12-17
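
For readers unfamiliar with the quantity bounded in the abstract above, the standard definition of an integral probability metric can be written out as follows. The notation ($\mathcal{F}$ for the indexing function class, $P_n$ for the empirical measure of $n$ i.i.d. draws $X_1, \dots, X_n$ from $P$) is an assumption chosen here and may differ from the paper's. Only the definition is given; the intermediate rates are not stated in the abstract and are not reproduced.

```latex
% Integral probability metric indexed by a function class \mathcal{F}
\[
  d_{\mathcal{F}}(P, Q)
    = \sup_{f \in \mathcal{F}}
      \left| \mathbb{E}_{X \sim P} f(X) - \mathbb{E}_{Y \sim Q} f(Y) \right|
\]

% Expected estimation error when Q is the empirical measure P_n
\[
  \mathbb{E}\, d_{\mathcal{F}}(P, P_n)
    = \mathbb{E} \sup_{f \in \mathcal{F}}
      \left| \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \mathbb{E}_{X \sim P} f(X) \right|
\]
```

When $\mathcal{F}$ is the class of 1-Lipschitz functions, $d_{\mathcal{F}}$ is the Wasserstein-1 distance (Kantorovich-Rubinstein duality), which is the least smooth endpoint mentioned in the abstract.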

# 異種ネットワーク表現学習:調査とベンチマークによる統一フレームワーク

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark ( http://arxiv.org/abs/2004.00216v3 )

ライセンス: Link先を確認

Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, Jiawei Han

(参考訳) 実世界のオブジェクトとその相互作用はしばしばマルチモーダルでマルチタイプであるため、ヘテロジニアスネットワークは伝統的な均質ネットワーク(graphs)のより強力で現実的な、汎用的なスーパークラスとして広く使われてきた。一方,表現学習 (\aka~embedding) は近年,様々なネットワークマイニングや分析作業において,集中的に研究されている。本研究では,既存のヘテロジニアス・ネットワーク・組み込み(hne)に関する研究を深く要約し,評価するための統一的なフレームワークを提供することを目的としている。この研究の最初の貢献として、HNEアルゴリズムの幅広い体系体が存在しており、既存のHNEアルゴリズムの利点に関する体系的な分類と分析のための一般的なパラダイムを提供する。さらに、既存のHNEアルゴリズムは概ね汎用性を備えているが、しばしば異なるデータセットで評価される。 HNEの応用上、このような間接的な比較は、特に実世界のアプリケーションデータから異種ネットワークを構築する様々な方法を考えると、効率的なデータ前処理や新しい技術設計へのタスクパフォーマンスの向上の適切な寄与を阻害する。したがって、第2の貢献として、スケール、構造、属性/ラベルの可用性、および \etcに関するさまざまな特性を持つ4つのベンチマークデータセットを作成します。異なる情報源から、HNEアルゴリズムの便利で公正な評価に向けて。第3のコントリビューションとして、実装を慎重にリファクタリングし、13の人気のあるHNEアルゴリズムの親和性のあるインターフェースを作成し、複数のタスクと実験的な設定に対して、それらの全周比較を提供する。

Since real-world objects and their interactions are often multi-modal and multi-typed, heterogeneous networks have been widely used as a more powerful, realistic, and generic superclass of traditional homogeneous networks (graphs). Meanwhile, representation learning (a.k.a. embedding) has recently been intensively studied and shown to be effective for various network mining and analytical tasks. In this work, we aim to provide a unified framework to deeply summarize and evaluate existing research on heterogeneous network embedding (HNE), which includes but goes beyond a normal survey. Since there has already been a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms. Moreover, existing HNE algorithms, though mostly claimed to be generic, are often evaluated on different datasets. Although understandable given the application-oriented nature of HNE, such indirect comparisons largely hinder the proper attribution of improved task performance towards effective data preprocessing and novel technical design, especially considering the various ways possible to construct a heterogeneous network from real-world application data. Therefore, as the second contribution, we create four benchmark datasets with various properties regarding scale, structure, attribute/label availability, etc. from different sources, towards handy and fair evaluations of HNE algorithms. As the third contribution, we carefully refactor and amend the implementations and create friendly interfaces for 13 popular HNE algorithms, and provide all-around comparisons among them over multiple tasks and experimental settings.

翻訳日:2022-12-17 19:32:19 公開日:2020-12-17

# エネルギーモデルを用いた構成的視覚生成と推論

Compositional Visual Generation and Inference with Energy Based Models ( http://arxiv.org/abs/2004.06030v3 )

ライセンス: Link先を確認

Yilun Du, Shuang Li, Igor Mordatch

(参考訳) 人間の知能の重要な側面は、より単純なアイデアからますます複雑な概念を組み立て、迅速な学習と知識の適応を可能にする能力である。本稿では, 確率分布を直接組み合わせることで, エネルギーモデルでこの能力を発揮できることを示す。複合分布からのサンプルは概念の構成に対応する。例えば、笑顔の顔の分布と男性の顔の分布を考えると、笑顔の顔を生成するためにそれらを組み合わせることができる。これにより、コンビネーション、切断、概念の否定を同時に満足する自然画像を生成することができます。我々は,自然顔のCelebAデータセットと合成3Dシーン画像を用いて,モデルの構成生成能力を評価する。また、新たな概念を継続的に学習し、組み込む機能や、画像の基盤となる概念特性の合成を推論する機能など、我々のモデルに特有の利点も示しています。

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this ability by directly combining probability distributions. Samples from the combined distribution correspond to compositions of concepts. For example, given a distribution for smiling faces, and another for male faces, we can combine them to generate smiling male faces. This allows us to generate natural images that simultaneously satisfy conjunctions, disjunctions, and negations of concepts. We evaluate compositional generation abilities of our model on the CelebA dataset of natural faces and synthetic 3D scene images. We also demonstrate other unique advantages of our model, such as the ability to continually learn and incorporate new concepts, or infer compositions of concept properties underlying an image.

翻訳日:2022-12-13 23:07:56 公開日:2020-12-17
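
The composition rules mentioned in the abstract above can be sketched on toy one-dimensional energies. This is a minimal illustration under stated assumptions, not the paper's model: real energies would be neural networks over images and sampling would use an MCMC procedure. Here conjunction adds energies (a product of distributions), disjunction takes a negative log-sum-exp of negated energies, and negation subtracts a weighted energy. The function names, the weight `alpha`, and the quadratic toy energies are all hypothetical.

```python
import math


def conjunction(*energies):
    """Product of distributions: energies add."""
    return lambda x: sum(e(x) for e in energies)


def disjunction(*energies):
    """Union of concepts: negative log-sum-exp of the negated energies."""
    def energy(x):
        neg = [-e(x) for e in energies]
        m = max(neg)
        return -(m + math.log(sum(math.exp(v - m) for v in neg)))
    return energy


def negation(base, negated, alpha=0.5):
    """Favor `base` while pushing away from `negated` (alpha is a free weight)."""
    return lambda x: base(x) - alpha * negated(x)


# Toy 1-D "concepts" (assumed): each energy is lowest at its preferred value.
near_one = lambda x: (x - 1.0) ** 2
near_three = lambda x: (x - 3.0) ** 2

grid = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
both = conjunction(near_one, near_three)
either = disjunction(near_one, near_three)
best_both = min(grid, key=both)
print(best_both, either(1.0), either(2.0))
```

Low energy means high probability, so the conjunction concentrates between the two concepts while the disjunction keeps low energy near each concept separately.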

# pfnn:複素幾何学上の二階境界値問題のクラスを解くペナルティフリーニューラルネットワーク法

PFNN: A Penalty-Free Neural Network Method for Solving a Class of Second-Order Boundary-Value Problems on Complex Geometries ( http://arxiv.org/abs/2004.06490v2 )

ライセンス: Link先を確認

Hailong Sheng and Chao Yang

(参考訳) 複素測地における2階境界値問題のクラスを効率的に解くために, ペナルティのないニューラルネットワーク手法であるPFNNを提案する。滑らかさの要求を減らすため、元の問題は弱形式に再構成され、高階導関数の評価は避けられる。 1つではなく2つのニューラルネットワークを用いて近似解を構築し、1つのネットワークが必須境界条件を満たし、もう1つはドメインの残りの部分を処理している。このように、制約付き最適化ではなく、制約付き最適化問題は、ペナルティ項を追加することなく解決される。 2つのネットワークの絡み合いは、スケール不変で複雑なジオメトリに適応できる長さ係数関数の助けを借りて解消される。本稿では,pfnn法の収束を証明し,線形および非線形の2次境界値問題に対する数値実験を行い,pfnnが既存の手法よりも精度,柔軟性,頑健性において優れていることを示す。

We present PFNN, a penalty-free neural network method, to efficiently solve a class of second-order boundary-value problems on complex geometries. To reduce the smoothness requirement, the original problem is reformulated to a weak form so that the evaluations of high-order derivatives are avoided. Two neural networks, rather than just one, are employed to construct the approximate solution, with one network satisfying the essential boundary conditions and the other handling the rest of the domain. In this way, an unconstrained optimization problem, instead of a constrained one, is solved without adding any penalty terms. The entanglement of the two networks is eliminated with the help of a length factor function that is scale invariant and can adapt to complex geometries. We prove the convergence of the PFNN method and conduct numerical experiments on a series of linear and nonlinear second-order boundary-value problems to demonstrate that PFNN is superior to several existing approaches in terms of accuracy, flexibility and robustness.

翻訳日:2022-12-13 10:15:52 公開日:2020-12-17
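
The two-network construction in the abstract above can be sketched in one dimension. The sketch assumes the approximate solution has the form u(x) = g(x) + l(x) * f(x), where g matches the essential boundary values, f is unconstrained, and the length factor l vanishes on the boundary. Plain functions stand in for the two neural networks, and the polynomial length factor is a hypothetical stand-in for the paper's construction on complex geometries.

```python
import math


def make_solution(g, f, length_factor):
    """Combine two approximators so the boundary values of g are kept exactly."""
    return lambda x: g(x) + length_factor(x) * f(x)


# Assumed 1-D setting: domain [0, 1] with u(0) = 2 and u(1) = 5 prescribed.
g = lambda x: 2.0 + 3.0 * x               # stands in for the boundary network
length_factor = lambda x: x * (1.0 - x)   # zero on the boundary, positive inside
f = lambda x: math.sin(7.0 * x) + 0.3     # stands in for the interior network

u = make_solution(g, f, length_factor)
print(u(0.0), u(1.0), u(0.5))
```

Whatever values `f` takes, the boundary conditions hold by construction, so `f` can be trained without a penalty term; this is the decoupling the abstract attributes to the length factor function.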

# MangaGAN:マンガ図面の方法論に基づく未完成のフォト・ツー・マンガ翻訳

MangaGAN: Unpaired Photo-to-Manga Translation Based on The Methodology of Manga Drawing ( http://arxiv.org/abs/2004.10634v2 )

ライセンス: Link先を確認

Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Jiahe Cui, and Ji Wan

(参考訳) マンガ(manga)は、主に白黒のストローク線や幾何学的な誇張を用いて人間の容姿、ポーズ、行動などを表現した、日本発祥の世界的人気漫画である。本稿では, 生成的逆ネットワーク(gan, generative adversarial network)を基盤としたマンガガン(mangagan)を提案する。マンガアーティストがいかにマンガを描くかに触発されたMangaGANは、デザインされたGANモデルによってマンガの幾何学的特徴を生成し、カスタマイズされたマルチGANアーキテクチャにより、各顔領域をマンガドメインに微妙に翻訳する。 MangaGANのトレーニングのために,マンガの顔の特徴,ランドマーク,身体などを含む,人気マンガ作品から収集された新しいデータセットを構築した。さらに,高品質なマンガ面を作成するために,スムースなストロークラインに対する構造的平滑化損失とノイズ画素の回避,およびフォトとマンガのドメイン間の類似性を向上させるための類似性保持モジュールを提案する。広汎な実験により、マンガガンは、顔の類似性と人気マンガスタイルの両方を保ち、他の関連する最先端の手法よりも優れた高品質なマンガフェイスを生成できることが示されている。

Manga is a world-popular comic form originating in Japan that typically employs black-and-white stroke lines and geometric exaggeration to describe humans' appearances, poses, and actions. In this paper, we propose MangaGAN, the first method based on Generative Adversarial Network (GAN) for unpaired photo-to-manga translation. Inspired by how experienced manga artists draw manga, MangaGAN generates the geometric features of the manga face by a designed GAN model and delicately translates each facial region into the manga domain by a tailored multi-GANs architecture. For training MangaGAN, we construct a new dataset collected from a popular manga work, containing manga facial features, landmarks, bodies, and so on. Moreover, to produce high-quality manga faces, we further propose a structural smoothing loss to smooth stroke-lines and avoid noisy pixels, and a similarity preserving module to improve the similarity between domains of photo and manga. Extensive experiments show that MangaGAN can produce high-quality manga faces which preserve both the facial similarity and a popular manga style, and outperforms other related state-of-the-art methods.

翻訳日:2022-12-10 18:33:26 公開日:2020-12-17

# ディープニューラルネットワークと対向ロバストネスを用いたfMRIデコーディングの解釈可能性の向上

Improving the Interpretability of fMRI Decoding using Deep Neural Networks and Adversarial Robustness ( http://arxiv.org/abs/2004.11114v3 )

ライセンス: Link先を確認

Patrick McClure, Dustin Moraczewski, Ka Chun Lam, Adam Thomas, Francisco Pereira

(参考訳) 機能的磁気共鳴画像(fMRI)データから予測するために、ディープニューラルネットワーク(DNN)がますます使われている。しかし、それらは広く解釈不能な「ブラックボックス」と見なされており、その過程でdnnがどの入力情報が使われているかを知ることは困難であり、認知神経科学と臨床応用の両方において重要なものである。サリエンシマップは、入力特徴の相対的重要性の解釈可能な可視化を作成するための一般的なアプローチである。しかし、DNNが入力ノイズに敏感であることや、入力に過度に集中し、モデルに過少なため、マップを作成する方法は失敗することが多い。また,正当性マップが真に関連した入力情報にどの程度対応しているかを評価することも困難である。本稿では,勾配に基づく塩分濃度分布図を作成するための様々な手法を概説し,DNNを入力雑音に頑健にするために開発した新しい逆方向学習法について述べる。本稿では,DNNや線形モデルを用いて画像データから情報を復号化するための訓練を行う場合,fMRIにおける2つの定量評価手法を提案する。我々は,複雑なアクティベーション構造が知られている合成データセットと,DNNで生成されるサリエンシマップとHuman Connectome Project(HCP)データセットにおけるタスクデコーディングのための線形モデルを用いて,その手順を評価する。我々の重要な発見は、合成fMRIデータとHCP fMRIデータの両方において、異なる方法で生成されるサリエンシマップが、解釈可能性において大きく異なることである。驚くべきことに、dnnと線形モデルが同等のパフォーマンスレベルでデコードする場合であっても、dnn saliency mapは(重みや勾配から派生した)線形モデルsaliency mapsよりも解釈可能性が高い。最後に,我々の対人訓練法で作成したサリエンシマップは,他の方法よりも優れていた。

Deep neural networks (DNNs) are being increasingly used to make predictions from functional magnetic resonance imaging (fMRI) data. However, they are widely seen as uninterpretable "black boxes", as it can be difficult to discover what input information is used by the DNN in the process, something important in both cognitive neuroscience and clinical applications. A saliency map is a common approach for producing interpretable visualizations of the relative importance of input features for a prediction. However, methods for creating maps often fail due to DNNs being sensitive to input noise, or by focusing too much on the input and too little on the model. It is also challenging to evaluate how well saliency maps correspond to the truly relevant input information, as ground truth is not always available. In this paper, we review a variety of methods for producing gradient-based saliency maps, and present a new adversarial training method we developed to make DNNs robust to input noise, with the goal of improving interpretability. We introduce two quantitative evaluation procedures for saliency map methods in fMRI, applicable whenever a DNN or linear model is being trained to decode some information from imaging data. We evaluate the procedures using a synthetic dataset where the complex activation structure is known, and on saliency maps produced for DNN and linear models for task decoding in the Human Connectome Project (HCP) dataset. Our key finding is that saliency maps produced with different methods vary widely in interpretability, in both synthetic and HCP fMRI data. Strikingly, even when DNN and linear models decode at comparable levels of performance, DNN saliency maps score higher on interpretability than linear model saliency maps (derived via weights or gradient). Finally, saliency maps produced with our adversarial training method outperform those from other methods.

翻訳日:2022-12-10 08:54:21 公開日:2020-12-17

# 半語彙言語 ---コンピュータビジョンにおける機械学習と記号推論の統合のための公式な基礎

Semi-Lexical Languages -- A Formal Basis for Unifying Machine Learning and Symbolic Reasoning in Computer Vision ( http://arxiv.org/abs/2004.12152v2 )

ライセンス: Link先を確認

Briti Gangopadhyay, Somnath Hazra and Pallab Dasgupta

(参考訳) 人間の視覚は、世界に関する事前の知識に基づいて推論することで、現実世界からの感覚入力の不完全性を補うことができる。しかし、ドメイン知識に基づく推論フレームワークが存在しないことで、複雑なシナリオを解釈する能力は制限されている。実世界の不完全なトークンを扱うための公式な基礎として半語彙言語を提案する。機械学習のパワーは不完全なトークンを言語のアルファベットにマッピングするために使用され、シンボリック推論は言語の入力のメンバーシップを決定するために使用される。半語彙言語はまた、入力の異なる部分で半語彙のトークンが解釈されるバリエーションを防ぐバインディングを持ち、それによって推論に頼り、個々のトークンの認識の質を高める。本稿では、純粋な機械学習と純粋にシンボリックな手法よりも、このようなフレームワークを使うことの利点を示すケーススタディを紹介する。

Human vision is able to compensate for imperfections in sensory inputs from the real world by reasoning based on prior knowledge about the world. Machine learning has had a significant impact on computer vision due to its inherent ability in handling imprecision, but the absence of a reasoning framework based on domain knowledge limits its ability to interpret complex scenarios. We propose semi-lexical languages as a formal basis for dealing with imperfect tokens provided by the real world. The power of machine learning is used to map the imperfect tokens into the alphabet of the language and symbolic reasoning is used to determine the membership of input in the language. Semi-lexical languages also have bindings that prevent the variations in which a semi-lexical token is interpreted in different parts of the input, thereby leaning on deduction to enhance the quality of recognition of individual tokens. We present case studies that demonstrate the advantage of using such a framework over pure machine learning and pure symbolic methods.

翻訳日:2022-12-09 21:16:04 公開日:2020-12-17

# 構造タグによる学術文書品質予測のためのテキスト分類の改善

Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction ( http://arxiv.org/abs/2005.00129v2 )

ライセンス: Link先を確認

Gideon Maillette de Buy Wenniger, Thomas van Dongen, Eleri Aedmaa, Herbert Teun Kruitbosch, Edwin A. Valentijn, and Lambert Schomaker

(参考訳) 長いテキスト、特に学術文書でのリカレントニューラルネットワークのトレーニングは、学習に問題を引き起こす。階層的注意ネットワーク(HAN)はこれらの問題を解決するのに有効であるが、テキストの構造に関する重要な情報を失う。これらの問題に対処するために、文書中の文の役割を示す構造タグとHANの使用を提案する。文にタグを追加し、タイトル、抽象的、あるいは本文に対応するマークを付けると、学術的な文書品質予測のための最先端技術よりも改善される。提案システムは,PeerReadデータセット上でのアクセプション/リジェクト予測のタスクに適用し,最近のBiLSTMモデルと共同テキスト+視覚モデル,および平易なHANとの比較を行った。通常のHANと比較すると、3つの領域で精度が向上する。計算と言語領域では、新しいモデルは全体として最もよく機能し、最良の文献結果よりも4.7%精度が向上します。また,allen ai s2orcデータセットから集計した88kの学術論文に対して,引用数予測用のタグを導入することで,改良を行った。構造タグを持つHANシステムでは,28.5%の分散が説明され,BiLSTMモデルの再実装よりも1.8%,通常のHANよりも1.0%向上した。

Training recurrent neural networks on long texts, in particular scholarly documents, causes problems for learning. While hierarchical attention networks (HANs) are effective in solving these problems, they still lose important information about the structure of the text. To tackle these problems, we propose the use of HANs combined with structure-tags which mark the role of sentences in the document. Adding tags to sentences, marking them as corresponding to title, abstract or main body text, yields improvements over the state-of-the-art for scholarly document quality prediction. The proposed system is applied to the task of accept/reject prediction on the PeerRead dataset and compared against a recent BiLSTM-based model and joint textual+visual model as well as against plain HANs. Compared to plain HANs, accuracy increases on all three domains. On the computation and language domain our new model works best overall, and increases accuracy 4.7% over the best literature result. We also obtain improvements when introducing the tags for prediction of the number of citations for 88k scientific publications that we compiled from the Allen AI S2ORC dataset. For our HAN-system with structure-tags we reach 28.5% explained variance, an improvement of 1.8% over our reimplementation of the BiLSTM-based model as well as 1.0% improvement over plain HANs.

翻訳日:2022-12-08 03:12:23 公開日:2020-12-17
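
The structure-tag idea in the abstract above amounts to a small preprocessing step before the text reaches the model. The sketch below marks each sentence with its role in the document (title, abstract, or main body). The tag strings `<TITLE>`, `<ABSTRACT>`, and `<BODY>` and the function name are hypothetical; the paper's actual tag vocabulary and placement may differ.

```python
def tag_sentences(title, abstract_sentences, body_sentences):
    """Prefix every sentence with a token naming its role in the document."""
    tagged = [f"<TITLE> {title}"]
    tagged += [f"<ABSTRACT> {s}" for s in abstract_sentences]
    tagged += [f"<BODY> {s}" for s in body_sentences]
    return tagged


# Toy document (assumed) to show the tagged output format.
doc = tag_sentences(
    "A Toy Paper",
    ["We study a toy problem."],
    ["Section one begins here.", "It continues."],
)
print(doc)
```

Because the tags are ordinary tokens, a hierarchical model can consume them without architectural changes; it only sees one extra symbol per sentence.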

# DramaQA:階層型QAによる文字中心のビデオストーリー理解

DramaQA: Character-Centered Video Story Understanding with Hierarchical QA ( http://arxiv.org/abs/2005.03356v2 )

ライセンス: Link先を確認

Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang

(参考訳) 近年のコンピュータビジョンと自然言語処理の進歩にもかかわらず、ビデオストーリーの本質的な難しさのため、ビデオストーリーを理解できる機械の開発はいまだに困難である。また,人間の認知過程に基づく映像理解の程度を評価する方法については,まだ研究が進んでいない。本稿では,ビデオストーリーを包括的に理解するために,新しいビデオ質問応答(ビデオQA)タスクであるDramaQAを提案する。 DramaQAは2つの視点に焦点を当てている。 1)人間の知能の認知発達段階に基づく評価指標としての階層的QA。 2) ストーリーの局所的コヒーレンスをモデル化するための文字中心のビデオアノテーション。我々のデータセットは、テレビドラマ『Another Miss Oh』の上に構築されており、17,983対のQAビデオクリップが23,928本あり、各QAペアは4つの難易度のうちの1つに属している。我々は217,308個のアノテーション付き画像を提供し,視覚境界ボックスや主要文字の動作や感情,解決されたスクリプトの同時参照など,文字中心のアノテーションを充実させた。さらに,ビデオの文字中心表現を階層的に理解し,質問に答えるマルチレベルコンテキストマッチングモデルを提案する。我々は,研究目的のためにデータセットとモデルを公開し,ビデオストーリー理解研究の新しい視点を提供することを期待している。

Despite recent progress on computer vision and natural language processing, developing a machine that can understand video stories is still hard to achieve due to the intrinsic difficulty of video stories. Moreover, research on how to evaluate the degree of video understanding based on human cognitive processes has not yet progressed. In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for a comprehensive understanding of the video story. The DramaQA focuses on two perspectives: 1) Hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence. 2) Character-centered video annotations to model local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and it contains 17,983 QA pairs from 23,928 various length video clips, with each QA pair belonging to one of four difficulty levels. We provide 217,308 annotated images with rich character-centered annotations, including visual bounding boxes, behaviors and emotions of main characters, and coreference resolved scripts. Additionally, we suggest a Multi-level Context Matching model which hierarchically understands character-centered representations of video to answer questions. We release our dataset and model publicly for research purposes, and we expect our work to provide a new perspective on video story understanding research.

翻訳日:2022-12-05 22:22:25 公開日:2020-12-17

# 多視点協調ネットワーク埋め込み

Multi-View Collaborative Network Embedding ( http://arxiv.org/abs/2005.08189v2 )

ライセンス: Link先を確認

Sezin Kircali Ata, Yuan Fang, Min Wu, Jiaqi Shi, Chee Keong Kwoh and Xiaoli Li

(参考訳) 現実世界のネットワークは複数のビューを持つことが多く、各ビューは共通のノード間の1つのタイプの相互作用を記述する。例えば、ビデオ共有ネットワークでは、2つのユーザーノードが1つのビューに共通のお気に入りのビデオがある場合リンクされるが、共通のサブスクライバーを共有する場合は別のビューでリンクすることもできる。従来のシングルビューネットワークとは異なり、複数のビューは互いに補完するために異なるセマンティクスを維持する。本稿では,低次元表現を学習するためのマルチビューネットワーク埋め込み手法MANEを提案する。多様性はビューを個々のセマンティクスを維持するのを可能にし、コラボレーションはビューを協調させる。また,これまで検討されていない新たな2次コラボレーション形態を発見し,優れたノード表現を実現するためのフレームワークに統合した。さらに,各ビューが異なるノードを持つ場合が多いので, mane+ というノード毎のビュー重要度をモデル化するために mane の注意に基づく拡張を提案する。最後に,実世界の3つのパブリックマルチビューネットワーク上で総合的な実験を行い,本モデルが最先端のアプローチを一貫して上回っていることを示す。

Real-world networks often exist with multiple views, where each view describes one type of interaction among a common set of nodes. For example, on a video-sharing network, while two user nodes are linked if they have common favorite videos in one view, they can also be linked in another view if they share common subscribers. Unlike traditional single-view networks, multiple views maintain different semantics to complement each other. In this paper, we propose MANE, a multi-view network embedding approach to learn low-dimensional representations. Similar to existing studies, MANE hinges on diversity and collaboration - while diversity enables views to maintain their individual semantics, collaboration enables views to work together. However, we also discover a novel form of second-order collaboration that has not been explored previously, and further unify it into our framework to attain superior node representations. Furthermore, as each view often has varying importance w.r.t. different nodes, we propose MANE+, an attention-based extension of MANE to model node-wise view importance. Finally, we conduct comprehensive experiments on three public, real-world multi-view networks, and the results demonstrate that our models consistently outperform state-of-the-art approaches.

翻訳日:2022-12-02 05:18:12 公開日:2020-12-17

# 会話型質問応答のためのfluent response生成

Fluent Response Generation for Conversational Question Answering ( http://arxiv.org/abs/2005.10464v2 )

ライセンス: Link先を確認

Ashutosh Baheti, Alan Ritter, Kevin Small

(参考訳) 質問応答(QA)はオープンドメイン会話エージェントの重要な側面であり、会話QA(ConvQA)サブタスクにおける特定の研究の焦点を定めている。最近のConvQAの取り組みの特筆すべき制限は、応答がターゲットコーパスから抽出されることであり、高品質な会話エージェントの自然言語生成(NLG)の側面を無視していることである。そこで本研究では,seq2seq nlg法を用いて,正確性を維持しつつ流麗な文法的応答を生成する手法を提案する。技術的な観点からは、エンドツーエンドシステムのトレーニングデータを生成するためにデータ拡張を使用します。具体的には,Syntactic Transformations(STs)を開発し,質問固有候補応答を生成し,BERTに基づく分類器(Devlin et al., 2019)を用いてランク付けする。 SQuAD 2.0データに対する人間による評価(Rajpurkar et al., 2018)は、提案モデルが会話応答の生成においてベースラインのCoQAおよびQuACモデルより優れていることを示す。さらに、CoQAデータセット上でテストを実行することで、モデルのスケーラビリティを示す。コードとデータはhttps://github.com/abaheti95/QADialogSystemで入手できる。

Question answering (QA) is an important aspect of open-domain conversational agents, garnering specific research focus in the conversational QA (ConvQA) subtask. One notable limitation of recent ConvQA efforts is that the response is an answer span extracted from the target corpus, thus ignoring the natural language generation (NLG) aspect of high-quality conversational agents. In this work, we propose a method for situating QA responses within a SEQ2SEQ NLG approach to generate fluent grammatical answer responses while maintaining correctness. From a technical perspective, we use data augmentation to generate training data for an end-to-end system. Specifically, we develop Syntactic Transformations (STs) to produce question-specific candidate answer responses and rank them using a BERT-based classifier (Devlin et al., 2019). Human evaluation on SQuAD 2.0 data (Rajpurkar et al., 2018) demonstrates that the proposed model outperforms baseline CoQA and QuAC models in generating conversational responses. We further show our model's scalability by conducting tests on the CoQA dataset. The code and data are available at https://github.com/abaheti95/QADialogSystem.

翻訳日:2022-11-30 23:30:13 公開日:2020-12-17

# BERTを用いた名前付きエンティティ認識のための文間コンテキストの探索

Exploring Cross-sentence Contexts for Named Entity Recognition with BERT ( http://arxiv.org/abs/2006.01563v2 )

ライセンス: Link先を確認

Jouni Luoma, Sampo Pyysalo

Named entity recognition (NER) is frequently addressed as a sequence classification task where each input consists of one sentence of text. It is nevertheless clear that useful information for the task can often be found outside the scope of a single-sentence context. Recently proposed self-attention models such as BERT can both efficiently capture long-distance relationships in input and represent inputs consisting of several sentences, creating new opportunities for approaches that incorporate cross-sentence information in natural language processing tasks. In this paper, we present a systematic study exploring the use of cross-sentence information for NER using BERT models in five languages. We find that adding context in the form of additional sentences to BERT input systematically increases NER performance on all of the tested languages and models. Including multiple sentences in each input also allows us to study the predictions of the same sentences in different contexts. We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate that this further increases NER performance with BERT. Our approach does not require any changes to the underlying BERT architecture, relying instead on restructuring examples for training and prediction. Evaluation on established datasets, including the CoNLL'02 and CoNLL'03 NER benchmarks, demonstrates that our proposed approach can improve on the state-of-the-art NER results on English, Dutch, and Finnish, achieves the best reported BERT-based results on German, and is on par with performance reported with other BERT-based approaches in Spanish. We release all methods implemented in this work under open licenses.

Translated: 2022-11-26 00:31:39 Published: 2020-12-17
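The voting step behind Contextual Majority Voting can be illustrated with a minimal sketch. It assumes that the same sentence has been tagged several times, once per surrounding context, and that each run yields one label per token. The function name and the tie-breaking rule (first label seen wins) are illustrative assumptions, not details taken from the paper.

```python
def contextual_majority_vote(predictions):
    """Combine label sequences predicted for one sentence in different contexts.

    predictions: list of label lists, one per context; all must have equal length.
    Returns one label per token. Ties go to the label that was seen first
    (an assumption made for this sketch).
    """
    if not predictions:
        return []
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all predictions must cover the same tokens")
    voted = []
    for i in range(n_tokens):
        counts = {}
        for p in predictions:
            counts[p[i]] = counts.get(p[i], 0) + 1
        # max() returns the first key with the highest count (dicts keep insertion order).
        voted.append(max(counts, key=counts.get))
    return voted


runs = [
    ["B-PER", "O", "O"],
    ["B-PER", "O", "B-LOC"],
    ["O", "O", "B-LOC"],
]
print(contextual_majority_vote(runs))
```

Voting per token keeps the sketch short; a real system would also need to map subword predictions back to tokens before voting.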

# Neural Power Units

http://arxiv.org/abs/2006.01681v4

License: see the linked page

Niklas Heim, Tomáš Pevný, Václav Šmídl

Conventional neural networks can approximate simple arithmetic operations, but fail to generalize beyond the range of numbers that were seen during training. Neural Arithmetic Units aim to overcome this difficulty, but current arithmetic units are either limited to operating on positive numbers or can only represent a subset of arithmetic operations. We introduce the Neural Power Unit (NPU) that operates on the full domain of real numbers and is capable of learning arbitrary power functions in a single layer. The NPU thus fixes the shortcomings of existing arithmetic units and extends their expressivity. We achieve this by using complex arithmetic without requiring a conversion of the network to complex numbers. A simplification of the unit to the RealNPU yields a highly transparent model. We show that the NPUs outperform their competitors in terms of accuracy and sparsity on artificial arithmetic datasets, and that the RealNPU can discover the governing equations of a dynamical system from data alone.

Translated: 2022-11-25 23:09:56 Published: 2020-12-17
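The idea of handling negative inputs through complex arithmetic can be made concrete with a small sketch. It relies only on the identity log x = log|x| + iπk for real x ≠ 0, with k = 1 when x is negative and k = 0 otherwise, so the real part of ∏ xᵢ^wᵢ is exp(Σ wᵢ log|xᵢ|) · cos(π Σ wᵢ kᵢ). This is not the NPU layer from the paper: there are no learned weights or gates, and the function name is hypothetical.

```python
import math


def real_power_product(x, w):
    """Real part of prod_i x_i ** w_i for real, non-zero inputs and real exponents."""
    # Magnitude: exp(sum_i w_i * log|x_i|).
    log_magnitude = sum(wi * math.log(abs(xi)) for xi, wi in zip(x, w))
    # Phase: every negative input contributes pi * w_i.
    phase = math.pi * sum(wi for xi, wi in zip(x, w) if xi < 0)
    return math.exp(log_magnitude) * math.cos(phase)


print(real_power_product([-2.0, 3.0], [2.0, 1.0]))  # (-2)^2 * 3
print(real_power_product([-2.0, 3.0], [1.0, 1.0]))  # (-2) * 3
```

With integer exponents the cosine factor is exactly ±1 up to rounding, which recovers ordinary signed multiplication and division.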

# Weakly-supervised Temporal Action Localization by Uncertainty Modeling

http://arxiv.org/abs/2006.07006v3

License: see the linked page

Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun

Weakly-supervised temporal action localization aims to learn to detect temporal intervals of action classes with only video-level labels. To this end, it is crucial to separate frames of action classes from the background frames (i.e., frames not belonging to any action classes). In this paper, we present a new perspective on background frames where they are modeled as out-of-distribution samples regarding their inconsistency. Then, background frames can be detected by estimating the probability of each frame being out-of-distribution, known as uncertainty, but it is infeasible to directly learn uncertainty without frame-level labels. To realize the uncertainty learning in the weakly-supervised setting, we leverage the multiple instance learning formulation. Moreover, we introduce a background entropy loss to better discriminate background frames by encouraging their in-distribution (action) probabilities to be uniformly distributed over all action classes. Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles. We demonstrate that our model significantly outperforms state-of-the-art methods on the benchmarks, THUMOS'14 and ActivityNet (1.2 & 1.3). Our code is available at https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling.

Translated: 2022-11-22 03:17:14 Published: 2020-12-17
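One way to encourage uniform action probabilities on background frames is a cross-entropy against a uniform target, sketched here. The exact loss used in the paper may differ; the function name and the small `eps` constant are assumptions made for this illustration.

```python
import math


def background_entropy_loss(action_probs, eps=1e-12):
    """Cross-entropy between a uniform target and predicted action probabilities.

    action_probs: probabilities over the C action classes for one background frame.
    The value is smallest (log C) when every class receives probability 1/C.
    """
    c = len(action_probs)
    return -sum(math.log(p + eps) for p in action_probs) / c


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
print(background_entropy_loss(uniform), background_entropy_loss(peaked))
```

A peaked prediction is penalised more heavily than a flat one, so minimising this term pushes background frames away from any single action class.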

# UWSpeech: Speech to Speech Translation for Unwritten Languages

http://arxiv.org/abs/2006.07926v2

License: see the linked page

Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu

Existing speech to speech translation systems rely heavily on the text of the target language: they usually translate the source language either to target text and then synthesize target speech from text, or directly to target speech with target text for auxiliary training. However, those methods cannot be applied to unwritten target languages, which have no written text or phonemes available. In this paper, we develop a translation system for unwritten languages, named UWSpeech, which converts target unwritten speech into discrete tokens with a converter, then translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from target discrete tokens with an inverter. We propose a method called XL-VAE, which enhances the vector quantized variational autoencoder (VQ-VAE) with cross-lingual (XL) speech recognition, to train the converter and inverter of UWSpeech jointly. Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and a VQ-VAE baseline by about 16 and 10 BLEU points respectively, which demonstrates the advantages and potential of UWSpeech.

Translated: 2022-11-21 13:23:22 Published: 2020-12-17
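The conversion of speech into discrete tokens rests on vector quantization, which can be sketched as a nearest-neighbour lookup in a codebook. This shows only the generic VQ step, under the assumption that speech frames have already been encoded as feature vectors; it is not the XL-VAE training procedure, and the names are illustrative.

```python
def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""

    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0]]
print(quantize(frames, codebook))
```

The resulting index sequence plays the role that text or phonemes play in a written language: a translator can predict it, and an inverter can synthesize speech from it.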

# A General Class of Transfer Learning Regression without Implementation Cost

http://arxiv.org/abs/2006.13228v2

License: see the linked page

Shunya Minami, Song Liu, Stephen Wu, Kenji Fukumizu, Ryo Yoshida

We propose a novel framework that unifies and extends existing methods of transfer learning (TL) for regression. To bridge a pretrained source model to the model on a target task, we introduce a density-ratio reweighting function, which is estimated through the Bayesian framework with a specific prior distribution. By changing two intrinsic hyperparameters and the choice of the density-ratio model, the proposed method can integrate three popular methods of TL: TL based on cross-domain similarity regularization, a probabilistic TL using density-ratio estimation, and fine-tuning of pretrained neural networks. Moreover, the proposed method benefits from a simple implementation without any additional cost; the regression model can be fully trained using off-the-shelf libraries for supervised learning in which the original output variable is simply transformed to a new output variable. We demonstrate its simplicity, generality, and applicability using various real data applications.

Translated: 2022-11-17 22:16:30 Published: 2020-12-17

# Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

http://arxiv.org/abs/2007.13532v2

License: see the linked page

Andrés R. Masegosa and Stephan S. Lorenzen and Christian Igel and Yevgeny Seldin

We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows exploiting additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve weighting of trees in random forests and show that, in contrast to the commonly used first order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble.

Translated: 2022-11-14 22:27:55 Published: 2020-12-17
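The role of correlated errors can be illustrated with a second-order Markov argument. If the weighted vote errs on an example, the erring members carry at least half of the weight, so the vote's error rate is at most four times the average squared wrong weight (the pairwise, or tandem, loss). The sketch below checks this on a sample. It is an empirical illustration under that argument, not the PAC-Bayesian bound itself, which adds a complexity term; all names are hypothetical.

```python
def weighted_majority_vote(votes, weights):
    """Return the label with the largest total weight."""
    totals = {}
    for label, w in zip(votes, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


def mv_error_and_second_order_term(predictions, labels, weights):
    """predictions[j][i] is the label from classifier j on example i; weights sum to 1.

    Returns (majority-vote error rate, 4 * empirical tandem loss).
    """
    n = len(labels)
    mv_errors = 0
    tandem = 0.0
    for i, y in enumerate(labels):
        votes = [p[i] for p in predictions]
        if weighted_majority_vote(votes, weights) != y:
            mv_errors += 1
        # Weight of classifiers that err here; its square is the chance that
        # two members drawn independently by weight both err.
        wrong_mass = sum(w for v, w in zip(votes, weights) if v != y)
        tandem += wrong_mass ** 2
    return mv_errors / n, 4.0 * tandem / n


preds = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
err, bound = mv_error_and_second_order_term(preds, [0, 1, 1, 0], [0.4, 0.35, 0.25])
print(err, bound)
```

Because the second-order term depends on how often pairs of members err together, reweighting toward members with uncorrelated mistakes lowers it even when individual error rates stay the same.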

# To BAN or not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

http://arxiv.org/abs/2007.05304v7

License: see the linked page

Kristian Miok, Blaz Skrlj, Daniela Zaharie and Marko Robnik-Sikonja

Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify their output in terms of reliability. We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test if affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions. We also observed that affective dimensions extracted using sentic computing methods can provide insights toward the interpretation of emotions involved in hate speech. Our approach not only improves the classification performance of the state-of-the-art multilingual BERT model, but the computed reliability scores also significantly reduce the workload in the inspection of offending cases and in reannotation campaigns. The provided visualization helps to understand the borderline outcomes.

Translated: 2022-11-11 22:26:06 Published: 2020-12-17
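Monte Carlo dropout keeps dropout switched on at prediction time and treats the spread of repeated stochastic forward passes as a reliability signal. The sketch below shows that mechanism on a toy logistic model with dropout applied to the inputs. That is a deliberate simplification: the paper applies dropout inside transformer attention layers, and every name and parameter here is an assumption.

```python
import math
import random


def mc_dropout_predict(x, weights, bias, drop_prob=0.2, n_samples=200, seed=0):
    """Return (mean, standard deviation) of the predicted probability over stochastic passes."""
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    probs = []
    for _ in range(n_samples):
        # Inverted dropout: drop each input with probability drop_prob, rescale the rest.
        z = bias + sum(
            w * xi / keep for w, xi in zip(weights, x) if rng.random() < keep
        )
        probs.append(1.0 / (1.0 + math.exp(-z)))
    mean = sum(probs) / n_samples
    variance = sum((p - mean) ** 2 for p in probs) / n_samples
    return mean, math.sqrt(variance)


mean, std = mc_dropout_predict([1.0, -2.0, 0.5], [0.8, 0.3, -1.2], 0.1)
print(f"probability {mean:.3f} +/- {std:.3f}")
```

A prediction with a large standard deviation can be routed to a human moderator, which is the workload-reduction use the abstract describes.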

# Deep Contextual Clinical Prediction with Reverse Distillation

http://arxiv.org/abs/2007.05611v2

License: see the linked page

Rohan S. Kodialam, Rebecca Boiarsky, Justin Lim, Neil Dixit, Aditya Sai, David Sontag

Healthcare providers are increasingly using machine learning to predict patient outcomes in order to make meaningful interventions. However, despite innovations in this area, deep learning models often struggle to match the performance of shallow linear models in predicting these outcomes, making it difficult to leverage such techniques in practice. In this work, motivated by the task of clinical prediction from insurance claims, we present a new technique called Reverse Distillation, which pretrains deep models by using high-performing linear models for initialization. We make use of the longitudinal structure of insurance claims datasets to develop Self Attention with Reverse Distillation, or SARD, an architecture that combines contextual embedding, temporal embedding, and self-attention mechanisms and, most critically, is trained via reverse distillation. SARD outperforms state-of-the-art methods on multiple clinical prediction outcomes, with ablation studies revealing that reverse distillation is a primary driver of these improvements. Code is available at https://github.com/clinicalml/omop-learn.

Translated: 2022-11-11 20:58:49 Published: 2020-12-17

# Recent Advances in Network-based Methods for Disease Gene Prediction

http://arxiv.org/abs/2007.10848v2

License: see the linked page

Sezin Kircali Ata, Min Wu, Yuan Fang, Le Ou-Yang, Chee Keong Kwoh and Xiao-Li Li

Disease-gene association through genome-wide association studies (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms (SNPs) that correlate with specific diseases requires statistical analysis of associations. Considering the huge number of possible mutations, in addition to its high cost, another important drawback of GWAS analysis is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide researchers with alternative low-cost disease-gene association evidence, computational approaches come into play. Since molecular networks are able to capture the complex interplay among molecules in diseases, they have become one of the most extensively used data sources for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis of 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features, and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.

Translated: 2022-11-09 00:49:31 Published: 2020-12-17
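Network diffusion methods, the first category named in the abstract, can be illustrated with a random walk with restart. The sketch assumes an undirected, unweighted interaction network stored as an adjacency dictionary in which every node has at least one neighbour. The toy gene names, the restart probability, and the iteration count are all illustrative choices rather than values from the survey.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, iters=100):
    """Score every node by its proximity to the seed (known disease) genes.

    adj: dict mapping node -> list of neighbours.
    seeds: set of seed nodes; the walker restarts there with probability `restart`.
    """
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                # Spread the remaining probability evenly over the neighbours of u.
                share = (1.0 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        p = nxt
    return p


network = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2", "G4"], "G4": ["G3"]}
scores = random_walk_with_restart(network, {"G1"})
print(sorted(scores, key=scores.get, reverse=True))
```

Non-seed genes are then ranked by score, so genes that sit close to known disease genes in the network become the top candidates.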

# Machine-learned Regularization and Polygonization of Building Segmentation Masks

http://arxiv.org/abs/2007.12587v3

License: see the linked page

Stefano Zorzi, Ksenia Bittner, Friedrich Fraundorfer

We propose a machine learning based approach for automatic regularization and polygonization of building segmentation masks. Taking an image as input, we first predict building segmentation maps using a generic fully convolutional network (FCN). A generative adversarial network (GAN) is then used to regularize building boundaries to make them more realistic, i.e., to give them more rectilinear outlines that form right angles where required. This is achieved through the interplay between the discriminator, which gives the probability that an input image is real, and the generator, which learns from the discriminator's response to create more realistic images. Finally, we train a backbone convolutional neural network (CNN) that is adapted to predict sparse outcomes corresponding to building corners from the regularized building segmentation results. Experiments on three building segmentation datasets demonstrate that the proposed method is not only capable of obtaining accurate results, but also of producing visually pleasing building outlines parameterized as polygons.

Translated: 2022-11-07 06:58:14 Published: 2020-12-17

# Chance Constrained Policy Optimization for Process Control and Optimization

http://arxiv.org/abs/2008.00030v2

License: see the linked page

Panagiotis Petsagkourakis, Ilya Orson Sandoval, Eric Bradford, Federico Galvanin, Dongda Zhang and Ehecatl Antonio del Rio-Chanona

Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation. Reinforcement learning by policy optimization would be a natural way to solve this due to its ability to address stochasticity and plant-model mismatch, and to directly account for the effect of future uncertainty and its feedback in a proper closed-loop manner, all without the need for an inner optimization loop. One of the main reasons why reinforcement learning has not been considered for industrial processes (or almost any engineering application) is that it lacks a framework to deal with safety-critical constraints. Present algorithms for policy optimization use difficult-to-tune penalty parameters, fail to reliably satisfy state constraints, or present guarantees only in expectation. We propose a chance constrained policy optimization (CCPO) algorithm which guarantees the satisfaction of joint chance constraints with high probability, which is crucial for safety-critical tasks. This is achieved by the introduction of constraint tightening (backoffs), which are computed simultaneously with the feedback policy. Backoffs are adjusted with Bayesian optimization using the empirical cumulative distribution function of the probabilistic constraints, and are therefore self-tuned. This results in a general methodology that can be embedded into present policy optimization algorithms to enable them to satisfy joint chance constraints with high probability. We present case studies that analyze the performance of the proposed approach.

Translated: 2022-11-05 15:06:58 Published: 2020-12-17
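The link between the empirical cumulative distribution function and a backoff can be sketched as follows. Each Monte Carlo rollout is summarised by its worst constraint value g, where g <= 0 means every constraint held. The satisfaction probability is then the empirical CDF at zero. The backoff rule here is a plain empirical quantile that stands in for the Bayesian optimization described in the abstract, and it assumes that tightening by b shifts every rollout by exactly b. Both are strong simplifications made only for illustration, and the function names are hypothetical.

```python
import math


def satisfaction_probability(worst_case_values):
    """Empirical CDF at 0: share of rollouts whose worst constraint value is <= 0."""
    return sum(g <= 0.0 for g in worst_case_values) / len(worst_case_values)


def quantile_backoff(worst_case_values, alpha=0.05):
    """Smallest b >= 0 such that at least a (1 - alpha) share of samples has g - b <= 0."""
    ordered = sorted(worst_case_values)
    idx = math.ceil((1.0 - alpha) * len(ordered)) - 1
    return max(0.0, ordered[idx])


rollouts = [-0.5, -0.3, -0.1, 0.2, 0.4, -0.2, -0.4, -0.6, 0.1, -0.05]
b = quantile_backoff(rollouts, alpha=0.2)
print(satisfaction_probability(rollouts), b)
```

In a closed-loop setting the shift assumption does not hold exactly, which is one reason an iterative tuning scheme is needed in practice.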

# Enhancing Fiber Orientation Distributions using convolutional Neural Networks

http://arxiv.org/abs/2008.05409v2

License: see the linked page

Oeslle Lucena, Sjoerd B. Vos, Vejay Vakharia, John Duncan, Keyoumars Ashkan, Rachel Sparks, Sebastien Ourselin

Accurate local fiber orientation distribution (FOD) modeling based on diffusion magnetic resonance imaging (dMRI) capable of resolving complex fiber configurations benefits from specific acquisition protocols that sample a high number of gradient directions (b-vecs), a high maximum b-value (b-vals), and multiple b-values (multi-shell). However, acquisition time is limited in a clinical setting and commercial scanners may not provide such dMRI sequences. Therefore, dMRI is often acquired as single-shell (single b-value). In this work, we learn improved FODs for commercially acquired MRI. We evaluate patch-based 3D convolutional neural networks (CNNs) on their ability to regress multi-shell FOD representations from single-shell representations, where the representation is the set of spherical harmonics obtained from constrained spherical deconvolution (CSD) to model FODs. We evaluate U-Net and HighResNet 3D CNN architectures on data from the Human Connectome Project and an in-house dataset. We evaluate how well each CNN model can resolve local fiber orientation 1) when training and testing on datasets with the same dMRI acquisition protocol; 2) when testing on a dataset with a different dMRI acquisition protocol than used to train the CNN models; and 3) when testing on a dataset with fewer gradient directions than used to train the CNN models. Our approach may enable robust CSD model estimation on single-shell dMRI acquisition protocols with few gradient directions, reducing acquisition times and facilitating translation of improved FOD estimation to time-limited clinical environments.

Translated: 2022-10-31 06:07:23 Published: 2020-12-17

# Automating the assessment of biofouling in images using expert agreement as a gold standard

http://arxiv.org/abs/2008.09289v2

License: see the linked page

Nathaniel J. Bloomfield and Susan Wei and Bartholomew Woodham and Peter Wilkinson and Andrew Robinson

Biofouling is the accumulation of organisms on surfaces immersed in water. It is of particular concern to the international shipping industry because it increases fuel costs and presents a biosecurity risk by providing a pathway for non-indigenous marine species to establish in new areas. There is growing interest within jurisdictions to strengthen biofouling risk-management regulations, but it is expensive to conduct in-water inspections and assess the collected data to determine the biofouling state of vessel hulls. Machine learning is well suited to tackle the latter challenge, and here we apply deep learning to automate the classification of images from in-water inspections to identify the presence and severity of fouling. We combined several datasets to obtain over 10,000 images collected from in-water surveys, which were annotated by a group of biofouling experts. We compared the annotations from three experts on a 120-sample subset of these images, and found that they showed 89% agreement (95% CI: 87-92%). Subsequent labelling of the whole dataset by one of these experts achieved similar levels of agreement with this group of experts, which we defined as performing at most 5% worse (p=0.009-0.054). Using these expert labels, we were able to train a deep learning model that also agreed similarly with the group of experts (p=0.001-0.014), demonstrating that automated analysis of biofouling in images is feasible and effective using this method.

Translated: 2022-10-26 20:42:38 Published: 2020-12-17

# Probabilistic Deep Learning for Instance Segmentation

http://arxiv.org/abs/2008.10678v2

License: see the linked page

Josef Lorenz Rumberger, Lisa Mais, Dagmar Kainmueller

Probabilistic convolutional neural networks, which predict distributions of predictions instead of point estimates, have led to recent advances in many areas of computer vision, from image reconstruction to semantic segmentation. Besides state-of-the-art benchmark results, these networks made it possible to quantify local uncertainties in the predictions. These were used in active learning frameworks to target the labeling efforts of specialist annotators or to assess the quality of a prediction in a safety-critical environment. However, for instance segmentation problems these methods have not been used frequently so far. We seek to close this gap by proposing a generic method to obtain model-inherent uncertainty estimates within proposal-free instance segmentation models. Furthermore, we analyze the quality of the uncertainty estimates with a metric adapted from semantic segmentation. We evaluate our method on the BBBC010 C. elegans dataset, where it yields competitive performance while also predicting uncertainty estimates that carry information about object-level inaccuracies like false splits and false merges. We perform a simulation to show the potential use of such uncertainty estimates in guided proofreading.

Translated: 2022-10-25 09:15:06 Published: 2020-12-17

# Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices

http://arxiv.org/abs/2009.02553v2

License: see the linked page

Luo Luo, Cheng Chen, Guangzeng Xie, Haishan Ye

We study the streaming model for approximate matrix multiplication (AMM). We are interested in the scenario in which the algorithm can take only one pass over the data with limited memory. The state-of-the-art deterministic sketching algorithm for streaming AMM is co-occurring directions (COD), which has much smaller approximation errors than randomized algorithms and outperforms other deterministic sketching methods empirically. In this paper, we provide a tighter error bound for COD whose leading term considers the potential approximate low-rank structure and the correlation of input matrices. We prove COD is space optimal with respect to our improved error bound. We also propose a variant of COD for sparse matrices with theoretical guarantees. The experiments on real-world sparse datasets show that the proposed algorithm is more efficient than baseline methods.

Translated: 2022-10-21 20:53:41 Published: 2020-12-17

# Critical Thinking for Language Models

http://arxiv.org/abs/2009.07185v2

License: see the linked page

Gregor Betz and Christian Voigt and Kyle Richardson

(参考訳) 本稿では,ニューラル自動回帰言語モデルの批判的思考カリキュラムに向けて第一歩を踏み出す。本稿では,帰納的有効引数の合成コーパスを導入し,gpt-2の学習と評価のための人工的議論テキストを生成する。 3つの単純なコアスキームでモデルをトレーニングすることで、異なる、より複雑なタイプの引数の結論を正確に達成することができます。言語モデルは、コア引数スキームを正しい方法で一般化する。さらに,NLUベンチマークに対して一貫した有望な結果が得られる。特に、議論スキームの事前訓練は、GLUE診断のゼロショット精度を最大15ポイント向上させる。この結果は、基本的な推論能力(批判的思考教科書など)を実証するテキストの中間的事前学習が、言語モデルが幅広い推論スキルを獲得するのに役立つことを示唆している。本稿では,このような「言語モデルのための批判的思考カリキュラム」を構築する上で有望な出発点である。

This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models. We introduce a synthetic corpus of deductively valid arguments, and generate artificial argumentative texts to train and evaluate GPT-2. Significant transfer learning effects can be observed: Training a model on three simple core schemes allows it to accurately complete conclusions of different, and more complex types of arguments, too. The language models generalize the core argument schemes in a correct way. Moreover, we obtain consistent and promising results for NLU benchmarks. In particular, pre-training on the argument schemes raises zero-shot accuracy on the GLUE diagnostics by up to 15 percentage points. The findings suggest that intermediary pre-training on texts that exemplify basic reasoning abilities (such as typically covered in critical thinking textbooks) might help language models to acquire a broad range of reasoning skills. The synthetic argumentative texts presented in this paper are a promising starting point for building such a "critical thinking curriculum for language models."

Translated: 2022-10-18 05:20:45 Published: 2020-12-17

# 潜在的公平決定を伴う確率的モデリングによる集団公平性

Group Fairness by Probabilistic Modeling with Latent Fair Decisions ( http://arxiv.org/abs/2009.09031v2 )

License: see the linked page

YooJung Choi, Meihua Dang, Guy Van den Broeck

(参考訳) 機械学習システムは、ローン申請や刑事司法リスクアセスメントなどの影響のある決定にますます使われており、これらのシステムの公正性を保証することが重要である。データ内のラベルが偏っているため、これはしばしば困難である。本稿では,隠蔽ラベルを表す潜在変数を明示的にモデル化し,バイアスデータから確率分布を学習する。特に,学習モデルに一定の不依存性を課すことにより,人口統計学の同等性の実現を目指す。また、これらの保証を提供するために使用される分布が実際に実世界のデータをキャプチャしている場合にのみ、グループフェアネス保証が有意義であることを示す。データ分布を密にモデル化するために,表現的かつトラクタブルな確率モデルである確率回路を用い,不完全データから学習するアルゴリズムを提案する。観測されたラベルが公正なラベルに由来するが、バイアスが増す合成データセットに対するアプローチを評価し、公正なラベルが正常に検索されることを示す。さらに,実世界のデータセットでは,既存のデータ生成方法よりも優れたモデルであるだけでなく,競合精度も達成できることを示す。

Machine learning systems are increasingly being used to make impactful decisions such as loan applications and criminal justice risk assessments, and as such, ensuring fairness of these systems is critical. This is often challenging as the labels in the data are biased. This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label. In particular, we aim to achieve demographic parity by enforcing certain independencies in the learned model. We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data. In order to closely model the data distribution, we employ probabilistic circuits, an expressive and tractable probabilistic model, and propose an algorithm to learn them from incomplete data. We evaluate our approach on a synthetic dataset in which observed labels indeed come from fair labels but with added bias, and demonstrate that the fair labels are successfully retrieved. Moreover, we show on real-world datasets that our approach not only is a better model than existing methods of how the data was generated but also achieves competitive accuracy.
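As a rough illustration of the demographic parity criterion mentioned in the abstract, the sketch below measures the gap in positive-decision rates between groups. It is not the paper's probabilistic-circuit method; the function name `demographic_parity_gap` and the toy data are assumptions made for this example only.

```python
def demographic_parity_gap(decisions, groups):
    """Return (gap, rates): the spread in positive-decision rate across groups."""
    totals, positives = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if decision else 0)
    # Positive-decision rate per group.
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Toy data (hypothetical): group "a" is approved 3 times out of 4, group "b" once out of 4.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_gap(decisions, groups)
print(gap, rates)
```

A gap of zero would mean that the decision is statistically independent of group membership on this sample, which is the kind of independence the abstract says the learned model is made to satisfy.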

Translated: 2022-10-17 02:05:24 Published: 2020-12-17

# 事前コミットメントによるペアワイズおよびマルチプレイヤーインタラクションにおけるコーディネーションの進化

Evolution of Coordination in Pairwise and Multi-player Interactions via Prior Commitments ( http://arxiv.org/abs/2009.11727v2 )

License: see the linked page

Ogbo Ndidi Bianca, Aiman Elgarig, The Anh Han

(参考訳) 集団的な努力を始めるとき、パートナーの好みと彼らが共通の目標にどれだけ強くコミットするかを理解することが重要です。後方利益の観点で事前の約束や合意を確立することは、協力を確保するための重要なメカニズムを提供する。本稿では,進化ゲーム理論(egt)の手法を参考にし,その成果が対数と多元的相互作用の両方において非対称な報酬構造を示す場合の協調性を高めるツールとして,事前コミットメントをどのように採用するかを分析する。協調問題にはいくつかの望ましい集団的成果があるかもしれない(協調的ジレンマにおいて唯一望ましい集団的成果である相互協力と比べれば)。分析および数値シミュレーションにより, 先行コミットメントが協調の強化に有効な進化のメカニズムであるか否か, 社会福祉全体は競争の集団的利益と重大さに強く依存し, さらには非対称的利益がコミットメント契約でどのように解決されるかが示唆された。さらに、マルチパーティインタラクションでは、最適な調整のために高いレベルのグループ多様性が必要な場合、事前のコミットメントが不可欠であることが証明される。結果は異なる選択強度に対して堅牢である。全体として,自律エージェント間の協調性を確保するための自己組織化・分散マルチエージェントシステムの設計だけでなく,人間のコミットメント能力による行動進化の複雑さと美しさに関する新たな知見を提供する。

Upon starting a collective endeavour, it is important to understand your partners' preferences and how strongly they commit to a common goal. Establishing a prior commitment or agreement in terms of posterior benefits and consequences from those engaging in it provides an important mechanism for securing cooperation. Resorting to methods from Evolutionary Game Theory (EGT), here we analyse how prior commitments can also be adopted as a tool for enhancing coordination when its outcomes exhibit an asymmetric payoff structure, in both pairwise and multiparty interactions. Arguably, coordination is more complex to achieve than cooperation since there might be several desirable collective outcomes in a coordination problem (compared to mutual cooperation, the only desirable collective outcome in cooperation dilemmas). Our analysis, both analytically and via numerical simulations, shows that whether prior commitment would be a viable evolutionary mechanism for enhancing coordination and the overall population social welfare strongly depends on the collective benefit and severity of competition, and more importantly, how asymmetric benefits are resolved in a commitment deal. Moreover, in multiparty interactions, prior commitments prove to be crucial when a high level of group diversity is required for optimal coordination. The results are robust for different selection intensities. Overall, our analysis provides new insights into the complexity and beauty of behavioral evolution driven by humans' capacity for commitment, as well as for the design of self-organised and distributed multi-agent systems for ensuring coordination among autonomous agents.

Translated: 2022-10-15 05:25:25 Published: 2020-12-17

# ディープラーニングによるcloud cover nowcasting

Cloud Cover Nowcasting with Deep Learning ( http://arxiv.org/abs/2009.11577v3 )

License: see the linked page

Léa Berthomier, Bruno Pradel and Lior Perez

(参考訳) Nowcastingは気象学の分野であり、気象予報を数時間の短期間で行うことを目的としている。気象学の世界では、この分野はデータ外挿のような特定の技術を必要とするため、通常の気象学は一般に物理モデリングに基づいているため、かなり特異である。本稿では,衛星撮影の最適化や太陽光発電の発電予測など,応用分野が多様であるクラウドカバーの nowcasting に着目した。近年,複数の画像タスクにおけるディープラーニングの成功に続いて,衛星画像に深部畳み込みニューラルネットワークを適用した。画像セグメンテーションと時系列予測に特化しているいくつかのアーキテクチャの結果を示す。機械学習の指標と気象の指標に基づいて最適なモデルを選択した。選択されたアーキテクチャはすべて、永続性よりも大幅に改善され、よく知られたU-NetはAROME物理モデルを上回った。

Nowcasting is a field of meteorology that aims to forecast weather over short horizons of up to a few hours. In the meteorology landscape, this field is rather specific as it requires particular techniques, such as data extrapolation, whereas conventional meteorology is generally based on physical modeling. In this paper, we focus on cloud cover nowcasting, which has various application areas such as satellite shot optimisation and photovoltaic energy production forecasting. Following recent deep learning successes on multiple imagery tasks, we applied deep convolutional neural networks to Meteosat satellite images for cloud cover nowcasting. We present the results of several architectures specialized in image segmentation and time series prediction. We selected the best models according to machine learning metrics as well as meteorological metrics. All selected architectures showed significant improvements over persistence, and the well-known U-Net surpasses the AROME physical model.
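The persistence baseline named in the abstract simply assumes that the most recent observation stays unchanged. The sketch below shows that baseline on tiny binary cloud masks; the helper names and the toy frames are assumptions for illustration and are not taken from the paper.

```python
def persistence_forecast(frames, horizon):
    """Repeat the last observed frame for each of the next `horizon` steps."""
    last = frames[-1]
    return [[row[:] for row in last] for _ in range(horizon)]


def mean_squared_error(pred, truth):
    """Pixel-wise mean squared error between two equally sized frames."""
    total = 0.0
    count = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Two observed 2x3 cloud masks (1 = cloudy) and the frame that actually followed.
observed = [
    [[0, 0, 1], [0, 1, 1]],
    [[0, 1, 1], [1, 1, 1]],
]
future = [[1, 1, 1], [1, 1, 1]]

forecast = persistence_forecast(observed, horizon=1)
baseline_mse = mean_squared_error(forecast[0], future)
print(baseline_mse)
```

Any learned model would be scored with the same metric on the same frames, so that an improvement over persistence can be read off directly.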

Translated: 2022-10-15 04:06:02 Published: 2020-12-17

# トピック対応マルチターン対話モデリング

Topic-Aware Multi-turn Dialogue Modeling ( http://arxiv.org/abs/2009.12539v2 )

License: see the linked page

Yi Xu, Hai Zhao, Zhuosheng Zhang

(参考訳) 検索に基づくマルチターン対話モデルでは,コンテキスト発話中の有意な特徴を抽出することで,最も適切な応答を選択することが課題となっている。会話が進むにつれて、談話レベルのトピックシフトは、連続したマルチターン対話コンテキストを通じて自然に起こる。しかし,すべての検索ベースシステムは,文脈発話表現のための局所的な話題語の利用に満足しているが,会話レベルでのこのような重要なグローバルな話題認識の手がかりを捉えられなかった。本稿では,既存のシステムにおいて,トピックに依存しないn-gram発話を処理単位として扱う代わりに,トピック認識発話を教師なしの方法でセグメント抽出し,対話レベルでの健全なトピックシフトを把握し,マルチターン対話中のトピックフローを効果的に追跡する,マルチターン対話モデリングのための新しいトピック認識ソリューションを提案する。トピック認識モデリングは,新たに提案したトピック認識セグメンテーションアルゴリズムとトピック認識デュアルアテンションマッチング(TADAM)ネットワークによって実現されている。 3つの公開データセットの実験結果は、TADAMが最先端の手法、特に明らかなトピックシフトを持つEコマースデータセットの3.3%を上回っていることを示している。

In retrieval-based multi-turn dialogue modeling, it remains a challenge to select the most appropriate response by extracting salient features from context utterances. As a conversation goes on, topic shifts at the discourse level naturally occur throughout the continuous multi-turn dialogue context. However, all known retrieval-based systems are satisfied with exploiting local topic words for context utterance representation but fail to capture such essential global topic-aware clues at the discourse level. Instead of taking topic-agnostic n-gram utterances as the processing unit for matching, as existing systems do, this paper presents a novel topic-aware solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way, so that the resulting model is capable of capturing salient topic shifts at the discourse level when needed and thus effectively tracks topic flow during multi-turn conversation. Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and a Topic-Aware Dual-attention Matching (TADAM) Network, which matches each topic segment with the response in a dual cross-attention way. Experimental results on three public datasets show that TADAM can outperform the state-of-the-art method, especially by 3.3% on the E-commerce dataset, which has obvious topic shifts.

Translated: 2022-10-14 08:43:58 Published: 2020-12-17

# (教師なし)エンティティアライメントのためのビジュアルPivoting

Visual Pivoting for (Unsupervised) Entity Alignment ( http://arxiv.org/abs/2009.13603v2 )

License: see the linked page

Fangyu Liu, Muhao Chen, Dan Roth, Nigel Collier

(参考訳) この研究は、異種知識グラフ(KG)の実体を整列させる視覚的意味表現の使用を研究する。画像は多くの既存のkgの自然な構成要素です。視覚知識を他の補助情報と組み合わせることで,提案する新しいアプローチであるevaが,クロスグラフエンティティアライメントに強いシグナルを与える包括的エンティティ表現を生成することを示す。さらに、以前のエンティティアライメント手法では、可用性を制限するために、人間のラベル付きシードアライメントが必要となる。 EVAは、エンティティの視覚的類似性を活用して、初期シード辞書(視覚的なピボット)を作成する、完全に教師なしのソリューションを提供する。ベンチマークデータセットDBP15kとDWY15kの実験は、EVAがモノリンガルとクロスリンガルの両方のエンティティアライメントタスクに対して最先端のパフォーマンスを提供することを示している。さらに、画像は特に長い尾のKGエンティティの整列に有用であり、通信を捉えるのに必要な構造的コンテキストが本質的に欠如していることが判明した。

This work studies the use of visual semantic representations to align entities in heterogeneous knowledge graphs (KGs). Images are natural components of many existing KGs. By combining visual knowledge with other auxiliary information, we show that the proposed new approach, EVA, creates a holistic entity representation that provides strong signals for cross-graph entity alignment. Besides, previous entity alignment methods require human labelled seed alignment, restricting availability. EVA provides a completely unsupervised solution by leveraging the visual similarity of entities to create an initial seed dictionary (visual pivots). Experiments on benchmark data sets DBP15k and DWY15k show that EVA offers state-of-the-art performance on both monolingual and cross-lingual entity alignment tasks. Furthermore, we discover that images are particularly useful to align long-tail KG entities, which inherently lack the structural contexts necessary for capturing the correspondences.

Translated: 2022-10-13 20:46:56 Published: 2020-12-17

# モデル共有ゲーム:自発参加下での連合学習の分析

Model-sharing Games: Analyzing Federated Learning Under Voluntary Participation ( http://arxiv.org/abs/2010.00753v3 )

License: see the linked page

Kate Donahue and Jon Kleinberg

(参考訳) フェデレーション学習(federated learning)は、エージェントがそれぞれのデータソースにアクセスし、ローカルデータからモデルを組み合わせてグローバルモデルを作成するための設定である。しかし、エージェントが異なる分布からデータを引き出している場合、連合学習はそれぞれのエージェントに最適ではない偏りのあるグローバルモデルを生成するかもしれない。つまりエージェントは,グローバルモデルやローカルモデルを選択するべきか,という根本的な問題に直面しているのです。この状況は連立ゲーム理論の枠組みによって自然に分析できることを示す。異なるモデルパラメータを持つ不均質なプレイヤーが、彼らのデータ分布と、彼らが自分たちの分散から異常に引き出した異なる量のデータを支配している。各プレイヤーのゴールは、最小限の期待平均二乗誤差(MSE)を持つモデルを得ることである。彼らは自身のデータのみに基づいたモデルに適合するか、学習したパラメータと他のプレイヤーのサブセットのパラメータを組み合わせるかを選択できる。モデルを組み合わせることで、より多くのデータにアクセスすることでエラーの分散成分が減少するが、分布の不均一性のためにバイアスが増加する。ここでは線形回帰と平均推定における問題に対する正確なMSE値を導出する。次に, ヘドニックゲーム理論(hedonic game theory)の枠組みを用いて, 結果ゲームの解析を行い, プレイヤーが連立モデル(s)を構成する各プレイヤー群にどのように分割するかを検討した。異なるカスタマイズ度をモデル化する3つのフェデレーションの手法を分析した。統一連合では、エージェントは集合的に単一のモデルを生成する。粒度の粗いフェデレーションでは、各エージェントはローカルモデルとともにグローバルモデルを重み付けすることができる。微細なフェデレーションでは、各エージェントは、フェデレーション内の他のすべてのエージェントのモデルを柔軟に組み合わせることができる。各方法について,プレイヤーの安定な分割を連立に分析する。

Federated learning is a setting where agents, each with access to their own data source, combine models from local data to create a global model. If agents are drawing their data from different distributions, though, federated learning might produce a biased global model that is not optimal for each agent. This means that agents face a fundamental question: should they choose the global model or their local model? We show how this situation can be naturally analyzed through the framework of coalitional game theory. We propose the following game: there are heterogeneous players with different model parameters governing their data distribution and different amounts of data they have noisily drawn from their own distribution. Each player's goal is to obtain a model with minimal expected mean squared error (MSE) on their own distribution. They have a choice of fitting a model based solely on their own data, or combining their learned parameters with those of some subset of the other players. Combining models reduces the variance component of their error through access to more data, but increases the bias because of the heterogeneity of distributions. Here, we derive exact expected MSE values for problems in linear regression and mean estimation. We then analyze the resulting game in the framework of hedonic game theory; we study how players might divide into coalitions, where each set of players within a coalition jointly construct model(s). We analyze three methods of federation, modeling differing degrees of customization. In uniform federation, the agents collectively produce a single model. In coarse-grained federation, each agent can weight the global model together with their local model. In fine-grained federation, each agent can flexibly combine models from all other agents in the federation. For each method, we analyze the stable partitions of players into coalitions.
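The bias-variance trade-off described in the abstract can be made concrete for mean estimation. The sketch below is a simplified setting chosen for this example, not the paper's exact expressions: each player j has a fixed true mean `means[j]`, holds `counts[j]` samples with noise variance `sigma2`, and the federation pools all samples into one weighted average. Under those assumptions the pooled estimate has variance `sigma2 / N` and bias `sum_j n_j * (mean_j - mean_i) / N` for player i, where N is the total sample count.

```python
def local_mse(sigma2, n_i):
    """Expected squared error of a player's own sample mean."""
    return sigma2 / n_i


def uniform_federation_mse(sigma2, means, counts, i):
    """Expected squared error for player i when all samples are pooled."""
    total = sum(counts)
    # Pooling averages the noise over every sample in the federation ...
    variance = sigma2 / total
    # ... but pulls the estimate toward the other players' true means.
    bias = sum(n * (m - means[i]) for m, n in zip(means, counts)) / total
    return variance + bias ** 2


means = [0.0, 1.0]   # hypothetical true means of two players
counts = [10, 10]    # samples held by each player

low_noise_local = local_mse(4.0, counts[0])
low_noise_fed = uniform_federation_mse(4.0, means, counts, 0)
high_noise_local = local_mse(10.0, counts[0])
high_noise_fed = uniform_federation_mse(10.0, means, counts, 0)
print(low_noise_local, low_noise_fed, high_noise_local, high_noise_fed)
```

With low noise the player does better alone, since the bias dominates. With high noise the pooled model wins, since the variance dominates. This is the tension that decides whether a coalition is stable.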

Translated: 2022-10-12 02:26:17 Published: 2020-12-17

# スパイクニューラルネットワークのニューロモルフィックハードウェアへのサーマルアウェアコンパイル

Thermal-Aware Compilation of Spiking Neural Networks to Neuromorphic Hardware ( http://arxiv.org/abs/2010.04773v2 )

License: see the linked page

Twisha Titirsha and Anup Das

(参考訳) ニューロモルフィックコンピューティングのハードウェア実装は、スパイクニューラルネットワーク(SNN)で実装された機械学習タスクのパフォーマンスとエネルギー効率を大幅に向上させ、これらのハードウェアプラットフォームは組み込みシステムや他のエネルギー制約のある環境に特に適している。ハードウェアのクロスバーの長いビット線とワード線は、通常非揮発性メモリ(NVM)で設計されるシナプス要素を介してスパイクを伝播する際に、大きな電流変化を生じさせる。このような変化は、ハードウェアの各クロスバー内で、機械学習のワークロードと、これらのクロスバーへの負荷のニューロンとシナプスのマッピングに依存する熱勾配を生成する。この温度勾配は、スケールされた技術ノードにおいて重要となり、ハードウェアのリーク電力を増加させ、エネルギー消費を増加させる。ニューロモルフィックハードウェアにSNNベースの機械学習ワークロードのニューロンとシナプスをマッピングする新しい手法を提案する。我々は2つの新しい貢献をした。まず, 各NVM系シナプス細胞の温度を計算し, 隣接するセルの熱的寄与を考慮し, 負荷依存性を取り入れたニューロモルフィックハードウェアにおけるクロスバーの詳細な熱モデルを構築した。第2に、この熱モデルを、丘登りヒューリスティックを用いてSNNベースのワークロードのニューロンとシナプスのマッピングに組み込む。クロスバーの熱勾配を減少させることが目的である。我々は、最先端のニューロモルフィックハードウェアのための10の機械学習ワークロードを用いて、ニューロンとシナプスマッピング手法を評価する。ハードウェアの各クロスバーの平均温度を平均11.4K削減し,性能指向SNNマッピング技術と比較して,リーク電力消費量(総エネルギー消費率11%)を52%削減した。

Hardware implementation of neuromorphic computing can significantly improve performance and energy efficiency of machine learning tasks implemented with spiking neural networks (SNNs), making these hardware platforms particularly suitable for embedded systems and other energy-constrained environments. We observe that the long bitlines and wordlines in a crossbar of the hardware create significant current variations when propagating spikes through its synaptic elements, which are typically designed with non-volatile memory (NVM). Such current variations create a thermal gradient within each crossbar of the hardware, depending on the machine learning workload and the mapping of neurons and synapses of the workload to these crossbars. This thermal gradient becomes significant at scaled technology nodes, and it increases the leakage power in the hardware, leading to an increase in energy consumption. We propose a novel technique to map neurons and synapses of SNN-based machine learning workloads to neuromorphic hardware. We make two novel contributions. First, we formulate a detailed thermal model for a crossbar in neuromorphic hardware incorporating workload dependency, where the temperature of each NVM-based synaptic cell is computed considering the thermal contributions from its neighboring cells. Second, we incorporate this thermal model in the mapping of neurons and synapses of SNN-based workloads using a hill-climbing heuristic. The objective is to reduce the thermal gradient in crossbars. We evaluate our neuron and synapse mapping technique using 10 machine learning workloads for a state-of-the-art neuromorphic hardware. We demonstrate an average reduction of 11.4 K in the average temperature of each crossbar in the hardware, leading to a 52% reduction in the leakage power consumption (11% lower total energy consumption) compared to a performance-oriented SNN mapping technique.

Translated: 2022-10-09 05:40:33 Published: 2020-12-17

# セルオートマタの挙動類似性の測定

Measuring Behavioural Similarity of Cellular Automata ( http://arxiv.org/abs/2010.08431v2 )

License: see the linked page

Peter D. Turney

(参考訳) コンウェイのゲーム・オブ・ライフは最も有名なセル・オートマトンである。出現と自己組織化の古典的なモデルであり、チューリング完全であり、普遍的なコンストラクタをシミュレートすることができる。ゲーム・オブ・ライフ(game of life)は262,144人のメンバーを持つ半トータル的なセル・オートマトンに属する。これらのオートマトンの多くは、ゲーム・オブ・ライフほど注目に値するかもしれない。ここでの課題は、この大きな家族を組織化し、興味深いオートマトンを見つけやすくし、オートマトン間の関係を理解するための構造を提供することです。 Packard and Wolfram (1985) は、規則の観察された振る舞いに基づいて、家族を4つのクラスに分けた。 eppstein (2010) は規則の形式に基づいた代替の4クラスシステムを提案した。クラスベースの組織の代わりに、各オートマトンが空間内の点によって表現される連続的な高次元ベクトル空間を提案する。この空間における2つのオートマトン間の距離は、その行動特性の差に対応する。この空間に最も近い近隣の地域も同様の行動をとる。この空間は、研究者が半トータル主義的な規則の家族の構造を観察し、家族の中に隠れた宝石を見つけるのが容易になる。

Conway's Game of Life is the best-known cellular automaton. It is a classic model of emergence and self-organization, it is Turing-complete, and it can simulate a universal constructor. The Game of Life belongs to the set of semi-totalistic cellular automata, a family with 262,144 members. Many of these automata may deserve as much attention as the Game of Life, if not more. The challenge we address here is to provide a structure for organizing this large family, to make it easier to find interesting automata, and to understand the relations between automata. Packard and Wolfram (1985) divided the family into four classes, based on the observed behaviours of the rules. Eppstein (2010) proposed an alternative four-class system, based on the forms of the rules. Instead of a class-based organization, we propose a continuous high-dimensional vector space, where each automaton is represented by a point in the space. The distance between two automata in this space corresponds to the differences in their behavioural characteristics. Nearest neighbours in the space have similar behaviours. This space should make it easier for researchers to see the structure of the family of semi-totalistic rules and to find the hidden gems in the family.
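The family size quoted in the abstract follows from counting rules: a semi-totalistic rule chooses, for each of the nine possible live-neighbour counts (0 to 8), whether a dead cell is born and whether a live cell survives, giving 2^9 × 2^9 = 2^18 = 262,144 rules. The sketch below steps any such rule on an unbounded grid stored as a set of live cells, with the Game of Life (birth on 3, survival on 2 or 3) as the default. The function name `step` and the set-based representation are choices made for this example, not the paper's implementation.

```python
from itertools import product


def step(live, birth=frozenset({3}), survive=frozenset({2, 3})):
    """Advance a set of live (x, y) cells by one generation.

    Rules with birth on zero neighbours are not supported here, because
    they would turn on infinitely many cells of the unbounded grid.
    """
    counts = {}
    for x, y in live:
        # Make sure isolated live cells are still considered for survival.
        counts.setdefault((x, y), 0)
        for dx, dy in product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                key = (x + dx, y + dy)
                counts[key] = counts.get(key, 0) + 1
    return {
        cell
        for cell, n in counts.items()
        if (cell in live and n in survive) or (cell not in live and n in birth)
    }


blinker = {(0, 1), (1, 1), (2, 1)}   # horizontal bar of three cells
after_one = step(blinker)            # becomes a vertical bar
after_two = step(after_one)          # returns to the horizontal bar
rule_count = 2 ** 18                 # size of the semi-totalistic family
print(sorted(after_one), after_two == blinker, rule_count)
```

Passing different `birth` and `survive` sets selects other members of the family, which is the kind of input a behavioural comparison across rules would need.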

Translated: 2022-10-06 21:15:25 Published: 2020-12-17

# 非定常MDPの安全政策改善に向けて

Towards Safe Policy Improvement for Non-Stationary MDPs ( http://arxiv.org/abs/2010.12645v2 )

License: see the linked page

Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas

(参考訳) 現実世界のシーケンシャルな意思決定には、金融リスクと人命リスクを伴う重要なシステムが含まれる。過去にいくつかの研究がデプロイに安全な方法を提案しているが、根底にある問題は静止していると仮定している。しかし、多くの実世界の利害問題は非定常性を示し、利害関係が高ければ、偽の定常性仮定に関連するコストは受け入れがたい。我々は、スムーズに変化する非定常的な意思決定問題に対して、安全を確実にする第一歩を踏み出します。提案手法は,時系列解析を用いたモデルフリー強化学習の合成により,セルドンアルゴリズムと呼ばれる安全なアルゴリズムを拡張した。ポリシーの予測性能の逐次仮説テストを用いて安全性を保証し、ワイルドブートストラップを用いて信頼区間を求める。

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy's forecasted performance, and confidence intervals are obtained using wild bootstrap.

Translated: 2022-10-03 21:39:59 Published: 2020-12-17

# 離散化ランジュバンMCMCのRényi divergence解析による高速微分プライベートサンプラー

Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC ( http://arxiv.org/abs/2010.14658v2 )

License: see the linked page

Arun Ganesh, Kunal Talwar

(参考訳) 様々な微分プライベートアルゴリズムは指数関数機構をインスタンス化し、適切な関数$f$に対して$\exp(-f)$からサンプリングする必要がある。分布領域が高次元である場合、このサンプリングは計算的に困難である。ギブスサンプリングのようなヒューリスティックサンプリングスキームを使用すると、必ずしも証明可能なプライバシーにつながるとは限らない。$f$が凸であるとき、対数凹サンプリングの技術は多項式時間アルゴリズムに導かれる。ランゲヴィン力学に基づくアルゴリズムは、統計距離などの距離測度の下ではるかに高速な代替手段を提供する。本研究では,差分プライバシーに適合する距離尺度を用いて,これらのアルゴリズムの高速収束を実現する。滑らかで強凸な$f$に対して、Rényi divergenceの収束を証明する最初の結果を与える。これにより、そのような$f$の高速な微分プライベートアルゴリズムが得られます。我々の技術は単純で汎用的で、アンダーダムドランゲヴィン力学にも応用できる。

Various differentially private algorithms instantiate the exponential mechanism, and require sampling from the distribution $\exp(-f)$ for a suitable function $f$. When the domain of the distribution is high-dimensional, this sampling can be computationally challenging. Using heuristic sampling schemes such as Gibbs sampling does not necessarily lead to provable privacy. When $f$ is convex, techniques from log-concave sampling lead to polynomial-time algorithms, albeit with large polynomials. Langevin dynamics-based algorithms offer much faster alternatives under some distance measures such as statistical distance. In this work, we establish rapid convergence for these algorithms under distance measures more suitable for differential privacy. For smooth, strongly-convex $f$, we give the first results proving convergence in Rényi divergence. This gives us fast differentially private algorithms for such $f$. Our techniques are simple and generic and also apply to underdamped Langevin dynamics.
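For readers unfamiliar with discretized Langevin MCMC, the sketch below shows the basic (overdamped, unadjusted) update x ← x − η∇f(x) + √(2η)·ξ with ξ a standard normal draw. It is only an illustration of the sampler family the abstract refers to: the choice f(x) = x²/2, the step size, the burn-in length, and all names are assumptions for this example. The code carries no privacy guarantee by itself and does not reproduce the paper's analysis.

```python
import math
import random


def grad_f(x):
    """Gradient of f(x) = x**2 / 2, so exp(-f) is proportional to a standard normal."""
    return x


def langevin_samples(n_steps, step_size, seed=0, burn_in=1000):
    """Run the unadjusted Langevin iteration and return the post-burn-in iterates."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for k in range(n_steps + burn_in):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on f plus Gaussian noise scaled by sqrt(2 * step size).
        x = x - step_size * grad_f(x) + math.sqrt(2.0 * step_size) * noise
        if k >= burn_in:
            samples.append(x)
    return samples


samples = langevin_samples(n_steps=20000, step_size=0.1)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)
```

For this quadratic f, the discretized chain's stationary variance is 1 / (1 − η/2), about 1.05 at η = 0.1, rather than exactly 1. That small gap is the discretization error that convergence bounds for such samplers have to control.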

Translated: 2022-10-02 13:25:45 Published: 2020-12-17

# SATベースのAI計画の形式的検証

Formally Verified SAT-Based AI Planning ( http://arxiv.org/abs/2010.14648v4 )

License: see the linked page

Mohammad Abdulaziz and Friedrich Kurz

(参考訳) 本稿では,従来のAI計画のSAT符号化について述べる。定理証明器 Isabelle/HOL を用いて検証を行う。検証された符号化を実験的に検証し、合理的な大きさの標準計画ベンチマークに使用できることを示す。我々はまた、最先端のSATベースのプランナーをテストするための参照として使用し、時には問題に一定の長さの解がないと主張する。

We present an executable formally verified SAT encoding of classical AI planning. We use the theorem prover Isabelle/HOL to perform the verification. We experimentally test the verified encoding and show that it can be used for reasonably sized standard planning benchmarks. We also use it as a reference to test a state-of-the-art SAT-based planner, showing that it sometimes falsely claims that problems have no solutions of certain lengths.

Translated: 2022-10-02 12:41:53 Published: 2020-12-17

# Deep Probabilistic Imaging:Computational Imagingのための不確かさの定量化とマルチモーダルソリューション評価

Deep Probabilistic Imaging: Uncertainty Quantification and Multi-modal Solution Characterization for Computational Imaging ( http://arxiv.org/abs/2010.14462v2 )

License: see the linked page

He Sun, Katherine L. Bouman

(参考訳) 計算画像再構成アルゴリズムは一般に、不確実性や信頼性の尺度なしに単一の画像を生成する。 RML(Regularized Maximum Likelihood)と逆問題に対するフィードフォワード深層学習(Feed-forward Deep Learning)アプローチは通常、点推定の回復に重点を置いている。これは、未決定の撮像システムで作業する場合に深刻な制限であり、複数の画像モードが測定されたデータと一致することが考えられる。したがって、観測データを説明する確率的な画像の空間を特徴付けることが重要である。本稿では,再構成の不確かさを定量化するために,変分深い確率的イメージング手法を提案する。深部確率イメージング(Deep Probabilistic Imaging, DPI)は、未観測画像の後部分布を推定するために、訓練されていない深部生成モデルを用いる。このアプローチではトレーニングデータを必要としない。代わりに、ニューラルネットワークの重みを最適化して、特定の測定データセットに適合するイメージサンプルを生成する。ネットワークウェイトが学習されると、後方分布を効率的にサンプリングすることができる。このアプローチは、イベントホライズン望遠鏡によるブラックホールイメージングや、mri(compressed sensing magnetic resonance imaging)で用いられるインターフェロメトリ・ラジオイメージング(interferometric radio imaging)という文脈で実証されている。

Computational image reconstruction algorithms generally produce a single image without any measure of uncertainty or confidence. Regularized Maximum Likelihood (RML) and feed-forward deep learning approaches for inverse problems typically focus on recovering a point estimate. This is a serious limitation when working with underdetermined imaging systems, where it is conceivable that multiple image modes would be consistent with the measured data. Characterizing the space of probable images that explain the observational data is therefore crucial. In this paper, we propose a variational deep probabilistic imaging approach to quantify reconstruction uncertainty. Deep Probabilistic Imaging (DPI) employs an untrained deep generative model to estimate a posterior distribution of an unobserved image. This approach does not require any training data; instead, it optimizes the weights of a neural network to generate image samples that fit a particular measurement dataset. Once the network weights have been learned, the posterior distribution can be efficiently sampled. We demonstrate this approach in the context of interferometric radio imaging, which is used for black hole imaging with the Event Horizon Telescope, and compressed sensing Magnetic Resonance Imaging (MRI).

Translated: 2022-10-02 11:58:56 Published: 2020-12-17

# ページ数は? メタデータからの紙長予測

How Many Pages? Paper Length Prediction from the Metadata ( http://arxiv.org/abs/2010.15924v2 )

License: see the linked page

Erion Çano and Ondřej Bojar

(参考訳) 科学論文の長さを予測することは、多くの状況で役立つかもしれない。本研究は,紙長予測タスクを回帰問題として定義し,一般的な機械学習モデルを用いて実験結果を報告する。また、出版メタデータと各ページの長さの巨大なデータセットを作成します。データセットは無償で提供され、この分野の研究を促進することを意図している。今後の取り組みとして、ニューラルネットワークと大きな事前学習された言語モデルに基づいた、より高度なレグレッシャを探求したいと思います。

Being able to predict the length of a scientific paper may be helpful in numerous situations. This work defines the paper length prediction task as a regression problem and reports several experimental results using popular machine learning models. We also create a huge dataset of publication metadata and the respective lengths in number of pages. The dataset will be freely available and is intended to foster research in this domain. As future work, we would like to explore more advanced regressors based on neural networks and big pretrained language models.

Translated: 2022-10-01 22:26:57 Published: 2020-12-17

# 道路損傷検出のための効率的かつスケーラブルな深層学習手法

An Efficient and Scalable Deep Learning Approach for Road Damage Detection ( http://arxiv.org/abs/2011.09577v3 )

License: see the linked page

Sadra Naddaf-Sh, M-Mahdi Naddaf-Sh, Amir R. Kashani and Hassan Zargarzadeh

(参考訳) 舗装条件の評価は予防的又はリハビリテーション的行動の時間と救難伝播の制御に不可欠である。タイムリーな評価ができないと、インフラの深刻な構造的・財政的損失と完全な再建につながる可能性がある。自動コンピュータ支援測量手法は、道路損傷パターンとその位置のデータベースを提供することができる。このデータベースは、メンテナンスの最小コストとアスファルトの最大耐久性を得るために、タイムリーな道路修理に利用できる。本稿では,画像に基づく難易度データをリアルタイムに解析する深層学習に基づく調査手法を提案する。携帯端末を用いて撮影した縦・横・アリゲータ亀裂などの亀裂の多様な集団からなるデータベースを用いる。次に、舗装き裂検出用に調整された効率的でスケーラブルなモデル群を訓練し、様々な補強策を検討する。提案されたモデルでは、F1スコアは52%から56%まで、平均推測時間は毎秒178-10枚だった。最後に、物体検出器の性能を調べ、様々な画像に対して誤差解析を報告する。ソースコードはhttps://github.com/mahdi65/roaddamagedetection2020で入手できる。

Pavement condition evaluation is essential to time the preventative or rehabilitative actions and control distress propagation. Failing to conduct timely evaluations can lead to severe structural and financial loss of the infrastructure and complete reconstructions. Automated computer-aided surveying measures can provide a database of road damage patterns and their locations. This database can be utilized for timely road repairs to minimize the cost of maintenance and maximize the asphalt's durability. This paper introduces a deep learning-based surveying scheme to analyze the image-based distress data in real time. A database consisting of a diverse population of crack distress types, such as longitudinal, transverse, and alligator cracks, photographed using a mobile device, is used. Then, a family of efficient and scalable models that are tuned for pavement crack detection is trained, and various augmentation policies are explored. The proposed models achieve F1-scores ranging from 52% to 56% and average inference speeds ranging from 178 to 10 images per second. Finally, the performance of the object detectors is examined, and error analysis is reported against various images. The source code is available at https://github.com/mahdi65/roadDamageDetection2020.
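The F1-score reported in the abstract is the harmonic mean of precision and recall over detections. The sketch below computes it from true-positive, false-positive, and false-negative counts. The counts are made-up numbers chosen to land inside the reported 52% to 56% range, not results from the paper, and the matching of detections to ground truth (for example by an overlap threshold) is assumed to have been done beforehand.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 54 correct detections, 46 spurious ones, 46 missed cracks.
score = f1_score(tp=54, fp=46, fn=46)
print(round(score, 2))
```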

Translated: 2022-09-24 04:39:45 Published: 2020-12-17

# 注意による分類:事前知識を用いたシーングラフ分類

Classification by Attention: Scene Graph Classification with Prior Knowledge ( http://arxiv.org/abs/2011.10084v2 )

License: see the linked page

Sahand Sharifzadeh, Sina Moayed Baharlou, Volker Tresp

(参考訳) シーングラフ分類における大きな課題は、オブジェクトと関係の出現が、ある画像から別の画像に大きく異なる可能性があることである。以前の研究では、画像内のすべてのオブジェクトをリレーショナル推論したり、事前の知識を分類に組み込んだりすることでこの問題に対処してきた。先行研究とは異なり、知覚と事前知識について異なるモデルを検討することはない。代わりに、マルチタスク学習アプローチを採用し、注意層として分類を実装します。これにより、事前の知識が知覚モデル内に出現し、伝播することができる。モデルも前者を表現するように強制することで、強い帰納バイアスを達成できる。本モデルでは,この知識をシーン表現に反復的に注入することで,より高度な分類性能が得られることを示す。さらに、我々のモデルはトリプルとして与えられる外部知識に基づいて微調整することができる。自己教師付き学習と1%の注釈付き画像のみを組み合わせた場合、3%以上のオブジェクト分類の改善、26%のシーングラフ分類、36%の述語予測精度が得られる。

A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for perception and prior knowledge. Instead, we take a multi-task learning approach, where we implement the classification as an attention layer. This allows for the prior knowledge to emerge and propagate within the perception model. By enforcing the model also to represent the prior, we achieve a strong inductive bias. We show that our model can accurately generate commonsense knowledge and that the iterative injection of this knowledge to scene representations leads to significantly higher classification performance. Additionally, our model can be fine-tuned on external knowledge given as triples. When combined with self-supervised learning and with 1% of annotated images only, this gives more than 3% improvement in object classification, 26% in scene graph classification, and 36% in predicate prediction accuracy.

Translated: 2022-09-23 20:43:33 Published: 2020-12-17

# XTQA: 教科書質問回答のSpan-Level説明

XTQA: Span-Level Explanations of the Textbook Question Answering ( http://arxiv.org/abs/2011.12662v3 )

License: see the linked page

Jie Ma, Jun Liu, Junjun Li, Qinghua Zheng, Qingyu Yin, Jianlong Zhou, Yi Huang

(参考訳) 教科書質問応答 (TQA) は、豊富なエッセイと図からなる大きなマルチモーダルな文脈において、ダイアグラム/非ダイアグラムの質問に答えるべきタスクである。この課題の説明は学生を考慮すべき重要な側面として位置づけるべきである。この問題に対処するために,提案する粗粒粒度アルゴリズムに基づいて,TQAのスパンレベル説明 (XTQA) に向けて,新たなアーキテクチャを考案する。このアルゴリズムはまずTF-IDF法を用いて質問に関する上位$M$段落を粗く選択し、各質問に対する情報ゲインを計算することにより、これらの段落内のすべての候補スパンから上位$K$個の根拠スパンを細かく選択する。実験結果から,XTQAはベースラインに比べて最先端性能を著しく向上することがわかった。ソースコードはhttps://github.com/keep-smile-001/opentqaで入手できる。

Textbook Question Answering (TQA) is a task in which one must answer a diagram/non-diagram question given a large multi-modal context consisting of abundant essays and diagrams. We argue that the explainability of this task should place students as a key aspect to be considered. To address this issue, we devise a novel architecture towards span-level eXplanations of the TQA (XTQA) based on our proposed coarse-to-fine grained algorithm, which can provide students with not only the answers but also the span-level evidence for choosing them. This algorithm first coarsely chooses the top $M$ paragraphs relevant to questions using the TF-IDF method, and then finely chooses the top $K$ evidence spans from all candidate spans within these paragraphs by computing the information gain of each span to questions. Experimental results show that XTQA significantly improves the state-of-the-art performance compared with baselines. The source code is available at https://github.com/keep-smile-001/opentqa
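The coarse stage described in the abstract, ranking paragraphs against a question by TF-IDF and keeping the top M, can be sketched as follows. This is a minimal stand-in and not the XTQA code: whitespace tokenization, the smoothed IDF formula, and the function name `top_m_paragraphs` are all assumptions for illustration, and the fine stage (choosing the top K evidence spans by information gain) is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def top_m_paragraphs(question, paragraphs, m):
    """Return indices of the m paragraphs with the highest TF-IDF overlap score."""
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    def idf(term):
        # Smoothed inverse document frequency; always positive.
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0

    query_terms = set(tokenize(question))
    scored = []
    for index, doc in enumerate(docs):
        length = sum(doc.values()) or 1
        score = sum((doc[t] / length) * idf(t) for t in query_terms)
        scored.append((score, index))
    # Highest score first; ties are broken by original paragraph order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:m]]


paragraphs = [
    "plants use sunlight to make food",
    "the moon orbits the earth",
    "photosynthesis in plants needs sunlight and water",
]
ranked = top_m_paragraphs("why do plants need sunlight", paragraphs, m=2)
print(ranked)
```

The paragraph about the moon shares no term with the question and is dropped, so a later span-selection stage would only need to look inside the two remaining paragraphs.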

Translated: 2022-09-21 01:42:50 Published: 2020-12-17

# 映像から映像へ視覚効果を伝達する学習

Learning to Transfer Visual Effects from Videos to Images ( http://arxiv.org/abs/2012.01642v2 )

License: see the linked page

Christopher Thomas, Yale Song, Adriana Kovashka

(参考訳) 本研究では,ビデオのコレクションから時空間的効果(溶融など)を伝達することで,画像のアニメーション化の問題を研究する。視覚効果伝達における主な課題は, 1) 蒸留したい効果を捉える方法,2) 内容や芸術的スタイルではなく, 効果のみをソースビデオから入力画像に移す方法,の2つである。最初の課題に対処するために、我々は5つの損失関数を評価し、最も有望なものは、生成したアニメーションが、ソースビデオと似た光学的流れとテクスチャ運動を持つことを奨励する。第2の課題に対処するために、制約のないピクセル値を予測するのではなく、既存の画像ピクセルを以前のフレームから移動させることしかできない。これにより、入力画像のピクセルを使って視覚効果を発生させ、ソースビデオからの不要な芸術的スタイルや内容が出力に現れるのを防ぐ。提案手法を客観的および主観的設定で評価し,顔の融解や鹿の開花などの非定型的変換対象を示す興味深い定性的な結果を示す。

We study the problem of animating images by transferring spatio-temporal visual effects (such as melting) from a collection of videos. We tackle two primary challenges in visual effect transfer: 1) how to capture the effect we wish to distill; and 2) how to ensure that only the effect, rather than content or artistic style, is transferred from the source videos to the input image. To address the first challenge, we evaluate five loss functions; the most promising one encourages the generated animations to have similar optical flow and texture motions as the source videos. To address the second challenge, we only allow our model to move existing image pixels from the previous frame, rather than predicting unconstrained pixel values. This forces any visual effects to occur using the input image's pixels, preventing unwanted artistic style or content from the source video from appearing in the output. We evaluate our method in objective and subjective settings, and show interesting qualitative results which demonstrate objects undergoing atypical transformations, such as making a face melt or a deer bloom.

翻訳日:2021-05-23 14:58:35 公開日:2020-12-17

# SAFCAR:構成行動認識のための構造化注意融合

SAFCAR: Structured Attention Fusion for Compositional Action Recognition ( http://arxiv.org/abs/2012.02109v2 )

ライセンス: Link先を確認

Tae Soo Kim, Gregory D. Hager

(参考訳) 構成的行動認識のための一般的な枠組みを提示する。アクション認識では、ラベルはサブジェクトやアトミックアクション、オブジェクトといった単純なコンポーネントで構成されている。構成的行動認識の最大の課題は、基本的なコンポーネントを使って構成できる、組み合わせ可能なアクションのセットが多数存在することである。しかし、構成性はまた、利用可能な構造を提供する。そこで我々は,アクションの時系列構造をキャプチャする物体検出情報と,文脈情報をキャプチャする視覚手がかりとを組み合わせた,新しい構造化注意融合(saf)自己照準機構を開発し,検証する。提案手法は,新しい動詞句の合成を,現在の技術システムよりも効果的に認識し,いくつかのラベル付き例から非常に効率的なアクションカテゴリーに一般化することを示す。我々は,Something-V2データセットの課題であるSomesing-Elseタスクに対するアプローチを検証する。さらに、当社のフレームワークはフレキシブルで、Charades-Fewshotデータセット上で競合する結果を示すことによって、新たなドメインに一般化可能であることを示す。

We present a general framework for compositional action recognition -- i.e. action recognition where the labels are composed out of simpler components such as subjects, atomic-actions and objects. The main challenge in compositional action recognition is that there is a combinatorially large set of possible actions that can be composed using basic components. However, compositionality also provides a structure that can be exploited. To do so, we develop and test a novel Structured Attention Fusion (SAF) self-attention mechanism to combine information from object detections, which capture the time-series structure of an action, with visual cues that capture contextual information. We show that our approach recognizes novel verb-noun compositions more effectively than current state of the art systems, and it generalizes to unseen action categories quite efficiently from only a few labeled examples. We validate our approach on the challenging Something-Else tasks from the Something-Something-V2 dataset. We further show that our framework is flexible and can generalize to a new domain by showing competitive results on the Charades-Fewshot dataset.

翻訳日:2021-05-23 14:52:24 公開日:2020-12-17

# 自律運転のためのコンピュータステレオビジョン

Computer Stereo Vision for Autonomous Driving ( http://arxiv.org/abs/2012.03194v2 )

ライセンス: Link先を確認

Rui Fan, Li Wang, Mohammud Junaid Bocus, Ioannis Pitas

(参考訳) 自律システムの重要なコンポーネントとして、自律的な自動車認識は、最近の並列コンピューティングアーキテクチャの進歩で大きな飛躍を遂げた。小型だがフル機能の組み込みスーパーコンピュータを使用することで、コンピュータステレオビジョンは自動運転車の奥行き認識に広く採用されている。コンピュータステレオビジョンの2つの重要な側面は、スピードと精度である。これらはどちらも望ましいが相反する性質であり、より精度のよいアルゴリズムは計算の複雑さが高い。したがって、リソース制限ハードウェアのためのコンピュータステレオビジョンアルゴリズムを開発する主な目的は、速度と精度のトレードオフを改善することである。本章では,自律走行車システムにおけるコンピュータステレオビジョンのハードウェアとソフトウェアの両方について紹介する。次に, 視覚的特徴検出, 説明とマッチング, 2) 3D情報取得, 3) 物体検出/認識, 4) セマンティックイメージセグメンテーションの4つの自律車認識タスクについて議論する。次に、マルチスレッドCPUおよびGPUアーキテクチャにおけるコンピュータステレオビジョンと並列コンピューティングの原理を詳述する。

As an important component of autonomous systems, autonomous car perception has taken a big leap with recent advances in parallel computing architectures. With the use of tiny but full-featured embedded supercomputers, computer stereo vision has been widely applied in autonomous cars for depth perception. The two key aspects of computer stereo vision are speed and accuracy. They are both desirable but conflicting properties, as algorithms with better disparity accuracy usually have higher computational complexity. Therefore, the main aim of developing a computer stereo vision algorithm for resource-limited hardware is to improve the trade-off between speed and accuracy. In this chapter, we introduce both the hardware and software aspects of computer stereo vision for autonomous car systems. Then, we discuss four autonomous car perception tasks: 1) visual feature detection, description and matching, 2) 3D information acquisition, 3) object detection/recognition and 4) semantic image segmentation. The principles of computer stereo vision and parallel computing on multi-threaded CPU and GPU architectures are then detailed.

翻訳日:2021-05-21 13:58:25 公開日:2020-12-17

# (参考訳) スーパーマーケット記録を用いた季節インフルエンザ予測

Predicting seasonal influenza using supermarket retail records ( http://arxiv.org/abs/2012.04651v2 )

ライセンス: CC BY 4.0

Ioanna Miliou, Xinyue Xiong, Salvatore Rinzivillo, Qian Zhang, Giulio Rossetti, Fosca Giannotti, Dino Pedreschi, Alessandro Vespignani

(参考訳) 疫学データの可用性の向上、新しいデジタルデータストリーム、強力な機械学習アプローチの台頭により、リアルタイム流行予測システムの研究活動が急増している。本稿では,インフルエンザの季節予測を改善するために,新しいデータソース,すなわち小売市場データの利用を提案する。具体的には、スーパーマーケットの小売データを、選択された顧客の集団が一緒に購入したセンチネルバスケットの識別を通じて、インフルエンザの代理信号として捉えている。我々は、イタリアでインフルエンザの発生率を最大4週間前に見積もる nowcasting and forecasting framework を開発した。我々は,svrモデルを用いて季節性インフルエンザの発生予測を行う。我々の予測は,製品購入に基づくベースライン自己回帰モデルと第2ベースラインの両方を上回っている。その結果,疫病のリアルタイム分析に有効なプロキシとして,予測モデルに小売市場データを組み込むことの価値が定量的に示された。

Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data, to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates of influenza incidence in Italy up to 4 weeks ahead. We use the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results show quantitatively the value of incorporating retail market data in forecasting models, acting as a proxy that can be used for the real-time analysis of epidemics.

翻訳日:2021-05-17 03:56:56 公開日:2020-12-17

# 品質多様性最適化 : 確率最適化の新分野

Quality-Diversity Optimization: a novel branch of stochastic optimization ( http://arxiv.org/abs/2012.04322v2 )

ライセンス: Link先を確認

Konstantinos Chatzilygeroudis, Antoine Cully, Vassilis Vassiliades and Jean-Baptiste Mouret

(参考訳) 従来の最適化アルゴリズムは、目的関数を最大化(または最小化)する単一の大域的最適解を探索する。マルチモーダル最適化アルゴリズムは、探索空間における最も高いピーク(複数存在しうる)を探索する。品質多様性アルゴリズムは、進化的計算ツールボックスに最近追加されたもので、単一の局所最適解の集合を探索するだけでなく、探索空間全体を照らそうとする。実質的に、高性能な解が探索空間全体にどのように分布しているかの全体像を提供する。マルチモーダル最適化アルゴリズムとの主な違いは、(1)品質多様性アルゴリズムは一般に遺伝子型(またはパラメータ)空間ではなく行動空間(または特徴空間)で機能すること、(2)ニッチが適応度地形のピークでなくても行動空間全体を埋めようとすることである。本章では,品質多様性最適化について平易に紹介し,主要な代表的アルゴリズムと,コミュニティで現在検討されている主要なトピックについて論じる。また、この章を通じて、ディープラーニング、ロボット工学、強化学習を含む品質多様性アルゴリズムのいくつかの成功例についても論じる。

Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space that can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not only search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences with multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), and not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even if the niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization, discuss the main representative algorithms, and the main current topics under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including deep learning, robotics, and reinforcement learning.
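To make the "illuminate the search space" idea concrete, the sketch below implements a tiny MAP-Elites-style loop. MAP-Elites is one well-known Quality-Diversity algorithm, but the abstract does not name it, so treat the choice as an assumption. The one-dimensional genotype, the mutation scale, and the 90% mutation probability are also illustrative choices rather than values from the chapter.

```python
import random


def map_elites(fitness, descriptor, n_bins=10, iterations=2000, seed=0):
    """Keep the best solution found so far (an "elite") in each behaviour bin."""
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness, genotype)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite with Gaussian noise, clipped to [0, 1].
            _, parent = rng.choice(list(archive.values()))
            child = min(1.0, max(0.0, parent + rng.gauss(0.0, 0.1)))
        else:
            # Occasionally (and on the first iteration) sample a fresh genotype.
            child = rng.random()
        # The behaviour descriptor, not the fitness, decides which niche the child competes in.
        b = min(n_bins - 1, int(descriptor(child) * n_bins))
        f = fitness(child)
        if b not in archive or f > archive[b][0]:
            archive[b] = (f, child)
    return archive


# Toy problem: fitness peaks at x = 0.5, and the behaviour descriptor is x itself.
archive = map_elites(lambda x: -(x - 0.5) ** 2, lambda x: x)
```

Note that niches far from the fitness peak are still filled, each with its own local elite, which is the second difference from multimodal optimization listed in the abstract.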

翻訳日:2021-05-16 21:37:17 公開日:2020-12-17

# CNNを用いた胸部X線画像からのCOVID-19検出

COVID-19 Detection in Chest X-Ray Images using a New Channel Boosted CNN ( http://arxiv.org/abs/2012.05073v2 )

ライセンス: Link先を確認

Saddam Hussain Khan, Anabia Sohail, and Asifullah Khan

(参考訳) 新型コロナウイルス(COVID-19)は感染性の高い呼吸器感染症で、世界中の人口に影響を与え、その壊滅的な影響を継続している。感染範囲を制限するには、早期にcovid-19を検出することが不可欠である。本研究では, 深部畳み込みニューラルネットワーク(CNN)とチャネルブースティングに基づく新しい分類手法CB-STM-RENetを提案する。この接続では、新型コロナウイルス特異的な放射線画像パターンを学習するために、分割変換マージ(STM)に基づく新しい畳み込みブロックを開発する。この新しいブロックは、各ブランチの領域とエッジベースの操作を体系的に組み込んで、様々なレベルの様々な特徴、特に領域の均一性、テクスチュラルなバリエーション、および感染領域の境界に関する特徴を捉えている。提案したCNNアーキテクチャの学習と識別能力は、補助チャネルと元のチャネルを連結するチャネルブースティングのアイデアを活用することで向上する。補助チャネルは、Transfer Learningを用いて事前訓練されたCNNから生成される。 CB-STM-RENetの有効性を胸部X線(CoV-Healthy-6k,CoV-NonCoV-10k,CoV-NonCoV-15k)の3種類のデータセットを用いて評価した。提案したCB-STM-RENetと既存の技術との比較により,健康と他の種類の胸部感染症の鑑別において高い性能を示した。 CB-STM-RENetはこれらの3つのデータセットで最高のパフォーマンスを提供する。良好な検出率(97%)と高い精度(93%)は,感染症の診断に適応できることが示唆された。テストコードはhttps://github.com/PRLAB21/COVID-19-Detection-System-using-Chest-X-Ray-Imagesで公開されている。

COVID-19 is a highly contagious respiratory infection that has affected a large population across the world and continues to have devastating consequences. It is imperative to detect COVID-19 as early as possible to limit the spread of infection. In this work, a new classification technique, CB-STM-RENet, based on a deep Convolutional Neural Network (CNN) and Channel Boosting is proposed for the screening of COVID-19 in chest X-Rays. To learn COVID-19-specific radiographic patterns, a new convolutional block based on split-transform-merge (STM) is developed. This new block systematically incorporates region- and edge-based operations at each branch to capture a diverse set of features at various levels, especially those related to region homogeneity, textural variations, and boundaries of the infected region. The learning and discrimination capability of the proposed CNN architecture is enhanced by exploiting the Channel Boosting idea, which concatenates auxiliary channels with the original channels. The auxiliary channels are generated from pre-trained CNNs using Transfer Learning. The effectiveness of the proposed technique CB-STM-RENet is evaluated on three different datasets of chest X-Rays, namely CoV-Healthy-6k, CoV-NonCoV-10k, and CoV-NonCoV-15k. A performance comparison of the proposed CB-STM-RENet with existing techniques shows high performance in discriminating COVID-19 chest infections both from healthy cases and from other types of chest infections. CB-STM-RENet provides the highest performance on all three datasets, especially on the stringent CoV-NonCoV-15k dataset. The good detection rate (97%) and high precision (93%) of the proposed technique suggest that it can be adapted for the diagnosis of COVID-19 infected patients. The test code is available at https://github.com/PRLAB21/COVID-19-Detection-System-using-Chest-X-Ray-Images.

翻訳日:2021-05-16 20:56:53 公開日:2020-12-17

# (参考訳) 胸部X線画像から解釈可能な肺癌スコーリングモデルの自動生成

Automatic Generation of Interpretable Lung Cancer Scoring Models from Chest X-Ray Images ( http://arxiv.org/abs/2012.05447v2 )

ライセンス: CC BY 4.0

Michael J. Horry, Subrata Chakraborty, Biswajeet Pradhan, Manoranjan Paul, Douglas P. S. Gomes, Anwaar Ul-Haq

(参考訳) 肺癌は、がんが世界中で最も多い死因であり、早期発見が患者の予後の鍵である。多くの研究が、機械学習、特に深層学習は、肺がんの自動診断に有効であることを実証しているが、これらの技術は、まだ臨床で承認され、医療コミュニティによって採用されていない。この分野のほとんどの研究は、人工放射線学的第二読取を提供するための結節検出の狭いタスクに焦点を当てている。代わりに,胸部X線画像から肺がんに関連する幅広い病態を,大規模なデータセットで訓練されたコンピュータビジョンモデルを用いて抽出することに焦点を当てた。次に、肺癌の悪性度メタデータを提供する独立した、より小さなデータセットに対する最適な意思決定ツリーのセットを見つける。この小さな推論データセットでは, 感度と特異度はそれぞれ85%, 75%であり, 正の予測値は85%であり, 人体放射線技師の性能に匹敵する。さらに、本手法により作成された決定木は、臨床応用可能な多変量肺癌スコアリングおよび診断モデルへの医療専門家による改良の出発点とみなすことができる。

Lung cancer is the leading cause of cancer death worldwide with early detection being the key to a positive patient prognosis. Although a multitude of studies have demonstrated that machine learning, and particularly deep learning, techniques are effective at automatically diagnosing lung cancer, these techniques have yet to be clinically approved and adopted by the medical community. Most research in this field is focused on the narrow task of nodule detection to provide an artificial radiological second reading. We instead focus on extracting, from chest X-ray images, a wider range of pathologies associated with lung cancer using a computer vision model trained on a large dataset. We then find the set of best fit decision trees against an independent, smaller dataset for which lung cancer malignancy metadata is provided. For this small inferencing dataset, our best model achieves sensitivity and specificity of 85% and 75% respectively with a positive predictive value of 85% which is comparable to the performance of human radiologists. Furthermore, the decision trees created by this method may be considered as a starting point for refinement by medical experts into clinically usable multi-variate lung cancer scoring and diagnostic models.
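The three reported figures (sensitivity, specificity, positive predictive value) are linked by Bayes' rule, and a short calculation shows how. The sketch below assumes the rounded percentages from the abstract are exact and were all measured on the same evaluation set. Under those assumptions the implied share of malignant cases in that set is about 62.5%, which is a consequence of the arithmetic and not a figure stated by the authors.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive prediction."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_prevalence(sensitivity, specificity, ppv):
    """Solve the PPV formula for prevalence.

    ppv = s*p / (s*p + f*(1 - p)), with f = 1 - specificity,
    rearranges to p = ppv*f / (s*(1 - ppv) + ppv*f).
    """
    f = 1.0 - specificity
    return ppv * f / (sensitivity * (1.0 - ppv) + ppv * f)


prevalence = implied_prevalence(0.85, 0.75, 0.85)
check = positive_predictive_value(0.85, 0.75, prevalence)
```

Because the positive predictive value depends on prevalence, the same model applied to a screening population with far fewer malignant cases would show a lower value than on this dataset.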

翻訳日:2021-05-15 23:10:10 公開日:2020-12-17

# (参考訳) structured gromov-wasserstein barycentersによる学習グラフ

Learning Graphons via Structured Gromov-Wasserstein Barycenters ( http://arxiv.org/abs/2012.05644v2 )

ライセンス: CC BY 4.0

Hongteng Xu, Dixin Luo, Lawrence Carin, Hongyuan Zha

(参考訳) 無限次元空間で定義され任意の大きさのグラフを表すgraphonと呼ばれる非パラメトリックグラフモデルを学ぶための新しい原理的手法を提案する。グラトンの理論による弱正則補題に基づいて、ステップ関数を利用してグラトンを近似する。グラノンの切断距離は、ステップ関数のグロモフ・ワッサーシュタイン距離に緩和可能であることを示す。したがって、基礎となるグラフによって生成されるグラフの集合を考えると、対応するステップ函数は与えられたグラフのグロモフ=ヴァッサーシュタインバリ中心として学習する。さらに,基本アルゴリズムである$e.g.$,学習グラフの連続性を保証するための平滑化gromov-wasserstein barycenter,および複数の構造化グラフを学ぶための混合gromov-wasserstein barycenterのいくつかの拡張と拡張を開発した。提案手法は, 従来の最先端手法の欠点を克服し, 合成データと実データの両方でそれを上回る。コードはhttps://github.com/HongtengXu/SGWB-Graphonで公開されている。

We propose a novel and principled method to learn a nonparametric graph model called a graphon, which is defined in an infinite-dimensional space and represents arbitrary-size graphs. Based on the weak regularity lemma from the theory of graphons, we leverage a step function to approximate a graphon. We show that the cut distance of graphons can be relaxed to the Gromov-Wasserstein distance of their step functions. Accordingly, given a set of graphs generated by an underlying graphon, we learn the corresponding step function as the Gromov-Wasserstein barycenter of the given graphs. Furthermore, we develop several enhancements and extensions of the basic algorithm, e.g., the smoothed Gromov-Wasserstein barycenter for guaranteeing the continuity of the learned graphons and the mixed Gromov-Wasserstein barycenters for learning multiple structured graphons. The proposed approach overcomes drawbacks of prior state-of-the-art methods, and outperforms them on both synthetic and real-world data. The code is available at https://github.com/HongtengXu/SGWB-Graphon.

翻訳日:2021-05-15 16:18:58 公開日:2020-12-17

# 相関プレイの後知恵合理性と逐次合理性

Hindsight and Sequential Rationality of Correlated Play ( http://arxiv.org/abs/2012.05874v2 )

ライセンス: Link先を確認

Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

(参考訳) 2人ゼロサムゲームの求解とプレイにおける近年の成功を受けて、ゲームに関する人工知能研究は、均衡に基づく戦略を生み出すアルゴリズムにますます焦点を当てている。しかし、このアプローチは、一般和ゲームや3人以上のゲームにおいて有能なプレイヤーを生み出すうえで、2人ゼロサムゲームの場合ほど効果的ではない。魅力的な代替案は、行動を修正していれば達成できたであろう結果に対して、後知恵で見て高い性能を保証する適応アルゴリズムを検討することである。このアプローチもゲーム理論的な分析につながるが、対象となるのは均衡における因子化されたエージェント行動ではなく、共同学習のダイナミクスから生じる相関プレイである。我々は,一般的な逐次的意思決定の設定において,学習をこの後知恵合理性(hindsight rationality)の枠組みで捉えることを展開し,提唱する。この目的のために、展開形ゲームにおける仲介均衡(mediated equilibrium)と逸脱の型を再検討し、より完全な理解を得るとともに過去の誤解を解消する。文献における各種の均衡の異なる長所と短所を示す一連の例を提示し、他のすべてを包含する扱いやすい概念は存在しないことを証明する。この一連の研究は、反実仮想後悔最小化(CFR)ファミリーのアルゴリズムに対応する逸脱と均衡のクラスの定義に結実し、それらを文献中の他のすべてのクラスと関連づける。CFRをより詳細に調べることで、後知恵評価に自然に適用できる形で逐次合理性を拡張する、相関プレイにおける合理性の新しい再帰的定義がさらに得られる。

Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.
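As a small illustration of "strong performance in hindsight", the sketch below runs regret matching in self-play on rock-paper-scissors. Regret matching is the per-decision update used inside CFR-family algorithms. This toy covers only a one-shot matrix game, not the extensive-form setting or the deviation classes studied in the paper, and the payoff matrix, iteration count, and function names are assumptions of the example.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[a][b] is the payoff to a player choosing a against an opponent choosing b.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret yet: play uniformly


def self_play(iterations=20000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for player in range(2):
            mine, theirs = moves[player], moves[1 - player]
            for a in range(ACTIONS):
                # Hindsight regret: how much better would action a have done this round?
                regrets[player][a] += PAYOFF[a][theirs] - PAYOFF[mine][theirs]
                strategy_sums[player][a] += strategies[player][a]
    avg_strategies = [[s / iterations for s in sums] for sums in strategy_sums]
    # Average regret against the best fixed action in hindsight, per player.
    avg_regrets = [max(max(r), 0.0) / iterations for r in regrets]
    return avg_strategies, avg_regrets


avg_strategies, avg_regrets = self_play()
```

The quantity to watch is `avg_regrets`: it shrinks as play continues, meaning neither learner could have done much better by switching to any single fixed action, which is the simplest (external-regret) form of hindsight rationality.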

翻訳日:2021-05-15 06:14:23 公開日:2020-12-17

# (参考訳) 単一衛星画像からのストリートビューパノラマ映像合成

Street-view Panoramic Video Synthesis from a Single Satellite Image ( http://arxiv.org/abs/2012.06628v2 )

ライセンス: CC BY 4.0

Zuoyue Li, Zhaopeng Cui, Martin R. Oswald, Marc Pollefeys

(参考訳) 本研究では,1つの衛星画像とカメラ軌道から時間的および幾何学的に一貫したストリートビューパノラマ映像を合成する方法を提案する。既存のクロスビュー合成アプローチは画像にフォーカスしているが、このような場合のビデオ合成はまだ十分に注目されていない。単一画像合成アプローチは、ビデオの重要な特性である時間的一貫性が欠如しているため、ビデオ合成には適していない。この目的のために,我々は3dポイントクラウド表現を明示的に作成し,衛星画像から推定した幾何学的シーン構成を反映したフレーム間の密接な3d-2d対応を維持する。我々は,セマンティクスとクラス毎の潜在ベクトルからポイントクラウドを色分けするために,2つの時間ガラスモジュールを備えたカスケードネットワークアーキテクチャを実装した。生成したストリートビュービデオフレームは3次元の幾何学的シーン構造に従属し,時間的一貫性を維持する。定性的かつ定量的な実験は、時間的あるいは幾何学的整合性に欠ける他の最先端のクロスビュー合成手法よりも優れた結果を示す。私たちの知る限りでは、クロスビュー画像をビデオに合成する最初の作品です。

We present a novel method for synthesizing both temporally and geometrically consistent street-view panoramic video from a given single satellite image and camera trajectory. Existing cross-view synthesis approaches focus more on images, while video synthesis in such a case has not yet received enough attention. Single image synthesis approaches are not well suited for video synthesis since they lack temporal consistency which is a crucial property of videos. To this end, our approach explicitly creates a 3D point cloud representation of the scene and maintains dense 3D-2D correspondences across frames that reflect the geometric scene configuration inferred from the satellite view. We implement a cascaded network architecture with two hourglass modules for successive coarse and fine generation for colorizing the point cloud from the semantics and per-class latent vectors. By leveraging computed correspondences, the produced street-view video frames adhere to the 3D geometric scene structure and maintain temporal consistency. Qualitative and quantitative experiments demonstrate superior results compared to other state-of-the-art cross-view synthesis approaches that either lack temporal or geometric consistency. To the best of our knowledge, our work is the first work to synthesize cross-view images to video.

翻訳日:2021-05-11 04:35:59 公開日:2020-12-17

# D$^2$IM-Net: 単一画像から詳細を分離した暗黙的場を学習する

D$^2$IM-Net: Learning Detail Disentangled Implicit Fields from Single Images ( http://arxiv.org/abs/2012.06650v2 )

ライセンス: Link先を確認

Manyi Li, Hao Zhang

(参考訳) 位相的な形状構造と表面特徴の両方を含む幾何学的詳細を入力画像から復元することを目的とした,最初の単一ビュー3D再構成ネットワークを提案する。私たちのキーとなるアイデアは、粗い3D形状を表す暗黙的場と細部を捉えるもう1つの関数の2つからなる、細部を分離した再構成をネットワークに学習させることである。入力画像が与えられると、D$^2$IM-Netと呼ばれる我々のネットワークは、これをグローバル特徴とローカル特徴にエンコードし、それぞれを2つのデコーダに入力する。ベースデコーダは大域的特徴を用いて粗い暗黙的場を再構成し、詳細デコーダは局所的特徴から、撮影された物体の前面と背面に定義された2つの変位マップを再構成する。最終的な3D再構成はベース形状と変位マップの融合であり、3つの損失が、粗い形状、全体構造、および新しいラプラシアン項による表面の細部の復元を促す。

We present the first single-view 3D reconstruction network aimed at recovering geometric details from an input image which encompass both topological shape structures and surface features. Our key idea is to train the network to learn a detail disentangled reconstruction consisting of two functions, one implicit field representing the coarse 3D shape and the other capturing the details. Given an input image, our network, coined D$^2$IM-Net, encodes it into global and local features which are respectively fed into two decoders. The base decoder uses the global features to reconstruct a coarse implicit field, while the detail decoder reconstructs, from the local features, two displacement maps, defined over the front and back sides of the captured object. The final 3D reconstruction is a fusion between the base shape and the displacement maps, with three losses enforcing the recovery of coarse shape, overall structure, and surface details via a novel Laplacian term.

翻訳日:2021-05-11 02:56:45 公開日:2020-12-17

# 抽象概念の出現に関する学習的視点--音素の奇妙な場合

A learning perspective on the emergence of abstractions: the curious case of phonemes ( http://arxiv.org/abs/2012.07499v3 )

ライセンス: Link先を確認

Petar Milin, Benjamin V. Tucker, and Dagmar Divjak

(参考訳) 本稿では,音声への露出から抽象的な単音(phone)が出現しうるかどうかを,様々なモデリング手法を用いて検証する。言語学的訓練を受けていない言語使用者における言語知識の発達に関する2つの対立する原理,すなわちメモリベース学習(MBL)と誤り訂正学習(ECL)を検証する。一般化のプロセスは言語学者が扱う抽象概念の基盤となっており,MBLとECLが言語学的抽象に類似した言語知識を生み出しうるかどうかを調査した。各モデルには1人の話者が生成した大量の前処理済み音声が提示された。モデルが学習した内容の一貫性や安定性,および抽象的なカテゴリを生み出す能力を評価した。両タイプのモデルは,これらのテストにおいて異なる結果を示す。ECLモデルは抽象化を学習でき,単音目録(phone inventory)の少なくとも一部を入力から確実に識別できることを示す。

In the present paper we use a range of modeling techniques to investigate whether an abstract phone could emerge from exposure to speech sounds. We test two opposing principles regarding the development of language knowledge in linguistically untrained language users: Memory-Based Learning (MBL) and Error-Correction Learning (ECL). A process of generalization underlies the abstractions linguists operate with, and we probed whether MBL and ECL could give rise to a type of language knowledge that resembles linguistic abstractions. Each model was presented with a significant amount of pre-processed speech produced by one speaker. We assessed the consistency or stability of what the models have learned and their ability to give rise to abstract categories. Both types of models fare differently with regard to these tests. We show that ECL learning models can learn abstractions and that at least part of the phone inventory can be reliably identified from the input.

翻訳日:2021-05-08 14:45:50 公開日:2020-12-17

# (参考訳) ドメイン適応意味セグメンテーションのためのクロスドメイングルーピングとアライメント

Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation ( http://arxiv.org/abs/2012.08226v2 )

ライセンス: CC BY 4.0

Minsu Kim, Sunghun Joung, Seungryong Kim, JungIn Park, Ig-Jae Kim, Kwanghoon Sohn

(参考訳) deep convolutional neural network(cnns)内のソースドメインとターゲットドメインにセマンティクスセグメンテーションネットワークを適用する既存の技術では、グローバルあるいはカテゴリ対応の方法で、2つのドメインのすべてのサンプルを処理する。彼らは対象ドメイン自体や推定カテゴリ内のクラス間変異を考慮せず、マルチモーダルデータ分布を持つドメインをエンコードする制限を提供する。この制限を克服するために,学習可能なクラスタリングモジュールと,クロスドメイングルーピングとアライメントと呼ばれる新しいドメイン適応フレームワークを導入する。ソースドメインの正確なセグメンテーション能力を忘れずにドメインのアライメントを最大化する目的で、サンプルをクラスタリングするために、2つの損失関数、特にクラスタ間のセマンティック一貫性と直交性を促進するために提案する。また,従来の方法の他の限界であるクラス不均衡問題を解くために損失も提示する。実験の結果,提案手法はセマンティックセグメンテーションにおける適応性能を継続的に向上し,ドメイン適応設定における最先端性よりも優れていた。

Existing techniques for adapting semantic segmentation networks across source and target domains with deep convolutional neural networks (CNNs) deal with all the samples from the two domains in a global or category-aware manner. They do not consider inter-class variation within the target domain itself or within an estimated category, which limits their ability to encode domains having a multi-modal data distribution. To overcome this limitation, we introduce a learnable clustering module and a novel domain adaptation framework called cross-domain grouping and alignment. To cluster the samples across domains with the aim of maximizing domain alignment without forgetting precise segmentation ability on the source domain, we present two loss functions that encourage semantic consistency and orthogonality among the clusters. We also present a loss that addresses the class imbalance problem, another limitation of previous methods. Our experiments show that our method consistently boosts the adaptation performance in semantic segmentation, outperforming the state of the art in various domain adaptation settings.

翻訳日:2021-05-08 06:13:23 公開日:2020-12-17

# (参考訳) SimpleChrome: 遺伝子発現予測のためのコンビネーションエフェクトのエンコード

SimpleChrome: Encoding of Combinatorial Effects for Predicting Gene Expression ( http://arxiv.org/abs/2012.08671v2 )

ライセンス: CC BY 4.0

Wei Cheng, Ghulam Murtaza, Aaron Wang

(参考訳) 最先端のDNAシークエンシング技術の進歩により、ゲノムデータセットはユビキタスになった。大規模データセットの出現はゲノム学、特に遺伝子制御の理解を深める大きな機会となる。人体の各細胞は同じDNA情報を含んでいるが、遺伝子発現は遺伝子発現レベルとして知られる遺伝子をオンまたはオフすることでこれらの細胞の機能を制御する。それぞれの遺伝子の発現レベルを制御する重要な因子は2つあり、(1)ヒストン修飾などの遺伝子制御は遺伝子発現を直接制御することができる。 2) 隣り合う遺伝子は機能的に関連し, 相互に相互作用し, 遺伝子発現のレベルにも影響を及ぼす。前者は注意に基づくモデルを用いて対処しようと試みてきた。しかし、第二の問題に対処するには、モデルに潜在的なすべての遺伝子情報を組み込む必要がある。現代の機械学習とディープラーニングモデルは、中程度のサイズのデータに適用すると遺伝子発現信号をキャプチャできるが、データの高次元性によってデータの基盤となるシグナルを回復するのに苦労している。この問題を解決するために,遺伝子に潜伏したヒストン修飾表現を学習する深層学習モデルSimpleChromeを提案する。このモデルから得られた特徴は、遺伝子間相互作用と直接的遺伝子発現制御の組合せ効果をよりよく理解することを可能にする。本論文は,下流モデルの予測能力を大幅に改善し,頑健で汎用的なニューラルネットワークを学習するための大規模データセットの必要性を大幅に緩和することを示す。これらの結果はエピゲノミクス研究と薬物開発に直ちに下流効果をもたらす。

Due to recent breakthroughs in state-of-the-art DNA sequencing technology, genomics data sets have become ubiquitous. The emergence of large-scale data sets provides great opportunities for better understanding of genomics, especially gene regulation. Although each cell in the human body contains the same set of DNA information, gene expression controls the functions of these cells by turning genes on or off, which is captured by gene expression levels. Two important factors control the expression level of each gene: (1) gene regulation, such as histone modifications, can directly regulate gene expression; (2) neighboring genes that are functionally related to or interact with each other can also affect the gene expression level. Previous efforts have tried to address the former using attention-based models. However, addressing the latter requires incorporating all potentially related gene information into the model. Though modern machine learning and deep learning models have been able to capture gene expression signals when applied to moderately sized data, they have struggled to recover the underlying signals due to the data's high dimensionality. To remedy this issue, we present SimpleChrome, a deep learning model that learns the latent histone modification representations of genes. The features learned from the model allow us to better understand the combinatorial effects of cross-gene interactions and direct gene regulation on the target gene expression. The results of this paper show outstanding improvements in the predictive capabilities of downstream models and greatly relax the need for a large data set to learn a robust, generalized neural network. These results have immediate downstream effects in epigenomics research and drug development.

翻訳日:2021-05-07 06:50:20 公開日:2020-12-17

# ベイズ最適化における構成最適化の課題

Are we Forgetting about Compositional Optimisers in Bayesian Optimisation? ( http://arxiv.org/abs/2012.08240v2 )

ライセンス: Link先を確認

Antoine Grosnit, Alexander I. Cowen-Rivers, Rasul Tutunov, Ryan-Rhys Griffiths, Jun Wang, Haitham Bou-Ammar

(参考訳) ベイズ最適化は、グローバル最適化のためのサンプル効率のよい方法論を提供する。このフレームワークの中で重要な性能決定サブルーチンは、取得関数の最大化であり、取得関数は非凸であり、したがって最適化が非自明であるという事実に複雑である。本稿では,取得関数を最大化するためのアプローチに関する包括的実証研究を行う。加えて、人気獲得関数の新規かつ数学的に等価な合成形式を導出することにより、最大化タスクを構成最適化問題として再キャストし、この分野の広範な文献から恩恵を受けることができる。合成最適化タスクとベイズマルクのタスクからなる3958個の個別実験に対して, 獲得関数の最大化に対する構成的アプローチの実証的利点を強調した。獲得関数最大化サブルーチンの一般性を考えると、合成オプティマイザの採用はベイズ最適化が現在適用されているすべての領域で性能改善をもたらす可能性があると仮定する。

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.

翻訳日:2021-05-07 05:34:30 公開日:2020-12-17

# 人間行動の起源における接地人工知能

Grounding Artificial Intelligence in the Origins of Human Behavior ( http://arxiv.org/abs/2012.08564v2 )

ライセンス: Link先を確認

Eleni Nisioti and Cl\'ement Moulin-Frier

(参考訳) 人工知能(AI)の最近の進歩は、オープンエンドのスキルのレパートリーを獲得できるエージェントの探求を復活させた。しかしながら、この能力は人間の知性の特徴と基本的に関係しているが、この分野での研究は、種の進化の過程で複雑な認知能力の出現を導く過程をほとんど考慮していない。人間行動生態学(HBE)の研究は、人間の自然を特徴づける行動が、我々の生態学的ニッチの構造に大きな変化に対する適応的な反応としてどのように考えられるかを理解することを目指している。本稿では,HBEの大きな仮説と近年の強化学習(RL)への貢献に基づく,オープンエンドスキル獲得における環境複雑性の役割を強調する枠組みを提案する。このフレームワークは、この2つの分野の基本的なリンクを強調し、生態系の複雑さをブートストラップするフィードバックループを特定し、AI研究者にとって有望な研究方向を作成するために使用します。

Recent advances in Artificial Intelligence (AI) have revived the quest for agents able to acquire an open-ended repertoire of skills. However, although this ability is fundamentally related to the characteristics of human intelligence, research in this field rarely considers the processes that may have guided the emergence of complex cognitive capacities during the evolution of the species. Research in Human Behavioral Ecology (HBE) seeks to understand how the behaviors characterizing human nature can be conceived as adaptive responses to major changes in the structure of our ecological niche. In this paper, we propose a framework highlighting the role of environmental complexity in open-ended skill acquisition, grounded in major hypotheses from HBE and recent contributions in Reinforcement learning (RL). We use this framework to highlight fundamental links between the two disciplines, as well as to identify feedback loops that bootstrap ecological complexity and create promising research directions for AI researchers.

翻訳日:2021-05-07 05:26:54 公開日:2020-12-17

# (参考訳) ドメイン適応人物再同定におけるサンプル不確かさの活用

Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification ( http://arxiv.org/abs/2012.08733v2 )

ライセンス: CC BY 4.0

Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang and Zheng-Jun Zha

(参考訳) unsupervised domain adaptive (uda) person re-identification (reid) アプローチの多くはクラスタリングに基づく擬似ラベル予測と特徴の微調整を組み合わせたものである。しかし、ドメインギャップのため、擬似ラベルは必ずしも信頼性がなく、ノイズ/誤りラベルが存在する。これは機能表現学習を誤解し、パフォーマンスを低下させる。本稿では,各試料に割り当てられた擬似ラベルの信頼性を推定・活用し,ノイズラベルの影響を軽減し,ノイズサンプルの寄与を抑制することを提案する。平均教師法を併用したベースラインフレームワークの構築と,さらに対照的な損失を生じさせる。我々は,クラスタリングによって間違った擬似ラベルを持つサンプルが,平均教師モデルと学生モデルの出力との整合性が弱いことを観察した。そこで本研究では,サンプルの擬似ラベルの信頼性評価に不確実性(一貫性レベルによって測定される)を活用し,サンプルごとのID分類損失,三重項損失,コントラスト損失など,様々なReID損失に再重み付けする不確実性を導入することを提案する。不確実性に基づく最適化は大幅な改善をもたらし、ベンチマークデータセットにおける最先端のパフォーマンスを達成します。

Many unsupervised domain adaptive (UDA) person re-identification (ReID) approaches combine clustering-based pseudo-label prediction with feature fine-tuning. However, because of domain gap, the pseudo-labels are not always reliable and there are noisy/incorrect labels. This would mislead the feature representation learning and deteriorate the performance. In this paper, we propose to estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels, by suppressing the contribution of noisy samples. We build our baseline framework using the mean teacher method together with an additional contrastive loss. We have observed that a sample with a wrong pseudo-label through clustering in general has a weaker consistency between the output of the mean teacher model and the student model. Based on this finding, we propose to exploit the uncertainty (measured by consistency levels) to evaluate the reliability of the pseudo-label of a sample and incorporate the uncertainty to re-weight its contribution within various ReID losses, including the identity (ID) classification loss per sample, the triplet loss, and the contrastive loss. Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
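The re-weighting idea above can be sketched in a few lines. The abstract says only that uncertainty is measured by the consistency between the mean-teacher and student outputs. The KL divergence as the consistency measure, the `exp(-KL)` mapping to a weight, and all function names below are therefore assumptions made for illustration, not the authors' formulation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def credibility_weight(teacher_probs, student_probs):
    """Map teacher/student disagreement to a weight: 1 when they agree, near 0 when they differ."""
    return math.exp(-kl_divergence(teacher_probs, student_probs))


def reweighted_loss(per_sample_losses, teacher_outputs, student_outputs):
    """Average the per-sample losses after down-weighting likely-noisy samples."""
    weights = [credibility_weight(t, s) for t, s in zip(teacher_outputs, student_outputs)]
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / len(per_sample_losses)


consistent = credibility_weight([0.9, 0.1], [0.9, 0.1])
inconsistent = credibility_weight([0.9, 0.1], [0.2, 0.8])
loss = reweighted_loss([1.0, 1.0], [[0.9, 0.1], [0.9, 0.1]], [[0.9, 0.1], [0.2, 0.8]])
```

A sample whose pseudo-label the two models disagree on contributes less to the total, so the same weighting could be applied to the ID classification, triplet, and contrastive losses mentioned in the abstract.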

翻訳日:2021-05-06 11:49:34 公開日:2020-12-17

# フェイクニュースにおけるテーマコヒーレンスの検討

Exploring Thematic Coherence in Fake News ( http://arxiv.org/abs/2012.09118v2 )

ライセンス: Link先を確認

Martins Samuel Dogo, Deepak P, Anna Jurek-Loughrey

(参考訳) 偽ニュースの拡散は依然として深刻な世界的な問題であり、理解と削減が最重要課題である。偽りの物語と真実の物語を区別する一つの方法は、その一貫性を分析することである。本研究は,インターネット上で共有されるクロスドメインニュースのコヒーレンスを分析するためのトピックモデルの利用について検討する。 7つのクロスドメインデータセットによる実験結果から、偽ニュースはその開始文と残りの文との主題的なずれが大きいことが示されている。

The spread of fake news remains a serious global issue; understanding and curtailing it is paramount. One way of differentiating between deceptive and truthful stories is by analyzing their coherence. This study explores the use of topic models to analyze the coherence of cross-domain news shared online. Experimental results on seven cross-domain datasets demonstrate that fake news shows a greater thematic deviation between its opening sentences and its remainder.

翻訳日:2021-05-03 03:00:42 公開日:2020-12-17

# (参考訳) ニューラルマッチングとファセット要約を用いた精密医学のための文献検索

Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization ( http://arxiv.org/abs/2012.09355v1 )

ライセンス: CC BY 4.0

Jiho Noh and Ramakanth Kavuluru

(参考訳) 精密医学(PM)のための情報検索(IR)は、患者の症例を特徴づける複数の証拠を探すことを伴うことが多い。これは典型的には、少なくとも患者に当てはまる疾患名と遺伝的変異を含む。その他の要因として、人口統計学的属性、併存疾患、社会的決定要因なども関係しうる。このように、検索問題はしばしばアドホック検索として定式化されるが、複数のファセット(例えば、疾患、変異)を組み込む必要がある。本稿では,このような検索シナリオに対して,ニューラルなクエリ・文書マッチングとテキスト要約を組み合わせた文書リランキング手法を提案する。アーキテクチャは基本的なBERTモデルに基づいており、リランキングのための3つのコンポーネント、すなわち (a) 文書・クエリマッチング、(b) キーワード抽出、(c) ファセット条件付き抽象的要約を備える。(b) と (c) の結果は、候補文書を簡潔な要約に変換するために使用され、この要約を手元のクエリと比較して関連度スコアを計算できる。コンポーネント(a)は、クエリに対する候補文書のマッチングスコアを直接生成する。完全なアーキテクチャは、文書・クエリマッチングと、PMファセットに沿った要約に基づく新しい文書変換アプローチとの相補性の恩恵を受ける。NIST の TREC-PM トラックデータセット (2017-2019) を用いた評価により,本モデルが最先端の性能を達成することが示された。再現性を高めるために、コードは https://github.com/bionlproc/text-summ-for-doc-retrieval で公開している。

Information retrieval (IR) for precision medicine (PM) often involves looking for multiple pieces of evidence that characterize a patient case. This typically includes at least the name of a condition and a genetic variation that applies to the patient. Other factors such as demographic attributes, comorbidities, and social determinants may also be pertinent. As such, the retrieval problem is often formulated as ad hoc search but with multiple facets (e.g., disease, mutation) that may need to be incorporated. In this paper, we present a document reranking approach that combines neural query-document matching and text summarization toward such retrieval scenarios. Our architecture builds on the basic BERT model with three specific components for reranking: (a) document-query matching, (b) keyword extraction, and (c) facet-conditioned abstractive summarization. The outcomes of (b) and (c) are used to essentially transform a candidate document into a concise summary that can be compared with the query at hand to compute a relevance score. Component (a) directly generates a matching score of a candidate document for a query. The full architecture benefits from the complementary potential of document-query matching and the novel document transformation approach based on summarization along PM facets. Evaluations using NIST's TREC-PM track datasets (2017-2019) show that our model achieves state-of-the-art performance. To foster reproducibility, our code is made available here: https://github.com/bionlproc/text-summ-for-doc-retrieval.

翻訳日:2021-05-03 00:34:49 公開日:2020-12-17

# (参考訳) フリーフォームテキストの自動処理による大学生へのCOVID-19の影響評価

Assessing COVID-19 Impacts on College Students via Automated Processing of Free-form Text ( http://arxiv.org/abs/2012.09369v1 )

ライセンス: CC BY 4.0

Ravi Sharma, Sri Divya Pagadala, Pratool Bharti, Sriram Chellappan, Trine Schmidt and Raj Goyal

(参考訳) 本稿では,COVID-19が大学生に与える影響を,学生が生成した自由形式のテキストを処理して評価した実験結果について報告する。ここで自由形式のテキストとは、大学生(米国の4年制大学に在籍)が、メンタルヘルスの評価と改善に特化したアプリを通じて投稿したテキスト入力を指す。1451人の学生から4ヶ月にわたって(COVID-19の前後に分けて)収集した9000件以上のテキスト入力からなるデータセットと確立されたNLP技術を用いて、(a) 学生が最も関心を持つトピックがCOVID-19の前後でどのように変化するか、(b) 学生が各トピックで示す感情がCOVID-19の前後でどうなっているかを評価する。分析の結果、COVID-19後には教育のようなトピックの重要性が学生にとって明らかに低下し、健康はより大きな話題となった。また、すべてのトピックを通じて、COVID-19後の学生のネガティブな感情は、COVID-19以前よりもはるかに高かった。本研究は,大学管理者,教師,親,メンタルヘルスカウンセラーなど,高等教育に関わる幅広い政策立案者に影響を与えることが期待される。

In this paper, we report experimental results on assessing the impact of COVID-19 on college students by processing free-form texts generated by them. By free-form texts, we mean textual entries posted by college students (enrolled in a four-year US college) via an app specifically designed to assess and improve their mental health. Using a dataset comprising more than 9000 textual entries from 1451 students collected over four months (split between pre and post COVID-19), and established NLP techniques, a) we assess how topics of most interest to students change between pre and post COVID-19, and b) we assess the sentiments that students exhibit in each topic between pre and post COVID-19. Our analysis reveals that topics like Education became noticeably less important to students post COVID-19, while Health became a much more prominent topic. We also found that across all topics, negative sentiment among students post COVID-19 was much higher compared to pre-COVID-19. We expect our study to have an impact on policy-makers in higher education across several spectra, including college administrators, teachers, parents, and mental health counselors.

翻訳日:2021-05-03 00:17:27 公開日:2020-12-17

# (参考訳) masker: 信頼できるテキスト分類のためのマスク付きキーワード正規化

MASKER: Masked Keyword Regularization for Reliable Text Classification ( http://arxiv.org/abs/2012.09392v1 )

ライセンス: CC BY 4.0

Seung Jun Moon, Sangwoo Mo, Kimin Lee, Jaeho Lee, Jinwoo Shin

(参考訳) 事前訓練された言語モデルは、感情分析、自然言語推論、意味的テキスト類似性など、様々なテキスト分類タスクにおいて最先端の精度を達成した。しかし、微調整されたテキスト分類器の信頼性は、しばしば見落とされる性能基準である。例えば、分布外(OOD)サンプル(トレーニング分布から遠く離れたサンプル)を検出できるモデルや、ドメインシフトに対して頑健なモデルが望まれる場合がある。我々は、信頼性に対する主要な障害の1つは、モデルが文脈全体を見るのではなく、限られた数のキーワードに過度に依存することにあると主張する。特に, (a) OOD サンプルは分布内キーワードを含むことが多く, (b) クロスドメインサンプルは必ずしもキーワードを含むとは限らないため,キーワードへの過度な依存はどちらの場合にも問題となりうる。そこで本研究では,文脈に基づく予測を促す簡易かつ効果的な微調整手法であるマスク付きキーワード正規化(MASKER)を提案する。MASKERは、他の単語からキーワードを再構築し、十分な文脈がない場合には低信頼の予測を行うようにモデルを正則化する。各種事前学習言語モデル(BERT,RoBERTa,ALBERTなど)に適用した場合,MASKERは分類精度を低下させることなくOOD検出とドメイン間一般化を改善する。コードはhttps://github.com/alinlab/MASKERで入手できる。

Pre-trained language models have achieved state-of-the-art accuracies on various text classification tasks, e.g., sentiment analysis, natural language inference, and semantic textual similarity. However, the reliability of the fine-tuned text classifiers is an often overlooked performance criterion. For instance, one may desire a model that can detect out-of-distribution (OOD) samples (drawn far from the training distribution) or be robust against domain shifts. We claim that one central obstacle to reliability is the over-reliance of the model on a limited number of keywords, instead of looking at the whole context. In particular, we find that (a) OOD samples often contain in-distribution keywords, while (b) cross-domain samples may not always contain keywords; over-relying on the keywords can be problematic for both cases. In light of this observation, we propose a simple yet effective fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction. MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context. When applied to various pre-trained language models (e.g., BERT, RoBERTa, and ALBERT), we demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy. Code is available at https://github.com/alinlab/MASKER.
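
The two regularizers described in the abstract above need two differently masked views of each input. The sketch below only illustrates how such views might be built. It is not taken from the MASKER repository: the helper names, the `[MASK]` string, and the idea of passing a ready-made keyword set are assumptions (the abstract does not say how keywords are selected).

```python
MASK = "[MASK]"


def mask_keywords(tokens, keywords):
    """View for keyword reconstruction: hide the keywords, keep the context.

    Returns the masked tokens and, per position, the token to reconstruct
    (None where nothing was masked).
    """
    masked = [MASK if t.lower() in keywords else t for t in tokens]
    targets = [t if t.lower() in keywords else None for t in tokens]
    return masked, targets


def mask_context(tokens, keywords):
    """View for the low-confidence regularizer: keep keywords, hide the context.

    A model seeing only this view should, per the abstract, not be confident.
    """
    return [t if t.lower() in keywords else MASK for t in tokens]


tokens = ["The", "movie", "was", "terrible"]
keywords = {"terrible"}  # hypothetical keyword set for a sentiment task
masked, targets = mask_keywords(tokens, keywords)
context_free = mask_context(tokens, keywords)
```

In a training loop, `masked` and `targets` would feed a reconstruction loss, while predictions on `context_free` would be pushed toward a uniform distribution; both loss terms are left out because their exact form is not given in the abstract.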

翻訳日:2021-05-03 00:06:30 公開日:2020-12-17

# (参考訳) オンラインマシン学習アドバイスを用いた計量タスクシステム

Metrical Task Systems with Online Machine Learned Advice ( http://arxiv.org/abs/2012.09394v1 )

ライセンス: CC BY 4.0

Kevin Rao

(参考訳) 機械学習アルゴリズムは、既存のデータに基づいて将来を正確に予測するように設計されているが、オンラインアルゴリズムは、将来を知らずに、ある性能指標(通常は競合比)を抑えようとする。LykourisとVassilvitskiiは、オンラインアルゴリズムを機械学習予測器で拡張することで、予測器が適度に正確である限り、競合比を証明可能な形で低下させられることを示した。本稿では,Borodin, Linial, Saks によってオンライン方式でタスクを処理する動的システムの汎用モデルとして提起されたオンライン計量タスクシステム問題に,この考え方を適用する。我々は、$n$タスク上の一様タスクシステムという特定のクラスに焦点を当てる。このクラスでは、最良の決定論的アルゴリズムは$O(n)$競合であり、最良のランダム化アルゴリズムは$O(\log n)$競合である。絶対予測誤差が$\eta_0$で上から抑えられた機械学習オラクルへのアクセスをオンラインアルゴリズムに与えることで、計量タスクシステム問題の一様な場合に対して$\Theta(\min(\sqrt{\eta_0}, \log n))$競合のアルゴリズムを構築する。また、任意のランダム化アルゴリズムの競合比に対して、$\Theta(\log \sqrt{\eta_0})$の下界を与える。

Machine learning algorithms are designed to make accurate predictions of the future based on existing data, while online algorithms seek to bound some performance measure (typically the competitive ratio) without knowledge of the future. Lykouris and Vassilvitskii demonstrated that augmenting online algorithms with a machine learned predictor can provably decrease the competitive ratio as long as the predictor is suitably accurate. In this work we apply this idea to the Online Metrical Task System problem, which was put forth by Borodin, Linial, and Saks as a general model for dynamic systems processing tasks in an online fashion. We focus on the specific class of uniform task systems on $n$ tasks, for which the best deterministic algorithm is $O(n)$ competitive and the best randomized algorithm is $O(\log n)$ competitive. By giving an online algorithm access to a machine learned oracle with absolute predictive error bounded above by $\eta_0$, we construct a $\Theta(\min(\sqrt{\eta_0}, \log n))$ competitive algorithm for the uniform case of the metrical task systems problem. We also give a $\Theta(\log \sqrt{\eta_0})$ lower bound on the competitive ratio of any randomized algorithm.

翻訳日:2021-05-02 23:48:21 公開日:2020-12-17

# (参考訳) 組成制約下での確率的組成勾配降下

Stochastic Compositional Gradient Descent under Compositional constraints ( http://arxiv.org/abs/2012.09400v1 )

ライセンス: CC BY 4.0

Srujan Teja Thomdapu, Harshvardhan, Ketan Rajawat

(参考訳) 本研究は、目的関数と制約関数が凸であり、確率関数の合成として表現される制約付き確率的最適化問題を扱う。この問題は、公正な分類、公正な回帰、および待ち行列システムの設計という文脈で生じる。特に興味深いのは、オラクルが構成関数の確率的勾配を提供する大規模な設定であり、その目的は、オラクルへの最小限の呼び出しで問題を解決することである。合成形式のため、オラクルによって提供される確率的勾配は、目的関数や制約関数の勾配の不偏推定を与えない。代わりに, 内側の関数評価を追跡することで近似勾配を構築し, 準勾配鞍点アルゴリズムを導出する。提案アルゴリズムは、最適かつ実行可能な解をほぼ確実に見つけることが保証されていることを証明する。さらに、提案アルゴリズムでは、制約違反をゼロにしつつ$\epsilon$近似最適点を得るために$\mathcal{O}(1/\epsilon^4)$個のデータサンプルが必要であることも示す。この結果は、制約のない問題に対する確率的合成勾配降下法のサンプル複雑性と一致し、制約付き設定で最もよく知られたサンプル複雑性の結果を改善する。提案アルゴリズムの有効性は、公正な分類と公正な回帰問題の両方で検証される。数値計算の結果,提案アルゴリズムは収束率の観点から最先端のアルゴリズムよりも優れていた。

This work studies constrained stochastic optimization problems where the objective and constraint functions are convex and expressed as compositions of stochastic functions. The problem arises in the context of fair classification, fair regression, and the design of queuing systems. Of particular interest is the large-scale setting where an oracle provides the stochastic gradients of the constituent functions, and the goal is to solve the problem with a minimal number of calls to the oracle. Owing to the compositional form, the stochastic gradients provided by the oracle do not yield unbiased estimates of the objective or constraint gradients. Instead, we construct approximate gradients by tracking the inner function evaluations, resulting in a quasi-gradient saddle point algorithm. We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely. We further establish that the proposed algorithm requires $\mathcal{O}(1/\epsilon^4)$ data samples in order to obtain an $\epsilon$-approximate optimal point while also ensuring zero constraint violation. The result matches the sample complexity of the stochastic compositional gradient descent method for unconstrained problems and improves upon the best-known sample complexity results for the constrained settings. The efficacy of the proposed algorithm is tested on both fair classification and fair regression problems. The numerical results show that the proposed algorithm outperforms the state-of-the-art algorithms in terms of the convergence rate.
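
The step of "tracking the inner function evaluations" mentioned in the abstract above can be illustrated on a toy problem. The sketch covers only the unconstrained stochastic compositional gradient descent idea, not the paper's constrained saddle point algorithm. The toy objective $f(\mathbb{E}[g(x)])$ with $g(x,\xi) = x + \xi$ and $f(y) = y^2$, the step-size schedule, and the name `scgd_toy` are all assumptions made for illustration.

```python
import random


def scgd_toy(steps=5000, x0=5.0, seed=0):
    """Minimize f(E[g(x, xi)]) with g(x, xi) = x + xi and f(y) = y**2.

    The minimizer is x = 0. The auxiliary variable y is a running average
    that tracks the inner expectation E[g(x, xi)], which a single noisy
    sample cannot estimate well on its own.
    """
    rng = random.Random(seed)
    x, y = x0, 0.0
    for t in range(steps):
        alpha = 0.5 / (t + 1) ** 0.75  # step size for x (assumed schedule)
        beta = 1.0 / (t + 1) ** 0.5    # tracking rate for y (assumed schedule)
        g_sample = x + rng.gauss(0.0, 1.0)      # noisy inner function value
        y = (1.0 - beta) * y + beta * g_sample  # track the inner function
        grad_g = 1.0       # derivative of g with respect to x
        grad_f = 2.0 * y   # derivative of f, evaluated at the tracked value
        x -= alpha * grad_g * grad_f            # quasi-gradient step
    return x


x_final = scgd_toy()
```

The update uses `grad_f` evaluated at the tracked `y` rather than at a single noisy sample, which is what makes the step a quasi-gradient and not an unbiased stochastic gradient.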

翻訳日:2021-05-02 23:35:09 公開日:2020-12-17

# (参考訳) 人工知能により順序付けられた3D頂点の重要度

Artificial Intelligence ordered 3D vertex importance ( http://arxiv.org/abs/2012.10232v1 )

ライセンス: CC BY 4.0

Iva Vasic, Bata Vasic, and Zorica Nikolic

(参考訳) 多次元ネットワークの頂点のランク付けは、決定の選択やその重要度の判定を含む多くの研究分野において重要である。いくつかの決定は他の決定よりも著しく重要であり、その重みの分類もまた重要である。本稿では,3次元ネットワーク頂点の重要度ランク付けのために人工知能を用いて決定の重みを求める全く新しい手法を定義し,量子化インデックス変調(QIM)と誤り訂正符号に基づく既存の順序統計頂点抽出追跡アルゴリズム(OSVETA)を改善する。本稿で提案する手法は,ヒューリスティックな手法を最新のニューラルネットワークによる精密な予測手法に置き換えることで,統計的OSVETA基準に関するネットワーク頂点の重要度決定の効率を大幅に向上させる。新しい人工知能技術により、3Dメッシュの定義が大幅に改善され、そのトポロジカルな特徴をより良く評価できる。新たな手法により,安定頂点の定義精度が向上し,メッシュ頂点が削除される確率が大幅に低下する。

Ranking vertices of multidimensional networks is crucial in many areas of research, including selecting and determining the importance of decisions. Some decisions are significantly more important than others, and their weight categorization is also important. This paper defines a completely new method for determining decision weights using artificial intelligence for importance ranking of three-dimensional network vertices, improving the existing Ordered Statistics Vertex Extraction and Tracking Algorithm (OSVETA) based on modulation of quantized indices (QIM) and error correction codes. The technique we propose in this paper significantly improves the efficiency of determining the importance of network vertices in relation to statistical OSVETA criteria, replacing heuristic methods with the precise prediction methods of modern neural networks. The new artificial intelligence technique enables a significantly better definition of the 3D meshes and a better assessment of their topological features. The new method's contributions result in greater precision in defining stable vertices, significantly reducing the probability of deleting mesh vertices.

翻訳日:2021-05-02 22:15:42 公開日:2020-12-17

# (参考訳) モーメントの変分法

The Variational Method of Moments ( http://arxiv.org/abs/2012.09422v1 )

ライセンス: CC BY 4.0

Andrew Bennett, Nathan Kallus

(参考訳) 条件モーメント問題は、可観測性の観点から構造因果パラメータを記述するための強力な定式化である。標準的なアプローチは、問題を限界モーメント条件の有限集合に還元し、最適に重み付けされたモーメントの一般化法(OWGMM)を適用することであるが、これは有限個のモーメントの特定を知っていなければならない。 OWGMMの変分極小修正により、条件モーメント問題に対する非常に一般的な推定器のクラスを定義し、このクラスはモーメントの変分法(VMM)と呼ばれ、無限個のモーメントを自然に制御できる。我々は、カーネル法やニューラルネットワークに基づく複数のVMM推定器の詳細な理論的解析を行い、これらの推定器が完全条件モーメントモデルにおいて一貫性があり、漸近的に正常であり、半パラメトリック的に効率的である適切な条件を提供する。これは、最適重み付けを組み込まず、漸近正規性を確立せず、半パラメトリック的に効率が良くない逆機械学習に基づく条件モーメント問題を解決する他の方法とは対照的である。

The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. A standard approach is to reduce the problem to a finite set of marginal moment conditions and apply the optimally weighted generalized method of moments (OWGMM), but this requires we know a finite set of identifying moments, can still be inefficient even if identifying, or can be unwieldy and impractical if we use a growing sieve of moments. Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem, which we term the variational method of moments (VMM) and which naturally enables controlling infinitely-many moments. We provide a detailed theoretical analysis of multiple VMM estimators, including based on kernel methods and neural networks, and provide appropriate conditions under which these estimators are consistent, asymptotically normal, and semiparametrically efficient in the full conditional moment model. This is in contrast to other recently proposed methods for solving conditional moment problems based on adversarial machine learning, which do not incorporate optimal weighting, do not establish asymptotic normality, and are not semiparametrically efficient.

翻訳日:2021-05-02 20:43:15 公開日:2020-12-17

# (参考訳) Maximum EntropyはMaximum Likelihoodと競合する

Maximum Entropy competes with Maximum Likelihood ( http://arxiv.org/abs/2012.09430v1 )

ライセンス: CC BY 4.0

A.E. Allahverdyan and N.H. Martirosyan

(参考訳) 最大エントロピー(MAXENT)法は、未知の確率を推定するための便利な非パラメトリックツールを提供するため、理論的および応用的な機械学習に多くの応用がある。この方法は、確率的推論に対する統計物理学の大きな貢献である。しかし、その妥当性の限界に対する体系的なアプローチは現在欠けている。ここでは、ベイズ決定理論の設定でMAXENTを研究する。すなわち、未知の確率に対してよく定義された事前ディリクレ密度が存在し、様々な推定器の品質と適用性を判定するために平均カルバック・ライブラー(KL)距離を用いることができると仮定する。これにより、様々なMAXENT制約の妥当性を評価し、その一般的な適用性を確認し、MAXENTを、事前分布への依存度が異なる推定器、すなわち正則化された最尤(ML)推定器およびベイズ推定器と比較することができる。MAXENTはスパースデータ領域に適用できるが、特定の種類の事前情報を必要とすることを示す。特にMAXENTは、推定される確率変数とその確率の間に事前の順位相関がある場合、最適に正則化されたMLを上回りうる。

The maximum entropy (MAXENT) method has a large number of applications in theoretical and applied machine learning, since it provides a convenient non-parametric tool for estimating unknown probabilities. The method is a major contribution of statistical physics to probabilistic inference. However, a systematic approach towards its validity limits is currently missing. Here we study MAXENT in a Bayesian decision theory set-up, i.e. assuming that there exists a well-defined prior Dirichlet density for unknown probabilities, and that the average Kullback-Leibler (KL) distance can be employed for deciding on the quality and applicability of various estimators. These allow us to evaluate the relevance of various MAXENT constraints, check its general applicability, and compare MAXENT with estimators having various degrees of dependence on the prior, viz. the regularized maximum likelihood (ML) and the Bayesian estimators. We show that MAXENT applies in sparse data regimes, but needs specific types of prior information. In particular, MAXENT can outperform the optimally regularized ML provided that there are prior rank correlations between the estimated random quantity and its probabilities.

翻訳日:2021-05-02 20:26:52 公開日:2020-12-17

# (参考訳) 機械学習による航空の環境影響低減を支援する

Helping Reduce Environmental Impact of Aviation with Machine Learning ( http://arxiv.org/abs/2012.09433v1 )

ライセンス: CC BY 4.0

Ashish Kapoor

(参考訳) 商業航空は気候変動への最大の貢献の1つである。本稿では,飛行時間を短縮する解決策を検討することで,航空の環境への影響を低減することを提案する。具体的には、まず風速予測の改善を検討し、飛行計画立案者がより効率的なルートを見つけるためにより良い情報を利用できるようにした。第2に,風速予測の不確実性を考慮し,探索と搾取を最適に切り替えることで,目的地への最高速経路を探索する航空機のルーティング手法を提案する。

Commercial aviation is one of the biggest contributors towards climate change. We propose to reduce environmental impact of aviation by considering solutions that would reduce the flight time. Specifically, we first consider improving winds aloft forecast so that flight planners could use better information to find routes that are efficient. Secondly, we propose an aircraft routing method that seeks to find the fastest route to the destination by considering uncertainty in the wind forecasts and then optimally trading-off between exploration and exploitation.

翻訳日:2021-05-02 20:11:38 公開日:2020-12-17

# (参考訳) FG-Net:Correlated Feature MiningとGeometric-Aware Modelingを活用した高速大規模LiDARポイントクラウド理解ネットワーク

FG-Net: Fast Large-Scale LiDAR Point Clouds Understanding Network Leveraging Correlated Feature Mining and Geometric-Aware Modelling ( http://arxiv.org/abs/2012.09439v1 )

ライセンス: CC BY-SA 4.0

Kangcheng Liu, Zhi Gao, Feng Lin, and Ben M. Chen

(参考訳) FG-Netは、1つのNVIDIA GTX 1080 GPUで正確かつリアルタイムなパフォーマンスを実現する、大規模なポイントクラウド理解のための一般的なディープラーニングフレームワークである。まず,後続の高レベルタスクを容易にするために,新しいノイズ・アウトリアーフィルタリング法を考案した。そこで本研究では,局所的特徴関係と幾何学的パターンを十分に活用できる,特徴マイニングと変形可能な畳み込みに基づく幾何認識モデルを用いた深層畳み込みニューラルネットワークを提案する。効率の面では,計算コストとメモリ消費をそれぞれ削減するために,逆密度サンプリング操作と特徴ピラミッドに基づく残差学習戦略を提案する。実世界の挑戦的データセットに関する大規模な実験は、我々のアプローチが精度と効率の点で最先端のアプローチより優れていることを示した。また,本手法の一般化能力を示すために,弱教師付き転送学習も行った。

This work presents FG-Net, a general deep learning framework for large-scale point clouds understanding without voxelizations, which achieves accurate and real-time performance with a single NVIDIA GTX 1080 GPU. First, a novel noise and outlier filtering method is designed to facilitate subsequent high-level tasks. For effective understanding purpose, we propose a deep convolutional neural network leveraging correlated feature mining and deformable convolution based geometric-aware modelling, in which the local feature relationships and geometric patterns can be fully exploited. For the efficiency issue, we put forward an inverse density sampling operation and a feature pyramid based residual learning strategy to save the computational cost and memory consumption respectively. Extensive experiments on real-world challenging datasets demonstrated that our approaches outperform state-of-the-art approaches in terms of accuracy and efficiency. Moreover, weakly supervised transfer learning is also conducted to demonstrate the generalization capacity of our method.

翻訳日:2021-05-02 19:49:07 公開日:2020-12-17

# (参考訳) 樹木オートエンコーダを用いた談話構造の教師なし学習

Unsupervised Learning of Discourse Structures using a Tree Autoencoder ( http://arxiv.org/abs/2012.09446v1 )

ライセンス: CC BY 4.0

Patrick Huber and Giuseppe Carenini

(参考訳) RSTやPDTBのような一般的な談話理論によって仮定された談話情報は、下流のNLPタスクの増加を改善し、重要な現実世界の応用と対話の肯定的な効果と相乗効果を示すことが示されている。言論を取り入れる手法はますます洗練されていくが、強固で一般的な言論構造の必要性は、通常、厳密な数のドメインで小さなデータセットで訓練された現在の言論パーサーによって十分に満たされていない。これにより、任意のタスクの予測がうるさいし、信頼できない。結果として生じる、高品質で高品質な談話ツリーの欠如は、さらなる進歩に深刻な制限をもたらす。この欠点を解消するために,潜在木誘導フレームワークを自動エンコーディング目的に拡張することにより,タスクに依存しない教師なし方式で木構造を生成する新しい手法を提案する。提案手法は,構文解析,談話解析などの木構造的目的に適用可能である。しかし,談話木を生成するのに特に難しいアノテーションプロセスのため,まず,より大きく多様な談話木バンクを生成する方法を開発した。本稿では,複数の領域における自然文の一般的な木構造を推定し,様々なタスクで有望な結果を示す。

Discourse information, as postulated by popular discourse theories, such as RST and PDTB, has been shown to improve an increasing number of downstream NLP tasks, showing positive effects and synergies of discourse with important real-world applications. While methods for incorporating discourse become more and more sophisticated, the growing need for robust and general discourse structures has not been sufficiently met by current discourse parsers, usually trained on small scale datasets in a strictly limited number of domains. This makes the prediction for arbitrary tasks noisy and unreliable. The overall resulting lack of high-quality, high-quantity discourse trees poses a severe limitation to further progress. In order to alleviate this shortcoming, we propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective. The proposed approach can be applied to any tree-structured objective, such as syntactic parsing, discourse parsing and others. However, due to the especially difficult annotation process to generate discourse trees, we initially develop a method to generate larger and more diverse discourse treebanks. In this paper, we infer general tree structures of natural text in multiple domains, showing promising results on a diverse set of tasks.

翻訳日:2021-05-02 19:19:40 公開日:2020-12-17

# (参考訳) 効率的な局所探索によるバランスの取れたグラフエッジ分割の強化

Enhancing Balanced Graph Edge Partition with Effective Local Search ( http://arxiv.org/abs/2012.09451v1 )

ライセンス: CC0 1.0

Zhenyu Guo, Mingyu Xiao, Yi Zhou, Dongxiang Zhang, Kian-Lee Tan

(参考訳) グラフパーティションは、並列グラフ処理システムにおいて、ワークロードのバランスを達成し、ジョブ完了時間を短縮するための重要なコンポーネントである。様々なパーティション戦略の中で、エッジパーティションは頂点パーティションよりもパワーローグラフの方が有望な性能を示しており、既存のグラフシステムではデフォルトパーティション戦略として広く採用されている。エッジセットを複数のバランスのとれた部分に分割することで、コピーされた頂点の総数を最小化するグラフエッジ分割問題は、最適化とアルゴリズムの観点から広く研究されている。本稿では,既存の手法による分割結果を改善するために,局所探索アルゴリズムについて検討する。具体的には,2つの新しい概念,すなわち調整可能なエッジとブロックを提案する。これらの結果をもとに,max-flowモデルの特性を生かした検索アルゴリズムを改良し,欲張りなヒューリスティックを開発した。アルゴリズムの性能を評価するため,まず近似品質の観点から適切な理論的解析を行う。この問題に対する既知の近似比を大幅に改善する。そして、多数のベンチマークデータセットと最先端のエッジパーティション戦略に関する広範な実験を行う。その結果,提案する局所探索フレームワークは,グラフ分割のクオリティをさらに向上させることができることがわかった。

Graph partition is a key component to achieve workload balance and reduce job completion time in parallel graph processing systems. Among the various partition strategies, edge partition has demonstrated more promising performance in power-law graphs than vertex partition and thereby has been more widely adopted as the default partition strategy by existing graph systems. The graph edge partition problem, which is to split the edge set into multiple balanced parts to minimize the total number of copied vertices, has been widely studied from the view of optimization and algorithms. In this paper, we study local search algorithms for this problem to further improve the partition results from existing methods. More specifically, we propose two novel concepts, namely adjustable edges and blocks. Based on these, we develop a greedy heuristic as well as an improved search algorithm utilizing the property of the max-flow model. To evaluate the performance of our algorithms, we first provide adequate theoretical analysis in terms of the approximation quality. We significantly improve the previously known approximation ratio for this problem. Then we conduct extensive experiments on a large number of benchmark datasets and state-of-the-art edge partition strategies. The results show that our proposed local search framework can further improve the quality of graph partition by a wide margin.
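
The objective named in the abstract above, minimizing the total number of copied vertices while keeping the parts balanced, can be made concrete with a small scoring function. The sketch is an illustration under assumptions: a vertex that appears in `k` parts is counted as `k - 1` copies, and balance is reported as the largest part size. The paper's precise definitions may differ, and the function names are hypothetical.

```python
from collections import defaultdict


def copied_vertices(edge_parts):
    """Count vertex copies induced by an edge partition.

    edge_parts is a list of parts; each part is a list of (u, v) edges.
    A vertex touching edges in k different parts contributes k - 1 copies.
    """
    parts_of = defaultdict(set)
    for part_id, edges in enumerate(edge_parts):
        for u, v in edges:
            parts_of[u].add(part_id)
            parts_of[v].add(part_id)
    return sum(len(parts) - 1 for parts in parts_of.values())


def max_part_size(edge_parts):
    """Largest number of edges in any part (a simple balance measure)."""
    return max(len(edges) for edges in edge_parts)


# Path graph a - b - c - d split into two parts in two different ways.
contiguous = [[("a", "b"), ("b", "c")], [("c", "d")]]
interleaved = [[("a", "b"), ("c", "d")], [("b", "c")]]
copies_contiguous = copied_vertices(contiguous)
copies_interleaved = copied_vertices(interleaved)
```

A local search of the kind the abstract describes would move edges between parts and keep a move when a score like `copied_vertices` drops without violating the balance bound; the paper's adjustable-edge and block concepts are not reproduced here.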

翻訳日:2021-05-02 19:01:54 公開日:2020-12-17

# (参考訳) 肺がん予測のための半教師付き自己訓練法

A new semi-supervised self-training method for lung cancer prediction ( http://arxiv.org/abs/2012.09472v1 )

ライセンス: CC0 1.0

Kelvin Shak, Mundher Al-Shabi, Andrea Liew, Boon Leong Lan, Wai Yee Chan, Kwan Hoong Ng, Maxine Tan

(参考訳) 背景と目的: 肺がんは死亡率が高く、患者がステージ3以上で受診することが多いため、早期発見が重要である。コンピュータ断層撮影(CT)スキャンから結節の検出と分類を同時に行う手法は比較的少ない。さらに、肺がん予測に半教師付き学習を用いた研究はほとんどない。本研究では,約4,000件のCTスキャンからなる包括的なCT肺検診データセット上で,最先端のSelf-training with Noisy Student法を用いて肺結節を検出・分類する完全なエンドツーエンドの手法を提示する。方法: 本研究では,LUNA16,LIDC,NLSTの3つのデータセットを用いた。まず検出段階では,3次元深層畳み込みニューラルネットワークモデルを用いて肺結節を検出する。Maxout Local-Global Networkと呼ばれる分類モデルは、形状特徴を含むグローバルな特徴を検出する非ローカルネットワーク、結節テクスチャを含む局所的特徴を検出する残差ブロック、および結節の変動を検出するMaxout層を用いる。我々は,ラベルのないNLSTデータセット上で,肺がんを予測する初のSelf-training with Noisy Studentモデルを訓練した。次に,Mixup正則化を行い,提案手法を強化するとともに誤ったラベルに対する頑健性を与えた。結果と結論: 我々の新しいMixup Maxout Local-Globalネットワークは、NLSTデータセットから得た2,005件の完全に独立したテストスキャンに対して0.87のAUCを達成した。提案手法は、DeLong検定 (p = 0.0001) により、有意水準5%で次に性能の高い手法を有意に上回った。本研究は,Self-training with Noisy StudentとMixup正則化を組み合わせて肺がんを予測する,新しい完全なエンドツーエンドの手法を提示する。2,005件のスキャンからなる完全に独立したデータセット上で,他の手法より画像数が多い場合でも最先端の性能を達成した。

Background and Objective: Early detection of lung cancer is crucial as it has a high mortality rate, with patients commonly presenting with the disease at stage 3 and above. There are only relatively few methods that simultaneously detect and classify nodules from computed tomography (CT) scans. Furthermore, very few studies have used semi-supervised learning for lung cancer prediction. This study presents a complete end-to-end scheme to detect and classify lung nodules using the state-of-the-art Self-training with Noisy Student method on a comprehensive CT lung screening dataset of around 4,000 CT scans. Methods: We used three datasets, namely LUNA16, LIDC and NLST, for this study. We first utilise a three-dimensional deep convolutional neural network model to detect lung nodules in the detection stage. The classification model known as Maxout Local-Global Network uses non-local networks to detect global features including shape features, residual blocks to detect local features including nodule texture, and a Maxout layer to detect nodule variations. We trained the first Self-training with Noisy Student model to predict lung cancer on the unlabelled NLST datasets. Then, we performed Mixup regularization to enhance our scheme and provide robustness to erroneous labels. Results and Conclusions: Our new Mixup Maxout Local-Global network achieves an AUC of 0.87 on 2,005 completely independent testing scans from the NLST dataset. Our new scheme significantly outperformed the next highest performing method at the 5% significance level using DeLong's test (p = 0.0001). This study presents a new complete end-to-end scheme to predict lung cancer using Self-training with Noisy Student combined with Mixup regularization. On a completely independent dataset of 2,005 scans, we achieved state-of-the-art performance even with more images as compared to other methods.
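
The Mixup regularization mentioned in the abstract above blends pairs of training samples and their labels. The following is a minimal, generic sketch of that operation on plain Python lists. It is not the authors' pipeline: the `alpha` value and the function signature are assumptions, and real CT inputs would be 3D arrays instead of short lists.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=0.2, lam=None, rng=random):
    """Blend two samples and their (one-hot or soft) labels.

    lam is drawn from Beta(alpha, alpha) unless given explicitly; the mixed
    sample is lam * x_i + (1 - lam) * x_j, and likewise for the labels.
    """
    if lam is None:
        lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam


# Fixed lam for a reproducible illustration.
x_mix, y_mix, lam_used = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)

# Random lam, seeded so that repeated runs give the same draw.
_, _, lam_random = mixup([1.0], [1.0], [0.0], [0.0], rng=random.Random(0))
```

Because the mixed label is soft, a single wrong pseudo-label contributes only a fraction of the target, which is consistent with the robustness to erroneous labels that the abstract attributes to Mixup.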

翻訳日:2021-05-02 18:47:30 公開日:2020-12-17

# (参考訳) 知能の計算原理:ニューラルネットワークによる学習と推論

Computational principles of intelligence: learning and reasoning with neural networks ( http://arxiv.org/abs/2012.09477v1 )

ライセンス: CC BY 4.0

Abel Torres Montoya

(参考訳) 機械学習と人工知能に対する大きな成果と現在の関心にもかかわらず、汎用的で効率的な問題解決を可能にする知性理論の探求はほとんど進歩していない。この研究は、3つの原則に基づいた新しい知能の枠組みを提案し、この方向性に貢献しようとするものである。まず、学習した入力表現の生成とミラーリングの性質。第二に、学習、問題解決、想像力のための基礎的で本質的で反復的なプロセスです。第3に、抑制規則を用いた因果合成表現に対する推論機構のアドホックチューニング。これらの原則は、解釈可能性、継続的な学習、常識などを提供するシステムアプローチを生み出します。一般的な問題解決手法として、人間指向のツールとして、そして最後に、脳の情報処理のモデルとして、このフレームワークが開発されている。

Despite significant achievements and current interest in machine learning and artificial intelligence, the quest for a theory of intelligence, allowing general and efficient problem solving, has made little progress. This work tries to contribute in this direction by proposing a novel framework of intelligence based on three principles. First, the generative and mirroring nature of learned representations of inputs. Second, a grounded, intrinsically motivated and iterative process for learning, problem solving and imagination. Third, an ad hoc tuning of the reasoning mechanism over causal compositional representations using inhibition rules. Together, those principles create a systems approach offering interpretability, continuous learning, common sense and more. This framework is being developed from the following perspectives: as a general problem solving method, as a human oriented tool and finally, as a model of information processing in the brain.

翻訳日:2021-05-02 18:30:13 公開日:2020-12-17

# (参考訳) 3次元CNNのグローバルローカルアテンションを用いた弱改善された行動局在と行動認識

Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN ( http://arxiv.org/abs/2012.09542v1 )

ライセンス: CC BY 4.0

Novanto Yudistira, Muthu Subash Kavitha, Takio Kurita

(参考訳) 3D畳み込みニューラルネットワーク(3D CNN)は、ビデオシーケンスなどの3Dデータに関する空間的および時間的情報をキャプチャする。しかし,畳み込み・プーリング機構により,情報損失は避けられないように思われる。 3d cnnの視覚的な説明と分類を改善するために,(1)学習した3dresnextネットワークを用いて,局所的(グローバル局所)離散勾配を階層的に集約し,(2)注意ゲーティングネットワークを実装し,動作認識の精度を向上させる手法を提案する。提案手法は,3d cnnにおけるグローバル・ローカル・アテンション (global-local attention) と呼ばれる各層の有用性を示すことを目的としている。まず、3dresnextを訓練し、最大予測クラスに関するバックプロパゲーションを用いたアクション分類に適用する。各層の勾配と活性化はアップサンプリングされる。その後、アグリゲーションはよりニュアンス的な注意を喚起するために使われ、予測されたクラスの入力ビデオの最も重要な部分を指し示している。我々は最終位置決めに最終注意の輪郭閾値を用いる。 3dcamによる細粒度映像によるトリミング映像の空間的および時間的動作の定位評価を行った。実験の結果,提案手法は視覚的な説明と識別的注意を生じさせることがわかった。さらに,各層における注意ゲーティングによる行動認識は,ベースラインモデルよりも優れた分類結果が得られる。

3D Convolutional Neural Network (3D CNN) captures spatial and temporal information on 3D data such as video sequences. However, due to the convolution and pooling mechanism, the information loss seems unavoidable. To improve the visual explanations and classification in 3D CNN, we propose two approaches; i) aggregate layer-wise global to local (global-local) discrete gradients using trained 3DResNext network, and ii) implement attention gating network to improve the accuracy of the action recognition. The proposed approach intends to show the usefulness of every layer termed as global-local attention in 3D CNN via visual attribution, weakly-supervised action localization, and action recognition. Firstly, the 3DResNext is trained and applied for action classification using backpropagation concerning the maximum predicted class. The gradients and activations of every layer are then up-sampled. Later, aggregation is used to produce more nuanced attention, which points out the most critical part of the predicted class's input videos. We use contour thresholding of final attention for final localization. We evaluate spatial and temporal action localization in trimmed videos using fine-grained visual explanation via 3DCam. Experimental results show that the proposed approach produces informative visual explanations and discriminative attention. Furthermore, the action recognition via attention gating on each layer produces better classification results than the baseline model.

翻訳日:2021-05-02 18:10:53 公開日:2020-12-17

# (参考訳) 変圧器を用いた少数ショットシーケンス学習

Few-shot Sequence Learning with Transformers ( http://arxiv.org/abs/2012.09543v1 )

ライセンス: CC BY 4.0

Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

(参考訳) 少数のトレーニング例でのみ提供される新しいタスクの学習を目的としている。本研究では,データポイントがトークン列である設定において,少数ショット学習を行い,トランスフォーマーに基づく効率的な学習アルゴリズムを提案する。最も簡単な設定では、実行すべき特定のタスクを表す入力シーケンスにトークンを付加し、ラベル付き例が少ないため、このトークンの埋め込みをオンザフライで最適化できることを示す。当社のアプローチでは,メタラーニングや少ショットラーニングの文献で現在普及しているアダプタ層や第2次微分計算といったモデルアーキテクチャの複雑な変更は必要としない。様々なタスクに対する我々のアプローチを実証し、いくつかのモデル変種およびベースラインアプローチの一般化特性を解析する。特に,構成的タスク記述子により性能が向上することを示す。実験により、我々のアプローチは、計算効率が向上しつつ、少なくとも他の手法と同様に動作することが示された。

Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples. Our approach does not require complicated changes to the model architecture such as adapter layers nor computing second order derivatives as is currently popular in the meta-learning and few-shot learning literature. We demonstrate our approach on a variety of tasks, and analyze the generalization properties of several model variants and baseline approaches. In particular, we show that compositional task descriptors can improve performance. Experiments show that our approach works at least as well as other methods, while being more computationally efficient.
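
The idea of optimizing only a task-token embedding can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes a frozen "model" that mean-pools a sequence and applies a fixed linear layer with a sigmoid, and the names `predict` and `fit_task_embedding` are hypothetical. Only the appended task embedding is updated from a few labeled examples.

```python
import math

def predict(frozen_w, token_embs, task_emb):
    """Toy frozen model (an assumption): append the task token, mean-pool, linear + sigmoid."""
    seq = token_embs + [task_emb]  # the task token is appended to the input sequence
    dim = len(frozen_w)
    pooled = [sum(tok[d] for tok in seq) / len(seq) for d in range(dim)]
    logit = sum(w * x for w, x in zip(frozen_w, pooled))
    return 1.0 / (1.0 + math.exp(-logit))

def fit_task_embedding(frozen_w, examples, steps=300, lr=1.0):
    """Gradient descent on the task embedding only; frozen_w is never modified."""
    task_emb = [0.0] * len(frozen_w)
    for _ in range(steps):
        grad = [0.0] * len(frozen_w)
        for token_embs, label in examples:
            p = predict(frozen_w, token_embs, task_emb)
            # Gradient of binary cross-entropy w.r.t. the task embedding
            # for this mean-pooled logistic model.
            scale = (p - label) / (len(token_embs) + 1)
            grad = [g + scale * w for g, w in zip(grad, frozen_w)]
        task_emb = [e - lr * g / len(examples) for e, g in zip(task_emb, grad)]
    return task_emb
```

In a real Transformer the gradient would come from automatic differentiation rather than the closed form used here, but the structure is the same: the model weights stay fixed and only the embedding moves.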

翻訳日:2021-05-02 18:02:20 公開日:2020-12-17

# (参考訳) 発展途上国の疾病発生に備えるツールとしてのcovid-19感情モニタリング

COVID-19 Emotion Monitoring as a Tool to Increase Preparedness for Disease Outbreaks in Developing Regions ( http://arxiv.org/abs/2012.12184v1 )

ライセンス: CC BY 4.0

Santiago Cortes and Juan Muñoz and David Betancur and Mauricio Toro

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックは、病院の入院管理から不安やうつ病などの精神疾患の緩和など、多くの課題を引き起こした。本稿では,最先端自然言語処理モデルに基づくtwitter感情監視システムを開発することにより,後発の問題に対する解決策を提案する。このシステムは、都市のアカウント上の6つの異なる感情をモニタし、政治家や保健当局のtwitterアカウントも監視する。感情モニターを匿名で使用することで、保健当局と民間の健康保険会社は、自殺や臨床抑うつなどの問題に取り組む戦略を開発することができる。そのようなタスクのために選択されたモデルは、スペインコーパス(BETO)で事前訓練された変換器(BERT)からの双方向エンコーダ表現である。モデルは検証データセットでうまく機能した。このシステムは、コロンビアのcovid-19のシミュレーションとデータ分析のためのwebアプリケーションの一部として、https://epidemiologia-matematica.orgで公開されている。

The COVID-19 pandemic brought many challenges, from hospital-occupation management to lock-down mental-health repercussions such as anxiety or depression. In this work, we present a solution for the latter problem by developing a Twitter emotion-monitor system based on a state-of-the-art natural-language processing model. The system monitors six different emotions on accounts in cities, as well as politicians' and health authorities' Twitter accounts. With an anonymous use of the emotion monitor, health authorities and private health-insurance companies can develop strategies to tackle problems such as suicide and clinical depression. The model chosen for such a task is a Bidirectional-Encoder Representations from Transformers (BERT) pre-trained on a Spanish corpus (BETO). The model performed well on a validation dataset. The system is deployed online as part of a web application for simulation and data analysis of COVID-19, in Colombia, available at https://epidemiologia-matematica.org.

翻訳日:2021-05-02 17:46:11 公開日:2020-12-17

# (参考訳) トランスフォーマーを用いた事象連鎖の自己回帰推論

Autoregressive Reasoning over Chains of Facts with Transformers ( http://arxiv.org/abs/2012.11321v1 )

ライセンス: CC BY 4.0

Ruben Cartuyvels, Graham Spinks and Marie-Francine Moens

(参考訳) 本稿では,テキストスニペットの形で関連する事実を検索し,自然言語による質問とその答えを求めるマルチホップ説明再生のための反復推論アルゴリズムを提案する。マルチホップ推論のための複数の証拠や事実の組み合わせは、推論に必要な情報源の数が増えるとますます難しくなる。提案アルゴリズムは, コーパスからの事象の選択を自己回帰的に分解し, 以前に選択した事実に対して次の繰り返しを条件にすることで, この問題に対処する。これにより、ペアワイズな学習とランクの損失が利用できます。本手法は,TextGraphs 2019 および 2020 Shared Tasks のデータセットを用いて,説明再生のための検証を行う。このタスクの既存の作業は、独立して事実を評価するか、事実の連鎖を人工的に制限する。本手法は, 事前学習したトランスフォーマーモデルを用いて, 精度, トレーニング時間, 推論効率の面では, 従来よりも優れていることを示す。

This paper proposes an iterative inference algorithm for multi-hop explanation regeneration that retrieves relevant factual evidence in the form of text snippets, given a natural language question and its answer. Combining multiple sources of evidence or facts for multi-hop reasoning becomes increasingly hard when the number of sources needed to make an inference grows. Our algorithm copes with this by decomposing the selection of facts from a corpus autoregressively, conditioning the next iteration on previously selected facts. This allows us to use a pairwise learning-to-rank loss. We validate our method on datasets of the TextGraphs 2019 and 2020 Shared Tasks for explanation regeneration. Existing work on this task either evaluates facts in isolation or artificially limits the possible chains of facts, thus limiting multi-hop inference. We demonstrate that our algorithm, when used with a pre-trained transformer model, outperforms the previous state-of-the-art in terms of precision, training time and inference efficiency.
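
The autoregressive selection loop described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a simple word-overlap score stands in for the pre-trained transformer ranker, and `overlap_score` and `select_facts` are hypothetical names. The point it shows is that each hop is scored against the question plus the facts already chosen.

```python
def overlap_score(context_tokens, fact):
    """Stand-in scorer (an assumption): number of words shared with the context."""
    return len(context_tokens & set(fact.lower().split()))

def select_facts(question, facts, max_hops=3):
    """Greedy autoregressive selection: condition each hop on earlier picks."""
    context = set(question.lower().split())
    remaining = list(facts)
    chain = []
    for _ in range(max_hops):
        if not remaining:
            break
        best = max(remaining, key=lambda f: overlap_score(context, f))
        if overlap_score(context, best) == 0:
            break  # nothing left that connects to the current context
        chain.append(best)
        remaining.remove(best)
        context |= set(best.lower().split())  # next hop sees the selected fact
    return chain
```

A fact that shares no words with the question can still be reached on a later hop once an intermediate fact has been added to the context, which is what scoring facts in isolation would miss.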

翻訳日:2021-05-02 17:41:44 公開日:2020-12-17

# (参考訳) リカレントオートエンコーダからの一貫性指向潜在符号を用いた軌道塩分検出

Trajectory saliency detection using consistency-oriented latent codes from a recurrent auto-encoder ( http://arxiv.org/abs/2012.09573v1 )

ライセンス: CC BY 4.0

L. Maczyta, P. Bouthemy and O. Le Meur

(参考訳) 本稿では,ビデオシーケンスから進行動的サリエンシを検出することに関心がある。より正確には、私たちは動きに関連する給与に興味があり、時間とともに徐々に現れる可能性が高い。アラームの起動、追加処理の献身、特定のイベントの検出に関連がある。軌道は、進行的な動的塩分検出をサポートする最善の方法である。そのため、トラジェクティブ・サリエンシーについて論じる。与えられた文脈に関連する共通の動きパターンを共有する通常の軌跡から逸脱した場合、軌跡は有能である。まず、軌跡のコンパクトかつ識別的な表現が必要である。ほぼ)教師なしの学習ベースのアプローチを採用しています。再帰オートエンコーダによって推定される潜在コードは、所望の表現を提供する。さらに、オートエンコーダ損失関数を用いて、通常の(類似した)軌道の整合性を強制する。軌道コードから正規性を考慮したプロトタイプコードまでの距離は、健全な軌道を検出する手段である。我々は,合成および実軌道データセット上での軌道塩分検出手法を検証し,その異なる成分の寄与を強調する。本手法は,駅で取得した歩行者軌跡の公開データセット(alahi 2014)から得られた複数のシナリオにおいて,既存の手法に勝ることを示す。

In this paper, we are concerned with the detection of progressive dynamic saliency from video sequences. More precisely, we are interested in saliency related to motion and likely to appear progressively over time. It can be relevant to trigger alarms, to dedicate additional processing or to detect specific events. Trajectories represent the best way to support progressive dynamic saliency detection. Accordingly, we will talk about trajectory saliency. A trajectory will be qualified as salient if it deviates from normal trajectories that share a common motion pattern related to a given context. First, we need a compact yet discriminative representation of trajectories. We adopt a (nearly) unsupervised learning-based approach. The latent code estimated by a recurrent auto-encoder provides the desired representation. In addition, we enforce consistency for normal (similar) trajectories through the auto-encoder loss function. The distance of the trajectory code to a prototype code accounting for normality is the means to detect salient trajectories. We validate our trajectory saliency detection method on synthetic and real trajectory datasets, and highlight the contributions of its different components. We show that our method outperforms existing methods on several scenarios drawn from the publicly available dataset of pedestrian trajectories acquired in a railway station (Alahi 2014).
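
The decision rule "distance of the trajectory code to a prototype code" can be sketched in a few lines. The abstract does not say how the prototype or the threshold is chosen, so the component-wise median prototype and the median-plus-scaled-MAD threshold below are assumptions made for illustration, and `salient_trajectories` is a hypothetical name. The latent codes are taken as given (in the paper they come from a recurrent auto-encoder).

```python
import math
from statistics import median

def salient_trajectories(codes, scale=3.0):
    """Flag codes that lie far from a prototype of the (mostly normal) codes."""
    dim = len(codes[0])
    # Assumed prototype: component-wise median of all codes.
    prototype = [median(c[d] for c in codes) for d in range(dim)]
    dists = [math.dist(c, prototype) for c in codes]
    # Assumed robust threshold: median distance plus a multiple of the MAD.
    med = median(dists)
    mad = median(abs(d - med) for d in dists)
    return [d > med + scale * mad for d in dists]
```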

翻訳日:2021-05-02 17:23:17 公開日:2020-12-17

# (参考訳) 非対称マルチタスク特徴学習におけるタスク不確かさ損失の負の移動

Task Uncertainty Loss Reduce Negative Transfer in Asymmetric Multi-task Feature Learning ( http://arxiv.org/abs/2012.09575v1 )

ライセンス: CC BY 4.0

Rafael Peres da Silva, Chayaporn Suphavilai, Niranjan Nagarajan

(参考訳) マルチタスク学習(MTL)は、限られた訓練データに基づいて目標タスクを学習しなければならない設定で頻繁に使用されるが、関連する補助タスクから知識を活用できる。 mtlはシングルタスク学習(stl)と比較して全体的なタスクパフォーマンスを向上させることができるが、これらの改善は負の転送(nt)を隠すことができる。非対称マルチタスク特徴学習(AMTFL)は、損失値の高いタスクが他のタスクを学習するための特徴表現に与える影響を小さくすることで、この問題に対処しようとするアプローチである。タスク損失値は必ずしも特定のタスクのモデルの信頼性を示すものではない。本稿では,2つの直交データセット(画像認識と薬理ゲノミクス)にNTの例を示し,課題間の相対的信頼度を把握し,タスク損失の重みを設定することで,この課題に対処する。提案手法は,堅牢なMTLを実現するための新しいアプローチを提供するNTを削減できることを示す。

Multi-task learning (MTL) is frequently used in settings where a target task has to be learnt based on limited training data, but knowledge can be leveraged from related auxiliary tasks. While MTL can improve task performance overall relative to single-task learning (STL), these improvements can hide negative transfer (NT), where STL may deliver better performance for many individual tasks. Asymmetric multitask feature learning (AMTFL) is an approach that tries to address this by allowing tasks with higher loss values to have smaller influence on feature representations for learning other tasks. Task loss values do not necessarily indicate reliability of models for a specific task. We present examples of NT in two orthogonal datasets (image recognition and pharmacogenomics) and tackle this challenge by using aleatoric homoscedastic uncertainty to capture the relative confidence between tasks, and set weights for task loss. Our results show that this approach reduces NT providing a new approach to enable robust MTL.
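
One common way to weight task losses by homoscedastic uncertainty is to learn a log-variance per task. The sketch below assumes that widely used formulation (half the loss scaled by the inverse variance, plus half the log-variance as a regularizer); the abstract does not give the exact loss used in the paper, and `uncertainty_weighted_loss` is a hypothetical name.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Sum of 0.5 * exp(-s_i) * L_i + 0.5 * s_i, where s_i = log(sigma_i^2).

    A larger s_i (less confident task) shrinks that task's weight exp(-s_i),
    while the + 0.5 * s_i term keeps s_i from growing without bound.
    """
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))
```

For a fixed task loss L, this expression is minimized at s = log(L), so a task with a persistently high loss ends up with a proportionally smaller weight.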

翻訳日:2021-05-02 17:22:17 公開日:2020-12-17

# (参考訳) 金融機関向け高出力ニューラルネットワークモデルによる感性データ検出

Sensitive Data Detection with High-Throughput Neural Network Models for Financial Institutions ( http://arxiv.org/abs/2012.09597v1 )

ライセンス: CC0 1.0

Anh Truong, Austin Walters, Jeremy Goodsitt

(参考訳) 名前付きエンティティ認識は多くの分野で広く研究されている。しかし, ラベル付きデータセットが公開されていないため, 金融機関における生産システムへのセンシティブな実体検出の適用は十分に検討されていない。本稿では、内部および合成データセットを用いて、非構造化データフォーマットと構造化データフォーマットの両方において、金融機関内で一般的に見られるNPI(Nonpublic Personally Identibility)情報を検出する様々な方法を評価する。 CNN,LSTM,BiLSTM-CRF,CNN-CRFといった文字レベルのニューラルネットワークモデルは,複数のデータフォーマット上でのエンティティ検出と,表付きデータセット上でのカラム単位のエンティティ予測という2つの予測タスクについて検討した。これらのモデルを,f1-score,精度,リコール,スループットに関して,実データと合成データの両方における他の標準的なアプローチと比較した。実際のデータセットには、内部構造化データと、手動タグ付きラベル付き公開eメールデータが含まれる。実験の結果,CNNモデルは精度とスループットにおいてシンプルだが有効であり,本運用環境に展開する最も適した候補モデルであることが示唆された。最後に、データ制限、データラベリング、データエンティティの固有の重複について学んだ教訓をいくつか提供する。

Named Entity Recognition has been extensively investigated in many fields. However, the application of sensitive entity detection for production systems in financial institutions has not been well explored due to the lack of publicly available, labeled datasets. In this paper, we use internal and synthetic datasets to evaluate various methods of detecting NPI (Nonpublic Personally Identifiable) information commonly found within financial institutions, in both unstructured and structured data formats. Character-level neural network models including CNN, LSTM, BiLSTM-CRF, and CNN-CRF are investigated on two prediction tasks: (i) entity detection on multiple data formats, and (ii) column-wise entity prediction on tabular datasets. We compare these models with other standard approaches on both real and synthetic data, with respect to F1-score, precision, recall, and throughput. The real datasets include internal structured data and public email data with manually tagged labels. Our experimental results show that the CNN model is simple yet effective with respect to accuracy and throughput and thus, is the most suitable candidate model to be deployed in the production environment(s). Finally, we provide several lessons learned on data limitations, data labelling and the intrinsic overlap of data entities.

翻訳日:2021-05-02 17:18:59 公開日:2020-12-17

# (参考訳) XAI-P-T: 説明可能な人工知能の実践から理論へ

XAI-P-T: A Brief Review of Explainable Artificial Intelligence from Practice to Theory ( http://arxiv.org/abs/2012.09636v1 )

ライセンス: CC BY 4.0

Nazanin Fouladgar and Kary Främling

(参考訳) 本稿では,いくつかの基礎文献で確認された説明可能なAI(XAI)の実践的・理論的側面について報告する。 XAIの背景の表現には膨大な作業があるが、コーパスの多くは思考の個別の方向を指し示している。実践と理論の同時に文学に洞察を与えることは、この分野ではまだギャップである。これは、初期のXAI研究者の学習プロセスを促進し、経験豊富なXAI学者に明るい立場を与えるためである。ここではまずブラックボックスの説明のカテゴリに注目し,実例を示す。その後、多分野の体に理論的な説明が根拠となっているかについて議論する。最後に、今後の作品の方向性を示す。

In this work, we report the practical and theoretical aspects of Explainable AI (XAI) identified in some fundamental literature. Although there is a vast body of work on representing the XAI backgrounds, most of the corpora pinpoint a discrete direction of thought. Providing insights into literature in practice and theory concurrently is still a gap in this field. This is important, as such a connection facilitates the learning process for early-stage XAI researchers and gives experienced XAI scholars a clear standpoint. Accordingly, we first focus on the categories of black-box explanation and give a practical example. Later, we discuss how explanation has been theoretically grounded in the body of multidisciplinary fields. Finally, some directions for future work are presented.

翻訳日:2021-05-02 16:35:38 公開日:2020-12-17

# (参考訳) 映画脚本とストーリーに応用する概念的ソフトウェア工学

Conceptual Software Engineering Applied to Movie Scripts and Stories ( http://arxiv.org/abs/2012.11319v1 )

ライセンス: CC BY 4.0

Sabah Al-Fedaghi

(参考訳) 本研究は,他の研究分野に適用可能な,ソフトウェア工学ツール,概念モデリングの別の応用について紹介する。ソフトウェア工学と他の分野との関係を強化する一つの方法は、これらの分野の特異性に対処できる概念モデリングを行う良い方法を開発することである。この研究は人文科学と社会科学に焦点を合わせ、通常は抽象機械や(抽象的)機械から離れて、より柔らかいと考えられる。具体的には、ストーリーや映画の脚本の領域におけるソフトウェア工学ツール(UMLなど)としての概念モデリングに焦点を当てます。人文科学と社会科学の研究者たちは、エンジニアが行うような形式化は使っていないかもしれないが、概念モデリングは有用だと考えている。現在のモデリング技術(UMLなど)はこのタスクで失敗する。同様の概念モデリング言語(ConMLなど)は、人文科学や社会科学を念頭に置いて提案され、あらゆるものをモデル化することができる。この研究は、ソフトウェアモデリング技術であるthinging machine(tm)が映画脚本やストーリーに適用されるこの方向のベンチャーである。本稿では,映画脚本や物語の図形的静的・動的モデルを開発するための新しいアプローチを提案する。 tmモデルダイアグラムはナラティブな談話の中立的で独立した表現であり、参加者間のコミュニケーション手段として使用できる。提示された例は、プロップの妖精のモデルによる例で、鉄道児童と実際の映画の脚本は、アプローチの可能性を示唆しているようである。

This study introduces another application of software engineering tools, conceptual modeling, which can be applied to other fields of research. One way to strengthen the relationship between software engineering and other fields is to develop a good way to perform conceptual modeling that is capable of addressing the peculiarities of these fields of study. This study concentrates on humanities and social sciences, which are usually considered softer and further away from abstractions and (abstract) machines. Specifically, we focus on conceptual modeling as a software engineering tool (e.g., UML) in the area of stories and movie scripts. Researchers in the humanities and social sciences might not use the same degree of formalization that engineers do, but they still find conceptual modeling useful. Current modeling techniques (e.g., UML) fail in this task because they are geared toward the creation of software systems. A similar conceptual modeling language (e.g., ConML) has been proposed with the humanities and social sciences in mind and, as claimed, can be used to model anything. This study is a venture in this direction, where a software modeling technique, Thinging Machine (TM), is applied to movie scripts and stories. The paper presents a novel approach to developing diagrammatic static/dynamic models of movie scripts and stories. The TM model diagram serves as a neutral and independent representation for narrative discourse and can be used as a communication instrument among participants. The examples presented include examples from Propp's model of fairytales; The Railway Children and an actual movie script seem to point to the viability of the approach.

翻訳日:2021-05-02 16:28:45 公開日:2020-12-17

# (参考訳) RainBench: 衛星画像による世界の降水予測に向けて

RainBench: Towards Global Precipitation Forecasting from Satellite Imagery ( http://arxiv.org/abs/2012.09670v1 )

ライセンス: CC BY 4.0

Christian Schroeder de Witt, Catherine Tong, Valentina Zantedeschi, Daniele De Martini, Freddie Kalaitzis, Matthew Chantry, Duncan Watson-Parris, Piotr Bilinski

(参考訳) 激しい降雨や暴風雨のような極端な降雨は、発展途上国の経済や生活を日常的に破壊する。気候変動はこの問題をさらに悪化させる。データ駆動型ディープラーニングアプローチは、そのようなイベントを緩和するために、正確な複数日予測へのアクセスを広げる可能性がある。しかし、世界の降水量予測の研究に特化したベンチマークデータセットは今のところ存在しない。本稿では,データ駆動降水予測のための新しいマルチモーダルベンチマークデータセットである \textbf{RainBench} を紹介する。これには、シミュレーションされた衛星データ、era5の再分析製品からの関連する気象データの選択、およびimergの降水データが含まれる。また、大規模な降水データセットを効率的に処理するライブラリである \textbf{PyRain} もリリースしています。本研究では,提案するデータセットを広範囲に分析し,中規模降水予測タスクのベースラインを2つ確立する。最後に,既存の気象予報手法について考察し,今後の研究方法を提案する。

Extreme precipitation events, such as violent rainfall and hail storms, routinely ravage economies and livelihoods around the developing world. Climate change further aggravates this issue. Data-driven deep learning approaches could widen the access to accurate multi-day forecasts, to mitigate against such events. However, there is currently no benchmark dataset dedicated to the study of global precipitation forecasts. In this paper, we introduce RainBench, a new multi-modal benchmark dataset for data-driven precipitation forecasting. It includes simulated satellite data, a selection of relevant meteorological data from the ERA5 reanalysis product, and IMERG precipitation data. We also release PyRain, a library to process large precipitation datasets efficiently. We present an extensive analysis of our novel dataset and establish baseline results for two benchmark medium-range precipitation forecasting tasks. Finally, we discuss existing data-driven weather forecasting methodologies and suggest future research avenues.

翻訳日:2021-05-02 16:16:18 公開日:2020-12-17

# (参考訳) GANトレーニングにおける燃焼モード崩壊:ヘッセン固有値を用いた実証分析

Combating Mode Collapse in GAN training: An Empirical Analysis using Hessian Eigenvalues ( http://arxiv.org/abs/2012.09673v1 )

ライセンス: CC BY 4.0

Ricard Durall, Avraam Chatzimichailidis, Peter Labus and Janis Keuper

(参考訳) generative adversarial networks (gans) は最先端の成果を画像生成に提供します。しかし、非常に強力であるにもかかわらず、訓練は非常に困難である。これは特に、非常に非凸な最適化空間が多くの不安定性をもたらすために引き起こされる。中でもモード崩壊は、最も厄介なもののひとつとして際立っている。この望ましくないイベントは、モデルがデータ分散のいくつかのモードのみに適合できる場合に発生するが、その大半は無視される。本研究では,2次勾配情報を用いてモード崩壊と戦う。そのため、Hessian固有値を通して損失曲面を解析し、モード崩壊が鋭い最小値への収束と関連していることを示す。特に、$G$の固有値がモード崩壊の発生とどのように直接相関するかを観察する。最後に,これらの知見に動機づけられて,スペクトル情報を用いてモード崩壊を克服し,経験的により安定な収束特性を実現する,nudged-adam(nugan)と呼ばれる新しい最適化アルゴリズムを設計した。

Generative adversarial networks (GANs) provide state-of-the-art results in image generation. However, despite being so powerful, they still remain very challenging to train. This is in particular caused by their highly non-convex optimization space leading to a number of instabilities. Among them, mode collapse stands out as one of the most daunting ones. This undesirable event occurs when the model can only fit a few modes of the data distribution, while ignoring the majority of them. In this work, we combat mode collapse using second-order gradient information. To do so, we analyse the loss surface through its Hessian eigenvalues, and show that mode collapse is related to the convergence towards sharp minima. In particular, we observe how the eigenvalues of the $G$ are directly correlated with the occurrence of mode collapse. Finally, motivated by these findings, we design a new optimization algorithm called nudged-Adam (NuGAN) that uses spectral information to overcome mode collapse, leading to empirically more stable convergence properties.
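
Analysing sharpness through Hessian eigenvalues does not require forming the Hessian explicitly. The sketch below is a generic illustration, not the NuGAN algorithm: it estimates the dominant (largest-magnitude) Hessian eigenvalue by power iteration, using finite-difference Hessian-vector products built from a user-supplied gradient function. The name `top_hessian_eigenvalue` and the `grad_fn` interface are assumptions.

```python
import math

def top_hessian_eigenvalue(grad_fn, params, iters=100, eps=1e-4):
    """Power iteration on the Hessian at `params`, using only gradient calls."""
    n = len(params)
    # Deterministic start; assumed not to be orthogonal to the top eigenvector.
    v = [1.0 / math.sqrt(n)] * n
    eig = 0.0
    for _ in range(iters):
        plus = grad_fn([p + eps * vi for p, vi in zip(params, v)])
        minus = grad_fn([p - eps * vi for p, vi in zip(params, v)])
        # Central difference approximates the Hessian-vector product H v.
        hv = [(a - b) / (2 * eps) for a, b in zip(plus, minus)]
        eig = sum(vi * hi for vi, hi in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(h * h for h in hv))
        if norm == 0.0:
            break
        v = [h / norm for h in hv]
    return eig
```

A large dominant eigenvalue indicates a sharp direction of the loss surface, which is the kind of quantity the abstract relates to mode collapse.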

翻訳日:2021-05-02 15:56:03 公開日:2020-12-17

# (参考訳) ベンガル語におけるヘイトスピーチ検出:データセットとそのベースライン評価

Hate Speech detection in the Bengali language: A dataset and its baseline evaluation ( http://arxiv.org/abs/2012.09686v1 )

ライセンス: CC BY 4.0

Nauros Romim, Mosahed Ahmed, Hriteshwar Talukder, Md Saiful Islam

(参考訳) YouTubeやFacebookといったソーシャルメディアサイトは、あらゆる人の生活に欠かせない存在となり、ここ数年、ソーシャルメディアのコメント欄でヘイトスピーチが急速に増えている。ソーシャルメディアwebサイトにおけるヘイトスピーチの検出は、小さな不均衡データセット、適切なモデルの発見、特徴分析方法の選択など、さまざまな課題に直面している。さらに、この問題は、金の標準ラベル付きデータセットがないため、ベンガル語話者コミュニティにとってより厳しいものである。本稿では,クラウドソーシングによってタグ付けされ,専門家によって検証された3万のユーザコメントのデータセットを提案する。コメントはすべてYouTubeとFacebookのコメントセクションから収集され、スポーツ、エンターテイメント、宗教、政治、犯罪、有名人、TikTok & Memeの7つのカテゴリーに分類される。合計50の注釈が各コメントに3回アノテートされ、過半数の投票が最終注釈とされた。それでも我々は,Word2VecやFastText,BengFastTextといったベンガル語を組み込んだベースライン実験や深層学習モデルをこのデータセット上で実施して,今後の研究機会の確保に努めてきた。実験の結果、すべてのディープラーニングモデルはうまく動作したが、SVMは87.5%の精度で最高の結果を得た。私たちの中心となる貢献は、ベンチマークデータセットを利用可能にして、ベンガルヘイトスピーチ検出の分野におけるさらなる研究を容易にすることです。

Social media sites such as YouTube and Facebook have become an integral part of everyone's life and in the last few years, hate speech in the social media comment section has increased rapidly. Detection of hate speech on social media websites faces a variety of challenges including small imbalanced data sets, finding an appropriate model and the choice of feature analysis method. Furthermore, this problem is more severe for the Bengali speaking community due to the lack of gold standard labelled datasets. This paper presents a new dataset of 30,000 user comments tagged by crowdsourcing and verified by experts. All the comments are collected from YouTube and Facebook comment sections and classified into seven categories: sports, entertainment, religion, politics, crime, celebrity and TikTok & meme. A total of 50 annotators annotated each comment three times and the majority vote was taken as the final annotation. In addition, we have conducted baseline experiments with several deep learning models along with extensive pre-trained Bengali word embeddings such as Word2Vec, FastText and BengFastText on this dataset to facilitate future research opportunities. The experiment illustrated that although all deep learning models performed well, SVM achieved the best result with 87.5% accuracy. Our core contribution is to make this benchmark dataset available and accessible to facilitate further research in the field of Bengali hate speech detection.
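
The labelling rule described here (three annotations per comment, majority vote as the final label) can be sketched as below. The label strings and the function name `majority_label` are hypothetical, and returning `None` when there is no strict majority is an assumption; the abstract does not say how ties were handled.

```python
from collections import Counter

def majority_label(annotations):
    """Return the label chosen by a strict majority of annotators, else None."""
    label, count = Counter(annotations).most_common(1)[0]
    if count * 2 <= len(annotations):
        return None  # no strict majority (cannot happen with 3 binary labels)
    return label
```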

翻訳日:2021-05-02 15:45:13 公開日:2020-12-17

# (参考訳) sroll3: プランク高周波楽器マップにおける大規模系統効果低減のためのニューラルネットワークアプローチ

SRoll3: A neural network approach to reduce large-scale systematic effects in the Planck High Frequency Instrument maps ( http://arxiv.org/abs/2012.09702v1 )

ライセンス: CC BY 4.0

Manuel López-Radcenco, Jean-Marc Delouis and Laurent Vibert

(参考訳) 本研究では,Planck High Frequency Instrument(Planck-HFI)データに対するマップ作成と,生成したスカイマップ内の大規模な系統的効果の除去に着目し,構造化汚染源の削減を目的としたニューラルネットワークに基づくデータインバージョン手法を提案する。汚染源の除去は、異なる時空間スケール間のカップリングを生み出す局所時空間相互作用によって特徴づけられるこれらの源の構造的性質によって可能となる。これらの結合を利用して最適な低次元表現を学習し、汚染源除去と地図作成の目的に最適化し、堅牢で効果的なデータインバージョンを実現する手段として、ニューラルネットワークの探索に焦点をあてる。提案手法の多種多様な変種を開発し,物理学的インフォームド制約とトランスファー学習技術の導入を検討する。さらに、専門家の知識を教師なしのネットワークトレーニングアプローチに統合するために、データ拡張技術を活用することに注力する。提案手法をPlanck-HFI 545 GHz Far Side Lobe シミュレーションデータに適用し,部分的,ギャップ満載,一貫性のないデータセットを含む理想的,非理想的事例を考察し,ニューラルネットワークに基づく次元性低減の可能性を示す。また,本論文では,実プランクhfi 857 ghzデータに適用し,汚染除去性能の面で最大1桁の利益を報告し,構造的汚染源を正確にモデル化・捕捉するための提案手法の妥当性を示す。本研究で開発された手法は,SRollアルゴリズムの新バージョン(SRoll3)に統合され,SRoll3 857 GHz検出器マップをコミュニティに公開する。

In the present work, we propose a neural network based data inversion approach to reduce structured contamination sources, with a particular focus on the mapmaking for Planck High Frequency Instrument (Planck-HFI) data and the removal of large-scale systematic effects within the produced sky maps. The removal of contamination sources is rendered possible by the structured nature of these sources, which is characterized by local spatiotemporal interactions producing couplings between different spatiotemporal scales. We focus on exploring neural networks as a means of exploiting these couplings to learn optimal low-dimensional representations, optimized with respect to the contamination source removal and mapmaking objectives, to achieve robust and effective data inversion. We develop multiple variants of the proposed approach, and consider the inclusion of physics informed constraints and transfer learning techniques. Additionally, we focus on exploiting data augmentation techniques to integrate expert knowledge into an otherwise unsupervised network training approach. We validate the proposed method on Planck-HFI 545 GHz Far Side Lobe simulation data, considering ideal and non-ideal cases involving partial, gap-filled and inconsistent datasets, and demonstrate the potential of the neural network based dimensionality reduction to accurately model and remove large-scale systematic effects. We also present an application to real Planck-HFI 857 GHz data, which illustrates the relevance of the proposed method to accurately model and capture structured contamination sources, with reported gains of up to one order of magnitude in terms of contamination removal performance. Importantly, the methods developed in this work are to be integrated in a new version of the SRoll algorithm (SRoll3), and we describe here SRoll3 857 GHz detector maps that will be released to the community.

翻訳日:2021-05-02 15:17:55 公開日:2020-12-17

# (参考訳) Deep Molecular Dreaming: Inverse Machine Learning for De-novo Molecular Design and Interpretability with surjective representations

Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations ( http://arxiv.org/abs/2012.09712v1 )

ライセンス: CC BY 4.0

Cynthia Shen, Mario Krenn, Sagi Eppel, Alan Aspuru-Guzik

(参考訳) コンピュータによる機能分子のデノボ設計は、今日の化学情報学における最も顕著な課題の1つである。その結果、人工知能の分野からの生成的および進化的逆設計が急速に発展し、特定の化学的性質のために分子を最適化することを目指している。これらのモデルは「間接的に」化学空間を探索し、潜伏空間、政策、分布を学習したり、分子の集団に突然変異を施すことで探索する。しかし、SMILESの代替である分子のSELFIES文字列表現の最近の発展により、他の潜在的な技術が考えられるようになった。そこで本研究では,SELFIESに基づく直進勾配に基づく分子最適化手法PASITHEAを提案する。 PASITHEAは、ニューラルネットワークの学習プロセスを直接反転させることで勾配の利用を利用する。効果的に、これはある性質に最適化された分子変種を生成することができる逆回帰モデルを形成する。結果は予備的ではあるが,パシテアの生存可能性を明確に示し,逆訓練中の選択された属性の分布の変化を観察した。インセプション主義の驚くべき特性は、モデルがトレーニングした化学空間に対する理解を直接調査できることである。 PASITHEAをより大きなデータセット、分子、さらに複雑な性質に拡張することは、新しい機能分子の設計と機械学習モデルの解釈と説明につながると期待している。

Computer-based de-novo design of functional molecules is one of the most prominent challenges in cheminformatics today. As a result, generative and evolutionary inverse designs from the field of artificial intelligence have emerged at a rapid pace, with aims to optimize molecules for a particular chemical property. These models 'indirectly' explore the chemical space; by learning latent spaces, policies, distributions or by applying mutations on populations of molecules. However, the recent development of the SELFIES string representation of molecules, a surjective alternative to SMILES, has made possible other potential techniques. Based on SELFIES, we therefore propose PASITHEA, a direct gradient-based molecule optimization that applies inceptionism techniques from computer vision. PASITHEA exploits the use of gradients by directly reversing the learning process of a neural network, which is trained to predict real-valued chemical properties. Effectively, this forms an inverse regression model, which is capable of generating molecular variants optimized for a certain property. Although our results are preliminary, we observe a shift in distribution of a chosen property during inverse-training, a clear indication of PASITHEA's viability. A striking property of inceptionism is that we can directly probe the model's understanding of the chemical space it was trained on. We expect that extending PASITHEA to larger datasets, molecules and more complex properties will lead to advances in the design of new functional molecules as well as the interpretation and explanation of machine learning models.
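
"Reversing the learning process" means holding the trained weights fixed and following the gradient with respect to the input instead. The sketch below illustrates that idea on a tiny tanh network with a continuous input vector; it is not PASITHEA itself, which operates on SELFIES encodings, and the names `model` and `dream` as well as the network shape are assumptions.

```python
import math

def model(w1, w2, x):
    """Frozen toy regressor: y = w2 . tanh(w1 x). Returns (y, hidden activations)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden)), hidden

def dream(w1, w2, x, target, steps=200, lr=0.1):
    """Gradient descent on the input x so the frozen model's output nears target."""
    x = list(x)
    for _ in range(steps):
        y, hidden = model(w1, w2, x)
        # Backpropagate 0.5 * (y - target)^2 through the fixed weights to the input.
        delta = [(y - target) * w * (1.0 - h * h) for w, h in zip(w2, hidden)]
        grad_x = [sum(delta[j] * w1[j][i] for j in range(len(w1)))
                  for i in range(len(x))]
        x = [xi - lr * g for xi, g in zip(x, grad_x)]
    return x
```

The weights `w1` and `w2` are read but never written, so the optimized input reflects what the trained model associates with the target property value.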

翻訳日:2021-05-02 14:39:40 公開日:2020-12-17

# (参考訳) FERMI FELを用いた粒子加速器制御のためのモデルフリー・ベイズ組立モデルに基づく深部強化学習

Model-free and Bayesian Ensembling Model-based Deep Reinforcement Learning for Particle Accelerator Control Demonstrated on the FERMI FEL ( http://arxiv.org/abs/2012.09737v1 )

ライセンス: CC BY 4.0

Simon Hirlaender, Niky Bruchon

(参考訳) 強化学習は加速器制御において大きな可能性を秘めている。本研究の主な目的は, 加速器物理問題に対する運用レベルで, このアプローチをどのように活用できるかを示すことである。モデルなし強化学習がいくつかの領域で成功したにもかかわらず、サンプル効率は依然としてボトルネックであり、モデルベース手法によって包含される可能性がある。 ferMI FELシステムの強度最適化に応用したモデルベースとモデルフリー強化学習を比較した。モデルベースアプローチは,高い表現力とサンプル効率を示すが,モデルフリー手法の漸近的な性能は若干優れている。モデルベースアルゴリズムは不確実性認識モデルを用いてDYNA形式で実装され、モデルフリーアルゴリズムはカスタマイズされた深層Q-ラーニングに基づいている。いずれの場合もアルゴリズムが実装され、加速器制御問題におけるノイズロバスト性が増大する。コードはhttps://github.com/MathPhysSim/FERMI_RL_Paperで公開されている。

Reinforcement learning holds tremendous promise in accelerator controls. The primary goal of this paper is to show how this approach can be utilised on an operational level on accelerator physics problems. Despite the success of model-free reinforcement learning in several domains, sample-efficiency is still a bottleneck, which model-based methods might address. We compare well-suited purely model-based to model-free reinforcement learning applied to the intensity optimisation on the FERMI FEL system. We find that the model-based approach demonstrates higher representational power and sample-efficiency, while the asymptotic performance of the model-free method is slightly superior. The model-based algorithm is implemented in a DYNA style using an uncertainty aware model, and the model-free algorithm is based on tailored deep Q-learning. In both cases, the algorithms were implemented in a way that increases robustness to the noise that is omnipresent in accelerator control problems. Code is released at https://github.com/MathPhysSim/FERMI_RL_Paper.

翻訳日:2021-05-02 14:30:30 公開日:2020-12-17

# (参考訳) MAGNet:ディープマルチエージェント強化学習のためのマルチエージェントグラフネットワーク

MAGNet: Multi-agent Graph Network for Deep Multi-agent Reinforcement Learning ( http://arxiv.org/abs/2012.09762v1 )

ライセンス: CC BY 4.0

Aleksandra Malysheva, Daniel Kudenko, Aleksei Shpilman

(参考訳) 近年、深層強化学習は複雑な単一エージェントタスクにおいて強い成功をおさめており、近年ではマルチエージェントドメインにもこのアプローチが適用されている。本稿では,自己着脱機構によって得られた環境の関連性グラフ表現とメッセージ生成手法を用いたマルチエージェント強化学習のための新しい手法であるmagnetを提案する。 MAGnetのアプローチを人工捕食者によるマルチエージェント環境とポンマーマンゲームに適用し、マルチエージェントディープQ-Networks(MADQN)、マルチエージェントディープ決定ポリシーグラディエント(MADDPG)、QMIX(QMIX)など、最先端のMARLソリューションを著しく上回っていることを示す。

Over recent years, deep reinforcement learning has shown strong successes in complex single-agent tasks, and more recently this approach has also been applied to multi-agent domains. In this paper, we propose a novel approach, called MAGNet, to multi-agent reinforcement learning that utilizes a relevance graph representation of the environment obtained by a self-attention mechanism, and a message-generation technique. We applied our MAGNet approach to the synthetic predator-prey multi-agent environment and the Pommerman game and the results show that it significantly outperforms state-of-the-art MARL solutions, including Multi-agent Deep Q-Networks (MADQN), Multi-agent Deep Deterministic Policy Gradient (MADDPG), and QMIX.

翻訳日:2021-05-02 13:41:29 公開日:2020-12-17

# (参考訳) 化学空間を探究する好奇心 -深層分子強化学習への内在的報酬-

Curiosity in exploring chemical space: Intrinsic rewards for deep molecular reinforcement learning ( http://arxiv.org/abs/2012.11293v1 )

ライセンス: CC BY 4.0

Luca A. Thiede, Mario Krenn, AkshatKumar Nigam, Alan Aspuru-Guzik

(参考訳) コンピュータ支援による分子の設計は、薬物や物質発見の分野をディスラプトする可能性がある。機械学習、特にディープラーニングは、この分野が急速に発展しているトピックである。強化学習は、事前知識なしで分子設計を可能にするため、特に有望なアプローチである。しかし,強化学習エージェントを用いた場合,検索空間は広く,効率的な探索が望ましい。本研究では,効率的な探索を支援するアルゴリズムを提案する。このアルゴリズムは、キュリオシティとして知られる概念にインスパイアされている。興味のあるエージェントがより優れた分子を見つけるための3つのベンチマークを示す。これは、自身のモチベーションから化学空間を探索できる強化学習エージェントのための、エキサイティングな新しい研究方向を示している。これは、人類がこれまで考えていなかった予期せぬ新しい分子を生み出す可能性がある。

Computer-aided design of molecules has the potential to disrupt the field of drug and material discovery. Machine learning, and deep learning, in particular, have been topics where the field has been developing at a rapid pace. Reinforcement learning is a particularly promising approach since it allows for molecular design without prior knowledge. However, the search space is vast and efficient exploration is desirable when using reinforcement learning agents. In this study, we propose an algorithm to aid efficient exploration. The algorithm is inspired by a concept known in the literature as curiosity. We show on three benchmarks that a curious agent finds better performing molecules. This indicates an exciting new research direction for reinforcement learning agents that can explore the chemical space out of their own motivation. This has the potential to eventually lead to unexpected new molecules that no human has thought about so far.
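
Curiosity is often implemented as an intrinsic bonus that grows with the error of the agent's own prediction, so that poorly understood regions of the search space become rewarding. The abstract does not specify the formulation used, so the sketch below assumes a squared prediction error added to the task reward; the name `curious_reward` and the weight `beta` are hypothetical.

```python
def curious_reward(extrinsic, predicted_property, observed_property, beta=0.1):
    """Task reward plus a bonus for molecules the agent predicted badly."""
    intrinsic = (predicted_property - observed_property) ** 2
    return extrinsic + beta * intrinsic
```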

翻訳日:2021-05-02 13:32:13 公開日:2020-12-17

# (参考訳) 回転バウンディングボックスの円形損失関数を用いた終端物体追跡

End-to-end Deep Object Tracking with Circular Loss Function for Rotated Bounding Box ( http://arxiv.org/abs/2012.09771v1 )

ライセンス: CC BY 4.0

Vladislav Belyaev, Aleksandra Malysheva, Aleksei Shpilman

(参考訳) タスクオブジェクトのトラッキングは、自動運転、インテリジェントな監視、ロボット工学など、多くのアプリケーションで不可欠です。このタスクは、ビデオストリーム内のオブジェクトへのバウンディングボックスの割り当てを伴い、最初のフレームのオブジェクトのバウンディングボックスのみを与えられる。 2015年、軸に沿ったものの拡張として回転バウンディングボックスを導入した新しいタイプのビデオオブジェクト追跡(VOT)データセットが作成された。本研究では,Transformer Multi-Head Attentionアーキテクチャに基づくエンドツーエンドのディープラーニング手法を提案する。また,境界ボックスの重なりと向きを考慮に入れた新しいタイプの損失関数を提案する。円形損失関数(DOTCL)を用いたDeep Object Trackingモデルでは,現在の最先端のディープラーニングモデルよりも堅牢性が大幅に向上している。また、期待平均オーバーラップ(EAO)メトリックの観点から、VOT2018データセットの最先端のオブジェクトトラッキング手法よりも優れています。

The task of object tracking is vital in numerous applications such as autonomous driving, intelligent surveillance, robotics, etc. This task entails assigning a bounding box to an object in a video stream, given only the bounding box for that object on the first frame. In 2015, a new type of video object tracking (VOT) dataset was created that introduced rotated bounding boxes as an extension of axis-aligned ones. In this work, we introduce a novel end-to-end deep learning method based on the Transformer Multi-Head Attention architecture. We also present a new type of loss function, which takes into account the bounding box overlap and orientation. Our Deep Object Tracking model with Circular Loss Function (DOTCL) shows a considerable improvement in terms of robustness over current state-of-the-art end-to-end deep learning models. It also outperforms state-of-the-art object tracking methods on the VOT2018 dataset in terms of the expected average overlap (EAO) metric.
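The abstract does not give the formula of the circular loss, so the sketch below only illustrates why an orientation term benefits from periodic treatment. It assumes that a box rotated by half a turn is the same box (period $\pi$); the function names and the `1 - cos` form are illustrative choices, not the DOTCL definition.

```python
import math


def naive_angle_error(pred, target):
    """Plain absolute difference: penalizes angles that describe the same box."""
    return abs(pred - target)


def circular_angle_loss(pred, target, period=math.pi):
    """Periodic penalty: zero whenever pred and target differ by a multiple of `period`."""
    # Rescale the difference so that one `period` corresponds to a full turn of the cosine.
    diff = (pred - target) * (2.0 * math.pi / period)
    return 1.0 - math.cos(diff)
```

Under these assumptions, a prediction of 0 against a target of $\pi$ gets a naive error of about 3.14 but a circular loss of zero, while a quarter-turn mismatch receives the largest penalty.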

翻訳日:2021-05-02 13:23:16 公開日:2020-12-17

# (参考訳) 野生におけるハンドオブジェクトインタラクションの再構築

Reconstructing Hand-Object Interactions in the Wild ( http://arxiv.org/abs/2012.09856v1 )

ライセンス: CC BY 4.0

Zhe Cao, Ilija Radosavovic, Angjoo Kanazawa, Jitendra Malik

(参考訳) 本研究では,野生におけるハンドオブジェクトインタラクションの再構築について検討する。この問題の主な課題は、適切な3Dラベル付きデータの欠如である。この問題を解決するために,直接3D監視を必要としない最適化手法を提案する。私たちが採用する一般的な戦略は,利用可能なすべての関連データ(2dバウンディングボックス,2dハンドキーポイント,2dインスタンスマスク,3dオブジェクトモデル,3d in-the-lab mocap)を活用して,3d再構成の制約を提供することです。手と物体を個別に最適化するのではなく、手オブジェクトの接触、衝突、閉塞に基づく追加の制約を課すことができるように、それらを共同で最適化する。提案手法は,EPIC Kitchens と 100 Days of Hands のデータセットから,様々な対象カテゴリにまたがる挑戦的なデータに対して,魅力的な再構築を行う。定量的に,我々のアプローチは,ground truth 3d アノテーションが利用可能なラボ環境における既存のアプローチと好適に比較できることを実証する。

In this work, we explore reconstructing hand-object interactions in the wild. The core challenge of this problem is the lack of appropriate 3D labeled data. To overcome this issue, we propose an optimization-based procedure which does not require direct 3D supervision. The general strategy we adopt is to exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D object models, 3D in-the-lab MoCap) to provide constraints for the 3D reconstruction. Rather than optimizing the hand and object individually, we optimize them jointly, which allows us to impose additional constraints based on hand-object contact, collision, and occlusion. Our method produces compelling reconstructions on the challenging in-the-wild data from the EPIC Kitchens and the 100 Days of Hands datasets, across a range of object categories. Quantitatively, we demonstrate that our approach compares favorably to existing approaches in the lab settings where ground truth 3D annotations are available.

翻訳日:2021-05-02 11:18:20 公開日:2020-12-17

# (参考訳) FantastIC4: 4bit-Compact Multilayer Perceptronの効率的な動作のためのハードウェアソフトウェア共同設計手法

FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons ( http://arxiv.org/abs/2012.11331v1 )

ライセンス: CC BY 4.0

Simon Wiedemann, Suhas Shivapakash, Pablo Wiedemann, Daniel Becking, Wojciech Samek, Friedel Gerfers, Thomas Wiegand

(参考訳) ディープラーニングモデルを"エッジ"にデプロイする需要が高まっているため、非常に厳密で限られたリソース制約の中で最先端のモデルを実行できる技術を開発することが最重要である。本研究では,完全接続層に基づくディープニューラルネットワーク(DNN)の高効率実行エンジンを実現するためのソフトウェアハードウェア最適化パラダイムを提案する。提案手法は,高い予測性能を有する多層パーセプトロン(MLP)の面積削減と電力要求の低減を目的とした圧縮を中心にしている。まず、ファンタスティック4と呼ばれる新しいハードウェアアーキテクチャを設計し、(1)完全連結層の複数のコンパクト表現の効率的なオンチップ実行をサポートし、(2)推論に必要な乗算器の数をわずか4(名前)まで最小化する。さらに、ファンタスティック4上での効率的な実行のためにモデルを改善可能にするため、4ビット量子化に頑健で、同時に圧縮性が高い新しいエントロピー拘束トレーニング手法を提案する。実験結果から,仮想超音速FPGA XCVU440デバイス実装において,総消費電力3.6Wの2.45TOPSのスループットを実現し,22nmプロセスASIC版では20.17TOPS/Wのスループットを実現することができた。 Google Speech Command(GSC)データセット用に設計された他の最先端アクセラレータと比較すると、スループットに関しては51$\times$、面積効率(GOPS/W)では145$\times$がよい。

With the growing demand for deploying deep learning models to the "edge", it is paramount to develop techniques that allow state-of-the-art models to be executed within very tight and limited resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine of deep neural networks (DNNs) that are based on fully-connected layers. Our approach is centred around compression as a means for reducing the area as well as power requirements of, concretely, multilayer perceptrons (MLPs) with high predictive performance. Firstly, we design a novel hardware architecture named FantastIC4, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the required number of multipliers for inference down to only 4 (thus the name). Moreover, in order to make the models amenable to efficient execution on FantastIC4, we introduce a novel entropy-constrained training method that renders them robust to 4bit quantization and highly compressible in size simultaneously. The experimental results show that we can achieve throughputs of 2.45 TOPS with a total power consumption of 3.6W on a Virtual Ultrascale FPGA XCVU440 device implementation, and achieve a total power efficiency of 20.17 TOPS/W on a 22nm process ASIC version. When compared to the other state-of-the-art accelerators designed for the Google Speech Command (GSC) dataset, FantastIC4 is better by 51$\times$ in terms of throughput and 145$\times$ in terms of area efficiency (GOPS/W).
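The abstract states that inference needs only 4 multipliers but does not spell out the weight format. One way this can work, sketched here purely as an assumption, is to encode every weight as a 4-bit code whose value is the sum of the basis coefficients selected by its set bits; a dot product then needs one accumulation per bit position and only four multiplications. The names `decode_weight`, `dot_with_4bit_codes`, and `basis` are hypothetical.

```python
def decode_weight(code, basis):
    """Weight value of a 4-bit code: sum of the basis entries whose bit is set."""
    return sum(basis[i] for i in range(4) if (code >> i) & 1)


def dot_with_4bit_codes(activations, codes, basis):
    """Dot product of activations with 4-bit coded weights using 4 multiplications."""
    partial = [0.0, 0.0, 0.0, 0.0]
    for activation, code in zip(activations, codes):
        for i in range(4):
            if (code >> i) & 1:
                partial[i] += activation   # additions only in the inner loop
    # One multiplication per bit position, regardless of the vector length.
    return sum(b * p for b, p in zip(basis, partial))
```

The design point this illustrates is that the multiplication count depends on the bit width of the code rather than on the number of weights, which is what makes a small, fixed multiplier budget plausible in hardware.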

翻訳日:2021-05-02 11:03:29 公開日:2020-12-17

# (参考訳) 注意に基づくイメージアップサンプリング

Attention-based Image Upsampling ( http://arxiv.org/abs/2012.09904v1 )

ライセンス: CC BY 4.0

Souvik Kundu, Hesham Mostafa, Sharath Nittur Sridhar, Sairam Sundaresan

(参考訳) 畳み込み層は、コンピュータビジョンにおける多くのディープニューラルネットワークソリューションの不可欠な部分である。近年の研究では、標準畳み込み操作を自己注意に基づくメカニズムに置き換えることで、画像分類や物体検出タスクの性能が改善されている。本稿では,別の正準演算であるstrided transposed convolutionをアテンション機構で置き換える方法について述べる。特徴写像の空間的次元を増加/上昇させるので,新しい注意に基づく操作注意に基づくアップサンプリングと呼ぶ。単一画像の超解像とジョイント画像のアップサンプリングタスクの実験を通じて,従来のアップサンプリング手法よりも,より少ないパラメータを用いて,ストレート変換畳み込みや適応フィルタを基本としたアテンションベースアップサンプリングを一貫して上回っていることを示す。注意係数と注意目標の計算に別個のソースを使用できるアテンション機構の固有の柔軟性は、複数の画像モダリティからの情報を融合する際に、アテンションベースアップサンプリングが自然な選択であることを示す。

Convolutional layers are an integral part of many deep neural network solutions in computer vision. Recent work shows that replacing the standard convolution operation with mechanisms based on self-attention leads to improved performance on image classification and object detection tasks. In this work, we show how attention mechanisms can be used to replace another canonical operation: strided transposed convolution. We term our novel attention-based operation attention-based upsampling since it increases/upsamples the spatial dimensions of the feature maps. Through experiments on single image super-resolution and joint-image upsampling tasks, we show that attention-based upsampling consistently outperforms traditional upsampling methods based on strided transposed convolution or based on adaptive filters while using fewer parameters. We show that the inherent flexibility of the attention mechanism, which allows it to use separate sources for calculating the attention coefficients and the attention targets, makes attention-based upsampling a natural choice when fusing information from multiple image modalities.

翻訳日:2021-05-02 10:34:33 公開日:2020-12-17

# (参考訳) 病理的特徴を用いた不確実性処理 : 高リスク癌生存法開発のためのプライマリケアデータの活用の可能性

Handling uncertainty using features from pathology: opportunities in primary care data for developing high risk cancer survival methods ( http://arxiv.org/abs/2012.09976v1 )

ライセンス: CC BY 4.0

Goce Ristanoski, Jon Emery, Javiera Martinez-Gutierrez, Damien Mccarthy, Uwe Aickelin

(参考訳) 2019年、オーストラリア人144万人以上ががんと診断された。大多数は、スクリーニングプログラムが存在する癌であっても、まずgpの症状を呈する。プライマリケアにおけるがんの診断は、がん症状の非特異的な性質と頻度が低いため困難である。がんの症状の疫学と,プライマリケアデータから患者の医療史の提示パターンを理解することは,早期発見とがん予後を改善する上で重要であると考えられた。過去の患者の医療データは不完全、不規則、または欠如である可能性があるため、新しい診断に患者の歴史を使おうとする際、さらなる課題が生じる。本研究の目的は,患者がGPで利用できる病歴の機会を探ることであり,早期に高リスク癌予後と治療成績の関連性を検討するために,早期に注文された全血液計数の結果に焦点をあてることである。 2年以内に癌を生存しないリスクのある患者に焦点をあてて,過去の病理検査結果が癌の予後を予測するのに利用できる特徴の導出につながるかを検討した。この最初の研究は肺癌患者に焦点を当てているが、その方法論は他の種類のがんや他の医療記録に応用できる。病理組織学的検査は,不完全あるいは不明瞭な症例においても,癌リスクと生存率の予測に関連性のある特徴を生じさせるのに有用であると考えられた。以上の結果から,高リスク癌診断のための病理検査データの利用が強く示唆され,同様の目的で,新たな病理指標や他のプライマリケアデータセットの利用がさらに促進された。

More than 144 000 Australians were diagnosed with cancer in 2019. The majority will first present to their GP symptomatically, even for cancers for which screening programs exist. Diagnosing cancer in primary care is challenging due to the non-specific nature of cancer symptoms and its low prevalence. Understanding the epidemiology of cancer symptoms and patterns of presentation in patients' medical history from primary care data could be important to improve earlier detection and cancer outcomes. As past medical data about a patient can be incomplete, irregular or missing, this creates additional challenges when attempting to use the patient's history for any new diagnosis. Our research aims to investigate the opportunities in a patient's pathology history available to a GP, initially focused on the results within the frequently ordered full blood count, to determine relevance to a future high-risk cancer prognosis and treatment outcome. We investigated how past pathology test results can lead to deriving features that can be used to predict cancer outcomes, with emphasis on patients at risk of not surviving the cancer within a 2-year period. This initial work focuses on patients with lung cancer, although the methodology can be applied to other types of cancer and other data within the medical record. Our findings indicate that even in cases of incomplete or obscure patient history, hematological measures can be useful in generating features relevant for predicting cancer risk and survival. The results strongly support adding pathology test data to potential high-risk cancer diagnosis, and making further use of additional pathology metrics or other primary care datasets for similar purposes.

翻訳日:2021-05-02 09:14:50 公開日:2020-12-17

# (参考訳) コミュニティ分析のための二項尾

Binomial Tails for Community Analysis ( http://arxiv.org/abs/2012.09968v1 )

ライセンス: CC BY 4.0

Omid Madani, Thanh Ngo, Weifei Zeng, Sai Ankith Averine, Sasidhar Evuru, Varun Malhotra, Shashidhar Gandham, Navindra Yadav

(参考訳) ネットワークにおけるコミュニティ発見の重要な課題は、結果の重要性と、生成した候補グループのロバストなランキングを評価することである。多くの場合、多くの候補コミュニティが発見され、アナリストの時間を最も有望で有望な発見に集中することが重要です。二項モデルを用いて,末尾確率から導出した簡便なグループスコアリング関数を開発した。合成および多数の実世界のデータに関する実験は、二項スコアリングがコンダクタンスのような他の安価なスコアリング関数よりも堅牢なランク付けにつながることを示す。さらに、検出されたグループをフィルタリングしラベル付けするために使用できる信頼値(p$-values)を得る。我々の分析はアプローチの様々な特性に光を当てた。二項尾は単純で汎用的であり、コミュニティ分析の他の2つの応用として、コミュニティメンバーシップの度合い(それがグループスコア機能をもたらす)と、コミュニティが引き起こすグラフにおける重要なエッジの発見について述べる。

An important task of community discovery in networks is assessing the significance of the results and robust ranking of the generated candidate groups. Often in practice, numerous candidate communities are discovered, and focusing the analyst's time on the most salient and promising findings is crucial. We develop simple, efficient group scoring functions derived from tail probabilities using binomial models. Experiments on synthetic and numerous real-world data provide evidence that binomial scoring leads to a more robust ranking than other inexpensive scoring functions, such as conductance. Furthermore, we obtain confidence values ($p$-values) that can be used for filtering and labeling the discovered groups. Our analyses shed light on various properties of the approach. The binomial tail is simple and versatile, and we describe two other applications for community analysis: degree of community membership (which in turn yields group-scoring functions), and the discovery of significant edges in the community-induced graph.
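As a rough illustration of scoring a candidate group with a binomial tail, the sketch below computes $P(X \ge k)$ for $X \sim \mathrm{Binomial}(n, p)$. How the paper chooses $n$, $k$, and $p$ is not stated in the abstract; reading them as the number of possible internal pairs, the number of observed internal edges, and a background edge probability is an assumption made here.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p); a smaller value means a more surprising group."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))
```

For example, a group with 10 possible internal pairs, 9 observed edges, and a background probability of 0.1 gets a tail value near 1e-8, so it would rank far ahead of a group whose edge count is close to the background expectation. Candidate groups can then be sorted by this value, and the value itself serves as a $p$-value for filtering.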

翻訳日:2021-05-02 08:34:14 公開日:2020-12-17

# 自然言語処理における持続的生涯学習 : 調査

Continual Lifelong Learning in Natural Language Processing: A Survey ( http://arxiv.org/abs/2012.09823v1 )

ライセンス: Link先を確認

Magdalena Biesialska and Katarzyna Biesialska and Marta R. Costa-jussà

(参考訳) 連続学習(continual learning, cl)は,情報システムが時間を越えた連続的なデータストリームから学ぶことを可能にする。しかし,既存のディープラーニングアーキテクチャでは,従来の知識を忘れずに新しいタスクを学習することは困難である。さらに、CLは言語学習において特に困難であり、自然言語は曖昧である:それは離散的で構成的であり、その意味は文脈に依存している。本研究では,様々なNLPタスクのレンズを通してCLの問題を考察する。本調査では,CLにおける主な課題とニューラルネットワークモデルに適用された現在の手法について論じる。また,NLPにおける既存のCL評価手法とデータセットの批判的レビューを行う。最後に,今後の研究方向性について概観する。

Continual learning (CL) aims to enable information systems to learn from a continuous data stream across time. However, it is difficult for existing deep learning architectures to learn a new task without largely forgetting previously acquired knowledge. Furthermore, CL is particularly challenging for language learning, as natural language is ambiguous: it is discrete, compositional, and its meaning is context-dependent. In this work, we look at the problem of CL through the lens of various NLP tasks. Our survey discusses major challenges in CL and current methods applied in neural network models. We also provide a critical review of the existing CL evaluation methods and datasets in NLP. Finally, we present our outlook on future research directions.

翻訳日:2021-05-02 07:42:34 公開日:2020-12-17

# マルウェア検出への定記憶による極長の分類

Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection ( http://arxiv.org/abs/2012.09390v1 )

ライセンス: Link先を確認

Edward Raff, William Fleshman, Richard Zak, Hyrum S. Anderson, Bobby Filar, Mark McLean

(参考訳) 機械学習における最近の研究は、特に極端な長さのシーケンス分類問題をサイバーセキュリティが提示している。 Windows実行可能マルウェア検出の場合、入力は100ドル MB を超え、これは$T=100,000,000 ステップの時系列に対応する。現在、そのようなタスクを処理するための最も近いアプローチは、最大2000,000ドルのステップを処理できる畳み込みニューラルネットワークであるMalConvである。 CNNの$\mathcal{O}(T)$メモリは、CNNのマルウェアへのさらなる適用を妨げている。本研究では,時間的最大値プーリングに対する新たなアプローチを開発し,必要なメモリを列長$T$に不変にする。これにより、MalConv $116\times$ メモリ効率が向上し、25.8\times$ のトレーニング速度が向上し、MalConvへの入力長制限が取り除かれた。我々は,MalConvアーキテクチャを改良するために,新たなGlobal Channel Gating設計を導入し,従来のMalConv CNNに欠ける機能である1億のタイムステップにわたる機能インタラクションを効率的に学習する機構について検討した。私たちの実装はhttps://github.com/NeuromorphicComputationResearchProgram/MalConv2で確認できます。

Recent works within machine learning have been tackling inputs of ever-increasing size, with cybersecurity presenting sequence classification problems of particularly extreme lengths. In the case of Windows executable malware detection, inputs may exceed $100$ MB, which corresponds to a time series with $T=100,000,000$ steps. To date, the closest approach to handling such a task is MalConv, a convolutional neural network capable of processing up to $T=2,000,000$ steps. The $\mathcal{O}(T)$ memory of CNNs has prevented further application of CNNs to malware. In this work, we develop a new approach to temporal max pooling that makes the required memory invariant to the sequence length $T$. This makes MalConv $116\times$ more memory efficient, and up to $25.8\times$ faster to train on its original dataset, while removing the input length restrictions of MalConv. We re-invest these gains into improving the MalConv architecture by developing a new Global Channel Gating design, giving us an attention mechanism capable of learning feature interactions across 100 million time steps in an efficient manner, a capability that the original MalConv CNN lacks. Our implementation can be found at https://github.com/NeuromorphicComputationResearchProgram/MalConv2
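The core observation behind length-invariant memory can be sketched without any deep learning library: a global max over time only needs the running maximum per channel, so the sequence can be consumed chunk by chunk. The function below is a simplified forward pass only; it assumes features arrive as lists of per-step vectors, and it does not reproduce how the paper handles gradients. The name `streaming_max_pool` is hypothetical.

```python
def streaming_max_pool(chunks):
    """Global temporal max pooling over an iterable of chunks.

    Each chunk is a list of time steps, each step a list of channel values.
    Only the best value and its time index per channel are kept in memory.
    """
    best_vals, best_idx = None, None
    offset = 0  # global index of the first step in the current chunk
    for chunk in chunks:
        for t, features in enumerate(chunk):
            if best_vals is None:
                best_vals = list(features)
                best_idx = [offset + t] * len(features)
                continue
            for c, value in enumerate(features):
                if value > best_vals[c]:
                    best_vals[c] = value
                    best_idx[c] = offset + t
        offset += len(chunk)
    return best_vals, best_idx
```

Keeping the winning time index per channel is the useful by-product: it identifies the few positions that actually contribute to the pooled output, so later work can be restricted to those positions instead of the whole sequence.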

翻訳日:2021-05-02 07:42:25 公開日:2020-12-17

# マルコフ等価DAGのカウントとサンプリングのための多項式時間アルゴリズム

Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs ( http://arxiv.org/abs/2012.09679v1 )

ライセンス: Link先を確認

Marcel Wienöbst and Max Bannach and Maciej Liśkiewicz

(参考訳) マルコフ同値類からの有向非巡回グラフ(DAG)の計数と一様サンプリングは、グラフィカル因果解析の基本的な課題である。本稿では,これらの課題を多項式時間で実行可能であることを示し,この領域における長年のオープン問題を解く。我々のアルゴリズムは効果的で容易に実装できる。実験結果から, アルゴリズムは最先端手法よりも優れていた。

Counting and uniform sampling of directed acyclic graphs (DAGs) from a Markov equivalence class are fundamental tasks in graphical causal analysis. In this paper, we show that these tasks can be performed in polynomial time, solving a long-standing open problem in this area. Our algorithms are effective and easily implementable. Experimental results show that the algorithms significantly outperform state-of-the-art methods.
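To make the counting task concrete, here is a brute-force baseline, not the polynomial-time algorithm of the paper. It relies on the standard characterization that two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures, and it enumerates all $2^m$ orientations of the skeleton, so it is only usable on tiny graphs. All function names are illustrative.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Kahn-style check: True if the directed graph has no cycle."""
    indegree = {v: 0 for v in nodes}
    for _, head in edges:
        indegree[head] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for tail, head in edges:
            if tail == u:
                indegree[head] -= 1
                if indegree[head] == 0:
                    ready.append(head)
    return removed == len(nodes)


def v_structures(edges):
    """Set of colliders a -> c <- b where a and b are not adjacent."""
    edge_set = set(edges)
    found = set()
    for a, c in edges:
        for b, c2 in edges:
            if c2 == c and a < b and (a, b) not in edge_set and (b, a) not in edge_set:
                found.add((a, c, b))
    return found


def count_markov_equivalent(nodes, dag_edges):
    """Count the DAGs with the same skeleton and v-structures as the given DAG."""
    target = v_structures(dag_edges)
    count = 0
    for flips in product([False, True], repeat=len(dag_edges)):
        oriented = [(v, u) if flip else (u, v) for (u, v), flip in zip(dag_edges, flips)]
        if is_acyclic(nodes, oriented) and v_structures(oriented) == target:
            count += 1
    return count
```

On the chain `a -> b -> c` this yields 3, on the collider `a -> b <- c` it yields 1, and on a fully connected three-node DAG it yields 6. The exponential enumeration is exactly what a polynomial-time method avoids.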

翻訳日:2021-05-02 07:42:03 公開日:2020-12-17

# 生存分析としての研究の再現性

Research Reproducibility as a Survival Analysis ( http://arxiv.org/abs/2012.09932v1 )

ライセンス: Link先を確認

Edward Raff

(参考訳) 機械学習コミュニティでは、再現性危機に直面しているという懸念が高まっています。多くの人がこの問題に取り組み始めていますが、私たちは、再現性の問題を本質的なバイナリプロパティとして扱うことに気付いています。そこで我々は,論文の再現可能性のモデル化を生存分析問題として検討する。我々は、この視点が再現可能な研究のメタ科学的疑問のより正確なモデルであることを論じ、生存分析がいかにして、先行する縦断的なデータを説明するための新たな洞察を引き出すかを示す。データとコードはhttps://github.com/EdwardRaff/Research-Reproducibility-Survival-Analysisで確認できる。

There has been increasing concern within the machine learning community that we are in a reproducibility crisis. As many have begun to work on this problem, all work we are aware of treats the issue of reproducibility as an intrinsic binary property: a paper is or is not reproducible. Instead, we consider modeling the reproducibility of a paper as a survival analysis problem. We argue that this perspective represents a more accurate model of the underlying meta-science question of reproducible research, and we show how a survival analysis allows us to draw new insights that better explain prior longitudinal data. The data and code can be found at https://github.com/EdwardRaff/Research-Reproducibility-Survival-Analysis
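A minimal sketch of the survival-analysis view, using the standard Kaplan-Meier estimator rather than whichever models the paper fits. The mapping is an assumption made for illustration: the duration is the effort spent on a reproduction attempt, the event is a successful reproduction, and an attempt that was abandoned counts as censored rather than as a failure.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier curve as a list of (time, survival probability) pairs.

    durations: time until the event or until censoring, per subject.
    observed:  True if the event happened, False if the subject was censored.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all subjects that share the same time stamp.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Under the assumed mapping, the curve gives the estimated fraction of papers still unreproduced after a given amount of effort. The contrast with a binary label is that an abandoned attempt still contributes information up to the point where it stopped.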

翻訳日:2021-05-02 07:41:57 公開日:2020-12-17

# 変圧器に基づく物体検出に向けて

Toward Transformer-Based Object Detection ( http://arxiv.org/abs/2012.09958v1 )

ライセンス: Link先を確認

Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

(参考訳) トランスフォーマーは、大量のデータに基づいて事前訓練を行い、微調整によってより小さな特定のタスクに移行する能力のため、自然言語処理において支配的なモデルとなっている。 Vision Transformerは、純粋なトランスフォーマーモデルを直接入力として画像に適用する最初の主要な試みであり、畳み込みネットワークと比較して、トランスフォーマーベースのアーキテクチャはベンチマーク分類タスクにおいて競合的な結果が得られることを示した。しかしながら、注意演算子の計算複雑性は、低解像度入力に制限されることを意味する。検出やセグメンテーションのようなより複雑なタスクでは、高いインプット解像度を維持することが、モデルがアウトプットの細部を適切に識別し、反映できるように不可欠である。これにより、Vision Transformerのようなトランスフォーマーベースのアーキテクチャが、分類以外のタスクを実行できるかどうかという疑問が自然に持ち上がる。本稿では、共通検出タスクヘッドによって、視覚変換器をバックボーンとして使用し、競合するCOCO結果を生成する。提案するモデルであるViT-FRCNNは,事前学習能力と高速な微調整性能を含む,変圧器に関連するいくつかの既知の特性を示す。また、ドメイン外画像の性能の向上、大規模オブジェクトの性能向上、非最大抑圧への依存の低減など、標準的な検出バックボーンの改善についても検討した。我々は、ViT-FRCNNを、オブジェクト検出などの複雑な視覚タスクの純粋変換器ソリューションに向けた重要なステップストーンであると考えている。

Transformers have become the dominant model in natural language processing, owing to their ability to pretrain on massive amounts of data, then transfer to smaller, more specific tasks via fine-tuning. The Vision Transformer was the first major attempt to apply a pure transformer model directly to images as input, demonstrating that as compared to convolutional networks, transformer-based architectures can achieve competitive results on benchmark classification tasks. However, the computational complexity of the attention operator means that we are limited to low-resolution inputs. For more complex tasks such as detection or segmentation, maintaining a high input resolution is crucial to ensure that models can properly identify and reflect fine details in their output. This naturally raises the question of whether or not transformer-based architectures such as the Vision Transformer are capable of performing tasks other than classification. In this paper, we determine that Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results. The model that we propose, ViT-FRCNN, demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance. We also investigate improvements over a standard detection backbone, including superior performance on out-of-domain images, better performance on large objects, and a lessened reliance on non-maximum suppression. We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.

翻訳日:2021-05-02 07:41:47 公開日:2020-12-17

# トランスフォーマーはアクションの効果を判断できるのか?

Can Transformers Reason About Effects of Actions? ( http://arxiv.org/abs/2012.09938v1 )

ライセンス: Link先を確認

Pratyay Banerjee, Chitta Baral, Man Luo, Arindam Mitra, Kuntal Pal, Tran C. Son, Neeraj Varshney

(参考訳) 最近の研究では、トランスフォーマーは、ルールが結論を暗示する条件の結合の自然言語表現である限定された環境で、事実とルールを「合理化」することができることが示されている。これは、トランスフォーマーが自然言語で与えられた知識を推論するために使われることを示唆するので、我々は、共通の知識の形式とその対応する推論、すなわち行動の影響に関する推論に関して、厳密な評価を行う。行動と変化に関する推論は、AIの初期からAIの知識表現サブフィールドにおける最重要課題であり、最近では常識的質問応答において目立った側面となっている。我々は、自然言語で4つのアクションドメイン(Blocks World、Logistics、Dock-Worker-Robots、Generic Domain)を検討し、これらのドメインにおけるアクションの効果を推論するQAデータセットを作成します。 a)これらの領域における推論を学習するトランスフォーマーの能力について検討し、(b)一般的なドメインから他のドメインへの学習を伝達する。

A recent work has shown that transformers are able to "reason" with facts and rules in a limited setting where the rules are natural language expressions of conjunctions of conditions implying a conclusion. Since this suggests that transformers may be used for reasoning with knowledge given in natural language, we do a rigorous evaluation of this with respect to a common form of knowledge and its corresponding reasoning, namely reasoning about the effects of actions. Reasoning about action and change has been a top focus in the knowledge representation subfield of AI from the early days of AI, and more recently it has been a prominent aspect of common sense question answering. We consider four action domains (Blocks World, Logistics, Dock-Worker-Robots and a Generic Domain) in natural language and create QA datasets that involve reasoning about the effects of actions in these domains. We investigate the ability of transformers to (a) learn to reason in these domains and (b) transfer that learning from the generic domain to the other domains.

翻訳日:2021-05-02 07:40:49 公開日:2020-12-17

# 畳み込みニューラルネットワークを用いたマルチモーダル深さ推定

Multi-Modal Depth Estimation Using Convolutional Neural Networks ( http://arxiv.org/abs/2012.09667v1 )

ライセンス: Link先を確認

Sadique Adnan Siddiqui, Axel Vierling and Karsten Berns

(参考訳) 本稿では,厳密な距離センサデータと単一カメラ画像から,厳密な奥行き予測の問題点について考察する。本研究は,Deep Learning アプローチの適用による深度推定における,カメラ,レーダー,ライダーなどのセンサモードの重要性について検討する。リダーはレーダよりも深度感知能力が高く、多くの過去の研究でカメラ画像と統合されているが、ロバストなレーダ距離データとカメラ画像の融合に基づくCNNの深度推定はあまり研究されていない。本研究では,高密度特徴抽出のための初期化のために高パフォーマンス事前学習モデルを用いたエンコーダと,所望の深さをアップサンプリングし予測するデコーダとからなる,転置学習手法を用いて深層回帰ネットワークを提案する。これらの結果は,CARLAシミュレータを用いて作成したNuscenes,KITTI,およびSyntheticデータセットで実証された。また、建設現場でクレーンから撮影したトップビューのズームカメラ画像を評価し、地上からの重荷を積んだクレーンブームの距離を推定し、安全クリティカルな用途のユーザビリティを示す。

This paper addresses the problem of dense depth prediction from sparse distance sensor data and a single camera image in challenging weather conditions. This work explores the significance of different sensor modalities such as camera, Radar, and Lidar for estimating depth by applying Deep Learning approaches. Although Lidar has higher depth-sensing abilities than Radar and has been integrated with camera images in many previous works, depth estimation using CNNs on the fusion of robust Radar distance data and camera images has not been explored much. In this work, a deep regression network is proposed using a transfer learning approach. It consists of an encoder, initialized from a high-performing pre-trained model to extract dense features, and a decoder for upsampling and predicting the desired depth. The results are demonstrated on nuScenes, KITTI, and a synthetic dataset created using the CARLA simulator. In addition, top-view zoom-camera images captured from a crane on a construction site are evaluated to estimate the distance from the ground of the crane boom carrying heavy loads, showing the usability in safety-critical applications.

翻訳日:2021-05-02 07:40:31 公開日:2020-12-17

# ニューラルネットワーク圧縮を用いた効率的なCNN-LSTM画像キャプション

Efficient CNN-LSTM based Image Captioning using Neural Network Compression ( http://arxiv.org/abs/2012.09708v1 )

ライセンス: Link先を確認

Harshit Rampal, Aman Mohanty

(参考訳) 現代のニューラルネットワークは、コンピュータビジョン、自然言語処理および関連する分野のタスクにおけるアートパフォーマンスの状態を達成している。しかし、彼らは、リソース制限されたエッジデバイスへのデプロイをさらに阻害する、猛烈なメモリと計算の食欲で悪名高い。エッジデプロイメントを実現するために、研究者はネットワークの有効性を損なうことなく圧縮するプラニングと量子化アルゴリズムを開発した。このような圧縮アルゴリズムはスタンドアロンのCNNおよびRNNアーキテクチャで広く実験されているが、本研究では、CNN-LSTMベースの画像キャプチャーモデルの非従来型エンドツーエンド圧縮パイプラインを示す。このモデルは、flickr8kデータセット上のエンコーダとLSTMデコーダとしてVGG16またはResNet50を使用してトレーニングされる。次に,異なる圧縮アーキテクチャがモデルに与える影響を調べ,モデルサイズを73.1%削減し,推論時間を71.3%削減し,非圧縮アーキテクチャに比べてbleuスコアを7.7%向上させる圧縮アーキテクチャを設計する。

Modern Neural Networks are eminent in achieving state-of-the-art performance on tasks under Computer Vision, Natural Language Processing and related verticals. However, they are notorious for their voracious memory and compute appetite, which obstructs their deployment on resource-limited edge devices. In order to achieve edge deployment, researchers have developed pruning and quantization algorithms to compress such networks without compromising their efficacy. Such compression algorithms have been broadly tested on standalone CNN and RNN architectures, while in this work we present an unconventional end-to-end compression pipeline for a CNN-LSTM based Image Captioning model. The model is trained using VGG16 or ResNet50 as an encoder and an LSTM decoder on the flickr8k dataset. We then examine the effects of different compression architectures on the model and design a compression architecture that achieves a 73.1% reduction in model size, a 71.3% reduction in inference time and a 7.7% increase in BLEU score as compared to its uncompressed counterpart.

翻訳日:2021-05-02 07:40:14 公開日:2020-12-17

# ReferentialGym: (Visual) Referential Gamesにおける言語創発と接地のための命名と枠組み

ReferentialGym: A Nomenclature and Framework for Language Emergence & Grounding in (Visual) Referential Games ( http://arxiv.org/abs/2012.09486v1 )

ライセンス: Link先を確認

Kevin Denamganaï and James Alfred Walker

(参考訳) 自然言語は、人間が情報を伝達し、共通の目標に向けて協力するための強力なツールである。彼らの値はコンポジション性、階層性、リカレント構文といったいくつかの主要な特性に関係しており、計算言語学者は言語ゲームによって引き起こされる人工言語における出現を研究している。ごく最近になって、AIコミュニティは、より良いヒューマンマシンインターフェースに向けた言語出現と基盤の研究を開始した。例えば、対話型/会話型AIアシスタントは、自身のビジョンと進行中の会話を関連付けることができる。本稿では,本研究への2つの貢献について述べる。第一に, 言語創発と接地の研究における主なイニシアティブを理解するための命名法を提案し, 仮定と制約のバリエーションを考察した。次に、PyTorchベースのディープラーニングフレームワークReferentialGymを紹介します。主要なアルゴリズムとメトリクスのベースライン実装を提供することで、多くの異なる機能やアプローチに加えて、referentialgymはフィールドへの参入障壁を緩和し、コミュニティに共通の実装を提供する。

Natural languages are powerful tools wielded by human beings to communicate information and co-operate towards common goals. Their value lies in a few main properties such as compositionality, hierarchy and recurrent syntax, whose emergence computational linguists have been studying in artificial languages induced by language games. Only relatively recently has the AI community started to investigate language emergence and grounding, working towards better human-machine interfaces, for instance interactive/conversational AI assistants that are able to relate their vision to the ongoing conversation. This paper provides two contributions to this research field. Firstly, a nomenclature is proposed to understand the main initiatives in studying language emergence and grounding, accounting for the variations in assumptions and constraints. Secondly, a PyTorch-based deep learning framework is introduced, entitled ReferentialGym, which is dedicated to furthering the exploration of language emergence and grounding. By providing baseline implementations of major algorithms and metrics, in addition to many different features and approaches, ReferentialGym attempts to ease the entry barrier to the field and provide the community with common implementations.

翻訳日:2021-05-02 07:39:56 公開日:2020-12-17

# 高出力同期深部RL

High-Throughput Synchronous Deep RL ( http://arxiv.org/abs/2012.09849v1 )

ライセンス: Link先を確認

Iou-Jen Liu and Raymond A. Yeh and Alexander G. Schwing

(参考訳) 深層強化学習(RL)は計算的に要求され、多くのデータポイントの処理を必要とする。同期メソッドは、データスループットを低くしながらトレーニングの安定性を楽しむ。対照的に、非同期メソッドは高いスループットを実現するが、安定性の問題や'スタックポリシー'によるサンプル効率の低下に悩まされる。両手法の利点を組み合わせるために,HTS-RL(High-Throughput Synchronous Deep Reinforcement Learning)を提案する。 HTS-RLでは,学習とロールアウトを同時に実施し,「安定ポリシー」を回避するシステム設計を考案し,アクターが完全な決定性を維持しつつ,非同期で環境レプリカと対話することを保証する。我々は,アタリゲームとGoogle Research Football環境に対するアプローチを評価した。同期ベースラインと比較して、HTS-RLは2-6$\times$高速である。最先端の非同期手法と比較して、HTS-RLは競争力があり、平均的なエピソード報酬を一貫して達成する。

Deep reinforcement learning (RL) is computationally demanding and requires processing of many data points. Synchronous methods enjoy training stability while having lower data throughput. In contrast, asynchronous methods achieve high throughput but suffer from stability issues and lower sample efficiency due to "stale policies". To combine the advantages of both methods we propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL). In HTS-RL, we perform learning and rollouts concurrently, devise a system design which avoids "stale policies" and ensure that actors interact with environment replicas in an asynchronous manner while maintaining full determinism. We evaluate our approach on Atari games and the Google Research Football environment. Compared to synchronous baselines, HTS-RL is 2-6$\times$ faster. Compared to state-of-the-art asynchronous methods, HTS-RL has competitive throughput and consistently achieves higher average episode rewards.

翻訳日:2021-05-02 07:39:12 公開日:2020-12-17

# 低境界の損失フィードバックの専門家:統一フレームワーク

Experts with Lower-Bounded Loss Feedback: A Unifying Framework ( http://arxiv.org/abs/2012.09537v1 )

ライセンス: Link先を確認

Eyal Gofer and Guy Gilboa

(参考訳) 最高の専門家問題の最も顕著なフィードバックモデルは、完全な情報とバンディットモデルである。本研究では,各ラウンドにおいて,バンディットフィードバックに加えて,各専門家の損失率を低く抑えるために,双方を一般化した単純なフィードバックモデルを検討する。このような低い境界は、例えば株式取引や特定の測定装置の誤差を評価する際の様々なシナリオで得られる。このモデルでは、Exp3の修正版に対する最適後悔境界(対数係数まで)を証明し、バンディットと全情報設定の両方に対してアルゴリズムと境界を一般化する。我々の2段階の統合的後悔分析は、2段階の損失更新をシミュレートし、3つのヘッセン語やヘッセン語のような表現を強調します。この結果から,各ラウンドにおける専門家の任意のサブセットからのフィードバックを,グラフ構造化されたフィードバックで受けられるようにした。しかし,本モデルでは,各損失に対する非自明な下限を許容することで,単者レベルでの部分的なフィードバックを許容する。

The most prominent feedback models for the best expert problem are the full information and bandit models. In this work we consider a simple feedback model that generalizes both, where on every round, in addition to a bandit feedback, the adversary provides a lower bound on the loss of each expert. Such lower bounds may be obtained in various scenarios, for instance, in stock trading or in assessing errors of certain measurement devices. For this model we prove optimal regret bounds (up to logarithmic factors) for modified versions of Exp3, generalizing algorithms and bounds both for the bandit and the full-information settings. Our second-order unified regret analysis simulates a two-step loss update and highlights three Hessian or Hessian-like expressions, which map to the full-information regret, bandit regret, and a hybrid of both. Our results intersect with those for bandits with graph-structured feedback, in that both settings can accommodate feedback from an arbitrary subset of experts on each round. However, our model also accommodates partial feedback at the single-expert level, by allowing non-trivial lower bounds on each loss.
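The abstract does not give the modified Exp3 update, so the following is a hypothetical sketch of how lower bounds could enter an exponential-weights scheme. It uses a loss estimate that starts from each expert's lower bound and importance-weights only the gap observed for the chosen expert; this estimate is unbiased by construction, but whether it matches the paper's algorithm is an assumption. All names are illustrative.

```python
import math


def probabilities(weights):
    """Normalize non-negative weights into a sampling distribution."""
    total = sum(weights)
    return [w / total for w in weights]


def loss_estimates(probs, chosen, observed_loss, lower_bounds):
    """Estimate every expert's loss from bandit feedback plus per-expert lower bounds."""
    estimates = list(lower_bounds)
    # Only the chosen expert's gap above its lower bound is observed and reweighted.
    estimates[chosen] += (observed_loss - lower_bounds[chosen]) / probs[chosen]
    return estimates


def exp_weights_update(weights, estimates, eta):
    """Standard multiplicative update with learning rate eta."""
    return [w * math.exp(-eta * e) for w, e in zip(weights, estimates)]
```

Two limiting cases show how such a model can cover both classical settings: with all lower bounds at zero the estimate reduces to the usual importance-weighted bandit estimate, and when every lower bound equals the true loss it reduces to the full-information update.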

翻訳日:2021-05-02 07:38:25 公開日:2020-12-17

# 対称ラプラシアン逆行列を用いた混合メンバーシップの推定

Estimating mixed-memberships using the Symmetric Laplacian Inverse Matrix ( http://arxiv.org/abs/2012.09561v1 )

ライセンス: Link先を確認

Huan Qing and Jingli Wang

(参考訳) コミュニティ検出はネットワーク分析においてよく研究されており、あるネットワークに対して高速で統計的に分析可能なスペクトルクラスタリングが人気である。しかし、混成会員コミュニティ検出のより現実的なケースは依然として課題である。本稿では,混合会員コミュニティ検出のためのスペクトルクラスタリング手法Mixed-SLIMを提案する。混合SLIMはシンメトリゼーションされたラプラシア逆行列 (SLIM) (Jing et al) に基づいて設計されている。 2021年) 度補正混合メンバーシップ(dcmm)モデル。このアルゴリズムとその正規化バージョン Mixed-SLIM {\tau} は、温和な条件下で漸近的に整合していることを示す。一方,Mixed-SLIMアポとその正規化バージョンであるMixed-SLIM {\tau}approは,大規模ネットワークを扱う場合のSLIM行列を近似することで提供する。これらの4つの混合SLIM法は,コミュニティ検出問題と混合コミュニティ検出問題の両方において,シミュレーションにおける最先端の手法と実際の実験データセットより優れている。

Community detection has been well studied in network analysis, and one popular technique is spectral clustering, which is fast and statistically analyzable for detecting clusters in given networks. But the more realistic case of mixed membership community detection remains a challenge. In this paper, we propose a new spectral clustering method, Mixed-SLIM, for mixed membership community detection. Mixed-SLIM is designed based on the symmetrized Laplacian inverse matrix (SLIM) (Jing et al. 2021) under the degree-corrected mixed membership (DCMM) model. We show that this algorithm and its regularized version Mixed-SLIM {\tau} are asymptotically consistent under mild conditions. Meanwhile, we provide Mixed-SLIM appro and its regularized version Mixed-SLIM {\tau}appro by approximating the SLIM matrix when dealing with large networks in practice. These four Mixed-SLIM methods outperform state-of-the-art methods in simulations and substantial empirical datasets for both community detection and mixed membership community detection problems.

翻訳日:2021-05-02 07:38:06 公開日:2020-12-17

# DenseHMM:Dense表現の学習による隠れマルコフモデル学習

DenseHMM: Learning Hidden Markov Models by Learning Dense Representations ( http://arxiv.org/abs/2012.09783v1 )

ライセンス: Link先を確認

Joachim Sicking, Maximilian Pintz, Maram Akila, Tim Wirtz

(参考訳) 本研究では,隠れマルコフモデル(hidden markov model:hmms)の修正法である densehmm を提案する。標準的なHMMと比較して、遷移確率は原子ではなく、カーネル化によるこれらの表現で構成されている。本手法は制約なしおよび勾配ベース最適化を可能にする。本稿では,baum-welchアルゴリズムの改良と直接共起最適化という2つの最適化手法を提案する。後者は高度にスケーラブルで、標準的なhmmと比べて経験上パフォーマンスが損なわれない。カーネル化の非線形性は表現の表現性に不可欠であることを示す。 DenseHMMの学習された共起物やログのような性質は、合成および生医学的なデータセットで経験的に研究されている。

We propose DenseHMM, a modification of Hidden Markov Models (HMMs) that allows learning dense representations of both the hidden states and the observables. Compared to the standard HMM, transition probabilities are not atomic but composed of these representations via kernelization. Our approach enables constraint-free and gradient-based optimization. We propose two optimization schemes that make use of this: a modification of the Baum-Welch algorithm and a direct co-occurrence optimization. The latter is highly scalable and, empirically, comes without loss of performance compared to standard HMMs. We show that the non-linearity of the kernelization is crucial for the expressiveness of the representations. Properties of the DenseHMM, such as learned co-occurrences and log-likelihoods, are studied empirically on synthetic and biomedical datasets.
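To illustrate why composing transition probabilities from dense representations removes the usual simplex constraints, here is a minimal sketch. The softmax over inner products is only one possible kernelization and is an assumption here, as are the names `transition_matrix`, `out_vecs`, and `in_vecs`; the abstract does not specify the kernel.

```python
import math

def transition_matrix(out_vecs, in_vecs):
    """Row-stochastic matrix with A[i][j] proportional to exp(<out_vecs[i], in_vecs[j]>)."""
    matrix = []
    for u in out_vecs:
        scores = [sum(a * b for a, b in zip(u, z)) for z in in_vecs]
        # Shift by the maximum score for numerical stability before exponentiating.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix
```

Any real-valued vectors yield a valid matrix whose rows sum to one, so the vectors can be updated by plain gradient steps without projecting back onto the probability simplex.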

翻訳日:2021-05-02 07:37:51 公開日:2020-12-17

# Marginal Likelihood Maximizationによるニューラルネットワークの初期化誘導

Guiding Neural Network Initialization via Marginal Likelihood Maximization ( http://arxiv.org/abs/2012.09943v1 )

ライセンス: Link先を確認

Anthony S. Tai, Chunfeng Huang

(参考訳) 本稿では,ハイパーパラメータ選択をニューラルネットワークの初期化に導くための簡易なデータ駆動手法を提案する。モデル初期化に望ましいハイパーパラメータ値を推定するために、対応する活性化関数と共分散関数を持つガウス過程モデルとニューラルネットワークの関係を利用する。実験の結果,実験条件下でのmnist分類タスクの最適に近い予測性能が得られた。さらに,提案手法の整合性を示す実験結果から,より少ないトレーニングセットで計算コストを大幅に削減できることが示唆された。

We propose a simple, data-driven approach to help guide hyperparameter selection for neural network initialization. We leverage the relationship between neural network and Gaussian process models that have corresponding activation and covariance functions to infer the hyperparameter values desirable for model initialization. Our experiment shows that marginal likelihood maximization provides recommendations that yield near-optimal prediction performance on the MNIST classification task under experiment constraints. Furthermore, our empirical results indicate consistency in the proposed technique, suggesting that the computation cost of the procedure could be significantly reduced with smaller training sets.

翻訳日:2021-05-02 07:37:39 公開日:2020-12-17

# ベイズニューラルネットワークを用いた高次元レベルセット推定

High Dimensional Level Set Estimation with Bayesian Neural Network ( http://arxiv.org/abs/2012.09973v1 )

ライセンス: Link先を確認

Huong Ha, Sunil Gupta, Santu Rana, Svetha Venkatesh

(参考訳) レベルセット推定(LSE)は、材料設計、バイオテクノロジー、機械操作テストなど様々な分野の応用において重要な問題である。既存の技術ではスケーラビリティの問題、すなわちこれらの手法は高次元入力ではうまく動作しない。本稿では,ベイズニューラルネットワークを用いた高次元LSE問題の解法を提案する。特に, (1) しきい値レベルが固定ユーザ指定値である場合の \textit{explicit} lse問題, (2) 目標関数の(未知)最大値の割合として閾値が定義される場合の \textit{implicit} lse問題である。各問題に対して対応する理論情報に基づく取得関数を導出してデータポイントをサンプリングし、レベル設定精度を最大に向上させる。さらに,提案する取得関数の理論的時間複雑性を解析し,ネットワークハイパーパラメータを効率的に調整し,高いモデル精度を達成するための実用的な手法を提案する。合成データと実世界のデータの両方における数値実験により,提案手法が従来の最先端手法よりも優れた結果が得られることを示した。

Level Set Estimation (LSE) is an important problem with applications in various fields such as material design, biotechnology, and machine operational testing. Existing techniques suffer from a scalability issue: they do not work well with high-dimensional inputs. This paper proposes novel methods to solve high-dimensional LSE problems using Bayesian Neural Networks. In particular, we consider two types of LSE problems: (1) the \textit{explicit} LSE problem, where the threshold level is a fixed user-specified value, and (2) the \textit{implicit} LSE problem, where the threshold level is defined as a percentage of the (unknown) maximum of the objective function. For each problem, we derive the corresponding information-theoretic acquisition function to sample the data points so as to maximally increase the level set accuracy. Furthermore, we also analyse the theoretical time complexity of our proposed acquisition functions, and suggest a practical methodology to efficiently tune the network hyper-parameters to achieve high model accuracy. Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
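The difference between the two problem types can be made concrete with a small sketch. It works on a list of function values that have already been evaluated, treats the level set as a super-level set (values at or above the threshold), and uses hypothetical function names; all three choices are assumptions, and the paper's acquisition functions are not reproduced here.

```python
def explicit_level_set(values, threshold):
    """Indices whose function value is at or above a fixed, user-specified threshold."""
    return [i for i, v in enumerate(values) if v >= threshold]

def implicit_level_set(values, fraction):
    """Indices at or above `fraction` times the maximum of the evaluated values."""
    # The true maximum is unknown in practice; here the observed maximum stands in for it.
    return explicit_level_set(values, fraction * max(values))
```

In the implicit case the threshold itself has to be estimated, which is why the two problems call for different acquisition functions.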

翻訳日:2021-05-02 07:37:29 公開日:2020-12-17

# 敵防衛としてのDenoising Strategieの限界について

On the Limitations of Denoising Strategies as Adversarial Defenses ( http://arxiv.org/abs/2012.09384v1 )

ライセンス: Link先を確認

Zhonghan Niu, Zhaoxi Chen, Linyi Li, Yubin Yang, Bo Li, Jinfeng Yi

(参考訳) 機械学習モデルに対する敵対的な攻撃が懸念を増す中、多くのデノワズベースの防御アプローチが提案されている。本稿では,データのデノイジングと再構成($f+$逆$f$,$f-if$フレームワーク)による対称変換という形で防衛戦略を要約・分析する。特に、これらの認知戦略を3つの側面(すなわち)から分類する。空間領域、周波数領域、潜在空間においてそれぞれ雑音化される)。通常、対向的な例で防御が行われ、画像と摂動の両方が修正され、摂動に対してどのように防御するかを判断することは困難である。直感的にこれらの難読化戦略の頑健さを評価するため、敵の雑音自体を防御するために直接適用し、良識を犠牲にするのを防ぎます。意外なことに、実験の結果、各次元の摂動の大部分を排除しても、満足な堅牢性を得るのは難しいことが示されている。以上の結果と解析に基づき,ロバスト性を改善するため,特徴領域の異なる周波数帯域に対する適応圧縮戦略を提案する。実験の結果,適応圧縮戦略は,既存手法と比較して,逆摂動の抑制やロバスト性の向上を可能にした。

As adversarial attacks against machine learning models have raised increasing concerns, many denoising-based defense approaches have been proposed. In this paper, we summarize and analyze the defense strategies in the form of symmetric transformation via data denoising and reconstruction (denoted as $F+$ inverse $F$, $F-IF$ Framework). In particular, we categorize these denoising strategies from three aspects (i.e. denoising in the spatial domain, frequency domain, and latent space, respectively). Typically, defense is performed on the entire adversarial example, so both image and perturbation are modified, making it difficult to tell how it defends against the perturbations. To evaluate the robustness of these denoising strategies intuitively, we directly apply them to defend against adversarial noise itself (assuming we have obtained all of it), which saves us from sacrificing benign accuracy. Surprisingly, our experimental results show that even if most of the perturbations in each dimension are eliminated, it is still difficult to obtain satisfactory robustness. Based on the above findings and analyses, we propose the adaptive compression strategy for different frequency bands in the feature domain to improve the robustness. Our experimental results show that the adaptive compression strategies enable the model to better suppress adversarial perturbations, and improve robustness compared with existing denoising strategies.

翻訳日:2021-05-02 07:37:07 公開日:2020-12-17

# 自律走行のための時間ライダーフレーム予測

Temporal LiDAR Frame Prediction for Autonomous Driving ( http://arxiv.org/abs/2012.09409v1 )

ライセンス: Link先を確認

David Deng and Avideh Zakhor

(参考訳) ダイナミックなシーンで未来を予測することは、自律運転やロボット工学など、多くの分野において重要である。本稿では,従来のLiDARフレームを予測するための新しいニューラルネットワークアーキテクチャのクラスを提案する。このアプリケーションの基本的真理は、単にシーケンスの次のフレームであるので、自己教師型でモデルをトレーニングすることができる。提案アーキテクチャはFlowNet3DとDynamic Graph CNNに基づいている。我々は、損失関数と評価指標として、Chamfer Distance (CD) と Earth Mover's Distance (EMD) を用いる。新たにリリースされたnuScenesデータセットを使ってモデルをトレーニングし、評価し、いくつかのベースラインでそれらのパフォーマンスと複雑さを特徴付ける。 FlowNet3Dを直接使用するのに比べ、提案するアーキテクチャはCDとEMDをほぼ1桁小さくする。さらに, ラベル付き監視を使わずに, 合理的なシーンフロー近似を生成できることを示す。

Anticipating the future in a dynamic scene is critical for many fields such as autonomous driving and robotics. In this paper we propose a class of novel neural network architectures to predict future LiDAR frames given previous ones. Since the ground truth in this application is simply the next frame in the sequence, we can train our models in a self-supervised fashion. Our proposed architectures are based on FlowNet3D and Dynamic Graph CNN. We use Chamfer Distance (CD) and Earth Mover's Distance (EMD) as loss functions and evaluation metrics. We train and evaluate our models using the newly released nuScenes dataset, and characterize their performance and complexity with several baselines. Compared to directly using FlowNet3D, our proposed architectures achieve CD and EMD nearly an order of magnitude lower. In addition, we show that our predictions generate reasonable scene flow approximations without using any labelled supervision.
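As a reminder of what the Chamfer Distance measures between a predicted and a ground-truth point cloud, a minimal sketch follows. Conventions differ between papers (squared versus unsquared distances, sum versus mean); the squared, mean-based form below is an assumption, not necessarily the variant used by the authors.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets (lists of tuples)."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        # Mean over src of the squared distance to the nearest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)
```

This brute-force version costs O(|a| * |b|) distance evaluations, which is fine for illustration but too slow for full LiDAR sweeps without a spatial index.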

翻訳日:2021-05-02 07:36:32 公開日:2020-12-17

# エピソード, 原型的ネットワーク, 数少ない学習について

On Episodes, Prototypical Networks, and Few-shot Learning ( http://arxiv.org/abs/2012.09831v1 )

ライセンス: Link先を確認

Steinar Laenen and Luca Bertinetto

(参考訳) エピソディクス学習は、少数の学習に興味を持つ研究者や実践者の間で人気のある実践である。一連の学習問題のトレーニングを組織化し、それぞれが小さな"サポート"セットと"クエリ"セットに依存して、評価中に遭遇する数少ない状況を模倣する。本稿では,この手法を応用したアルゴリズムの2つである,プロトタイプネットワークとマッチングネットワークにおけるエピソード学習の有用性について検討する。驚くべきことに、私たちの実験では、プロトタイプネットワークとマッチングネットワークでは、トレーニングサンプルをサポートとクエリセットに分離するエピソディクス学習戦略を使うのは、トレーニングバッチを利用するデータ非効率な方法である、ということが分かりました。古典的な近傍成分分析と密接に関連しているこれらの「非エピソジック」変種は、複数のデータセットにおけるエピソジックな特徴よりも確実に改善され、非常に単純なにもかかわらず(プロトタイプネットワークの場合)最先端技術と競合する正確性を達成する。

Episodic learning is a popular practice among researchers and practitioners interested in few-shot learning. It consists of organising training in a series of learning problems, each relying on small "support" and "query" sets to mimic the few-shot circumstances encountered during evaluation. In this paper, we investigate the usefulness of episodic learning in Prototypical Networks and Matching Networks, two of the most popular algorithms making use of this practice. Surprisingly, in our experiments we found that, for Prototypical and Matching Networks, it is detrimental to use the episodic learning strategy of separating training samples between support and query set, as it is a data-inefficient way to exploit training batches. These "non-episodic" variants, which are closely related to the classic Neighbourhood Component Analysis, reliably improve over their episodic counterparts in multiple datasets, achieving an accuracy that (in the case of Prototypical Networks) is competitive with the state-of-the-art, despite being extremely simple.
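The support/query split discussed above is easiest to see in the classification rule of a Prototypical Network. The sketch below assumes embeddings have already been computed by some network and uses hypothetical function names; it shows the standard episodic rule, not the non-episodic variant the abstract argues for.

```python
def prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    sums, counts = {}, {}
    for embedding, label in support:
        acc = sums.setdefault(label, [0.0] * len(embedding))
        for d, value in enumerate(embedding):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(query, protos):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(protos, key=lambda c: sum((q - m) ** 2 for q, m in zip(query, protos[c])))
```

Within an episode, support points only ever contribute to prototypes and query points only ever contribute to the loss, so many pairwise comparisons available in a batch go unused; that is the data inefficiency the abstract refers to.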

翻訳日:2021-05-02 07:36:19 公開日:2020-12-17

# ビデオ分類と推薦のための平滑化ガウス混合モデル

Smoothed Gaussian Mixture Models for Video Classification and Recommendation ( http://arxiv.org/abs/2012.11673v1 )

ライセンス: Link先を確認

Sirjan Kafle, Aman Gupta, Xue Xia, Ananth Sankar, Xi Chen, Di Wen, Liang Zhang

(参考訳) VLAD(Vector of Locally Aggregated Descriptors)のようなクラスタ・アンド・アグリゲート技術や、NetVLADのようなエンドツーエンドの差別的に訓練された同等品は、最近ビデオ分類やアクション認識タスクで人気がある。これらの手法は、ビデオフレームをクラスタに割り当て、各クラスタの平均に関するフレームの残余を集約することで、ビデオを表現する。一部のクラスタはビデオ特有のデータが少ないため、これらの機能は騒がしい。本稿では,sugmented gaussian mixture model (sgmm) と呼ばれる新しいクラスタ・アンド・アグリゲーション法と,そのエンドツーエンドの識別訓練された等価値である deep smoothed gaussian mixture model (dsgmm) を提案する。 SGMMは、そのビデオのために訓練されたガウス混合モデル(GMM)のパラメータによって、各ビデオを表す。ローカウントクラスタは、多数のビデオでトレーニングされたユニバーサルバックグラウンドモデル(UBM)を用いて、ビデオ固有の見積をスムースにすることで対処される。 VLADに対するSGMMの主な利点はスムージングであり、少数のトレーニングサンプルに対する感度が低下する。 youtube-8m分類タスクの広範な実験を通じて、sgmm/dsgmmはvlad/netvladよりも小さいが統計的に有意なマージンで一貫して優れていることを示した。また、LinkedInで作成されたデータセットを使って、メンバーがアップロードされたビデオを見るかどうかを予測する。

Cluster-and-aggregate techniques such as Vector of Locally Aggregated Descriptors (VLAD), and their end-to-end discriminatively trained equivalents like NetVLAD, have recently been popular for video classification and action recognition tasks. These techniques operate by assigning video frames to clusters and then representing the video by aggregating residuals of frames with respect to the mean of each cluster. Since some clusters may see very little video-specific data, these features can be noisy. In this paper, we propose a new cluster-and-aggregate method which we call smoothed Gaussian mixture model (SGMM), and its end-to-end discriminatively trained equivalent, which we call deep smoothed Gaussian mixture model (DSGMM). SGMM represents each video by the parameters of a Gaussian mixture model (GMM) trained for that video. Low-count clusters are addressed by smoothing the video-specific estimates with a universal background model (UBM) trained on a large number of videos. The primary benefit of SGMM over VLAD is smoothing, which makes it less sensitive to a small number of training samples. We show, through extensive experiments on the YouTube-8M classification task, that SGMM/DSGMM is consistently better than VLAD/NetVLAD by a small but statistically significant margin. We also show results using a dataset created at LinkedIn to predict if a member will watch an uploaded video.
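The residual aggregation that VLAD performs can be sketched in a few lines. This is a plain, unnormalized VLAD with hard assignment, given as a baseline illustration only; the function name and the omission of the usual normalization steps are assumptions, and the SGMM smoothing itself is not implemented here.

```python
def vlad(frames, centers):
    """Aggregate per-cluster residuals of frame features (lists of floats)."""
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for x in frames:
        # Hard-assign the frame to its nearest cluster center.
        k = min(range(len(centers)),
                key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, centers[j])))
        for d in range(dim):
            residual_sums[k][d] += x[d] - centers[k][d]
    # Flatten to a single descriptor of length (number of clusters) * dim.
    return [v for row in residual_sums for v in row]
```

A cluster that receives only one or two frames contributes a block determined by those few residuals, which is the source of noise that smoothing toward a background model is meant to reduce.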

翻訳日:2021-05-02 07:36:00 公開日:2020-12-17

# ポインタージェネレータネットワークを用いた法域における名前付きエンティティ認識

Named Entity Recognition in the Legal Domain using a Pointer Generator Network ( http://arxiv.org/abs/2012.09936v1 )

ライセンス: Link先を確認

Stavroula Skylaki, Ali Oskooei, Omar Bari, Nadja Herger, Zac Kriegman (Thomson Reuters Labs)

(参考訳) 名前付きエンティティ認識(NER)は、名前付きエンティティを非構造化テキストで識別し分類するタスクである。法領域において,利害関係者は,当事者,裁判官,裁判所の名称,事件番号,法律への言及を含むことができる。我々は, 訴訟のPDFファイルからノイズテキストを抽出し, 法的NERの問題点を米国裁判所から調査した。 NERシステムの「ゴールドスタンダード」トレーニングデータは、テキストの各トークンに対応するエンティティまたは非エンティティラベルのアノテーションを提供する。文章中のエンティティの正確な位置が不明で、エンティティがタイプミスやocrミスを含む可能性があるという点で、gold標準nerデータとは異なる部分的な完全なトレーニングデータのみを扱う。ノイズの多いトレーニングデータの課題を克服するためですテキスト抽出エラーおよび/またはタイプミスおよび未知ラベルインデックスは、nerタスクをテキストからテキストへのシーケンス生成タスクとして定式化し、ポインタ生成ネットワークを訓練して文書内のエンティティを生成する。金標準データがない場合、ポインタジェネレータはNERに有効であり、長い法律文書において一般的なNERニューラルネットワークアーキテクチャよりも優れていることを示す。

Named Entity Recognition (NER) is the task of identifying and classifying named entities in unstructured text. In the legal domain, named entities of interest may include the case parties, judges, names of courts, case numbers, references to laws etc. We study the problem of legal NER with noisy text extracted from PDF files of filed court cases from US courts. The "gold standard" training data for NER systems provide annotation for each token of the text with the corresponding entity or non-entity label. We work with only partially complete training data, which differ from the gold standard NER data in that the exact location of the entities in the text is unknown and the entities may contain typos and/or OCR mistakes. To overcome the challenges of our noisy training data, e.g. text extraction errors and/or typos and unknown label indices, we formulate the NER task as a text-to-text sequence generation task and train a pointer generator network to generate the entities in the document rather than label them. We show that the pointer generator can be effective for NER in the absence of gold standard data and outperforms the common NER neural network architectures in long legal documents.

翻訳日:2021-05-02 07:35:32 公開日:2020-12-17

# 機械学習による量子状態再構成の実験的実現可能性について

On the experimental feasibility of quantum state reconstruction via machine learning ( http://arxiv.org/abs/2012.09432v1 )

ライセンス: Link先を確認

Sanjaya Lohani, Thomas A. Searles, Brian T. Kirby, and Ryan T. Glasser

(参考訳) 最大4量子ビットのシステムに対して、推論とトレーニングの両方の観点から機械学習に基づく量子状態再構成手法のリソーススケーリングを決定する。さらに,高次元システムのトモグラフィーで発生する可能性のある低カウント状態におけるシステム性能について検討した。最後に、IBM Q量子コンピュータに量子状態再構成法を実装し、その結果を確認した。

We determine the resource scaling of machine learning-based quantum state reconstruction methods, in terms of both inference and training, for systems of up to four qubits. Further, we examine system performance in the low-count regime, likely to be encountered in the tomography of high-dimensional systems. Finally, we implement our quantum state reconstruction method on an IBM Q quantum computer and confirm our results.

翻訳日:2021-05-02 07:35:13 公開日:2020-12-17

# 幾何と密度のバランス:高次元データを用いた経路距離

Balancing Geometry and Density: Path Distances on High-Dimensional Data ( http://arxiv.org/abs/2012.09385v1 )

ライセンス: Link先を確認

Anna Little, Daniel McKenzie and James Murphy

(参考訳) pwspds(power-weighted shortest-path distances)の新しい幾何学的および計算的解析を行った。これらの指標が基礎となるデータにおける密度と幾何のバランスをとる方法を明らかにすることで、それらの重要なパラメータを明確にし、実際にどのように選択されるかについて議論する。カーネルベースの教師なしおよび半教師付き機械学習における密度の広範な役割を示す、関連するデータ駆動メトリクスと比較する。計算学的には、完全重み付きグラフ上のPWSPDと、重み付き隣接グラフ上の類似点を関連付け、ほぼ最適である同値性に対する高い確率保証を提供する。パーコレーション理論との結びつきは、有限標本設定におけるPWSPDのバイアスと分散を推定するために展開される。理論的結果は、幅広いデータ設定に対するPWSPDの汎用性を実証する実証実験によって裏付けられている。論文全体では、基礎となるデータは低次元多様体からサンプリングされ、その周囲の次元ではなく、この多様体の固有次元に決定的に依存することが求められている。

New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results require only that the underlying data is sampled from a low-dimensional manifold, and depend crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.
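A power-weighted shortest-path distance can be computed directly on a small point set, which helps show how the power parameter trades geometry against density. The sketch assumes the common definition in which each edge costs the Euclidean length raised to the power `p` and the path total is raised to `1/p`; that normalization and the function name are assumptions, and the complete graph is used rather than the nearest-neighbor graph the abstract analyzes.

```python
import heapq
import math

def pwspd(points, source, target, p):
    """Power-weighted shortest-path distance on the complete graph over `points`."""
    n = len(points)
    best = [math.inf] * n
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale heap entry
        if u == target:
            break
        for v in range(n):
            if v == u:
                continue
            # Edge cost is the Euclidean length raised to the power p.
            step = math.dist(points[u], points[v]) ** p
            if cost + step < best[v]:
                best[v] = cost + step
                heapq.heappush(heap, (best[v], v))
    return best[target] ** (1.0 / p)
```

With `p = 1` the triangle inequality makes the direct edge optimal, so the result is the Euclidean distance; for larger `p`, paths made of many short hops through densely sampled regions become cheaper than a single long jump.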

翻訳日:2021-05-02 07:35:07 公開日:2020-12-17

# 深層学習におけるアンサンブル,知識蒸留,自己蒸留の理解に向けて

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning ( http://arxiv.org/abs/2012.09816v1 )

ライセンス: Link先を確認

Zeyuan Allen-Zhu and Yuanzhi Li

(参考訳) 深層学習モデルのアンサンブルがテスト精度を向上させる方法と、知識蒸留を用いた単一モデルにアンサンブルの優れた性能を蒸留する方法を正式に研究する。我々は,このアンサンブルが,一意に訓練された数個のニューラルネットワークのパットアーキテクチャによる出力の平均であり,パットデータセット上で,パットアルゴリズムを用いてトレーニングされている場合,初期化に使用するランダムなシードによってのみ異なる場合を考える。深層学習におけるアンサンブル・ナレッジ蒸留は従来の学習理論とは全く異なる働きをしており、特にランダム特徴マッピングやニューラルネットワーク-タンジェント-カーネル特徴マッピングとは異なっている。そこで, 深層学習におけるアンサンブルと知識蒸留を適切に理解するために, データが「マルチビュー」と呼ばれる構造を持つ場合, 独立に訓練されたニューラルネットワークのアンサンブルがテスト精度を向上し, 真のラベルの代わりにアンサンブルの出力に適合するように単一のモデルを訓練することにより, 優れたテスト精度を1つのモデルに証明可能とする理論を開発した。その結果、従来の定理とは全く異なる方法で、アンサンブルがディープラーニングでどのように機能するか、そして、真のデータラベルと比較して、知識蒸留に使用できるアンサンブルのアウトプットに「ダーク知識」がどのように隠されているかに光を当てている。最後に, 自己蒸留は, アンサンブルと知識蒸留を暗黙的に組み合わせて, 試験精度を向上させることができることを示した。

We formally study how an ensemble of deep learning models can improve test accuracy, and how the superior performance of the ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the ensemble is simply an average of the outputs of a few independently trained neural networks with the SAME architecture, trained using the SAME algorithm on the SAME data set, and they only differ by the random seeds used in the initialization. We empirically show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory, especially differently from ensemble of random feature mappings or the neural-tangent-kernel feature mappings, and is potentially out of the scope of existing theorems. Thus, to properly understand ensemble and knowledge distillation in deep learning, we develop a theory showing that when data has a structure we refer to as "multi-view", then an ensemble of independently trained neural networks can provably improve test accuracy, and such superior test accuracy can also be provably distilled into a single model by training a single model to match the output of the ensemble instead of the true label. Our result sheds light on how ensemble works in deep learning in a way that is completely different from traditional theorems, and on how the "dark knowledge" hidden in the outputs of the ensemble, as compared to the true data labels, can be used in knowledge distillation. In the end, we prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.
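The two operations named in the abstract, averaging member outputs and training a student to match that average, can be sketched as below. Averaging logits rather than probabilities, the absence of a temperature, and all function names are assumptions made for illustration; the abstract only states that the ensemble is an average of outputs.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_output(member_logits):
    """Average the logits of several independently trained models on one input."""
    n = len(member_logits)
    return [sum(column) / n for column in zip(*member_logits)]

def distillation_loss(student_logits, teacher_logits):
    """Cross-entropy of the student's prediction against the teacher's soft labels."""
    teacher = softmax(teacher_logits)
    student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```

Unlike a one-hot label, the teacher's soft labels carry the relative scores of the incorrect classes, which is the informal meaning of "dark knowledge".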

翻訳日:2021-05-02 07:34:34 公開日:2020-12-17

# 畳み込みニューラルネットワークを用いたコントラスト合成視床核セグメンテーション法

A Contrast Synthesized Thalamic Nuclei Segmentation Scheme using Convolutional Neural Networks ( http://arxiv.org/abs/2012.09386v1 )

ライセンス: Link先を確認

Lavanya Umapathy, Mahesh Bharath Keerthivasan, Natalie M. Zahr, Ali Bilgin, Manojkumar Saranathan

(参考訳) 視床核はいくつかの神経疾患に関係している。 WMn-MPRAGE画像は従来のMPRAGE画像と比較して視床内核コントラストが良いことが示されているが、追加の取得は検査時間の増加をもたらす。本研究では,3次元畳み込みニューラルネットワーク(cnn)を用いた従来型mprage画像からの視床核パーセレーション手法について検討した。 MPRAGE画像から合成したWMn-MPRAGE画像を用いて, 合成コントラストセグメンテーション(NCS)と合成コントラストセグメンテーション(SCS)の2つの3次元CNNを開発した。 mprage image (n=35) とthalamic nuclei labels を用いた2つのセグメンテーションフレームワークをマルチアトラス法を用いて訓練した。健常者とアルコール使用障害(aud)患者(n=45)のコホートを用いて分節精度と臨床的有用性を評価した。 SCSネットワークは、NCSネットワークと比較すると、前腹側核(P=.001)と後腹側核(P=.01)の体積差が低い中間体生成核(P=.003)とセントロメディア核(P=.01)で高Diceスコアを得た。 Bland-Altman 解析により,SCS ネットワークで予測される実数量と実数量の変動係数の低い一致限界が明らかにされた。 scsネットワークは健常年齢対照群 (p=0.01) と比較し, aud患者で有意な後側核萎縮を認めたが, ncsネットワークでは後側核の急激な萎縮を認めた。 CNNによるコントラスト合成は、従来のMPRAGE画像から高速で正確な視床核セグメンテーションを提供することができる。

Thalamic nuclei have been implicated in several neurological diseases. WMn-MPRAGE images have been shown to provide better intra-thalamic nuclear contrast compared to conventional MPRAGE images but the additional acquisition results in increased examination times. In this work, we investigated 3D Convolutional Neural Network (CNN) based techniques for thalamic nuclei parcellation from conventional MPRAGE images. Two 3D CNNs were developed and compared for thalamic nuclei parcellation using MPRAGE images: a) a native contrast segmentation (NCS) and b) a synthesized contrast segmentation (SCS) using WMn-MPRAGE images synthesized from MPRAGE images. We trained the two segmentation frameworks using MPRAGE images (n=35) and thalamic nuclei labels generated on WMn-MPRAGE images using a multi-atlas based parcellation technique. The segmentation accuracy and clinical utility were evaluated on a cohort comprising healthy subjects and patients with alcohol use disorder (AUD) (n=45). The SCS network yielded higher Dice scores in the Medial geniculate nucleus (P=.003) and Centromedian nucleus (P=.01) with lower volume differences for Ventral anterior (P=.001) and Ventral posterior lateral (P=.01) nuclei when compared to the NCS network. A Bland-Altman analysis revealed tighter limits of agreement with lower coefficient of variation between true volumes and those predicted by the SCS network. The SCS network demonstrated a significant atrophy in Ventral lateral posterior nucleus in AUD patients compared to healthy age-matched controls (P=0.01), agreeing with previous studies on thalamic atrophy in alcoholism, whereas the NCS network showed spurious atrophy of the Ventral posterior lateral nucleus. CNN-based contrast synthesis prior to segmentation can provide fast and accurate thalamic nuclei segmentation from conventional MPRAGE images.
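The Dice score used to compare the two networks is a simple overlap measure between a predicted and a reference mask. A minimal sketch for one nucleus follows, assuming the masks are flattened into equal-length sequences of 0/1 values; the empty-mask convention is an assumption.

```python
def dice_score(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum
```

For small structures such as individual thalamic nuclei, a shift of a few voxels changes the score noticeably, which is one reason volume differences are reported alongside Dice.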

翻訳日:2021-05-02 07:33:41 公開日:2020-12-17

# 縦型空中画像を用いた栄養不足ストレスの検出と予測

Detection and Prediction of Nutrient Deficiency Stress using Longitudinal Aerial Imagery ( http://arxiv.org/abs/2012.09654v1 )

ライセンス: Link先を確認

Saba Dadsetan, Gisele Rose, Naira Hovakimyan, Jennifer Hobbs

(参考訳) 早期に、栄養不足ストレス(NDS)の正確な検出は、環境への影響だけでなく、経済的にも重要であり、毛布の塗布に代えて化学物質の精密適用は、栽培者の運用コストを削減し、環境に不必要に侵入する化学物質の量を削減している。さらに、早期の処理は損失の量を減らすため、特定の季節に作物の生産を増加させる。このことを念頭に,高分解能空中画像のシーケンスを収集し,セマンティクスセグメンテーションモデルを構築し,フィールド全体のndsの検出と予測を行う。私たちの仕事は農業、リモートセンシング、現代のコンピュータビジョンとディープラーニングの交差点にあります。まず,NDSのフルフィールド検出のためのベースラインを構築し,事前学習,バックボーンアーキテクチャ,入力表現,サンプリング戦略の影響を定量化する。次に、unetに基づくシングルタイムスタンプモデルを構築して、シーズンの異なるポイントで利用可能な情報量を定量化する。次に,NDSを示すフィールドの領域を正確に検出するために,UNetと畳み込みLSTM層を組み合わせた時空間アーキテクチャを構築した。最後に, このアーキテクチャは, 後続飛行(将来3週間以上)でNDSを示すと予測されるフィールドの領域を予測するために, 予報までの距離に応じて, IOUスコア0.47-0.51を維持することができることを示す。私たちはまた、コンピュータビジョン、リモートセンシング、農業分野にメリットがあると信じているデータセットもリリースします。この研究は、リモートセンシングと農業の深層学習の発展に寄与し、経済と持続可能性に関する重要な社会的課題に対処している。

Early, precise detection of nutrient deficiency stress (NDS) has key economic as well as environmental impact; precision application of chemicals in place of blanket application reduces operational costs for the growers while reducing the amount of chemicals which may enter the environment unnecessarily. Furthermore, earlier treatment reduces the amount of loss and therefore boosts crop production during a given season. With this in mind, we collect sequences of high-resolution aerial imagery and construct semantic segmentation models to detect and predict NDS across the field. Our work sits at the intersection of agriculture, remote sensing, and modern computer vision and deep learning. First, we establish a baseline for full-field detection of NDS and quantify the impact of pretraining, backbone architecture, input representation, and sampling strategy. We then quantify the amount of information available at different points in the season by building a single-timestamp model based on a UNet. Next, we construct our proposed spatiotemporal architecture, which combines a UNet with a convolutional LSTM layer, to accurately detect regions of the field showing NDS; this approach has an impressive IOU score of 0.53. Finally, we show that this architecture can be trained to predict regions of the field which are expected to show NDS in a later flight -- potentially more than three weeks in the future -- maintaining an IOU score of 0.47-0.51 depending on how far in advance the prediction is made. We will also release a dataset which we believe will benefit the computer vision, remote sensing, as well as agriculture fields. This work contributes to the recent developments in deep learning for remote sensing and agriculture, while addressing a key social challenge with implications for economics and sustainability.
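The IOU scores quoted above measure the overlap between predicted and annotated stress regions. A minimal sketch for a single class follows, assuming binary masks flattened into equal-length sequences; how the authors average over images or classes is not stated in the abstract.

```python
def iou_score(pred, truth):
    """Intersection over union of two binary masks given as flat sequences of 0/1."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if union == 0 else intersection / union
```

An IOU of about 0.5 means the overlap region is roughly half the size of the combined predicted and annotated area.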

翻訳日:2021-05-02 07:33:04 公開日:2020-12-17

# 人工知能を用いた緑内障視神経頭の構造表現型記述

Describing the Structural Phenotype of the Glaucomatous Optic Nerve Head Using Artificial Intelligence ( http://arxiv.org/abs/2012.09755v1 )

ライセンス: Link先を確認

Satish K. Panda, Haris Cheong, Tin A. Tun, Sripad K. Devella, Ramaswami Krishnadas, Martin L. Buist, Shamira Perera, Ching-Yu Cheng, Tin Aung, Alexandre H. Thiéry, and Michaël J. A. Girard

(参考訳) 視神経頭(ONH)は通常、緑内障の発生と進行に伴う神経・結合組織構造の変化を経験し、これらの変化を監視することは緑内障クリニックの診断と予後の改善に重要である。 onhの構造変化を臨床的に評価するための金標準技術は光コヒーレンストモグラフィ(oct)である。しかし、octは、網膜神経線維層(rnfl)の厚みなどのいくつかの手工学パラメータの測定に限定されており、まだ緑内障の診断と予後診断のための単独の装置として認定されていない。これは、ONHの3D OCTスキャンで利用できる膨大な情報が十分に活用されていないためである。そこで本研究では, onh の oct スキャンからの情報を十分に活用できる深層学習手法を提案し, 緑内障診断ツールとして \textbf{(3)} を使用できることを提案する。具体的には,本アルゴリズムで同定された構造的特徴は緑内障の臨床観察と関係があることが判明した。これらの構造的特徴の診断精度は92.0 \pm 2.3 \%$であり、感度は90.0 \pm 2.4 \%$(95 \%$)である。ステップで等級を変えることで、オンの形状が'非グラコマ'から'グラコマ'状態へ遷移するにつれてどのように変化するかを明らかにすることができた。我々の研究は緑内障の病態の理解に強い臨床的意味を持ち、将来は視力喪失を予測できるように改善できると考えている。

The optic nerve head (ONH) typically experiences complex neural- and connective-tissue structural changes with the development and progression of glaucoma, and monitoring these changes could be critical for improved diagnosis and prognosis in the glaucoma clinic. The gold-standard technique to assess structural changes of the ONH clinically is optical coherence tomography (OCT). However, OCT is limited to the measurement of a few hand-engineered parameters, such as the thickness of the retinal nerve fiber layer (RNFL), and has not yet been qualified as a stand-alone device for glaucoma diagnosis and prognosis applications. We argue this is because the vast amount of information available in a 3D OCT scan of the ONH has not been fully exploited. In this study we propose a deep learning approach that can: \textbf{(1)} fully exploit information from an OCT scan of the ONH; \textbf{(2)} describe the structural phenotype of the glaucomatous ONH; and \textbf{(3)} be used as a robust glaucoma diagnosis tool. Specifically, the structural features identified by our algorithm were found to be related to clinical observations of glaucoma. The diagnostic accuracy from these structural features was $92.0 \pm 2.3 \%$ with a sensitivity of $90.0 \pm 2.4 \%$ (at $95 \%$ specificity). By changing their magnitudes in steps, we were able to reveal how the morphology of the ONH changes as one transitions from a `non-glaucoma' to a `glaucoma' condition. We believe our work may have strong clinical implications for our understanding of glaucoma pathogenesis, and could be improved in the future to also predict future loss of vision.

翻訳日:2021-05-02 07:32:36 公開日:2020-12-17

# 4次元ビュー合成とビデオ処理のためのニューラルラジアンスフロー

Neural Radiance Flow for 4D View Synthesis and Video Processing ( http://arxiv.org/abs/2012.09790v1 )

ライセンス: Link先を確認

Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

(参考訳) 本稿では,rgb画像から動的シーンの4次元空間-時間表現を学ぶためのニューラル・ラミアンス・フロー(nerflow)を提案する。我々のアプローチの鍵は、シーンの3D占有率、放射率、ダイナミックスを捉えることを学習する神経暗黙表現を使用することである。異なるモダリティにまたがる一貫性を強制することにより,水注,ロボットインタラクション,実画像など多様な動的シーンにおける多視点レンダリングが可能となり,空間-時空間ビュー合成における最先端手法を上回っている。私たちのアプローチは、入力画像が1つのカメラでキャプチャされる場合でも機能します。さらに,学習表現が先行して暗黙的なシーンとして機能できることを実証し,画像の超解像やノイズ除去といった映像処理タスクを,追加の監督なしに行えることを示した。

We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images. Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene. By enforcing consistency across different modalities, our representation enables multi-view rendering in diverse dynamic scenes, including water pouring, robotic interaction, and real images, outperforming state-of-the-art methods for spatial-temporal view synthesis. Our approach works even when input images are captured with only one camera. We further demonstrate that the learned representation can serve as an implicit scene prior, enabling video processing tasks such as image super-resolution and de-noising without any additional supervision.

翻訳日:2021-05-02 07:32:10 公開日:2020-12-17

# 動的サイクル整合性を考慮した制御のためのクロスドメイン対応学習

Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency ( http://arxiv.org/abs/2012.09811v1 )

ライセンス: Link先を確認

Qiang Zhang, Tete Xiao, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

(参考訳) 多くのロボティクス問題の核心は、ドメイン間の通信を学習することである。例えば、模倣学習は人間とロボットの対応を得る必要があり、sim-to-realは物理シミュレータと現実世界の対応を必要とする。本稿では,表現(視覚と内部状態),物理パラメータ(質量と摩擦),形態(手足の数)の異なる領域間の対応について学ぶことを目的とした。重要なことに、2つのドメインから無作為かつランダムに収集されたデータを用いて対応を学習する。本稿では,サイクル整合性制約を用いて2つの領域にまたがる動的ロボット動作を協調する「textit{dynamics cycles」を提案する。この対応が見つかると、第2のドメインで追加の微調整を必要とせずに、あるドメインでトレーニングされたポリシーを直接他のドメインに転送できます。我々は,シミュレーションと実ロボットの両方において,様々な問題領域で実験を行う。本フレームワークは,実ロボットアームの無補間単眼映像とシミュレーションアームの動的状態動作軌跡をペアデータなしで一致させることができる。結果のビデオデモは、https://sjtuzq.github.io/cycle_dynamics.htmlで見ることができる。

At the heart of many robotics problems is the challenge of learning correspondences across domains. For instance, imitation learning requires obtaining correspondence between humans and robots; sim-to-real requires correspondence between physics simulators and the real world; transfer learning requires correspondences between different robotics environments. This paper aims to learn correspondence across domains differing in representation (vision vs. internal state), physics parameters (mass and friction), and morphology (number of limbs). Importantly, correspondences are learned using unpaired and randomly collected data from the two domains. We propose \textit{dynamics cycles} that align dynamic robot behavior across two domains using a cycle-consistency constraint. Once this correspondence is found, we can directly transfer the policy trained on one domain to the other, without needing any additional fine-tuning on the second domain. We perform experiments across a variety of problem domains, both in simulation and on a real robot. Our framework is able to align uncalibrated monocular video of a real robot arm to dynamic state-action trajectories of a simulated arm without paired data. Video demonstrations of our results are available at: https://sjtuzq.github.io/cycle_dynamics.html .

翻訳日:2021-05-02 07:31:55 公開日:2020-12-17

# マスアート雑音による半空間学習の難易度

Hardness of Learning Halfspaces with Massart Noise ( http://arxiv.org/abs/2012.09720v1 )

ライセンス: Link先を確認

Ilias Diakonikolas and Daniel M. Kane

(参考訳) マスアートノイズの存在下でのPAC学習ハーフスペースの複雑さについて検討した。具体的には、ラベル付き例 $(x, y)$ が分布 $D$ on $\mathbb{R}^{n} \times \{ \pm 1\}$ から与えられたとき、$x$ の辺分布は任意であり、そのラベルはマッサルトノイズが速度 $\eta<1/2$ で崩壊した未知の半空間によって生成されるので、小さな誤分類誤差で仮説を計算したい。マッサートモデルにおける半空間の効率的な学習可能性の特徴付けは、学習理論における長年の未解決問題である。最近の研究は、この問題の多項式時間学習アルゴリズムをエラー$\eta+\epsilon$で与えた。この誤差上限は、情報理論的に最適な$\mathrm{OPT}+\epsilon$の境界から遠く離れることができる。より最近の研究は、「exact learning」、すなわちエラー $\mathrm{OPT}+\epsilon$ を達成することは統計クエリ(sq)モデルでは難しいことを示した。本研究では,情報理論の最適誤差と多項式時間SQアルゴリズムで達成できる最良の誤差との間には指数的ギャップが存在することを示す。特に、我々の下界は、効率的なSQアルゴリズムが任意の多項式係数内で最適誤差を近似できないことを意味する。

We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise. Specifically, given labeled examples $(x, y)$ from a distribution $D$ on $\mathbb{R}^{n} \times \{ \pm 1\}$ such that the marginal distribution on $x$ is arbitrary and the labels are generated by an unknown halfspace corrupted with Massart noise at rate $\eta<1/2$, we want to compute a hypothesis with small misclassification error. Characterizing the efficient learnability of halfspaces in the Massart model has remained a longstanding open problem in learning theory. Recent work gave a polynomial-time learning algorithm for this problem with error $\eta+\epsilon$. This error upper bound can be far from the information-theoretically optimal bound of $\mathrm{OPT}+\epsilon$. More recent work showed that exact learning, i.e., achieving error $\mathrm{OPT}+\epsilon$, is hard in the Statistical Query (SQ) model. In this work, we show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by a polynomial-time SQ algorithm. In particular, our lower bound implies that no efficient SQ algorithm can approximate the optimal error within any polynomial factor.
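
The gap between $\eta$ and $\mathrm{OPT}$ mentioned in the abstract above can be made concrete with a small simulation. The following is only an illustrative sketch, not the authors' construction: the Gaussian marginal, the vector `w_true`, and the particular `flip_rate` function are all assumptions chosen for the example. Each clean halfspace label is flipped with a probability that depends on $x$ but never exceeds $\eta$, so the true halfspace has error $\mathrm{OPT} = \mathbb{E}[\eta(x)]$, which can be much smaller than $\eta$.

```python
import random


def halfspace_label(w, x):
    """Sign of the inner product <w, x>, with ties sent to +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def massart_sample(w, flip_rate, n_samples, rng):
    """Draw (x, y) pairs whose clean label is flipped with probability flip_rate(x)."""
    data = []
    for _ in range(n_samples):
        # Gaussian marginal is an assumption; the model allows any marginal on x.
        x = [rng.gauss(0.0, 1.0) for _ in w]
        y = halfspace_label(w, x)
        if rng.random() < flip_rate(x):
            y = -y
        data.append((x, y))
    return data


def error_rate(w, data):
    """Fraction of examples misclassified by the halfspace with normal vector w."""
    return sum(halfspace_label(w, x) != y for x, y in data) / len(data)


ETA = 0.3  # Massart noise bound, must be below 1/2


def flip_rate(x):
    # Hypothetical noise pattern: full rate ETA on half of the space, none elsewhere.
    return ETA if x[0] > 0 else 0.0


rng = random.Random(0)
w_true = [1.0, -2.0, 0.5]  # hypothetical target halfspace
data = massart_sample(w_true, flip_rate, 20000, rng)

# The target halfspace errs only on flipped labels, so its expected error is ETA / 2.
opt_estimate = error_rate(w_true, data)
print(f"eta = {ETA}, estimated OPT = {opt_estimate:.3f}")
```

In this toy setting an algorithm that only guarantees error $\eta+\epsilon$ may return a hypothesis roughly twice as bad as the best halfspace, which is the kind of gap the hardness result addresses.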

翻訳日:2021-05-02 07:31:37 公開日:2020-12-17

# insrl: 遠隔教師付き関係抽出のための複数の情報ソースを用いた多視点学習フレームワーク

InSRL: A Multi-view Learning Framework Fusing Multiple Information Sources for Distantly-supervised Relation Extraction ( http://arxiv.org/abs/2012.09370v1 )

ライセンス: Link先を確認

Zhendong Chu, Haiyun Jiang, Yanghua Xiao, Wei Wang

(参考訳) 遠隔監視により、知識ベースを利用して関係抽出のための文の袋を自動的にラベル付けすることができるが、狭くうるさいバッグの問題に苦しむ。トレーニングデータを補完し、これらの問題を克服するために、追加の情報ソースが緊急に必要となる。本稿では,知識ベースに広く存在する2つの情報源,すなわちエンティティ記述と多粒体型を導入し,教師付きデータの充実を図る。我々は、情報ソースを複数のビューと見なし、十分な情報を持つ無傷空間を構築するためにそれらを融合させる。 Intact Space Representation Learning (InSRL) による関係抽出のために, エンドツーエンドのマルチビュー学習フレームワークを提案し, 単一ビューの表現を同時に学習する。さらに、インナービューとクロスビューアテンションメカニズムを用いて、異なるレベルの重要な情報をエンティティペアベースで強調する。一般的なベンチマークデータセットの実験結果から,追加の情報ソースの必要性とフレームワークの有効性が示された。匿名化レビューフェーズの後、複数の情報ソースを持つモデルとデータセットの実装をリリースします。

Distant supervision makes it possible to automatically label bags of sentences for relation extraction by leveraging knowledge bases, but suffers from the sparse and noisy bag issues. Additional information sources are urgently needed to supplement the training data and overcome these issues. In this paper, we introduce two widely-existing sources in knowledge bases, namely entity descriptions and multi-grained entity types, to enrich the distantly supervised data. We treat the information sources as multiple views and fuse them to construct an intact space with sufficient information. An end-to-end multi-view learning framework is proposed for relation extraction via Intact Space Representation Learning (InSRL), and the representations of single views are learned jointly. Moreover, inner-view and cross-view attention mechanisms are used to highlight important information on different levels on an entity-pair basis. The experimental results on a popular benchmark dataset demonstrate the necessity of additional information sources and the effectiveness of our framework. We will release the implementation of our model and dataset with multiple information sources after the anonymized review phase.

翻訳日:2021-05-02 07:31:11 公開日:2020-12-17

# 強化学習による対話における対話的質問の明確化

Interactive Question Clarification in Dialogue via Reinforcement Learning ( http://arxiv.org/abs/2012.09411v1 )

ライセンス: Link先を確認

Xiang Hu, Zujie Wen, Yafang Wang, Xiaolong Li, Gerard de Melo

(参考訳) あいまいな質問への対処は、現実世界の対話システムにおける長年の問題である。質問による明確化はヒューマンインタラクションの一般的な形態であるが,ユーザからより具体的な意図を引き出すための適切な質問を定義することは困難である。本研究では,元のクエリの改良を提案することにより,あいまいな質問を明確化するための強化モデルを提案する。まず、コレクション分割問題を定式化し、潜在的な曖昧な意図を区別できるラベルのセットを選択する。我々は、選択したラベルをインテントフレーズとしてユーザにリストし、さらなる確認を行う。選択されたラベルと元のユーザクエリは、適切な応答をより容易に識別できる洗練されたクエリとして機能する。このモデルは、深層ポリシーネットワークを用いた強化学習を用いてトレーニングされる。我々は,実世界のユーザクリックに基づいてモデルを評価し,いくつかの実験で有意な改善を示す。

Coping with ambiguous questions has been a perennial problem in real-world dialogue systems. Although clarification by asking questions is a common form of human interaction, it is hard to define appropriate questions to elicit more specific intents from a user. In this work, we propose a reinforcement model to clarify ambiguous questions by suggesting refinements of the original query. We first formulate a collection partitioning problem to select a set of labels enabling us to distinguish potential unambiguous intents. We list the chosen labels as intent phrases to the user for further confirmation. The selected label along with the original user query then serves as a refined query, for which a suitable response can more easily be identified. The model is trained using reinforcement learning with a deep policy network. We evaluate our model based on real-world user clicks and demonstrate significant improvements across several different experiments.

翻訳日:2021-05-02 07:30:54 公開日:2020-12-17

# ルーフGAN:住宅用ルーフ形状と関係性の学習

Roof-GAN: Learning to Generate Roof Geometry and Relations for Residential Houses ( http://arxiv.org/abs/2012.09340v1 )

ライセンス: Link先を確認

Yiming Qian, Hao Zhang, Yasutaka Furukawa

(参考訳) 本稿では, 住宅用屋根構造の構造的幾何を屋根プリミティブの集合として生成する, 新規な対向ネットワークであるRoof-GANについて述べる。プリミティブの数を仮定すると、ジェネレータは、1)各ノードのラスター画像としてのプリミティブ幾何からなり、ファセットセグメンテーションと角度をエンコードするグラフ、2)各エッジにおけるプリミティブコリニア/コプランナ関係、3)新しい微分可能ベクトル化器によって生成された各ノードのベクトル形式におけるプリミティブ幾何からなる構造化屋根モデルを生成する。判別器は、完全なエンドツーエンドアーキテクチャで原始ラスタ幾何学、原始関係、原始ベクトル幾何学を評価するために訓練される。定量的・質的評価は, 構造幾何生成の課題として提案する新しい指標を用いて, 競合する手法よりも多様で現実的な屋根モデルを生成する手法の有効性を示す。私たちはコードとデータを共有します。

This paper presents Roof-GAN, a novel generative adversarial network that generates structured geometry of residential roof structures as a set of roof primitives and their relationships. Given the number of primitives, the generator produces a structured roof model as a graph, which consists of 1) primitive geometry as raster images at each node, encoding facet segmentation and angles; 2) inter-primitive colinear/coplanar relationships at each edge; and 3) primitive geometry in a vector format at each node, generated by a novel differentiable vectorizer while enforcing the relationships. The discriminator is trained to assess the primitive raster geometry, the primitive relationships, and the primitive vector geometry in a fully end-to-end architecture. Qualitative and quantitative evaluations demonstrate the effectiveness of our approach in generating diverse and realistic roof models over the competing methods with a novel metric proposed in this paper for the task of structured geometry generation. We will share our code and data.

翻訳日:2021-05-02 07:29:25 公開日:2020-12-17

# 1枚の画像から3次元シーン形状を復元する学習

Learning to Recover 3D Scene Shape from a Single Image ( http://arxiv.org/abs/2012.09365v1 )

ライセンス: Link先を確認

Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen

(参考訳) 野生個体における単眼深度推定の有意な進歩にもかかわらず,混合データ深度予測訓練におけるシフト不変再構成損失と未知のカメラ焦点長による未知の深度シフトによる正確な3次元シーン形状の復元には,最近の最新手法では使用できない。この問題を詳細に検討し,まずは未知のスケールで深度を予測し,単一の単眼画像からシフトする2段階のフレームワークを提案し,次に3Dポイント・クラウドエンコーダを用いて,現実的な3Dシーン形状を復元する。さらに,画像レベルの正規化回帰損失と正規化幾何損失を提案し,混合データセット上で訓練された深度予測モデルを強化する。 9つの未知のデータセットで深度モデルを検証し、ゼロショットデータセットの一般化で最先端のパフォーマンスを達成する。コードは、https://git.io/depthで入手できる。

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length. We investigate this problem in detail, and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth
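
To see why a loss that is invariant to scale and shift leaves the depth shift undetermined, consider the generic least-squares alignment sketched below. This is an illustration under stated assumptions, not the loss used in the paper: the function names and the toy depth values are hypothetical. A prediction is first aligned to the reference depth with the best scale and shift, and only the residual after alignment is penalised.

```python
def align_scale_shift(pred, target):
    """Closed-form least-squares (scale, shift) such that scale * pred + shift ~ target."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def scale_shift_invariant_error(pred, target):
    """Mean squared error after the best scale and shift have been applied."""
    scale, shift = align_scale_shift(pred, target)
    return sum((scale * p + shift - t) ** 2 for p, t in zip(pred, target)) / len(pred)


true_depth = [1.0, 2.0, 3.5, 5.0]               # hypothetical reference depths
pred_exact = list(true_depth)                    # correct metric depth
pred_affine = [0.5 * d - 0.2 for d in true_depth]  # wrong scale and wrong shift

err_exact = scale_shift_invariant_error(pred_exact, true_depth)
err_affine = scale_shift_invariant_error(pred_affine, true_depth)
scale, shift = align_scale_shift(pred_affine, true_depth)
print(err_exact, err_affine, scale, shift)
```

Both predictions receive essentially zero error, so training with such a loss cannot tell them apart. An unknown shift distorts the back-projected 3D point cloud, which is the problem the second stage described in the abstract is meant to correct.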

翻訳日:2021-05-02 07:29:05 公開日:2020-12-17

# 半グローバル形状認識ネットワーク

Semi-Global Shape-aware Network ( http://arxiv.org/abs/2012.09372v1 )

ライセンス: Link先を確認

Pengju Zhang, Yihong Wu, Jiagang Zhu

(参考訳) ローカルでない操作は、最近各位置へのグローバルコンテキストの集約を通じて、長距離依存関係をキャプチャするために使用される。しかし、ほとんどの手法は、特徴の類似性のみに焦点をあてるだけでオブジェクトの形状を保存できないが、長距離依存を捉えるために中央と他の位置との近接を無視する一方で、形状認識は多くのコンピュータビジョンタスクに有用である。本稿では,長距離依存をモデル化する際のオブジェクト形状の類似性と近接性を考慮したセミ・グローバル形状認識ネットワーク(SGSNet)を提案する。階層的な方法でグローバルなコンテキストを集約する。第1段階では、特徴地図全体における各位置は、類似度と近接度の両方に応じて、縦方向と横方向の文脈情報のみを集約する。そして、結果は第2のレベルに入力され、同じ操作を行います。この階層的な方法では、各中央位置ゲインは、他の全ての位置から支持され、類似性と近接の組み合わせにより、各位置ゲインは、ほとんど同じ意味オブジェクトから支持される。また,特徴マップ内の各行や列を二分木として扱い,類似性計算コストを低減させる,文脈情報集約のための線形時間アルゴリズムを提案する。セマンティックセグメンテーションと画像検索の実験により、既存のネットワークにSGSNetを追加することにより、精度と効率の両面で確固たる改善が得られた。

Non-local operations have recently become a common way to capture long-range dependencies by aggregating global context to each position. However, most of these methods cannot preserve object shapes, since they focus only on feature similarity and ignore the proximity between the central position and other positions, even though shape-awareness is beneficial to many computer vision tasks. In this paper, we propose a Semi-Global Shape-aware Network (SGSNet) that considers both feature similarity and proximity to preserve object shapes when modeling long-range dependencies. Global context is aggregated hierarchically. In the first level, each position in the whole feature map aggregates contextual information only in the vertical and horizontal directions, according to both similarity and proximity. The result is then fed into the second level, which performs the same operations. In this hierarchical way, each central position gains support from all other positions, and the combination of similarity and proximity makes each position gain support mostly from the same semantic object. Moreover, we propose a linear-time algorithm for the aggregation of contextual information, in which each row and column of the feature map is treated as a binary tree to reduce the cost of similarity computation. Experiments on semantic segmentation and image retrieval show that adding SGSNet to existing networks yields solid improvements in both accuracy and efficiency.
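
The two-level aggregation described in the abstract above can be sketched on a tiny scalar feature map. This is a brute-force illustration under assumptions, not the SGSNet implementation: the exponential similarity, the geometric proximity decay, and the parameters `sigma` and `decay` are hypothetical choices, and the loop below has quadratic cost per row and column rather than using the linear-time tree algorithm the authors propose.

```python
import math


def cross_aggregate(fmap, sigma=0.5, decay=0.8):
    """One level: each position averages its own row and column,
    weighted by feature similarity times spatial proximity."""
    height, width = len(fmap), len(fmap[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Positions in the same row, plus the same column (centre counted once).
            neighbours = [(r, j) for j in range(width)]
            neighbours += [(i, c) for i in range(height) if i != r]
            num = den = 0.0
            for i, j in neighbours:
                similarity = math.exp(-abs(fmap[r][c] - fmap[i][j]) / sigma)
                proximity = decay ** (abs(r - i) + abs(c - j))
                weight = similarity * proximity
                num += weight * fmap[i][j]
                den += weight
            out[r][c] = num / den
    return out


# Toy map with two "objects": left half has feature 0, right half has feature 1.
fmap = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
level1 = cross_aggregate(fmap)
level2 = cross_aggregate(level1)  # second level reaches every position via a row or column hop
for row in level2:
    print([round(v, 3) for v in row])
```

After two levels every position has received information from every other position, yet the left and right halves stay clearly separated because dissimilar and distant positions contribute little weight.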

翻訳日:2021-05-02 07:28:47 公開日:2020-12-17

# 非ラベルデータ誘導半教師付き病理組織像分割

Unlabeled Data Guided Semi-supervised Histopathology Image Segmentation ( http://arxiv.org/abs/2012.09373v1 )

ライセンス: Link先を確認

Hongxiao Wang, Hao Zheng, Jianxu Chen, Lin Yang, Yizhe Zhang, Danny Z. Chen

(参考訳) 病理組織像の自動分割は疾患解析に不可欠である。制限付きラベル付きデータは、完全に教師された設定の下で訓練されたモデルの一般化を妨げます。生成法に基づく半教師付き学習(SSL)は多様な画像特性の活用に有効であることが証明されている。しかし、モデルトレーニングやそのような画像の使い方において、どのような生成画像がより有用かは明らかにされていない。本稿では,未ラベルデータ分布を利用した病理組織像分割のための新しいデータガイド生成法を提案する。まず、画像生成モジュールを設計する。画像コンテンツとスタイルは分離され、クラスタリングフレンドリーなスペースに埋め込まれて配布される。新しい画像は、コンテンツやスタイルのサンプリングと相互結合によって合成される。第2に,生成した画像を定量的にサンプリングするための効果的なデータ選択ポリシーを考案する。(1) 生成されたトレーニングセットをデータセットをよりよくカバーするために,(2) トレーニングプロセスをより効果的にするために,アノテーション付きトレーニングデータセットが不足するデータ中の「ハードケース」の画像を特定し,オーバーサンプリングする。本手法は腺および核データセット上で評価される。提案手法は,インダクティブ設定とトランスダクティブ設定の両方において,共通セグメンテーションモデルの性能を一貫して向上させ,最先端の結果を得る。

Automatic histopathology image segmentation is crucial to disease analysis. Limited available labeled data hinders the generalizability of trained models under the fully supervised setting. Semi-supervised learning (SSL) based on generative methods has been proven to be effective in utilizing diverse image characteristics. However, it has not been well explored what kinds of generated images would be more useful for model training and how to use such images. In this paper, we propose a new data guided generative method for histopathology image segmentation by leveraging the unlabeled data distributions. First, we design an image generation module. Image content and style are disentangled and embedded in a clustering-friendly space to utilize their distributions. New images are synthesized by sampling and cross-combining contents and styles. Second, we devise an effective data selection policy for judiciously sampling the generated images: (1) to make the generated training set better cover the dataset, the clusters that are underrepresented in the original training set are covered more; (2) to make the training process more effective, we identify and oversample the images of "hard cases" in the data for which annotated training data may be scarce. Our method is evaluated on glands and nuclei datasets. We show that under both the inductive and transductive settings, our SSL method consistently boosts the performance of common segmentation models and attains state-of-the-art results.

翻訳日:2021-05-02 07:28:25 公開日:2020-12-17

# 教師なし3次元姿勢推定のための不変教師と同変学生

Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation ( http://arxiv.org/abs/2012.09398v1 )

ライセンス: Link先を確認

Chenxin Xu, Siheng Chen, Maosen Li, Ya Zhang

(参考訳) 3dアノテーションやサイド情報のない3次元ポーズ推定のための教師・学生学習フレームワークに基づく新しい手法を提案する。教師ネットワークでは,この教師の学習課題を解決するために,ポーズディクショナリーモデルを用いて正規化を行い,物理的に妥当な3dポーズを推定する。教師ネットワークにおける分解のあいまいさに対処するため,教師ネットワークをトレーニングするための3次元回転不変性を促進するサイクル一貫性アーキテクチャを提案する。推定精度をさらに向上するため、学生ネットワークは3D座標を直接推定するフレキシビリティのための新しいグラフ畳み込みネットワークを採用している。 3次元回転同値性を促進するもう一つのサイクル一貫性アーキテクチャは、幾何学的一貫性を活用し、教師ネットワークからの知識蒸留と合わせてポーズ推定性能を向上させる。我々はHuman3.6MとMPI-INF-3DHPについて広範な実験を行った。本手法は,最先端の非教師付き手法と比較して3次元関節予測誤差を11.4%削減し,Human3.6Mの側情報を用いた弱い教師付き手法よりも優れている。コードはhttps://github.com/sjtuxcx/ITESで入手できる。

We propose a novel method based on a teacher-student learning framework for 3D human pose estimation without any 3D annotation or side information. To solve this unsupervised-learning problem, the teacher network adopts pose-dictionary-based modeling for regularization to estimate a physically plausible 3D pose. To handle the decomposition ambiguity in the teacher network, we propose a cycle-consistent architecture promoting a 3D rotation-invariant property to train the teacher network. To further improve the estimation accuracy, the student network adopts a novel graph convolution network for flexibility to directly estimate the 3D coordinates. Another cycle-consistent architecture promoting a 3D rotation-equivariant property is adopted to exploit geometry consistency, together with knowledge distillation from the teacher network to improve the pose estimation performance. We conduct extensive experiments on Human3.6M and MPI-INF-3DHP. Our method reduces the 3D joint prediction error by 11.4% compared to state-of-the-art unsupervised methods and also outperforms many weakly-supervised methods that use side information on Human3.6M. Code will be available at https://github.com/sjtuxcx/ITES.

翻訳日:2021-05-02 07:27:45 公開日:2020-12-17

# LIGHTEN:ビデオにおけるHOIのためのグラフと階層的テンポラルネットワークとのインタラクションの学習

LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos ( http://arxiv.org/abs/2012.09402v1 )

ライセンス: Link先を確認

Sai Praneeth Reddy Sunkesula, Rishabh Dabral, Ganesh Ramakrishnan

(参考訳) ビデオから人間とオブジェクト間の相互作用を分析することで、人間とビデオに存在するオブジェクトの関係を識別する。これは、物体の1つが人間でなければならない視覚関係検出の特殊なバージョンと考えることができる。従来の手法では,ビデオセグメントのシーケンスの推論として問題を定式化するが,階層的なアプローチであるLIGHTENを用いて視覚的特徴を学習し,ビデオ内の複数の粒度の時空間的手がかりを効果的に捉える。現在のアプローチとは異なり、LIGHTENは深度マップや3D人間のポーズのような地上の真実データの使用を避けるため、RGBD以外のデータセットも一般化される。さらに,手作りの空間的特徴ではなく,視覚的特徴のみを用いて同じことを実現する。本研究では,v-cocoデータセットにおける画像に基づくhoi検出に基づくcad-120のヒューマン・オブジェクト間インタラクション検出(88.9%,92.6%)と期待タスク,および競合結果を用いて,視覚特徴ベースアプローチの新しいベンチマークを設定する。 LIGHTENのコードはhttps://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-Temporal -Networks-for-HOIで公開されている。

Analyzing the interactions between humans and objects in a video involves identifying the relationships between humans and the objects present in the video. It can be thought of as a specialized version of Visual Relationship Detection, wherein one of the objects must be a human. While traditional methods formulate the problem as inference on a sequence of video segments, we present a hierarchical approach, LIGHTEN, to learn visual features to effectively capture spatio-temporal cues at multiple granularities in a video. Unlike current approaches, LIGHTEN avoids using ground truth data like depth maps or 3D human pose, thus increasing generalization across non-RGBD datasets as well. Furthermore, we achieve the same using only the visual features, instead of the commonly used hand-crafted spatial features. We achieve state-of-the-art results in human-object interaction detection (88.9% and 92.6%) and anticipation tasks of CAD-120 and competitive results on image based HOI detection on the V-COCO dataset, setting a new benchmark for visual features based approaches. Code for LIGHTEN is available at https://github.com/praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI

翻訳日:2021-05-02 07:27:04 公開日:2020-12-17

# 不確かさ認識混合による計算効率の良い知識蒸留

Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup ( http://arxiv.org/abs/2012.09413v1 )

ライセンス: Link先を確認

Guodong Xu, Ziwei Liu, Chen Change Loy

(参考訳) 学生ネットワークの学習を指導するために教師ネットワークから「暗黒知識」を抽出する知識蒸留が,モデル圧縮と伝達学習に不可欠な技術として登場した。学生ネットワークの正確さに焦点をあてた以前の研究とは違って,本研究では,知識蒸留の効率性について研究する。我々のゴールは、訓練中に計算コストの低い従来の知識蒸留に匹敵する性能を達成することである。我々は,Uncertainty-aware mIXup (UNIX) がクリーンで効果的なソリューションであることを示す。不確実性サンプリング戦略は、各トレーニングサンプルの情報性を評価するために使用される。適応混合は不確実なサンプルにコンパクトな知識に適用される。さらに、従来の知識蒸留の冗長性は、簡単なサンプルの過剰な学習にあることを示す。不確実性と混在性を組み合わせることで,提案手法は冗長性を低減し,教師ネットワークに対する各クエリをより活用する。 CIFAR100とImageNetのアプローチを検証する。特に,計算コストがわずか79%のCIFAR100では,従来の知識蒸留よりも優れており,ImageNetでは同等の結果が得られる。

Knowledge distillation, which involves extracting the "dark knowledge" from a teacher network to guide the learning of a student network, has emerged as an essential technique for model compression and transfer learning. Unlike previous works that focus on the accuracy of student network, here we study a little-explored but important question, i.e., knowledge distillation efficiency. Our goal is to achieve a performance comparable to conventional knowledge distillation with a lower computation cost during training. We show that the UNcertainty-aware mIXup (UNIX) can serve as a clean yet effective solution. The uncertainty sampling strategy is used to evaluate the informativeness of each training sample. Adaptive mixup is applied to uncertain samples to compact knowledge. We further show that the redundancy of conventional knowledge distillation lies in the excessive learning of easy samples. By combining uncertainty and mixup, our approach reduces the redundancy and makes better use of each query to the teacher network. We validate our approach on CIFAR100 and ImageNet. Notably, with only 79% computation cost, we outperform conventional knowledge distillation on CIFAR100 and achieve a comparable result on ImageNet.
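
The combination of uncertainty sampling and mixup described above can be sketched as follows. This is a minimal illustration under assumptions and should not be read as the UNIX algorithm itself: the abstract does not specify the uncertainty measure or how the mixing is adapted, so predictive entropy, pairing of consecutive ranked samples, and the fixed coefficient `lam` are all hypothetical choices.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher means more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mixup(x_a, x_b, lam):
    """Convex combination of two inputs."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def select_and_mix(samples, student_probs, keep, lam=0.7):
    """Keep the `keep` most uncertain samples and mix them in consecutive pairs,
    so that `keep` informative samples cost only keep // 2 teacher queries."""
    ranked = sorted(range(len(samples)),
                    key=lambda i: entropy(student_probs[i]),
                    reverse=True)
    chosen = [samples[i] for i in ranked[:keep]]
    return [mixup(chosen[i], chosen[i + 1], lam) for i in range(0, keep - 1, 2)]


# Toy batch: four 2-D inputs with the student's predicted class probabilities.
samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
student_probs = [[0.5, 0.5], [0.99, 0.01], [0.6, 0.4], [0.9, 0.1]]

mixed = select_and_mix(samples, student_probs, keep=2)
print(mixed)
```

Here the two most uncertain inputs (the first and third) are merged into a single mixed input, while the confidently classified ones are skipped, which mirrors the stated idea of spending teacher computation on hard samples rather than easy ones.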

翻訳日:2021-05-02 07:26:37 公開日:2020-12-17

# PanoNet3D:LiDARPointクラウド検出のための意味的および幾何学的理解の組み合わせ

PanoNet3D: Combining Semantic and Geometric Understanding for LiDARPoint Cloud Detection ( http://arxiv.org/abs/2012.09418v1 )

ライセンス: Link先を確認

Xia Chen, Jianren Wang, David Held, Martial Hebert

(参考訳) カメラ画像やLiDAR点雲のような自律走行知覚における視覚データは、意味的特徴と幾何学的構造という2つの側面の混合として解釈できる。意味論は物体の外観と文脈からセンサーにもたらされ、幾何学的構造は点雲の実際の3d形状である。 LiDAR点雲上のほとんどの検出器は、実際の3次元空間における物体の幾何学的構造を分析することのみに焦点を当てている。先行研究とは異なり,多視点統合フレームワークを用いて意味的特徴と幾何学的構造の両方を学ぶことを提案する。提案手法は,2次元範囲画像のlidarスキャンの性質を活用し,よく検討された2次元畳み込みを意味的特徴抽出に適用する。意味的特徴と幾何学的特徴を融合することにより,この手法はすべてのカテゴリにおいて最先端のアプローチを大きなマージンで上回っている。意味的特徴と幾何学的特徴を組み合わせる手法は、実世界の3Dポイントクラウド検出の問題を考察するためのユニークな視点を提供する。

Visual data in autonomous driving perception, such as camera image and LiDAR point cloud, can be interpreted as a mixture of two aspects: semantic feature and geometric structure. Semantics come from the appearance and context of objects to the sensor, while geometric structure is the actual 3D shape of point clouds. Most detectors on LiDAR point clouds focus only on analyzing the geometric structure of objects in real 3D space. Unlike previous works, we propose to learn both semantic feature and geometric structure via a unified multi-view framework. Our method exploits the nature of LiDAR scans -- 2D range images, and applies well-studied 2D convolutions to extract semantic features. By fusing semantic and geometric features, our method outperforms state-of-the-art approaches in all categories by a large margin. The methodology of combining semantic and geometric features provides a unique perspective of looking at the problems in real-world 3D point cloud detection.

翻訳日:2021-05-02 07:26:19 公開日:2020-12-17

# 幾何学的変形と照度変化によるCTフィルムの復元:シミュレーションデータセットと深部モデル

CT Film Recovery via Disentangling Geometric Deformation and Illumination Variation: Simulated Datasets and Deep Models ( http://arxiv.org/abs/2012.09491v1 )

ライセンス: Link先を確認

Quan Quan, Qiyuan Wang, Liu Li, Yuanqi Du, S. Kevin Zhou

(参考訳) コンピュータ断層撮影(CT)などの医用画像は病院PACSのDICOM形式で保存されているが, セルフストレージや二次コンサルテーションのために, フィルムを転写可能な媒体として印刷することは, 多くの国で日常的に行われている。また、携帯電話カメラのユビキタス性により、ctフィルムの写真を撮るのが一般的であり、残念ながら幾何学的変形や照明変化に苦しむ。本研究は,文献における最初の試みであるctフィルムの回収の問題点を,我々の知識を最大限に活用するために検討する。まず,広く使用されているコンピュータグラフィックスソフトウェアであるBlenderを用いて,約2万枚の画像からなる大規模頭部CTフィルムデータベースCTFilm20Kを構築した。また,幾何学的変形(3次元座標,深さ,正規分布,紫外線図など)と照明変化(アルベド写像など)に関する全ての情報を記録した。そこで本研究では,ctフィルムから抽出した多重地図を用いて,幾何変形と照明変動を解消する深い枠組みを提案する。シミュレーションおよび実画像に対する大規模な実験は、従来のアプローチよりもアプローチの優位性を実証している。我々はCTフィルム回収の研究を促進するためのシミュレーション画像と深部モデルをオープンソース化する(https://anonymous.4open.science/r/e6b1f6e3-9b36-423f-a225-55b7d0b55523/)。

While medical images such as computed tomography (CT) are stored in DICOM format in hospital PACS, it is still quite routine in many countries to print a film as a transferable medium for the purposes of self-storage and secondary consultation. Also, with the ubiquitousness of mobile phone cameras, it is quite common to take pictures of the CT films, which unfortunately suffer from geometric deformation and illumination variation. In this work, we study the problem of recovering a CT film, which marks the first attempt in the literature, to the best of our knowledge. We start with building a large-scale head CT film database CTFilm20K, consisting of approximately 20,000 pictures, using the widely used computer graphics software Blender. We also record all accompanying information related to the geometric deformation (such as 3D coordinate, depth, normal, and UV maps) and illumination variation (such as albedo map). Then we propose a deep framework to disentangle geometric deformation and illumination variation using the multiple maps extracted from the CT films to collaboratively guide the recovery process. Extensive experiments on simulated and real images demonstrate the superiority of our approach over the previous approaches. We plan to open source the simulated images and deep models for promoting the research on CT film recovery (https://anonymous.4open.science/r/e6b1f6e3-9b36-423f-a225-55b7d0b55523/).

翻訳日:2021-05-02 07:26:02 公開日:2020-12-17

# 学習可能な関節群を用いた手のポーズ推定

Exploiting Learnable Joint Groups for Hand Pose Estimation ( http://arxiv.org/abs/2012.09496v1 )

ライセンス: Link先を確認

Moran Li, Yuan Gao, Nong Sang

(参考訳) 本稿では, 関節の3次元座標をグループ的に復元し, 低関係の関節が自動的に異なるグループに分類され, 異なる特徴を示す3次元ハンドポーズを推定する。これは、全てのジョイントが階層的に考慮され、同じ特徴を共有する以前の方法とは異なる。提案手法の利点はマルチタスク学習(MTL)の原理,すなわち,低関係の関節を異なるグループ(異なるタスク)に分けて各グループごとに異なる特徴を学習することにより,負の移動を効果的に回避する。提案手法の鍵となるのは, 関連継手を自動的に同一群に選択する新しいバイナリセレクタである。学習可能なパラメータにgumbel softmaxを用いて構築した,具体的分布から確率的にサンプリングされたバイナリ値を持つセレクタを実装した。これにより、ネットワーク全体の差別化可能な特性を保存できます。さらに,これらの非関連グループからの機能を活用し,それらの間の機能融合方式を適用し,より識別的な特徴を学習する。これは、結合した特徴に対して複数の1x1畳み込みを実装することで実現され、各結合群は特徴融合のための1x1畳み込みを含む。いくつかのベンチマークデータセットにおける詳細なアブレーション解析と広範な実験は、最先端(sota)法に対する提案手法の有望な性能を示している。また,提案手法は,最新のfreihandコンペティションにおいて,密集した3d形状ラベルを使用しないすべての手法の中でトップ1を達成した。ソースコードとモデルはhttps://github.com/moranli-aca/learnablegroups-handで入手できる。

In this paper, we propose to estimate 3D hand pose by recovering the 3D coordinates of joints in a group-wise manner, where less-related joints are automatically categorized into different groups and exhibit different features. This is different from the previous methods where all the joints are considered holistically and share the same feature. The benefits of our method are illustrated by the principle of multi-task learning (MTL), i.e., by separating less-related joints into different groups (as different tasks), our method learns different features for each of them, and therefore efficiently avoids negative transfer (among less-related tasks/groups of joints). The key of our method is a novel binary selector that automatically selects related joints into the same group. We implement such a selector with binary values stochastically sampled from a Concrete distribution, which is constructed using Gumbel softmax on trainable parameters. This enables us to preserve the differentiable property of the whole network. We further exploit features from those less-related groups by carrying out an additional feature fusing scheme among them, to learn more discriminative features. This is realized by implementing multiple 1x1 convolutions on the concatenated features, where each joint group contains a unique 1x1 convolution for feature fusion. The detailed ablation analysis and the extensive experiments on several benchmark datasets demonstrate the promising performance of the proposed method over the state-of-the-art (SOTA) methods. Besides, our method achieves top-1 among all the methods that do not exploit the dense 3D shape labels on the most recently released FreiHAND competition at the submission date. The source code and models are available at https://github.com/moranli-aca/LearnableGroups-Hand.
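
The binary selector sampled from a Concrete distribution can be illustrated with the standard binary (two-class) Gumbel-softmax relaxation. The sketch below is an assumption-laden illustration rather than the authors' code: the joint names, the logits, and the temperature are hypothetical, and a real implementation would use an automatic-differentiation framework so that the logits can be trained.

```python
import math
import random


def sample_binary_concrete(logit, temperature, rng):
    """Relaxed Bernoulli sample in [0, 1] that is a smooth function of `logit`."""
    u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
    # Logistic noise, equal in distribution to the difference of two Gumbel variables.
    logistic_noise = math.log(u) - math.log(1.0 - u)
    z = (logit + logistic_noise) / temperature
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
TEMPERATURE = 0.5  # hypothetical; lower values push samples towards 0 or 1

# Hypothetical trainable logits: positive favours group A, negative favours group B.
joint_logits = {"thumb_tip": 2.0, "index_tip": -2.0}
for name, logit in joint_logits.items():
    value = sample_binary_concrete(logit, TEMPERATURE, rng)
    print(name, round(value, 3))

# Thresholding a sample at 0.5 gives a hard assignment whose probability is sigmoid(logit).
draws = [sample_binary_concrete(2.0, TEMPERATURE, rng) for _ in range(20000)]
hard_fraction = sum(d > 0.5 for d in draws) / len(draws)
print(round(hard_fraction, 3))
```

Because the sample is a smooth function of the logit rather than a hard 0/1 draw, gradients can flow through the grouping decision, which is the property the abstract refers to as preserving the differentiability of the whole network.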

翻訳日:2021-05-02 07:25:32 公開日:2020-12-17

# カモフラージュによる医療敵の攻撃に対する階層的特徴制約

A Hierarchical Feature Constraint to Camouflage Medical Adversarial Attacks ( http://arxiv.org/abs/2012.09501v1 )

ライセンス: Link先を確認

Qingsong Yao, Zecheng He, Yi Lin, Kai Ma, Yefeng Zheng and S. Kevin Zhou

(参考訳) 医療画像のためのディープニューラルネットワーク(DNN)は、臨床上の意思決定にセキュリティ上の懸念をもたらす敵例(AE)に対して極めて脆弱である。幸いなことに、医療用AEは階層的な特徴空間でも容易に検出できます。この現象をよりよく理解するために、我々は特徴空間における医療用aesの本質的特徴を徹底的に調査し、経験的証拠と理論的説明の両方を提供している。まず,自然画像とは対照的に,医用画像の深部表現の脆弱性を明らかにするためのストレステストを行った。次に,2次疾患診断ネットワークに対する典型的な敵対的攻撃が,脆弱な表現を一定方向に連続的に最適化することにより予測を操作できることを理論的に証明した。しかし、この脆弱性は機能領域にAEを隠すために利用することもできる。本稿では,既存の敵攻撃に対するアドオンとして,新しい階層的特徴制約 (HFC) を提案する。提案手法は,Fundoscopy と Chest X-Ray の2つの公開医用画像データセット上で評価する。実験結果から,攻撃手法よりも先進的対人検知器の配列をバイパスし,医療的特徴の重大な脆弱性により,攻撃者が対人表現を操作できる余地が大きくなることが示唆された。

Deep neural networks (DNNs) for medical images are extremely vulnerable to adversarial examples (AEs), which raises security concerns for clinical decision making. Luckily, medical AEs are also easy to detect in hierarchical feature space, as our study shows. To better understand this phenomenon, we thoroughly investigate the intrinsic characteristics of medical AEs in feature space, providing both empirical evidence and theoretical explanations for the question: why are medical adversarial attacks easy to detect? We first perform a stress test to reveal the vulnerability of deep representations of medical images, in contrast to natural images. We then theoretically prove that typical adversarial attacks on a binary disease diagnosis network manipulate the prediction by continuously optimizing the vulnerable representations in a fixed direction, resulting in outlier features that make medical AEs easy to detect. However, this vulnerability can also be exploited to hide the AEs in the feature space. We propose a novel hierarchical feature constraint (HFC) as an add-on to existing adversarial attacks, which encourages the hiding of the adversarial representation within the normal feature distribution. We evaluate the proposed method on two public medical image datasets, namely Fundoscopy and Chest X-Ray. Experimental results demonstrate the superiority of our adversarial attack method, as it bypasses an array of state-of-the-art adversarial detectors more easily than competing attack methods, supporting the view that the great vulnerability of medical features allows an attacker more room to manipulate the adversarial representations.

翻訳日:2021-05-02 07:25:04 公開日:2020-12-17

# 意味セグメンテーションのための具体化ビジュアルアクティブラーニング

Embodied Visual Active Learning for Semantic Segmentation ( http://arxiv.org/abs/2012.09503v1 )

ライセンス: Link先を確認

David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

(参考訳) エージェントが3次元環境を探索し、アノテーションを要求するビューを積極的に選択することで視覚的シーン理解を得ることを目的として、視覚的能動学習の課題について検討する。一部のベンチマークでは正確だが、今日のディープビジュアル認識パイプラインは、特定の現実世界のシナリオや異常な視点ではうまく一般化しない傾向がある。ロボットの知覚は、屋内環境の混乱や照明不足など、モバイルシステムの動作状況の認識能力を洗練する能力を必要としている。これにより,エージェントを視覚認識能力の向上を目的とした新しい環境に配置するタスクが提案される。視覚活動学習の具体化を研究するため,環境に関する知識の異なるエージェント(学習と事前特定の両方)の電池を開発する。エージェントはセマンティックセグメンテーションネットワークを備えており、それらのビューの周辺でアノテーションを広めるために情報的ビューを取得し、移動し、探索し、オンラインリトレーニングによって基礎となるセグメンテーションネットワークを洗練させる。トレーニング可能な方法は、深層強化学習を使用して、2つの競合する目標、すなわち、視覚認識精度として表現されるタスクのパフォーマンスと、アクティブな探索中に要求される必要量のアノテートされたデータとをバランスさせる。本稿では,フォトリアリスティックなMatterport3Dシミュレータを用いて提案手法を広範囲に評価し,より少ないアノテーションを要求しても,完全に学習した手法が比較対象よりも優れていることを示す。

We study the task of embodied visual active learning, where an agent is set to explore a 3d environment with the goal to acquire visual scene understanding by actively selecting views for which to request annotation. While accurate on some benchmarks, today's deep visual recognition pipelines tend to not generalize well in certain real-world scenarios, or for unusual viewpoints. Robotic perception, in turn, requires the capability to refine the recognition capabilities for the conditions where the mobile system operates, including cluttered indoor environments or poor illumination. This motivates the proposed task, where an agent is placed in a novel environment with the objective of improving its visual recognition capability. To study embodied visual active learning, we develop a battery of agents - both learnt and pre-specified - and with different levels of knowledge of the environment. The agents are equipped with a semantic segmentation network and seek to acquire informative views, move and explore in order to propagate annotations in the neighbourhood of those views, then refine the underlying segmentation network by online retraining. The trainable method uses deep reinforcement learning with a reward function that balances two competing objectives: task performance, represented as visual recognition accuracy, which requires exploring the environment, and the necessary amount of annotated data requested during active exploration. We extensively evaluate the proposed models using the photorealistic Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts, even when requesting fewer annotations.

翻訳日:2021-05-02 07:24:35 公開日:2020-12-17

# オープンセット映像認識における低遅延ストリームデータからのインクリメンタル学習

Incremental Learning from Low-labelled Stream Data in Open-Set Video Face Recognition ( http://arxiv.org/abs/2012.09571v1 )

ライセンス: Link先を確認

Eric Lopez-Lopez, Carlos V. Regueiro, Xose M. Pardo

(参考訳) ディープラーニングアプローチは、豊富な注釈付きデータがトレーニングのために提供される一般的な分類問題に対して、優れたパフォーマンスを備えたソリューションをもたらした。対照的に、ストリーミングデータの教師なし問題に主に適用した場合に、非定常クラスを連続的に学習する際の進歩は少ない。本稿では,深層機能エンコーダとSVMのオープンセット動的アンサンブルを組み合わせた新たなインクリメンタル学習手法を提案する。いくつかのビデオフレームで訓練された単純な弱い分類器から、教師なし操作データを用いて認識を向上させることができる。我々のアプローチは、破滅的な忘れを回避し、ミス適応から部分的に修復する新しいパターンに適応する。さらに、現実世界の条件に適合するように、システムはオープンセットで運用するように設計された。その結果、非適応的な最先端手法に対するF1スコアの最大15%向上効果が示された。

Deep Learning approaches have brought solutions, with impressive performance, to general classification problems where a wealth of annotated data is provided for training. In contrast, less progress has been made in continual learning of a set of non-stationary classes, mainly when applied to unsupervised problems with streaming data. Here, we propose a novel incremental learning approach which combines a deep features encoder with Open-Set Dynamic Ensembles of SVMs, to tackle the problem of identifying individuals of interest (IoI) from streaming face data. From a simple weak classifier trained on a few video-frames, our method can use unsupervised operational data to enhance recognition. Our approach adapts to new patterns while avoiding catastrophic forgetting and partially heals itself from mis-adaptation. Besides, to better comply with real world conditions, the system was designed to operate in an open-set setting. Results show a benefit of up to a 15% F1-score increase with respect to non-adaptive state-of-the-art methods.

翻訳日:2021-05-02 07:24:07 公開日:2020-12-17

# 畳み込みニューラルネットワークによる銃器検出:エンドツーエンドソリューションに対する意味セグメンテーションモデルの比較

Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions ( http://arxiv.org/abs/2012.09662v1 )

ライセンス: Link先を確認

Alexander Egiazarov, Fabio Massimo Zennaro, Vasileios Mavroeidis

(参考訳) ライブビデオからの武器や攻撃的行動の脅威検出は、テロ、一般犯罪、さらには家庭内暴力など、致命的となりうる事件の迅速な検出と予防に利用できる。これを実現する1つの方法は、人工知能、特に機械学習による画像解析の利用である。本稿では,従来のモノリシックなエンドツーエンドのディープラーニングモデルと,セマンティックセグメンテーションによって銃器を検出する、より単純なニューラルネットワークのアンサンブルに基づく既提案モデルとを比較する。精度,計算量とデータの複雑性,柔軟性,信頼性など,異なる観点から両モデルを評価した。その結果,セマンティックセグメンテーションモデルは,従来の深層モデルと比べ,低データ環境においてかなりの柔軟性とレジリエンスを提供するが,エンドツーエンドモデルと同等の精度を達成するための構成とチューニングは困難であることがわかった。

Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents such as terrorism, general criminal offences, or even domestic violence. One way of achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis. In this paper we conduct a comparison between a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation. We evaluated both models from different points of view, including accuracy, computational and data complexity, flexibility and reliability. Our results show that a semantic segmentation model provides a considerable amount of flexibility and resilience in the low data environment compared to classical deep models, although its configuration and tuning present a challenge in achieving the same levels of accuracy as an end-to-end model.

翻訳日:2021-05-02 07:23:53 公開日:2020-12-17

# 複数ショットによるヒトメッシュの回復

Human Mesh Recovery from Multiple Shots ( http://arxiv.org/abs/2012.09843v1 )

ライセンス: Link先を確認

Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

(参考訳) 映画のような編集されたメディアのビデオは、有用だが未調査の情報ソースである。これらの映画において、大きな時間的文脈で描かれた人間同士の多様な外観と相互作用は、貴重なデータ源となり得る。しかし、データの豊かさは、急激なショット変更や、重度のトランケーションを持つアクターのクローズアップといった基本的な課題を犠牲にされ、既存の人間の3D理解方法の適用性が制限される。本稿では,同一シーンのショット変更がフレーム間の不連続を生じさせるが,シーンの3d構造は依然としてスムーズに変化するという考察を加えて,これらの制約について述べる。これにより、撮影前後のフレームをマルチビュー信号として処理し、アクターの3D状態を復元する強力な手がかりを提供する。提案するマルチショット最適化フレームワークは,擬似基底真理3次元メッシュを用いた長周期の3次元再構成とマイニングを改善する。得られたデータは,人間のメッシュ回復モデルのトレーニングにおいて有用であることが示される: 単一画像の場合, 頑健性が向上する; ビデオの場合, 入力フレームのショット変化による観察の欠如を自然に処理できる純粋トランスフォーマーベースのテンポラルエンコーダを提案する。広範な実験を通じて,洞察と提案モデルの重要性を実証する。私たちが開発しているツールは、編集されたメディアの巨大なライブラリから3Dコンテンツを処理・分析するための扉を開きます。プロジェクトページ: https://geopavlakos.github.io/multishot

Videos from edited media like movies are a useful, yet under-explored source of information. The rich variety of appearance and interactions between humans depicted over a large temporal context in these films could be a valuable source of data. However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close up shots of actors with heavy truncation, which limit the applicability of existing human 3D understanding methods. In this paper, we address these limitations with an insight that while shot changes of the same scene incur a discontinuity between frames, the 3D structure of the scene still changes smoothly. This allows us to handle frames before and after the shot change as a multi-view signal that provides strong cues to recover the 3D state of the actors. We propose a multi-shot optimization framework, which leads to improved 3D reconstruction and mining of long sequences with pseudo ground truth 3D human mesh. We show that the resulting data is beneficial in the training of various human mesh recovery models: for single image, we achieve improved robustness; for video we propose a pure transformer-based temporal encoder, which can naturally handle missing observations due to shot changes in the input frames. We demonstrate the importance of the insight and proposed models through extensive experiments. The tools we develop open the door to processing and analyzing, in 3D, content from a large library of edited media, which could be helpful for many downstream applications. Project page: https://geopavlakos.github.io/multishot

翻訳日:2021-05-02 07:22:33 公開日:2020-12-17

# スマートフォン動画からの再照明可能な3Dヘッドポートレート

Relightable 3D Head Portraits from a Smartphone Video ( http://arxiv.org/abs/2012.09963v1 )

ライセンス: Link先を確認

Artem Sevastopolsky, Savva Ignatiev, Gonzalo Ferrer, Evgeny Burnaev, Victor Lempitsky

(参考訳) 本研究では、人間の頭部の再照明可能な3Dポートレートを作成するシステムを提示する。我々のニューラルパイプラインは、スマートフォンのカメラでフラッシュを点滅させながら撮影したフレーム列(フラッシュあり・なしのシーケンス)を入力とする。 structure-from-motionソフトウェアとマルチビューデノイジングによって再構成された粗い点群を、幾何学的プロキシとして用いる。その後、ディープレンダリングネットワークを訓練し、任意の新しい視点に対して密なアルベド、法線、環境照明マップを回帰する。結果として、プロキシジオメトリとレンダリングネットワークは、任意の視点から、指向性光源、点光源、環境マップなどの任意の照明下で合成可能な、再照明可能な3Dポートレートモデルを構成する。このモデルは、アルベドと照明の分解の妥当性を強制する人間の顔特有の事前知識を用いてフレーム列に適合され、インタラクティブなフレームレートで動作する。さまざまな照明条件および外挿された視点において手法の性能を評価し,既存の再照明手法と比較する。

In this work, a system for creating a relightable 3D portrait of a human head is presented. Our neural pipeline operates on a sequence of frames captured by a smartphone camera with the flash blinking (flash-no flash sequence). A coarse point cloud reconstructed via structure-from-motion software and multi-view denoising is then used as a geometric proxy. Afterwards, a deep rendering network is trained to regress dense albedo, normals, and environmental lighting maps for arbitrary new viewpoints. Effectively, the proxy geometry and the rendering network constitute a relightable 3D portrait model that can be synthesized from an arbitrary viewpoint and under arbitrary lighting, e.g. directional light, point light, or an environment map. The model is fitted to the sequence of frames with human face-specific priors that enforce the plausibility of albedo-lighting decomposition and operates at an interactive frame rate. We evaluate the performance of the method under varying lighting conditions and at extrapolated viewpoints and compare with existing relighting methods.

翻訳日:2021-05-02 07:21:50 公開日:2020-12-17

# BERTが買い物に行く: 製品表現のための分布モデルの比較

BERT Goes Shopping: Comparing Distributional Models for Product Representations ( http://arxiv.org/abs/2012.09807v1 )

ライセンス: Link先を確認

Federico Bianchi and Bingqing Yu and Jacopo Tagliabue

(参考訳) 単語埋め込み(例: word2vec)は、prod2vecを通じてeコマース製品にうまく適用されてきた。文脈化埋め込みがいくつかのNLPタスクにもたらした最近の性能向上に触発されて、我々はBERTのようなアーキテクチャをeコマースに転用することを提案する。我々のモデルProdBERTは、マスク付きセッションモデリングによって製品の表現を生成するように訓練される。複数のショップ、異なるタスク、さまざまな設計選択にわたる広範な実験を通じて、ProdBERTとprod2vecの埋め込みの精度を体系的に比較する。 ProdBERTはいくつかのシナリオで従来の手法よりも優れていることがわかったが、最高性能のモデルにおけるリソースとハイパーパラメータの重要性を強調する。最後に、さまざまな計算およびデータ制約の下で埋め込みを訓練するためのガイドラインを提供して結論とする。

Word embeddings (e.g., word2vec) have been applied successfully to eCommerce products through prod2vec. Inspired by the recent performance improvements on several NLP tasks brought by contextualized embeddings, we propose to transfer BERT-like architectures to eCommerce: our model -- ProdBERT -- is trained to generate representations of products through masked session modeling. Through extensive experiments over multiple shops, different tasks, and a range of design choices, we systematically compare the accuracy of ProdBERT and prod2vec embeddings: while ProdBERT is found to be superior to traditional methods in several scenarios, we highlight the importance of resources and hyperparameters in the best performing models. Finally, we conclude by providing guidelines for training embeddings under a variety of computational and data constraints.

翻訳日:2021-05-02 07:21:26 公開日:2020-12-17

# DecAug: Decomposed Feature Representation と Semantic Augmentation によるアウト・オブ・ディストリビューションの一般化

DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation ( http://arxiv.org/abs/2012.09382v1 )

ライセンス: Link先を確認

Haoyue Bai, Rui Sun, Lanqing Hong, Fengwei Zhou, Nanyang Ye, Han-Jia Ye, S.-H. Gary Chan, Zhenguo Li

(参考訳) ディープラーニングは、独立同分布(IID)のデータを扱う強力な能力を示しているが、テストデータが訓練データとは別の分布から来るOoD(out-of-distribution)の一般化には苦戦することが多い。広範囲のアプリケーションに適用できる一般的なOoD一般化フレームワークの設計は、主に現実世界で起こりうる相関シフトと多様性シフトのために困難である。従来のアプローチのほとんどは、ドメイン間のシフトや相関の外挿など、ひとつの特定の分布シフトしか解決できない。そこで本研究では,OoD一般化のための新しい分解特徴表現および意味拡張手法であるDecAugを提案する。 DecAugはカテゴリ関連の特徴とコンテキスト関連の特徴を分離する。カテゴリ関連特徴は対象オブジェクトの因果情報を含み、コンテキスト関連特徴は属性、スタイル、背景、シーンを記述し、トレーニングデータとテストデータの間の分布シフトを引き起こす。この分解は、カテゴリラベルとコンテキストラベルを予測する損失の(中間特徴に関する)2つの勾配を直交化することによって達成される。さらに,学習表現のロバスト性を改善するために,コンテキスト関連特徴に対して勾配に基づく拡張を行う。実験結果から、DecAugは様々なOoDデータセット上で他の最先端手法よりも優れており、異なるタイプのOoD一般化課題に対処できる数少ない手法の一つであることが示された。

While deep learning demonstrates its strong ability to handle independent and identically distributed (IID) data, it often struggles with out-of-distribution (OoD) generalization, where the test data come from another distribution (w.r.t. the training one). Designing a general OoD generalization framework for a wide range of applications is challenging, mainly due to possible correlation shift and diversity shift in the real world. Most of the previous approaches can only solve one specific distribution shift, such as shift across domains or the extrapolation of correlation. To address that, we propose DecAug, a novel decomposed feature representation and semantic augmentation approach for OoD generalization. DecAug disentangles the category-related and context-related features. Category-related features contain causal information of the target object, while context-related features describe the attributes, styles, backgrounds, or scenes, causing distribution shifts between training and test data. The decomposition is achieved by orthogonalizing the two gradients (w.r.t. intermediate features) of losses for predicting category and context labels. Furthermore, we perform gradient-based augmentation on context-related features to improve the robustness of the learned representations. Experimental results show that DecAug outperforms other state-of-the-art methods on various OoD datasets, and is among the very few methods that can deal with different types of OoD generalization challenges.
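The gradient orthogonalization mentioned in this abstract can be illustrated with a small penalty term. This is a minimal sketch, not the DecAug implementation: the function name `orthogonality_penalty` and the choice of squared cosine similarity as the penalty are assumptions made here for illustration, and the gradient vectors are hypothetical values.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonality_penalty(grad_category, grad_context, eps=1e-12):
    """Squared cosine similarity between two gradient vectors.

    The value is 0 when the gradients are orthogonal and close to 1 when
    they are parallel, so minimizing it pushes them toward orthogonality.
    (Assumed form of the regularizer; the abstract does not specify one.)
    """
    numerator = dot(grad_category, grad_context)
    denominator = math.sqrt(
        dot(grad_category, grad_category) * dot(grad_context, grad_context)
    ) + eps
    cosine = numerator / denominator
    return cosine * cosine

# Hypothetical gradients of the category loss and the context loss
# with respect to the same intermediate features.
print(orthogonality_penalty([1.0, 0.0], [0.0, 2.0]))  # orthogonal pair
print(orthogonality_penalty([1.0, 1.0], [2.0, 2.0]))  # parallel pair
```

In a training loop, a term like this would be added to the two prediction losses with a weighting coefficient, which is again a design choice rather than something the abstract states.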

翻訳日:2021-05-02 07:21:15 公開日:2020-12-17

# ベイズネットワークモデルを用いた心臓疾患予測のための高速アルゴリズム

A Fast Algorithm for Heart Disease Prediction using Bayesian Network Model ( http://arxiv.org/abs/2012.09429v1 )

ライセンス: Link先を確認

Mistura Muibideen and Rajesh Prasad (Department of Computer Science African University of Science and Technology, Abuja, Nigeria)

(参考訳) 心血管疾患は世界中で死因の第1位である。データマイニングは、医療分野で利用可能なデータから貴重な知識を取得するのに役立つ。これは、患者の健康状態を予測するモデルの訓練に役立ち、臨床実験よりも高速である。ロジスティック回帰、K近傍法、ナイーブベイズ(NB)、サポートベクターマシンなど、さまざまな機械学習アルゴリズムの実装がクリーブランド心臓データセットに適用されてきたが、ベイジアンネットワーク(BN)を用いたモデリングは限られていた。本研究は,UCIリポジトリから収集したクリーブランド心臓データの14個の関連属性間の関係を明らかにするためにBNモデリングを適用した。その目的は、属性間の依存性が分類器の性能にどう影響するかを確認することである。 BNは、属性間の関係について信頼性が高く透明なグラフィカル表現を生成し、新しいシナリオを予測できる。このモデルの精度は85%である。このモデルは、精度80%のNB分類器よりも優れていると結論づけられた。

Cardiovascular disease is the number one cause of death all over the world. Data mining can help to retrieve valuable knowledge from available data from the health sector. It helps to train a model to predict patients' health, which is faster than clinical experimentation. Various implementations of machine learning algorithms such as Logistic Regression, K-Nearest Neighbor, Naive Bayes (NB), Support Vector Machine, etc. have been applied to the Cleveland heart datasets, but modeling using a Bayesian Network (BN) has been limited. This research applied BN modeling to discover the relationship between 14 relevant attributes of the Cleveland heart data collected from the UCI repository. The aim is to check how the dependency between attributes affects the performance of the classifier. The BN produces a reliable and transparent graphical representation between the attributes with the ability to predict new scenarios. The model has an accuracy of 85%. It was concluded that the model outperformed the NB classifier, which has an accuracy of 80%.

翻訳日:2021-05-02 07:20:38 公開日:2020-12-17

# 一般化保証によるAUUC最大化による治療目標設定

Treatment Targeting by AUUC Maximization with Generalization Guarantees ( http://arxiv.org/abs/2012.09897v1 )

ライセンス: Link先を確認

Artem Betlei, Eustache Diemert, Massih-Reza Amini

(参考訳) 個別治療効果の予測に基づいて治療の割り当てを最適化するタスクを検討する。このタスクは個別化医療やターゲット広告といった多くのアプリケーションで見られ、近年はアップリフトモデリング(Uplift Modeling)という名で関心を集めている。これは、治療が最も有益となる個人に治療を向けることである。実際のシナリオでは、真の個別治療効果にアクセスできない場合、モデルのこの能力は一般にAUUC(Area Under the Uplift Curve)によって測定されるが、この指標は個別治療効果(ITE)モデルの大半の学習目的とは異なる。我々は、これらのモデルの学習が意図せずAUUCを低下させ、準最適な治療割り当てにつながりうると主張する。この問題に対処するために,AUUCに対する汎化境界を提案し,この境界の導出可能な代理(サロゲート)を最適化する、AUUC-maxと呼ばれる新しい学習アルゴリズムを提示する。最後に,この汎化境界のタイトさとハイパーパラメータチューニングに対する有効性を実証的に示し,2つの古典的ベンチマークにおいて幅広い競合ベースラインと比較して提案アルゴリズムの効率を示す。

We consider the task of optimizing treatment assignment based on individual treatment effect prediction. This task is found in many applications such as personalized medicine or targeted advertising and has gained a surge of interest in recent years under the name of Uplift Modeling. It consists in targeting treatment to the individuals for whom it would be the most beneficial. In real life scenarios, when we do not have access to ground-truth individual treatment effect, the capacity of models to do so is generally measured by the Area Under the Uplift Curve (AUUC), a metric that differs from the learning objectives of most of the Individual Treatment Effect (ITE) models. We argue that the learning of these models could inadvertently degrade AUUC and lead to suboptimal treatment assignment. To tackle this issue, we propose a generalization bound on the AUUC and present a novel learning algorithm that optimizes a derivable surrogate of this bound, called AUUC-max. Finally, we empirically demonstrate the tightness of this generalization bound, its effectiveness for hyper-parameter tuning and show the efficiency of the proposed algorithm compared to a wide range of competitive baselines on two classical benchmarks.
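To make the AUUC metric concrete, the sketch below computes an uplift curve and its area on toy data. Several definitions of the uplift curve are in use; this follows one common variant (difference of mean outcomes between treated and control individuals among the top-k, scaled by k) and is an assumption, not necessarily the exact definition used in the paper. It does not implement the AUUC-max surrogate, and all data values are hypothetical.

```python
def uplift_curve(scores, treated, outcomes):
    """Cumulative uplift after targeting the top-k scored individuals, k = 1..n."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_treated = n_control = 0      # group sizes among the top-k
    y_treated = y_control = 0.0    # summed outcomes per group among the top-k
    curve = []
    for k, i in enumerate(order, start=1):
        if treated[i]:
            n_treated += 1
            y_treated += outcomes[i]
        else:
            n_control += 1
            y_control += outcomes[i]
        mean_treated = y_treated / n_treated if n_treated else 0.0
        mean_control = y_control / n_control if n_control else 0.0
        curve.append((mean_treated - mean_control) * k)
    return curve

def auuc(scores, treated, outcomes):
    """Area under the uplift curve, approximated as the mean of its points."""
    curve = uplift_curve(scores, treated, outcomes)
    return sum(curve) / len(curve)

# Toy data: model scores, treatment flags (1 = treated), observed outcomes.
scores = [0.9, 0.8, 0.2, 0.1]
treated = [1, 0, 1, 0]
outcomes = [1, 0, 0, 0]
print(uplift_curve(scores, treated, outcomes))
print(auuc(scores, treated, outcomes))
```

A model that ranks responsive individuals first yields a curve that rises early, and therefore a larger area.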

翻訳日:2021-05-02 07:19:39 公開日:2020-12-17

# 新型コロナウイルスの音声:感染の音響的相関

The voice of COVID-19: Acoustic correlates of infection ( http://arxiv.org/abs/2012.09478v1 )

ライセンス: Link先を確認

Katrin D. Bartl-Pokorny, Florian B. Pokorny, Anton Batliner, Shahin Amiriparian, Anastasia Semertzidou, Florian Eyben, Elena Kramer, Florian Schmidt, Rainer Sch\"onweiler, Markus Wehler, Bj\"orn W. Schuller

(参考訳) 新型コロナウイルス感染症(COVID-19)は世界的な健康危機であり、この1年間、私たちの日常生活の多くの側面に影響を与えてきた。COVID-19の症状は不均一であり、重症度は連続的に分布する。症状のかなりの割合は発声器官の病理学的変化と関連しており、COVID-19が発声にも影響を及ぼす可能性があると考えられる。本研究は,包括的な音響パラメータセットに基づいて,COVID-19感染の音声音響的相関を初めて調査することを目的とする。母音 /i:/, /e:/, /o:/, /u:/, /a:/ の録音から抽出した88の音響的特徴を,症状のあるCOVID-19陽性者11名と陰性者11名のドイツ語話者参加者の間で比較した。Mann-WhitneyのU検定を用い、効果量を算出して、群間差が最も顕著な特徴を特定する。平均有声セグメント長と1秒あたりの有声セグメント数は、全母音を通じて最も重要な差を示し、COVID-19陽性者の発声中における肺気流の不連続を示唆する。前舌母音 /i:/ と /e:/ の群間差は基本周波数の変動と調波対雑音比にも反映され、後舌母音 /o:/ と /u:/ の群間差はメル周波数ケプストラム係数とスペクトル傾斜の統計量に反映される。本研究の知見は、将来、COVID-19感染者を音声に基づいて識別できる可能性を示す重要な概念実証と考えられる。

COVID-19 is a global health crisis that has been affecting many aspects of our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. A considerable proportion of symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the very first time, the present study aims to investigate voice acoustic correlates of an infection with COVID-19 on the basis of a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /o:/, /u:/, and /a:/ produced by 11 symptomatic COVID-19 positive and 11 COVID-19 negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify features with the most prominent group differences. The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels indicating discontinuities in the pulmonic airstream during phonation in COVID-19 positive participants. Group differences in the front vowels /i:/ and /e:/ are additionally reflected in the variation of the fundamental frequency and the harmonics-to-noise ratio, group differences in back vowels /o:/ and /u:/ in statistics of the Mel-frequency cepstral coefficients and the spectral slope. Findings of this study can be considered an important proof-of-concept contribution for a potential future voice-based identification of individuals infected with COVID-19.
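The group comparison described here rests on the Mann-Whitney U statistic and an effect size. The sketch below computes both in plain Python. The abstract does not say which effect size the authors used, so the rank-biserial correlation is an assumption; the p-value computation is omitted, and the feature values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U statistic of group_a, rank-biserial correlation)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    # Effect size in [-1, 1]; +1 means every value in a exceeds every value in b.
    effect = 2 * u_a / (n_a * n_b) - 1
    return u_a, effect

# Hypothetical values of one acoustic feature for two groups of speakers.
positive = [0.21, 0.25, 0.19, 0.30]
negative = [0.35, 0.33, 0.28, 0.40]
print(mann_whitney_u(positive, negative))
```

Ranking the features by the absolute effect size would then surface those with the most prominent group differences, which is the selection step the abstract describes.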

翻訳日:2021-05-02 07:19:19 公開日:2020-12-17

# Clique: 都市規模における時空間物体の再同定

Clique: Spatiotemporal Object Re-identification at the City Scale ( http://arxiv.org/abs/2012.09329v1 )

ライセンス: Link先を確認

Tiantu Xu, Kaiwen Shen, Yang Fu, Humphrey Shi, Felix Xiaozhu Lin

(参考訳) オブジェクト再識別(ReID)は都市規模のカメラの重要な応用である。古典的なReIDタスクは画像検索と見なされることが多いが、我々はこれを、対象オブジェクトが現れた場所と時間についての時空間クエリとして扱う。時空間ReIDは、コンピュータビジョンアルゴリズムの精度の限界と、都市カメラからの膨大なビデオという課題に直面している。 本稿では、2つの新しい技術に基づく実用的なReIDエンジンCliqueを提案する。(1) Cliqueは、ReIDアルゴリズムによって抽出されたファジィなオブジェクト特徴をクラスタリングすることで対象の出現を評価し、各クラスタは入力と照合される個別のオブジェクトの全体的な印象を表す。(2) ビデオを検索するために、Cliqueは時空間カバレッジを最大化するようにカメラをサンプリングし、必要に応じて処理対象のカメラを段階的に追加する。 25台のカメラからの25時間のビデオによる評価では、Cliqueは70のクエリで0.87(recall at 5)という高い精度に達し、高精度を達成しつつビデオ実時間の830倍の速度で動作した。

Object re-identification (ReID) is a key application of city-scale cameras. While classic ReID tasks are often considered as image retrieval, we treat them as spatiotemporal queries for locations and times in which the target object appeared. Spatiotemporal ReID is challenged by the accuracy limitation in computer vision algorithms and the colossal volume of video from city cameras. We present Clique, a practical ReID engine that builds upon two new techniques: (1) Clique assesses target occurrences by clustering fuzzy object features extracted by ReID algorithms, with each cluster representing the general impression of a distinct object to be matched against the input; (2) to search in videos, Clique samples cameras to maximize the spatiotemporal coverage and incrementally adds cameras for processing on demand. Through evaluation on 25 hours of videos from 25 cameras, Clique reached a high accuracy of 0.87 (recall at 5) across 70 queries and ran at 830x video real time while achieving high accuracy.
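The "recall at 5" figure quoted above can be read in more than one way. The sketch below uses one common reading: the fraction of queries for which a correct result appears among the top five returned items. This reading is an assumption, since the abstract does not define the metric, and the query names and rankings are hypothetical.

```python
def recall_at_k(ranked_results, relevant, k=5):
    """Fraction of queries with at least one relevant item in the top-k results."""
    hits = 0
    for query, ranking in ranked_results.items():
        if any(item in relevant[query] for item in ranking[:k]):
            hits += 1
    return hits / len(ranked_results)

# Hypothetical rankings: each query maps to an ordered list of returned items.
ranked_results = {
    "query_1": ["a", "b", "c", "d", "e", "f"],
    "query_2": ["x", "y", "z", "u", "v", "w"],
}
# Hypothetical ground truth: the items that truly match each query.
relevant = {"query_1": {"e"}, "query_2": {"w"}}

print(recall_at_k(ranked_results, relevant, k=5))
```

Here the first query counts as a hit because its match is ranked fifth, while the second misses because its match is ranked sixth.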

翻訳日:2021-05-02 07:18:08 公開日:2020-12-17

# ピクセルごとのバイアスドコントラスト閾値のイベントカメラ校正

Event Camera Calibration of Per-pixel Biased Contrast Threshold ( http://arxiv.org/abs/2012.09378v1 )

ライセンス: Link先を確認

Ziwei Wang, Yonhon Ng, Pieter van Goor, Robert Mahony

(参考訳) イベントカメラは、極端な照明条件下でも高い時間分解能で強度変化を表す非同期イベントを出力する。現在、既存の作品のほとんどは、すべてのピクセルの強度変化を推定するために単一のコントラスト閾値を使用している。しかし、複雑な回路バイアスと製造不完全さは、画素間のバイアス付き画素とミスマッチするコントラスト閾値を引き起こし、望ましくない出力に繋がる可能性がある。本稿では,イベント専用カメラとハイブリッドカメラを対象とする新しいイベントカメラモデルと2つのキャリブレーション手法を提案する。また,インテンシティ画像とイベントを同時に提供した場合,時間変動イベントレートに適応するイベントカメラのキャリブレーションを行う効率的なオンライン手法を提案する。提案手法の利点を,複数のイベントカメラデータセットにおける最新技術と比較した。

Event cameras output asynchronous events to represent intensity changes with a high temporal resolution, even under extreme lighting conditions. Currently, most of the existing works use a single contrast threshold to estimate the intensity change of all pixels. However, complex circuit bias and manufacturing imperfections cause biased pixels and mismatched contrast thresholds among pixels, which may lead to undesirable outputs. In this paper, we propose a new event camera model and two calibration approaches which cover event-only cameras and hybrid image-event cameras. When intensity images are simultaneously provided along with events, we also propose an efficient online method to calibrate event cameras that adapts to time-varying event rates. We demonstrate the advantages of our proposed methods compared to the state-of-the-art on several different event camera datasets.

翻訳日:2021-05-02 07:17:50 公開日:2020-12-17

# スケール不変な特徴変換キーポイント記述子マッチングのための完全パイプラインFPGAアクセラレータ

A fully pipelined FPGA accelerator for scale invariant feature transform keypoint descriptor matching ( http://arxiv.org/abs/2012.09666v1 )

ライセンス: Link先を確認

Luka Daoud, Muhammad Kamran Latif, H S. Jacinto, Nader Rafla

(参考訳) スケール不変特徴変換(SIFT)アルゴリズムは、コンピュータビジョンの分野における古典的な特徴抽出アルゴリズムと考えられている。 SIFTのキーポイント記述子マッチングは、消費されるデータ量のために計算集約的な処理である。本研究では,SIFTキーポイント記述子マッチングのための新しい完全パイプライン型ハードウェアアクセラレータアーキテクチャを設計した。アクセラレータコアはFPGA(field programmable gate array)上に実装・テストされた。提案するハードウェアアーキテクチャは,完全パイプライン実装に必要なメモリ帯域幅を適切に処理し,ルーフライン性能モデルの上限に達して,潜在的な最大スループットを実現する。完全パイプライン型のマッチングアーキテクチャは、コサイン角距離法に基づいて設計されている。アーキテクチャは16ビット固定小数点演算に最適化され,Xilinx ZynqベースのFPGA開発ボードを用いてハードウェア上に実装された。提案アーキテクチャは,メモリ帯域幅の制約を緩和して高いスループットを維持しつつ,文献中の既存手法と比較して面積リソースの顕著な削減を示す。その結果、消費デバイスリソースはLUTで最大91%、BRAMで79%削減された。我々のハードウェア実装は、同等のソフトウェアアプローチより15.7倍高速である。

The scale invariant feature transform (SIFT) algorithm is considered a classical feature extraction algorithm within the field of computer vision. SIFT keypoint descriptor matching is a computationally intensive process due to the amount of data consumed. In this work, we designed a novel fully pipelined hardware accelerator architecture for SIFT keypoint descriptor matching. The accelerator core was implemented and tested on a field programmable gate array (FPGA). The proposed hardware architecture is able to properly handle the memory bandwidth necessary for a fully-pipelined implementation and hits the roofline performance model, achieving the potential maximum throughput. The fully pipelined matching architecture was designed based on the cosine angle distance method. Our architecture was optimized for 16-bit fixed-point operations and implemented on hardware using a Xilinx Zynq-based FPGA development board. Our proposed architecture shows a noticeable reduction of area resources compared with its counterparts in literature, while maintaining high throughput by alleviating memory bandwidth restrictions. The results show a reduction in consumed device resources of up to 91 percent in LUTs and 79 percent in BRAMs. Our hardware implementation is 15.7 times faster than the comparable software approach.
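As a software illustration of matching by cosine angle distance, the sketch below pairs each query descriptor with the reference descriptor at the smallest angle. It uses floating-point arithmetic and a brute-force search, so it is not the 16-bit fixed-point pipelined design described in the abstract. The function names are hypothetical, descriptors are assumed to be non-zero vectors, and the short 3-element vectors stand in for 128-element SIFT descriptors.

```python
import math

def angle_distance(desc_a, desc_b):
    """Angle in radians between two non-zero descriptor vectors."""
    dot = sum(a * b for a, b in zip(desc_a, desc_b))
    norm_a = math.sqrt(sum(a * a for a in desc_a))
    norm_b = math.sqrt(sum(b * b for b in desc_b))
    cosine = dot / (norm_a * norm_b)
    # Clamp to the valid domain of acos to absorb rounding error.
    return math.acos(max(-1.0, min(1.0, cosine)))

def match_descriptors(query, reference):
    """For each query descriptor, index of the reference descriptor at the smallest angle."""
    return [
        min(range(len(reference)), key=lambda j: angle_distance(q, reference[j]))
        for q in query
    ]

# Toy descriptors standing in for real SIFT output.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
query = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]
print(match_descriptors(query, reference))
```

Every query is compared against every reference descriptor, which is the data-heavy inner loop that makes the matching stage a natural target for hardware pipelining.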

翻訳日:2021-05-02 07:16:46 公開日:2020-12-17

# OCTAによる中心窩無血管域の高速3次元推定

Fast 3-dimensional estimation of the Foveal Avascular Zone from OCTA ( http://arxiv.org/abs/2012.09945v1 )

ライセンス: Link先を確認

Giovanni Ometto, Giovanni Montesano, Usha Chakravarthy, Frank Kee, Ruth E. Hogg and David P. Crabb

(参考訳) 光コヒーレンス断層血管撮影(optical coherence tomography angiography: OCTA)のen face画像から得られる中心窩無血管域(foveal avascular zone: FAZ)の面積は、この技術に基づく最も一般的な測定値の1つである。しかし、診療におけるその使用は、健常者間でのFAZ面積のばらつきが大きいことによって制限されており、FAZの体積測定の算出は、OCTAスキャンに特有の高いノイズによって制限されている。本研究では,別々の血管叢の毛細血管は重ならないという仮定のもと,en face画像の高い信号対雑音比を活用して,内網膜の毛細血管網を3次元(3D)で効率的に同定するアルゴリズムを設計した。その後、ネットワークをモルフォロジー演算で処理し、内網膜の境界セグメンテーション内の3D FAZを同定する。 430眼のデータセットについて、異なる血管叢におけるFAZの体積と面積を算出した。次に,線形混合効果モデルを用いて測定値を解析し,健常眼,糖尿病網膜症(DR)を伴わない糖尿病眼,DRを伴う糖尿病眼の3群間の差を同定した。その結果, FAZ体積は群間で有意差を認めたが, 面積測定では認められなかった。これらの結果から,体積FAZは平面FAZよりも優れた診断指標である可能性が示唆された。我々が導入した効率的な手法は、内網膜の毛細血管網の3Dセグメンテーションを提供するだけでなく、診療所におけるFAZ体積の高速計算を可能にしうる。

The area of the foveal avascular zone (FAZ) from en face images of optical coherence tomography angiography (OCTA) is one of the most common measurements based on this technology. However, its use in the clinic is limited by the high variation of the FAZ area across normal subjects, while the calculation of the volumetric measurement of the FAZ is limited by the high noise that characterizes OCTA scans. We designed an algorithm that exploits the higher signal-to-noise ratio of en face images to efficiently identify the capillary network of the inner retina in 3 dimensions (3D), under the assumption that the capillaries in separate plexuses do not overlap. The network is then processed with morphological operations to identify the 3D FAZ within the bounding segmentations of the inner retina. The FAZ volume and area in different plexuses were calculated for a dataset of 430 eyes. Then, the measurements were analyzed using linear mixed effect models to identify differences between three groups of eyes: healthy, diabetic without diabetic retinopathy (DR) and diabetic with DR. Results showed significant differences in the FAZ volume between the different groups but not in the area measurements. These results suggest that volumetric FAZ could be a better diagnostic detector than the planar FAZ. The efficient methodology that we introduced could allow the fast calculation of the FAZ volume in clinics, as well as providing the 3D segmentation of the capillary network of the inner retina.

翻訳日:2021-05-02 07:16:30 公開日:2020-12-17

# 動的頭部の合成放射場を学習する

Learning Compositional Radiance Fields of Dynamic Human Heads ( http://arxiv.org/abs/2012.09955v1 )

ライセンス: Link先を確認

Ziyan Wang, Timur Bagautdinov, Stephen Lombardi, Tomas Simon, Jason Saragih, Jessica Hodgins, Michael Zollh\"ofer

(参考訳) 動的な人間のフォトリアリスティックなレンダリングは、テレプレゼンスシステム、仮想ショッピング、合成データ生成などにとって重要な能力である。近年,コンピュータグラフィックスと機械学習の技法を組み合わせたニューラルレンダリング手法が,人間や物体の高忠実度モデルを生み出している。これらの手法の中には、駆動可能な人間モデルとして十分な忠実度の結果を生成できないもの(Neural Volumes)がある一方、レンダリング時間が非常に長いもの(NeRF)もある。本稿では,従来手法の長所を組み合わせ,より高解像度かつ高速な結果を生成する新しい合成的3D表現を提案する。我々の表現は、アニメーションコードの粗い3D構造対応グリッドと、各位置とそれに対応する局所アニメーションコードを視点依存の放射輝度と局所体積密度にマッピングする連続的な学習済みシーン関数とを組み合わせることで、離散的なボリューム表現と連続的なボリューム表現のギャップを埋める。微分可能なボリュームレンダリングを用いて、人間の頭部と上半身のフォトリアリスティックな新規視点を計算するとともに、2Dの教師信号のみで新しい表現をエンドツーエンドに訓練する。さらに,学習した動的放射場を用いて,グローバルなアニメーションコードに基づく未知の新しい表情を合成できることを示す。本手法は,動的な人間の頭部と上半身の新規視点合成において最先端の結果を達成する。

Photorealistic rendering of dynamic humans is an important ability for telepresence systems, virtual shopping, synthetic data generation, and more. Recently, neural rendering methods, which combine techniques from computer graphics and machine learning, have created high-fidelity models of humans and objects. Some of these methods do not produce results with high-enough fidelity for driveable human models (Neural Volumes) whereas others have extremely long rendering times (NeRF). We propose a novel compositional 3D representation that combines the best of previous methods to produce both higher-resolution and faster results. Our representation bridges the gap between discrete and continuous volumetric representations by combining a coarse 3D-structure-aware grid of animation codes with a continuous learned scene function that maps every position and its corresponding local animation code to its view-dependent emitted radiance and local volume density. Differentiable volume rendering is employed to compute photo-realistic novel views of the human head and upper body as well as to train our novel representation end-to-end using only 2D supervision. In addition, we show that the learned dynamic radiance field can be used to synthesize novel unseen expressions based on a global animation code. Our approach achieves state-of-the-art results for synthesizing novel views of dynamic human heads and the upper body.

翻訳日:2021-05-02 07:16:04 公開日:2020-12-17

# 視覚質問応答のための自己教師付き学習による言語事前バイアスの克服

Overcoming Language Priors with Self-supervised Learning for Visual Question Answering ( http://arxiv.org/abs/2012.11528v1 )

ライセンス: Link先を確認

Xi Zhu, Zhendong Mao, Chunxiao Liu, Peng Zhang, Bin Wang, and Yongdong Zhang

(参考訳) ほとんどのVisual Question Answering (VQA)モデルは、固有のデータバイアスによって引き起こされる言語事前(language prior)問題に悩まされている。具体的には、VQAモデルは、画像内容を無視して、高頻度の回答(例: 黄色)に基づいて質問(例: バナナは何色か?)に答える傾向がある。既存のアプローチは、精巧なモデルを作成するか、視覚アノテーションを追加することで、質問への依存を減らしつつ画像への依存を強め、この問題に対処している。しかし、データバイアスが緩和すらされていないため、依然として言語事前問題の影響を受ける。本稿では,この問題を解決するための自己教師付き学習フレームワークを提案する。具体的には,まずラベル付きデータを自動生成してバイアスのあるデータのバランスをとり,バランスの取れたデータを活用してベースVQAモデルが言語事前を克服するのを助ける自己教師付き補助タスクを提案する。本手法は,外部アノテーションを導入することなく,バランスの取れたデータを生成することでデータバイアスを補償できる。実験結果から,本手法は最先端手法を大きく上回り、最も一般的に使用されるベンチマークVQA-CP v2において全体精度を49.50%から57.59%に向上させた。言い換えれば、外部アノテーションを使わずにアノテーションベースの手法の性能を16%向上させることができる。

Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., what color is the banana?) based on the high-frequency answers (e.g., yellow) ignoring image contents. Existing approaches tackle this problem by creating delicate models or introducing additional visual annotations to reduce question dependency while strengthening image dependency. However, they are still subject to the language prior problem since the data biases have not even been alleviated. In this paper, we introduce a self-supervised learning framework to solve this problem. Concretely, we first automatically generate labeled data to balance the biased data, and propose a self-supervised auxiliary task to utilize the balanced data to assist the base VQA model to overcome language priors. Our method can compensate for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method can significantly outperform the state-of-the-art, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark VQA-CP v2. In other words, we can increase the performance of annotation-based methods by 16% without using external annotations.
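The 16% figure in the last sentence appears to be a relative gain rather than a difference in percentage points. The short calculation below checks that reading using only the two accuracies quoted in the abstract; interpreting 16% as a relative improvement is an inference, not something the abstract states.

```python
baseline = 49.50   # accuracy (%) of the previous state of the art, from the abstract
proposed = 57.59   # accuracy (%) of the proposed method, from the abstract

absolute_gain = proposed - baseline             # in percentage points
relative_gain = absolute_gain / baseline * 100  # in percent of the baseline

print(f"{absolute_gain:.2f} points, {relative_gain:.1f}% relative")
```

The absolute gain is about 8 percentage points, while the relative gain rounds to 16%, which matches the abstract's closing claim.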

翻訳日:2021-05-02 07:15:41 公開日:2020-12-17

# 小売延滞の因果学習

The Causal Learning of Retail Delinquency ( http://arxiv.org/abs/2012.09448v1 )

ライセンス: Link先を確認

Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Nanbo Peng, Dongdong Wang, Zhixiang Huang

(参考訳) 本稿では、貸し手の与信判断に変化があった場合の借り手の返済額の期待差に焦点を当てる。古典的推定器は交絡効果を見落としており、そのため推定誤差が非常に大きくなりうる。そこで我々は,誤差を大幅に低減できる推定器を構築するための別の手法を提案する。提案する推定器は, 理論解析と数値実験の組み合わせにより, 不偏性,一致性,頑健性を持つことが示される。さらに,古典的推定器と提案推定器の因果量の推定能力を比較する。比較は、線形回帰モデル、ツリーベースモデル、ニューラルネットワークベースのモデルなど幅広いモデルについて、因果性のレベル、非線形性の程度、分布特性の異なるさまざまなシミュレーションデータセットのもとで行われる。最も重要なことは、我々のアプローチを、eコマースと融資事業の両方を運営するグローバルテクノロジー企業が提供する大規模な観察データセットに適用することである。因果効果が正しく考慮されれば, 推定誤差の相対的な低減は極めて大きいことがわかった。

This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be very large. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

翻訳日:2021-05-02 07:14:57 公開日:2020-12-17

# アルゴリズム・暗号共設計によるスケーラブル・プライバシ保全型深層ニューラルネットワーク

Towards Scalable and Privacy-Preserving Deep Neural Network via Algorithmic-Cryptographic Co-design ( http://arxiv.org/abs/2012.09364v1 )

ライセンス: Link先を確認

Chaochao Chen, Jun Zhou, Longfei Zheng, Yan Wang, Xiaolin Zheng, Bingzhe Wu, Cen Chen, Li Wang, and Jianwei Yin

(参考訳) ディープニューラルネットワーク(DNN)は、特に豊富なトレーニングデータが提供される場合、様々な現実世界のアプリケーションにおいて顕著な進歩を遂げている。しかし、データの孤立が現在深刻な問題となっている。既存の研究は、アルゴリズムの観点または暗号の観点のいずれかからプライバシ保護DNNモデルを構築している。前者は主にデータホルダ間、またはデータホルダとサーバの間でDNN計算グラフを分割するもので、スケーラビリティは良好だが、精度の低下と潜在的なプライバシーリスクに悩まされる。対照的に後者は、時間のかかる暗号技術を利用しており、プライバシーの保証は強いがスケーラビリティに乏しい。本稿では,アルゴリズムと暗号の両面から設計した,スケーラブルでプライバシ保護の深層ニューラルネットワーク学習フレームワークSPNNを提案する。アルゴリズムの観点からは,DNNモデルの計算グラフを,データホルダが行うプライベートデータ関連の計算と,計算能力の高いサーバに委譲されるその他の重い計算の2つの部分に分割する。暗号の観点からは,秘密分散と準同型暗号という2種類の暗号技術を用いて,孤立したデータホルダがプライベートデータ関連の計算を秘匿的かつ協調的に行うことを提案する。さらに,SPNNを分散環境で実装し,ユーザフレンドリなAPIを導入する。実世界のデータセットで行った実験結果は、SPNNの優位性を示している。

Deep Neural Networks (DNNs) have achieved remarkable progress in various real-world applications, especially when abundant training data are provided. However, data isolation has become a serious problem currently. Existing works build privacy preserving DNN models from either algorithmic perspective or cryptographic perspective. The former mainly splits the DNN computation graph between data holders or between data holders and server, which demonstrates good scalability but suffers from accuracy loss and potential privacy risks. In contrast, the latter leverages time-consuming cryptographic techniques, which has strong privacy guarantee but poor scalability. In this paper, we propose SPNN - a Scalable and Privacy-preserving deep Neural Network learning framework, from algorithmic-cryptographic co-perspective. From algorithmic perspective, we split the computation graph of DNN models into two parts, i.e., the private data related computations that are performed by data holders and the rest heavy computations that are delegated to a server with high computation ability. From cryptographic perspective, we propose using two types of cryptographic techniques, i.e., secret sharing and homomorphic encryption, for the isolated data holders to conduct private data related computations privately and cooperatively. Furthermore, we implement SPNN in a decentralized setting and introduce user-friendly APIs. Experimental results conducted on real-world datasets demonstrate the superiority of SPNN.
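To make the secret-sharing idea mentioned in the abstract concrete, here is a minimal sketch of additive secret sharing over integers. It is not SPNN's actual protocol: the ring size `MODULUS` and the helper names `share`, `reconstruct`, and `add_shared` are assumptions introduced only for illustration.

```python
import secrets

# Ring size for the shares; 2**32 is an assumption chosen for this sketch.
MODULUS = 2 ** 32


def share(value, n_parties=2):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds the two shares it holds; no party sees a or b in full."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


if __name__ == "__main__":
    x_shares = share(5)
    y_shares = share(7)
    print(reconstruct(add_shared(x_shares, y_shares)))
```

Any single share is uniformly random on its own, so the parties can add their private values without revealing them. Multiplication on shares, which neural network layers also need, requires extra machinery that this sketch omits.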

Translated: 2021-05-02 07:14:40 Published: 2020-12-17

# Distance-aware Molecule Graph Attention Network for Drug-Target Binding Affinity Prediction ( http://arxiv.org/abs/2012.09624v1 )

License: see the linked page

Jingbo Zhou, Shuangli Li, Liang Huang, Haoyi Xiong, Fan Wang, Tong Xu, Hui Xiong, Dejing Dou

Accurately predicting the binding affinity between drugs and proteins is an essential step in computational drug discovery. Since graph neural networks (GNNs) have demonstrated remarkable success in various graph-related tasks, they have recently been considered a promising tool for improving binding affinity prediction. However, most existing GNN architectures can only encode the topological graph structure of drugs and proteins, without considering the relative spatial information among their atoms. Unlike other graph datasets such as social networks and commonsense knowledge graphs, the relative spatial positions of atoms and the chemical bonds among them have a significant impact on binding affinity. To this end, we propose a diStance-aware Molecule graph Attention Network (S-MAN) tailored to drug-target binding affinity prediction. As a dedicated solution, we first propose a position encoding mechanism that integrates the topological structure and spatial position information into the constructed pocket-ligand graph. Moreover, we propose a novel edge-node hierarchical attentive aggregation structure with edge-level and node-level aggregation. The hierarchical attentive aggregation can capture spatial dependencies among atoms and fuse the position-enhanced information while discriminating between multiple spatial relations among atoms. Finally, we conduct extensive experiments on two standard datasets to demonstrate the effectiveness of S-MAN.

Translated: 2021-05-02 07:14:07 Published: 2020-12-17

# Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? Supporting Data Scientists in Training Fair Models ( http://arxiv.org/abs/2012.09951v1 )

License: see the linked page

Brittany Johnson, Jesse Bartola, Rico Angell, Katherine Keith, Sam Witty, Stephen J. Giguere, Yuriy Brun

Modern software relies heavily on data and machine learning, and affects decisions that shape our world. Unfortunately, recent studies have shown that, because of biases in data, software systems frequently inject bias into their decisions, from producing better closed caption transcriptions of men's voices than of women's voices to overcharging people of color for financial loans. To address bias in machine learning, data scientists need tools that help them understand the trade-offs between model quality and fairness in their specific data domains. Toward that end, we present fairkit-learn, a toolkit for helping data scientists reason about and understand fairness. Fairkit-learn works with state-of-the-art machine learning tools and uses the same interfaces to ease adoption. It can evaluate thousands of models produced by multiple machine learning algorithms, hyperparameters, and data permutations, and compute and visualize a small Pareto-optimal set of models that describe the optimal trade-offs between fairness and quality. We evaluate fairkit-learn via a user study with 54 students, showing that students using fairkit-learn produce models that provide a better balance between fairness and quality than students using the scikit-learn and IBM AI Fairness 360 toolkits. With fairkit-learn, users can select models that are up to 67% more fair and 10% more accurate than the models they are likely to train with scikit-learn.
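The Pareto-optimal set mentioned in the abstract can be illustrated with a short sketch. This is not the fairkit-learn API: the function `pareto_front` and the `(name, quality, fairness)` tuples are hypothetical, and both scores are assumed to be "higher is better".

```python
def pareto_front(models):
    """Return the names of models that no other model dominates.

    `models` is a list of (name, quality, fairness) tuples. A model is
    dominated if another model is at least as good on both scores and
    strictly better on at least one of them.
    """
    front = []
    for name, quality, fairness in models:
        dominated = any(
            (q2 >= quality and f2 >= fairness)
            and (q2 > quality or f2 > fairness)
            for _, q2, f2 in models
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    candidates = [
        ("a", 0.90, 0.50),
        ("b", 0.80, 0.70),
        ("c", 0.70, 0.60),
        ("d", 0.85, 0.70),
    ]
    print(pareto_front(candidates))
```

In this toy input, "b" and "c" are dominated, so only "a" and "d" remain as candidate trade-offs. The pairwise check is quadratic in the number of models, which is fine for a sketch but would need a smarter algorithm for very large model sets.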

Translated: 2021-05-02 07:13:21 Published: 2020-12-17

# Deep Learning Techniques for Super-Resolution in Video Games ( http://arxiv.org/abs/2012.09810v1 )

License: see the linked page

Alexander Watson

The computational cost of video game graphics is increasing, and hardware for processing graphics is struggling to keep up. This means that computer scientists need to develop creative new ways to improve the performance of graphics processing hardware. Deep learning techniques for video super-resolution can enable video games to have high-quality graphics whilst offsetting much of the computational cost. These emerging technologies allow consumers to get improved performance and enjoyment from video games, and have the potential to become standard within the game development industry.

Translated: 2021-05-02 07:12:31 Published: 2020-12-17

# Treadmill Assisted Gait Spoofing (TAGS): An Emerging Threat to wearable Sensor-based Gait Authentication ( http://arxiv.org/abs/2012.09950v1 )

License: see the linked page

Rajesh Kumar and Can Isik and Vir V Phoha

In this work, we examine the impact of Treadmill Assisted Gait Spoofing (TAGS) on Wearable Sensor-based Gait Authentication (WSGait). We consider more realistic implementation and deployment scenarios than the previous study, which focused only on the accelerometer sensor and a fixed set of features. Specifically, we consider situations in which the implementation of WSGait could be using one or more of the sensors embedded in modern smartphones. In addition, it could be using different sets of features, different classification algorithms, or both. Despite the use of a variety of sensors, feature sets (ranked by mutual information), and six different classification algorithms, TAGS was able to increase the average False Accept Rate (FAR) from 4% to 26%. Such a considerable increase in the average FAR, especially under the stringent implementation and deployment scenarios considered in this study, calls for further investigation into the design of evaluations of WSGait before its deployment for public use.
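For readers unfamiliar with the metric, the False Accept Rate reported above is the fraction of impostor attempts that the authenticator accepts. The sketch below assumes a score-threshold authenticator; the function name and the example scores are hypothetical and are not taken from the paper.

```python
def false_accept_rate(impostor_scores, threshold):
    """Fraction of impostor attempts whose match score reaches the threshold.

    Assumes higher scores mean a closer match to the genuine user, and that
    an attempt is accepted when its score is >= threshold.
    """
    if not impostor_scores:
        raise ValueError("at least one impostor attempt is required")
    accepted = sum(1 for score in impostor_scores if score >= threshold)
    return accepted / len(impostor_scores)


if __name__ == "__main__":
    # Made-up scores for four impostor attempts.
    print(false_accept_rate([0.1, 0.2, 0.6, 0.9], threshold=0.5))
```

A spoofing attack such as TAGS aims to push impostor scores above the threshold, which raises this fraction.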

Translated: 2021-05-02 07:12:24 Published: 2020-12-17

# Speech Enhancement with Zero-Shot Model Selection ( http://arxiv.org/abs/2012.09359v1 )

License: see the linked page

Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Recent research on speech enhancement (SE) has seen the emergence of deep learning-based methods. It is still a challenging task to determine effective ways to increase the generalizability of SE under diverse test conditions. In this paper, we combine zero-shot learning and ensemble learning to propose a zero-shot model selection (ZMOS) approach to increase the generalization of SE performance. The proposed approach is realized in two phases, namely offline and online phases. The offline phase clusters the entire set of training data into multiple subsets, and trains a specialized SE model (termed a component SE model) with each subset. The online phase selects the most suitable component SE model to carry out enhancement. Two selection strategies are developed: selection based on quality score (QS) and selection based on quality embedding (QE). Both QS and QE are obtained by Quality-Net, a non-intrusive quality assessment network. In the offline phase, the QS or QE of a training utterance is used to group the training data into clusters. In the online phase, the QS or QE of the test utterance is used to identify the appropriate component SE model to perform enhancement on the test utterance. Experimental results confirm that the proposed ZMOS approach achieves better performance than the baseline systems on both seen and unseen noise types, which indicates its effectiveness in providing robust SE performance.
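The online selection step can be sketched as a nearest-cluster lookup. The abstract does not specify the distance measure, so this sketch assumes Euclidean distance between a test utterance's quality embedding and one centroid per cluster; `select_component_model` and the toy vectors are hypothetical.

```python
import math


def select_component_model(embedding, centroids):
    """Return the index of the cluster centroid closest to the embedding.

    `embedding` is the quality embedding (QE) of the test utterance and
    `centroids` holds one centroid per training cluster, so the returned
    index identifies which component SE model to run.
    """
    if not centroids:
        raise ValueError("at least one centroid is required")
    distances = [math.dist(embedding, centroid) for centroid in centroids]
    return distances.index(min(distances))


if __name__ == "__main__":
    # Toy two-dimensional embeddings for two clusters.
    cluster_centroids = [[0.0, 0.0], [1.0, 0.0]]
    print(select_component_model([0.9, 0.1], cluster_centroids))
```

With a scalar quality score (QS) instead of an embedding, the same function works if each value is wrapped in a one-element list.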

Translated: 2021-05-02 07:12:06 Published: 2020-12-17

# Reduced Order Modeling using Shallow ReLU Networks with Grassmann Layers ( http://arxiv.org/abs/2012.09940v1 )

License: see the linked page

Kayla Bollinger and Hayden Schaeffer

This paper presents a nonlinear model reduction method for systems of equations using a structured neural network. The neural network takes the form of a "three-layer" network with the first layer constrained to lie on the Grassmann manifold and the first activation function set to identity, while the remaining network is a standard two-layer ReLU neural network. The Grassmann layer determines the reduced basis for the input space, while the remaining layers approximate the nonlinear input-output system. The training alternates between learning the reduced basis and the nonlinear approximation, and is shown to be more effective than fixing the reduced basis and training the network only. An additional benefit of this approach is that, for data that lie on low-dimensional subspaces, the number of parameters in the network does not need to be large. We show that our method can be applied to scientific problems in the data-scarce regime, which is typically not well-suited for neural network approximations. Examples include reduced order modeling for nonlinear dynamical systems and several aerospace engineering problems.
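One way to write down the architecture described above is shown here as a sketch. The symbols are assumptions introduced for illustration and are not taken from the paper: input $x \in \mathbb{R}^d$, reduced dimension $k \le d$, hidden width $m$, output dimension $p$, and bias terms $b_1$, $b_2$ that the paper may or may not include.

$$
f(x) = W_2\,\mathrm{ReLU}\!\left(W_1 A^{\top} x + b_1\right) + b_2,
\qquad A \in \mathbb{R}^{d \times k},\quad A^{\top} A = I_k ,
$$

with $W_1 \in \mathbb{R}^{m \times k}$ and $W_2 \in \mathbb{R}^{p \times m}$. The orthonormal columns of $A$ span the reduced basis (a point on the Grassmann manifold), and the identity activation means the first layer is simply the projection $A^{\top} x$. Under these assumptions the network has $dk + mk + m + pm + p$ parameters, so a small $k$ keeps the count low even when $d$ is large.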

Translated: 2021-05-02 07:11:45 Published: 2020-12-17

PDF registration status (publication date: 20201217)