# Interspecies information systems

Interspecies information systems ( http://arxiv.org/abs/2004.12168v2 )

Dirk van der Linden

This article introduces a new class of socio-technical systems, interspecies information systems (IIS), by describing several examples of these systems emerging through the use of commercially available data-driven animal-centered technology. When animal-centered technology, such as pet wearables, cow health monitoring, or even wildlife drones, captures animal data and informs humans of actions to take towards animals, interspecies information systems emerge. I discuss the importance of understanding them as information systems rather than isolated technology or technology-mediated interactions, and propose a conceptual model capturing the key components and information flow of a general interspecies information system. I conclude by proposing multiple practical challenges that are faced in the successful design, engineering and use of any interspecies information system where animal data informs human actions.
# Classical Communications with Indefinite Causal Order for $N$ completely depolarizing channels

Classical Communications with Indefinite Causal Order for $N$ completely depolarizing channels ( http://arxiv.org/abs/2004.14339v2 )

Sk Sazim, Michal Sedlak, Kratveer Singh, Arun Kumar Pati

If two identical copies of a completely depolarizing channel are put into a superposition of their possible causal orders, they can transmit non-zero classical information. Here, we study how well we can transmit classical information with $N$ depolarizing channels put in a superposition of $M$ causal orders via the quantum SWITCH. We calculate the Holevo quantity when the superposition uses only cyclic permutations of the channels, and find that it increases with $M$ and is independent of $N$. For a qubit, it never reaches $1$ as $M$ increases. On the other hand, the classical capacity decreases with the dimension $d$ of the message system. Further, for $N=3$ and $N=4$, we studied the superposition of all causal orders, as well as uniform superpositions of the causal orders belonging to the different cosets generated by the cyclic permutation subgroup.
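
The Holevo quantity mentioned above is the standard bound $\chi = S(\sum_i p_i\rho_i) - \sum_i p_i S(\rho_i)$ on the classical information carried by an ensemble of states. The following minimal NumPy sketch evaluates that generic formula. It does not model the quantum SWITCH or the paper's channel superpositions, and the function names are illustrative.

```python
import numpy as np


def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho log2 rho) in bits, computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop numerical zeros (0 log 0 = 0)
    return float(-np.sum(evals * np.log2(evals)))


def holevo_quantity(probs, states) -> float:
    """chi = S(sum_i p_i rho_i) - sum_i p_i S(rho_i) for the ensemble {p_i, rho_i}."""
    average = sum(p * rho for p, rho in zip(probs, states))
    return von_neumann_entropy(average) - sum(
        p * von_neumann_entropy(rho) for p, rho in zip(probs, states)
    )


# Two orthogonal qubit states sent with equal probability carry one bit.
ket0 = np.array([[1.0, 0.0], [0.0, 0.0]])
ket1 = np.array([[0.0, 0.0], [0.0, 1.0]])
print(holevo_quantity([0.5, 0.5], [ket0, ket1]))

# A single completely depolarizing channel maps every input to I/2: nothing gets through.
mixed = np.eye(2) / 2
print(holevo_quantity([0.5, 0.5], [mixed, mixed]))
```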
# Algorithms for quantum simulation at finite energies

Algorithms for quantum simulation at finite energies ( http://arxiv.org/abs/2006.03032v3 )

Sirui Lu, Mari Carmen Bañuls, J. Ignacio Cirac

We introduce two kinds of quantum algorithms to explore microcanonical and canonical properties of many-body systems. The first one is a hybrid quantum algorithm that, given an efficiently preparable state, computes expectation values in a finite energy interval around its mean energy. This algorithm is based on a filtering operator, similar to quantum phase estimation, which projects out energies outside the desired energy interval. However, instead of performing this operation on a physical state, it recovers the physical values by performing interferometric measurements without the need to prepare the filtered state. We show that the computational time scales polynomially with the number of qubits, the inverse of the prescribed variance, and the inverse error. In practice, the algorithm does not require the evolution for long times, but instead a significant number of measurements in order to obtain sensible results. Our second algorithm is a quantum-assisted Monte Carlo sampling method to compute other quantities which approach the expectation values for the microcanonical and canonical ensembles. Using classical Monte Carlo techniques and the quantum computer as a resource, this method circumvents the sign problem that is plaguing classical Quantum Monte Carlo simulations, as long as one can prepare states with suitable energies. All algorithms can be used with small quantum computers and analog quantum simulators, as long as they can perform the interferometric measurements. We also show that this last task can be greatly simplified at the expense of performing more measurements.
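
The effect of the filtering operator can be illustrated classically on a toy Hamiltonian. The sketch below makes two assumptions. First, the filter is taken to be Gaussian, centred at energy `E` with width `sigma`; the abstract only says it resembles quantum phase estimation. Second, the filter is built by exact diagonalization. On a quantum device, a Gaussian filter would instead be written as a Fourier integral over $e^{-iHt}$, which is why interferometric measurements of time-evolution amplitudes suffice. That step is not modelled here.

```python
import numpy as np


def filtered_expectation(H: np.ndarray, O: np.ndarray, psi: np.ndarray, E: float, sigma: float) -> float:
    """Normalized expectation of O in the energy-filtered state P|psi>.

    Assumed filter: P = exp(-(H - E)^2 / (4 sigma^2)), so P^2 weights each
    eigenstate of H by a Gaussian of width sigma centred at E. P is built by
    exact diagonalization, which is only feasible for small systems.
    """
    evals, evecs = np.linalg.eigh(H)
    weights = np.exp(-((evals - E) ** 2) / (4 * sigma**2))
    P = evecs @ np.diag(weights) @ evecs.conj().T
    phi = P @ psi  # filtered, unnormalized state (never prepared on hardware)
    return float(np.real(phi.conj() @ O @ phi) / np.real(phi.conj() @ phi))


# Toy example: three levels, observable O = H, equal-weight initial state.
H = np.diag([0.0, 1.0, 2.0])
psi = np.ones(3) / np.sqrt(3)
for E in (0.0, 1.0, 2.0):
    print(E, round(filtered_expectation(H, H, psi, E, sigma=0.1), 6))
```

A narrow filter picks out the level at `E`. A very wide filter returns the unfiltered mean energy, which is 1.0 for this state.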
# Designing locally maximally entangled quantum states with arbitrary local symmetries

Designing locally maximally entangled quantum states with arbitrary local symmetries ( http://arxiv.org/abs/2011.04078v5 )

Oskar Słowik, Adam Sawicki, Tomasz Maciążek

One of the key ingredients of many LOCC protocols in quantum information is a multiparticle (locally) maximally entangled quantum state, aka a critical state, that possesses local symmetries. We show how to design critical states with arbitrarily large local unitary symmetry. We explain that such states can be realised in a quantum system of distinguishable traps with bosons or fermions occupying a finite number of modes. Then, local symmetries of the designed quantum state are equal to the unitary group of local mode operations acting diagonally on all traps. Therefore, such a group of symmetries is naturally protected against errors that occur in a physical realisation of mode operators. We also link our results with the existence of so-called strictly semistable states with particular asymptotic diagonal symmetries. Our main technical result states that the $N$th tensor power of any irreducible representation of $\mathrm{SU}(N)$ contains a copy of the trivial representation. This is established via a direct combinatorial analysis of Littlewood-Richardson rules utilising certain combinatorial objects which we call telescopes.
# Simple Mitigation of Global Depolarizing Errors in Quantum Simulations

Simple Mitigation of Global Depolarizing Errors in Quantum Simulations ( http://arxiv.org/abs/2101.01690v2 )

Joseph Vovrosh, Kiran E. Khosla, Sean Greenaway, Christopher Self, Myungshik Kim and Johannes Knolle

To get the best possible results from current quantum devices, error mitigation is essential. In this work we present a simple but effective error mitigation technique based on the assumption that noise in a deep quantum circuit is well described by global depolarizing error channels. By measuring the errors directly on the device, we use an error model ansatz to infer error-free results from noisy data. We highlight the effectiveness of our mitigation via two examples of recent interest in quantum many-body physics: entanglement measurements and real time dynamics of confinement in quantum spin chains. Our technique enables us to get quantitative results from the IBM quantum computers showing signatures of confinement, i.e., we are able to extract the meson masses of the confined excitations, which were previously out of reach. Additionally, we show the applicability of this mitigation protocol in a wider setting with numerical simulations of more general tasks using a realistic error model. Our protocol is device-independent, simply implementable and leads to large improvements in results if the global errors are well described by depolarization.
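
Under the global depolarizing assumption, the noisy state is $f\rho + (1-f)I/d$ for some $f$. Expectation values can therefore be rescaled once $f$ is known. The sketch below shows that inversion. It assumes $f$ is estimated from a calibration circuit with the same gate structure whose ideal output is the all-zero bitstring. The function names and this particular calibration are illustrative assumptions, not the authors' exact protocol.

```python
def estimate_fidelity(p_return: float, n_qubits: int) -> float:
    """Estimate the global depolarizing parameter f from a calibration run.

    Hypothetical calibration: a circuit whose ideal output is |0...0>. Under
    rho -> f*rho + (1 - f)*I/d, the measured return probability is
    p_return = f + (1 - f)/d, which is inverted here for f.
    """
    d = 2 ** n_qubits
    return (p_return - 1 / d) / (1 - 1 / d)


def mitigate_expectation(noisy_value: float, f: float, trace_over_d: float = 0.0) -> float:
    """Solve <O>_noisy = f*<O>_ideal + (1 - f)*Tr(O)/d for <O>_ideal.

    For traceless observables such as Pauli strings, Tr(O)/d = 0 and the
    correction reduces to rescaling by 1/f.
    """
    return (noisy_value - (1 - f) * trace_over_d) / f


# Illustrative numbers (not device data): 2 qubits, true f = 0.6, ideal <Z Z> = 0.8.
f_est = estimate_fidelity(p_return=0.6 + 0.4 / 4, n_qubits=2)
print(round(f_est, 6), round(mitigate_expectation(0.6 * 0.8, f_est), 6))  # 0.6 0.8
```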
# Graphical Language with Delayed Trace: Picturing Quantum Computing with Finite Memory

Graphical Language with Delayed Trace: Picturing Quantum Computing with Finite Memory ( http://arxiv.org/abs/2102.03133v2 )

Graphical languages, like quantum circuits or ZX-calculus, have been successfully designed to represent (memoryless) quantum computations acting on a finite number of qubits. Meanwhile, delayed traces have been used as a graphical way to represent finite-memory computations on streams, in a classical setting (cartesian data types). We merge those two approaches and describe a general construction that extends any graphical language, equipped with a notion of discarding, to a graphical language of finite memory computations. In order to handle cases like the ZX-calculus, which is complete for post-selected quantum mechanics, we extend the delayed trace formalism beyond the causal case, refining the notion of causality for stream transformers. We design a stream semantics based on stateful morphism sequences and, under some assumptions, show universality and completeness results. Finally, we investigate the links of our framework with previous works on cartesian data types, signal flow graphs, and quantum channels with memories.
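
In the classical (cartesian) setting the abstract refers to, a finite-memory computation on streams can be read as a Mealy machine. At each tick it consumes an input, emits an output, and passes its memory on to the next tick. The delayed trace draws that hand-off as a feedback wire. The sketch below is an illustration of that classical semantics only. It is not the paper's construction, which also covers non-cartesian (quantum) languages, and all names in it are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

S = TypeVar("S")  # memory (state) type
A = TypeVar("A")  # input type
B = TypeVar("B")  # output type


def run_stream(step: Callable[[S, A], Tuple[S, B]], init: S, inputs: Iterable[A]) -> Iterator[B]:
    """Run a finite-memory stream transformer (a Mealy machine).

    The memory produced at one tick is fed back into `step` at the next
    tick: feedback with a one-step delay, starting from `init`.
    """
    state = init
    for a in inputs:
        state, b = step(state, a)
        yield b


def unit_delay(prev, a):
    """Output the previous input and remember the current one."""
    return a, prev


def running_sum(total, a):
    """Accumulate the inputs and output the partial sums."""
    total = total + a
    return total, total


print(list(run_stream(unit_delay, 0, [1, 2, 3])))   # [0, 1, 2]
print(list(run_stream(running_sum, 0, [1, 2, 3])))  # [1, 3, 6]
```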
Translation date: 2023-04-12 11:48:25, publication date: 2021-04-28
# Risk Framework for Bitcoin Custody Operation with the Revault Protocol

Risk Framework for Bitcoin Custody Operation with the Revault Protocol ( http://arxiv.org/abs/2102.09392v2 )

Our contributions with this paper are twofold. First, we elucidate the methodological requirements for a risk framework of custodial operations and argue for the value of this type of risk model as complementary with cryptographic and blockchain security models. Second, we present a risk model in the form of a library of attack-trees for Revault -- an open-source custody protocol. The model can be used by organisations as a risk quantification framework for a thorough security analysis in their specific deployment context. Our work exemplifies an approach that can be used independent of which custody protocol is being considered, including complex protocols with multiple stakeholders and active defence infrastructure.
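
One common way to turn an attack tree into a number is to give each leaf a success probability and combine them bottom-up: AND gates multiply, and OR gates succeed if any child does. The sketch below assumes independent leaf events. That assumption, the node names, and the probabilities are all hypothetical and are not taken from the Revault attack-tree library.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttackNode:
    """Attack-tree node: a leaf with an estimated success probability, or an AND/OR gate."""
    name: str
    gate: str = "LEAF"  # "LEAF", "AND" or "OR"
    prob: float = 0.0   # used by leaves only
    children: List["AttackNode"] = field(default_factory=list)


def success_probability(node: AttackNode) -> float:
    """Probability that the attacker reaches this node's goal, assuming independent leaves.

    AND: every child must succeed (product). OR: at least one child succeeds.
    """
    if node.gate == "LEAF":
        return node.prob
    probs = [success_probability(child) for child in node.children]
    if node.gate == "AND":
        return math.prod(probs)
    if node.gate == "OR":
        return 1.0 - math.prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate {node.gate!r}")


# Hypothetical tree with made-up numbers, for illustration only.
steal_funds = AttackNode("steal vault funds", "OR", children=[
    AttackNode("compromise both signing keys", "AND", children=[
        AttackNode("phish key holder A", prob=0.1),
        AttackNode("phish key holder B", prob=0.2),
    ]),
    AttackNode("bribe an insider", prob=0.05),
])
print(round(success_probability(steal_funds), 4))  # 0.069
```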
Translation date: 2023-04-11 06:09:13, publication date: 2021-04-28
# Quantum PUF for Security and Trust in Quantum Computing

Quantum PUF for Security and Trust in Quantum Computing ( http://arxiv.org/abs/2104.06244v3 )

Koustubh Phalak, Abdullah Ash-Saki, Mahabubul Alam, Rasit Onur Topaloglu, Swaroop Ghosh

Quantum computing is a promising paradigm to solve computationally intractable problems. Various companies, such as IBM, Rigetti and D-Wave, offer quantum computers through cloud-based platforms that possess several interesting features. These factors motivate a new threat model. To mitigate this threat, we propose two flavors of QuPUF: one based on superposition, and another based on decoherence. Experiments on real IBM quantum hardware show that the proposed QuPUF can achieve an inter-die Hamming distance (HD) of 55% and an intra-HD as low as 4%, compared to ideal values of 50% and 0%, respectively. The proposed QuPUFs can also be used as a standalone solution for other applications.
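
The two figures of merit quoted above are usually computed as fractional Hamming distances between response bitstrings. Inter-HD compares different devices on the same challenge, with an ideal value of 50%. Intra-HD compares repeated reads of one device, with an ideal value of 0%. The following sketch uses these conventional definitions. The bitstrings are made up, and the paper's exact averaging scheme may differ.

```python
from itertools import combinations


def fractional_hd(a: str, b: str) -> float:
    """Fraction of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def inter_hd(responses_per_device):
    """Mean HD over all pairs of devices answering the same challenge (ideal: 0.5)."""
    pairs = list(combinations(responses_per_device, 2))
    return sum(fractional_hd(a, b) for a, b in pairs) / len(pairs)


def intra_hd(repeated_responses):
    """Mean HD between a reference read and repeated reads of one device (ideal: 0.0)."""
    reference, *repeats = repeated_responses
    return sum(fractional_hd(reference, r) for r in repeats) / len(repeats)


print(inter_hd(["0101", "0110", "1100"]))  # 0.5
print(intra_hd(["0101", "0101", "0100"]))  # 0.125
```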
Translation date: 2023-04-03 23:30:36, publication date: 2021-04-28
# Entropy-Based Evolutionary Diversity Optimisation for the Traveling Salesperson Problem

Entropy-Based Evolutionary Diversity Optimisation for the Traveling Salesperson Problem ( http://arxiv.org/abs/2104.13538v1 )

Computing diverse sets of high-quality solutions has gained increasing attention among the evolutionary computation community in recent years. It allows practitioners to choose from a set of high-quality alternatives. In this paper, we employ a population diversity measure, called the high-order entropy measure, in an evolutionary algorithm to compute a diverse set of high-quality solutions for the Traveling Salesperson Problem. In contrast to previous studies, our approach allows diversifying segments of tours containing several edges based on the entropy measure. We examine the resulting evolutionary diversity optimisation approach precisely in terms of the final set of solutions and theoretical properties. Experimental results show significant improvements compared to a recently proposed edge-based diversity optimisation approach when working with a large population of solutions or long segments.
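
To make "diversifying segments of tours" concrete, the sketch below uses a simplified stand-in for the high-order entropy measure. It counts every segment of `k` consecutive edges across a population of tours, treating a segment and its reverse as the same, and takes the Shannon entropy of those counts. This is an assumption for illustration. The paper's exact definition and normalisation may differ.

```python
import math
from collections import Counter


def tour_segments(tour, k):
    """All segments of k consecutive edges (k + 1 cities) of a cyclic tour, direction-independent."""
    n = len(tour)
    segments = []
    for i in range(n):
        seg = tuple(tour[(i + j) % n] for j in range(k + 1))
        segments.append(min(seg, seg[::-1]))  # a segment and its reverse count as one
    return segments


def segment_entropy(population, k):
    """Shannon entropy (nats) of segment frequencies over a population of tours.

    Higher values mean segments are spread more evenly, i.e. a more diverse population.
    """
    counts = Counter(s for tour in population for s in tour_segments(tour, k))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())


t1 = [0, 1, 2, 3]
t2 = [0, 2, 1, 3]
# Two copies of one tour versus two different tours: the second population scores higher.
print(segment_entropy([t1, t1], k=1), segment_entropy([t1, t2], k=1))
```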
Translation date: 2023-04-02 04:44:00, publication date: 2021-04-28
# Unitary equivalence classes of split-step quantum walks

Unitary equivalence classes of split-step quantum walks ( http://arxiv.org/abs/2104.13529v1 )

This study investigates the unitary equivalence of split-step quantum walks (SSQW). We consider a new class of quantum walks which includes all SSQWs. We show the explicit form of quantum walks in this class, and clarify their unitary equivalence classes. Unitary equivalence classes of Suzuki's SSQW are also given.
Translation date: 2023-04-02 04:43:47, publication date: 2021-04-28
# Bandwidth control of the biphoton wavefunction exploiting spatio-temporal correlations

Bandwidth control of the biphoton wavefunction exploiting spatio-temporal correlations ( http://arxiv.org/abs/2104.13750v1 )

In this work, we study the spatio-temporal correlations of photons produced by spontaneous parametric down-conversion. In particular, we study how the waists of the detection and pump beams affect the spectral bandwidth of the photons. Our results indicate that this parameter is strongly affected by the spatial properties of the detection beam, and much less by those of the pump beam. This allows for a simple experimental implementation to control the bandwidth of the biphoton spectra, which only entails modifying the optical configuration used to collect the photons. Moreover, we have performed Hong-Ou-Mandel interferometry measurements that also provide the phase of the biphoton wavefunction, and thereby its temporal shape. We explain all these results with a toy model derived under certain approximations, which accurately recovers most of the interesting experimental details.
Translation date: 2023-04-02 04:40:00, publication date: 2021-04-28
# Nonclassical Attack on a Quantum Key Distribution System

Nonclassical Attack on a Quantum Key Distribution System ( http://arxiv.org/abs/2104.13720v1 )

The article focuses on an attack on a quantum key distribution system and proposes a countermeasure. Notably, this is not a classic attack on a quantum protocol: we describe an attack on the calibration process. The results show that quantum key distribution systems have vulnerabilities not only in their protocols but also in other vital system components. The described type of attack does not affect the cryptographic strength of the received keys and does not point to a vulnerability of the quantum key distribution protocol itself. We also propose a method for developing autocompensating optical communication systems that protects synchronization from unauthorized access. The proposed method uses sync pulses attenuated to the photon level when detecting the time interval that contains the signal. The paper presents the results of experimental studies that show the discrepancies between the theoretical and real parameters of the system. The obtained data allow the length of the quantum channel to be calculated with high accuracy.
Translation date: 2023-04-02 04:39:22, publication date: 2021-04-28
# Highly efficient charging and discharging of three-level quantum batteries through shortcuts to adiabaticity

Highly efficient charging and discharging of three-level quantum batteries through shortcuts to adiabaticity ( http://arxiv.org/abs/2104.13668v1 )

Quantum batteries are energy storage devices that satisfy quantum mechanical principles. Improving the battery's performance, such as its stored energy and power, is a crucial issue for quantum batteries. Here, we investigate the charging and discharging dynamics of a three-level counterdiabatic stimulated Raman adiabatic passage quantum battery via shortcuts to adiabaticity, which can compensate for undesired transitions and realize a fast adiabatic evolution by applying an additional control field to an initial Hamiltonian. The scheme can significantly speed up the charging and discharging processes of a three-level quantum battery and obtain more stored energy and higher power than the original stimulated Raman adiabatic passage. We explore the effect of both the amplitude and the delay time of the driving fields on the performance of the quantum battery. Possible experimental implementations in superconducting circuits and nitrogen-vacancy centers are also discussed.
Translation date: 2023-04-02 04:39:09, publication date: 2021-04-28
# Exploiting Degeneracy in Belief Propagation Decoding of Quantum Codes

Exploiting Degeneracy in Belief Propagation Decoding of Quantum Codes ( http://arxiv.org/abs/2104.13659v1 )

Quantum information needs to be protected by quantum error-correcting codes due to imperfect quantum devices and operations. One would like to have an efficient and high-performance decoding procedure for quantum codes. A potential candidate is Pearl's belief propagation (BP), but its performance suffers from the many short cycles inherent in quantum codes, especially *highly degenerate* codes (that is, codes with many low-weight stabilizers). A general impression exists that BP cannot work for topological codes, such as the surface and toric codes. In this paper, we propose a decoding algorithm for quantum codes based on quaternary BP but with additional memory effects (called MBP). This MBP is like a recursive neural network with inhibition between neurons (edges with negative weights) during recursion, which enhances the network's perception capability. Moreover, MBP exploits the degeneracy of quantum codes so that it has a better chance to find the most probable error or its degenerate errors. The decoding performance is significantly improved over the conventional BP for various quantum codes, including quantum bicycle codes, hypergraph-product codes, and surface (or toric) codes. For MBP on the surface and toric codes over depolarizing errors, we observe thresholds of 14.5% to 16% and 14.5% to 17.5%, respectively.
Translation date: 2023-04-02 04:38:53, publication date: 2021-04-28
# Magnetic Proximity Effect in an Antiferromagnetic Insulator/Topological Insulator Heterostructure with Sharp Interface

Magnetic Proximity Effect in an Antiferromagnetic Insulator/Topological Insulator Heterostructure with Sharp Interface ( http://arxiv.org/abs/2104.13655v1 )

We report an experimental study of the electron transport properties of MnSe/(Bi,Sb)$_2$Te$_3$ heterostructures, in which MnSe is an antiferromagnetic insulator and (Bi,Sb)$_2$Te$_3$ is a three-dimensional topological insulator (TI). A strong magnetic proximity effect is manifested in the measurements of the Hall effect and longitudinal resistances. Our analysis shows that the gate voltage can substantially modify the anomalous Hall conductance, which exceeds $0.1\,e^2/h$ at a temperature of 1.6 K and a magnetic field of 5 T, even though only the top TI surface is in proximity to MnSe. This work suggests that heterostructures based on antiferromagnetic insulators provide a promising platform for investigating a wide range of topological spintronic phenomena.
Translation date: 2023-04-02 04:38:27, publication date: 2021-04-28
# Localization of eigenvalues for non-self-adjoint Dirac and Klein-Gordon operators

Localization of eigenvalues for non-self-adjoint Dirac and Klein-Gordon operators ( http://arxiv.org/abs/2104.13647v1 )

This note aims to give prominence to some new results on the absence and localization of eigenvalues for the Dirac and Klein-Gordon operators, starting from known resolvent estimates already established in the literature combined with the renowned Birman-Schwinger principle.
Translation date: 2023-04-02 04:38:10, publication date: 2021-04-28
# Missing understanding of the phase factor between valence-electron and hole operators

Missing understanding of the phase factor between valence-electron and hole operators ( http://arxiv.org/abs/2104.13644v1 )

This paper provides the long-missing foundation to connect semiconductor and atomic notations and to support results incorrectly obtained by proceeding as if semiconductor electrons possessed an orbital angular momentum. We show here that the phase factor between the valence-electron destruction operator and the hole creation operator is the same as the one between particle and antiparticle in relativistic quantum mechanics, namely $\hat{a}_{m}=(-1)^{j-m} \hat{b}^\dagger_{-m}$, provided that $m=j,j-1,\dots,-j$ labels the degenerate states of the $(2j+1)$-fold electron level at hand. This result is remarkable because $(i)$ the hole is definitely not a naive antiparticle, due to the remaining valence electrons; and $(ii)$ unlike atomic electrons in a central potential, semiconductor electrons in a periodic crystal have neither orbital angular momentum $\textbf{L}=\textbf{r}\wedge\textbf{p}$ nor angular momentum $\textbf{J}=\textbf{L}+\textbf{S}$. Consequently, $(j,m)$ for semiconductor electrons are merely convenient notations to label the states of a degenerate level. To illustrate the physical implications, we discuss the interband couplings between photons and semiconductors, in terms of valence electrons and of holes: the phase factor is crucial to establish that bright excitons are in a spin-singlet state.
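
As a worked instance of the relation $\hat{a}_{m}=(-1)^{j-m}\hat{b}^\dagger_{-m}$, take $j=3/2$. This is a fourfold degenerate level, chosen here purely as an illustration. Evaluating $(-1)^{j-m}$ for each allowed $m$ gives:

$$
\begin{aligned}
\hat a_{3/2}  &= (-1)^{0}\,\hat b^\dagger_{-3/2} = +\hat b^\dagger_{-3/2},\\
\hat a_{1/2}  &= (-1)^{1}\,\hat b^\dagger_{-1/2} = -\hat b^\dagger_{-1/2},\\
\hat a_{-1/2} &= (-1)^{2}\,\hat b^\dagger_{1/2} = +\hat b^\dagger_{1/2},\\
\hat a_{-3/2} &= (-1)^{3}\,\hat b^\dagger_{3/2} = -\hat b^\dagger_{3/2}.
\end{aligned}
$$

The sign flips each time $m$ decreases by one, starting from $+1$ at $m=j$.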
Translation date: 2023-04-02 04:38:02, publication date: 2021-04-28
# Medical device regulation efforts for mHealth apps: an experience report of Corona Check and Corona Health

Medical device regulation efforts for mHealth apps -- An experience report of Corona Check and Corona Health ( http://arxiv.org/abs/2104.13635v1 )

Within the healthcare environment, mobile health (mHealth) applications (apps) are becoming increasingly important. The number of new mHealth apps has risen steadily in recent years. In particular, the Covid-19 pandemic has led to an enormous number of app releases. Notably, in most countries, mHealth applications already have to comply with several regulatory requirements in order to be declared a 'medical app'. However, the latest applicable medical device regulation (MDR) does not address the requirements for mHealth applications in detail. When developing a medical app, it is essential that all contributors in an interdisciplinary team, especially the software engineers, are aware of the specific regulatory requirements beforehand. The development process, however, should not be stalled for too long by the integration of the MDR. Therefore, a development framework that includes these aspects is required to enable a smooth development process. The paper at hand introduces the creation of such a framework on the basis of the Corona Health and Corona Check apps. The relevant regulatory guidelines are listed and summarized into guidance for medical app development. In particular, the important stages and the challenges that emerged during the entire development process are highlighted.
Translation date: 2023-04-02 04:37:28, publication date: 2021-04-28
# Observation of Anomalous Moiré Patterns

Observation of Anomalous Moiré Patterns ( http://arxiv.org/abs/2104.13625v1 )

Moiré patterns are omnipresent. They are important for any overlapping periodic phenomenon, from vibrational and electromagnetic, to condensed matter. Here we show, both theoretically and via experimental simulations by ultracold atoms, that for one-dimensional finite-size periodic systems, moiré patterns give rise to anomalous features in both classical and quantum systems. In contrast to the standard moiré phenomenon, in which the pattern periodicity is a result of a beat-note between its constituents, we demonstrate moiré patterns formed from constituents with the same periodicity. Surprisingly, we observe, in addition, rigidity and singularities. We furthermore uncover universal properties in the frequency domain, which might serve as a novel probe of emitters. These one-dimensional effects could be relevant to a wide range of periodic phenomena.
Translation date: 2023-04-02 04:37:15, publication date: 2021-04-28
# Interpolating between positive and completely positive maps: a new hierarchy of entangled states

Interpolating between positive and completely positive maps: a new hierarchy of entangled states ( http://arxiv.org/abs/2104.13829v1 )

A new class of positive maps is introduced. It interpolates between positive and completely positive maps. It is shown that this class gives rise to a new characterization of entangled states. Additionally, it provides a refinement of the well-known classes of entangled states characterized in terms of the Schmidt number. The analysis is illustrated with examples of qubit maps.
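
Positive maps detect entanglement because a map that is positive but not completely positive can send an entangled state to an operator with a negative eigenvalue. The textbook example is the transpose. The NumPy sketch below applies the transpose to one half of a Bell state. This is the standard partial-transpose test, shown for orientation only. It is not the new interpolating class of maps introduced in the paper.

```python
import numpy as np


def partial_transpose(rho: np.ndarray, dA: int, dB: int) -> np.ndarray:
    """Apply the transpose map to subsystem B of a bipartite density matrix."""
    r = rho.reshape(dA, dB, dA, dB)  # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)  # swap b <-> b'


# Bell state |Phi+> = (|00> + |11>) / sqrt(2)
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(phi, phi.conj())
eigenvalues = np.linalg.eigvalsh(partial_transpose(rho, 2, 2))
print(np.round(eigenvalues, 6))  # smallest eigenvalue is -0.5: the state is entangled
```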
Translation date: 2023-04-02 04:29:58, publication date: 2021-04-28
# Pulse shaping for on-demand emission of single Raman photons from a quantum-dot biexciton

Pulse shaping for on-demand emission of single Raman photons from a quantum-dot biexciton ( http://arxiv.org/abs/2104.13781v1 )

Semiconductor quantum dots embedded in optical cavities are promising on-demand sources of single photons. Here, we theoretically study single-photon emission from an optically driven two-photon Raman transition between the biexciton and the ground state of a quantum dot. The advantage of this process is that it allows all-optical control of the properties of the emitted single photon with a laser pulse. However, in the presence of other decay channels and excitation-induced quantum interference, on-demand emission of the single Raman photon is generally difficult to achieve. Here we show that laser pulses with non-trivial shapes can be used to maintain excitation conditions under which the on-demand regime is reached with increasing pulse intensity. To provide a realistic picture of the achievable system performance, we include phonon-mediated processes in the theoretical calculations. We find that, although it is based on a higher-order emission process, on-demand Raman photon emission is indeed achievable for realistic system parameters with suitably tailored laser pulses, while preserving both high photon purity and indistinguishability.
Translation date: 2023-04-02 04:29:07, publication date: 2021-04-28
# Simulation of multiple-quantum NMR dynamics of spin dimer on quantum computer

Simulation of multiple-quantum NMR dynamics of spin dimer on quantum computer ( http://arxiv.org/abs/2104.13777v1 )

The dynamics of spin dimers in a multiple-quantum NMR experiment are studied on the 5-qubit superconducting quantum processor of IBM Quantum Experience for both the pure ground state and the thermodynamic equilibrium (mixed) initial state. The work can be considered a first step towards applying quantum computers to problems of magnetic resonance. This article is dedicated to Prof. Klaus Möbius and Prof. Kev Salikhov on the occasion of their 85th birthdays.
Translation date: 2023-04-02 04:28:50, publication date: 2021-04-28
# Universal witnesses of vanishing energy gap

Universal witnesses of vanishing energy gap ( http://arxiv.org/abs/2104.13884v1 )

The energy gap, the difference between the energy of the ground state of a given Hamiltonian and the energy of its first excited state, is a parameter of critical importance in the analysis of phase transitions and adiabatic quantum computation. We present a concrete technique to determine an upper bound on the energy gap of a Hamiltonian $H_0$ based on properties of the set of expectation values of $H_0$ and an additional auxiliary Hamiltonian $V$. This formalism can be applied to obtain an effective criterion of gaplessness, which we illustrate with a concrete example of the XY model, a physical system with vanishing energy gap.
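
For intuition about the XY example, the gap of small chains can be checked by brute force. The sketch below assumes the isotropic XY (XX) chain $H=\sum_i (X_iX_{i+1}+Y_iY_{i+1})$ with open boundaries. It diagonalizes that Hamiltonian exactly for a few chain lengths. This is not the witness construction of the paper, only a finite-size check that the gap shrinks with system size. For even $n$, free-fermion theory gives this gap as $4\sin(\pi/(2(n+1)))$.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[0.0, -1.0j], [1.0j, 0.0]])
I2 = np.eye(2)


def site_op(op: np.ndarray, i: int, n: int) -> np.ndarray:
    """Embed a single-site operator on site i of an n-site chain."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, op if k == i else I2)
    return out


def xx_chain(n: int) -> np.ndarray:
    """Open isotropic XY (XX) chain: H = sum_i (X_i X_{i+1} + Y_i Y_{i+1})."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n - 1):
        H += site_op(X, i, n) @ site_op(X, i + 1, n)
        H += site_op(Y, i, n) @ site_op(Y, i + 1, n)
    return H


def energy_gap(H: np.ndarray) -> float:
    """E_1 - E_0 from the two lowest eigenvalues (a degenerate ground state gives 0)."""
    evals = np.linalg.eigvalsh(H)
    return float(evals[1] - evals[0])


for n in (4, 6, 8):
    print(n, round(energy_gap(xx_chain(n)), 4))
```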
Translation date: 2023-04-02 04:20:52, publication date: 2021-04-28
# Schwinger's picture of quantum mechanics: 2-groupoids and symmetries

Schwinger's picture of quantum mechanics: 2-groupoids and symmetries ( http://arxiv.org/abs/2104.13880v1 )

Starting from the groupoid approach to Schwinger's picture of Quantum Mechanics, a proposal for the description of symmetries in this framework is advanced. It is shown that, given a groupoid $G\rightrightarrows \Omega$ associated with a (quantum) system, there are two possible descriptions of its symmetries, one "microscopic", the other one "global". The microscopic point of view leads to the introduction of an additional layer over the groupoid $G$, giving rise to a suitable algebraic structure of 2-groupoid. On the other hand, taking advantage of the notion of the group of bisections of a given groupoid, the global perspective allows one to construct a group of symmetries out of a 2-groupoid. The latter notion allows one to introduce an analog of Wigner's theorem for quantum symmetries in the groupoid approach to Quantum Mechanics.
Translation date: 2023-04-02 04:20:40, publication date: 2021-04-28
# Two-qubit gates in a trapped-ion quantum computer by engineering motional modes

Two-qubit gates in a trapped-ion quantum computer by engineering motional modes ( http://arxiv.org/abs/2104.13870v1 )

A global race has begun towards developing a gate-based, universal quantum computer that promises one day to unlock unprecedented computational power, and the biggest challenge in achieving this goal is arguably the high-quality implementation of a two-qubit gate. In a trapped-ion quantum computer, one of the leading quantum computational platforms, a two-qubit gate is typically implemented by modulating the individual addressing beams that illuminate the two target ions, which, together with others, form a linear chain. As expected, the required modulation becomes increasingly complex, especially as the quantum computer becomes larger and runs faster, complicating the control hardware design. Here, we develop a simple method to essentially remove the pulse-modulation complexity at the cost of engineering the normal modes of the ion chain. We demonstrate that the required mode engineering is possible for a three-ion chain, even with a trapped-ion quantum computational system built and optimized for a completely different mode of operation. This indicates that a system manufactured specifically for mode-engineering-based two-qubit gates would readily be able to implement the gates without significant additional effort.
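
Mode engineering starts from the normal modes of the ion chain. As a reference point, the NumPy sketch below computes the axial normal modes of a three-ion chain in a harmonic trap, using the standard dimensionless treatment. It covers axial modes only, which is an assumption for illustration: the paper may engineer transverse modes. The squared mode frequencies, in units of $\omega_z^2$, come out as $1$, $3$ and $29/5$.

```python
import numpy as np


def axial_hessian(u: np.ndarray) -> np.ndarray:
    """Dimensionless Hessian of the axial potential for ions in a harmonic trap.

    Positions u are in units of l = (e^2 / (4 pi eps0 m wz^2))**(1/3); the
    eigenvalues of the returned matrix are (mode frequency / wz)**2.
    """
    n = len(u)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i, j] = -2.0 / abs(u[i] - u[j]) ** 3
        A[i, i] = 1.0 - A[i].sum()  # = 1 + 2 * sum_{k != i} 1/|u_i - u_k|^3
    return A


# Three ions: force balance gives equilibrium positions 0 and +/-(5/4)**(1/3).
d = (5 / 4) ** (1 / 3)
u = np.array([-d, 0.0, d])
eigenvalues, modes = np.linalg.eigh(axial_hessian(u))
print(np.sqrt(eigenvalues))  # mode frequencies in units of wz
print(np.round(modes, 3))    # columns: participation of each ion in each mode
```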
Translation date: 2023-04-02 04:20:21, publication date: 2021-04-28
# Beables, Primitive Ontology and Beyond: How Theories Meet the World

Beables, Primitive Ontology and Beyond: How Theories Meet the World ( http://arxiv.org/abs/2104.13859v1 )

Bohm and Bell's approaches to the foundations of quantum mechanics share notable features with the contemporary Primitive Ontology (PO) perspective and Esfeld and Deckert's minimalist ontology. For instance, all these programs consider ontological clarity a necessary condition to be met by every theoretical framework, promote scientific realism also in the quantum domain and strengthen the explanatory power of quantum theory. However, these approaches diverge remarkably from one another, since they employ different metaphysical principles leading to conflicting Weltanschauungen. The principal aim of this essay is to spell out the relations as well as the main differences among these programs, which unfortunately often remain unnoticed in the literature. Indeed, it is not uncommon to see Bell's views conflated with the PO programme, and the latter with Esfeld and Deckert's proposal. It will be our task to clear up this confusion.
Translation date: 2023-04-02 04:19:42, publication date: 2021-04-28
# Degeneracy and coherent states of the two-dimensional Morse potential

Degeneracy and coherent states of the two-dimensional Morse potential ( http://arxiv.org/abs/2104.13837v1 )

In this paper we construct coherent states for the two-dimensional Morse potential. We find the dependence of the spectrum on the physical parameters and use it to understand the emergence of accidental degeneracies. We observe that, under certain conditions pertaining to the irrationality of the parameters, accidental degeneracies do not appear, so energy levels are at most two-fold degenerate. After defining a non-degenerate spectrum and set of states for the 2D Morse potential, we construct generalised coherent states and discuss the spatial distribution of their probability densities and their uncertainty relations.
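The link between irrational parameters and the absence of accidental degeneracies can be illustrated with a toy model. The sketch below *assumes* a separable 2D potential V(x) + V(y) built from two identical 1D Morse wells. Each well's bound levels are written in dimensionless form as E_n = -(a - n)^2 for 0 ≤ n < a, where `a` is a hypothetical parameter set by the well depth and width. This is not necessarily the exact 2D Morse model studied in the paper. In this model, E_n + E_m = E_k + E_l with {n, m} ≠ {k, l} forces a = (n² + m² - k² - l²) / (2(n + m - k - l)). Degeneracies beyond the n ↔ m exchange are therefore possible only for rational `a`.

```python
import math
from collections import Counter
from fractions import Fraction


def level_degeneracies(a, digits=9):
    """Degeneracy of each level E_n + E_m of a separable 2D model with E_n = -(a - n)**2.

    Assumption (illustrative only): two identical 1D Morse wells, bound states n < a.
    Pass a Fraction for exact arithmetic; floats are compared after rounding to `digits`.
    """
    bound = range(math.ceil(a))  # n = 0, 1, ..., ceil(a) - 1, i.e. all n < a
    counts = Counter()
    for n in bound:
        for m in bound:
            e = -(a - n) ** 2 - (a - m) ** 2
            key = e if isinstance(e, Fraction) else round(e, digits)
            counts[key] += 1
    return counts


rational = level_degeneracies(Fraction(7, 2))
irrational = level_degeneracies(2 + math.sqrt(2))
print(max(rational.values()))    # 3: (0,3), (3,0) and (1,1) all have E = -25/2
print(max(irrational.values()))  # 2: only the exchange (n,m) <-> (m,n) survives
```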
Translated: 2023-04-02 04:19:24 Published: 2021-04-28
# Quantifying quantum coherence of optical cat states

Quantifying quantum coherence of optical cat states ( http://arxiv.org/abs/2104.13833v1 )

Optical cat states play an essential role in quantum computation and quantum metrology. Here, we experimentally quantify the quantum coherence of an optical cat state, prepared at the rubidium D1 line, by means of the relative entropy and the $l_1$ norm of coherence in the Fock basis. By transmitting the optical cat state through a lossy channel, we also demonstrate that its quantum coherence is robust in the presence of loss, which differs from the decoherence properties of the fidelity and the Wigner-function negativity of the optical cat state. Our results confirm that the quantum coherence of optical cat states is robust against loss and pave the way for applications of optical cat states.
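For a pure state, both coherence measures named above have simple closed forms in the Fock basis. The $l_1$ norm of coherence is $(\sum_n |c_n|)^2 - 1$, and the relative entropy of coherence equals the Shannon entropy of the photon-number distribution. The sketch below evaluates both for an ideal even cat state $\propto |\alpha\rangle + |-\alpha\rangle$ with real $\alpha$, truncated at a hypothetical cutoff `n_max`. It ignores the lossy channel and the experimental state reconstruction described in the abstract.

```python
import math


def even_cat_amplitudes(alpha, n_max=60):
    """Fock amplitudes c_n of the normalized even cat state (|alpha> + |-alpha>), real alpha."""
    raw = []
    for n in range(n_max + 1):
        coherent = math.exp(-alpha ** 2 / 2) * alpha ** n / math.sqrt(math.factorial(n))
        raw.append(2 * coherent if n % 2 == 0 else 0.0)  # odd photon numbers cancel
    norm = math.sqrt(sum(c * c for c in raw))
    return [c / norm for c in raw]


def l1_coherence(amps):
    """l1 norm of coherence of a pure state: sum over i != j of |c_i c_j|."""
    s = sum(abs(c) for c in amps)
    return s * s - sum(c * c for c in amps)


def relative_entropy_coherence(amps):
    """Relative entropy of coherence of a pure state = Shannon entropy (bits) of |c_n|^2."""
    return -sum(p * math.log2(p) for p in (c * c for c in amps) if p > 0)


amps = even_cat_amplitudes(alpha=1.5)
print(l1_coherence(amps), relative_entropy_coherence(amps))
```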
Translated: 2023-04-02 04:19:13 Published: 2021-04-28
# AI Enabled Data Quality Monitoring with Hydra

AI Enabled Data Quality Monitoring with Hydra ( http://arxiv.org/abs/2105.07948v1 )

Data quality monitoring is critical to all experiments, since it affects the quality of any physics results. Traditionally, this is done through an alarm system that detects low-level faults, leaving higher-level monitoring to human crews. Artificial intelligence is beginning to find its way into scientific applications, but it comes with difficulties, relying on the acquisition of new skill sets in data science, either through education or through recruitment. This paper discusses the development and deployment of the Hydra monitoring system in production at GlueX. It shows how "off-the-shelf" technologies enable rapid development, and discusses what sociological hurdles must be overcome to successfully deploy such a system. Early results from production running of Hydra are also shared, as well as a future outlook for the development of Hydra.
Translated: 2023-04-02 04:11:58 Published: 2021-04-28
# Mundus vult decipi, ergo decipiatur: Visual Communication of Uncertainty in Election Polls

Mundus vult decipi, ergo decipiatur: Visual Communication of Uncertainty in Election Polls ( http://arxiv.org/abs/2105.07811v1 )

Election poll reporting often focuses on mean values and only subordinately discusses the underlying uncertainty. Subsequent interpretations are too often phrased as certain. Moreover, media coverage rarely takes adequate account of the differences between nowcasts and forecasts. These challenges were ubiquitous in the context of the 2016 and 2020 U.S. presidential elections, but are also present in multi-party systems like Germany's. We discuss potential sources of bias in nowcasting and forecasting and review the current standards in the visual presentation of survey-based nowcasts. Concepts are presented to attenuate the issue of falsely perceived accuracy. We discuss multiple visual presentation techniques for central aspects of poll reporting. One key idea is the use of Probabilities of Events instead of party shares. The presented ideas offer modern and improved ways to communicate (changes in) the electoral mood to the general media.
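The following sketch shows what reporting a probability of an event looks like, as opposed to a point estimate of party shares. It draws from a Dirichlet posterior over vote shares and estimates event probabilities such as a small party clearing Germany's 5% threshold. The posterior uses a uniform prior plus poll counts, a simple model assumed here that is not necessarily the authors'. The poll counts and party labels are hypothetical.

```python
import random


def dirichlet_sample(alphas, rng):
    """One draw from Dirichlet(alphas) via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def event_probability(counts, event, n_sims=20000, seed=0):
    """Monte Carlo probability of event(shares) under a Dirichlet(1 + counts) posterior."""
    rng = random.Random(seed)
    alphas = [1 + c for c in counts]
    hits = sum(event(dirichlet_sample(alphas, rng)) for _ in range(n_sims))
    return hits / n_sims


# Hypothetical poll of 1,000 respondents for parties A, B, C, D
poll = [310, 280, 52, 358]
p_c_enters = event_probability(poll, lambda s: s[2] > 0.05)   # C clears the 5% threshold
p_a_leads = event_probability(poll, lambda s: s[0] > s[1])    # A ahead of B
print(f"P(C above 5%) = {p_c_enters:.2f}, P(A ahead of B) = {p_a_leads:.2f}")
```

Party C polls at 5.2%, yet the threshold event is far from certain. A bar chart of mean shares hides exactly this kind of uncertainty.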
Translated: 2023-04-02 04:11:33 Published: 2021-04-28
# Quantum contextuality in the Mermin-Peres square: A hidden variable perspective

Quantum contextuality in the Mermin-Peres square: A hidden variable perspective ( http://arxiv.org/abs/2105.00940v1 )

The question of a hidden variable interpretation of quantum contextuality in the Mermin-Peres square is considered. The Kochen-Specker theorem implies that quantum mechanics may be interpreted as a contextual hidden variable theory. It is shown that such a hidden variable description can be viewed as either contextual in the random variables mapping hidden states to observable outcomes or in the probability measure on the hidden state space. The latter view suggests that this apparent contextuality may be interpreted as a simple consequence of measurement disturbance, wherein the initial hidden state is altered through interaction with the measuring device, thereby giving rise to a possibly different final hidden variable state from which the measurement outcome is obtained. In light of this observation, a less restrictive and, arguably, more reasonable definition of noncontextuality is suggested. To prove that such a description is possible, an explicit and, in this sense, noncontextual hidden variable model is constructed which reproduces all quantum theoretic predictions for the Mermin-Peres square. A critical analysis of some recent and proposed experimental tests of contextuality is also provided. Although the discussion is restricted to a four-dimensional Hilbert space, the approach and conclusions are expected to generalize to any Hilbert space.
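For context, the standard Kochen-Specker argument for the Mermin-Peres square can be checked mechanically. The nine two-qubit observables in each row and column commute. Their products are +I for every row and for the first two columns, but -I for the third column. A noncontextual assignment of fixed ±1 values would have to reproduce all six products, and a brute-force search shows that none exists. This sketch illustrates only that textbook contradiction under the usual notion of noncontextuality. It is not the author's hidden variable model, which relaxes that notion.

```python
import itertools

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices (4x4 result)."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def product_sign(ops):
    """Multiply commuting observables and return s, where the product equals s * identity."""
    prod = ops[0]
    for op in ops[1:]:
        prod = matmul(prod, op)
    s = prod[0][0]
    assert all(abs(prod[i][j] - (s if i == j else 0)) < 1e-12 for i in range(4) for j in range(4))
    return int(round(complex(s).real))


square = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Z), kron(Z, I2), kron(Z, Z)],
    [kron(X, Z), kron(Z, X), kron(Y, Y)],
]
row_signs = [product_sign(square[r]) for r in range(3)]
col_signs = [product_sign([square[r][c] for r in range(3)]) for c in range(3)]


def count_noncontextual_assignments():
    """Count fixed +-1 value assignments (one per observable) obeying all six product constraints."""
    count = 0
    for vals in itertools.product((1, -1), repeat=9):
        v = [vals[3 * r:3 * r + 3] for r in range(3)]
        rows_ok = all(v[r][0] * v[r][1] * v[r][2] == row_signs[r] for r in range(3))
        cols_ok = all(v[0][c] * v[1][c] * v[2][c] == col_signs[c] for c in range(3))
        if rows_ok and cols_ok:
            count += 1
    return count


print(row_signs, col_signs)               # [1, 1, 1] [1, 1, -1]
print(count_noncontextual_assignments())  # 0
```

The count is zero for a simple parity reason. Multiplying all row constraints gives the product of the nine values as +1, while multiplying all column constraints gives -1.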
Translated: 2023-04-02 04:11:20 Published: 2021-04-28
# Universal notion of classicality based on ontological framework

Universal notion of classicality based on ontological framework ( http://arxiv.org/abs/2104.14355v1 )

The existence of physical reality in the classical world is a well-established fact from day-to-day observations. However, within quantum theory, it is not straightforward to reach such a conclusion. A framework known as the ontological framework was recently developed to analyse, in a theory-independent way, how observations can be described using physical states of reality. Different principles, when imposed at the ontological level, give rise to different observations in physical experiments. Using the ontological framework, we formulate a novel notion of classicality termed "universal classicality", based on the physical principles that in classical theories pure states are physical states of reality and every projective measurement just observes the state of the system. We construct a communication task in which the success probability is bounded from above for ontological models satisfying universal classicality. Unlike previous notions of classicality, which required either systems of dimension strictly greater than two or at least three preparations, a violation of universal classicality can be observed using just a pair of qubits and a pair of incompatible measurements. We further show that violations of previously known notions of classicality, such as preparation non-contextuality and Bell's local causality, are violations of universal classicality.
Translated: 2023-04-02 04:10:59 Published: 2021-04-28
# 'Complementarity' in paraxial and non-paraxial optical beams

'Complementarity' in paraxial and non-paraxial optical beams ( http://arxiv.org/abs/2104.14338v1 )

Using the correspondence of two-dimensional paraxial and three-dimensional non-paraxial optical beams with qubit and qutrit systems, respectively, we derive a complementarity relation between Hilbert-Schmidt coherence, generalized predictability and linear entropy. The linear entropy, a measure of mixedness, is shown to saturate the complementarity relation for mixed bipartite states. For pure bipartite qubit and qutrit systems, it quantifies the global entanglement and reduces the complementarity relation to a triality relation between coherence, predictability and entanglement.
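The complementarity relation can be checked numerically under one common set of normalizations. These definitions are assumptions here and may differ from the paper's:

- Hilbert-Schmidt coherence: $C=\sum_{i\neq j}|\rho_{ij}|^2$
- predictability: $P=\sum_i \rho_{ii}^2 - 1/d$
- linear entropy: $S_L = 1-\mathrm{Tr}\,\rho^2$

Because $\mathrm{Tr}\,\rho^2 = \sum_i\rho_{ii}^2 + \sum_{i\ne j}|\rho_{ij}|^2$, these satisfy $C + P + S_L = 1 - 1/d$ for any qubit ($d=2$) or qutrit ($d=3$) density matrix. The sketch draws random mixed states as $\rho = AA^\dagger/\mathrm{Tr}(AA^\dagger)$.

```python
import random


def random_density_matrix(d, rng):
    """rho = A A^dagger / Tr(A A^dagger) for a random complex Gaussian matrix A (mixed in general)."""
    a = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)] for _ in range(d)]
    rho = [[sum(a[i][k] * a[j][k].conjugate() for k in range(d)) for j in range(d)] for i in range(d)]
    trace = sum(rho[i][i].real for i in range(d))
    return [[x / trace for x in row] for row in rho]


def coherence_predictability_entropy(rho):
    d = len(rho)
    coherence = sum(abs(rho[i][j]) ** 2 for i in range(d) for j in range(d) if i != j)
    predictability = sum(rho[i][i].real ** 2 for i in range(d)) - 1.0 / d
    purity = sum(abs(rho[i][j]) ** 2 for i in range(d) for j in range(d))  # Tr(rho^2), rho Hermitian
    return coherence, predictability, 1.0 - purity


rng = random.Random(1)
for d in (2, 3):  # qubit (paraxial) and qutrit (non-paraxial)
    c, p, s = coherence_predictability_entropy(random_density_matrix(d, rng))
    print(d, round(c + p + s, 12), 1 - 1 / d)
```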
Translated: 2023-04-02 04:10:39 Published: 2021-04-28
# The polarizability of a confined atomic system: An application of Dalgarno-Lewis method

The polarizability of a confined atomic system: An application of Dalgarno-Lewis method ( http://arxiv.org/abs/2104.13973v1 )

In this paper we give an application of the Dalgarno-Lewis method, which is not usually taught in quantum mechanics courses. This is unfortunate, since the method allows one to bypass the sum over states that appears in the usual perturbation theory. In this context, and as an example, we study the effect of an external field, both static and frequency dependent, on a model atom at a fixed distance from a substrate. This can happen, for instance, when an organic molecule binds on one side to the substrate and on the other side to an atom or any other polarizable system. We model the polarizable atom by a short-range Dirac $\delta$ potential and find that the existence of a bound state depends on the ratio of the effective "nuclear charge" to the distance of the atom from the substrate. Using an asymptotic analysis previously developed for a single $\delta$-function potential in an infinite medium, we determine the ionization rate and the Stark shift of our system. Using Dalgarno-Lewis theory, we find an exact expression for the static and dynamic polarizabilities of our system, valid at all distances. We show that the polarizability is extremely sensitive to the distance to the substrate, which creates the possibility of using this quantity as a nanometric ruler. Furthermore, the line shape of the dynamic polarizability is also extremely sensitive to the distance to the substrate, thus providing another route to measuring nanometric distances. The didactic value of the $\delta$-function potential is well accepted in teaching because of its simplicity, while it keeps the essential ingredients of a given problem.
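To show how the Dalgarno-Lewis method sidesteps the sum over states, here is the calculation for the simpler limiting case mentioned above: a single attractive $\delta$ potential in an infinite medium, with no substrate. It uses units $\hbar = m = e = 1$ and the uniform-field perturbation $-Fx$. This is a worked sketch of that textbook limit, not the paper's distance-dependent result.

$$
H_0 = -\tfrac12\frac{d^2}{dx^2} - g\,\delta(x),\qquad
\psi_0(x) = \sqrt{\kappa}\,e^{-\kappa|x|},\qquad \kappa = g,\qquad E_0 = -\tfrac{\kappa^2}{2}.
$$

Instead of evaluating $\alpha = 2\sum_{n\neq0}|\langle n|x|0\rangle|^2/(E_n-E_0)$, look for a function $f(x)$ with $(H_0 - E_0)\,f\psi_0 = x\,\psi_0$. Since $(H_0-E_0)\psi_0 = 0$ and $f(0) = 0$ (so the $\delta$ term drops out), this reduces to $-\tfrac12 f''\psi_0 - f'\psi_0' = x\psi_0$. For $x>0$, where $\psi_0' = -\kappa\psi_0$, this gives

$$
-\tfrac12 f'' + \kappa f' = x
\quad\Longrightarrow\quad
f(x) = \frac{x\,|x|}{2\kappa} + \frac{x}{2\kappa^2},
$$

and the same odd $f$ solves $-\tfrac12 f'' - \kappa f' = x$ for $x<0$. Because $f\psi_0$ is odd, it is orthogonal to $\psi_0$, so no projection is needed. The polarizability then follows from a single integral:

$$
\alpha = 2\langle\psi_0|\,x f\,|\psi_0\rangle
= 4\kappa\int_0^\infty\!\Big(\frac{x^3}{2\kappa} + \frac{x^2}{2\kappa^2}\Big)e^{-2\kappa x}\,dx
= 4\kappa\Big(\frac{3}{16\kappa^5} + \frac{1}{8\kappa^5}\Big)
= \frac{5}{4\kappa^4}.
$$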
Translated: 2023-04-02 04:10:13 Published: 2021-04-28
# Intelligent Roundabout Insertion using Deep Reinforcement Learning

Intelligent Roundabout Insertion using Deep Reinforcement Learning ( http://arxiv.org/abs/2001.00786v3 )

An important topic in autonomous driving research is the development of maneuver planning systems. Vehicles have to interact and negotiate with each other so that optimal choices, in terms of time and safety, are made. For this purpose, we present a maneuver planning module able to negotiate entry into busy roundabouts. The proposed module is based on a neural network trained to predict when and how to enter the roundabout throughout the whole duration of the maneuver. Our model is trained with a novel implementation of A3C, which we call Delayed A3C (D-A3C), in a synthetic environment where vehicles move in a realistic manner with interaction capabilities. In addition, the system is trained such that agents feature a unique tunable behavior, emulating real-world scenarios where drivers have their own driving styles. Similarly, the maneuver can be performed using different aggressiveness levels, which is particularly useful to manage busy scenarios where conservative rule-based policies would result in indefinite waits.
Translated: 2023-01-14 17:37:10 Published: 2021-04-28
# Linearly Constrained Neural Networks

Linearly Constrained Neural Networks ( http://arxiv.org/abs/2002.01600v4 )

We present a novel approach to modelling and learning vector fields from physical systems using neural networks that explicitly satisfy known linear operator constraints. To achieve this, the target function is modelled as a linear transformation of an underlying potential field, which is in turn modelled by a neural network. This transformation is chosen such that any prediction of the target function is guaranteed to satisfy the constraints. The approach is demonstrated on both simulated and real data examples.
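A common instance of this construction is a divergence-free 2D vector field. If a scalar potential $\psi(x, y)$ is modelled by a network and the target is taken to be $f = (\partial_y \psi, -\partial_x \psi)$, then $\nabla\cdot f = \partial_x\partial_y\psi - \partial_y\partial_x\psi = 0$ for any network weights. The sketch below uses a tiny, untrained one-hidden-layer tanh network with random weights as a hypothetical stand-in for a trained model. It checks the constraint with finite differences; `TanhPotential` and the other names are illustrative.

```python
import math
import random


class TanhPotential:
    """psi(x, y) = sum_k w_k * tanh(a_k x + b_k y + c_k): a one-hidden-layer network."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.params = [tuple(rng.gauss(0, 1) for _ in range(4)) for _ in range(hidden)]

    def grad(self, x, y):
        """Analytic gradient (d psi/dx, d psi/dy)."""
        gx = gy = 0.0
        for a, b, c, w in self.params:
            sech2 = 1.0 - math.tanh(a * x + b * y + c) ** 2
            gx += w * sech2 * a
            gy += w * sech2 * b
        return gx, gy


def constrained_field(potential, x, y):
    """f = (d psi/dy, -d psi/dx) is divergence-free by construction."""
    gx, gy = potential.grad(x, y)
    return gy, -gx


def divergence(field, x, y, h=1e-5):
    """Central finite-difference estimate of d f1/dx + d f2/dy."""
    df1_dx = (field(x + h, y)[0] - field(x - h, y)[0]) / (2 * h)
    df2_dy = (field(x, y + h)[1] - field(x, y - h)[1]) / (2 * h)
    return df1_dx + df2_dy


psi = TanhPotential()


def field(x, y):
    return constrained_field(psi, x, y)


grid = [(i / 4, j / 4) for i in range(-8, 9) for j in range(-8, 9)]
print(max(abs(divergence(field, x, y)) for x, y in grid))  # finite-difference noise only
```

By contrast, the raw gradient field $\nabla\psi$ has divergence $\nabla^2\psi$, which is generally nonzero. Here the constraint holds only because of the transformation applied to $\psi$.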
Translated: 2023-01-03 21:09:43 Published: 2021-04-28
# Renofeation: A Simple Transfer Learning Method for Improved Adversarial Robustness

Renofeation: A Simple Transfer Learning Method for Improved Adversarial Robustness ( http://arxiv.org/abs/2002.02998v2 )

Fine-tuning through knowledge transfer from a model pre-trained on a large-scale dataset is a widespread approach to effectively building models on small-scale datasets. In this work, we show that a recent adversarial attack designed for transfer learning via re-training the last linear layer can successfully deceive models trained with transfer learning via end-to-end fine-tuning. This raises security concerns for many industrial applications. In contrast, models trained from random initialization without transfer are much more robust to such attacks, although these models often exhibit much lower accuracy. To this end, we propose noisy feature distillation, a new transfer learning method that trains a network from random initialization while achieving clean-data performance competitive with fine-tuning. Code is available at https://github.com/cmu-enyac/Renofeation.
Translated: 2023-01-03 03:34:07 Published: 2021-04-28
# Spectrum Sensing and Signal Identification with Deep Learning based on Spectral Correlation Function

Spectrum Sensing and Signal Identification with Deep Learning based on Spectral Correlation Function ( http://arxiv.org/abs/2003.08359v4 )

Spectrum sensing is one of the means of using the scarce resource of wireless spectrum efficiently. In this paper, a convolutional neural network (CNN) model that employs the spectral correlation function, an effective characterization of the cyclostationarity property, is proposed for wireless spectrum sensing and signal identification. The proposed method classifies wireless signals without a priori information and is implemented in two different settings, entitled CASE1 and CASE2. In CASE1, signals are jointly sensed and classified. In CASE2, sensing and classification are conducted sequentially. In contrast to classical spectrum sensing techniques, the proposed CNN method does not require a statistical decision process and does not need to know the distinct features of signals beforehand. Applying the method to measured over-the-air real-world signals in cellular bands shows significant performance gains compared with the signal-classifying deep learning networks available in the literature and with classical sensing methods. Even though the implementation herein is over cellular signals, the proposed approach can be extended to the detection and classification of any signal that exhibits cyclostationary features. Finally, the measurement-based dataset used to validate the method is shared for reproduction of the results and for further research and development.
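Cyclostationarity is what makes the spectral correlation function informative. A modulated signal has nonzero cyclic autocorrelation at specific cycle frequencies $\alpha$, while stationary noise does not. One common convention is $R_x^\alpha(\tau) = \frac{1}{N}\sum_n x[n+\tau]\,x^*[n]\,e^{-j2\pi\alpha n}$, and the SCF is the Fourier transform of $R_x^\alpha(\tau)$ over the lag $\tau$. The sketch below estimates $R_x^\alpha(0)$ for a hypothetical real BPSK signal on a carrier $f_0$ in white Gaussian noise. Because $x^2$ contains $\cos^2(2\pi f_0 n) = \tfrac12 + \tfrac12\cos(4\pi f_0 n)$, a feature of magnitude about $1/4$ appears at $\alpha = 2f_0$. This only illustrates the kind of feature the CNN consumes; it is not the paper's pipeline.

```python
import cmath
import math
import random


def bpsk_in_noise(n_samples=4000, f0=0.1, samples_per_symbol=10, noise_std=0.5, seed=0):
    """Real BPSK with rectangular pulses on carrier f0 (cycles/sample) plus white Gaussian noise."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(n_samples // samples_per_symbol + 1)]
    return [
        symbols[n // samples_per_symbol] * math.cos(2 * math.pi * f0 * n) + rng.gauss(0, noise_std)
        for n in range(n_samples)
    ]


def cyclic_autocorrelation_lag0(x, alpha):
    """Estimate R_x^alpha(0) = mean(|x[n]|^2 * exp(-j 2 pi alpha n))."""
    return sum(abs(v) ** 2 * cmath.exp(-2j * math.pi * alpha * n) for n, v in enumerate(x)) / len(x)


x = bpsk_in_noise()
for alpha in (0.2, 0.13):  # 2*f0 = 0.2 is a cycle frequency; 0.13 is not
    print(alpha, abs(cyclic_autocorrelation_lag0(x, alpha)))
```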
Translated: 2022-12-22 21:04:10 Published: 2021-04-28
# On dissipative symplectic integration with applications to gradient-based optimization

On dissipative symplectic integration with applications to gradient-based optimization ( http://arxiv.org/abs/2004.06840v4 )

Recently, continuous-time dynamical systems have proved useful in providing conceptual and quantitative insights into gradient-based optimization, widely used in modern machine learning and statistics. An important question that arises in this line of work is how to discretize the system in such a way that its stability and rates of convergence are preserved. In this paper we propose a geometric framework in which such discretizations can be realized systematically, enabling the derivation of "rate-matching" algorithms without the need for a discrete convergence analysis. More specifically, we show that a generalization of symplectic integrators to nonconservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error. Moreover, such methods preserve a shadow Hamiltonian despite the absence of a conservation law, extending key results of symplectic integrators to nonconservative cases. Our arguments rely on a combination of backward error analysis with fundamental results from symplectic geometry. We stress that although the original motivation for this work was the application to optimization, where dissipative systems play a natural role, they are fully general and not only provide a differential geometric framework for dissipative Hamiltonian systems but also substantially extend the theory of structure-preserving integration.
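A standard example of a structure-preserving discretization for dissipative Hamiltonian dynamics is the conformal symplectic Euler scheme. It applies to $\dot q = p$, $\dot p = -\nabla f(q) - \gamma p$: damp the momentum exactly by $e^{-\gamma h}$, then take a symplectic Euler step. Applied to optimization, this yields a heavy-ball-type method. The sketch below uses this generic scheme for illustration; it is not necessarily one of the paper's algorithms. On a quadratic $f(q) = \tfrac12\lambda q^2$, it checks that each step contracts phase-space area by exactly $e^{-\gamma h}$, matching the continuous flow, and that $f$ is driven to its minimum.

```python
import math


def conformal_symplectic_euler(grad_f, q, p, gamma, h, steps):
    """Integrate q' = p, p' = -grad_f(q) - gamma * p with a conformal symplectic Euler splitting."""
    damping = math.exp(-gamma * h)
    for _ in range(steps):
        p = damping * p          # exact flow of the dissipative part p' = -gamma * p
        p = p - h * grad_f(q)    # symplectic Euler: kick ...
        q = q + h * p            # ... then drift with the updated momentum
    return q, p


lam, gamma, h = 1.0, 1.0, 0.1


def grad(q):
    return lam * q


# For a quadratic f the one-step map is linear; its Jacobian determinant is the area contraction.
q1, p1 = conformal_symplectic_euler(grad, 1.0, 0.0, gamma, h, 1)  # image of (1, 0)
q2, p2 = conformal_symplectic_euler(grad, 0.0, 1.0, gamma, h, 1)  # image of (0, 1)
det = q1 * p2 - q2 * p1
print(det, math.exp(-gamma * h))  # equal up to rounding

q_end, p_end = conformal_symplectic_euler(grad, 1.0, 0.0, gamma, h, 500)
print(0.5 * lam * q_end ** 2)  # objective value after 500 steps
```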
Translated: 2022-12-13 04:17:24 Published: 2021-04-28
# Two-Stage Penalized Regression Screening to Detect Biomarker-Treatment Interactions in Randomized Clinical Trials

Two-Stage Penalized Regression Screening to Detect Biomarker-Treatment Interactions in Randomized Clinical Trials ( http://arxiv.org/abs/2004.12028v2 )

High-dimensional biomarkers such as genomics are increasingly being measured in randomized clinical trials. Consequently, there is growing interest in developing methods that improve the power to detect biomarker-treatment interactions. We adapt recently proposed two-stage interaction detection procedures to the setting of randomized clinical trials. We also propose a new stage 1 multivariate screening strategy using ridge regression to account for correlations among biomarkers. For this multivariate screening, we prove the asymptotic between-stage independence, required for family-wise error rate control, under biomarker-treatment independence. Simulation results show that in various scenarios, the ridge regression screening procedure can provide substantially greater power than the traditional one-biomarker-at-a-time screening procedure on highly correlated data. We also illustrate our approach with two applications to real clinical trial data.
Translated: 2022-12-09 21:35:26 Published: 2021-04-28
# From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning

From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning ( http://arxiv.org/abs/2005.07023v4 )

Deep Reinforcement Learning has proved able to solve many control tasks in different fields, but the behavior of these systems is not always as expected when deployed in real-world scenarios. This is mainly due to the lack of domain adaptation between simulated and real-world data, together with the absence of a distinction between training and test datasets. In this work, we investigate these problems in the autonomous driving field, especially for a maneuver planning module for roundabout insertions. In particular, we present a system based on multiple environments in which agents are trained simultaneously, and we evaluate the behavior of the model in different scenarios. Finally, we analyze techniques aimed at reducing the gap between simulated and real-world data, showing that they increase the generalization capabilities of the system on both unseen and real-world scenarios.
Translated: 2022-12-03 12:51:15 Published: 2021-04-28
# Fourier Neural Networks as Function Approximators and Differential Equation Solvers

Fourier Neural Networks as Function Approximators and Differential Equation Solvers ( http://arxiv.org/abs/2005.13100v2 )

We present a Fourier neural network (FNN) that can be mapped directly to the Fourier decomposition. The choice of activation and loss function yields results that replicate a Fourier series expansion closely while preserving a straightforward architecture with a single hidden layer. The simplicity of this network architecture facilitates the integration with any other higher-complexity networks, at a data pre- or postprocessing stage. We validate this FNN on naturally periodic smooth functions and on piecewise continuous periodic functions. We showcase the use of this FNN for modeling or solving partial differential equations with periodic boundary conditions. The main advantages of the current approach are the validity of the solution outside the training region, interpretability of the trained model, and simplicity of use.
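The mapping between a single-hidden-layer network and a Fourier series can be made explicit. Give the hidden units integer input weights $k$, phases $0$ or $-\pi/2$, and a cosine activation, so their outputs are $\cos kx$ and $\sin kx$. Then let the output weights be the Fourier coefficients. For brevity, the sketch below sets those output weights by discrete Fourier projection instead of by training. That is an assumption; the paper trains its network with its own choice of loss. The target is the smooth periodic function $e^{\sin x}$. Because every unit is $2\pi$-periodic, the model stays valid outside the sampled interval.

```python
import math


class FourierNet:
    """One hidden layer of cos(k*x + phase) units; output = bias + weighted sum of the units."""

    def __init__(self, n_harmonics):
        # hidden units: (input weight k, phase); phase -pi/2 turns cos(kx) into sin(kx)
        self.units = [(k, ph) for k in range(1, n_harmonics + 1) for ph in (0.0, -math.pi / 2)]
        self.bias = 0.0
        self.out = [0.0] * len(self.units)

    def hidden(self, x):
        return [math.cos(k * x + ph) for k, ph in self.units]

    def __call__(self, x):
        return self.bias + sum(w * u for w, u in zip(self.out, self.hidden(x)))

    def fit_by_projection(self, f, n_samples=64):
        """Set output weights to the discrete Fourier coefficients of f on [0, 2*pi)."""
        xs = [2 * math.pi * i / n_samples for i in range(n_samples)]
        ys = [f(x) for x in xs]
        self.bias = sum(ys) / n_samples
        for j, (k, ph) in enumerate(self.units):
            self.out[j] = 2.0 / n_samples * sum(y * math.cos(k * x + ph) for x, y in zip(xs, ys))


def target(x):
    return math.exp(math.sin(x))


net = FourierNet(n_harmonics=10)
net.fit_by_projection(target)
test_points = [0.1 * i for i in range(-100, 200)]  # includes points outside [0, 2*pi)
print(max(abs(net(x) - target(x)) for x in test_points))
```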
Translated: 2022-11-28 08:31:11 Published: 2021-04-28
# SPSG: Self-Supervised Photometric Scene Generation from RGB-D Scans

SPSG: Self-Supervised Photometric Scene Generation from RGB-D Scans ( http://arxiv.org/abs/2006.14660v2 )

We present SPSG, a novel approach to generate high-quality, colored 3D models of scenes from RGB-D scan observations by learning to infer unobserved scene geometry and color in a self-supervised fashion. Our self-supervised approach learns to jointly inpaint geometry and color by correlating an incomplete RGB-D scan with a more complete version of that scan. Notably, rather than relying on 3D reconstruction losses to inform our 3D geometry and color reconstruction, we propose adversarial and perceptual losses operating on 2D renderings in order to achieve high-resolution, high-quality colored reconstructions of scenes. This exploits the high-resolution, self-consistent signal from individual raw RGB-D frames, in contrast to fused 3D reconstructions of the frames which exhibit inconsistencies from view-dependent effects, such as color balancing or pose inconsistencies. Thus, by informing our 3D scene generation directly through 2D signal, we produce high-quality colored reconstructions of 3D scenes, outperforming state of the art on both synthetic and real data.
Translated: 2022-11-17 04:15:07 Published: 2021-04-28
# Laplacian Regularized Few-Shot Learning

Laplacian Regularized Few-Shot Learning ( http://arxiv.org/abs/2006.15486v3 )

We propose a transductive Laplacian-regularized inference for few-shot tasks. Given any feature embedding learned from the base classes, we minimize a quadratic binary-assignment function containing two terms: (1) a unary term assigning query samples to the nearest class prototype, and (2) a pairwise Laplacian term encouraging nearby query samples to have consistent label assignments. Our transductive inference does not re-train the base model, and can be viewed as a graph clustering of the query set, subject to supervision constraints from the support set. We derive a computationally efficient bound optimizer of a relaxation of our function, which computes independent (parallel) updates for each query sample, while guaranteeing convergence. Following a simple cross-entropy training on the base classes, and without complex meta-learning strategies, we conducted comprehensive experiments over five few-shot learning benchmarks. Our LaplacianShot consistently outperforms state-of-the-art methods by significant margins across different models, settings, and data sets. Furthermore, our transductive inference is very fast, with computational times that are close to inductive inference, and can be used for large-scale few-shot tasks.
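The parallel per-query update can be sketched as follows, as we read the method. Assume unary costs $a_q$ given by squared distances to the class prototypes, and a $k$-nearest-neighbour affinity $w$ among query samples. Each iteration then recomputes every soft assignment $y_q \propto \exp(-a_q + \lambda \sum_p w(q,p)\, y_p)$ from the previous assignments of all queries. The toy features (2D points on a line), $\lambda$, and $k$ below are hypothetical. The example shows how the Laplacian term can flip a borderline query toward its neighbours' class.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def laplacian_shot(queries, prototypes, lam=1.0, knn=2, iters=20):
    """Parallel soft-assignment updates with a unary (prototype) term and a kNN Laplacian term."""
    n_classes = len(prototypes)
    unary = [[sq_dist(q, m) for m in prototypes] for q in queries]
    # kNN affinity: w(i, j) = 1 if j is among the knn nearest other queries of i, else 0
    neighbours = []
    for i, q in enumerate(queries):
        others = sorted((sq_dist(q, p), j) for j, p in enumerate(queries) if j != i)
        neighbours.append([j for _, j in others[:knn]])
    y = [softmax([-a for a in row]) for row in unary]  # start from nearest-prototype soft labels
    for _ in range(iters):
        # the comprehension reads the previous y for every query, i.e. a parallel update
        y = [
            softmax([-unary[i][c] + lam * sum(y[j][c] for j in neighbours[i])
                     for c in range(n_classes)])
            for i in range(len(queries))
        ]
    return [max(range(n_classes), key=lambda c: yi[c]) for yi in y]


prototypes = [(0.0, 0.0), (4.0, 0.0)]
queries = [(1.9, 0.0), (2.3, 0.0), (2.5, 0.0), (2.8, 0.0)]
print(laplacian_shot(queries, prototypes, lam=0.0))  # [0, 1, 1, 1]: plain nearest prototype
print(laplacian_shot(queries, prototypes, lam=1.0))  # [1, 1, 1, 1]: query 0 follows its neighbours
```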
Translated: 2022-11-16 01:57:25 Published: 2021-04-28
# Explainable Depression Detection with Multi-Modalities Using a Hybrid Deep Learning Model on Social Media

Explainable Depression Detection with Multi-Modalities Using a Hybrid Deep Learning Model on Social Media ( http://arxiv.org/abs/2007.02847v2 )

Model interpretability has become important for engendering appropriate user trust, because it provides insight into model predictions. However, most existing machine learning methods provide no interpretability for depression prediction, so their predictions are opaque to humans. In this work, we propose interpretive Multi-Modal Depression Detection with Hierarchical Attention Network (MDHAN) for detecting depressed users on social media and explaining the model prediction. We consider user posts along with Twitter-based multi-modal features. Specifically, we encode user posts using two levels of attention mechanisms applied at the tweet level and word level, compute the importance of each tweet and word, and capture semantic sequence features from the user timelines (posts). Our experiments show that MDHAN outperforms several popular and robust baseline methods, demonstrating the effectiveness of combining deep learning with multi-modal features. We also show that our model helps improve predictive performance when detecting depression in users who post messages publicly on social media. MDHAN achieves excellent performance and provides adequate evidence to explain its predictions.
Translated: 2022-11-14 06:17:33 Published: 2021-04-28
# Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications

Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications ( http://arxiv.org/abs/2007.05657v2 )

The advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors has brought on new opportunities for applying both Deep and Spiking Neural Network (SNN) algorithms to healthcare and biomedical applications at the edge. This can facilitate the advancement of medical Internet of Things (IoT) systems and Point of Care (PoC) devices. In this paper, we provide a tutorial describing how various technologies including emerging memristive devices, Field Programmable Gate Arrays (FPGAs), and Complementary Metal Oxide Semiconductor (CMOS) can be used to develop efficient DL accelerators to solve a wide variety of diagnostic, pattern recognition, and signal processing problems in healthcare. Furthermore, we explore how spiking neuromorphic processors can complement their DL counterparts for processing biomedical signals. The tutorial is augmented with case studies of the vast literature on neural network and neuromorphic hardware as applied to the healthcare domain. We benchmark various hardware platforms by performing a sensor fusion signal processing task combining electromyography (EMG) signals with computer vision. Comparisons are made between dedicated neuromorphic processors and embedded AI accelerators in terms of inference latency and energy. Finally, we provide our analysis of the field and share a perspective on the advantages, disadvantages, challenges, and opportunities that various accelerators and neuromorphic processors introduce to healthcare and biomedical domains.
Translated: 2022-11-11 13:55:07 Published: 2021-04-28
# Sparsity-Agnostic Lasso Bandit

Sparsity-Agnostic Lasso Bandit ( http://arxiv.org/abs/2007.08477v2 )

We consider a stochastic contextual bandit problem where the dimension $d$ of the feature vectors is potentially large; however, only a sparse subset of features of cardinality $s_0 \ll d$ affects the reward function. Essentially all existing algorithms for sparse bandits require a priori knowledge of the value of the sparsity index $s_0$. This knowledge is almost never available in practice, and misspecification of this parameter can lead to severe deterioration in the performance of existing methods. The main contribution of this paper is to propose an algorithm that does not require prior knowledge of the sparsity index $s_0$ and to establish tight regret bounds on its performance under mild conditions. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms existing methods, even when the correct sparsity index is revealed to them but is kept hidden from our algorithm.
Translated: 2022-11-09 22:24:03 Published: 2021-04-28
# Hopfield Networks is All You Need

Hopfield Networks is All You Need ( http://arxiv.org/abs/2008.02217v3 )

We introduce a modern Hopfield network with continuous states and a corresponding update rule. The new Hopfield network can store exponentially many patterns (in the dimension of the associative space), retrieves a pattern with one update, and has exponentially small retrieval errors. It has three types of energy minima (fixed points of the update): (1) a global fixed point averaging over all patterns, (2) metastable states averaging over a subset of patterns, and (3) fixed points that store a single pattern. The new update rule is equivalent to the attention mechanism used in transformers. This equivalence enables a characterization of the heads of transformer models: in the first layers, these heads preferably perform global averaging, and in higher layers, partial averaging via metastable states. The new modern Hopfield network can be integrated into deep learning architectures as layers to allow the storage of and access to raw input data, intermediate results, or learned prototypes. These Hopfield layers enable new ways of deep learning, beyond fully-connected, convolutional, or recurrent networks, and provide pooling, memory, association, and attention mechanisms. We demonstrate the broad applicability of the Hopfield layers across various domains. Hopfield layers improved the state of the art on three out of four considered multiple instance learning problems, as well as on immune repertoire classification with several hundreds of thousands of instances. On the UCI benchmark collections of small classification tasks, where deep learning methods typically struggle, Hopfield layers yielded a new state of the art when compared to different machine learning methods. Finally, Hopfield layers achieved state-of-the-art results on two drug design datasets. The implementation is available at: https://github.com/ml-jku/hopfield-layers
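The update rule referred to above is, as we understand it, $\xi^{\text{new}} = X\,\mathrm{softmax}(\beta X^\top \xi)$. Here the columns of $X$ are the stored patterns and $\xi$ is the state, or query. With $\beta = 1/\sqrt{d_k}$, this has the form of transformer attention. The sketch below stores the eight mutually orthogonal rows of a Sylvester-Hadamard matrix, a choice assumed here so that overlaps are easy to reason about. It then corrupts one component of a pattern and retrieves the pattern with a single update.

```python
import math


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def hopfield_update(patterns, state, beta):
    """xi_new = X softmax(beta * X^T xi), with the stored patterns as the columns of X."""
    overlaps = [beta * sum(p_i * s_i for p_i, s_i in zip(p, state)) for p in patterns]
    weights = softmax(overlaps)
    return [sum(w * p[i] for w, p in zip(weights, patterns)) for i in range(len(state))]


patterns = hadamard(8)     # 8 stored +-1 patterns of dimension 8, mutually orthogonal
query = list(patterns[3])
query[0] = -query[0]       # corrupt one component
retrieved = hopfield_update(patterns, query, beta=2.0)
print([round(v, 3) for v in retrieved])
```

The corrupted query still overlaps the correct pattern by 6 and every other pattern by -2. With $\beta = 2$, the softmax therefore puts almost all of its weight on the stored pattern, and one update suffices.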
Translated: 2022-11-09 21:48:30 Published: 2021-04-28
# Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation

Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation ( http://arxiv.org/abs/2008.06788v2 )

Traditional NLP has long held that (supervised) syntactic parsing is necessary for successful higher-level semantic language understanding (LU). The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks, however, call this belief into question. In this work, we empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks. Relying on the established fine-tuning paradigm, we first couple a pretrained transformer with a biaffine parsing head, aiming to infuse explicit syntactic knowledge from Universal Dependencies treebanks into the transformer. We then fine-tune the model for LU tasks and measure the effect of the intermediate parsing training (IPT) on downstream LU task performance. Results from both monolingual English and zero-shot language transfer experiments (with intermediate target-language parsing) show that explicit formalized syntax, injected into transformers through IPT, has a very limited and inconsistent effect on downstream LU performance. Our results, coupled with our analysis of transformers' representation spaces before and after intermediate parsing, make a significant step towards answering an essential question: how (un)availing is supervised parsing for high-level semantic natural language understanding in the era of large neural models?
Translated: 2022-10-28 21:03:25 Published: 2021-04-28
# Salient Instance Segmentation with Region and Box-level Annotations

Salient Instance Segmentation with Region and Box-level Annotations ( http://arxiv.org/abs/2008.08246v3 )

Salient instance segmentation is a new, challenging task that has received widespread attention in the saliency detection area. The new generation of saliency detection provides a strong theoretical and technical basis for video surveillance. Owing to the limited scale of existing datasets and the high cost of mask annotations, more sources of supervision are urgently needed to train a well-performing salient instance model. In this paper, we aim to train a novel salient instance segmentation framework with inexact supervision, without resorting to laborious labeling. To this end, we present a cyclic global context salient instance segmentation network (CGCNet), which is supervised by a combination of salient regions and bounding boxes from ready-made salient object detection datasets. To locate salient instances more accurately, we propose a global feature refining layer that dilates the features of the region of interest (ROI) to the global context of a scene. Meanwhile, a label updating scheme is embedded in the proposed framework to update the coarse-grained labels for the next iteration. Experimental results demonstrate that the proposed end-to-end framework trained with inexact supervised annotations can be competitive with existing fully supervised salient instance segmentation methods. Without bells and whistles, our proposed method achieves a mask AP of 58.3% on the test set of Dataset1K, outperforming mainstream state-of-the-art methods.
Translated: 2022-10-27 08:59:36 Published: 2021-04-28
# Rethinking Non-idealities in Memristive Crossbars for Adversarial Robustness in Neural Networks

Rethinking Non-idealities in Memristive Crossbars for Adversarial Robustness in Neural Networks ( http://arxiv.org/abs/2008.11298v2 )

Deep Neural Networks (DNNs) have been shown to be prone to adversarial attacks. Memristive crossbars, which can perform Matrix-Vector Multiplications (MVMs) efficiently, are used to realize DNNs in hardware. However, crossbar non-idealities have always been regarded as a drawback, since they cause errors in performing MVMs, leading to computational accuracy losses in DNNs. Several software-based defenses have been proposed to make DNNs adversarially robust. However, no previous work has demonstrated the advantage conferred by crossbar non-idealities in unleashing adversarial robustness. We show that the intrinsic hardware non-idealities yield adversarial robustness in the mapped DNNs without any additional optimization. We evaluate the adversarial resilience of state-of-the-art DNNs (VGG8 and VGG16 networks) using benchmark datasets (CIFAR-10, CIFAR-100 and Tiny ImageNet) across various crossbar sizes. We find that crossbar non-idealities unleash significantly greater adversarial robustness (>10-20%) in crossbar-mapped DNNs than in baseline software DNNs. We further compare our approach with other state-of-the-art efficiency-driven adversarial defenses and find that it performs well in reducing adversarial loss.
Translated: 2022-10-25 03:52:31, Published: 2021-04-28
# Pix2Prof: fast extraction of sequential information from galaxy imagery via a deep natural language 'captioning' model

Pix2Prof: fast extraction of sequential information from galaxy imagery via a deep natural language 'captioning' model ( http://arxiv.org/abs/2010.00622v2 )

License: see link
We present 'Pix2Prof', a deep learning model that can eliminate any manual steps taken when extracting galaxy profiles. We argue that a galaxy profile of any sort is conceptually similar to a natural language image caption. This idea allows us to leverage image captioning methods from the field of natural language processing, and so we design Pix2Prof as a float sequence 'captioning' model suitable for galaxy profile inference. We demonstrate the technique by approximating a galaxy surface brightness (SB) profile fitting method that contains several manual steps. Pix2Prof processes $\sim$1 image per second on an Intel Xeon E5 2650 v3 CPU, improving on the speed of the manual interactive method by more than two orders of magnitude. Crucially, Pix2Prof requires no manual interaction, and since galaxy profile estimation is an embarrassingly parallel problem, we can further increase the throughput by running many Pix2Prof instances simultaneously. To put this in perspective, Pix2Prof would take under an hour to infer profiles for $10^5$ galaxies on a single NVIDIA DGX-2 system, whereas a single human expert would take approximately two years to complete the same task. Automated methodology such as this will accelerate the analysis of the next generation of large-area sky surveys, which are expected to yield hundreds of millions of targets. For such surveys, all manual approaches, even those involving a large number of experts, will be impractical.
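Because the work is embarrassingly parallel, the "under an hour for $10^5$ galaxies" figure can be sanity-checked with simple arithmetic. The sketch below assumes every instance runs at the quoted CPU rate of about 1 image per second; the abstract does not state the per-instance speed on the DGX-2, so the result is only the number of CPU-speed instances that would be needed.

```python
import math

def instances_needed(n_galaxies, seconds_per_image, wall_clock_seconds):
    """Parallel instances needed to finish within a time budget, assuming
    perfectly independent (embarrassingly parallel) work items."""
    total_work = n_galaxies * seconds_per_image  # seconds on a single instance
    return math.ceil(total_work / wall_clock_seconds)

# Figures from the abstract: ~1 image/s per CPU, 1e5 galaxies, a one-hour budget.
n = instances_needed(100_000, 1.0, 3600.0)
print(n)  # 28
```

Run serially at the same rate, the job would take $10^5$ s, roughly 28 hours.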
Translated: 2022-10-12 09:00:23, Published: 2021-04-28
# pymia: A Python package for data handling and evaluation in deep learning-based medical image analysis

pymia: A Python package for data handling and evaluation in deep learning-based medical image analysis ( http://arxiv.org/abs/2010.03639v2 )

License: see link
Background and Objective: Deep learning enables tremendous progress in medical image analysis. One driving force of this progress is open-source frameworks like TensorFlow and PyTorch. However, these frameworks rarely address issues specific to the domain of medical image analysis, such as 3-D data handling and distance metrics for evaluation. pymia, an open-source Python package, tries to address these issues by providing flexible data handling and evaluation independent of the deep learning framework. Methods: The pymia package provides data handling and evaluation functionalities. The data handling allows flexible medical image handling in every commonly used format (e.g., 2-D, 2.5-D, and 3-D; full- or patch-wise). Even data beyond images, like demographics or clinical reports, can easily be integrated into deep learning pipelines. The evaluation allows stand-alone result calculation and reporting, as well as performance monitoring during training, using a large number of domain-specific metrics for segmentation, reconstruction, and regression. Results: The pymia package is highly flexible, allows for fast prototyping, and reduces the burden of implementing data handling routines and evaluation methods. While data handling and evaluation are independent of the deep learning framework used, they can easily be integrated into TensorFlow and PyTorch pipelines. The developed package was successfully used in a variety of research projects for segmentation, reconstruction, and regression. Conclusions: The pymia package fills the gap of current deep learning frameworks regarding data handling and evaluation in medical image analysis. It is available at https://github.com/rundherum/pymia and can directly be installed from the Python Package Index using pip install pymia.
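To make concrete what an overlap metric and a distance metric for segmentation compute, here is a plain-Python sketch. It is not pymia's API (pymia's own metric classes are not reproduced here); masks are represented as sets of foreground voxel coordinates, and unit isotropic voxel spacing is assumed.

```python
import math

def dice(pred, ref):
    """Dice coefficient between two sets of foreground voxel coordinates."""
    if not pred and not ref:
        return 1.0  # both empty: treat as perfect agreement
    return 2.0 * len(pred & ref) / (len(pred) + len(ref))

def hausdorff(pred, ref):
    """Symmetric Hausdorff distance between two non-empty point sets
    (unit voxel spacing assumed)."""
    def directed(a, b):
        # Worst-case distance from a point in `a` to its nearest point in `b`.
        return max(min(math.dist(p, q) for q in b) for p in a)
    return max(directed(pred, ref), directed(ref, pred))

pred = {(0, 0, 0), (0, 0, 1)}
ref = {(0, 0, 0), (0, 0, 3)}
print(dice(pred, ref), hausdorff(pred, ref))
```

The brute-force Hausdorff loop is quadratic in the number of voxels; real 3-D volumes usually call for distance transforms or surface extraction instead.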
Translated: 2022-10-10 00:05:49, Published: 2021-04-28
# Learning Half-Spaces and other Concept Classes in the Limit with Iterative Learners

Learning Half-Spaces and other Concept Classes in the Limit with Iterative Learners ( http://arxiv.org/abs/2010.03227v2 )

License: see link
To model an efficient learning paradigm, iterative learning algorithms access data one by one, updating the current hypothesis without recourse to past data. Past research on iterative learning analyzed, for example, many important additional requirements and their impact on iterative learners. In this paper, our results are twofold. First, we analyze the relative learning power of various settings of iterative learning, including learning from text and from informant, as well as various further restrictions; for example, we show that strongly non-U-shaped learning is restrictive for iterative learning from informant. Second, we investigate the learnability of the concept class of half-spaces and provide a constructive iterative algorithm to learn the set of half-spaces from informant.
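The paper's constructive algorithm is not given in the abstract, so the sketch below swaps in the classic perceptron purely to illustrate the iterative-learner interface: each update sees only the current hypothesis and the current datum, where an informant supplies points together with +1/-1 labels. It illustrates the interface, not the learnability-in-the-limit result.

```python
def perceptron_step(hypothesis, datum):
    """One iterative update: the new hypothesis depends only on the current
    hypothesis (w, b) and the current labelled datum (x, label)."""
    w, b = hypothesis
    x, label = datum
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    if label * activation <= 0:  # mistake: move the separating hyperplane
        w = [wi + label * xi for wi, xi in zip(w, x)]
        b = b + label
    return (w, b)

def learn(stream, dim):
    """Consume an informant stream once; past data is never revisited."""
    hyp = ([0.0] * dim, 0.0)
    for datum in stream:
        hyp = perceptron_step(hyp, datum)
    return hyp

def predict(hypothesis, x):
    """Label a point by the side of the learned half-space it falls on."""
    w, b = hypothesis
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# An informant may present the same labelled point repeatedly.
stream = [((2.0, 1.0), 1), ((-1.0, -2.0), -1)] * 5
hyp = learn(stream, 2)
print(hyp)
```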
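Translated: 2022-10-09 22:44:32, Published: 2021-04-28
# PathoNet: Deep learning assisted evaluation of Ki-67 and tumor infiltrating lymphocytes (TILs) as prognostic factors in breast cancer; A large dataset and baseline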

PathoNet: Deep learning assisted evaluation of Ki-67 and tumor infiltrating lymphocytes (TILs) as prognostic factors in breast cancer; A large dataset and baseline ( http://arxiv.org/abs/2010.04713v3 )

License: see link
The nuclear protein Ki-67 and tumor-infiltrating lymphocytes (TILs) have been introduced as prognostic factors for predicting tumor progression and treatment response. The literature has highlighted the value of the Ki-67 index and TILs in assessing heterogeneous tumors such as breast cancer (BC), the most common cancer in women worldwide. Because Ki-67 and TIL scoring are indeterminate and subjective, automated methods using machine learning, specifically approaches based on deep learning, have attracted attention. Yet deep learning methods need considerable annotated data. In the absence of publicly available benchmarks for detecting Ki-67-stained cells in BC and further classifying the annotated cells, we propose SHIDC-BC-Ki-67 as a dataset for this purpose. We also introduce a novel pipeline and backend, PathoNet, for Ki-67 immunostained cell detection and classification with simultaneous determination of the intratumoral TIL score. Further, we show that despite the challenges involved, PathoNet outperforms the state-of-the-art methods proposed to date in the harmonic mean measure.
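For readers unfamiliar with the quantities involved, the sketch below computes a Ki-67 index (the percentage of tumor cells scored Ki-67 positive) and the harmonic mean of detection precision and recall, assuming that is what "harmonic mean measure" refers to (i.e. the F1 score). How PathoNet matches detections to annotations is not described in the abstract, so the counts are taken as given.

```python
def ki67_index(n_positive, n_negative):
    """Ki-67 proliferation index: percentage of tumour cells that are Ki-67 positive."""
    total = n_positive + n_negative
    return 100.0 * n_positive / total if total else 0.0

def harmonic_mean_f1(tp, fp, fn):
    """Harmonic mean of precision and recall for cell detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(ki67_index(30, 70), harmonic_mean_f1(8, 2, 2))
```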
Translated: 2022-10-09 05:14:24, Published: 2021-04-28
# High-Order Oracle Complexity of Smooth and Strongly Convex Optimization

High-Order Oracle Complexity of Smooth and Strongly Convex Optimization ( http://arxiv.org/abs/2010.06642v2 )

License: see link
In this note, we consider the complexity of optimizing a highly smooth (Lipschitz $k$-th order derivative) and strongly convex function, via calls to a $k$-th order oracle which returns the value and first $k$ derivatives of the function at a given point, and where the dimension is unrestricted. Extending the techniques introduced in Arjevani et al. [2019], we prove that the worst-case oracle complexity for any fixed $k$ to optimize the function up to accuracy $\epsilon$ is on the order of $\left(\frac{\mu_k D^{k-1}}{\lambda}\right)^{\frac{2}{3k+1}}+\log\log\left(\frac{1}{\epsilon}\right)$ (in sufficiently high dimension, and up to log factors independent of $\epsilon$), where $\mu_k$ is the Lipschitz constant of the $k$-th derivative, $D$ is the initial distance to the optimum, and $\lambda$ is the strong convexity parameter.
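Instantiating the stated bound for small $k$ makes the exponent concrete. Using $3k+1 = 7$ and $D^{k-1} = D$ for $k=2$, and $3k+1 = 10$ and $D^{k-1} = D^2$ for $k=3$ (with the same caveats as above: sufficiently high dimension, up to log factors independent of $\epsilon$):

$$
\begin{aligned}
k=2:&\quad \left(\frac{\mu_2 D}{\lambda}\right)^{2/7}+\log\log\left(\frac{1}{\epsilon}\right),\\
k=3:&\quad \left(\frac{\mu_3 D^{2}}{\lambda}\right)^{1/5}+\log\log\left(\frac{1}{\epsilon}\right).
\end{aligned}
$$

The exponent $\frac{2}{3k+1}$ shrinks as $k$ grows, so higher-order oracles weaken the dependence on the ratio $\mu_k D^{k-1}/\lambda$, while the accuracy enters only through the $\log\log(1/\epsilon)$ term.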
Translated: 2022-10-08 00:49:22, Published: 2021-04-28
# MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences

MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences ( http://arxiv.org/abs/2010.11985v2 )

License: see link
Human communication is multimodal in nature; it is through multiple modalities such as language, voice, and facial expressions, that opinions and emotions are expressed. Data in this domain exhibits complex multi-relational and temporal interactions. Learning from this data is a fundamentally challenging research problem. In this paper, we propose Modal-Temporal Attention Graph (MTAG). MTAG is an interpretable graph-based neural model that provides a suitable framework for analyzing multimodal sequential data. We first introduce a procedure to convert unaligned multimodal sequence data into a graph with heterogeneous nodes and edges that captures the rich interactions across modalities and through time. Then, a novel graph fusion operation, called MTAG fusion, along with a dynamic pruning and read-out technique, is designed to efficiently process this modal-temporal graph and capture various interactions. By learning to focus only on the important interactions within the graph, MTAG achieves state-of-the-art performance on multimodal sentiment analysis and emotion recognition benchmarks, while utilizing significantly fewer model parameters.
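The first step, converting unaligned sequences into a graph with heterogeneous nodes and edges, can be sketched as follows. The abstract does not say how nodes are timestamped or how edge types are defined, so this assumes normalized positions in $[0, 1]$ and a three-way past/present/future relation; MTAG fusion, pruning and read-out are not shown. A dense graph like this grows quadratically in the total number of time steps.

```python
from itertools import product

def build_modal_temporal_graph(sequences):
    """Turn unaligned multimodal sequences into a heterogeneous graph.

    `sequences` maps a modality name to its number of time steps (lengths may
    differ, i.e. the modalities are unaligned). Each step becomes a node tagged
    with its modality and a normalised timestamp in [0, 1]. Every ordered pair
    of distinct nodes gets an edge typed (source modality, target modality,
    temporal relation).
    """
    nodes = []
    for modality, length in sequences.items():
        for t in range(length):
            stamp = t / (length - 1) if length > 1 else 0.0
            nodes.append((modality, stamp))

    edges = []
    for i, j in product(range(len(nodes)), repeat=2):
        if i == j:
            continue
        (m_src, t_src), (m_dst, t_dst) = nodes[i], nodes[j]
        if t_src < t_dst:
            relation = "past"     # source precedes target
        elif t_src > t_dst:
            relation = "future"   # source follows target
        else:
            relation = "present"
        edges.append((i, j, (m_src, m_dst, relation)))
    return nodes, edges

nodes, edges = build_modal_temporal_graph({"text": 2, "audio": 3})
print(len(nodes), len(edges))
```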
Translated: 2022-10-04 04:54:26, Published: 2021-04-28
# Helping users discover perspectives: Enhancing opinion mining with joint topic models

Helping users discover perspectives: Enhancing opinion mining with joint topic models ( http://arxiv.org/abs/2010.12505v2 )

License: see link
Support for or opposition to a debated claim such as "abortion should be legal" can have different underlying reasons, which we call perspectives. This paper explores how opinion mining can be enhanced with joint topic modeling to identify distinct perspectives within a topic, providing an informative overview of unstructured text. We evaluate four joint topic models (TAM, JST, VODUM, and LAM) in a user study assessing human understandability of the extracted perspectives. Based on the results, we conclude that joint topic models such as TAM can discover perspectives that align with human judgments. Moreover, our results suggest that users are not influenced by their pre-existing stance on the topic of abortion when interpreting the output of topic models.
Translated: 2022-10-03 22:43:48, Published: 2021-04-28
# Decentralized Attribution of Generative Models

Decentralized Attribution of Generative Models ( http://arxiv.org/abs/2010.13974v4 )

License: see link
Growing applications of generative models have led to new threats such as malicious impersonation and digital copyright infringement. One solution to these threats is model attribution, i.e., identifying the user-end model from which the content in question was generated. Existing studies showed the empirical feasibility of attribution through a centralized classifier trained on all user-end models. However, this approach does not scale as the number of models keeps growing, nor does it provide an attributability guarantee. To this end, this paper studies decentralized attribution, which relies on binary classifiers associated with each user-end model. Each binary classifier is parameterized by a user-specific key and distinguishes its associated model distribution from the authentic data distribution. We develop sufficient conditions on the keys that guarantee an attributability lower bound. Our method is validated on the MNIST, CelebA, and FFHQ datasets. We also examine the trade-off between generation quality and robustness of attribution against adversarial post-processes.
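A minimal sketch of key-based decentralized attribution, under loudly stated assumptions: each classifier is linear, of the form sign(keyᵀx); authentic data lies on the negative side of every key; and each (hypothetical) user-end model shifts its outputs along its own key. The orthogonal keys below are an illustrative choice, not the sufficient conditions derived in the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_classifier(key):
    """Binary classifier for one user-end model: fires when the sample lies on
    the positive side of the hyperplane defined by the user's key."""
    return lambda x: dot(key, x) > 0

def attribute(x, keys):
    """Indices of user-end models whose classifiers claim sample x. One index
    means an unambiguous attribution; an empty list means x looks authentic."""
    return [i for i, key in enumerate(keys) if make_classifier(key)(x)]

def perturb(x, key, strength):
    """Hypothetical user-end model output: an authentic sample shifted along the key."""
    return [xi + strength * ki for xi, ki in zip(x, key)]

keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # orthogonal keys (illustrative)
authentic = [-1.0, -1.0, 0.5]              # negative side of every key
print(attribute(authentic, keys), attribute(perturb(authentic, keys[0], 2.0), keys))
```

Because each classifier only needs its own key, adding a new user does not require retraining a central classifier, which is the scalability point made in the abstract.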
Translated: 2022-10-02 12:41:07, Published: 2021-04-28
# Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks

Classification of Periodic Variable Stars with Novel Cyclic-Permutation Invariant Neural Networks ( http://arxiv.org/abs/2011.01243v2 )

License: see link
Neural networks (NNs) have been shown to be competitive with state-of-the-art feature engineering and random forest (RF) classification of periodic variable stars. Although previous work using NNs has exploited periodicity by period-folding multiple-cycle time series into a single cycle (from time space to phase space), no approach to date has taken advantage of the fact that network predictions should be invariant to the initial phase of the period-folded sequence. The initial phase is exogenous to the physical origin of the variability and should thus be factored out. Here, we present cyclic-permutation invariant networks, a novel class of NNs for which invariance to phase shifts is guaranteed through polar coordinate convolutions, which we implement by means of "Symmetry Padding." Across three different datasets of variable star light curves, we show that two implementations of the cyclic-permutation invariant network, the iTCN and the iResNet, consistently outperform non-invariant baselines and reduce overall error rates by between 4% and 22%. Over a 10-class OGLE-III sample, the iTCN/iResNet achieves an average per-class accuracy of 93.4%/93.3%, compared to RNN/RF accuracies of 70.5%/89.5% in a recent study using the same data. Since we also find improvements on a non-astronomy benchmark, we suggest that the methodology introduced here should also be applicable to a wide range of science domains where periodic data abounds due to physical symmetries.
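The invariance argument can be checked on a toy example. The sketch below assumes that "Symmetry Padding" behaves like wrap-around (circular) padding and that a global pooling step follows the convolutions; both are assumptions, since the abstract does not give the architecture. With circular padding, cyclically shifting the input only shifts the convolution output, so a global max over positions is unchanged.

```python
def circular_conv1d(signal, kernel):
    """1-D convolution with wrap-around padding: the folded light curve is
    treated as a circle, so no phase position acts as a boundary."""
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(k)) for i in range(n)]

def invariant_feature(signal, kernel):
    """Circular convolution followed by global max pooling; invariant to
    cyclic shifts (i.e. to the initial phase) of the input."""
    return max(circular_conv1d(signal, kernel))

def roll(seq, shift):
    """Cyclically shift a sequence, i.e. change the initial phase."""
    return seq[shift:] + seq[:shift]

folded = [0.0, 1.0, 3.0, 2.0, 0.5]  # toy period-folded light curve
kernel = [1.0, -1.0, 0.5]
print([invariant_feature(roll(folded, s), kernel) for s in range(len(folded))])
```

With zero padding instead of wrap-around, values near the sequence ends would depend on where the fold started, and the pooled feature would generally change with the phase shift.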
Translated: 2022-09-30 13:16:22, Published: 2021-04-28
# Learning a Deep Reinforcement Learning Policy Over the Latent Space of a Pre-trained GAN for Semantic Age Manipulation

Learning a Deep Reinforcement Learning Policy Over the Latent Space of a Pre-trained GAN for Semantic Age Manipulation ( http://arxiv.org/abs/2011.00954v2 )

License: see link
Learning a disentangled representation of the latent space has become one of the most fundamental problems studied in computer vision. Recently, many Generative Adversarial Networks (GANs) have shown promising results in generating high fidelity images. However, studies to understand the semantic layout of the latent space of pre-trained models are still limited. Several works train conditional GANs to generate faces with required semantic attributes. Unfortunately, in these attempts, the generated output is often not as photo-realistic as the unconditional state-of-the-art models. Besides, they also require large computational resources and specific datasets to generate high fidelity images. In our work, we have formulated a Markov Decision Process (MDP) over the latent space of a pre-trained GAN model to learn a conditional policy for semantic manipulation along specific attributes under defined identity bounds. Further, we have defined a semantic age manipulation scheme using a locally linear approximation over the latent space. Results show that our learned policy samples high fidelity images with required age alterations, while preserving the identity of the person.
Translated: 2022-09-30 12:24:35, Published: 2021-04-28
# A Follow-the-Leader Strategy using Hierarchical Deep Neural Networks with Grouped Convolutions

A Follow-the-Leader Strategy using Hierarchical Deep Neural Networks with Grouped Convolutions ( http://arxiv.org/abs/2011.07948v4 )

License: see link
The follow-the-leader task is implemented using a hierarchical Deep Neural Network (DNN) end-to-end driving model that matches the direction and speed of a target pedestrian. The model uses a classifier DNN to determine whether the pedestrian is within the field of view of the camera sensor. If the pedestrian is present, the image stream from the camera is fed to a regression DNN, which simultaneously adjusts the autonomous vehicle's steering and throttle to keep cadence with the pedestrian. If the pedestrian is not visible, the vehicle uses a straightforward exploratory search strategy to reacquire the tracking objective. The classifier and regression DNNs incorporate grouped convolutions to boost model performance and to significantly reduce parameter count and compute latency. The models are trained on the Intelligence Processing Unit (IPU), leveraging its fine-grained compute capabilities to minimize time-to-train. The results indicate very robust tracking behavior by the autonomous vehicle in terms of its steering and throttle profiles, while requiring minimal data collection. Using the IPU in conjunction with grouped convolutions increased the throughput of processed training samples by a factor of ~3.5 for the classifier and ~7 for the regression network. A recording of the vehicle tracking a pedestrian has been produced and is available on the web. This is a preprint of an article published in SN Computer Science. The final authenticated version is available online at: https://doi.org/10.1007/s42979-021-00572-1.
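The parameter savings from grouped convolutions follow from how the weights are shaped: with `groups` groups, each output channel only connects to `c_in / groups` input channels. The layer sizes below are hypothetical and not taken from the paper's architecture; the count follows the usual convention of one weight per (output channel, input channel in group, kernel position) plus an optional bias per output channel.

```python
def conv_params(c_in, c_out, kernel_size, groups=1, bias=True):
    """Parameter count of a 2-D convolution split into `groups` groups.

    Each output channel sees only c_in // groups input channels, so the
    weight count shrinks by a factor of `groups`; the bias is unaffected.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    weights = c_out * (c_in // groups) * kernel_size * kernel_size
    return weights + (c_out if bias else 0)

# Hypothetical layer: 64 -> 128 channels, 3x3 kernel.
print(conv_params(64, 128, 3), conv_params(64, 128, 3, groups=4))
```

Multiply-accumulate counts per output position shrink by the same factor, which is consistent with the latency reduction the abstract reports, though the actual speed-up depends on the hardware.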
Translated: 2022-09-29 21:55:45, Published: 2021-04-28
# EstBERT: A Pretrained Language-Specific BERT for Estonian

EstBERT: A Pretrained Language-Specific BERT for Estonian ( http://arxiv.org/abs/2011.04784v3 )

License: see link
This paper presents EstBERT, a large pretrained transformer-based language-specific BERT model for Estonian. Recent work has evaluated multilingual BERT models on Estonian tasks and found them to outperform the baselines. Still, based on existing studies on other languages, a language-specific BERT model is expected to improve over the multilingual ones. We first describe the EstBERT pretraining process and then present the results of models based on finetuned EstBERT for multiple NLP tasks, including POS and morphological tagging, named entity recognition and text classification. The evaluation results show that the models based on EstBERT outperform multilingual BERT models on five tasks out of six, providing further evidence for the view that training language-specific BERT models is still useful, even when multilingual models are available.
Translated: 2022-09-28 01:09:06, Published: 2021-04-28
# Is Private Learning Possible with Instance Encoding?

Is Private Learning Possible with Instance Encoding? ( http://arxiv.org/abs/2011.05315v2 )

License: see link
A private machine learning algorithm hides as much as possible about its training data while still preserving accuracy. In this work, we study whether a non-private learning algorithm can be made private by relying on an instance-encoding mechanism that modifies the training inputs before feeding them to a normal learner. We formalize both the notion of instance encoding and its privacy by providing two attack models. We first prove impossibility results for achieving privacy under the first (stronger) attack model. Next, we demonstrate practical attacks in the second (weaker) attack model on InstaHide, a recent proposal by Huang, Song, Li and Arora [ICML'20] that aims to use instance encoding for privacy.
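To make "instance encoding" concrete, here is a simplified InstaHide-style encoder of the kind the paper attacks (shown for understanding, not as a recommended privacy mechanism). It assumes the two ingredients commonly associated with InstaHide: a mixup of the private image with k-1 other images using random non-negative coefficients that sum to 1, followed by a random per-pixel sign flip. The exact coefficient distribution and dataset-mixing details of the original proposal are not reproduced; images are flat lists of floats.

```python
import random

def instahide_encode(private_image, other_images, k=2, seed=None):
    """Simplified InstaHide-style instance encoding (illustrative sketch).

    1. Mix the private image with k-1 randomly chosen images using random
       non-negative coefficients that sum to 1 (mixup).
    2. Flip the sign of each mixed pixel independently with probability 1/2.
    """
    rng = random.Random(seed)
    chosen = [private_image] + rng.sample(other_images, k - 1)
    raw = [rng.random() for _ in chosen]
    total = sum(raw)
    coeffs = [c / total for c in raw]  # convex combination weights
    mixed = [sum(c * img[p] for c, img in zip(coeffs, chosen))
             for p in range(len(private_image))]
    mask = [rng.choice((-1.0, 1.0)) for _ in mixed]  # random sign-flip mask
    return [m * s for m, s in zip(mixed, mask)]

private = [0.2, 0.8, 0.5]
public = [[0.1, 0.1, 0.1], [0.9, 0.4, 0.3]]
print(instahide_encode(private, public, k=2, seed=0))
```

Note that the sign flip leaves pixel magnitudes intact, which hints at why such encodings can leak information about the mixed inputs.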
Translated: 2022-09-27 07:14:02, Published: 2021-04-28
# Power System Event Identification based on Deep Neural Network with Information Loading

Power System Event Identification based on Deep Neural Network with Information Loading ( http://arxiv.org/abs/2011.06718v2 )

License: see link
Online power system event identification and classification is crucial to enhancing the reliability of transmission systems. In this paper, we develop a deep neural network (DNN) based approach to identify and classify power system events by leveraging real-world measurements from hundreds of phasor measurement units (PMUs) and labels from thousands of events. Two innovative designs are embedded into the baseline model built on convolutional neural networks (CNNs) to improve event classification accuracy. First, we propose a graph signal processing based PMU sorting algorithm to improve the learning efficiency of CNNs. Second, we deploy information loading based regularization to strike the right balance between memorization and generalization for the DNN. Numerical studies on a real-world dataset from the Eastern Interconnection of the U.S. power transmission grid show that the combination of PMU-based sorting and information-loading-based regularization helps the proposed DNN approach achieve highly accurate event identification and classification results.
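Translated: 2022-09-26 00:45:26, Published: 2021-04-28
# Deep Representation for Connected Health: Semi-supervised Learning for Analysing the Risk of Urinary Tract Infections in People with Dementia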

Deep Representation for Connected Health: Semi-supervised Learning for Analysing the Risk of Urinary Tract Infections in People with Dementia ( http://arxiv.org/abs/2011.13916v4 )

License: see link
Machine learning techniques combined with in-home monitoring technologies provide a unique opportunity to automate diagnosis and early detection of adverse health conditions in long-term conditions such as dementia. However, accessing sufficient labelled training samples and integrating high-quality, routinely collected data from heterogeneous in-home monitoring technologies are the main obstacles to using these technologies in real-world medicine. This work presents a semi-supervised model that can continuously learn from routinely collected in-home observation and measurement data. We show how our model can process highly imbalanced and dynamic data to make robust predictions when analysing the risk of Urinary Tract Infections (UTIs) in dementia. UTIs are common in older adults and constitute one of the main causes of avoidable hospital admissions in people with dementia (PwD). Health-related conditions such as UTIs have a low prevalence in individuals, which makes them sporadic cases (i.e. rare or scattered, yet important events). This limits access to sufficient training data, without which supervised learning models risk becoming overfitted or biased. We introduce a probabilistic semi-supervised learning framework to address these issues. The proposed method produces a risk analysis score for UTIs using data routinely collected by in-home sensing technologies.
Translated: 2022-09-20 02:40:56, Published: 2021-04-28
# Optimizing the Neural Architecture of Reinforcement Learning Agents

Optimizing the Neural Architecture of Reinforcement Learning Agents ( http://arxiv.org/abs/2011.14632v3 )

License: CC BY 4.0
Reinforcement learning (RL) has made significant progress in recent years. One of the most important steps forward was the wide application of neural networks. However, the architectures of these neural networks are typically constructed manually. In this work, we study recently proposed neural architecture search (NAS) methods for optimizing the architecture of RL agents. We carry out experiments on the Atari benchmark and conclude that modern NAS methods find RL agent architectures that outperform a manually selected one.
Translated: 2021-06-07 04:06:40, Published: 2021-04-28
# Robust Ultra-wideband Range Error Mitigation with Deep Learning at the Edge

Robust Ultra-wideband Range Error Mitigation with Deep Learning at the Edge ( http://arxiv.org/abs/2011.14684v2 )

License: see link
Ultra-wideband (UWB) is the state-of-the-art and most popular technology for wireless localization. Nevertheless, precise ranging and localization in non-line-of-sight (NLoS) conditions is still an open research topic. Indeed, multipath effects, reflections, refractions, and the complexity of the indoor radio environment can easily introduce a positive bias in the ranging measurement, resulting in highly inaccurate and unsatisfactory position estimates. This article proposes an efficient representation learning methodology that exploits the latest advances in deep learning and graph optimization techniques to achieve effective ranging error mitigation at the edge. Channel Impulse Response (CIR) signals are directly exploited to extract high-level semantic features, which are used to estimate corrections in either NLoS or LoS conditions. Extensive experimentation with different settings and configurations has proved the effectiveness of our methodology and demonstrated the feasibility of robust UWB range error mitigation with low computational power.
Translated: 2021-06-06 14:56:42, Published: 2021-04-28
# Light Gradient Boosting Machine as a Regression Method for Quantitative Structure-Activity Relationships

Light Gradient Boosting Machine as a Regression Method for Quantitative Structure-Activity Relationships ( http://arxiv.org/abs/2105.08626v1 )

License: CC BY 4.0
In the pharmaceutical industry, where it is common to generate many QSAR models with large numbers of molecules and descriptors, the best QSAR methods are those that generate the most accurate predictions while also being insensitive to hyperparameters and computationally efficient. Here we compare Light Gradient Boosting Machine (LightGBM) to random forest, single-task deep neural nets, and Extreme Gradient Boosting (XGBoost) on 30 in-house data sets. While any boosting algorithm has many adjustable hyperparameters, we can define a set of standard hyperparameters at which LightGBM makes predictions about as accurate as single-task deep neural nets, while being about 1000-fold faster than random forest and ~4-fold faster than XGBoost in terms of total computational time for the largest models. Another very useful feature of LightGBM is that it includes a native method for estimating prediction intervals.
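The abstract does not say which "native method" for prediction intervals is meant; a plausible reading is LightGBM's quantile regression objective (`objective="quantile"` with its `alpha` parameter), where fitting a low-quantile and a high-quantile model yields an interval. That interpretation is an assumption. The sketch below does not call LightGBM; it only shows the pinball loss that quantile regression minimizes, and checks that the best constant prediction lands on the requested quantile.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean quantile (pinball) loss; its minimiser is the alpha-quantile.

    Under-predictions are weighted by alpha, over-predictions by 1 - alpha.
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += max(alpha * diff, (alpha - 1) * diff)
    return total / len(y_true)

ys = list(range(1, 10))  # toy targets 1..9
best_median = min(range(1, 10), key=lambda c: pinball_loss(ys, [c] * len(ys), 0.5))
best_upper = min(range(1, 10), key=lambda c: pinball_loss(ys, [c] * len(ys), 0.9))
print(best_median, best_upper)
```

With `alpha=0.5` the loss reduces to half the absolute error, so the median wins; raising `alpha` pushes the optimum toward the upper tail, which is what an upper interval bound needs.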
Translated: 2021-05-20 08:15:36, Published: 2021-04-28
# An Experimental Analysis of Work-Life Balance Among The Employees using Machine Learning Classifiers

An Experimental Analysis of Work-Life Balance Among The Employees using Machine Learning Classifiers ( http://arxiv.org/abs/2105.07837v1 )

License: see link
Researchers have recognized the importance of Artificial Intelligence and Machine Learning (ML) in our daily lives, and their potential to improve the quality of life in cities and nations alike. For example, it is currently speculated that ML can help relieve workers by predicting effective working schedules and patterns that increase their efficiency, ultimately leading to a better Work-Life Balance (WLB) for workers. How can this be achieved? ML algorithms can be used to predict and quantify the factors that affect workers' perceived work-life balance. To do this, we considered data from 12,756 people. Analysing the data across various factors, we identified correlations between these factors and WLB; some of them play a major role in WLB and must be taken into serious consideration. We trained Random Forest, SVM, and Naive Bayes classifiers on 80% of the data; on the held-out data, the best accuracy in predicting WLB was 71.5%.
Translated: 2021-05-18 17:21:11, Published: 2021-04-28
# Group Feature Learning and Domain Adversarial Neural Network for aMCI Diagnosis System Based on EEG

Group Feature Learning and Domain Adversarial Neural Network for aMCI Diagnosis System Based on EEG ( http://arxiv.org/abs/2105.06270v1 )

License: CC BY 4.0
Medical diagnostic robot systems have received increasing attention due to their objectivity and accuracy. The diagnosis of mild cognitive impairment (MCI) is considered an effective means to prevent Alzheimer's disease (AD). Doctors diagnose MCI based on various clinical examinations, which are expensive and whose results depend on the doctors' knowledge. It is therefore necessary to develop a robot diagnostic system that eliminates the influence of human factors and achieves higher accuracy. In this paper, we propose a novel Group Feature Domain Adversarial Neural Network (GF-DANN) for amnestic MCI (aMCI) diagnosis, which involves two important modules. A Group Feature Extraction (GFE) module is proposed to reduce individual differences by learning group-level features through adversarial learning. A Dual Branch Domain Adaptation (DBDA) module is carefully designed to reduce the distribution difference between the source and target domains through domain adaptation. On three types of data sets, GF-DANN achieves the best accuracy compared with classic machine learning and deep learning methods. On the DMS data set, GF-DANN obtains an accuracy of 89.47%, with a sensitivity of 90% and a specificity of 89%. In addition, by comparing three EEG data collection paradigms, our results demonstrate that the DMS paradigm has the potential to support an aMCI diagnosis robot system.
Translated: 2021-05-15 12:06:20, Published: 2021-04-28
# UVStyle-Net: Unsupervised Few-shot Learning of 3D Style Similarity Measure for B-Reps

UVStyle-Net: Unsupervised Few-shot Learning of 3D Style Similarity Measure for B-Reps ( http://arxiv.org/abs/2105.02961v1 )

License: CC BY-SA 4.0
Boundary Representations (B-Reps) are the industry standard in 3D Computer Aided Design/Manufacturing (CAD/CAM) and industrial design due to their fidelity in representing stylistic details. However, they have been ignored in 3D style research. Existing 3D style metrics typically operate on meshes or pointclouds, and fail to account for end-user subjectivity by adopting fixed definitions of style, either through crowd-sourcing for style labels or hand-crafted features. We propose UVStyle-Net, a style similarity measure for B-Reps that leverages the style signals in the second order statistics of the activations in a pre-trained (unsupervised) 3D encoder, and learns their relative importance to a subjective end-user through few-shot learning. Our approach differs from all existing data-driven 3D style methods since it may be used in completely unsupervised settings, which is desirable given the lack of publicly available labelled B-Rep datasets. More importantly, the few-shot learning accounts for the inherent subjectivity associated with style. We show quantitatively that our proposed method with B-Reps is able to capture stronger style signals than alternative methods on meshes and pointclouds despite its significantly greater computational efficiency. We also show it is able to generate meaningful style gradients with respect to the input shape, and that few-shot learning with as few as two positive examples selected by an end-user is sufficient to significantly improve the style measure. Finally, we demonstrate its efficacy on a large unlabeled public dataset of CAD models. Source code and data will be released in the future.
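"Second order statistics of the activations" are commonly computed as Gram matrices, as in neural style transfer. The sketch below assumes that form, a mean over sampled points, and a weighted sum of per-layer squared Frobenius distances. In UVStyle-Net the per-layer weights are learned from a few end-user examples; here they are simply passed in, and the encoder itself is not shown.

```python
def gram_matrix(features):
    """Second-order statistics of one layer's activations.

    `features` is a list of C channels, each a list of N activations (e.g. one
    value per sampled point). Entry (a, b) is the mean product of channels a and b.
    """
    n = len(features[0])
    return [[sum(fa[i] * fb[i] for i in range(n)) / n for fb in features]
            for fa in features]

def style_distance(layers_x, layers_y, weights):
    """Weighted sum over layers of squared Frobenius distances between Gram matrices.

    `weights` plays the role of the per-layer importances that would be learned
    from a few user-selected examples (given here as fixed inputs).
    """
    total = 0.0
    for fx, fy, w in zip(layers_x, layers_y, weights):
        gx, gy = gram_matrix(fx), gram_matrix(fy)
        total += w * sum((a - b) ** 2 for rx, ry in zip(gx, gy) for a, b in zip(rx, ry))
    return total

layer_a = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels x 2 points
layer_b = [[2.0, 4.0], [3.0, 4.0]]
print(gram_matrix(layer_a), style_distance([layer_a], [layer_b], [2.0]))
```

Because the Gram matrix averages over points, it discards where a feature occurs and keeps only which features co-occur, which is why it serves as a style signal rather than a shape signal.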
Translated: 2021-05-11 09:44:58, Published: 2021-04-28
# Multi-Task Learning of Query Intent and Named Entities using Transfer Learning

Multi-Task Learning of Query Intent and Named Entities using Transfer Learning ( http://arxiv.org/abs/2105.03316v1 )

License: CC BY 4.0
Named entity recognition (NER) has been studied extensively, and the earlier algorithms were based on sequence labeling models like Hidden Markov Models (HMM) and conditional random fields (CRF). These were followed by neural network based deep learning models. Recently, BERT has shown new state-of-the-art accuracy in sequence labeling tasks like NER. In this short article, we study various approaches to task-specific NER. Task-specific NER has two components: identifying the intent of a piece of text (like a search query), and then labeling the query with task-specific named entities. For example, we consider the task of labeling Target store locations in a search query (which could be entered in a search box or spoken to a device like Alexa or Google Home). Store locations are highly ambiguous, and it is sometimes difficult to differentiate between, say, a location and a non-location. For example, "pickup my order at orange store" has "orange" as the store location, while "buy orange at target" has "orange" as a fruit. We address this difficulty with multi-task learning, which we call global-to-local transfer of information. We jointly learn the query intent (i.e. store lookup) and the named entities by using multiple loss functions in our BERT-based model and find interesting results.
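A minimal sketch of what "jointly learning the query intent and the named entities with multiple loss functions" can look like: one cross-entropy term for the query-level intent and a mean per-token cross-entropy for the entity tags, combined with a weight. The weighting scheme (`ner_weight`) is an assumption; the abstract does not say how the losses are combined, and the BERT encoder producing the logits is not shown.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example (max-shifted for numerical stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def joint_loss(intent_logits, intent_label, token_logits, token_labels, ner_weight=1.0):
    """Multi-task objective: query-intent loss plus the mean per-token NER loss.

    intent_logits: scores over intents (e.g. store lookup vs. other) for the query.
    token_logits: one score list per token over entity tags (e.g. O vs. STORE_LOC).
    """
    intent = cross_entropy(intent_logits, intent_label)
    ner = sum(cross_entropy(l, t) for l, t in zip(token_logits, token_labels)) / len(token_labels)
    return intent + ner_weight * ner

print(joint_loss([2.0, -1.0], 0, [[1.0, 0.0], [0.0, 3.0]], [0, 1]))
```

Sharing the encoder between the two terms is what lets the query-level intent signal (global) inform the token-level tagging (local), for example when deciding whether "orange" is a store location.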
Translated: 2021-05-11 09:12:37, Published: 2021-04-28
# Deep Neural Networks Based Weight Approximation and Computation Reuse for 2-D Image Classification

Deep Neural Networks Based Weight Approximation and Computation Reuse for 2-D Image Classification ( http://arxiv.org/abs/2105.02954v1 )

License: see link
Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation challenging, especially on resource-constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method that improves DNN performance by fusing approximate computing with data reuse techniques for image recognition applications. DNN weights are approximated with linear and quadratic approximation methods during the training phase; then all of the weights are replaced with the linear/quadratic coefficients, so that inference computes different weights from the same coefficients. This leads to a repetition of the weights across the processing element (PE) array, which in turn enables the reuse of DNN sub-computations (computational reuse) and of the same data (data reuse) to reduce DNN computations and memory accesses and to improve energy efficiency, albeit at the cost of increased training time. A complete analysis on the MNIST and CIFAR-10 datasets is presented for image recognition, where LeNet-5 showed a 1211.3x reduction in the number of parameters with an accuracy drop of less than 0.9%. Compared to the state-of-the-art Row Stationary (RS) method, the proposed architecture saved 54% of the total number of adders and multipliers needed. Overall, the proposed approach is suitable for IoT edge devices, as it reduces both the memory size requirement and the number of memory accesses needed.
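The linear variant of the weight approximation can be sketched as a least-squares fit $w_i \approx a\,i + b$ over a group of weights, after which only $(a, b)$ is stored and the weights are regenerated on the fly. How the paper groups and indexes weights is not stated in the abstract, so consecutive weights indexed by position are assumed here, and the quadratic variant is omitted.

```python
def fit_linear(weights):
    """Least-squares fit w_i ~ a*i + b over one group of weights (i = 0..n-1)."""
    n = len(weights)
    mean_i = (n - 1) / 2
    mean_w = sum(weights) / n
    var_i = sum((i - mean_i) ** 2 for i in range(n))
    cov = sum((i - mean_i) * (w - mean_w) for i, w in enumerate(weights))
    a = cov / var_i if var_i else 0.0  # a single weight has no slope
    b = mean_w - a * mean_i
    return a, b

def reconstruct(a, b, n):
    """Regenerate n approximate weights from the two stored coefficients."""
    return [a * i + b for i in range(n)]

group = [0.9, 3.1, 5.0, 7.05]  # hypothetical weight group
a, b = fit_linear(group)
print(a, b, reconstruct(a, b, len(group)))
```

Storing two coefficients per group of $n$ weights compresses that group by a factor of $n/2$. The dot product also factors as $\sum_i x_i (a i + b) = a \sum_i i\,x_i + b \sum_i x_i$, which gives one intuition for how approximated weights can share sub-computations.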
Translated: 2021-05-11 08:36:35; Published: 2021-04-28
# A Review of Physical and Data-Driven Nowcasting Methods Using Sky Images

A review on physical and data-driven based nowcasting methods using sky images ( http://arxiv.org/abs/2105.02959v1 )

License: see link
Among all renewable energy sources (RES), solar is the most popular and is of particular interest because of its wide integration into the power grid. However, due to the intermittent nature of solar energy, forecasting solar irradiance is of great significance for ensuring an uninterrupted and reliable power supply that meets energy demand. There are several approaches to solar irradiance forecasting, for instance satellite-based, sky-image-based, machine-learning-based, and numerical-weather-prediction-based methods. In this paper, we present a review of short-term, intra-hour solar prediction techniques, known as nowcasting methods, that use sky images. We also report and discuss which sky image features are significant for nowcasting methods.
Translated: 2021-05-11 08:36:07; Published: 2021-04-28
# A Deep Transfer Learning-Based Edge Computing Method for Home Health Monitoring

A Deep Transfer Learning-based Edge Computing Method for Home Health Monitoring ( http://arxiv.org/abs/2105.02960v1 )

License: see link
Health care comes under huge stress in a pandemic or epidemic. Some diseases, such as COVID-19, which causes a pandemic, spread easily from an infected person to others. Therefore, providing health services at home, in isolation, for non-critically infected patients can help mitigate this kind of stress. This practice is also very useful for monitoring the health-related activities of elderly people who live at home. Home health monitoring, i.e., continuous monitoring of a patient or elderly person at home using visual sensors, is one such non-intrusive sub-area of home health services. In this article, we propose a transfer learning-based edge computing method for home health monitoring. Specifically, a pre-trained convolutional neural network-based model can be fine-tuned on edge devices using a small amount of ground-labeled data. Therefore, on-site computing of visual data captured by RGB, depth, or thermal sensors becomes possible in an affordable way. As a result, raw data captured by these types of sensors need not be sent outside the home, so privacy, security, and bandwidth scarcity should not be issues. Moreover, real-time computing for the above purposes should be possible in an economical way.
Translated: 2021-05-11 08:35:23; Published: 2021-04-28
# Nonlinear State-Space Identification Using Deep Encoder Networks

Nonlinear state-space identification using deep encoder networks ( http://arxiv.org/abs/2012.07697v2 )

License: CC BY 4.0
Nonlinear state-space identification for dynamical systems is most often performed by minimizing the simulation error to reduce the effect of model errors. This optimization problem becomes computationally expensive for large datasets. Moreover, the problem is also strongly non-convex, often leading to sub-optimal parameter estimates. This paper introduces a method that approximates the simulation loss by splitting the data set into multiple independent sections similar to the multiple shooting method. This splitting operation allows for the use of stochastic gradient optimization methods which scale well with data set size and has a smoothing effect on the non-convex cost function. The main contribution of this paper is the introduction of an encoder function to estimate the initial state at the start of each section. The encoder function estimates the initial states using a feed-forward neural network starting from historical input and output samples. The efficiency and performance of the proposed state-space encoder method is illustrated on two well-known benchmarks where, for instance, the method achieves the lowest known simulation error on the Wiener--Hammerstein benchmark.
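The splitting step can be sketched as follows, assuming a scalar state and hypothetical callables `encoder`, `f` (state update), and `h` (output map), which in the paper would be neural networks. Each section pairs a window of past samples, from which the encoder estimates the initial state, with a short horizon that is simulated from that estimate; because sections are independent, they can be drawn in mini-batches for stochastic gradient descent.

```python
from typing import Callable, List, Sequence, Tuple

Section = Tuple[List[float], List[float], List[float], List[float]]


def make_sections(u: Sequence[float], y: Sequence[float],
                  n_hist: int, horizon: int) -> List[Section]:
    """Cut one long input/output record into overlapping, independent sections.

    Section k holds the n_hist past samples (encoder input used to estimate x_k)
    and the next `horizon` samples (simulated from that estimated state).
    """
    sections = []
    for k in range(n_hist, len(u) - horizon + 1):
        sections.append((list(u[k - n_hist:k]), list(y[k - n_hist:k]),
                         list(u[k:k + horizon]), list(y[k:k + horizon])))
    return sections


def section_loss(section: Section,
                 encoder: Callable[[List[float], List[float]], float],
                 f: Callable[[float, float], float],
                 h: Callable[[float], float]) -> float:
    """Mean squared simulation error over one section, starting from the encoded state."""
    past_u, past_y, fut_u, fut_y = section
    x = encoder(past_u, past_y)  # estimated initial state
    err = 0.0
    for u_t, y_t in zip(fut_u, fut_y):
        err += (h(x) - y_t) ** 2  # output equation y = h(x)
        x = f(x, u_t)             # state equation x+ = f(x, u)
    return err / len(fut_y)
```

Shorter horizons give more, cheaper, and smoother subproblems, while longer ones approach the full simulation loss.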
Translated: 2021-05-08 21:48:21; Published: 2021-04-28
# Nonlinear State-Space Model Identification from Video Data Using Deep Encoders

Non-linear State-space Model Identification from Video Data using Deep Encoders ( http://arxiv.org/abs/2012.07721v2 )

License: CC BY 4.0
Identifying systems with high-dimensional inputs and outputs, such as systems measured by video streams, is a challenging problem with numerous applications in robotics, autonomous vehicles and medical imaging. In this paper, we propose a novel non-linear state-space identification method starting from high-dimensional input and output data. Multiple computational and conceptual advances are combined to handle the high-dimensional nature of the data. An encoder function, represented by a neural network, is introduced to learn a reconstructability map to estimate the model states from past inputs and outputs. This encoder function is jointly learned with the dynamics. Furthermore, multiple computational improvements, such as an improved reformulation of multiple shooting and batch optimization, are proposed to keep the computational time under control when dealing with high-dimensional and large datasets. We apply the proposed method to a video stream of a simulated environment of a controllable ball in a unit box. The simulation study shows low simulation error with excellent long term prediction for the obtained model using the proposed method.
Translated: 2021-05-08 21:38:36; Published: 2021-04-28
# Causal Discovery of Flight Service Processes Based on Event Sequences

Causal Discovery of Flight Service Process Based on Event Sequence ( http://arxiv.org/abs/2105.00866v1 )

License: CC BY 4.0
The development of the civil aviation industry has continuously raised the requirements for the efficiency of airport ground support services. Existing ground support research does not yet offer a process model obtained directly from ground support logs with which to study the causal relationships between service nodes and flight delays. Most ground support studies mainly use machine learning methods to predict flight delays, and the flight support model they are based on is an idealized one. These studies do not investigate in depth the causal mechanism behind the ground support links and therefore do not reveal the true causes of flight delays. Consequently, machine learning predictions of flight delays deviate to some extent, and the idealized model on which the research is based also deviates from the actual service process. It is therefore of practical significance to obtain the process model from the support log and analyze its causality. However, existing process causal factor discovery methods only work under the assumption of causal sufficiency and do not consider the existence of latent variables. This article therefore proposes a framework for discovering process causal factors without assuming causal sufficiency. An optimized fuzzy-mining process model is used as the service benchmark model, and a local causal discovery algorithm is used to discover the causal factors. Within this framework, the paper proposes a new Markov blanket discovery algorithm that does not assume causal sufficiency, and tests it on benchmark datasets. Finally, the method is applied to actual flight service data.
Translated: 2021-05-06 06:44:43; Published: 2021-04-28
# Approximate Bayesian Computation for an Explicit-Duration Hidden Markov Model of COVID-19 Hospital Trajectories

Approximate Bayesian Computation for an Explicit-Duration Hidden Markov Model of COVID-19 Hospital Trajectories ( http://arxiv.org/abs/2105.00773v1 )

License: CC BY 4.0
We address the problem of modeling constrained hospital resources in the midst of the COVID-19 pandemic in order to inform decision-makers of future demand and assess the societal value of possible interventions. For broad applicability, we focus on the common yet challenging scenario where patient-level data for a region of interest are not available. Instead, given daily admissions counts, we model aggregated counts of observed resource use, such as the number of patients in the general ward, in the intensive care unit, or on a ventilator. In order to explain how individual patient trajectories produce these counts, we propose an aggregate count explicit-duration hidden Markov model, nicknamed the ACED-HMM, with an interpretable, compact parameterization. We develop an Approximate Bayesian Computation approach that draws samples from the posterior distribution over the model's transition and duration parameters given aggregate counts from a specific location, thus adapting the model to a region or individual hospital site of interest. Samples from this posterior can then be used to produce future forecasts of any counts of interest. Using data from the United States and the United Kingdom, we show our mechanistic approach provides competitive probabilistic forecasts for the future even as the dynamics of the pandemic shift. Furthermore, we show how our model provides insight about recovery probabilities or length of stay distributions, and we suggest its potential to answer challenging what-if questions about the societal value of possible interventions.
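For readers unfamiliar with Approximate Bayesian Computation, the simplest variant, rejection ABC, is sketched below: draw parameters from the prior, simulate counts, and keep the draws whose simulation lands close to the observed counts. The single-ward model with a daily discharge probability is a toy stand-in for the ACED-HMM, and the uniform prior, distance, and tolerance `eps` are illustrative assumptions; the abstract does not say which ABC variant the authors use.

```python
import random
from typing import List, Sequence


def simulate_census(admissions: Sequence[int], p_leave: float,
                    rng: random.Random) -> List[int]:
    """Toy single-ward model: each patient leaves on any given day with prob p_leave."""
    census, in_ward = [], 0
    for new in admissions:
        stayed = sum(1 for _ in range(in_ward) if rng.random() >= p_leave)
        in_ward = stayed + new
        census.append(in_ward)
    return census


def rejection_abc(admissions: Sequence[int], observed: Sequence[int],
                  n_draws: int, eps: float, seed: int = 0) -> List[float]:
    """Keep prior draws whose simulated counts land within eps of the observed ones."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.uniform(0.01, 0.99)  # draw from a uniform prior
        sim = simulate_census(admissions, p, rng)
        dist = sum(abs(s - o) for s, o in zip(sim, observed)) / len(observed)
        if dist <= eps:
            accepted.append(p)
    return accepted


admissions = [5] * 30
observed = simulate_census(admissions, 0.2, random.Random(1))  # synthetic "observed" counts
posterior = rejection_abc(admissions, observed, n_draws=300, eps=3.0)
print(len(posterior))
```

The accepted draws approximate the posterior over the discharge probability and could be pushed through `simulate_census` again to produce probabilistic forecasts.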
Translated: 2021-05-06 06:24:29; Published: 2021-04-28
# Model-Driven Deep Learning-Based Channel Estimation and Feedback for Millimeter-Wave Massive Hybrid MIMO Systems

Model-Driven Deep Learning Based Channel Estimation and Feedback for Millimeter-Wave Massive Hybrid MIMO Systems ( http://arxiv.org/abs/2104.11052v2 )

License: see link
This paper proposes a model-driven deep learning (MDDL)-based channel estimation and feedback scheme for wideband millimeter-wave (mmWave) massive hybrid multiple-input multiple-output (MIMO) systems, where the angle-delay domain channels' sparsity is exploited for reducing the overhead. Firstly, we consider the uplink channel estimation for time-division duplexing systems. To reduce the uplink pilot overhead for estimating the high-dimensional channels from a limited number of radio frequency (RF) chains at the base station (BS), we propose to jointly train the phase shift network and the channel estimator as an auto-encoder. Particularly, by exploiting the channels' structured sparsity from an a priori model and learning the integrated trainable parameters from the data samples, the proposed multiple-measurement-vectors learned approximate message passing (MMV-LAMP) network with the devised redundant dictionary can jointly recover multiple subcarriers' channels with significantly enhanced performance. Moreover, we consider the downlink channel estimation and feedback for frequency-division duplexing systems. Similarly, the pilots at the BS and channel estimator at the users can be jointly trained as an encoder and a decoder, respectively. Besides, to further reduce the channel feedback overhead, only the received pilots on part of the subcarriers are fed back to the BS, which can exploit the MMV-LAMP network to reconstruct the spatial-frequency channel matrix. Numerical results show that the proposed MDDL-based channel estimation and feedback scheme outperforms the state-of-the-art approaches.
Translated: 2021-05-03 19:49:48; Published: 2021-04-28
# Time Series Forecasting of New Cases and New Death Rates for COVID-19 Using Deep Learning Methods

Time Series Forecasting of New Cases and New Deaths Rate for COVID-19 using Deep Learning Methods ( http://arxiv.org/abs/2104.15007v1 )

License: see link
COVID-19 emerged in 2019, leading to restrictions in many countries and imposing costs on organisations and governments. Predicting the number of new cases and deaths during this period can be a useful step toward predicting the costs and facilities required in the future. The purpose of this study is to predict new cases and the death rate seven days ahead. These predictions are modelled for 100 days using deep learning methods and statistical analysis. Six deep learning methods are examined on data obtained from the WHO website: three base methods, LSTM, Convolutional LSTM, and GRU, each of which is also considered in bi-directional mode, to forecast the rate of new cases and new deaths for Australia and Iran. This study is novel in that it applies these three deep learning methods, along with their bi-directional variants, to predict COVID-19 new-case and new-death-rate time series. All methods are compared, and the results are presented as graphs and statistical analyses. The results show that the bi-directional models have lower error than the other models. Several error evaluation metrics are presented to compare all models, and they confirm the superiority of the bi-directional methods. Experimental results and statistical tests on the datasets compare the proposed method with other baseline methods. This research could be useful for organisations working against COVID-19 in determining their long-term plans.
Translated: 2021-05-03 13:48:40; Published: 2021-04-28
# A Machine Learning-Based System for Vessel Turnaround Time Prediction

Machine Learning based System for Vessel Turnaround Time Prediction ( http://arxiv.org/abs/2104.14980v1 )

License: see link
In this paper, we present a novel system for predicting vessel turnaround time, based on machine learning and standardized port call data. We also investigate the use of specific external maritime big data to enhance the accuracy of the available data and improve the performance of the developed system. An extensive evaluation is performed at the Port of Bordeaux, where we report results on 11 years of historical port call data and provide verification on live operational data from the port. The proposed automated, data-driven turnaround time prediction system achieves higher accuracy than the current manual, expert-based system at the Port of Bordeaux.
Translated: 2021-05-03 13:34:29; Published: 2021-04-28
# The Effectiveness of Unsupervised Subword Modeling with Autoregressive and Cross-Lingual Phone-Aware Networks

The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks ( http://arxiv.org/abs/2012.09544v2 )

License: see link
This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations that can distinguish between subword units of a language. We propose a two-stage learning framework that combines self-supervised learning and cross-lingual knowledge transfer. The framework consists of autoregressive predictive coding (APC) as the front-end and a cross-lingual deep neural network (DNN) as the back-end. Experiments on the ABX subword discriminability task, conducted with the Libri-light and ZeroSpeech 2017 databases, showed that our approach is competitive with or superior to state-of-the-art studies. Comprehensive and systematic analyses at the phoneme and articulatory feature (AF) levels showed that our approach was better at capturing diphthong than monophthong vowel information; differences were also observed in the amount of information captured for different types of consonants. Moreover, a positive correlation was found between the effectiveness of the back-end in capturing a phoneme's information and the quality of the cross-lingual phone labels assigned to that phoneme. The AF-level analysis, together with t-SNE visualization results, showed that the proposed approach captures manner and place of articulation information, vowel height, and backness information better than MFCC and APC features. Taken together, the analyses showed that both stages of our approach are effective in capturing phoneme and AF information. Nevertheless, monophthong vowel information is captured less well than consonant information, which suggests that future research should focus on improving the capture of monophthong vowel information.
Translated: 2021-05-02 07:18:52; Published: 2021-04-28
# A Smartphone-Based Application for Skin Cancer Classification Using Deep Learning with Clinical Images and Lesion Information

A Smartphone based Application for Skin Cancer Classification Using Deep Learning with Clinical Images and Lesion Information ( http://arxiv.org/abs/2104.14353v1 )

License: CC BY 4.0
Over the last decades, the incidence of skin cancer, both melanoma and non-melanoma, has increased continuously. For melanoma in particular, the deadliest type of skin cancer, early detection is important for improving patient prognosis. Recently, deep neural networks (DNNs) have become a viable approach to skin cancer detection. In this work, we present a smartphone-based application to assist in skin cancer detection. The application is based on a Convolutional Neural Network (CNN) trained on clinical images and patient demographics, both collected from smartphones. Also, as skin cancer datasets are imbalanced, we present an approach to balance the data based on the mutation operator of the Differential Evolution (DE) algorithm. Besides providing a flexible tool to assist doctors in the skin cancer screening phase, the method obtains promising results, with a balanced accuracy of 85% and a recall of 96%.
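The DE/rand/1 mutation operator, v = x1 + F·(x2 - x3), can be reused to synthesize extra minority-class samples, as sketched below. This assumes the data are numeric feature vectors (for example demographic or lesion attributes) and uses a hypothetical scale factor F = 0.5; the abstract does not specify how the authors apply the operator to images or metadata.

```python
import random
from typing import List, Sequence


def de_oversample(minority: Sequence[Sequence[float]], n_new: int,
                  F: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority vectors with DE/rand/1: v = x1 + F * (x2 - x3)."""
    if len(minority) < 3:
        raise ValueError("DE/rand/1 needs at least three parent vectors")
    rng = random.Random(seed)
    pool = list(minority)
    synthetic = []
    for _ in range(n_new):
        x1, x2, x3 = rng.sample(pool, 3)  # three distinct parents
        synthetic.append([a + F * (b - c) for a, b, c in zip(x1, x2, x3)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
balanced_minority = minority + de_oversample(minority, n_new=6)
print(len(balanced_minority))  # -> 10
```

Because each new vector is a base sample perturbed by a scaled difference of two others, the synthetic points follow the spread of the existing minority class rather than being exact copies.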
Translated: 2021-05-01 03:00:03; Published: 2021-04-28
# Defining the Predictors of Lightning over India Using an Artificial Neural Network

Defined the predictors of the lightning over India by using artificial neural network ( http://arxiv.org/abs/2104.13958v1 )

License: CC BY 4.0
Lightning causes tremendous loss of life and property. However, only recently has lightning been considered one of the major natural calamities, and it is now studied and monitored with proper instrumentation. The lightning characteristics over India have been studied using low-resolution daily time series and high-resolution monthly climatology. We used an ANN time-series method (a neural network) to analyze the time series and determine which variable is the best predictor of lightning over India. The lightning time series is the output (dependent variable), and the inputs (independent variables) are the K-index, AOD, CAPE, etc. Gaussian process regression, support vector machines, regression trees, and linear regression were used to define the input variables, which show an approximately linear relationship.
Translated: 2021-05-01 02:45:49; Published: 2021-04-28
# Tail-Net: Extracting Lowest Singular Triplets for Big Data Applications

Tail-Net: Extracting Lowest Singular Triplets for Big Data Applications ( http://arxiv.org/abs/2104.13968v1 )

License: CC BY 4.0
Singular Value Decomposition (SVD) serves as an exploratory tool for identifying the dominant features, in the form of the top rank-r singular factors corresponding to the largest singular values. For Big Data applications, it is well known that SVD is restrictive due to its main-memory requirements. However, a number of applications, such as community detection, clustering, or bottleneck identification in large-scale graph datasets, rely on identifying the lowest singular values and the corresponding singular vectors. For example, the lowest singular values of a graph Laplacian reveal the number of isolated clusters (zero singular values) or bottlenecks (lowest non-zero singular values) for undirected, acyclic graphs. A naive approach would be to perform a full SVD; however, this quickly becomes infeasible for practical big data applications due to the enormous memory requirements. Furthermore, such applications need only a few of the lowest singular factors, which makes a full decomposition computationally exorbitant. In this work, we trivially extend the previously proposed Range-Net to **Tail-Net** for the memory- and compute-efficient extraction of the lowest singular factors of a given big dataset and a specified rank r. We present a number of numerical experiments on both synthetic and practical datasets for verification and benchmarking, using conventional SVD as the baseline.
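The Laplacian property cited above is easy to check on a small graph: the Laplacian L = D - A of an undirected graph is symmetric positive semidefinite, so its singular values equal its eigenvalues, and the number of zero singular values (n - rank L) equals the number of connected components. The sketch computes the rank exactly with fractions instead of running an SVD, purely to stay dependency-free; it illustrates the property Tail-Net targets, not the Tail-Net method itself, which avoids full decompositions on large data.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def laplacian(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    return L


def exact_rank(M: Sequence[Sequence[int]]) -> int:
    """Rank via Gauss-Jordan elimination over the rationals (no rounding error)."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c] / A[r][c]
                A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r


def count_zero_singular_values(M: Sequence[Sequence[int]]) -> int:
    """For a square matrix, the number of zero singular values is n - rank."""
    return len(M) - exact_rank(M)


# Triangle {0, 1, 2} plus a separate edge {3, 4}: two connected components.
print(count_zero_singular_values(laplacian(5, [(0, 1), (1, 2), (0, 2), (3, 4)])))  # -> 2
```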
Translated: 2021-05-01 02:38:30; Published: 2021-04-28
# Domain-Specific Genetic Algorithm for Multi-Tenant DNN Accelerator Scheduling

Domain-specific Genetic Algorithm for Multi-tenant DNNAccelerator Scheduling ( http://arxiv.org/abs/2104.13997v1 )

License: CC BY 4.0
As Deep Learning continues to drive a variety of applications in datacenters and HPC, there is a growing trend towards building large accelerators with several sub-accelerator cores/chiplets. This work looks at the problem of supporting multi-tenancy on such accelerators. In particular, we focus on the problem of mapping layers from several DNNs simultaneously onto an accelerator. Given the extremely large search space, we formulate the search as an optimization problem and develop a specialized genetic algorithm called G# with custom operators to enable structured, sample-efficient exploration. We quantitatively compare G# with several common heuristics, state-of-the-art optimization methods, and reinforcement learning methods across different accelerator settings (large/small accelerators) and different sub-accelerator configurations (homogeneous/heterogeneous), and observe that G# can consistently find better solutions. Further, to enable real-time scheduling, we also demonstrate a method to generalize the learnt schedules and transfer them to the next batch of jobs, reducing schedule compute time to near zero.
Translated: 2021-05-01 02:28:12; Published: 2021-04-28
# Simplified Kalman Filter for Online Rating: One-Fits-All Approach

Simplified Kalman filter for online rating: one-fits-all approach ( http://arxiv.org/abs/2104.14012v1 )

License: CC BY 4.0
In this work, we deal with the problem of rating in sports, where the skills of the players/teams are inferred from the observed outcomes of the games. Our focus is on online rating algorithms, which estimate the skills after each new game by exploiting probabilistic models of the relationship between the skills and the game outcome. We propose a Bayesian approach that may be seen as an approximate Kalman filter and that is generic in the sense that it can be used with any skills-outcome model and applied in individual as well as group sports. We show how well-known algorithms (such as the Elo, Glicko, and TrueSkill algorithms) may be seen as instances of the one-fits-all approach we propose. To clarify the conditions under which the gains of the Bayesian approach over simpler solutions can actually materialize, we critically compare the known and new algorithms by means of numerical examples using synthetic as well as empirical data.
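As a concrete reference point for the Elo instance mentioned above, the standard Elo update is sketched below. Roughly speaking, it tracks only a mean skill per player and applies a fixed gain K to the prediction error, whereas Glicko and TrueSkill also track skill uncertainty. K = 32 and the 400-point scale are commonly used illustrative values, not parameters from the paper.

```python
from typing import Tuple


def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Predicted probability that player A beats player B (logistic skills-outcome model)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, score_a: float,
               k: float = 32.0, scale: float = 400.0) -> Tuple[float, float]:
    """One online rating step after a game; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    delta = k * (score_a - elo_expected(r_a, r_b, scale))
    return r_a + delta, r_b - delta  # zero-sum: B moves by the opposite amount


print(elo_update(1500.0, 1500.0, 1.0))  # -> (1516.0, 1484.0)
```

In the Kalman-filter view, the fixed K plays the role of a gain that a full filter would instead derive from the current skill variance.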
Translated: 2021-05-01 01:14:08; Published: 2021-04-28
# Algorithmic Factors Influencing Bias in Machine Learning

Algorithmic Factors Influencing Bias in Machine Learning ( http://arxiv.org/abs/2104.14014v1 )

License: CC BY 4.0
It is fair to say that many of the prominent examples of bias in Machine Learning (ML) arise from bias present in the training data. In fact, some would argue that supervised ML algorithms cannot be biased; they reflect the data on which they are trained. In this paper we demonstrate how ML algorithms can misrepresent the training data through underestimation. We show how irreducible error, regularization, and feature and class imbalance can contribute to this underestimation. The paper concludes with a demonstration of how careful management of synthetic counterfactuals can ameliorate the impact of this underestimation bias.
# Reducing Risk and Uncertainty of Deep Neural Networks on Diagnosing COVID-19 Infection

Reducing Risk and Uncertainty of Deep Neural Networks on Diagnosing COVID-19 Infection ( http://arxiv.org/abs/2104.14029v1 )

Krishanu Sarker, Sharbani Pandit, Anupam Sarker, Saeid Belkasim and Shihao Ji

Effective and reliable screening of patients via Computer-Aided Diagnosis can play a crucial part in the battle against COVID-19. Most existing works focus on developing sophisticated methods that yield high detection performance, but do not address the issue of predictive uncertainty. In this work, we introduce uncertainty estimation to detect confusing cases for expert referral, addressing the unreliability of state-of-the-art (SOTA) DNNs on COVID-19 detection. To the best of our knowledge, we are the first to address this issue in the COVID-19 detection problem. We investigate a number of SOTA uncertainty estimation methods on a publicly available COVID dataset and present our experimental findings. In collaboration with medical professionals, we further validate the results to ensure the viability of the best-performing method in clinical practice.
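One common way to turn predictive uncertainty into expert referral is to average the class probabilities from several stochastic forward passes (for example Monte Carlo dropout or an ensemble) and refer the case when the entropy of that average exceeds a threshold. The sketch below assumes this entropy-based rule and a hypothetical threshold; the abstract does not say which uncertainty estimation method performed best.

```python
import math
from typing import List, Sequence, Tuple, Union


def mean_probs(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities over stochastic forward passes (or ensemble members)."""
    n = len(samples)
    return [sum(s[c] for s in samples) / n for c in range(len(samples[0]))]


def predictive_entropy(samples: Sequence[Sequence[float]]) -> float:
    """Entropy (in nats) of the averaged predictive distribution."""
    return -sum(q * math.log(q) for q in mean_probs(samples) if q > 0)


def triage(samples: Sequence[Sequence[float]],
           threshold: float) -> Tuple[Union[int, str], float]:
    """Return ("refer", H) if too uncertain, else (predicted class, H)."""
    h = predictive_entropy(samples)
    if h > threshold:
        return "refer", h
    p = mean_probs(samples)
    return max(range(len(p)), key=p.__getitem__), h


# Passes that disagree push the averaged distribution toward uniform.
print(triage([[0.7, 0.3], [0.4, 0.6], [0.55, 0.45]], threshold=0.5))
```

The threshold trades off automation against expert workload and would normally be tuned on held-out data.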
# A Study of the Mathematics of Deep Learning

A Study of the Mathematics of Deep Learning ( http://arxiv.org/abs/2104.14033v1 )

Anirbit Mukherjee

"Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting-edge of artificial intelligence tasks. This dramatic success of deep learning in the last few years has been hinged on an enormous amount of heuristics and it has turned out to be a serious mathematical challenge to be able to rigorously explain them. In this thesis, submitted to the Department of Applied Mathematics and Statistics, Johns Hopkins University we take several steps towards building strong theoretical foundations for these new paradigms of deep-learning. In chapter 2 we show new circuit complexity theorems for deep neural functions and prove classification theorems about these function spaces which in turn lead to exact algorithms for empirical risk minimization for depth 2 ReLU nets. We also motivate a measure of complexity of neural functions to constructively establish the existence of high-complexity neural functions. In chapter 3 we give the first algorithm which can train a ReLU gate in the realizable setting in linear time in an almost distribution free set up. In chapter 4 we give rigorous proofs towards explaining the phenomenon of autoencoders being able to do sparse-coding. In chapter 5 we give the first-of-its-kind proofs of convergence for stochastic and deterministic versions of the widely used adaptive gradient deep-learning algorithms, RMSProp and ADAM. This chapter also includes a detailed empirical study on autoencoders of the hyper-parameter values at which modern algorithms have a significant advantage over classical acceleration based methods. In the last chapter 6 we give new and improved PAC-Bayesian bounds for the risk of stochastic neural nets. This chapter also includes an experimental investigation revealing new geometric properties of the paths in weight space that are traced out by the net during the training.
# (参考訳) PIDのためのブースト決定木に代わるディープニューラルネットワーク

Deep Neural Network as an alternative to Boosted Decision Trees for PID ( http://arxiv.org/abs/2104.14045v1 )

Denis Stanev, Riccardo Riva, Michele Umassi

In this paper we recreate, and improve, the binary classification method for particles proposed in the paper by Roe et al. (2005), "Boosted decision trees as an alternative to artificial neural networks for particle identification". The particles are tau neutrinos, which we refer to as background, and electron neutrinos, the signal we are interested in. In the original paper the preferred algorithm is a boosted decision tree, owing to its low-effort tuning and good overall performance at the time. We instead implement a deep neural network, which is faster and more promising in performance. We show how, using modern techniques, we are able to improve on the original result in both accuracy and training time.
# Diversity-Aware Batch Active Learning for Dependency Parsing

Diversity-Aware Batch Active Learning for Dependency Parsing ( http://arxiv.org/abs/2104.13936v1 )

Tianze Shi, Adrian Benton, Igor Malioutov, Ozan İrsoy

While the predictive performance of modern statistical dependency parsers relies heavily on the availability of expensive expert-annotated treebank data, not all annotations contribute equally to the training of the parsers. In this paper, we attempt to reduce the number of labeled examples needed to train a strong dependency parser using batch active learning (AL). In particular, we investigate whether enforcing diversity in the sampled batches, using determinantal point processes (DPPs), can improve over their diversity-agnostic counterparts. Simulation experiments on an English newswire corpus show that selecting diverse batches with DPPs is superior to strong selection strategies that do not enforce batch diversity, especially during the initial stages of the learning process. Additionally, our diversity-aware strategy is robust under a corpus duplication setting, where diversity-agnostic sampling strategies exhibit significant degradation.
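Diversity-aware batch selection with a DPP can be sketched with the usual greedy approximation to MAP inference: repeatedly add the candidate that maximizes the determinant of the kernel submatrix of the chosen set. It assumes a kernel K_ij = q_i·s_ij·q_j combining an informativeness score q with a similarity s between sentences; the kernel design, and whether the authors sample from or greedily maximize the DPP, are not stated in the abstract. The naive determinant-per-candidate loop is for clarity, not speed.

```python
from typing import List, Optional, Sequence


def det(M: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < 1e-12:
            return 0.0  # (numerically) singular
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d  # a row swap flips the sign
        d *= A[c][c]
        for i in range(c + 1, n):
            factor = A[i][c] / A[c][c]
            for j in range(c, n):
                A[i][j] -= factor * A[c][j]
    return d


def greedy_dpp(K: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick k items, each time maximizing det(K[S, S]) of the chosen set S."""
    chosen: List[int] = []
    for _ in range(k):
        best: Optional[int] = None
        best_det = float("-inf")
        for i in range(len(K)):
            if i in chosen:
                continue
            S = chosen + [i]
            d = det([[K[a][b] for b in S] for a in S])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break  # fewer than k candidates remain
        chosen.append(best)
    return chosen


# Sentences 0 and 1 are near-duplicates; sentence 2 is different.
K = [[1.0, 0.99, 0.1],
     [0.99, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
print(greedy_dpp(K, 2))  # -> [0, 2]
```

Adding a near-duplicate makes the submatrix nearly singular, so its determinant collapses; this is why the selection skips item 1, mirroring the robustness to corpus duplication reported above.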
# Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples

Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples ( http://arxiv.org/abs/2104.13963v1 )

Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat

This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS). The method trains a model to minimize a consistency loss, which ensures that different views of the same unlabeled instance are assigned similar pseudo-labels. The pseudo-labels are generated non-parametrically, by comparing the representations of the image views to those of a set of randomly sampled labeled images. The distance between the view representations and labeled representations is used to provide a weighting over class labels, which we interpret as a soft pseudo-label. By non-parametrically incorporating labeled samples in this way, PAWS extends the distance-metric loss used in self-supervised methods such as BYOL and SwAV to the semi-supervised setting. Despite the simplicity of the approach, PAWS outperforms other semi-supervised methods across architectures, setting a new state-of-the-art for a ResNet-50 on ImageNet trained with either 10% or 1% of the labels, reaching 75.5% and 66.5% top-1 respectively. PAWS requires 4x to 12x less training than the previous best methods.
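The soft pseudo-label step described above can be sketched as a softmax over similarities between a view's embedding and the embeddings of labeled support samples, used to weight those samples' one-hot labels. The cosine similarity, the temperature `tau = 0.1`, and plain lists in place of tensors are assumptions for illustration; the full method also trains the encoder with a consistency loss between views, which is omitted here.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def soft_pseudo_label(z: Sequence[float],
                      support: Sequence[Sequence[float]],
                      labels: Sequence[Sequence[float]],
                      tau: float = 0.1) -> List[float]:
    """Weight the support samples' one-hot labels by a softmax over cosine similarities."""
    zn = _normalize(z)
    sims = [sum(a * b for a, b in zip(zn, _normalize(s))) / tau for s in support]
    m = max(sims)  # shift for numerical stability
    weights = [math.exp(s - m) for s in sims]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_classes = len(labels[0])
    return [sum(w * y[c] for w, y in zip(weights, labels)) for c in range(n_classes)]


# Two labeled support embeddings, one per class.
support = [[1.0, 0.0], [0.0, 1.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
print(soft_pseudo_label([0.9, 0.1], support, labels))
```

Because no classifier head is involved, the pseudo-label is non-parametric: it depends only on the embeddings and the sampled support set.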
# MeerCRAB: MeerLICHT Classification of Real and Bogus Transients Using Deep Learning

MeerCRAB: MeerLICHT Classification of Real and Bogus Transients using Deep Learning ( http://arxiv.org/abs/2104.13950v1 )

Zafiirah Hosenie, Steven Bloemen, Paul Groot, Robert Lyon, Bart Scheers, Benjamin Stappers, Fiorenzo Stoppa, Paul Vreeswijk, Simon De Wet, Marc Klein Wolt, Elmar Körding, Vanessa McBride, Rudolf Le Poole, Kerry Paterson, Daniëlle L. A. Pieterse and Patrick Woudt

Astronomers require efficient automated detection and classification pipelines when conducting large-scale surveys of the (optical) sky for variable and transient sources. Such pipelines are fundamentally important, as they permit rapid follow-up and analysis of those detections most likely to be of scientific value. We therefore present a deep learning pipeline based on the convolutional neural network architecture called $\texttt{MeerCRAB}$. It is designed to filter out the so-called 'bogus' detections from true astrophysical sources in the transient detection pipeline of the MeerLICHT telescope. Optical candidates are described using a variety of 2D images and numerical features extracted from those images. The relationship between the input images and the target classes is unclear, since the ground truth is poorly defined and often the subject of debate. This makes it difficult to determine which source of information should be used to train a classification algorithm. We therefore used two methods to label our data: (i) thresholding and (ii) a latent class model approach. We deployed variants of $\texttt{MeerCRAB}$ that employed different network architectures trained using different combinations of input images and training set choices, based on classification labels provided by volunteers. The deepest network worked best, with an accuracy of 99.5$\%$ and a Matthews correlation coefficient (MCC) of 0.989. The best model was integrated into the MeerLICHT transient vetting pipeline, enabling accurate and efficient classification of detected transients, which allows researchers to select the most promising candidates for their research goals.
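The reported MCC uses all four cells of the confusion matrix. That makes it more informative than accuracy when real and bogus candidates are imbalanced. Below is a minimal sketch of the standard formula. It also includes a hypothetical thresholding rule for turning volunteer votes into binary labels. The abstract gives neither the threshold nor the vote format, so both are assumptions.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (the usual convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def threshold_label(votes_real, votes_total, threshold=0.5):
    """Hypothetical rule: label a candidate 'real' (1) when the fraction of
    volunteers voting 'real' reaches the threshold, else 'bogus' (0)."""
    return 1 if votes_real / votes_total >= threshold else 0

print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))
print(threshold_label(7, 10, threshold=0.6))
```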
# Learning Syntax from Naturally-Occurring Bracketings

Learning Syntax from Naturally-Occurring Bracketings ( http://arxiv.org/abs/2104.13933v1 )

Tianze Shi, Ozan İrsoy, Igor Malioutov, Lillian Lee

Naturally-occurring bracketings, such as answer fragments to natural language questions and hyperlinks on webpages, can reflect human syntactic intuition regarding phrasal boundaries. Their availability and approximate correspondence to syntax make them appealing as distant information sources to incorporate into unsupervised constituency parsing. But they are noisy and incomplete; to address this challenge, we develop a partial-brackets-aware structured ramp loss for learning. Experiments demonstrate that our distantly-supervised models trained on naturally-occurring bracketing data are more accurate in inducing syntactic structures than competing unsupervised systems. On the English WSJ corpus, our models achieve an unlabeled F1 score of 68.9 for constituency parsing.
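Unlabeled F1 compares predicted and gold constituents as bare spans and ignores nonterminal labels. The sketch below computes it for a single sentence. It also adds a crossing-bracket test, which is one generic way to check whether a parse is consistent with a partial, naturally-occurring bracketing. Both are illustrative helpers, not the paper's structured ramp loss. Evaluation conventions differ between papers, for example whether trivial spans are discarded and whether scores are averaged per sentence or over the corpus, and the sketch does not handle them.

```python
def unlabeled_f1(pred_spans, gold_spans):
    """Unlabeled bracket F1 between two collections of (start, end) spans."""
    pred, gold = set(pred_spans), set(gold_spans)
    matched = len(pred & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

def crosses(a, b):
    """True if half-open spans a and b overlap without one containing the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j

gold = [(0, 2), (2, 5), (3, 5)]
pred = [(0, 2), (2, 4), (3, 5)]
print(unlabeled_f1(pred, gold))
# A tree is compatible with a partial bracket if none of its spans cross it.
print(any(crosses(span, (2, 5)) for span in pred))
```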
# Pushing it out of the Way: Interactive Visual Navigation

Pushing it out of the Way: Interactive Visual Navigation ( http://arxiv.org/abs/2104.14040v1 )

Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi

We have observed significant progress in visual navigation for embodied agents. A common assumption in studying visual navigation is that the environments are static; this is a limiting assumption. Intelligent navigation may involve interacting with the environment beyond just moving forward/backward and turning left/right. Sometimes, the best way to navigate is to push something out of the way. In this paper, we study the problem of interactive navigation, where agents learn to change the environment to navigate more efficiently to their goals. To this end, we introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions. By modeling these changes while planning, we find that agents exhibit significant improvements in their navigational capabilities. More specifically, we consider two downstream tasks in the physics-enabled, visually rich AI2-THOR environment: (1) reaching a target while the path to the target is blocked, and (2) moving an object to a target location by pushing it. For both tasks, agents equipped with an NIE significantly outperform agents that lack an understanding of the effects of their actions, indicating the benefits of our approach.
# EmergencyNet: Efficient Aerial Image Classification for Drone-Based Emergency Monitoring Using Atrous Convolutional Feature Fusion

EmergencyNet: Efficient Aerial Image Classification for Drone-Based Emergency Monitoring Using Atrous Convolutional Feature Fusion ( http://arxiv.org/abs/2104.14006v1 )

Christos Kyrkou and Theocharis Theocharides

Deep learning-based algorithms can provide state-of-the-art accuracy for remote sensing technologies such as unmanned aerial vehicles (UAVs)/drones, potentially enhancing their remote sensing capabilities for many emergency response and disaster management applications. In particular, UAVs equipped with camera sensors can operate in remote, difficult-to-access disaster-stricken areas. They can analyze the imagery and raise alerts when calamities such as collapsed buildings, floods, or fires are present, so that the effects on the environment and on the human population can be mitigated faster. However, the integration of deep learning introduces heavy computational requirements. This prevents the deployment of such deep neural networks in many scenarios that impose low-latency constraints on inference, because mission-critical decisions must be made in real time. To this end, this article focuses on efficient aerial image classification on board a UAV for emergency response/monitoring applications. Specifically, a dedicated Aerial Image Database for Emergency Response applications is introduced, and a comparative analysis of existing approaches is performed. Through this analysis, a lightweight convolutional neural network architecture, referred to as EmergencyNet, is proposed. It is based on atrous convolutions to process multiresolution features and runs efficiently on low-power embedded platforms. Compared with existing models, it achieves up to 20x higher performance with minimal memory requirements and a less than 1% accuracy drop relative to state-of-the-art models.
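Atrous (dilated) convolution spaces the kernel taps `dilation` samples apart. This enlarges the receptive field without adding weights. Applying several dilation rates to the same input yields the multiresolution features the abstract refers to. The sketch below is a one-dimensional, pure-Python illustration only. EmergencyNet itself operates on 2-D feature maps, and its fusion scheme is not reproduced here.

```python
def atrous_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, as in CNN libraries).

    A k-tap kernel with dilation d covers (k - 1) * d + 1 input samples
    while still using only k weights.
    """
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(x) - span + 1):
        out.append(sum(w * x[start + i * dilation] for i, w in enumerate(kernel)))
    return out

x = [1, 2, 3, 4, 5, 6, 7]
edge = [1, 0, -1]  # the same 3 weights at two different dilation rates
print(atrous_conv1d(x, edge, dilation=1))  # [-2, -2, -2, -2, -2]
print(atrous_conv1d(x, edge, dilation=2))  # [-4, -4, -4]
```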
# Automated System for Ship Detection from Medium Resolution Satellite Optical Imagery

Automated System for Ship Detection from Medium Resolution Satellite Optical Imagery ( http://arxiv.org/abs/2104.13923v1 )

Dejan Stepec and Tomaz Martincic and Danijel Skocaj

In this paper, we present a ship detection pipeline for low-cost, medium-resolution satellite optical imagery obtained from the ESA Sentinel-2 and Planet Labs Dove constellations. This optical satellite imagery is readily available for any place on Earth, yet it is underutilized in the maritime domain compared with existing solutions based on synthetic-aperture radar (SAR) imagery. We developed a ship detection method based on a state-of-the-art deep-learning-based object detection method. It was developed and evaluated on a large-scale dataset that was collected and automatically annotated with the help of Automatic Identification System (AIS) data.
# Motion-guided Non-local Spatial-Temporal Network for Video Crowd Counting

Motion-guided Non-local Spatial-Temporal Network for Video Crowd Counting ( http://arxiv.org/abs/2104.13946v1 )

Haoyue Bai, S.-H. Gary Chan

We study video crowd counting, which is to estimate the number of objects (people in this paper) in all the frames of a video sequence. Previous work on crowd counting is mostly on still images. There has been little work on how to properly extract and take advantage of the spatial-temporal correlation between neighboring frames in both short and long ranges to achieve high estimation accuracy for a video sequence. In this work, we propose Monet, a novel and highly accurate motion-guided non-local spatial-temporal network for video crowd counting. Monet first takes people flow (motion information) as guidance to coarsely segment the regions of pixels where a person may be. Given these regions, Monet then uses a non-local spatial-temporal network to extract both short- and long-range spatial-temporal contextual information. The whole network is finally trained end-to-end with a fused loss to generate a high-quality density map. Noting the scarcity and low quality (in terms of resolution and scene diversity) of the publicly available video crowd datasets, we have collected and built a large-scale video crowd counting dataset, VidCrowd, to contribute to the community. VidCrowd contains 9,000 high-resolution frames (2560 x 1440), with 1,150,239 head annotations captured across different scenes, crowd densities and lighting conditions in two cities. We have conducted extensive experiments on the challenging VidCrowd dataset and two public video crowd counting datasets: UCSD and Mall. Our approach achieves substantially better performance in terms of MAE and MSE than other state-of-the-art approaches.
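Density-map methods such as Monet estimate a frame's count by summing (integrating) the predicted density map. Evaluation then compares the per-frame counts. Below is a minimal sketch. Crowd-counting papers often report the root of the mean squared error under the label "MSE", and the abstract does not say which convention it uses. For that reason the function returns MAE and RMSE explicitly.

```python
import math

def count_from_density(density_map):
    """Estimated head count = sum (discrete integral) of a predicted density map."""
    return sum(sum(row) for row in density_map)

def mae_and_rmse(pred_counts, true_counts):
    """Mean absolute error and root mean squared error over frames."""
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Hypothetical 2x2 density map and per-frame counts.
print(count_from_density([[0.5, 0.5], [1.0, 0.0]]))  # 2.0
print(mae_and_rmse([10, 12], [11, 15]))
```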
# Filter Distribution Templates in Convolutional Networks for Image Classification Tasks

Filter Distribution Templates in Convolutional Networks for Image Classification Tasks ( http://arxiv.org/abs/2104.13993v1 )

Ramon Izquierdo-Cordova and Walterio Mayol-Cuevas

Neural network designers have achieved progressively higher accuracy by increasing model depth, introducing new layer types and discovering new combinations of layers. A common element in many architectures is the distribution of the number of filters in each layer. Neural network models keep a design pattern of increasing the number of filters in deeper layers. This holds for LeNet, VGG, ResNet and MobileNet, and even for automatically discovered architectures such as NASNet. It remains unknown whether this pyramidal distribution of filters is the best for different tasks and constraints. In this work, we present a series of modifications to the distribution of filters in four popular neural network models and their effects on accuracy and resource consumption. Results show that by applying this approach, some models improve accuracy by up to 8.9% while reducing parameters by up to 54%.
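A parameter count makes the idea of a filter distribution concrete. The sketch below counts the parameters of a plain stack of 3x3 convolutions under three illustrative distributions. The first is the usual pyramid. The second is a hypothetical "uniform" template with the same total number of filters. The third is the reversed pyramid. The template names and layer widths are assumptions for illustration, not the templates evaluated in the paper. The count also ignores pooling, batch normalization and fully connected layers.

```python
def conv_params(filters, in_channels=3, k=3):
    """Weights + biases of a plain stack of k x k convolution layers."""
    total, c_in = 0, in_channels
    for c_out in filters:
        total += k * k * c_in * c_out + c_out
        c_in = c_out
    return total

def uniform_template(filters):
    """Hypothetical template: same total filter count, spread evenly across layers."""
    per_layer = sum(filters) // len(filters)
    return [per_layer] * len(filters)

pyramid = [64, 128, 256, 512]
uniform = uniform_template(pyramid)  # [240, 240, 240, 240]
reverse = pyramid[::-1]              # [512, 256, 128, 64]
for name, f in [("pyramid", pyramid), ("uniform", uniform), ("reverse", reverse)]:
    print(name, f, conv_params(f))
```

In this toy stack the parameter counts barely move. In a real network, redistributing filters has a larger effect on compute and activation memory, because early layers operate at higher spatial resolution.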
# Interaction-GCN: a Graph Convolutional Network based framework for social interaction recognition in egocentric videos

Interaction-GCN: a Graph Convolutional Network based framework for social interaction recognition in egocentric videos ( http://arxiv.org/abs/2104.14007v1 )

Simone Felicioni, Mariella Dimiccoli

In this paper, we propose a new framework, which we call InteractionGCN, to categorize social interactions in egocentric videos. Our method extracts patterns of relational and non-relational cues at the frame level and uses them to build a relational graph. From this graph, the interactional context at the frame level is estimated via a Graph Convolutional Network based approach. The method then propagates this context over time, together with first-person motion information, through a Gated Recurrent Unit architecture. Ablation studies and experimental evaluation on two publicly available datasets validate the proposed approach and establish state-of-the-art results.
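The abstract does not specify which graph-convolution variant is used. A common choice is the symmetric-normalized propagation rule of Kipf and Welling, $H' = \mathrm{ReLU}(\hat D^{-1/2}(A+I)\hat D^{-1/2} H W)$. The pure-Python sketch below implements one such layer. It assumes, hypothetically, that nodes are the people or cues detected in a frame and that edges encode their relations.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 . H . W).

    adj: n x n 0/1 adjacency, feats: n x d_in, weight: d_in x d_out.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate normalized neighbour features (self-loop included).
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(len(feats[0]))]
           for i in range(n)]
    # Project with the learnable weight matrix, then apply ReLU.
    out = [[sum(agg[i][f] * weight[f][o] for f in range(len(weight)))
            for o in range(len(weight[0]))] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical frame graph: two related nodes with 1-D features.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```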
# Randomized Histogram Matching: A Simple Augmentation for Unsupervised Domain Adaptation in Overhead Imagery

Randomized Histogram Matching: A Simple Augmentation for Unsupervised Domain Adaptation in Overhead Imagery ( http://arxiv.org/abs/2104.14032v1 )

Can Yaris and Bohao Huang and Kyle Bradbury and Jordan M. Malof

Modern deep neural networks (DNNs) achieve highly accurate results for many recognition tasks on overhead (e.g., satellite) imagery. One challenge, however, is visual domain shift (i.e., statistical changes), which can cause the accuracy of DNNs to degrade substantially and unpredictably when they are tested on new sets of imagery. In this work, we model domain shifts caused by variations in imaging hardware, lighting, and other conditions as non-linear pixel-wise transformations. We show that modern DNNs can become largely invariant to these types of transformations if provided with appropriate training data augmentation. In general, however, we do not know the transformation between two sets of imagery. To overcome this problem, we propose a simple real-time unsupervised training augmentation technique, termed randomized histogram matching (RHM). We conduct experiments with two large public benchmark datasets for building segmentation. We find that RHM consistently yields performance comparable to recent state-of-the-art unsupervised domain adaptation approaches despite being simpler and faster. RHM also offers substantially better performance than other comparably simple approaches that are widely used in overhead imagery.
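Histogram matching remaps each pixel so that the image's intensity distribution matches that of a reference image. Below is a minimal single-channel sketch of RHM as a training augmentation. It assumes the reference is drawn at random from a pool of unlabeled target-domain images. Multi-band imagery would be matched band by band, and the paper's exact sampling scheme may differ.

```python
import bisect
import math
import random

def match_histogram(source, reference):
    """Map each source pixel to the reference value at the same CDF quantile."""
    src_sorted, ref_sorted = sorted(source), sorted(reference)
    n_src, n_ref = len(source), len(ref_sorted)
    out = []
    for v in source:
        # Empirical CDF of v within the source image (fraction of pixels <= v).
        q = bisect.bisect_right(src_sorted, v) / n_src
        idx = min(n_ref - 1, max(0, math.ceil(q * n_ref) - 1))
        out.append(ref_sorted[idx])
    return out

def randomized_histogram_matching(image, reference_pool, rng=random):
    """Augmentation sketch: match a training image to a randomly chosen reference."""
    return match_histogram(image, rng.choice(reference_pool))

# Hypothetical flattened single-band images.
print(match_histogram([3, 0, 2, 1], [10, 20, 30, 40]))  # [40, 10, 30, 20]
```

Because only the intensity distribution changes and pixel order is preserved, segmentation labels remain valid for the augmented image.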
# Dynamic Mode Decomposition in Adaptive Mesh Refinement and Coarsening Simulations

Dynamic Mode Decomposition in Adaptive Mesh Refinement and Coarsening Simulations ( http://arxiv.org/abs/2104.14034v1 )

Gabriel F. Barros, Malú Grave, Alex Viguerie, Alessandro Reali, Alvaro L. G. A. Coutinho

Dynamic Mode Decomposition (DMD) is a powerful data-driven method used to extract spatio-temporal coherent structures that dictate a given dynamical system. The method consists of stacking collected temporal snapshots into a matrix and mapping the nonlinear dynamics using a linear operator. The standard procedure assumes that snapshots have the same dimensionality for all the observable data. However, this often does not hold in numerical simulations with adaptive mesh refinement/coarsening schemes (AMR/C). This paper proposes a strategy to enable DMD to extract features from observations with different mesh topologies and dimensions, such as those found in AMR/C simulations. For this purpose, the adaptive snapshots are projected onto the same reference function space, enabling the use of snapshot-based methods such as DMD. The present strategy is applied to three challenging AMR/C simulations: a continuous diffusion-reaction epidemiological model for COVID-19, a density-driven gravity current simulation, and a bubble rising problem. We also evaluate how efficiently DMD reconstructs the dynamics and some relevant quantities of interest. In particular, for the SEIRD model and the bubble rising problem, we evaluate DMD's ability to extrapolate in time (short-time future estimates).
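The key step is to place snapshots from different adaptive meshes in a common space so they can be stacked into one snapshot matrix. It can be illustrated in one dimension. This sketch uses piecewise-linear interpolation onto a fixed set of reference nodes as a stand-in for projection onto the reference finite-element space. The paper's actual projection operator may differ. Standard DMD (an SVD of the snapshot matrix followed by a reduced linear operator) would then run on the result and is not shown.

```python
import bisect

def project_to_reference(nodes, values, ref_nodes):
    """Piecewise-linear interpolation of a 1-D nodal field onto reference nodes.

    `nodes` must be sorted ascending; reference points outside the mesh are
    clamped to the nearest end value.
    """
    out = []
    for x in ref_nodes:
        if x <= nodes[0]:
            out.append(values[0])
        elif x >= nodes[-1]:
            out.append(values[-1])
        else:
            j = bisect.bisect_right(nodes, x)  # nodes[j-1] <= x < nodes[j]
            t = (x - nodes[j - 1]) / (nodes[j] - nodes[j - 1])
            out.append((1 - t) * values[j - 1] + t * values[j])
    return out

def snapshot_matrix(snapshots, ref_nodes):
    """Stack snapshots (each on its own mesh) as columns on the shared reference mesh."""
    cols = [project_to_reference(nodes, values, ref_nodes) for nodes, values in snapshots]
    return [[col[i] for col in cols] for i in range(len(ref_nodes))]

# Two snapshots on different (hypothetical) adaptive meshes.
snaps = [([0.0, 1.0], [0.0, 2.0]),
         ([0.0, 0.5, 1.0], [0.0, 1.0, 0.0])]
X = snapshot_matrix(snaps, [0.0, 0.25, 0.5, 1.0])
print(X)  # one row per reference node, one column per snapshot
```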
# Neuromorphic Computing is Turing-Complete

Neuromorphic Computing is Turing-Complete ( http://arxiv.org/abs/2104.13983v1 )

Prasanna Date, Catherine Schuman, Bill Kay, Thomas Potok

Neuromorphic computing is a non-von Neumann computing paradigm that performs computation by emulating the human brain. Neuromorphic systems are extremely energy-efficient and are known to consume thousands of times less power than CPUs and GPUs. They have the potential to drive critical use cases such as autonomous vehicles, edge computing and the internet of things in the future. For this reason, they are expected to become an indispensable part of the future computing landscape. Neuromorphic systems are mainly used for spike-based machine learning applications, although there are some non-machine-learning applications in graph theory, differential equations, and spike-based simulations. These applications suggest that neuromorphic computing might be capable of general-purpose computing. However, the general-purpose computability of neuromorphic computing has not been established yet. In this work, we prove that neuromorphic computing is Turing-complete and therefore capable of general-purpose computing. Specifically, we present a model of neuromorphic computing with just two neuron parameters (threshold and leak) and two synaptic parameters (weight and delay). We devise neuromorphic circuits for computing all the $\mu$-recursive functions (i.e., constant, successor and projection functions) and all the $\mu$-recursive operators (i.e., composition, primitive recursion and minimization operators). Given that the $\mu$-recursive functions and operators are precisely the ones that can be computed using a Turing machine, this work establishes the Turing-completeness of neuromorphic computing.
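A tiny discrete-time simulator shows how circuits compute with only the four parameters named above: threshold and leak per neuron, weight and delay per synapse. The update rule is an assumption. Leak is taken as the fraction of membrane potential lost per step. A neuron fires when its potential reaches the threshold and then resets to zero. A spike reaches the post-synaptic neuron `delay` steps later. The paper's exact semantics may differ. The example is a coincidence (AND-like) detector, not one of the paper's $\mu$-recursive circuits.

```python
from collections import defaultdict

def simulate(neurons, synapses, input_spikes, steps):
    """Discrete-time spiking simulation (assumed semantics, see lead-in).

    neurons: {name: (threshold, leak)}, leak = fraction of potential lost per step.
    synapses: list of (pre, post, weight, delay) with delay >= 1.
    input_spikes: {step: [names forced to spike at that step]}.
    Returns {step: sorted list of names that spiked}.
    """
    potential = {n: 0.0 for n in neurons}
    pending = defaultdict(float)  # (arrival step, neuron) -> incoming charge
    fired = {}
    for t in range(steps):
        spikes = set(input_spikes.get(t, []))
        for n, (threshold, leak) in neurons.items():
            potential[n] = potential[n] * (1 - leak) + pending.pop((t, n), 0.0)
            if potential[n] >= threshold:
                spikes.add(n)
                potential[n] = 0.0  # reset after firing
        for pre, post, weight, delay in synapses:
            if pre in spikes:
                pending[(t + delay, post)] += weight
        fired[t] = sorted(spikes)
    return fired

# AND-like detector: fires only if inputs a and b spike together.
syn = [("a", "and", 1.0, 1), ("b", "and", 1.0, 1)]
print(simulate({"and": (2.0, 1.0)}, syn, {0: ["a", "b"]}, steps=3))
```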
# Weather and Light Level Classification for Autonomous Driving: Dataset, Baseline and Active Learning

Weather and Light Level Classification for Autonomous Driving: Dataset, Baseline and Active Learning ( http://arxiv.org/abs/2104.14042v1 )

Mahesh M Dhananjaya, Varun Ravi Kumar and Senthil Yogamani

Autonomous driving is rapidly advancing, and Level 2 functions are becoming a standard feature. One of the foremost outstanding hurdles is obtaining robust visual perception in harsh weather and low-light conditions, where accuracy degradation is severe. It is critical to have a weather classification model so that the confidence placed in visual perception can be lowered in these scenarios. Thus, we have built a new dataset for weather (fog, rain, and snow) classification and light level (bright, moderate, and low) classification. Furthermore, we provide street type (asphalt, grass, and cobblestone) classification, leading to 9 labels. Each image has three labels corresponding to weather, light level, and street type. We recorded the data using an industrial front camera of RCCC (red/clear) format with a resolution of $1024\times1084$. We collected 15k video sequences and sampled 60k images. We implement an active learning framework to reduce the dataset's redundancy and find the optimal set of frames for training a model. We distilled the 60k images further to 1.1k images, which will be shared publicly after privacy anonymization. To the best of our knowledge, there is no public dataset for weather and light level classification focused on autonomous driving. The baseline ResNet18 network used for weather classification achieves state-of-the-art results on two non-automotive weather classification public datasets. However, it achieves significantly lower accuracy on our proposed dataset, which shows that the dataset is not saturated and needs further research.
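The abstract does not state the acquisition criterion used to select frames. A common choice in active learning is uncertainty sampling. The sketch below, offered only as an assumption, keeps the frames whose predicted class distribution has the highest entropy.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(frame_probs, budget):
    """Indices of the `budget` frames with the highest predictive entropy."""
    ranked = sorted(range(len(frame_probs)),
                    key=lambda i: entropy(frame_probs[i]), reverse=True)
    return sorted(ranked[:budget])

# Hypothetical softmax outputs (fog, rain, snow) for four frames.
probs = [[0.98, 0.01, 0.01],
         [0.34, 0.33, 0.33],
         [0.60, 0.30, 0.10],
         [1.00, 0.00, 0.00]]
print(select_most_uncertain(probs, budget=2))  # [1, 2]
```

Diversity-based criteria are another common way to reduce redundancy among near-duplicate video frames.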
# SMLSOM: The shrinking maximum likelihood self-organizing map

SMLSOM: The shrinking maximum likelihood self-organizing map ( http://arxiv.org/abs/2104.13971v1 )

Ryosuke Motegi and Yoichi Seki

Determining the number of clusters in a dataset is a fundamental issue in data clustering. Many methods have been proposed to solve the problem of selecting the number of clusters, treating it as a model selection problem. This paper proposes a greedy algorithm that automatically selects a suitable number of clusters based on a probability distribution model framework. The algorithm includes two components. First, it introduces a generalization of Kohonen's self-organizing map (SOM) whose nodes are linked to a probability distribution model, which enables the algorithm to search for the winner based on the likelihood of each node. Second, the proposed method uses a graph structure and a neighborhood defined by the length of the shortest path between nodes, in contrast to Kohonen's SOM, in which the nodes are fixed in the Euclidean space. This implementation makes it possible to update the graph structure by cutting links to weakly connected nodes to avoid unnecessary node deletion. The weakness of a node connection is measured using the Kullback--Leibler divergence, and the redundancy of a node is measured by the minimum description length (MDL). This updating step makes it easy to determine the suitable number of clusters. Compared with existing methods, our proposed method is computationally efficient and can accurately select the number of clusters and perform clustering.
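The Kullback-Leibler divergence has a closed form when node distributions are Gaussian. The sketch assumes univariate Gaussian nodes. It uses a hypothetical pruning rule that cuts a link when the symmetrized divergence between its endpoints exceeds a threshold. The abstract does not give the paper's actual distributions, the direction of the divergence, or the threshold. The MDL-based redundancy test is not shown.

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for univariate Gaussians."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def prune_weak_links(edges, params, max_divergence):
    """Hypothetical rule: keep a link only if the symmetrized KL between the
    two nodes' distributions is at most `max_divergence`."""
    kept = []
    for a, b in edges:
        d = kl_gauss(*params[a], *params[b]) + kl_gauss(*params[b], *params[a])
        if d <= max_divergence:
            kept.append((a, b))
    return kept

# Hypothetical node parameters as (mean, variance).
params = {"A": (0.0, 1.0), "B": (0.1, 1.0), "C": (5.0, 1.0)}
print(prune_weak_links([("A", "B"), ("B", "C")], params, max_divergence=1.0))
```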
# Analysis of Legal Documents via Non-negative Matrix Factorization Methods

Analysis of Legal Documents via Non-negative Matrix Factorization Methods ( http://arxiv.org/abs/2104.14028v1 )

Ryan Budahazy, Lu Cheng, Yihuan Huang, Andrew Johnson, Pengyu Li, Joshua Vendrow, Zhoutong Wu, Denali Molitor, Elizaveta Rebrova, Deanna Needell

The California Innocence Project (CIP), a clinical law school program aiming to free wrongfully convicted prisoners, evaluates thousands of pieces of mail containing new requests for assistance and corresponding case files. Processing and interpreting this large amount of information presents a significant challenge for CIP officials, and topic modeling techniques can help address it. In this paper, we apply the Non-negative Matrix Factorization (NMF) method and various offshoots of it to the important and previously unstudied data set compiled by CIP. We identify underlying topics of existing case files and classify request files by crime type and case status (decision type). The results uncover the semantic structure of current case files and can give CIP officials a general understanding of newly received case files before further examination. We also provide an exposition of popular variants of NMF with their experimental results and discuss the benefits and drawbacks of each variant through the real-world application.
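For topic modeling, NMF factors a nonnegative document-term matrix as $V \approx WH$. Rows of $H$ act as topics over terms, and rows of $W$ give each document's topic weights. Below is a minimal pure-Python sketch of the classic Lee and Seung multiplicative updates for the Frobenius-norm objective. The variants compared in the paper are not reproduced. In practice a library implementation such as scikit-learn's `sklearn.decomposition.NMF` would be used instead.

```python
import random

def matmul(A, B):
    """Dense matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return W, H

# Toy document-term counts with two obvious "topics" (hypothetical data).
V = [[3, 3, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 4, 2],
     [0, 0, 2, 1]]
W, H = nmf(V, rank=2, iters=300)
print(round(frob_error(V, W, H), 4))
```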
# Personalized Keyphrase Detection using Speaker and Environment Information

Personalized Keyphrase Detection using Speaker and Environment Information ( http://arxiv.org/abs/2104.13970v1 )

Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng (Arden) Huang, Arun Narayanan, Ian McGraw

In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary. The system is implemented with an end-to-end trained automatic speech recognition (ASR) model and a text-independent speaker verification model. To address the challenge of detecting these keyphrases under various noisy conditions, a speaker separation model is added to the feature frontend of the speaker verification model, and an adaptive noise cancellation (ANC) algorithm is included to exploit cross-microphone noise coherence. Our experiments show that the text-independent speaker verification model largely reduces the false triggering rate of the keyphrase detection, while the speaker separation model and adaptive noise cancellation largely reduce false rejections.
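The two checks described above can be combined into a single decision. The ASR transcript must contain the keyphrase, and the utterance's speaker embedding must be close to the enrolled user's embedding. This is a hypothetical decision rule with an arbitrary cosine threshold and naive substring matching. The real system is streaming and end-to-end, and its speaker separation and noise cancellation front ends are not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def accept_keyphrase(transcript, keyphrase, utterance_emb, enrolled_emb, sv_threshold=0.7):
    """Hypothetical rule: fire only if the text matches AND the speaker matches.

    The speaker check is what suppresses false triggers from other talkers.
    """
    text_match = keyphrase.lower() in transcript.lower()
    speaker_match = cosine(utterance_emb, enrolled_emb) >= sv_threshold
    return text_match and speaker_match

# Hypothetical 2-D speaker embeddings.
print(accept_keyphrase("ok turn on the lights", "turn on the lights", [1.0, 0.1], [1.0, 0.0]))
```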
# Modelling Cooperation and Competition in Urban Retail Ecosystems with Complex Network Metrics

Modelling Cooperation and Competition in Urban Retail Ecosystems with Complex Network Metrics ( http://arxiv.org/abs/2104.13981v1 )

Jordan Cambe, Krittika D'Silva, Anastasios Noulas, Cecilia Mascolo, Adam Waksman

Understanding the impact that a new business has on the local market ecosystem is a challenging task, as it is multifaceted in nature. Past work in this space has examined the collaborative or competitive role of homogeneous venue types (e.g., the impact of a new bookstore on existing bookstores). However, these prior works have been limited in scope and explanatory power. To better measure retail performance in a modern city, a model should consider a number of factors that interact simultaneously. This paper is the first to consider the multifaceted types of interactions that occur in urban areas when examining the impact of new businesses. We first present a modeling framework that examines the role of new businesses in their respective local areas. Using a longitudinal dataset from the location technology platform Foursquare, we model new venue impact across 26 major cities worldwide. Representing cities as connected networks of venues, we quantify their structure and characterise their dynamics over time. We note a strong community structure emerging in these retail networks. This observation highlights the interplay of cooperative and competitive forces that emerge in local ecosystems of retail establishments. We next devise a data-driven metric that captures the first-order correlation of a new venue's impact on retailers in its vicinity, accounting for both homogeneous and heterogeneous interactions between venue types. Lastly, we build a supervised machine learning model to predict the impact of a given new venue on its local retail ecosystem. Our approach highlights the power of complex network measures in building machine learning prediction models. These models have numerous applications within the retail sector and can support policymakers, business owners, and urban planners in the development of models to characterize and predict changes in urban settings.
# 長期情報の解釈可能な表現による表現規則の最適化

Optimizing Rescoring Rules with Interpretable Representations of Long-Term Information ( http://arxiv.org/abs/2104.14291v1 )

Aaron Fisher(参考訳) 時間的データ(例えばウェアラブルデバイスのデータ)を分析するには、最近の情報と遠い過去の情報をどう組み合わせるかを決める必要がある。 アクチグラフィから睡眠状態を分類する文脈において、Websterの再スコアリング規則は、移動窓モデルの出力の長期的なパターンに基づく一般的な解決策の1つである。 残念ながら、任意の設定に対して再スコアリング規則をどう最適化するかという問題は未解決のままである。 この問題に対処し、再スコアリング規則の適用範囲を広げるため、これらの規則をエポック固有の特徴量で表現し直すことを提案する。 我々の特徴量は2つの一般的な形をとる: (1) 現在と、ある状態で過ごした直近の(または直後に来る)区間との時間差、(2) ある状態で過ごした直近の(または直後に来る)区間の長さ。 任意の初期移動窓モデルが与えられれば、これらの特徴量は再帰的に定義でき、再スコアリング規則の最適化が容易になる。 移動窓モデルとその後の再スコアリング規則の同時最適化は、TensorFlowのような勾配ベースの最適化ソフトウェアを用いて実装することもできる。 二値分類問題(例えば睡眠-覚醒)以外にも、同じアプローチを多状態分類問題(例えば、座位、歩行、階段昇降)の長期パターンの要約に適用できる。 最適化された再スコアリング規則は睡眠-覚醒分類器の性能を改善し、特定のニューラルネットワークアーキテクチャに匹敵する精度を達成する。

Analyzing temporal data (e.g., wearable device data) requires a decision about how to combine information from the recent and distant past. In the context of classifying sleep status from actigraphy, Webster's rescoring rules offer one popular solution based on the long-term patterns in the output of a moving-window model. Unfortunately, the question of how to optimize rescoring rules for any given setting has remained unsolved. To address this problem and expand the possible use cases of rescoring rules, we propose rephrasing these rules in terms of epoch-specific features. Our features take two general forms: (1) the time lag between now and the most recent [or closest upcoming] bout of time spent in a given state, and (2) the length of the most recent [or closest upcoming] bout of time spent in a given state. Given any initial moving window model, these features can be defined recursively, allowing for straightforward optimization of rescoring rules. Joint optimization of the moving window model and the subsequent rescoring rules can also be implemented using gradient-based optimization software, such as TensorFlow. Beyond binary classification problems (e.g., sleep-wake), the same approach can be applied to summarize long-term patterns for multi-state classification problems (e.g., sitting, walking, or stair climbing). We find that optimized rescoring rules improve the performance of sleep-wake classifiers, achieving accuracy comparable to that of certain neural network architectures.
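
The two feature families above can be computed in a single recursive pass over a sequence of predicted states. The sketch below is a minimal illustration under several assumptions. States are integer labels per epoch. "Lag" means the number of epochs since the last epoch in the target state (0 while inside a bout). "Length" means the size of that bout observed so far. These exact definitions, the function names, and the Webster-style thresholds in `rescore` are hypothetical and are not the paper's code.

```python
import math


def past_bout_features(states, target):
    """For each epoch t, return the lag and length of the most recent bout of
    `target` at or before t, computed recursively in one pass.

    lag: epochs since the last `target` epoch (0 inside a bout, inf before any).
    length: number of epochs in that bout observed so far (0 before any bout).
    """
    lags, lengths = [], []
    lag, length, run = math.inf, 0, 0
    for s in states:
        if s == target:
            run += 1               # extend the current bout, or start a new one
            lag, length = 0, run
        else:
            run = 0                # any current bout has ended
            lag += 1               # inf + 1 stays inf
        lags.append(lag)
        lengths.append(length)
    return lags, lengths


def future_bout_features(states, target):
    """Same features for the closest upcoming bout: run the pass backwards."""
    lags, lengths = past_bout_features(states[::-1], target)
    return lags[::-1], lengths[::-1]


def rescore(states, wake=0, sleep=1, min_wake=4, max_lag=1):
    """Hypothetical Webster-style rule: an epoch scored as sleep that falls
    within `max_lag` epochs after a wake bout of at least `min_wake` epochs
    is rescored as wake. Thresholds are illustrative only."""
    lag, length = past_bout_features(states, wake)
    return [wake if s == sleep and lag[t] <= max_lag and length[t] >= min_wake
            else s
            for t, s in enumerate(states)]
```

Each feature is a deterministic function of the moving-window output, so a rule's thresholds become ordinary parameters. They can be grid-searched or, with smoothed comparisons, tuned jointly with the model by gradient descent.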
# deep neural network と long short-term memory $(2)$ を用いた2つの気象因子の動的予測

Dynamical prediction of two meteorological factors using the deep neural network and the long short-term memory $(2)$ ( http://arxiv.org/abs/2104.14406v1 )

Ki-Hong Shin, Jae-Won Jung, Ki-Ho Chang, Dong-In Lee, Cheol-Hwan You, Kyungsik Kim(参考訳) 本稿では,2変量の気象因子である平均気温と平均湿度を用いた,ニューラルネットワークアルゴリズムの予測精度を示す。 本研究では,従来の人工ニューラルネットワーク,ディープニューラルネットワーク,極限学習機械,長短期記憶(LSTM),ピープホール接続付きLSTMの5つの学習アーキテクチャの結果を,計算機シミュレーションにより解析する。 ニューラルネットワークモデルは,7年間(2014年から2020年)の日次時系列データセットで訓練される。 2500,5000,7500エポックの訓練結果から,10の大都市(ソウル,大田,大邱,釜山,仁川,光州,浦項,木浦,統営,全州)の出力から得られた気象因子の予測精度を求める。 出力結果から誤差統計を求め,5つのニューラルネットワークの運用後にこれらの値を相互に比較する。 テスト1(6つの入力ノードを持つ入力層から平均気温を予測)でLSTMモデルを用いた場合,気温予測では統営の夏が最も低い二乗平均平方根誤差(RMSE)0.866 (%)を示す。 湿度予測では,テスト2(6つの入力ノードを持つ入力層から平均湿度を予測)において木浦の夏にLSTMモデルを用いた場合にRMSEが最低値5.732 (%)となる。 特に,LSTMモデルは気温と湿度の予測において,他のニューラルネットワークモデルよりも日々の水準を正確に予測することがわかった。 この結果は,将来新しいニューラルネットワーク評価手法を探求・開発する必要性について,計算機シミュレーションに基づく根拠を提供する可能性がある。

This paper presents the predictive accuracy of neural network algorithms using two meteorological variables, average temperature and average humidity. Using computer simulation, we analyze the results of five learning architectures: the traditional artificial neural network, the deep neural network, the extreme learning machine, long short-term memory (LSTM), and LSTM with peephole connections. Our neural network models are trained on a daily time-series dataset spanning seven years (2014 to 2020). From the results after training for 2500, 5000, and 7500 epochs, we obtain the prediction accuracy of the meteorological factors for ten metropolitan cities (Seoul, Daejeon, Daegu, Busan, Incheon, Gwangju, Pohang, Mokpo, Tongyeong, and Jeonju). Error statistics are computed from the outputs, and we compare them across the five neural networks. Using the LSTM model in test 1 (average temperature predicted from an input layer with six input nodes), Tongyeong in summer has the lowest root mean squared error (RMSE) for temperature, 0.866 (%). For humidity, the lowest RMSE, 5.732 (%), is obtained with the LSTM model in summer in Mokpo in test 2 (average humidity predicted from an input layer with six input nodes). In particular, the LSTM model is found to forecast daily temperature and humidity levels more accurately than the other neural network models. Our results may provide a computer-simulation basis for exploring and developing a novel neural network evaluation method in the future.
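
The abstract does not say how the six input nodes are filled. For illustration only, this sketch assumes a common setup: the previous six daily values of one variable are used to predict the next day, and forecasts are scored with RMSE. The helper names are hypothetical.

```python
import numpy as np


def make_windows(series, n_inputs=6):
    """Build a supervised dataset from a daily series: each row of X holds
    `n_inputs` consecutive days and y holds the value on the following day."""
    series = np.asarray(series, dtype=float)
    if len(series) <= n_inputs:
        raise ValueError("series must be longer than n_inputs")
    X = np.stack([series[i:i + n_inputs] for i in range(len(series) - n_inputs)])
    y = series[n_inputs:]
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Persistence baseline: predict tomorrow's value as today's (last column of X).
X, y = make_windows(np.arange(10.0))
baseline = rmse(y, X[:, -1])
```

A persistence baseline like this gives a useful reference RMSE when comparing architectures trained on the same windows.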
# (参考訳) AraStance: ファクトチェックのための多国・多分野のアラビア語スタンス検出データセット

AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking ( http://arxiv.org/abs/2104.13559v1 )

Tariq Alhindi, Amal Alabdulkarim, Ali Alshehri, Muhammad Abdul-Mageed and Preslav Nakov(参考訳) オンライン上の誤情報や偽情報の拡散が続く中、複数の言語をサポートする自動システムという形で、大規模な対抗手段を開発することの重要性が増している。 関心のあるタスクの1つはクレームの真偽予測であり、オンラインで検索された関連文書に対するスタンス検出を用いて対処することができる。 そこで本研究では,3つのファクトチェックサイトと1つのニュースサイトからなる多様な情報源から集めた910件のクレームからなる新しいアラビア語スタンス検出データセット(AraStance)を提示する。 AraStanceは複数の分野(例えば、政治、スポーツ、健康)といくつかのアラブ諸国からの虚偽および真実のクレームをカバーしており、クレームに関して関連文書と無関係の文書とのバランスがよく取れている。 AraStanceと他の2つのスタンス検出データセットを、複数のBERTベースのモデルを使ってベンチマークする。 我々の最良のモデルは85%の精度と78%のマクロF1スコアを達成しており、改善の余地を残すとともに、AraStanceおよびスタンス検出タスク全般の難しさを反映している。

With the continuing spread of misinformation and disinformation online, it is of increasing importance to develop combating mechanisms at scale in the form of automated systems that support multiple languages. One task of interest is claim veracity prediction, which can be addressed using stance detection with respect to relevant documents retrieved online. To this end, we present our new Arabic Stance Detection dataset (AraStance) of 910 claims from a diverse set of sources comprising three fact-checking websites and one news website. AraStance covers false and true claims from multiple domains (e.g., politics, sports, health) and several Arab countries, and it is well-balanced between related and unrelated documents with respect to the claims. We benchmark AraStance, along with two other stance detection datasets, using a number of BERT-based models. Our best model achieves an accuracy of 85% and a macro F1 score of 78%, which leaves room for improvement and reflects the challenging nature of AraStance and the task of stance detection in general.
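
Accuracy and macro F1 can diverge sharply when classes are imbalanced, which helps explain why the two reported scores differ. The sketch below computes both from scratch. The label names and toy predictions are hypothetical and are not AraStance data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as
    frequent ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["unrelated"] * 8 + ["agree", "disagree"]
y_pred = ["unrelated"] * 10          # always predict the majority class
acc, f1 = accuracy(y_true, y_pred), macro_f1(y_true, y_pred)
```

In this example the majority-class predictor reaches 0.8 accuracy but a macro F1 of only about 0.30, because the two minority classes each have an F1 of zero.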
# (参考訳) ニューラルレイトレーシング:再照明とビュー合成のための表面と反射率の学習

Neural Ray-Tracing: Learning Surfaces and Reflectance for Relighting and View Synthesis ( http://arxiv.org/abs/2104.13562v1 )

Julian Knodt, Seung-Hwan Baek, Felix Heide(参考訳) 最近のニューラルレンダリング手法は、ニューラルネットワークでボリューム密度と色を予測することにより、正確な視点補間を実現している。 このようなボリューム表現は静的および動的シーンで教師あり学習できるが、既存の手法は、表面モデリング、双方向散乱分布関数(BSDF)、間接照明効果を含むシーン全体の光輸送を、各シーンごとの単一のニューラルネットワークに暗黙的に焼き込んでいる。 従来のレンダリングパイプラインとは対照的に、これでは表面反射率や照明の変更、シーンへの他のオブジェクトの合成ができない。 本研究では,シーン表面間の光輸送を明示的にモデル化し,シーンの再構成には従来の積分手法とレンダリング方程式に依拠する。 提案手法は、未知の照明条件下でのBSDFの復元と、パストレーシングのような古典的な光輸送を可能にする。 従来のレンダリング手法で確立された表面表現を用いて分解された光輸送を学習することにより、形状、反射率、照明、シーン構成の編集が自然に可能になる。 本手法は既知の照明条件下での再照明においてNeRVを上回り、再照明・編集されたシーンのリアルな再構成を生成する。 提案手法を、NeRVのデータセットのサブセットの合成ビューおよび撮影ビューから学習したシーン編集、再照明、反射率推定で検証する。

Recent neural rendering methods have demonstrated accurate view interpolation by predicting volumetric density and color with a neural network. Although such volumetric representations can be supervised on static and dynamic scenes, existing methods implicitly bake the complete scene light transport into a single neural network for a given scene, including surface modeling, bidirectional scattering distribution functions, and indirect lighting effects. In contrast to traditional rendering pipelines, this prohibits changing surface reflectance, illumination, or composing other objects in the scene. In this work, we explicitly model the light transport between scene surfaces and we rely on traditional integration schemes and the rendering equation to reconstruct a scene. The proposed method allows BSDF recovery with unknown light conditions and classic light transports such as path tracing. By learning decomposed transport with surface representations established in conventional rendering methods, the method naturally facilitates editing shape, reflectance, lighting and scene composition. The method outperforms NeRV for relighting under known lighting conditions, and produces realistic reconstructions for relit and edited scenes. We validate the proposed approach for scene editing, relighting and reflectance estimation learned from synthetic and captured views on a subset of NeRV's datasets.
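
For reference, the standard reflection-only form of the rendering equation is shown below, together with the Monte Carlo estimator that path tracers use to evaluate it. Here $f_r$ is the reflectance part of the BSDF. This is the textbook formulation and is not necessarily the paper's exact discretization.

$$
L_o(\mathbf{x}, \omega_o) = L_e(\mathbf{x}, \omega_o) + \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, (\omega_i \cdot \mathbf{n})\, \mathrm{d}\omega_i
\approx L_e(\mathbf{x}, \omega_o) + \frac{1}{N} \sum_{k=1}^{N} \frac{f_r(\mathbf{x}, \omega_k, \omega_o)\, L_i(\mathbf{x}, \omega_k)\, (\omega_k \cdot \mathbf{n})}{p(\omega_k)}
$$

In the formula:

- $\Omega$ is the hemisphere around the surface normal $\mathbf{n}$.
- $L_e$ is emitted radiance and $L_i$ is incident radiance.
- $p$ is the density from which the directions $\omega_k$ are sampled.

Reflectance ($f_r$) and illumination ($L_i$) appear as separate factors. Each can therefore be edited independently, which is the property the abstract relies on for relighting and editing.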
# (参考訳) 不確実な知識を用いた関係抽出のための多視点推論

Multi-view Inference for Relation Extraction with Uncertain Knowledge ( http://arxiv.org/abs/2104.13579v1 )

Bo Li, Wei Ye, Canming Huang, and Shikun Zhang(参考訳) 知識グラフ(KG)は関係抽出(RE)タスクを支援するために広く使われている。 従来のRE手法の多くは決定論的KGの活用に重点を置いているが、関係インスタンスごとに信頼度スコアを割り当てる不確実なKGは、関係事実の事前確率分布をREモデルにとって価値のある外部知識として提供することができる。 本稿では,不確実な知識を利用して関係抽出を改善することを提案する。 具体的には、対象エンティティがある概念にどの程度属するかを示す不確実なKGであるProBaseを、我々のREアーキテクチャに導入する。 次に,言及(mention)・エンティティ・概念の3つの視点にわたって局所的文脈とグローバルな知識を体系的に統合する,新しいマルチビュー推論フレームワークを設計する。 実験の結果,本モデルは文レベルおよび文書レベルの関係抽出の両方で競争力のある性能を達成し,不確実な知識の導入と我々が設計したマルチビュー推論フレームワークの有効性が確認された。

Knowledge graphs (KGs) are widely used to facilitate relation extraction (RE) tasks. While most previous RE methods focus on leveraging deterministic KGs, uncertain KGs, which assign a confidence score for each relation instance, can provide prior probability distributions of relational facts as valuable external knowledge for RE models. This paper proposes to exploit uncertain knowledge to improve relation extraction. Specifically, we introduce ProBase, an uncertain KG that indicates to what extent a target entity belongs to a concept, into our RE architecture. We then design a novel multi-view inference framework to systematically integrate local context and global knowledge across three views: mention-, entity- and concept-view. The experimental results show that our model achieves competitive performance on both sentence- and document-level relation extraction, which verifies the effectiveness of introducing uncertain knowledge and the multi-view inference framework that we design.
# (参考訳) [Re]コンテキストでオブジェクトを判断しない:コンテキストバイアスを克服する学習

[Re] Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias ( http://arxiv.org/abs/2104.13582v1 )

Sunnie S. Y. Kim, Sharon Zhang, Nicole Meister, Olga Russakovsky(参考訳) Singh et al. (2020) は、視覚認識データセットにおける文脈バイアスの危険性を指摘している。 彼らはCAMベースと特徴分割(feature-split)という2つの手法を提案し、競争力のある文脈内精度を維持しつつ、典型的な文脈がない状況でもオブジェクトや属性をよりよく認識できるようにした。 その性能を検証するため,我々は付録を含む元論文の12の表をすべて再現することを試みた。 また,提案手法をよりよく理解するための追加実験として,CAMベース手法の正則化の強化や特徴分割の重み付き損失の除去などを行った。 元のコードが公開されていなかったため、パイプライン全体をPyTorch 1.7.0でスクラッチから実装した。 実装は論文と著者とのメールのやり取りに基づいている。 元論文で提案された両手法とも文脈バイアスの軽減に役立つことがわかったが,一部の手法では,広範なハイパーパラメータ探索を行っても論文の定量的結果を完全には再現できなかった。 例えば、COCO-Stuff、DeepFashion、UnRelでは、我々の特徴分割モデルは標準ベースラインに比べて文脈外画像の精度が向上したが、AwAでは性能が低下した。 提案されたCAMベースの手法については、元論文の結果を0.5$\%$ mAP以内で再現できた。 実装はhttps://github.com/princetonvisualai/ContextualBiasで公開している。

Singh et al. (2020) point out the dangers of contextual bias in visual recognition datasets. They propose two methods, CAM-based and feature-split, that better recognize an object or attribute in the absence of its typical context while maintaining competitive within-context accuracy. To verify their performance, we attempted to reproduce all 12 tables in the original paper, including those in the appendix. We also conducted additional experiments to better understand the proposed methods, including increasing the regularization in CAM-based and removing the weighted loss in feature-split. As the original code was not made available, we implemented the entire pipeline from scratch in PyTorch 1.7.0. Our implementation is based on the paper and email exchanges with the authors. We found that both proposed methods in the original paper help mitigate contextual bias, although for some methods, we could not completely replicate the quantitative results in the paper even after completing an extensive hyperparameter search. For example, on COCO-Stuff, DeepFashion, and UnRel, our feature-split model achieved an increase in accuracy on out-of-context images over the standard baseline, whereas on AwA, we saw a drop in performance. For the proposed CAM-based method, we were able to reproduce the original paper's results to within 0.5$\%$ mAP. Our implementation can be found at https://github.com/princetonvisualai/ContextualBias.
# (参考訳) Transformerによるポイントクラウド学習

Point Cloud Learning with Transformer ( http://arxiv.org/abs/2104.13636v1 )

Xian-Feng Han, Yu-Jia Kuang, Guo-Qiang Xiao(参考訳) 自然言語処理におけるTransformerネットワークの顕著な性能は、画像認識やセグメンテーションといったコンピュータビジョンタスクへのこれらのモデルの応用を促している。 本稿では,不規則な点群上で直接動作して表現学習を行う,Multi-level Multi-scale Point Transformer(MLMSPT)と呼ばれる新しいフレームワークを提案する。 具体的には,点ピラミッドTransformerを用いて,我々が定義した様々な解像度やスケールで特徴をモデル化し,続いてマルチレベルTransformerモジュールにより各スケールの異なるレベルから文脈情報を集約し,それらの相互作用を強化する。 さらに,マルチスケールTransformerモジュールは,異なるスケールの表現間の依存関係を捉えるように設計されている。 公開ベンチマークデータセットでの広範な評価により,3次元形状分類,パートセグメンテーション,セマンティックセグメンテーションタスクにおける提案手法の有効性と競争力のある性能が示された。

Remarkable performance from Transformer networks in Natural Language Processing has promoted the development of these models for computer vision tasks such as image recognition and segmentation. In this paper, we introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT), that works directly on irregular point clouds for representation learning. Specifically, a point pyramid transformer is investigated to model features at the diverse resolutions or scales we define, followed by a multi-level transformer module that aggregates contextual information from different levels of each scale and enhances their interactions. In addition, a multi-scale transformer module is designed to capture the dependencies among representations across different scales. Extensive evaluation on public benchmark datasets demonstrates the effectiveness and competitive performance of our method on 3D shape classification, part segmentation and semantic segmentation tasks.
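
Self-attention suits irregular point clouds because, without positional encodings, it is permutation-equivariant: reordering the input points simply reorders the outputs. The sketch below shows a single generic attention head in NumPy. It is not MLMSPT's pyramid, multi-level, or multi-scale module, and the sizes and random weights are arbitrary.

```python
import numpy as np


def self_attention(points, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over an unordered set of
    N point features (N x d). No positional encoding is added, so the output
    is permutation-equivariant."""
    q, k, v = points @ Wq, points @ Wk, points @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ v


rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))                     # xyz of 128 points
Wq, Wk, Wv = (rng.normal(size=(3, 16)) for _ in range(3))
out = self_attention(pts, Wq, Wk, Wv)

# Shuffling the points only shuffles the output rows.
perm = rng.permutation(128)
assert np.allclose(self_attention(pts[perm], Wq, Wk, Wv), out[perm])
```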
# (参考訳) PyTorch Tabular: 表形式データを用いたディープラーニングのためのフレームワーク

PyTorch Tabular: A Framework for Deep Learning with Tabular Data ( http://arxiv.org/abs/2104.13638v1 )

Manu Joseph(参考訳) テキストや画像のようなモダリティでは不合理なほどの有効性を示しているにもかかわらず、ディープラーニングは表形式データにおいては、人気と性能の両面で常に勾配ブースティング(Gradient Boosting)に後れを取ってきた。 しかし最近、表形式データ専用に作られた新しいモデルが登場し、性能の水準を押し上げている。 それでも、ディープラーニングにはscikit-learnのような簡単に使えるライブラリがないため、普及は依然として課題である。 PyTorch Tabularは、ディープラーニングで表形式データを簡単かつ高速に扱えるようにする新しいディープラーニングライブラリである。 PyTorchとPyTorch Lightningの上に構築されたライブラリで、pandasのデータフレームを直接扱う。 NODEやTabNetのような多くのSOTAモデルが、統一されたAPIのもとですでにライブラリに統合・実装されている。 PyTorch Tabularは、研究者にとって容易に拡張可能で、実務者にとってシンプルで、産業展開において堅牢であるように設計されている。

In spite of showing unreasonable effectiveness in modalities like text and images, deep learning has always lagged behind gradient boosting on tabular data, in both popularity and performance. Recently, however, newer models have been created specifically for tabular data, and they are pushing the performance bar. Popularity is still a challenge because there is no easy, ready-to-use library like scikit-learn for deep learning. PyTorch Tabular is a new deep learning library which makes working with deep learning and tabular data easy and fast. It is a library built on top of PyTorch and PyTorch Lightning and works on pandas dataframes directly. Many SOTA models like NODE and TabNet are already integrated and implemented in the library with a unified API. PyTorch Tabular is designed to be easily extensible for researchers, simple for practitioners, and robust in industrial deployments.
# (参考訳) 歴史的手書き数字列認識のためのエンドツーエンドアプローチ

End-to-End Approach for Recognition of Historical Digit Strings ( http://arxiv.org/abs/2104.13666v1 )

Mengqiao Zhao, Andre G. Hochuli, Abbas Cheddad(参考訳) 近年、デジタル化された歴史文書データセットが多数公開され、手書きパターン認識分野の発展への関心が再燃している。 同じ流れの中で、最近公開されたARDISと呼ばれるデータセットは、スウェーデンの教会記録簿の15,000件のスキャン文書から手作業で切り出した手書き数字を収録しており、さまざまな筆跡スタイルを示している。 そこで本研究では,ARDISデータセットに含まれる日付(4桁の数字列)の、この扱いの難しい古い手書きスタイルを処理するために,エンドツーエンドでセグメンテーション不要な深層学習手法を提案する。 VGG-16深層モデルにわずかな修正を加えることで、このフレームワークは93.2%の認識率を達成でき、ヒューリスティックな手法、セグメンテーション、融合手法を必要としない実用的な解となることを示す。 さらに,提案手法は,手書き認識タスクに広く適用されているよく知られたCRNN法を上回る。

The plethora of digitalised historical document datasets released in recent years has rekindled interest in advancing the field of handwriting pattern recognition. In the same vein, a recently published data set, known as ARDIS, presents handwritten digits manually cropped from 15,000 scanned documents of Swedish church books and exhibiting various handwriting styles. To this end, we propose an end-to-end segmentation-free deep learning approach to handle this challenging ancient handwriting style of dates present in the ARDIS dataset (4-digit strings). We show that with slight modifications to the VGG-16 deep model, the framework can achieve a recognition rate of 93.2%, resulting in a feasible solution free of heuristic methods, segmentation, and fusion methods. Moreover, the proposed approach outperforms the well-known CRNN method (a model widely applied in handwriting recognition tasks).
# (参考訳) ランダム化ニューラルネットワークによる最適停止

Optimal Stopping via Randomized Neural Networks ( http://arxiv.org/abs/2104.13669v1 )

Calypso Herrera, Florian Krack, Pierre Ruyssen, Josef Teichmann(参考訳) 本稿では,最適停止問題の解を近似する新しい機械学習手法を提案する。 これらの手法の鍵となるアイデアは、隠れ層をランダムに生成し最終層のみを学習するニューラルネットワークを用いて、継続価値を近似することである。 我々の手法は、既存の手法がますます非現実的になる高次元問題に適用できる。 さらに,本手法は単純な線形回帰で最適化できるため,実装が非常に容易であり,理論的保証も与えられる。 マルコフ的な例ではランダム化強化学習アプローチが、非マルコフ的な例ではランダム化リカレントニューラルネットワークアプローチが、最先端手法や他の関連する機械学習手法を上回る。

This paper presents new machine learning approaches to approximate the solution of optimal stopping problems. The key idea of these methods is to use neural networks, where the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high-dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using a simple linear regression, they are very easy to implement and theoretical guarantees can be provided. In Markovian examples our randomized reinforcement learning approach and in non-Markovian examples our randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches.
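
The core idea, a frozen random hidden layer with only the linear readout fitted by least squares to approximate the continuation value, can be illustrated as follows. The sketch uses Longstaff–Schwartz-style backward induction for a Bermudan put under geometric Brownian motion. The option, the model, the tanh activation, and every parameter value are assumptions chosen for illustration. This is not the paper's randomized reinforcement learning or recurrent variant.

```python
import numpy as np


def randomized_nn_put(s0=100.0, strike=100.0, r=0.05, sigma=0.2, T=1.0,
                      n_steps=50, n_paths=20_000, n_hidden=64, seed=0):
    """Bermudan put under geometric Brownian motion, priced by backward
    induction. At each exercise date the continuation value is regressed on
    the output of a hidden layer whose weights are random and never trained;
    only the linear readout is fitted, by ordinary least squares."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(increments, axis=1))   # prices at t_1..t_N
    disc = np.exp(-r * dt)

    A = rng.normal(size=(1, n_hidden))   # random hidden-layer weights (frozen)
    b = rng.normal(size=n_hidden)        # random hidden-layer biases (frozen)

    cash = np.maximum(strike - paths[:, -1], 0.0)        # payoff at maturity
    for t in range(n_steps - 2, -1, -1):
        cash *= disc                                     # discount to date t
        s = paths[:, t]
        payoff = np.maximum(strike - s, 0.0)
        itm = np.flatnonzero(payoff > 0)                 # regress on ITM paths
        if itm.size == 0:
            continue
        x = (s[itm] / strike - 1.0)[:, None]             # normalized input
        hidden = np.tanh(x @ A + b)                      # random features
        feats = np.hstack([hidden, np.ones((itm.size, 1))])
        beta, *_ = np.linalg.lstsq(feats, cash[itm], rcond=None)  # fit readout
        continuation = feats @ beta
        stop = itm[payoff[itm] > continuation]           # exercise now
        cash[stop] = payoff[stop]
    return float(disc * cash.mean())                     # discount to t_0
```

Each fit is a linear least-squares problem with a closed-form solution, which keeps the method cheap. For comparison, the European put with the same parameters is worth about 5.57, so a sensible exercise policy should produce a somewhat higher estimate.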
# (参考訳) HOTR: Transformerによるエンドツーエンドの人間-物体相互作用検出

HOTR: End-to-End Human-Object Interaction Detection with Transformers ( http://arxiv.org/abs/2104.13682v1 )

Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun-Sol Kim, Hyunwoo J. Kim(参考訳) 人間-物体相互作用(HOI)検出は、画像中の「相互作用の集合」を識別するタスクであり、i) 相互作用の主体(すなわち人間)と対象(すなわち物体)の位置特定と、ii) 相互作用ラベルの分類を含む。 既存の手法の多くは、人間と物体のインスタンスを検出し、検出されたインスタンスのすべてのペアを個別に推論することで、このタスクに間接的に取り組んできた。 本稿では,Transformerのエンコーダ-デコーダアーキテクチャに基づいて、画像から<human, object, interaction>トリプレットの集合を直接予測する、HOTRと呼ぶ新しいフレームワークを提案する。 集合予測により,本手法は画像中に内在する意味的関係を効果的に利用し,既存手法の主なボトルネックである時間のかかる後処理を必要としない。 提案アルゴリズムは,2つのHOI検出ベンチマークにおいて,物体検出後の推論時間が1ms未満で最先端の性能を達成する。

Human-Object Interaction (HOI) detection is a task of identifying "a set of interactions" in an image, which involves the i) localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels. Most existing methods have indirectly addressed this task by detecting human and object instances and individually inferring every pair of the detected instances. In this paper, we present a novel framework, referred to as HOTR, which directly predicts a set of <human, object, interaction> triplets from an image based on a transformer encoder-decoder architecture. Through the set prediction, our method effectively exploits the inherent semantic relationships in an image and does not require time-consuming post-processing, which is the main bottleneck of existing methods. Our proposed algorithm achieves state-of-the-art performance in two HOI detection benchmarks with an inference time under 1 ms after object detection.
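
Set predictors of this kind are commonly trained by first matching each ground-truth triplet to one prediction with a minimum-cost one-to-one assignment, as in DETR. The abstract does not describe HOTR's matching cost. The cost below (two box-IoU terms plus an interaction-label mismatch term) is therefore a hypothetical stand-in. For readability on tiny sets, brute-force search replaces the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def triplet_cost(gt, pred):
    """Hypothetical cost between two <human box, object box, interaction> triplets."""
    (h_gt, o_gt, v_gt), (h_p, o_p, v_p) = gt, pred
    return (1 - iou(h_gt, h_p)) + (1 - iou(o_gt, o_p)) + (0.0 if v_gt == v_p else 1.0)


def match(gts, preds):
    """Minimum-cost one-to-one assignment of ground truths to predictions
    (requires len(preds) >= len(gts)); brute force, tiny sets only."""
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(triplet_cost(g, preds[j]) for g, j in zip(gts, perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


gts = [((0, 0, 2, 2), (3, 3, 5, 5), "ride"), ((6, 6, 8, 8), (9, 9, 10, 10), "hold")]
preds = [((6, 6, 8, 8), (9, 9, 10, 10), "hold"),
         ((0, 0, 2, 2), (3, 3, 5, 5), "ride"),
         ((0, 0, 1, 1), (3, 3, 4, 4), "eat")]
pairs, cost = match(gts, preds)
```

In DETR-style training, the prediction left unmatched (the third triplet here) is typically pushed toward a "no interaction" target.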
# (参考訳) SELF & FEIL: フィンランド語の情動・強度レキシコン

SELF & FEIL: Emotion and Intensity Lexicons for Finnish ( http://arxiv.org/abs/2104.13691v1 )

Emily Öhman(参考訳) 本稿では,フィンランド語の感情極性・情動レキシコン(SELF)とフィンランド語の情動強度レキシコン(FEIL)を紹介する。 本稿では,レキシコンの作成過程を説明し,一般に利用可能なツールを用いてレキシコンを評価する。 レキシコンは、注意深く編集された翻訳とともに、NRC Emotion Lexiconから投影されたアノテーションを使用している。 我々の知る限り、これはフィンランド語として初めての包括的な感情極性・情動レキシコンである。

This paper introduces a Sentiment and Emotion Lexicon for Finnish (SELF) and a Finnish Emotion Intensity Lexicon (FEIL). We describe the lexicon creation process and evaluate the lexicon using some commonly available tools. The lexicon uses annotations projected from the NRC Emotion Lexicon with carefully edited translations. To our knowledge, this is the first comprehensive sentiment and emotion lexicon for Finnish.
# (参考訳) 選択採用の情報交流プロファイル

Information Interaction Profile of Choice Adoption ( http://arxiv.org/abs/2104.13695v1 )

Gaël Poux-Médard and Julien Velcin and Sabine Loudcher(参考訳) 情報の断片(エンティティ)間の相互作用は、製品の採用、ニュースの拡散、戦略の選択など、個人がそれらに対してどう行動するかにおいて重要な役割を果たす。 しかし、その基礎となる相互作用メカニズムはしばしば未知であり、文献ではほとんど研究されていない。 本稿では,エンティティ間の相互作用ネットワークと,相互作用するエンティティを隔てる時間的距離に応じたその変化の両方を推論する効率的な手法を提案する。これらが合わさって相互作用プロファイルを形成する。 相互作用プロファイルにより、相互作用過程のメカニズムを特徴づけることができる。 マルチカーネル推論の最近の進歩に基づく凸モデルによってこの問題に取り組む。 エンティティ(URL、広告、状況)への露出の順序付き系列と、ユーザがそれらに対して行う行動(共有、クリック、意思決定)を考える。 ユーザが、これまでに受けた露出の組み合わせに応じてどのように異なる行動を示すかを調べる。 露出の組み合わせがユーザに与える効果は、各露出の独立した効果の総和以上であること、すなわち相互作用が存在することを示す。 このモデリングを、並列に解くことができるノンパラメトリックな凸最適化問題に帰着させる。 提案手法は,3つの実世界データセット上の相互作用過程について最先端の結果を再現し,基礎となるデータ生成メカニズムの推論においてベースラインを上回る。 最後に,相互作用プロファイルを直感的に可視化でき,モデルの解釈が容易になることを示す。

Interactions between pieces of information (entities) play a substantial role in the way an individual acts on them: adoption of a product, the spread of news, strategy choice, etc. However, the underlying interaction mechanisms are often unknown and have been little explored in the literature. We introduce an efficient method to infer both the entity interaction network and its evolution according to the temporal distance separating interacting entities; together, they form the interaction profile. The interaction profile allows characterizing the mechanisms of the interaction processes. We approach this problem via a convex model based on recent advances in multi-kernel inference. We consider an ordered sequence of exposures to entities (URL, ads, situations) and the actions the user exerts on them (share, click, decision). We study how users exhibit different behaviors according to combinations of exposures they have been exposed to. We show that the effect of a combination of exposures on a user is more than the sum of each exposure's independent effect; that is, there is an interaction. We reduce this modeling to a non-parametric convex optimization problem that can be solved in parallel. Our method recovers state-of-the-art results on interaction processes on three real-world datasets and outperforms baselines in the inference of the underlying data generation mechanisms. Finally, we show that interaction profiles can be visualized intuitively, easing the interpretation of the model.
# (参考訳) ニューラルネットワークと視覚幾何学を用いた3次元頭部再構成のためのハイブリッドアプローチ

Hybrid Approach for 3D Head Reconstruction: Using Neural Networks and Visual Geometry ( http://arxiv.org/abs/2104.13710v1 )

Oussema Bouafif, Bogdan Khomutenko, Mohamed Daoudi(参考訳) 単一の入力画像から顔の3次元幾何構造を復元することは、コンピュータビジョンにおける挑戦的で活発な研究分野である。 本稿では,深層学習と幾何学的手法に基づくハイブリッドアプローチを用いて,単一または複数の画像から3次元頭部を再構成する新しい手法を提案する。 U-netアーキテクチャに基づき、合成データのみで訓練したエンコーダ・デコーダネットワークを提案する。 このネットワークは、単一の入力写真からピクセル単位の法線ベクトルとランドマークマップの両方を予測する。 ランドマークはポーズ計算と最適化問題の初期化に用いられ、この最適化問題はパラメトリックなモーファブルモデルと法線ベクトル場を用いて3次元頭部形状を再構成する。 単視点および多視点設定での定性的・定量的評価テストにより、最先端の結果が得られた。 モデルは合成データのみで訓練されたにもかかわらず、実世界の画像に対して3次元形状と正確なポーズを復元することに成功している。

Recovering the 3D geometric structure of a face from a single input image is a challenging active research area in computer vision. In this paper, we present a novel method for reconstructing 3D heads from a single or multiple image(s) using a hybrid approach based on deep learning and geometric techniques. We propose an encoder-decoder network based on the U-net architecture and trained on synthetic data only. It predicts both pixel-wise normal vectors and landmark maps from a single input photo. Landmarks are used for the pose computation and the initialization of the optimization problem, which, in turn, reconstructs the 3D head geometry by using a parametric morphable model and normal vector fields. State-of-the-art results are achieved through qualitative and quantitative evaluation tests on both single and multi-view settings. Despite the fact that the model was trained only on synthetic data, it successfully recovers 3D geometry and precise poses for real-world images.
# (参考訳) Algonauts Project 2021 チャレンジ:人間の脳は動く世界をどのように理解するのか

The Algonauts Project 2021 Challenge: How the Human Brain Makes Sense of a World in Motion ( http://arxiv.org/abs/2104.13714v1 )

R.M. Cichy, K. Dwivedi, B. Lahner, A. Lascelles, P. Iamshchinina, M. Graumann, A. Andonian, N.A.R. Murty, K. Kay, G. Roig, A. Oliva(参考訳) 自然知能と人工知能の科学は根本的につながっている。 脳に着想を得て人間が設計したAIは、現在では視覚処理中の人間の脳反応を予測するための標準となっており、逆に脳もAIにおける発明の着想源であり続けている。 これらの分野間のより深いつながりを促進するため、我々はAlgonauts Project Challengeの2021年版「How the Human Brain Makes Sense of a World in Motion」(http://algonauts.csail.mit.edu/)を公開する。 10人の参加者が日常的な出来事を描いた1,000本以上の短いビデオクリップからなる豊富なセットを視聴している間に記録された全脳fMRI応答を提供する。 このチャレンジの目的は、これらのビデオクリップに対する脳の反応を正確に予測することである。 このチャレンジの形式は、迅速な開発を保証し、結果を直接比較可能かつ透明にし、誰にでも開かれている。 このようにして、視覚知能の理解という共通の目標に向けた学際的な協力を促進する。 2021年のAlgonauts Projectは、Cognitive Computational Neuroscience (CCN) 会議と共同で実施されている。

The sciences of natural and artificial intelligence are fundamentally connected. Brain-inspired, human-engineered AI models are now the standard for predicting human brain responses during vision, and conversely, the brain continues to inspire invention in AI. To promote even deeper connections between these fields, we here release the 2021 edition of the Algonauts Project Challenge: How the Human Brain Makes Sense of a World in Motion (http://algonauts.csail.mit.edu/). We provide whole-brain fMRI responses recorded while 10 human participants viewed a rich set of over 1,000 short video clips depicting everyday events. The goal of the challenge is to accurately predict brain responses to these video clips. The format of our challenge ensures rapid development, makes results directly comparable and transparent, and is open to all. In this way it facilitates interdisciplinary collaboration towards a common goal of understanding visual intelligence. The 2021 Algonauts Project is conducted in collaboration with the Cognitive Computational Neuroscience (CCN) conference.
# (参考訳) PCFGはもっとうまくやれる:多数のシンボルを持つ確率的文脈自由文法の誘導

PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols ( http://arxiv.org/abs/2104.13727v1 )

Songlin Yang, Yanpeng Zhao, Kewei Tu(参考訳) ニューラルパラメータ化を伴う確率的文脈自由文法(PCFG)は、教師なし句構造文法誘導に有効であることが示されている。 しかし、PCFGの表現と構文解析の計算量がシンボル数の3乗であるため、従来の手法は比較的多数の(非終端および前終端)シンボルにスケールアップできない。 本研究では,テンソル分解に基づくPCFGの新しいパラメータ化形式を提案する。これはシンボル数に対して高々2乗の計算量しか持たず、したがってはるかに多くのシンボルを使うことを可能にする。 さらに,この新しい形式にニューラルパラメータ化を用いて,教師なし構文解析の性能を向上させる。 10言語にわたってモデルを評価し,より多くのシンボルを使うことの有効性を実証的に示す。 コード:https://github.com/sustcsonglin/TN-PCFG

Probabilistic context-free grammars (PCFGs) with neural parameterization have been shown to be effective in unsupervised phrase-structure grammar induction. However, due to the cubic computational complexity of PCFG representation and parsing, previous approaches cannot scale up to a relatively large number of (nonterminal and preterminal) symbols. In this work, we present a new parameterization form of PCFGs based on tensor decomposition, which has at most quadratic computational complexity in the symbol number and therefore allows us to use a much larger number of symbols. We further use neural parameterization for the new form to improve unsupervised parsing performance. We evaluate our model across ten languages and empirically demonstrate the effectiveness of using more symbols. Our code: https://github.com/sustcsonglin/TN-PCFG
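
The complexity saving comes from never materializing the cubic binary-rule tensor. This minimal NumPy sketch assumes a CP-style rank-$R$ decomposition $T_{abc} = \sum_r U_{ar} V_{br} W_{cr}$; the paper's exact parameterization and normalization may differ. It shows that combining the inside scores of two child spans gives the same result either way. The factored route costs $O(mR)$ per combination instead of $O(m^3)$ for $m$ symbols. Sizes and random values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, R = 60, 16                       # number of symbols, rank (illustrative)
U, V, W = (rng.random((m, R)) for _ in range(3))

# Full binary-rule score tensor T[a, b, c] for rules A -> B C: m**3 entries.
T = np.einsum("ar,br,cr->abc", U, V, W)

left = rng.random(m)    # inside scores of the left child span, one per symbol
right = rng.random(m)   # inside scores of the right child span

full = np.einsum("abc,b,c->a", T, left, right)   # O(m^3) work per split
factored = U @ ((V.T @ left) * (W.T @ right))    # O(m R) work per split
```

The identity holds because the sums over $b$ and $c$ factor through the shared rank index $r$. When $R$ grows proportionally to $m$, the cost becomes quadratic in the number of symbols.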
# (参考訳) 画像セグメンテーションにおける外れ値検出のための分布ガウス過程層

Distributional Gaussian Process Layers for Outlier Detection in Image Segmentation ( http://arxiv.org/abs/2104.13756v1 )

Sebastian G. Popescu, David J. Sharp, James H. Cole, Konstantinos Kamnitsas, Ben Glocker(参考訳) 我々は,不確実性を確実に伝播させるためにWasserstein-2空間で動作するガウス過程を組み込んだ、階層的畳み込みガウス過程のためのパラメータ効率の高いベイズ層を提案する。 これは、畳み込みガウス過程を、分布上の距離保存アフィン作用素で直接置き換えるものである。 脳組織セグメンテーションの実験により,得られたアーキテクチャは,従来の階層的ガウス過程では達成されていなかった,確立された決定論的セグメンテーションアルゴリズム(U-Net)の性能に迫ることが示された。 さらに,同じセグメンテーションモデルを分布外データ(すなわち脳腫瘍などの病変を含む画像)に適用することで,我々の不確実性推定により,従来のベイズネットワークや正常分布を学習する再構成ベースの手法を上回る分布外検出が可能になることを示す。

We propose a parameter efficient Bayesian layer for hierarchical convolutional Gaussian Processes that incorporates Gaussian Processes operating in Wasserstein-2 space to reliably propagate uncertainty. This directly replaces convolving Gaussian Processes with a distance-preserving affine operator on distributions. Our experiments on brain tissue-segmentation show that the resulting architecture approaches the performance of well-established deterministic segmentation algorithms (U-Net), which has never been achieved with previous hierarchical Gaussian Processes. Moreover, by applying the same segmentation model to out-of-distribution data (i.e., images with pathology such as brain tumors), we show that our uncertainty estimates result in out-of-distribution detection that outperforms the capabilities of previous Bayesian networks and reconstruction-based approaches that learn normative distributions.
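
Working in Wasserstein-2 space is convenient because the distance between Gaussians has a closed form. For diagonal covariances it reduces to $W_2^2 = \lVert \mu_1 - \mu_2 \rVert^2 + \lVert \sigma_1 - \sigma_2 \rVert^2$. The sketch below computes it and builds a squared-exponential kernel on top. That kernel is an illustrative assumption, not the paper's layer.

```python
import numpy as np


def w2_diag_gaussians(mu1, sd1, mu2, sd2):
    """2-Wasserstein distance between N(mu1, diag(sd1**2)) and
    N(mu2, diag(sd2**2)); for diagonal covariances it has this closed form."""
    mu1, sd1, mu2, sd2 = (np.asarray(v, dtype=float) for v in (mu1, sd1, mu2, sd2))
    return float(np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sd1 - sd2) ** 2)))


def w2_rbf_kernel(mus, sds, lengthscale=1.0):
    """Squared-exponential kernel between diagonal Gaussians that uses W2 as
    the distance. Here W2 equals the Euclidean distance between the stacked
    vectors [mu, sd], so this is an ordinary RBF kernel and hence PSD."""
    z = np.hstack([np.asarray(mus, dtype=float), np.asarray(sds, dtype=float)])
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))


rng = np.random.default_rng(0)
mus = rng.normal(size=(5, 2))              # means of 5 Gaussians in 2-D
sds = np.abs(rng.normal(size=(5, 2)))      # their per-dimension std devs
K = w2_rbf_kernel(mus, sds)
```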
# (参考訳) FastAdaBelief:強い凸性による信頼に基づく適応最適化器の収束率の向上

FastAdaBelief: Improving Convergence Rate for Belief-based Adaptive Optimizer by Strong Convexity ( http://arxiv.org/abs/2104.13790v1 )

Yangfan Zhou, Kaizhu Huang, Cheng Cheng, Xuguang Wang, and Xin Liu(参考訳) AdaBeliefアルゴリズムは、観測された勾配の指数移動平均を見ることにより、Adamアルゴリズムよりも優れた汎化能力を示す。 AdaBeliefは、データ依存の$O(\sqrt{T})$リグレット上界を持つことが証明されている。 しかし、AdaBeliefの収束率をさらに向上させるために強凸性をどう利用するかは、未解決の問題である。 この問題に対処するため,我々は強凸性を利用したFastAdaBeliefと呼ばれる新しい最適化アルゴリズムを提案する。 FastAdaBeliefがデータ依存の$O(\log T)$リグレット上界を達成することを証明する。 さらに、画像分類と言語モデリングのためにオープンデータセット(CIFAR-10とPenn Treebank)で行った広範な実験によって理論解析を検証する。
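
The abstract builds on AdaBelief without restating its update. For reference, below is a NumPy sketch of the base AdaBelief step (Zhuang et al., 2020). It replaces Adam's moving average of $g_t^2$ with a moving average of $(g_t - m_t)^2$, the "belief" in the current gradient. FastAdaBelief's modifications for strongly convex objectives are not shown, and the hyperparameter defaults are assumptions.

```python
import numpy as np


def adabelief_step(theta, grad, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-16):
    """One base AdaBelief update (t starts at 1). Returns new (theta, m, s).
    s tracks the EMA of (grad - m)**2 instead of Adam's EMA of grad**2."""
    m = beta1 * m + (1 - beta1) * grad                  # first moment
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps  # "belief" term
    m_hat = m / (1 - beta1 ** t)                        # bias corrections
    s_hat = s / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
    return theta, m, s


theta1, m1, s1 = adabelief_step(0.0, -3.0, 0.0, 0.0, t=1, lr=0.1)
```

On the first step, `grad - m` equals `beta1 * grad`, so the update has magnitude `lr / beta1` whatever the gradient's scale (about 0.111 here).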

翻訳日:2021-04-29 16:20:35 公開日:2021-04-28
# (参考訳) 部分観測可能なモンテカルロ計画のためのルールベースシールド

Rule-based Shielding for Partially Observable Monte-Carlo Planning ( http://arxiv.org/abs/2104.13791v1 )

Giulio Mazzi, Alberto Castellini, Alessandro Farinelli(参考訳) 部分観測モンテカルロ計画(POMCP)は、大規模な部分観測マルコフ決定過程に対する近似方策を生成できる強力なオンラインアルゴリズムである。 この手法のオンライン性は、完全な方策表現を避けることでスケーラビリティを支えている。 しかし、明示的な表現がないため方策の解釈が妨げられ、方策の検証が非常に複雑になる。 本研究では,2つの貢献を提案する。 1つ目は、タスクに関する専門家の事前知識に照らして、POMCPが選択した予期せぬ行動を特定する手法である。 2つ目は、POMCPが予期せぬ行動を選択するのを防ぐシールド手法である。 1つ目の手法はSatisfiability Modulo Theory(SMT)に基づいている。 POMCPが生成したトレース(すなわち、信念-行動-観測の三つ組の系列)を調べ、専門家が定義した方策の性質に関する論理式のパラメータを計算する。 2つ目の貢献は、論理式をオンラインで用いてPOMCPが選択した異常な行動を特定し、それらの行動を、専門家の知識を反映した論理式を満たす行動で置き換えるモジュールである。 POMDPの標準ベンチマークであるTigerと、移動ロボットのナビゲーションにおける速度調整に関する実世界の問題で、我々のアプローチを評価する。 その結果、POMCPのパラメータが誤っているために時折誤った行動を選択するケーススタディにおいて、シールドされたPOMCPが標準のPOMCPを上回ることが示された。 さらに,論理式のパラメータを、誤った行動を含む軌跡を用いて最適化した場合でも、この手法が良好な性能を保つことを示す。

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. However, the lack of an explicit representation hinders policy interpretability and makes policy verification very complex. In this work, we propose two contributions. The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task. The second is a shielding approach that prevents POMCP from selecting unexpected actions. The first method is based on Satisfiability Modulo Theory (SMT). It inspects traces (i.e., sequences of belief-action-observation triplets) generated by POMCP to compute the parameters of logical formulas about policy properties defined by the expert. The second contribution is a module that uses the logical formulas online to identify anomalous actions selected by POMCP and substitutes those actions with actions that satisfy the logical formulas, thereby fulfilling expert knowledge. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and on a real-world problem related to velocity regulation in mobile robot navigation. Results show that the shielded POMCP outperforms the standard POMCP in a case study in which a wrong parameter of POMCP makes it select wrong actions from time to time. Moreover, we show that the approach maintains good performance even if the parameters of the logical formula are optimized using trajectories containing some wrong actions.
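
The shielding idea can be illustrated with a toy wrapper. Before the planner's chosen action is executed, it is checked against an expert rule evaluated on the current belief, and it is replaced if the rule is violated. The rule, its 0.8 threshold, the action names, and the belief format below are hypothetical. In the paper the formula parameters are computed from POMCP traces with an SMT solver rather than fixed by hand.

```python
def shield(action, belief, rule, fallback):
    """Pass `action` through if `rule(action, belief)` holds; otherwise
    substitute an action chosen by `fallback(belief)`."""
    return action if rule(action, belief) else fallback(belief)


def speed_rule(action, belief, threshold=0.8):
    """Hypothetical expert rule: when the belief that the current segment is
    cluttered is at least `threshold`, only the "low" speed is acceptable."""
    return belief["cluttered"] < threshold or action == "low"


def slow_down(belief):
    """Hypothetical fallback that always satisfies `speed_rule`."""
    return "low"


# The planner proposes "high" in a segment believed to be cluttered.
chosen = shield("high", {"cluttered": 0.9}, speed_rule, slow_down)
```

Here `chosen` is `"low"`. With a belief of 0.3 the planner's `"high"` would pass through unchanged.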
Translated: 2021-04-29 15:59:15 Published: 2021-04-28
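# DeepSatData: Building large scale datasets of satellite images for training machine learning models

DeepSatData: Building large scale datasets of satellite images for training machine learning models ( http://arxiv.org/abs/2104.13824v1 )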

License: CC BY 4.0
Michail Tarasiou, Stefanos Zafeiriou

This report presents design considerations for automatically generating satellite imagery datasets for training machine learning models, with emphasis placed on dense classification tasks, e.g., semantic segmentation. The implementation presented makes use of freely available Sentinel-2 data, which allows generation of the large-scale datasets required for training deep neural networks. We discuss issues faced from the point of view of deep neural network training and evaluation, such as checking the quality of ground truth data, and comment on the scalability of the approach. Accompanying code is provided at https://github.com/michaeltrs/DeepSatData.
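One step that dataset generation of this kind typically involves is cutting a large raster and its label map into fixed-size training patches. The sketch below shows only that step, assuming the scene and ground-truth mask are already loaded as NumPy arrays; `extract_patches`, the patch size and the `min_labeled` filter are illustrative choices, not the DeepSatData API.

```python
import numpy as np

def extract_patches(image: np.ndarray, labels: np.ndarray, size: int,
                    min_labeled: float = 0.0):
    """Cut an (H, W, C) image and its (H, W) label map into non-overlapping
    size x size patches. Patches whose fraction of labeled pixels (label > 0)
    is below min_labeled are skipped, as a crude ground-truth quality check."""
    h, w = labels.shape
    patches = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            lab = labels[top:top + size, left:left + size]
            if (lab > 0).mean() < min_labeled:
                continue
            img = image[top:top + size, left:left + size]
            patches.append((img, lab))
    return patches

rng = np.random.default_rng(0)
scene = rng.random((100, 100, 4))            # e.g. 4 spectral bands
mask = np.zeros((100, 100), dtype=np.int64)
mask[:48, :48] = 1                           # only the top-left corner is labeled
kept = extract_patches(scene, mask, size=24, min_labeled=0.5)
print(len(kept))  # 4
```

Border pixels that do not fill a whole patch are dropped here; overlapping strides or padding are equally valid choices.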
Translated: 2021-04-29 15:42:57 Published: 2021-04-28
# Learning deep autoregressive models for hierarchical data

Learning deep autoregressive models for hierarchical data ( http://arxiv.org/abs/2104.13853v1 )

Carl R. Andersson, Niklas Wahlström, Thomas B. Schön

We propose a model for hierarchical structured data as an extension to the stochastic temporal convolutional network (STCN). The proposed model combines an autoregressive model with a hierarchical variational autoencoder and downsampling to achieve superior computational complexity. We evaluate the proposed model on two different types of sequential data: speech and handwritten text. The results are promising with the proposed model achieving state-of-the-art performance.
Translated: 2021-04-29 15:36:18 Published: 2021-04-28
# D-OccNet: Detailed 3D Reconstruction Using Cross-Domain Learning

D-OccNet: Detailed 3D Reconstruction Using Cross-Domain Learning ( http://arxiv.org/abs/2104.13854v1 )

License: CC BY 4.0
Minhaj Uddin Ansari, Talha Bilal, Naeem Akhter

Deep learning based 3D reconstruction from a single-view 2D image is becoming increasingly popular due to its wide range of real-world applications, but the task is inherently challenging because an object is only partially observable from a single perspective. Recently, state-of-the-art probability-based Occupancy Networks reconstructed 3D surfaces from three different types of input domains: single-view 2D images, point clouds and voxels. In this study, we extend the work on Occupancy Networks by exploiting cross-domain learning of the image and point cloud domains. Specifically, we first convert the single-view 2D image into a simpler point cloud representation, and then reconstruct a 3D surface from it. Our network, the Double Occupancy Network (D-OccNet), outperforms Occupancy Networks in terms of visual quality and the details captured in the 3D reconstruction.
Translated: 2021-04-29 15:24:24 Published: 2021-04-28
# Communication Topology Co-Design in Graph Recurrent Neural Network Based Distributed Control

Communication Topology Co-Design in Graph Recurrent Neural Network Based Distributed Control ( http://arxiv.org/abs/2104.13868v1 )

Fengjun Yang and Nikolai Matni

When designing large-scale distributed controllers, the information-sharing constraints between sub-controllers, as defined by a communication topology interconnecting them, are as important as the controller itself. Controllers implemented using dense topologies typically outperform those implemented using sparse topologies, but it is also desirable to minimize the cost of controller deployment. Motivated by the above, we introduce a compact but expressive graph recurrent neural network (GRNN) parameterization of distributed controllers that is well suited for distributed controller and communication topology co-design. Our proposed parameterization enjoys a local and distributed architecture, similar to previous Graph Neural Network (GNN)-based parameterizations, while further naturally allowing for joint optimization of the distributed controller and communication topology needed to implement it. We show that the distributed controller/communication topology co-design task can be posed as an $\ell_1$-regularized empirical risk minimization problem that can be efficiently solved using stochastic gradient methods. We run extensive simulations to study the performance of GRNN-based distributed controllers and show that (a) they achieve performance comparable to GNN-based controllers while having fewer free parameters, and (b) our method allows for performance/communication density tradeoff curves to be efficiently approximated.
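The $\ell_1$ penalty matters because it drives communication weights to exactly zero, removing links from the topology. As a minimal stand-in for the paper's GRNN training (which applies stochastic gradients to a controller loss not reproduced here), the sketch below fits a linear map with an $\ell_1$ penalty by proximal gradient descent (ISTA), where soft-thresholding zeroes out small weights. The quadratic loss, the step size rule and `lam` are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(w: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||w||_1: shrink toward zero, clip at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def ista(X: np.ndarray, y: np.ndarray, lam: float, steps: int = 500) -> np.ndarray:
    """Minimize 0.5/n * ||X w - y||^2 + lam * ||w||_1 by proximal gradient."""
    n, d = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n        # gradient of the smooth loss
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
w_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])   # only 2 "links" matter
y = X @ w_true + 0.01 * rng.normal(size=200)
w_hat = ista(X, y, lam=0.1)
print(np.round(w_hat, 2))
```

Larger `lam` removes more links at some cost in fit; sweeping it traces out a performance/sparsity tradeoff curve in the spirit of the one the abstract mentions.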
Translated: 2021-04-29 15:16:01 Published: 2021-04-28
# Exploring Relational Context for Multi-Task Dense Prediction

Exploring Relational Context for Multi-Task Dense Prediction ( http://arxiv.org/abs/2104.13874v1 )

David Bruggemann, Menelaos Kanakis, Anton Obukhov, Stamatios Georgoulis, Luc Van Gool

The timeline of computer vision research is marked with advances in learning and utilizing efficient contextual representations. Most of them, however, are targeted at improving model performance on a single downstream task. We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads. Our goal is to find the most efficient way to refine each task prediction by capturing cross-task contexts dependent on tasks' relations. We explore various attention-based contexts, such as global and local, in the multi-task setting and analyze their behavior when applied to refine each task independently. Empirical findings confirm that different source-target task pairs benefit from different context types. To automate the selection process, we propose an Adaptive Task-Relational Context (ATRC) module, which samples the pool of all available contexts for each task pair using neural architecture search and outputs the optimal configuration for deployment. Our method achieves state-of-the-art performance on two important multi-task benchmarks, namely NYUD-v2 and PASCAL-Context. The proposed ATRC has a low computational toll and can be used as a drop-in refinement module for any supervised multi-task architecture.
Translated: 2021-04-29 15:00:23 Published: 2021-04-28
# Universal Consistency of Decision Trees for High Dimensional Additive Models

Universal Consistency of Decision Trees for High Dimensional Additive Models ( http://arxiv.org/abs/2104.13881v1 )

Jason M. Klusowski

This paper shows that decision trees constructed with Classification and Regression Trees (CART) methodology are universally consistent for additive models, even when the dimensionality scales exponentially with the sample size, under certain $\ell_1$ sparsity constraints. The consistency is universal in the sense that there are no a priori assumptions on the distribution of the input variables. Surprisingly, this adaptivity to (approximate or exact) sparsity is achieved with a single tree, as opposed to what might be expected for an ensemble. Finally, we show that these qualitative properties of individual trees are inherited by Breiman's random forests. A key step in the analysis is the establishment of an oracle inequality, which precisely characterizes the goodness-of-fit and complexity tradeoff.
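For readers less familiar with CART, its core step is a greedy search for the split that most reduces the squared error of a regression tree. The sketch below shows that step for a single feature; it is a textbook illustration rather than part of the paper's analysis, and `best_split` is a hypothetical helper name.

```python
from typing import List, Tuple

def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (threshold, impurity decrease) of the best CART split x <= t
    for one feature, using midpoints between consecutive distinct x values."""
    pairs = sorted(zip(x, y))
    total = sse([v for _, v in pairs])
    best = (float("nan"), 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best

x = [1, 2, 3, 4, 5, 6]
y = [0.0, 0.1, -0.1, 5.0, 5.2, 4.8]
print(best_split(x, y))
```

A full tree applies this search over every feature at every node and recurses on the two children; a random forest repeats it on bootstrap samples with random feature subsets.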
Translated: 2021-04-29 14:42:26 Published: 2021-04-28
# Classification and comparison of license plates localization algorithms

Classification and comparison of license plates localization algorithms ( http://arxiv.org/abs/2104.13896v1 )

Mustapha Saidallah, Fatimazahra Taki, Abdelbaki El Belrhiti El Alaoui and Abdeslam El Fergougui

Intelligent Transportation Systems (ITS) are the subject of global economic competition. They are the application of new information and communication technologies in the transport sector to make infrastructure more efficient, more reliable and more ecological. License Plate Recognition (LPR) is the key module of these systems, and License Plate Localization (LPL) is its most important stage, because it determines the speed and robustness of the module. During this step the algorithm must process the image and overcome several constraints, such as climatic and lighting conditions, the variety of sensors and viewing angles, the lack of LP standardization, and the need for real-time processing. This paper presents a classification and comparison of License Plate Localization (LPL) algorithms and describes the advantages, disadvantages and improvements of each of them.
Translated: 2021-04-29 14:22:04 Published: 2021-04-28
# Discovery of slow variables in a class of multiscale stochastic systems via neural networks

Discovery of slow variables in a class of multiscale stochastic systems via neural networks ( http://arxiv.org/abs/2104.13911v1 )

Przemyslaw Zielinski and Jan S. Hesthaven

Finding a reduction of complex, high-dimensional dynamics to its essential, low-dimensional "heart" remains a challenging yet necessary prerequisite for designing efficient numerical approaches. Machine learning methods have the potential to provide a general framework to automatically discover such representations. In this paper, we consider multiscale stochastic systems with local slow-fast time scale separation and propose a new method that encodes, in an artificial neural network, a map that extracts the slow representation from the system. The architecture of the network consists of an encoder-decoder pair that we train in a supervised manner to learn the appropriate low-dimensional embedding in the bottleneck layer. We test the method on a number of examples that illustrate its ability to discover a correct slow representation. Moreover, we provide an error measure to assess the quality of the embedding and demonstrate that pruning the network can pinpoint the essential coordinates of the system from which to build the slow representation.
Translated: 2021-04-29 14:11:08 Published: 2021-04-28
# High-Resolution Optical Flow from 1D Attention and Correlation

High-Resolution Optical Flow from 1D Attention and Correlation ( http://arxiv.org/abs/2104.13918v1 )

Haofei Xu, Jiaolong Yang, Jianfei Cai, Juyong Zhang, Xin Tong

Optical flow is inherently a 2D search problem, and thus the computational complexity grows quadratically with respect to the search window, making large-displacement matching infeasible for high-resolution images. In this paper, we propose a new method for high-resolution optical flow estimation with significantly less computation, achieved by factorizing 2D optical flow with 1D attention and correlation. Specifically, we first perform a 1D attention operation in the vertical direction of the target image, and then a simple 1D correlation in the horizontal direction of the attended image achieves a 2D correspondence modeling effect. The directions of attention and correlation can also be exchanged, resulting in two 3D cost volumes that are concatenated for optical flow estimation. The novel 1D formulation enables our method to scale to very high-resolution input images while maintaining competitive performance. Extensive experiments on Sintel, KITTI and real-world 4K ($2160 \times 3840$) resolution images demonstrate the effectiveness and superiority of our proposed method.
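To make the complexity argument concrete: a full 2D cost volume compares every pixel of one $H \times W$ feature map with every pixel of another, costing $O(H^2 W^2)$, whereas correlating only along one axis costs $O(H W^2)$. The NumPy sketch below builds such a horizontal 1D cost volume. It omits the vertical 1D attention step the paper performs first, and the $1/\sqrt{D}$ scaling is an assumption borrowed from dot-product attention.

```python
import numpy as np

def horizontal_correlation(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """1D cost volume along image rows.

    f1, f2: feature maps of shape (H, W, D).
    Returns C of shape (H, W, W) with C[h, i, j] = <f1[h, i], f2[h, j]> / sqrt(D),
    i.e. each pixel is compared only with pixels in the same row of f2.
    """
    d = f1.shape[-1]
    return np.einsum("hid,hjd->hij", f1, f2) / np.sqrt(d)

rng = np.random.default_rng(0)
f1 = rng.normal(size=(4, 8, 64))
f2 = np.roll(f1, shift=2, axis=1)           # f2 is f1 shifted 2 pixels to the right
cost = horizontal_correlation(f1, f2)
# Best-matching column minus own column, wrapped to [0, 8) because np.roll wraps.
disp = (cost.argmax(axis=-1) - np.arange(8)) % 8
print(cost.shape, disp[0])
```

In the paper's setting the vertical attention first propagates information along columns, so this purely horizontal search can still recover 2D displacements.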
Translated: 2021-04-29 13:45:52 Published: 2021-04-28
# Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning ( http://arxiv.org/abs/2104.13872v1 )

Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto

Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and object labels detected from the images. In previous work, pseudo-captions, i.e., sentences that contain the detected object labels, were assigned to a given image. The focus of the previous work was on the alignment of input images and pseudo-captions at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image. In this work, we investigate the effect of removing mismatched words from image-sentence alignment to determine how they make this task difficult. We propose a simple gating mechanism that is trained to align image features with only the most reliable words in pseudo-captions: the detected object labels. The experimental results show that our proposed method outperforms the previous methods without introducing complex sentence-level learning objectives. Combined with the sentence-level alignment method of previous work, our method further improves its performance. These results confirm the importance of careful alignment in word-level details.
Translated: 2021-04-29 13:19:31 Published: 2021-04-28
# Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization ( http://arxiv.org/abs/2104.13877v1 )

License: see link
Standard dynamics models for continuous control make use of feedforward computation to predict the conditional distribution of next state and reward given current state and action using a multivariate Gaussian with a diagonal covariance structure. This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics. In this paper, we challenge this conditional independence assumption and propose a family of expressive autoregressive dynamics models that generate different dimensions of the next state and reward sequentially conditioned on previous dimensions. We demonstrate that autoregressive dynamics models indeed outperform standard feedforward models in log-likelihood on held-out transitions. Furthermore, we compare different model-based and model-free off-policy evaluation (OPE) methods on RL Unplugged, a suite of offline MuJoCo datasets, and find that autoregressive dynamics models consistently outperform all baselines, achieving a new state-of-the-art. Finally, we show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer through data augmentation and improving performance using model-based planning.
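The difference between the two modeling choices already shows up for a 2D Gaussian next state. A diagonal Gaussian scores each dimension independently, while an autoregressive model uses the chain rule $p(s_1, s_2) = p(s_1)\,p(s_2 \mid s_1)$ and can therefore represent correlation between dimensions. The NumPy sketch below uses an arbitrarily chosen correlation $\rho$ (an assumption for illustration, not a learned model) and checks that the autoregressive factorization recovers the exact joint log-density, while the diagonal model does not.

```python
import numpy as np

def gauss_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def joint_logpdf(s, cov):
    """Exact log-density of a zero-mean bivariate Gaussian."""
    quad = s @ np.linalg.solve(cov, s)
    return -0.5 * (quad + np.log(np.linalg.det(cov)) + 2 * np.log(2 * np.pi))

rho = 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])
s = np.array([1.0, 0.8])                       # a sample next state

# Diagonal model: treat the two dimensions as conditionally independent.
diag = gauss_logpdf(s[0], 0.0, 1.0) + gauss_logpdf(s[1], 0.0, 1.0)
# Autoregressive model: s1 ~ N(0, 1), then s2 | s1 ~ N(rho * s1, 1 - rho^2).
autoreg = gauss_logpdf(s[0], 0.0, 1.0) + gauss_logpdf(s[1], rho * s[0], 1 - rho**2)

print(round(joint_logpdf(s, cov), 4), round(autoreg, 4), round(diag, 4))
```

In a learned model each conditional would be produced by a network that takes the current state, action and previously generated dimensions as input; the chain-rule bookkeeping stays the same.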
Translated: 2021-04-29 13:02:32 Published: 2021-04-28
# Deep Domain Generalization with Feature-norm Network

Deep Domain Generalization with Feature-norm Network ( http://arxiv.org/abs/2104.13581v1 )

Mohammad Mahfujur Rahman, Clinton Fookes, Sridha Sridharan

In this paper, we tackle the problem of training with multiple source domains with the aim to generalize to new domains at test time without an adaptation step. This is known as domain generalization (DG). Previous works on DG assume identical categories or label space across the source domains. In the case of category shift among the source domains, previous methods on DG are vulnerable to negative transfer due to the large mismatch among label spaces, decreasing the target classification accuracy. To tackle the aforementioned problem, we introduce an end-to-end feature-norm network (FNN) which is robust to negative transfer as it does not need to match the feature distribution among the source domains. We also introduce a collaborative feature-norm network (CFNN) to further improve the generalization capability of FNN. The CFNN matches the predictions of the next most likely categories for each training sample which increases each network's posterior entropy. We apply the proposed FNN and CFNN networks to the problem of DG for image classification tasks and demonstrate significant improvement over the state-of-the-art.
Translated: 2021-04-29 13:01:57 Published: 2021-04-28
# Preserving Semantic Consistency in Unsupervised Domain Adaptation Using Generative Adversarial Networks

Preserving Semantic Consistency in Unsupervised Domain Adaptation Using Generative Adversarial Networks ( http://arxiv.org/abs/2104.13725v1 )

License: see link
Unsupervised domain adaptation seeks to mitigate the distribution discrepancy between source and target domains, given labeled samples of the source domain and unlabeled samples of the target domain. Generative adversarial networks (GANs) have demonstrated significant improvement in domain adaptation by producing domain-specific images for training. However, most existing GAN-based techniques for unsupervised domain adaptation do not consider semantic information during domain matching, so their performance degrades when the source and target domain data are semantically different. In this paper, we propose a novel end-to-end semantic consistent generative adversarial network (SCGAN). This network can achieve source-to-target domain matching by capturing semantic information at the feature level and producing images for unsupervised domain adaptation from both the source and the target domains. We demonstrate the robustness of our proposed method, which exceeds state-of-the-art performance in unsupervised domain adaptation settings, through experiments on digit and object classification tasks.
Translated: 2021-04-29 13:01:41 Published: 2021-04-28
# Pose-driven Attention-guided Image Generation for Person Re-Identification

Pose-driven Attention-guided Image Generation for Person Re-Identification ( http://arxiv.org/abs/2104.13773v1 )

License: see link
Person re-identification (re-ID) concerns the matching of subject images across different camera views in a multi camera surveillance system. One of the major challenges in person re-ID is pose variations across the camera network, which significantly affects the appearance of a person. Existing development data lack adequate pose variations to carry out effective training of person re-ID systems. To solve this issue, in this paper we propose an end-to-end pose-driven attention-guided generative adversarial network, to generate multiple poses of a person. We propose to attentively learn and transfer the subject pose through an attention mechanism. A semantic-consistency loss is proposed to preserve the semantic information of the person during pose transfer. To ensure fine image details are realistic after pose translation, an appearance discriminator is used while a pose discriminator is used to ensure the pose of the transferred images will exactly be the same as the target pose. We show that by incorporating the proposed approach in a person re-identification framework, realistic pose transferred images and state-of-the-art re-identification results can be achieved.
Translated: 2021-04-29 13:01:26 Published: 2021-04-28
# Semantic Consistency and Identity Mapping Multi-Component Generative Adversarial Network for Person Re-Identification

Semantic Consistency and Identity Mapping Multi-Component Generative Adversarial Network for Person Re-Identification ( http://arxiv.org/abs/2104.13780v1 )

License: see link
In a real world environment, person re-identification (Re-ID) is a challenging task due to variations in lighting conditions, viewing angles, pose and occlusions. Despite recent performance gains, current person Re-ID algorithms still suffer heavily when encountering these variations. To address this problem, we propose a semantic consistency and identity mapping multi-component generative adversarial network (SC-IMGAN) which provides style adaptation from one to many domains. To ensure that transformed images are as realistic as possible, we propose novel identity mapping and semantic consistency losses to maintain identity across the diverse domains. For the Re-ID task, we propose a joint verification-identification quartet network which is trained with generated and real images, followed by an effective quartet loss for verification. Our proposed method outperforms state-of-the-art techniques on six challenging person Re-ID datasets: CUHK01, CUHK03, VIPeR, PRID2011, iLIDS and Market-1501.
Translated: 2021-04-29 13:01:06 Published: 2021-04-28
# Twins: Revisiting Spatial Attention Design in Vision Transformers

Twins: Revisiting Spatial Attention Design in Vision Transformers ( http://arxiv.org/abs/2104.13840v1 )

License: see link
Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed, and they show that the design of spatial attention is critical to their success in these tasks. In this work, we revisit the design of spatial attention and demonstrate that a carefully devised yet simple spatial attention mechanism performs favourably against state-of-the-art schemes. As a result, we propose two vision transformer architectures, namely Twins-PCPVT and Twins-SVT. Our proposed architectures are highly efficient and easy to implement, involving only matrix multiplications, which are highly optimized in modern deep learning frameworks. More importantly, the proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification as well as dense detection and segmentation. The simplicity and strong performance suggest that our proposed architectures may serve as stronger backbones for many vision tasks. Our code will be released soon at https://github.com/Meituan-AutoML/Twins .
Translated: 2021-04-29 13:00:47 Published: 2021-04-28
# Zero-Shot Detection via Vision and Language Knowledge Distillation

Zero-Shot Detection via Vision and Language Knowledge Distillation ( http://arxiv.org/abs/2104.13921v1 )

License: see link
Zero-shot image classification has made promising progress by training aligned image and text encoders. The goal of this work is to advance zero-shot object detection, which aims to detect novel objects without bounding box or mask annotations. We propose ViLD, a training method via Vision and Language knowledge Distillation. We distill the knowledge from a pre-trained zero-shot image classification model (e.g., CLIP) into a two-stage detector (e.g., Mask R-CNN). Our method aligns the region embeddings in the detector to the text and image embeddings inferred by the pre-trained model. We use the text embeddings, obtained by feeding category names into the pre-trained text encoder, as the detection classifier. We then minimize the distance between the region embeddings and the image embeddings obtained by feeding region proposals into the pre-trained image encoder. During inference, we include text embeddings of novel categories in the detection classifier for zero-shot detection. We benchmark the performance on the LVIS dataset by holding out all rare categories as novel categories. ViLD obtains 16.1 mask AP$_r$ with a Mask R-CNN (ResNet-50 FPN) for zero-shot detection, outperforming the supervised counterpart by 3.8. The model can directly transfer to other datasets, achieving 72.2 AP$_{50}$, 36.6 AP and 11.8 AP on PASCAL VOC, COCO and Objects365, respectively.
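Using text embeddings as the detection classifier can be sketched in a few lines: each region embedding is compared with one text embedding per category name by cosine similarity, and a softmax over the scaled similarities gives class scores. In the NumPy sketch below the embeddings are random stand-ins (in practice they would come from the pre-trained text and image encoders), the temperature value is an assumption, and the L1 distance is shown as one plausible choice of distillation distance.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def classify_regions(region_emb: np.ndarray, text_emb: np.ndarray,
                     temperature: float = 0.01) -> np.ndarray:
    """Softmax over cosine similarities between R region embeddings (R, D)
    and K category text embeddings (K, D); returns (R, K) class probabilities."""
    logits = normalize(region_emb) @ normalize(text_emb).T / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def distill_l1(region_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """Mean L1 distance between normalized detector region embeddings and
    the embeddings a teacher image encoder gives for the same crops."""
    return float(np.abs(normalize(region_emb) - normalize(image_emb)).sum(axis=1).mean())

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(3, 32))                     # e.g. 3 category names
regions = text_emb[[2, 0]] + 0.1 * rng.normal(size=(2, 32))
probs = classify_regions(regions, text_emb)
print(probs.argmax(axis=1))
```

Under this formulation, zero-shot detection of a new category amounts to appending its text embedding as an extra row of `text_emb`; no detector weights change.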
Translated: 2021-04-29 13:00:32 Published: 2021-04-28
# Medical Transformer: Universal Brain Encoder for 3D MRI Analysis

Medical Transformer: Universal Brain Encoder for 3D MRI Analysis ( http://arxiv.org/abs/2104.13633v1 )

License: see link
Transfer learning has gained attention in medical image analysis due to limited annotated 3D medical datasets for training data-driven deep learning models in the real world. Existing 3D-based methods have transferred the pre-trained models to downstream tasks, which achieved promising results with only a small number of training samples. However, they demand a massive amount of parameters to train the model for 3D medical imaging. In this work, we propose a novel transfer learning framework, called Medical Transformer, that effectively models 3D volumetric images in the form of a sequence of 2D image slices. To make a high-level representation in 3D-form empowering spatial relations better, we take a multi-view approach that leverages plenty of information from the three planes of 3D volume, while providing parameter-efficient training. For building a source model generally applicable to various tasks, we pre-train the model in a self-supervised learning manner for masked encoding vector prediction as a proxy task, using a large-scale normal, healthy brain magnetic resonance imaging (MRI) dataset. Our pre-trained model is evaluated on three downstream tasks: (i) brain disease diagnosis, (ii) brain age prediction, and (iii) brain tumor segmentation, which are actively studied in brain MRI research. The experimental results show that our Medical Transformer outperforms the state-of-the-art transfer learning methods, efficiently reducing the number of parameters up to about 92% for classification and
Translated: 2021-04-29 12:59:51 Published: 2021-04-28
# Two stages for visual object tracking

Two stages for visual object tracking ( http://arxiv.org/abs/2104.13648v1 )

License: see link
Siamese-based trackers have achieved promising performance on visual object tracking tasks. Most existing Siamese-based trackers contain two separate branches for tracking: a classification branch and a bounding box regression branch. In addition, image segmentation provides an alternative way to obtain a more accurate target region. In this paper, we propose a novel tracker with two stages: detection and segmentation. The detection stage locates the target using Siamese networks. More accurate tracking results are then obtained by a segmentation module, given the coarse state estimate from the first stage. We conduct experiments on four benchmarks. Our approach achieves state-of-the-art results, with an EAO of 52.6$\%$ on VOT2016, 51.3$\%$ on VOT2018, and 39.0$\%$ on VOT2019, respectively.
Translated: 2021-04-29 12:59:25 Published: 2021-04-28
# MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories

MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories ( http://arxiv.org/abs/2104.13615v1 )

License: see link
Automated metaphor detection is a challenging task to identify metaphorical expressions of words in a sentence. To tackle this problem, we adopt pre-trained contextualized models, e.g., BERT and RoBERTa. To this end, we propose a novel metaphor detection model, namely metaphor-aware late interaction over BERT (MelBERT). Our model not only leverages contextualized word representation but also benefits from linguistic metaphor identification theories to distinguish between the contextual and literal meaning of words. Our empirical results demonstrate that MelBERT outperforms several strong baselines on four benchmark datasets, i.e., VUA-18, VUA-20, MOH-X, and TroFi.
Translated: 2021-04-29 12:59:12 Published: 2021-04-28
# Graph Decoupling Attention Markov Networks for Semi-supervised Graph Node Classification

Graph Decoupling Attention Markov Networks for Semi-supervised Graph Node Classification ( http://arxiv.org/abs/2104.13718v1 )

License: see link
Graph neural networks (GNNs) have become ubiquitous in graph learning tasks such as node classification. Most GNN methods update node embeddings iteratively by aggregating information from each node's neighbors. However, they often suffer from negative disturbance caused by edges connecting nodes with different labels. One approach to alleviating this negative disturbance is to use attention, but current attention mechanisms consider only feature similarity and suffer from a lack of supervision. In this paper, we consider the label dependency of graph nodes and propose a decoupling attention mechanism to learn both hard and soft attention. The hard attention is learned on labels to obtain a refined graph structure with fewer inter-class edges; its purpose is to reduce the negative disturbance of aggregation. The soft attention is learned on features, maximizing the information gain of message passing over the improved graph structure. Moreover, the learned attention guides both label propagation and feature propagation. Extensive experiments on five well-known benchmark graph datasets verify the effectiveness of the proposed method.
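A much simplified picture of what label-based hard attention aims for is to drop every edge whose endpoints are both labeled with different classes before aggregating neighbor features. The sketch below does exactly that and then applies mean aggregation. It illustrates the intent (fewer inter-class edges) and is not the paper's learned hard attention; the `-1` convention for unlabeled nodes and both helper names are assumptions.

```python
import numpy as np

def prune_inter_class_edges(edges, labels):
    """Keep edge (u, v) unless both endpoints are labeled (label >= 0)
    and their labels differ. labels uses -1 for unlabeled nodes."""
    kept = []
    for u, v in edges:
        if labels[u] >= 0 and labels[v] >= 0 and labels[u] != labels[v]:
            continue
        kept.append((u, v))
    return kept

def mean_aggregate(x, edges):
    """One round of mean aggregation over undirected edges plus self-loops."""
    n = x.shape[0]
    out = x.copy()
    deg = np.ones(n)
    for u, v in edges:
        out[u] += x[v]
        out[v] += x[u]
        deg[u] += 1
        deg[v] += 1
    return out / deg[:, None]

labels = np.array([0, 0, 1, -1])
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
x = np.array([[1.0], [1.0], [-1.0], [0.0]])
kept = prune_inter_class_edges(edges, labels)
print(kept)                          # [(0, 1), (2, 3), (0, 3)]
print(mean_aggregate(x, kept)[1])
```

Without pruning, node 1 would average in the feature of node 2 from the other class; after pruning it aggregates only same-class information.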
Translated: 2021-04-29 12:58:58 Published: 2021-04-28
# A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning ( http://arxiv.org/abs/2104.13844v1 )

License: see link
Many reinforcement learning algorithms rely on value estimation. However, the most widely used algorithms, namely temporal difference algorithms, can diverge under both off-policy sampling and nonlinear function approximation. Many algorithms have been developed for off-policy value estimation that are sound under linear function approximation, based on the linear mean-squared projected Bellman error (PBE). Extending these methods to the nonlinear case has been largely unsuccessful. Recently, several methods have been introduced that approximate a different objective, called the mean-squared Bellman error (BE), which naturally facilitates nonlinear approximation. In this work, we build on these insights and introduce a new generalized PBE that extends the linear PBE to the nonlinear setting. We show how this generalized objective unifies previous work, including previous theory, and obtain new bounds for the value error of the solutions of the generalized objective. We derive an easy-to-use, but sound, algorithm to minimize the generalized objective which is more stable across runs, is less sensitive to hyperparameters, and performs favorably across four control domains with neural network function approximation.
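For reference, the linear objective being generalized is commonly written as $\text{MSPBE}(w) = \mathbb{E}[\delta\phi]^\top \mathbb{E}[\phi\phi^\top]^{-1} \mathbb{E}[\delta\phi]$, with TD error $\delta = r + \gamma w^\top\phi' - w^\top\phi$. The NumPy sketch below estimates it from a batch of transitions and checks that it vanishes at the linear TD fixed point $w^* = A^{-1}b$, where $A = \mathbb{E}[\phi(\phi - \gamma\phi')^\top]$ and $b = \mathbb{E}[r\phi]$. Off-policy importance weights are omitted for brevity (an assumption), and the random features are placeholder data.

```python
import numpy as np

def mspbe(w, phi, phi_next, r, gamma):
    """Sample estimate of the linear mean-squared projected Bellman error.

    phi, phi_next: (n, d) features of s and s'; r: (n,) rewards.
    """
    n = phi.shape[0]
    delta = r + gamma * phi_next @ w - phi @ w           # TD errors, shape (n,)
    g = phi.T @ delta / n                                # E[delta * phi]
    C = phi.T @ phi / n                                  # E[phi phi^T]
    return float(g @ np.linalg.solve(C, g))

def td_fixed_point(phi, phi_next, r, gamma):
    """Solve A w = b with A = E[phi (phi - gamma phi')^T], b = E[r phi]."""
    n = phi.shape[0]
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ r / n
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
phi = rng.normal(size=(500, 4))
phi_next = rng.normal(size=(500, 4))
r = rng.normal(size=500)
w_star = td_fixed_point(phi, phi_next, r, gamma=0.9)
print(mspbe(w_star, phi, phi_next, r, 0.9), mspbe(np.zeros(4), phi, phi_next, r, 0.9))
```

Since $\mathbb{E}[\delta\phi] = b - Aw$, the estimate is zero exactly where linear TD converges; the paper's contribution is a version of this objective that remains meaningful with nonlinear features.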
Translated: 2021-04-29 12:58:45 Published: 2021-04-28
# MLDemon: Deployment Monitoring for Machine Learning Systems

MLDemon: Deployment Monitoring for Machine Learning Systems ( http://arxiv.org/abs/2104.13621v1 )

License: see link
Post-deployment monitoring of the performance of ML systems is critical for ensuring reliability, especially as new user inputs can differ from the training distribution. Here we propose a novel approach, MLDemon, for ML DEployment MONitoring. MLDemon integrates both unlabeled features and a small amount of on-demand labeled examples over time to produce a real-time estimate of the ML model's current performance on a given data stream. Subject to budget constraints, MLDemon decides when to acquire additional, potentially costly, supervised labels to verify the model. On temporal datasets with diverse distribution drifts and models, MLDemon substantially outperforms existing monitoring approaches. Moreover, we provide theoretical analysis to show that MLDemon is minimax rate optimal up to logarithmic factors and is provably robust against broad distribution drifts whereas prior approaches are not.
Translated: 2021-04-29 12:58:16 Published: 2021-04-28
# Self-Bounding Majority Vote Learning Algorithms by the Direct Minimization of a Tight PAC-Bayesian C-Bound

Self-Bounding Majority Vote Learning Algorithms by the Direct Minimization of a Tight PAC-Bayesian C-Bound ( http://arxiv.org/abs/2104.13626v1 )

License: see link
In the PAC-Bayesian literature, the C-Bound refers to an insightful relation between the risk of a majority vote classifier (under the zero-one loss) and the first two moments of its margin (i.e., the expected margin and the voters' diversity). Until now, learning algorithms developed in this framework have minimized the empirical version of the C-Bound instead of explicit PAC-Bayesian generalization bounds. In this paper, by directly optimizing PAC-Bayesian guarantees on the C-Bound, we derive self-bounding majority vote learning algorithms. Moreover, our algorithms, which are based on gradient descent, are scalable and yield accurate predictors paired with non-vacuous guarantees.
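To make the C-Bound concrete: with voters $h(x) \in \{-1, +1\}$, posterior weights $Q$, and margin $M_Q(x, y) = \mathbb{E}_{h \sim Q}[y\,h(x)]$, the C-Bound is $1 - \mu_1^2/\mu_2$, where $\mu_1$ and $\mu_2$ are the first and second moments of the margin; when $\mu_1 > 0$ it upper-bounds the majority vote's zero-one risk (a one-sided Chebyshev argument). The sketch below computes the empirical version that earlier algorithms minimized, not the PAC-Bayesian bound on it that the paper optimizes; the array layout is an assumption.

```python
import numpy as np

def empirical_c_bound(votes, y, q):
    """Empirical C-Bound 1 - E[M]^2 / E[M^2] for a Q-weighted majority vote.

    votes: (n_examples, n_voters) array of +/-1 predictions
    y:     (n_examples,) array of +/-1 labels
    q:     (n_voters,) non-negative weights summing to 1 (the posterior Q)
    """
    margins = y * (votes @ q)                  # M_Q(x, y) = E_{h~Q}[y h(x)]
    mu1, mu2 = margins.mean(), (margins ** 2).mean()
    if mu1 <= 0:
        raise ValueError("the C-Bound requires a positive expected margin")
    return 1.0 - mu1 ** 2 / mu2

def majority_vote_risk(votes, y, q):
    """Zero-one risk of sign(sum_h q_h h(x)); ties count as errors."""
    return float(np.mean(y * (votes @ q) <= 0))

# Three individually weak but diverse voters whose majority vote is always right.
votes = np.array([[1, 1, -1], [1, -1, 1], [-1, -1, 1], [-1, 1, -1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)
q = np.full(3, 1 / 3)
print(empirical_c_bound(votes, y, q), majority_vote_risk(votes, y, q))
```

Each voter here errs on a quarter of the examples, yet every margin equals 1/3, so the margin has zero variance and the bound collapses to (numerically) zero together with the risk; this is the diversity effect the second moment captures.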
Translated: 2021-04-29 12:58:01 Published: 2021-04-28
# Interpretable Embedding Procedure Knowledge Transfer via Stacked Principal Component Analysis and Graph Neural Network

Interpretable Embedding Procedure Knowledge Transfer via Stacked Principal Component Analysis and Graph Neural Network ( http://arxiv.org/abs/2104.13561v1 )

License: see link
Knowledge distillation (KD) is one of the most useful techniques for obtaining lightweight neural networks. Although neural networks have the clear purpose of embedding datasets into a low-dimensional space, the knowledge used by existing methods is far removed from this purpose and provides only limited information. We argue that good knowledge should be able to interpret the embedding procedure. This paper proposes a method of generating interpretable embedding procedure (IEP) knowledge based on principal component analysis and distilling it with a message passing neural network. Experimental results show that the student network trained by the proposed KD method improves by 2.28% on the CIFAR100 dataset, outperforming the state-of-the-art (SOTA) method. We also demonstrate that the embedding procedure knowledge is interpretable by visualizing the proposed KD process. The implemented code is available at https://github.com/sseung0703/IEPKT.
Translated: 2021-04-29 12:57:46 Published: 2021-04-28
# Preserving Earlier Knowledge in Continual Learning with the Help of All Previous Feature Extractors

Preserving Earlier Knowledge in Continual Learning with the Help of All Previous Feature Extractors ( http://arxiv.org/abs/2104.13614v1 )

License: see link
Continually learning new knowledge over time is a desirable capability that lets intelligent systems recognize more and more classes of objects. When no or very little old data is stored, an intelligent system often catastrophically forgets previously learned knowledge while learning new knowledge. Various approaches have recently been proposed to alleviate this catastrophic forgetting. However, knowledge learned earlier is commonly less well preserved than knowledge learned more recently. To reduce the forgetting of earlier learned knowledge in particular and to improve overall continual learning performance, we propose a simple yet effective fusion mechanism that includes all previously learned feature extractors in the intelligent model. In addition, a new feature extractor is added to the model each time a new set of classes is learned, and feature extractor pruning is applied to prevent the overall model size from growing rapidly. Experiments on multiple classification tasks show that the proposed approach effectively reduces the forgetting of old knowledge and achieves state-of-the-art continual learning performance.
Translated: 2021-04-29 12:57:31 Published: 2021-04-28
# Deep Learning for Rheumatoid Arthritis: Joint Detection and Damage Scoring in X-rays

Deep Learning for Rheumatoid Arthritis: Joint Detection and Damage Scoring in X-rays ( http://arxiv.org/abs/2104.13915v1 )

License: see link
Recent advancements in computer vision promise to automate medical image analysis. Rheumatoid arthritis is an autoimmune disease that would profit from computer-based diagnosis, as no direct markers are known and doctors have to rely on manual inspection of X-ray images. In this work, we present a multi-task deep learning model that simultaneously learns to localize joints on X-ray images and to diagnose two kinds of joint damage: narrowing and erosion. Additionally, we propose a modification of label smoothing that combines classification and regression cues into a single loss and achieves a 5% relative error reduction compared to standard loss functions. Our final model placed 4th in joint space narrowing and 5th in joint erosion in the global RA2 DREAM challenge.
Translated: 2021-04-29 12:57:15 Published: 2021-04-28
# Does Face Recognition Error Echo Gender Classification Error?

Does Face Recognition Error Echo Gender Classification Error? ( http://arxiv.org/abs/2104.13803v1 )

License: see link
This paper is the first to explore the question of whether images that are classified incorrectly by a face analytics algorithm (e.g., gender classification) are any more or less likely to participate in an image pair that results in a face recognition error. We analyze results from three different gender classification algorithms (one open-source and two commercial), and two face recognition algorithms (one open-source and one commercial), on image sets representing four demographic groups (African-American female and male, Caucasian female and male). For impostor image pairs, our results show that pairs in which one image has a gender classification error have a better impostor distribution than pairs in which both images have correct gender classification, and so are less likely to generate a false match error. For genuine image pairs, our results show that individuals whose images have a mix of correct and incorrect gender classification have a worse genuine distribution (increased false non-match rate) compared to individuals whose images all have correct gender classification. Thus, compared to images that generate correct gender classification, images that generate gender classification errors do generate a different pattern of recognition errors, both better (false match) and worse (false non-match).
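To make the impostor/genuine terminology concrete: given similarity scores and a decision threshold, the false match rate (FMR) is the fraction of impostor (different-identity) pairs scored at or above the threshold, and the false non-match rate (FNMR) is the fraction of genuine (same-identity) pairs scored below it. A minimal sketch; the scores and threshold are made up for illustration and are not from the paper.

```python
def false_match_rate(impostor_scores, threshold):
    """Fraction of impostor (different-identity) pairs wrongly accepted as matches."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)

def false_non_match_rate(genuine_scores, threshold):
    """Fraction of genuine (same-identity) pairs wrongly rejected as non-matches."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)

impostor = [0.10, 0.25, 0.40, 0.62]   # hypothetical similarity scores
genuine = [0.55, 0.71, 0.80, 0.93]
t = 0.6
print(false_match_rate(impostor, t))      # 0.25
print(false_non_match_rate(genuine, t))   # 0.25
```

At a fixed threshold, an impostor distribution shifted toward lower scores lowers the FMR (the "better impostor distribution" in the abstract), while a genuine distribution shifted toward lower scores raises the FNMR.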
Translated: 2021-04-29 12:57:04 Published: 2021-04-28
# The Evolution of Rumors on a Closed Platform during COVID-19

The Evolution of Rumors on a Closed Platform during COVID-19 ( http://arxiv.org/abs/2104.13816v1 )

License: see link
In this work, we examined a dataset of 114 thousand suspicious messages collected between January and July 2020 from the most popular closed messaging platform in Taiwan. We proposed a hybrid algorithm that can efficiently cluster a large number of text messages according to their topics and narratives, yielding groups of messages that differ from one another by only limited content alterations. By applying the algorithm to the dataset, we were able to examine the content alterations and temporal dynamics of each particular rumor over time. Through qualitative case studies of three COVID-19-related rumors, we found that key authoritative figures were often misquoted in false information, and that such misquotation was an effective way to increase the popularity of a piece of false information. In addition, fact-checking was not effective in stopping misinformation from getting attention. In fact, the popularity of a piece of false information was often influenced more by major societal events and effective content alterations.
Translated: 2021-04-29 12:56:41 Published: 2021-04-28
# End-to-End Intersection Handling using Multi-Agent Deep Reinforcement Learning

End-to-End Intersection Handling using Multi-Agent Deep Reinforcement Learning ( http://arxiv.org/abs/2104.13617v1 )

License: see link
Navigating through intersections is one of the most challenging tasks for an autonomous vehicle. However, for the majority of intersections regulated by traffic lights, the problem can be solved by a simple rule-based method in which the autonomous vehicle's behavior is closely tied to the traffic light states. In this work, we focus on implementing a system able to navigate through intersections where only traffic signs are provided. We propose a multi-agent system that uses a continuous, model-free Deep Reinforcement Learning algorithm to train a neural network that predicts both the acceleration and the steering angle at each time step. We demonstrate that agents learn both the basic rules needed to handle intersections, by understanding the priorities of other learners in the environment, and how to drive safely along their paths. Moreover, a comparison between our system and a rule-based method shows that our model achieves better results, especially in dense traffic conditions. Finally, we test our system on real-world scenarios using real recorded traffic data, showing that our module generalizes both to unseen environments and to different traffic conditions.
Translated: 2021-04-29 12:55:52 Published: 2021-04-28
# Societal Biases in Retrieved Contents: Measurement Framework and Adversarial Mitigation for BERT Rankers

Societal Biases in Retrieved Contents: Measurement Framework and Adversarial Mitigation for BERT Rankers ( http://arxiv.org/abs/2104.13640v1 )

License: see link
Societal biases resonate in the retrieved contents of information retrieval (IR) systems, reinforcing existing stereotypes. Addressing this issue requires established measures of fairness regarding the representation of various social groups in retrieved contents, as well as methods to mitigate such biases, particularly in light of advances in deep ranking models. In this work, we first provide a novel framework to measure fairness in the retrieved text contents of ranking models. By introducing a ranker-agnostic measurement, the framework also makes it possible to disentangle the effect of the collection on fairness from that of the rankers. Second, we propose an adversarial bias mitigation approach, applied to state-of-the-art BERT rankers, which jointly learns to predict relevance and remove protected attributes. We conduct experiments on two passage retrieval collections (MS MARCO Passage Re-ranking and TREC Deep Learning 2019 Passage Re-ranking), which we extend with fairness annotations of a selected subset of queries regarding gender attributes. Our results on the MS MARCO benchmark show that, while the fairness of all ranking models is lower than that of ranker-agnostic baselines, the fairness of the retrieved contents improves significantly when the proposed adversarial training is applied. Lastly, we investigate the trade-off between fairness and utility, showing that by applying a combinatorial model selection method we can maintain the significant improvements in fairness without any significant loss in utility.
Translated: 2021-04-29 12:55:35 Published: 2021-04-28
# Symbolic Abstractions From Data: A PAC Learning Approach

Symbolic Abstractions From Data: A PAC Learning Approach ( http://arxiv.org/abs/2104.13901v1 )

License: see link
Symbolic control techniques aim to satisfy complex logic specifications. A critical step in these techniques is the construction of a symbolic (discrete) abstraction, a finite-state system whose behaviour mimics that of a given continuous-state system. The methods used to compute symbolic abstractions, however, require knowledge of an accurate closed-form model. To generalize them to systems with unknown dynamics, we present a new data-driven approach that does not require closed-form dynamics, relying instead only on the ability to evaluate successors of each state under given inputs. To provide guarantees for the learned abstraction, we use the Probably Approximately Correct (PAC) statistical framework. We first introduce a PAC-style behavioural relationship and an appropriate refinement procedure. We then show how the symbolic abstraction can be constructed to satisfy this new behavioural relationship. Moreover, we provide PAC bounds that dictate the amount of data required to guarantee a prescribed level of accuracy and confidence. Finally, we present an illustrative example.
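As a generic illustration of how a PAC argument dictates the amount of data (not the paper's specific bound): if $N$ independently sampled state-input pairs all satisfy the behavioural relationship, the probability of that happening when the true violation probability exceeds $\varepsilon$ is below $(1-\varepsilon)^N$. Requiring $(1-\varepsilon)^N \le \delta$ gives $N \ge \ln\delta / \ln(1-\varepsilon)$, and the looser closed form $N \ge \ln(1/\delta)/\varepsilon$ follows from $(1-\varepsilon)^N \le e^{-\varepsilon N}$. A minimal sketch:

```python
import math

def samples_needed(epsilon, delta):
    """Smallest N with (1 - epsilon)**N <= delta.

    If N i.i.d. samples all satisfy the property, then with confidence at least
    1 - delta the true violation probability is at most epsilon.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(math.log(delta) / math.log(1 - epsilon))

def samples_needed_simple(epsilon, delta):
    """Looser closed form derived from (1 - epsilon)**N <= exp(-epsilon * N)."""
    return math.ceil(math.log(1 / delta) / epsilon)

print(samples_needed(0.01, 0.001))         # 688
print(samples_needed_simple(0.01, 0.001))  # 691
```

Halving $\varepsilon$ roughly doubles the required data, while tightening $\delta$ by a factor of ten only adds about $\ln(10)/\varepsilon$ samples.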
Translated: 2021-04-29 12:55:09 Published: 2021-04-28
# Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures ( http://arxiv.org/abs/2104.13628v1 )

License: see link
Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice. In this paper, we study this "benign overfitting" (Bartlett et al. (2020)) phenomenon of the maximum margin classifier for linear classification problems. Specifically, we consider data generated from sub-Gaussian mixtures, and provide a tight risk bound for the maximum margin linear classifier in the over-parameterized setting. Our results precisely characterize the condition under which benign overfitting can occur in linear classification problems, and improve on previous work. They also have direct implications for over-parameterized logistic regression.
Translated: 2021-04-29 12:54:40 Published: 2021-04-28
# NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization ( http://arxiv.org/abs/2104.13818v1 )

License: see link
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent (SGD) that can be deployed for parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees; however, for practical purposes, its authors proposed a heuristic variant, which we call QSGDinf, that demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme and show that it both has stronger theoretical guarantees than QSGD and matches or exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.
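A minimal sketch of QSGD-style unbiased stochastic quantization onto a sorted set of levels in [0, 1]: each normalized magnitude $|v_i|/\lVert v\rVert$ is randomly rounded to one of its two neighbouring levels, with probabilities chosen so that the quantized vector equals $v$ in expectation. Uniformly spaced levels correspond to QSGD; the exponentially spaced levels used here to illustrate a nonuniform scheme are an assumption about NUQSGD's level set, which the abstract does not specify, and the encoding step is omitted.

```python
import numpy as np

def stochastic_quantize(v, levels, rng):
    """Unbiased stochastic quantization: returns ||v|| * sign(v_i) * q_i with E[q_i] = |v_i| / ||v||.

    levels: sorted 1-D array in [0, 1] that starts at 0 and ends at 1.
    """
    norm = np.linalg.norm(v)
    if norm == 0:
        return np.zeros_like(v)
    r = np.abs(v) / norm                                   # normalized magnitudes in [0, 1]
    idx = np.searchsorted(levels, r, side="right") - 1     # level just below (or equal to) r
    idx = np.clip(idx, 0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    p_up = (r - lo) / (hi - lo)                            # probability of rounding up
    q = np.where(rng.random(v.shape) < p_up, hi, lo)
    return norm * np.sign(v) * q

s = 4
uniform = np.linspace(0.0, 1.0, s + 1)                               # QSGD-style levels
nonuniform = np.concatenate(([0.0], 2.0 ** -np.arange(s, -1, -1)))  # 0, 1/16, ..., 1/2, 1 (assumed)

rng = np.random.default_rng(0)
v = np.array([0.3, -1.2, 0.05, 2.0])
est = np.mean([stochastic_quantize(v, nonuniform, rng) for _ in range(20000)], axis=0)
print(np.round(est, 2))   # close to v, because the quantizer is unbiased
```

Exponentially spaced levels put more resolution near zero, where most normalized gradient coordinates of a high-dimensional vector tend to fall, which is the intuition behind a nonuniform scheme.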
Translated: 2021-04-29 12:54:29 Published: 2021-04-28
# Multi-scale Deep Learning Architecture for Nucleus Detection in Renal Cell Carcinoma Microscopy Image

Multi-scale Deep Learning Architecture for Nucleus Detection in Renal Cell Carcinoma Microscopy Image ( http://arxiv.org/abs/2104.13557v1 )

License: see link
Clear cell renal cell carcinoma (ccRCC) is one of the most common forms of intratumoral heterogeneity in the study of renal cancer. ccRCC originates from the epithelial lining of proximal convoluted renal tubules. These cells undergo abnormal mutations in the presence of Ki67 protein and create a lump-like structure through cell proliferation. Manual counting of tumor cells in the affected tissue sections is one of the strongest prognostic markers for renal cancer. However, this procedure is time-consuming and prone to subjectivity: these assessments are based on the physical appearance of cells and suffer from wide intra-observer variation. Therefore, better cell nucleus detection and counting techniques can be an important biomarker for assessing tumor cell proliferation in routine pathological investigations. In this paper, we introduce a deep learning-based detection model for cell classification on IHC-stained histology images. These images are classified into binary classes to find the presence of Ki67 protein in cancer-affected nucleus regions. Our model maps multi-scale pyramid features and saliency information from locally bounded regions and predicts bounding box coordinates through regression. We validate the impact of Ki67 expression across a cohort of four hundred histology images of localized ccRCC and compare our results with existing state-of-the-art nucleus detection methods. The precision and recall scores of the proposed method are computed and compared on clinical data sets. The experimental results demonstrate that our model achieves an F1 score of up to 86.3% and an average area under the precision-recall curve of 85.73%.
Translated: 2021-04-29 12:53:41 Published: 2021-04-28
# Unsupervised Detection of Cancerous Regions in Histology Imagery using Image-to-Image Translation

Unsupervised Detection of Cancerous Regions in Histology Imagery using Image-to-Image Translation ( http://arxiv.org/abs/2104.13786v1 )

License: see link
Detection of visual anomalies refers to the problem of finding patterns in imaging data that do not conform to the expected visual appearance, and it is widely studied in different domains. Due to the nature of anomaly occurrences and their underlying generating processes, anomalies are hard to characterize and labeled data are hard to obtain. Obtaining labeled data is especially difficult in biomedical applications, where only trained domain experts can provide labels, which often come with great diversity and complexity. Recently presented approaches for unsupervised visual anomaly detection remove the need for labeled data and show promising results in domains where anomalous samples deviate significantly from the normal appearance. Despite these promising results, the performance of such approaches still lags behind that of supervised approaches, and they do not provide a one-size-fits-all solution. In this work, we present an image-to-image translation-based framework that significantly surpasses the performance of existing unsupervised methods and approaches the performance of supervised methods in the challenging domain of cancerous region detection in histology imagery.
Translated: 2021-04-29 12:53:20 Published: 2021-04-28
# Deep Learning Body Region Classification of MRI and CT examinations

Deep Learning Body Region Classification of MRI and CT examinations ( http://arxiv.org/abs/2104.13826v1 )

License: see link
Standardized body region labelling of individual images provides data that can improve human and computer use of medical images. A CNN-based classifier was developed to identify body regions in CT and MRI. For the classification task, 17 CT (18 MRI) body regions covering the entire human body were defined. Three retrospective databases were built for AI model training, validation, and testing, with a balanced distribution of studies per body region. The test databases originated from a different healthcare network. The accuracy, recall, and precision of the classifier were evaluated with respect to patient age, patient gender, institution, scanner manufacturer, contrast, slice thickness, MRI sequence, and CT kernel. The data included a retrospective cohort of 2,934 anonymized CT cases (training: 1,804 studies, validation: 602 studies, test: 528 studies) and 3,185 anonymized MRI cases (training: 1,911 studies, validation: 636 studies, test: 638 studies). Twenty-seven institutions, including primary care hospitals, community hospitals, and imaging centers, contributed to the test datasets. The data included cases of all genders in equal proportions and subjects aged from a few months to over 90 years. An image-level prediction accuracy of 91.9% (90.2 - 92.1) for CT and 94.2% (92.0 - 95.6) for MRI was achieved. The classification results were robust across all body regions and confounding factors. Due to limited data, performance for subjects under 10 years old could not be reliably evaluated. We show that deep learning models can classify CT and MRI images by body region, including lower and upper extremities, with high accuracy.
Translated: 2021-04-29 12:53:03 Published: 2021-04-28
# Detection of Signal in the Spiked Rectangular Models

Detection of Signal in the Spiked Rectangular Models ( http://arxiv.org/abs/2104.13517v1 )

License: see link
We consider the problem of detecting signals in rank-one signal-plus-noise data matrix models that generalize spiked Wishart matrices. We show that principal component analysis can be improved by pre-transforming the matrix entries if the noise is non-Gaussian. As an intermediate step, we prove a sharp phase transition of the largest eigenvalues of spiked rectangular matrices, which extends the Baik-Ben Arous-Péché (BBP) transition. We also propose a hypothesis test with low computational complexity, based on linear spectral statistics, for detecting the presence of a signal; it minimizes the sum of the Type-I and Type-II errors when the noise is Gaussian.
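The classical BBP transition that the paper extends can be checked numerically. In the spiked covariance model with population covariance $I + \ell\, v v^\top$ and aspect ratio $\gamma = p/n$, the largest sample-covariance eigenvalue converges to the bulk edge $(1+\sqrt{\gamma})^2$ when $\ell \le \sqrt{\gamma}$ and to $(1+\ell)(1+\gamma/\ell)$ when $\ell > \sqrt{\gamma}$. The NumPy sketch below assumes Gaussian noise (the paper's point is precisely the non-Gaussian case, which is not simulated here); sizes and seed are arbitrary.

```python
import numpy as np

def bbp_limit(ell, gamma):
    """Asymptotic largest sample-covariance eigenvalue for a rank-one spike of size ell."""
    if ell > np.sqrt(gamma):
        return (1 + ell) * (1 + gamma / ell)   # outlier separates from the bulk
    return (1 + np.sqrt(gamma)) ** 2          # stays at the Marchenko-Pastur edge

def top_eigenvalue(ell, n, p, rng):
    """Largest eigenvalue of X^T X / n with i.i.d. rows ~ N(0, I + ell * e1 e1^T)."""
    X = rng.normal(size=(n, p))
    X[:, 0] *= np.sqrt(1 + ell)                # spike along the first coordinate
    return np.linalg.eigvalsh(X.T @ X / n)[-1]

rng = np.random.default_rng(0)
n, p = 2000, 500
gamma = p / n                                  # 0.25, so the threshold is sqrt(gamma) = 0.5
for ell in (0.3, 2.0):
    print(ell, round(top_eigenvalue(ell, n, p, rng), 3), round(bbp_limit(ell, gamma), 3))
```

Below the threshold the spike is invisible in the top eigenvalue, which is why detection in that regime needs other statistics, such as the linear spectral statistics mentioned in the abstract.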
Translated: 2021-04-29 12:52:33 Published: 2021-04-28
# Improving BERT Model Using Contrastive Learning for Biomedical Relation Extraction

Improving BERT Model Using Contrastive Learning for Biomedical Relation Extraction ( http://arxiv.org/abs/2104.13913v1 )

License: see link
Contrastive learning has been used in computer vision to learn high-quality image representations. However, contrastive learning is not widely used in natural language processing because of the lack of a general data augmentation method for text. In this work, we explore employing contrastive learning to improve the text representation from the BERT model for relation extraction. The key component of our framework is a unique contrastive pre-training step tailored to relation extraction tasks, which seamlessly integrates linguistic knowledge into data augmentation. Furthermore, we investigate how large-scale data constructed from external knowledge bases can enhance the generality of contrastive pre-training of BERT. Experimental results on three relation extraction benchmark datasets demonstrate that our method improves the BERT model representation and achieves state-of-the-art performance. In addition, we explore the interpretability of the models by showing that BERT with contrastive pre-training relies more on rationales for prediction. Our code and data are publicly available at: https://github.com/udel-biotm-lab/BERT-CLRE.
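Contrastive pre-training is commonly driven by an InfoNCE-style objective: each anchor embedding should score higher against its own augmented positive than against the other examples in the batch. The NumPy sketch below shows that generic loss, assuming cosine similarity and a temperature of 0.1; it is not necessarily the paper's exact loss, and its linguistically informed augmentation is not reproduced.

```python
import numpy as np

def info_nce(anchors, positives, tau=0.1):
    """Mean InfoNCE loss: positives[i] is the positive for anchors[i];
    every other row of positives serves as an in-batch negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / tau                               # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # correct pairs lie on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))   # positives close to anchors
shuffled = info_nce(z, rng.permutation(z))                   # mostly mismatched positives
print(aligned, shuffled)
```

Minimizing this loss pulls each anchor toward its positive and pushes it away from the in-batch negatives, so well-aligned pairs give a loss near zero while mismatched pairs give a large one.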
Translated: 2021-04-29 12:52:10 Published: 2021-04-28
# Extreme Rotation Estimation using Dense Correlation Volumes

Extreme Rotation Estimation using Dense Correlation Volumes ( http://arxiv.org/abs/2104.13530v1 )

License: see link
We present a technique for estimating the relative 3D rotation of an RGB image pair in an extreme setting, where the images have little or no overlap. We observe that, even when images do not overlap, there may be rich hidden cues as to their geometric relationship, such as light source directions, vanishing points, and symmetries present in the scene. We propose a network design that can automatically learn such implicit cues by comparing all pairs of points between the two input images. Our method therefore constructs dense feature correlation volumes and processes these to predict relative 3D rotations. Our predictions are formed over a fine-grained discretization of rotations, bypassing difficulties associated with regressing 3D rotations. We demonstrate our approach on a large variety of extreme RGB image pairs, including indoor and outdoor images captured under different lighting conditions and geographic locations. Our evaluation shows that our model can successfully estimate relative rotations among non-overlapping images without compromising performance over overlapping image pairs.
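A minimal sketch of a dense 4D correlation volume, assuming two feature maps of shape (C, H, W) produced by some backbone (random arrays stand in for them here): every location in the first image is compared with every location in the second by cosine similarity, giving a volume of shape (H, W, H, W). How the paper processes this volume and discretizes rotations is not reproduced.

```python
import numpy as np

def correlation_volume(f1, f2, eps=1e-8):
    """All-pairs cosine similarity between two (C, H, W) feature maps -> (H, W, H, W)."""
    f1 = f1 / (np.linalg.norm(f1, axis=0, keepdims=True) + eps)   # unit-norm feature per pixel
    f2 = f2 / (np.linalg.norm(f2, axis=0, keepdims=True) + eps)
    return np.einsum("chw,cuv->hwuv", f1, f2)

rng = np.random.default_rng(0)
f1 = rng.normal(size=(32, 6, 8))
f2 = rng.normal(size=(32, 6, 8))
vol = correlation_volume(f1, f2)
print(vol.shape)  # (6, 8, 6, 8)
```

Because every pair of locations is scored, the volume still carries information when the two images barely overlap, which is what lets a downstream network pick up implicit cues such as vanishing points or lighting direction.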
Translated: 2021-04-29 12:50:19 Published: 2021-04-28
# PAFNet: An Efficient Anchor-Free Object Detector Guidance

PAFNet: An Efficient Anchor-Free Object Detector Guidance ( http://arxiv.org/abs/2104.13534v1 )

License: see link
Object detection is a basic but challenging task in computer vision that plays a key role in a variety of industrial applications. However, object detectors based on deep learning usually require more storage and longer inference time, which seriously hinders their practicality. A trade-off between effectiveness and efficiency is therefore necessary in practical scenarios. Because they are not constrained by pre-defined anchors, anchor-free detectors can achieve acceptable accuracy and inference speed simultaneously. In this paper, we start from an anchor-free detector called TTFNet, modify its structure, and introduce multiple existing tricks to realize effective server and mobile solutions, respectively. Since all experiments in this paper are conducted with PaddlePaddle, we call the model PAFNet (Paddle Anchor Free Network). On the server side, PAFNet achieves a better balance between effectiveness (42.2% mAP) and efficiency (67.15 FPS) on a single V100 GPU. On the mobile side, PAFNet-lite achieves a better accuracy of 23.9% mAP with an inference time of 26.00 ms on a Kirin 990 ARM CPU, outperforming existing state-of-the-art anchor-free detectors by significant margins. Source code is at https://github.com/PaddlePaddle/PaddleDetection.
Translated: 2021-04-29 12:50:01 Published: 2021-04-28
# Shot Contrastive Self-Supervised Learning for Scene Boundary Detection

Shot Contrastive Self-Supervised Learning for Scene Boundary Detection ( http://arxiv.org/abs/2104.13537v1 )

License: see link
Scenes play a crucial role in breaking the storyline of movies and TV episodes into semantically cohesive parts. However, given their complex temporal structure, finding scene boundaries can be a challenging task requiring large amounts of labeled training data. To address this challenge, we present a self-supervised shot contrastive learning approach (ShotCoL) to learn a shot representation that maximizes the similarity between nearby shots compared to randomly selected shots. We show how to apply our learned shot representation for the task of scene boundary detection to offer state-of-the-art performance on the MovieNet dataset while requiring only ~25% of the training labels, using 9x fewer model parameters and offering 7x faster runtime. To assess the effectiveness of ShotCoL on novel applications of scene boundary detection, we take on the problem of finding timestamps in movies and TV episodes where video-ads can be inserted while offering a minimally disruptive viewing experience. To this end, we collected a new dataset called AdCuepoints with 3,975 movies and TV episodes, 2.2 million shots and 19,119 minimally disruptive ad cue-point labels. We present a thorough empirical analysis on this dataset demonstrating the effectiveness of ShotCoL for ad cue-points detection.
Translated: 2021-04-29 12:49:41 Published: 2021-04-28
# Revisiting Skeleton-based Action Recognition

Revisiting Skeleton-based Action Recognition ( http://arxiv.org/abs/2104.13586v1 )

License: see link
Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt graph convolutional networks (GCN) to extract features on top of human skeletons. Despite the positive results shown in previous works, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseC3D is more effective in learning spatiotemporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings. Also, PoseC3D can handle multiple-person scenarios without additional computation cost, and its features can be easily integrated with other modalities at early fusion stages, which provides a great design space to further boost the performance. On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality.
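A minimal sketch of turning 2D keypoints into a 3D heatmap stack of the kind the abstract describes: each joint in each frame becomes a Gaussian blob, giving an array of shape (K, T, H, W) for K joints and T frames. The grid size, the Gaussian width `sigma`, and the optional per-keypoint confidence weighting are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def keypoints_to_heatmap_stack(kpts, height, width, sigma=2.0, scores=None):
    """kpts: (T, K, 2) array of (x, y) pixel coordinates; returns (K, T, H, W) heatmaps."""
    T, K, _ = kpts.shape
    if scores is None:
        scores = np.ones((T, K))                 # optional per-keypoint confidence
    ys, xs = np.mgrid[0:height, 0:width]         # row (y) and column (x) index grids
    heat = np.zeros((K, T, height, width))
    for t in range(T):
        for k in range(K):
            x, y = kpts[t, k]
            d2 = (xs - x) ** 2 + (ys - y) ** 2
            heat[k, t] = scores[t, k] * np.exp(-d2 / (2 * sigma ** 2))
    return heat

# Two frames, three joints, on a 32 x 32 grid.
kpts = np.array([[[5, 5], [16, 16], [25, 10]],
                 [[6, 5], [16, 17], [24, 11]]], dtype=float)
stack = keypoints_to_heatmap_stack(kpts, 32, 32)
print(stack.shape)  # (3, 2, 32, 32)
```

Because the result is a dense volume rather than a graph, it can be fed to an ordinary 3D CNN, and several people in one frame can be handled by summing or max-pooling their blobs into the same heatmaps.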
Translated: 2021-04-29 12:49:18 Published: 2021-04-28
# DeRenderNet: Intrinsic Image Decomposition of Urban Scenes with Shape-(In)dependent Shading Rendering

DeRenderNet: Intrinsic Image Decomposition of Urban Scenes with Shape-(In)dependent Shading Rendering ( http://arxiv.org/abs/2104.13602v1 )

License: see link
We propose DeRenderNet, a deep neural network trained in a self-supervised manner that, given a single image of an outdoor urban scene, decomposes it into albedo and latent lighting and renders shape-(in)dependent shadings. To achieve this goal, we use albedo maps extracted from scenes in video games as direct supervision, and we pre-compute normal and shadow prior maps from the provided depth maps as indirect supervision. Compared with state-of-the-art intrinsic image decomposition methods, DeRenderNet produces shadow-free albedo maps with clean details and accurately predicts shadows in the shape-independent shading, which is shown to be effective for re-rendering and for improving the accuracy of high-level vision tasks on urban scenes.
Translated: 2021-04-29 12:49:02 Published: 2021-04-28
# Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation ( http://arxiv.org/abs/2104.13613v1 )

License: see link
Domain adaptation for semantic segmentation aims to improve model performance in the presence of a distribution shift between the source and target domains. Leveraging supervision from auxiliary tasks (such as depth estimation) has the potential to heal this shift, because many visual tasks are closely related to each other. However, such supervision is not always available. In this work, we leverage guidance from self-supervised depth estimation, which is available on both domains, to bridge the domain gap. On the one hand, we propose to explicitly learn the task feature correlation to strengthen the target semantic predictions with the help of target depth estimation. On the other hand, we use the discrepancy between the depth predictions of the source and target depth decoders to approximate the pixel-wise adaptation difficulty. The adaptation difficulty inferred from depth is then used to refine the target semantic segmentation pseudo-labels. The proposed method can be easily integrated into existing segmentation frameworks. We demonstrate the effectiveness of our approach on the benchmark tasks SYNTHIA-to-Cityscapes and GTA-to-Cityscapes, on which we achieve new state-of-the-art performance of $55.0\%$ and $56.6\%$, respectively. Our code is available at https://github.com/qinenergy/corda.
Translated: 2021-04-29 12:48:46 Published: 2021-04-28
# A Deep Learning Object Detection Method for an Efficient Clusters Initialization

A Deep Learning Object Detection Method for an Efficient Clusters Initialization ( http://arxiv.org/abs/2104.13634v1 )

License: see link
Clustering is an unsupervised machine learning method that groups data samples into clusters of similar objects. In practice, clustering has been used in numerous applications such as banking customer profiling, document retrieval, image segmentation, and e-commerce recommendation engines. However, existing clustering techniques have significant limitations, notably that their stability depends on the initialization parameters (e.g., the number of clusters and the centroids). Different solutions have been presented in the literature to overcome this limitation (e.g., internal and external validation metrics). However, these solutions have high computational complexity and memory consumption, especially when dealing with high-dimensional data. In this paper, we apply a recent object detection Deep Learning (DL) model, YOLO-v5, to detect initial clustering parameters such as the number of clusters, their sizes, and possible centroids. Essentially, the proposed solution adds a DL-based initialization phase that makes the clustering algorithms initialization-free. The results show that the proposed solution can provide near-optimal cluster initialization parameters with low computational and resource overhead compared to existing solutions.
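The role of such an initialization phase can be illustrated with plain k-means (Lloyd's algorithm) that takes its number of clusters and starting centroids from an external source instead of random seeding. A minimal sketch; `init` stands in for whatever a detection stage would output and is a hypothetical input here, and the synthetic blobs are for illustration only.

```python
import numpy as np

def kmeans(points, init_centroids, n_iter=100):
    """Lloyd's k-means started from given centroids; k = len(init_centroids)."""
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if its cluster is empty).
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in [(0, 0), (5, 5), (0, 5)]])
init = [(0.5, 0.5), (4.5, 4.5), (0.5, 4.5)]   # hypothetical centroids suggested by a detector
centroids, labels = kmeans(blobs, init)
print(np.round(centroids, 1))
```

With good starting centroids, Lloyd's iterations converge in a few steps and avoid the poor local optima that random seeding can produce, which is the benefit an automatic initialization phase aims for.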
Translated: 2021-04-29 12:48:22 Published: 2021-04-28
# On the Unreasonable Effectiveness of Centroids in Image Retrieval

On the Unreasonable Effectiveness of Centroids in Image Retrieval ( http://arxiv.org/abs/2104.13643v1 )

License: see link
The image retrieval task consists of finding images similar to a query image from a set of gallery (database) images. Such systems are used in various applications, e.g., person re-identification (ReID) or visual product search. Despite active development of retrieval models, it still remains a challenging task, mainly due to large intra-class variance caused by changes in view angle, lighting, background clutter or occlusion, while inter-class variance may be relatively low. A large portion of current research focuses on creating more robust features and modifying objective functions, usually based on Triplet Loss. Some works experiment with using a centroid/proxy representation of a class to alleviate problems with computing speed and hard sample mining used with Triplet Loss. However, these approaches are used for training alone and discarded during the retrieval stage. In this paper we propose to use the mean centroid representation both during training and retrieval. Such an aggregated representation is more robust to outliers and assures more stable features. As each class is represented by a single embedding (the class centroid), both retrieval time and storage requirements are reduced significantly. Aggregating multiple embeddings results in a significant reduction of the search space due to the lower number of candidate target vectors, which makes the method especially suitable for production deployments. Comprehensive experiments conducted on two ReID and Fashion Retrieval datasets demonstrate the effectiveness of our method, which outperforms the current state-of-the-art. We propose centroid training and retrieval as a viable method for both Fashion Retrieval and ReID applications.
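Retrieval against class centroids can be sketched in a few lines. This assumes gallery embeddings have already been computed by some model; it averages them per class, L2-normalises each centroid (a common choice assumed here, not stated in the abstract), and ranks classes by cosine similarity, so the search space shrinks from one vector per image to one per class.

```python
import numpy as np


def class_centroids(embeddings, labels):
    """Mean embedding per class, L2-normalised. Returns (class_ids, centroids)."""
    classes = np.unique(labels)
    cents = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    cents = cents / np.linalg.norm(cents, axis=1, keepdims=True)
    return classes, cents


def retrieve(query, classes, cents, top_k=1):
    """Rank classes by cosine similarity between the query and each centroid."""
    q = query / np.linalg.norm(query)
    sims = cents @ q
    order = np.argsort(-sims)[:top_k]
    return classes[order], sims[order]


gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
gallery_ids = np.array([7, 7, 3, 3])
classes, cents = class_centroids(gallery, gallery_ids)
best, score = retrieve(np.array([1.0, 0.2]), classes, cents)
```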
Translated: 2021-04-29 12:48:06 Published: 2021-04-28
# Robust Face-Swap Detection Based on 3D Facial Shape Information

Robust Face-Swap Detection Based on 3D Facial Shape Information ( http://arxiv.org/abs/2104.13665v1 )

License: see link
Maliciously manipulated images or videos, so-called deep fakes, and especially face-swap images and videos, have attracted a growing number of malicious attackers who use them to discredit key figures. Previous detection techniques based on pixel-level artifacts focus on unclear patterns but ignore available semantic clues. Therefore, these approaches show weak interpretability and robustness. In this paper, we propose a biometric-information-based method that fully exploits appearance and shape features for face-swap detection of key figures. The key aspect of our method is capturing the inconsistency between 3D facial shape and facial appearance, and this inconsistency-based clue offers natural interpretability for the proposed face-swap detection method. Experimental results show the superiority of our method in robustness on various laundering and cross-domain data, which validates the effectiveness of the proposed method.
Translated: 2021-04-29 12:47:41 Published: 2021-04-28
# AdvHaze: Adversarial Haze Attack

AdvHaze: Adversarial Haze Attack ( http://arxiv.org/abs/2104.13673v1 )

License: see link
In recent years, adversarial attacks have drawn more attention for their value in evaluating and improving the robustness of machine learning models, especially neural network models. However, previous attack methods have mainly focused on applying $l^p$ norm-bounded noise perturbations. In this paper, we instead introduce a novel adversarial attack method based on haze, which is a common phenomenon in real-world scenery. Our method can synthesize potentially adversarial haze into an image based on the atmospheric scattering model with high realism and mislead classifiers into predicting an incorrect class. We conduct experiments on two popular datasets, i.e., ImageNet and NIPS 2017. We demonstrate that the proposed method achieves a high success rate and has better transferability across different classification models than the baselines. We also visualize the correlation matrices, which inspire us to jointly apply different perturbations to improve the success rate of the attack. We hope this work can boost the development of non-noise-based adversarial attacks and help evaluate and improve the robustness of DNNs.
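The atmospheric scattering model is usually written as $I(x) = J(x)\,t(x) + A\,(1 - t(x))$ with transmission $t(x) = e^{-\beta d(x)}$, where $J$ is the clear image, $A$ the airlight and $d$ the scene depth. The sketch below only synthesises haze from given parameters; the adversarial part (searching for haze parameters that change a classifier's prediction) is not shown, and the default `beta` and `airlight` values are placeholders.

```python
import numpy as np


def add_haze(image, depth, beta=1.0, airlight=0.8):
    """Apply the atmospheric scattering model I = J * t + A * (1 - t).

    image: (H, W, 3) clear image J with values in [0, 1].
    depth: (H, W) scene depth d; transmission is t = exp(-beta * d).
    """
    t = np.exp(-beta * depth)[..., None]  # add a channel axis to broadcast over RGB
    hazy = image * t + airlight * (1.0 - t)
    return np.clip(hazy, 0.0, 1.0)


clear = np.zeros((1, 1, 3))
hazy = add_haze(clear, np.full((1, 1), np.log(2.0)))  # transmission t = 0.5
```

Distant pixels (large $d$) converge to the airlight colour, which is what makes the perturbation look like natural haze rather than noise.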
Translated: 2021-04-29 12:47:24 Published: 2021-04-28
# PANDA : Perceptually Aware Neural Detection of Anomalies

PANDA : Perceptually Aware Neural Detection of Anomalies ( http://arxiv.org/abs/2104.13702v1 )

License: see link
Semi-supervised methods of anomaly detection have seen substantial advancement in recent years. Of particular interest are applications of such methods to diverse, real-world anomaly detection problems where anomalies can range from the visually obvious to the very subtle. In this work, we propose a novel fine-grained VAE-GAN architecture trained in a semi-supervised manner in order to detect both visually distinct and subtle anomalies. With the use of a residually connected dual-feature extractor, a fine-grained discriminator and a perceptual loss function, we are able to detect subtle anomalies with low inter-class (anomaly vs. normal) variation, with greater detection capability and smaller margins of deviation in AUC value during inference compared to prior work, while also remaining time-efficient during inference. We achieve state-of-the-art anomaly detection results when compared extensively with prior semi-supervised approaches across a multitude of anomaly detection benchmark tasks, including trivial leave-one-out tasks (CIFAR-10 - AUPRCavg: 0.91; MNIST - AUPRCavg: 0.90) in addition to challenging real-world anomaly detection tasks (plant leaf disease - AUC: 0.776; threat item X-ray - AUC: 0.51), video frame-level anomaly detection (UCSDPed1 - AUC: 0.95) and high-frequency texture with object anomalous defect detection (MVTEC - AUCavg: 0.83).
Translated: 2021-04-29 12:47:09 Published: 2021-04-28
# MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains

MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains ( http://arxiv.org/abs/2104.13742v1 )

License: see link
GANs largely increase the potential impact of generative models. We propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the posterior finetuning and avoids pathologies of other methods, such as mode collapse and lack of flexibility. Furthermore, to prevent overfitting on small target domains, we introduce sparse subnetwork selection, which restricts the set of trainable neurons to those that are relevant for the target dataset. We perform comprehensive experiments on several challenging datasets using various GAN architectures (BigGAN, Progressive GAN, and StyleGAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs.
Translated: 2021-04-29 12:46:42 Published: 2021-04-28
# Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness

Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness ( http://arxiv.org/abs/2104.13743v1 )

License: see link
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial. Though U-shaped encoder-decoder frameworks have proven successful, most of them share a common drawback of mask unawareness in feature extraction, because all convolution windows (or regions), including those with various shapes of missing pixels, are treated equally and filtered with fixed learned kernels. To this end, we propose a novel mask-aware inpainting solution. First, a Mask-Aware Dynamic Filtering (MADF) module is designed to effectively learn multi-scale features for missing regions in the encoding phase. Specifically, the filters for each convolution window are generated from features of the corresponding region of the mask. The second form of mask awareness is achieved by adopting Point-wise Normalization (PN) in our decoding phase, since the statistical properties of features at masked points differ from those of unmasked points. The proposed PN tackles this issue by dynamically assigning a point-wise scaling factor and bias. Lastly, our model is designed as an end-to-end cascaded refinement network. Supervision information such as reconstruction loss, perceptual loss and total variation loss is incrementally leveraged to boost the inpainting results from coarse to fine. The effectiveness of the proposed framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets: Places2, CelebA and Paris StreetView.
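A point-wise normalization of this kind can be sketched as an instance-style normalization followed by a per-pixel affine transform. In the paper the scale and bias are produced dynamically by the network; here `gamma` and `beta` are simply passed in as arrays, and normalizing each channel over its spatial positions is an assumption about which statistics are used.

```python
import numpy as np


def pointwise_normalize(features, gamma, beta, eps=1e-5):
    """Normalize each channel over space, then apply a per-pixel scale and bias.

    features, gamma, beta: arrays of shape (C, H, W). Unlike a single affine
    pair per channel, gamma and beta vary per spatial point, so masked and
    unmasked locations can be modulated differently.
    """
    mean = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    return normalized * gamma + beta


feats = np.arange(24, dtype=float).reshape(2, 3, 4)
out = pointwise_normalize(feats, np.ones_like(feats), np.zeros_like(feats))
```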
Translated: 2021-04-29 12:46:20 Published: 2021-04-28
# MOD: Benchmark for Military Object Detection

MOD: Benchmark for Military Object Detection ( http://arxiv.org/abs/2104.13763v1 )

License: see link
Object detection is widely studied in the computer vision field. In recent years, several representative deep-learning-based detection methods along with solid benchmarks have been proposed, which has boosted the development of related research. However, there has so far been no object detection benchmark targeted at the military field. To facilitate future military object detection research, we propose a novel, publicly available object detection benchmark in the military field called MOD, which contains 6,000 images and 17,465 labeled instances. Unlike previous benchmarks, objects in MOD pose unique challenges such as camouflage, blur, inter-class similarity, intra-class variance and complex military environments. Experiments show that under these challenges, existing detection methods perform poorly. To address this issue, we propose LGA-RCNN, which utilizes a loss-guided attention (LGA) module to highlight representative regions of objects. The highlighted local information is then fused with global information for precise classification and localization. Extensive experiments on MOD validate the effectiveness of our method.
Translated: 2021-04-29 12:45:55 Published: 2021-04-28
# Segmentation-Based Bounding Box Generation for Omnidirectional Pedestrian Detection

Segmentation-Based Bounding Box Generation for Omnidirectional Pedestrian Detection ( http://arxiv.org/abs/2104.13764v1 )

License: see link
We propose a segmentation-based bounding box generation method for omnidirectional pedestrian detection, which enables detectors to tightly fit bounding boxes to pedestrians without omnidirectional images for training. Because the appearance of pedestrians in omnidirectional images may be rotated to any angle, the performance of common pedestrian detectors is likely to be substantially degraded. Existing methods mitigate this issue by transforming images during inference or training detectors with omnidirectional images. However, the first approach substantially degrades the inference speed, and the second approach requires laborious annotations. To overcome these drawbacks, we leverage an existing large-scale dataset, whose segmentation annotations can be utilized, to generate tightly fitted bounding box annotations. We also develop a pseudo-fisheye distortion augmentation method, which further enhances the performance. Extensive analysis shows that our detector successfully fits bounding boxes to pedestrians and demonstrates substantial performance improvement.
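The core geometric step, deriving a tight box from a segmentation mask, is straightforward to sketch. When an image is rotated, as pedestrians effectively are in omnidirectional views, the box of the rotated mask stays tight, whereas rotating the corners of the original box would enlarge it. The sketch treats mask pixels as points and rotates them about an assumed centre; it does not model the fisheye projection or the paper's pseudo-fisheye augmentation.

```python
import numpy as np


def mask_to_bbox(mask):
    """Tight (x_min, y_min, x_max, y_max) around nonzero pixels, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def rotated_mask_bbox(mask, angle_deg, center=(0.0, 0.0)):
    """Rotate mask pixel coordinates by angle_deg about center and box the result."""
    ys, xs = np.nonzero(mask)
    theta = np.deg2rad(angle_deg)
    cx, cy = center
    x, y = xs - cx, ys - cy
    xr = cx + x * np.cos(theta) - y * np.sin(theta)
    yr = cy + x * np.sin(theta) + y * np.cos(theta)
    return xr.min(), yr.min(), xr.max(), yr.max()


mask = np.zeros((5, 5), dtype=bool)
mask[1:3, 2:5] = True  # rows 1-2, columns 2-4
box = mask_to_bbox(mask)
```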
Translated: 2021-04-29 12:45:41 Published: 2021-04-28
# Boosting Co-teaching with Compression Regularization for Label Noise

Boosting Co-teaching with Compression Regularization for Label Noise ( http://arxiv.org/abs/2104.13766v1 )

License: see link
In this paper, we study the problem of learning image classification models in the presence of label noise. We revisit a simple compression regularization named Nested Dropout. We find that Nested Dropout, though originally proposed to perform fast information retrieval and adaptive data compression, can properly regularize a neural network to combat label noise. Moreover, owing to its simplicity, it can be easily combined with Co-teaching to further boost the performance. Our final model remains simple yet effective: it achieves comparable or even better performance than the state-of-the-art approaches on two real-world datasets with label noise, Clothing1M and ANIMAL-10N. On Clothing1M, our approach obtains 74.9% accuracy, which is slightly better than that of DivideMix. On ANIMAL-10N, we achieve 84.1% accuracy, while the best public result, by PLC, is 83.4%. We hope that our simple approach can serve as a strong baseline for learning with label noise. Our implementation is available at https://github.com/yingyichen-cyy/Nested-Co-teaching.
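Nested Dropout samples a truncation index $b$ from a geometric distribution and zeroes every unit after the first $b$, which imposes an importance ordering on the representation. The sketch applies it to a batch of feature vectors with one index per sample; the value of `p` and per-sample (rather than per-batch) sampling are assumptions, and how the paper combines it with Co-teaching is not shown.

```python
import numpy as np


def nested_dropout(features, p=0.1, rng=None):
    """Keep the first b_i units of each row i, with b_i ~ Geometric(p), capped at dim.

    features: (batch, dim). Returns the masked features and the sampled b_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, dim = features.shape
    b = np.minimum(rng.geometric(p, size=batch), dim)  # geometric draws are >= 1
    keep = np.arange(dim)[None, :] < b[:, None]        # (batch, dim) boolean mask
    return features * keep, b


out, b = nested_dropout(np.ones((4, 8)), p=0.3, rng=np.random.default_rng(0))
```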
Translated: 2021-04-29 12:45:26 Published: 2021-04-28
# Sign Segmentation with Changepoint-Modulated Pseudo-Labelling

Sign Segmentation with Changepoint-Modulated Pseudo-Labelling ( http://arxiv.org/abs/2104.13817v1 )

License: see link
The objective of this work is to find temporal boundaries between signs in continuous sign language. Motivated by the paucity of annotation available for this task, we propose a simple yet effective algorithm to improve segmentation performance on unlabelled signing footage from a domain of interest. We make the following contributions: (1) We motivate and introduce the task of source-free domain adaptation for sign language segmentation, in which labelled source data is available for an initial training phase, but is not available during adaptation. (2) We propose the Changepoint-Modulated Pseudo-Labelling (CMPL) algorithm to leverage cues from abrupt changes in motion-sensitive feature space to improve pseudo-labelling quality for adaptation. (3) We showcase the effectiveness of our approach for category-agnostic sign segmentation, transferring from the BSLCORPUS to the BSL-1K and RWTH-PHOENIX-Weather 2014 datasets, where we outperform the prior state of the art.
Translated: 2021-04-29 12:45:09 Published: 2021-04-28
# PDNet: Towards Better One-stage Object Detection with Prediction Decoupling

PDNet: Towards Better One-stage Object Detection with Prediction Decoupling ( http://arxiv.org/abs/2104.13876v1 )

License: see link
Recent one-stage object detectors follow a per-pixel prediction approach that predicts both the object category scores and boundary positions from every single grid location. However, the most suitable positions for inferring different targets, i.e., the object category and boundaries, are generally different. Predicting all these targets from the same grid location thus may lead to sub-optimal results. In this paper, we analyze the suitable inference positions for object category and boundaries, and propose a prediction-target-decoupled detector named PDNet to establish a more flexible detection paradigm. Our PDNet with the prediction decoupling mechanism encodes different targets separately in different locations. A learnable prediction collection module is devised with two sets of dynamic points, i.e., dynamic boundary points and semantic points, to collect and aggregate the predictions from the favorable regions for localization and classification. We adopt a two-step strategy to learn these dynamic point positions, where the prior positions are estimated for different targets first, and the network further predicts residual offsets to the positions with better perceptions of the object properties. Extensive experiments on the MS COCO benchmark demonstrate the effectiveness and efficiency of our method. With a single ResNeXt-64x4d-101 as the backbone, our detector achieves 48.7 AP with single-scale testing, which outperforms the state-of-the-art methods by an appreciable margin under the same experimental settings. Moreover, our detector is highly efficient as a one-stage framework. Our code will be public.
Translated: 2021-04-29 12:44:53 Published: 2021-04-28
# Inpainting Transformer for Anomaly Detection

Inpainting Transformer for Anomaly Detection ( http://arxiv.org/abs/2104.13897v1 )

License: see link
Anomaly detection in computer vision is the task of identifying images that deviate from a set of normal images. A common approach is to train deep convolutional autoencoders to inpaint covered parts of an image and compare the output with the original image. Because the model is trained on anomaly-free samples only, it is assumed to be unable to reconstruct anomalous regions properly. For anomaly detection by inpainting, we suggest that it is beneficial to incorporate information from potentially distant regions. In particular, we pose anomaly detection as a patch-inpainting problem and propose to solve it with a purely self-attention-based approach that discards convolutions. The proposed Inpainting Transformer (InTra) is trained to inpaint covered patches in a large sequence of image patches, thereby integrating information across large regions of the input image. When learning from scratch, InTra achieves better than state-of-the-art results on the MVTec AD [1] dataset for detection and localization.
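The inpainting-based scoring loop can be sketched independently of the model. Each patch is hidden, predicted from the rest of the image, and scored by its reconstruction error; the highest-error patches flag anomalies. Here `neighbour_mean_inpaint` is a hypothetical stand-in for InTra that just averages adjacent patches, whereas the transformer attends over a much longer sequence of patches.

```python
import numpy as np


def patchify(img, k):
    """Split a 2D image into a (rows, cols, k, k) grid of non-overlapping patches."""
    H, W = img.shape
    img = img[: H // k * k, : W // k * k]
    return img.reshape(H // k, k, W // k, k).transpose(0, 2, 1, 3)


def neighbour_mean_inpaint(patches, i, j):
    """Hypothetical stand-in model: predict patch (i, j) as the mean of its neighbours."""
    rows, cols = patches.shape[:2]
    neighbours = [patches[a, b]
                  for a in range(max(0, i - 1), min(rows, i + 2))
                  for b in range(max(0, j - 1), min(cols, j + 2))
                  if (a, b) != (i, j)]
    return np.mean(neighbours, axis=0)


def anomaly_map(img, k, inpaint=neighbour_mean_inpaint):
    """Per-patch mean squared error between each patch and its inpainted version."""
    patches = patchify(img, k)
    rows, cols = patches.shape[:2]
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            scores[i, j] = np.mean((patches[i, j] - inpaint(patches, i, j)) ** 2)
    return scores


image = np.zeros((8, 8))
image[4:6, 4:6] = 1.0  # a small "defect" occupying patch (2, 2) when k = 2
scores = anomaly_map(image, k=2)
```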
Translated: 2021-04-29 12:44:29 Published: 2021-04-28
# Learning Synergistic Attention for Light Field Salient Object Detection

Learning Synergistic Attention for Light Field Salient Object Detection ( http://arxiv.org/abs/2104.13916v1 )

License: see link
We propose a novel Synergistic Attention Network (SA-Net) to address light field salient object detection by establishing a synergistic effect between multi-modal features with advanced attention mechanisms. Our SA-Net exploits the rich information of focal stacks via 3D convolutional neural networks, decodes the high-level features of multi-modal light field data with two cascaded synergistic attention modules, and predicts the saliency map using an effective feature fusion module in a progressive manner. Extensive experiments on three widely used benchmark datasets show that our SA-Net outperforms 28 state-of-the-art models, demonstrating its effectiveness and superiority. Our code will be made publicly available.
Translated: 2021-04-29 12:44:12 Published: 2021-04-28
# Evaluating Document Representations for Content-based Legal Literature Recommendations

Evaluating Document Representations for Content-based Legal Literature Recommendations ( http://arxiv.org/abs/2104.13841v1 )

License: see link
Recommender systems assist legal professionals in finding relevant literature to support their cases. Despite the importance of such systems for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. At the same time, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets. Thus, these studies have limited reproducibility. To address the gap between research and practice, we explore a set of state-of-the-art document representation methods for the task of retrieving semantically related US case law. We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods. We compare a total of 27 methods using two silver standards with annotations for 2,964 documents. The silver standards are newly created from Open Case Book and Wikisource and can be reused under an open license, facilitating reproducibility. Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results, closely followed by Poincaré citation embeddings. Combining fastText and Poincaré in a hybrid manner further improves the overall result. Besides the overall performance, we analyze the methods depending on document length, citation count, and the coverage of their recommendations. We make our source code, models, and datasets publicly available at https://github.com/malteos/legal-document-similarity/.
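The best-performing text representation in the abstract, averaged word vectors compared by cosine similarity, can be sketched with toy vectors. The tiny vocabulary below is made up; real vectors would come from a fastText model trained on legal corpora. Concatenating L2-normalised text and citation vectors is one simple way to form a hybrid and is an assumption here, not necessarily the paper's combination method.

```python
import numpy as np


def doc_vector(tokens, word_vectors):
    """Average the vectors of known tokens; None if no token is in the vocabulary."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else None


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def hybrid(text_vec, citation_vec):
    """Concatenate unit-length text and citation embeddings so neither dominates by scale."""
    return np.concatenate([text_vec / np.linalg.norm(text_vec),
                           citation_vec / np.linalg.norm(citation_vec)])


toy_vectors = {  # hypothetical 2D word vectors
    "contract": np.array([1.0, 0.0]),
    "breach": np.array([0.9, 0.1]),
    "custody": np.array([0.0, 1.0]),
}
d1 = doc_vector(["contract", "breach", "unknownword"], toy_vectors)
d2 = doc_vector(["breach"], toy_vectors)
d3 = doc_vector(["custody"], toy_vectors)
```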
Translated: 2021-04-29 12:43:58 Published: 2021-04-28
# Reconstructing nodal pressures in water distribution systems with graph neural networks

Reconstructing nodal pressures in water distribution systems with graph neural networks ( http://arxiv.org/abs/2104.13619v1 )

License: see link
Knowing the pressure at all times in each node of a water distribution system (WDS) facilitates safe and efficient operation. Yet complete measurement data cannot be collected due to the limited number of instruments in a real-life WDS. This paper presents a data-driven methodology for reconstructing all nodal pressures by observing only a limited number of nodes. The reconstruction method is based on K-localized spectral graph filters, which make graph convolution on water networks possible. The effect of the number of layers, the layer depth and the degree of the Chebyshev polynomial applied in the kernel is discussed, taking into account the peculiarities of the application. In addition, a weighting method is shown with which information on friction loss can be embedded into the spectral graph filters through the adjacency matrix. The performance of the proposed model is presented on 3 WDSs at different ratios of observed nodes to the total number of nodes. The weighted connections prove no benefit over the binary connections, but the proposed model reconstructs the nodal pressure with at most 5% relative error on average at an observation ratio of at least 5%. The results are achieved with shallow graph neural networks by following the considerations discussed in the paper.
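A K-localized spectral graph filter in the Chebyshev form (as in ChebNet-style graph convolutions) can be sketched for a single-channel node signal. It rescales the normalized Laplacian to $[-1, 1]$, builds the Chebyshev terms with the recurrence $T_k = 2\tilde{L}T_{k-1} - T_{k-2}$, and mixes them with coefficients `theta`; with $K$ coefficients each output depends only on nodes within $K-1$ hops. Weighted edges (for example, derived from friction loss) would simply be entries of `adj`; the coefficients and toy graph are illustrative only, and in a network they would be learned.

```python
import numpy as np


def scaled_laplacian(adj):
    """L~ = 2 L / lambda_max - I for the symmetric normalized Laplacian L."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(len(adj))


def cheb_filter(x, adj, theta):
    """Apply sum_k theta[k] * T_k(L~) x for a node signal x of shape (n_nodes,)."""
    L = scaled_laplacian(adj)
    t_prev, t_curr = x, L @ x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2 * L @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out


# Path graph 0-1-2-3-4 with a unit "pressure" signal at node 0.
adj = np.zeros((5, 5))
for a in range(4):
    adj[a, a + 1] = adj[a + 1, a] = 1.0
x = np.eye(5)[0]
y = cheb_filter(x, adj, theta=[0.0, 0.0, 1.0])  # K = 3 coefficients -> at most 2 hops
```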
Translated: 2021-04-29 12:43:26 Published: 2021-04-28
# A Note on Connecting Barlow Twins with Negative-Sample-Free Contrastive Learning

A Note on Connecting Barlow Twins with Negative-Sample-Free Contrastive Learning ( http://arxiv.org/abs/2104.13712v1 )

License: see link
In this report, we relate the algorithmic design of the Barlow Twins method to the Hilbert-Schmidt Independence Criterion (HSIC), thus establishing it as a contrastive learning approach that is free of negative samples. From this perspective, we argue that Barlow Twins (and thus the class of negative-sample-free contrastive learning methods) suggests a possibility to bridge the two major families of self-supervised learning philosophies: non-contrastive and contrastive approaches. In particular, Barlow Twins exemplifies how we could combine the best practices of both worlds: avoiding the need for a large training batch size and negative sample pairing (like non-contrastive methods) and avoiding symmetry-breaking network designs (like contrastive methods).
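For reference, the standard Barlow Twins objective computes the cross-correlation matrix $C$ between batch-standardised embeddings of two augmented views and penalises $\sum_i (1 - C_{ii})^2 + \lambda \sum_{i \ne j} C_{ij}^2$; no negative pairs appear anywhere. The sketch below implements that loss directly (the HSIC reformulation discussed in the report is not shown); `lam = 5e-3` follows the value commonly used for Barlow Twins and should be treated as an assumption.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-9):
    """Barlow Twins loss for embeddings z_a, z_b of shape (batch, dim)."""
    n = z_a.shape[0]
    # Standardise each embedding dimension over the batch.
    za = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    zb = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = za.T @ zb / n                                    # (dim, dim) cross-correlation
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)            # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy-reduction term
    return float(on_diag + lam * off_diag)


rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
loss_same = barlow_twins_loss(z, z)
loss_unrelated = barlow_twins_loss(z, rng.normal(size=(256, 8)))
```

Identical views drive the diagonal of $C$ to 1, so only the small off-diagonal term remains, while unrelated views leave the invariance term large.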
Translated: 2021-04-29 12:43:08 Published: 2021-04-28
# Finding High-Value Training Data Subset through Differentiable Convex Programming

Finding High-Value Training Data Subset through Differentiable Convex Programming ( http://arxiv.org/abs/2104.13794v1 )

License: see link
Finding valuable training data points for deep neural networks has been a core research challenge with many applications. In recent years, various techniques for calculating the "value" of individual training datapoints have been proposed for explaining trained models. However, the value of a training datapoint also depends on the other selected training datapoints, a notion that is not explicitly captured by existing methods. In this paper, we study the problem of selecting high-value subsets of training data. The key idea is to design a learnable framework for online subset selection, which can be learned using mini-batches of training data, thus making our method scalable. This results in a parameterized convex subset selection problem that is amenable to a differentiable convex programming paradigm, allowing us to learn the parameters of the selection model in end-to-end training. Using this framework, we design an online alternating minimization-based algorithm for jointly learning the parameters of the selection model and the ML model. Extensive evaluation on a synthetic dataset and three standard datasets shows that our algorithm consistently finds higher-value subsets of training data compared to recent state-of-the-art methods, sometimes with ~20% higher value than existing methods. The subsets are also useful for finding mislabelled training data. Our algorithm has a running time comparable to the existing valuation functions.
Translated: 2021-04-29 12:42:53 Published: 2021-04-28
# Reward (Mis)design for Autonomous Driving

Reward (Mis)design for Autonomous Driving ( http://arxiv.org/abs/2104.13906v1 )

License: see link
This paper considers the problem of reward design for autonomous driving (AD), with insights that are also applicable to the design of cost functions and performance metrics more generally. Herein we develop 8 simple sanity checks for identifying flaws in reward functions. The sanity checks are applied to reward functions from past work on reinforcement learning (RL) for autonomous driving, revealing near-universal flaws in reward design for AD that might also exist pervasively across reward design for other tasks. Lastly, we explore promising directions that may help future researchers design reward functions for AD.
Translated: 2021-04-29 12:42:34 Published: 2021-04-28
# Learning from Łukasiewicz and Meredith: Investigations into Proof Structures (Extended Version)

Learning from Łukasiewicz and Meredith: Investigations into Proof Structures (Extended Version) ( http://arxiv.org/abs/2104.13645v1 )

License: see link
The material presented in this paper contributes to establishing a basis deemed essential for substantial progress in Automated Deduction. It identifies and studies global features in selected problems and their proofs which offer the potential of guiding proof search in a more direct way. The studied problems are of the widespread form of "axiom(s) and rule(s) imply goal(s)". The features include the well-known concept of lemmas. For their elaboration, both human and automated proofs of selected theorems are taken into close comparative consideration. The study at the same time accounts for a coherent and comprehensive formal reconstruction of historical work by Łukasiewicz, Meredith and others. First experiments resulting from the study indicate novel ways of lemma generation to supplement automated first-order provers of various families, strengthening in particular their ability to find short proofs.
Translated: 2021-04-29 12:42:24 Published: 2021-04-28
# IDMT-Traffic: An Open Benchmark Dataset for Acoustic Traffic Monitoring Research

IDMT-Traffic: An Open Benchmark Dataset for Acoustic Traffic Monitoring Research ( http://arxiv.org/abs/2104.13620v1 )

License: see link
In many urban areas, traffic load and noise pollution are constantly increasing. Automated systems for traffic monitoring are promising countermeasures, which allow local traffic flow to be systematically quantified and predicted in order to support municipal traffic planning decisions. In this paper, we present a novel open benchmark dataset containing 2.5 hours of stereo audio recordings of 4718 vehicle passing events captured with both high-quality sE8 and medium-quality MEMS microphones. This dataset is well suited to evaluate the use case of deploying audio classification algorithms to embedded sensor devices with restricted microphone quality and hardware processing power. In addition, this paper provides a detailed review of recent acoustic traffic monitoring (ATM) algorithms as well as the results of two benchmark experiments on vehicle type classification and direction of movement estimation using four state-of-the-art convolutional neural network architectures.
Translated: 2021-04-29 12:42:10 Published: 2021-04-28
# ZePHyR: Zero-shot Pose Hypothesis Rating

ZePHyR: Zero-shot Pose Hypothesis Rating ( http://arxiv.org/abs/2104.13526v1 )

License: see link
Pose estimation is a basic module in many robot manipulation pipelines. Estimating the pose of objects in the environment can be useful for grasping, motion planning, or manipulation. However, current state-of-the-art methods for pose estimation either rely on large annotated training sets or simulated data. Further, the long training times for these methods prohibit quick interaction with novel objects. To address these issues, we introduce a novel method for zero-shot object pose estimation in clutter. Our approach uses a hypothesis generation and scoring framework, with a focus on learning a scoring function that generalizes to objects not used for training. We achieve zero-shot generalization by rating hypotheses as a function of unordered point differences. We evaluate our method on challenging datasets with both textured and untextured objects in cluttered scenes and demonstrate that our method significantly outperforms previous methods on this task. We also demonstrate how our system can be used by quickly scanning and building a model of a novel object, which can immediately be used by our method for pose estimation. Our work allows users to estimate the pose of novel objects without requiring any retraining. Additional information can be found on our website https://bokorn.github.io/zephyr/
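To make the hypothesis generation and scoring framework concrete, here is a minimal, hand-crafted baseline: each pose hypothesis $(R, t)$ is scored by the fraction of transformed model points that land within a distance `tau` of an observed point. ZePHyR replaces such a fixed score with a learned function of unordered point differences; the inlier-ratio score, the threshold and the brute-force nearest-neighbour search are assumptions of this sketch.

```python
import numpy as np


def inlier_score(model_pts, observed_pts, R, t, tau=0.01):
    """Fraction of model points within tau of some observed point after pose (R, t)."""
    moved = model_pts @ R.T + t
    # Brute-force nearest-neighbour distances, shape (n_model,).
    dists = np.linalg.norm(moved[:, None, :] - observed_pts[None, :, :], axis=2).min(axis=1)
    return float(np.mean(dists < tau))


def best_hypothesis(model_pts, observed_pts, hypotheses, tau=0.01):
    """Return the index of the highest-scoring (R, t) hypothesis and all scores."""
    scores = [inlier_score(model_pts, observed_pts, R, t, tau) for R, t in hypotheses]
    return int(np.argmax(scores)), scores


rng = np.random.default_rng(1)
model = rng.uniform(size=(50, 3))
true_t = np.array([0.1, 0.0, 0.0])
observed = model + true_t
hypotheses = [(np.eye(3), np.zeros(3)), (np.eye(3), true_t)]
idx, scores = best_hypothesis(model, observed, hypotheses)
```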
Translated: 2021-04-29 12:41:54 Published: 2021-04-28
# Interactive Visualization for Exploring Information Fragments in Software Repositories

Interactive Visualization for Exploring Information Fragments in Software Repositories ( http://arxiv.org/abs/2104.13568v1 )

License: see link
Software developers explore and inspect software repository data to obtain detailed information archived in the development history. However, developers who are not acquainted with the development context struggle to delve into the repositories with only a handful of information; they have difficulty discovering and expanding information fragments given the topological and sequential multi-dimensional structure of repositories. We introduce ExIF, an interactive visualization for exploring information fragments in software repositories. ExIF helps users discover new information fragments within clusters or topological neighbors and identify revisions incorporating user-collected fragments.
Translated: 2021-04-29 12:41:36 Published: 2021-04-28
# Image Synthesis as a Pretext for Unsupervised Histopathological Diagnosis

Image Synthesis as a Pretext for Unsupervised Histopathological Diagnosis ( http://arxiv.org/abs/2104.13797v1 )

License: see link
Anomaly detection in visual data refers to the problem of differentiating abnormal appearances from normal cases. Supervised approaches have been successfully applied to different domains, but require an abundance of labeled data. Due to the nature of how anomalies occur and their underlying generating processes, it is hard to characterize and label them. Recent advances in deep generative-based models have sparked interest in applying such methods for unsupervised anomaly detection and have shown promising results in medical and industrial inspection domains. In this work we evaluate a crucial part of the unsupervised visual anomaly detection pipeline, that is needed for normal appearance modeling, as well as the ability to reconstruct closest looking normal and tumor samples. We adapt and evaluate different high-resolution state-of-the-art generative models from the face synthesis domain and demonstrate their superiority over currently used approaches on a challenging domain of digital pathology. Multifold improvement in image synthesis is demonstrated in terms of the quality and resolution of the generated images, validated also against the supervised model.
Translated: 2021-04-29 12:41:24 Published: 2021-04-28
# Recent Advances on Non-Line-of-Sight Imaging: Conventional Physical Models, Deep Learning, and New Scenes

Recent Advances on Non-Line-of-Sight Imaging: Conventional Physical Models, Deep Learning, and New Scenes ( http://arxiv.org/abs/2104.13807v1 )

License: see link
As an emerging technology that has attracted huge attention, non-line-of-sight (NLOS) imaging can reconstruct hidden objects by analyzing the diffuse reflection on a relay surface, with broad application prospects in autonomous driving, medical imaging, and defense. Despite the challenges of low signal-to-noise ratio (SNR) and high ill-posedness, NLOS imaging has developed rapidly in recent years. Most current NLOS imaging technologies use conventional physical models, constructing imaging models through active or passive illumination and using reconstruction algorithms to restore hidden scenes. Moreover, deep learning algorithms for NLOS imaging have also received much attention recently. This paper presents a comprehensive overview of both conventional and deep-learning-based NLOS imaging techniques. Besides, we also survey newly proposed NLOS scenes and discuss the challenges and prospects of existing technologies. Such a survey can help readers gain an overview of different types of NLOS imaging, thus expediting the development of seeing around corners.
Translated: 2021-04-29 12:41:07; published: 2021-04-28
# LambdaUNet: 2.5D Stroke Lesion Segmentation of Diffusion-weighted MR Images

LambdaUNet: 2.5D Stroke Lesion Segmentation of Diffusion-weighted MR Images ( http://arxiv.org/abs/2104.13917v1 )

License: see link
Diffusion-weighted (DW) magnetic resonance imaging is essential for the diagnosis and treatment of ischemic stroke. DW images (DWIs) are usually acquired in multi-slice settings, where lesion areas in two consecutive 2D slices are highly discontinuous due to large slice thickness and sometimes even slice gaps. Therefore, although DWIs contain rich 3D information, they cannot be treated as regular 3D or 2D images. Instead, DWIs lie somewhere in between (2.5D): they are volumetric, yet discontinuous across slices. Thus, most existing segmentation methods, which are designed for either 2D or 3D images, are not ideal for them. To tackle this problem, we propose a new neural network architecture tailored for segmenting highly discontinuous 2.5D data such as DWIs. Our network, termed LambdaUNet, extends UNet by replacing convolutional layers with our proposed Lambda+ layers. In particular, Lambda+ layers transform both intra-slice and inter-slice context around a pixel into linear functions, called lambdas, which are then applied to the pixel to produce informative 2.5D features. LambdaUNet is simple yet effective in combining sparse inter-slice information from adjacent slices while also capturing dense contextual features within a single slice. Experiments on a unique clinical dataset demonstrate that LambdaUNet outperforms existing 3D/2D image segmentation methods, including recent variants of UNet. Code for LambdaUNet will be released with the publication to facilitate future research.
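The idea of turning a context into a "lambda" can be sketched with the generic content-lambda computation from lambda layers. This is not the authors' Lambda+ layer, whose exact form the abstract does not give. The sketch assumes:

- The 2.5D context of a pixel is a small set of positions, taken from its own slice and the adjacent slices.
- Each context position carries a key vector and a value vector.

Keys are softmax-normalized over the context positions and summarized into a k-by-v matrix, the lambda. That linear map is then applied to the pixel's query. All vectors and names are hypothetical.

```python
import math

def softmax_over_positions(keys):
    """Normalize each key dimension across context positions (columns sum to 1)."""
    n_pos, k_dim = len(keys), len(keys[0])
    out = [[0.0] * k_dim for _ in range(n_pos)]
    for k in range(k_dim):
        m = max(row[k] for row in keys)
        exps = [math.exp(row[k] - m) for row in keys]
        total = sum(exps)
        for p in range(n_pos):
            out[p][k] = exps[p] / total
    return out

def content_lambda(keys, values):
    """lambda[k][v] = sum over positions p of softmax(keys)[p][k] * values[p][v]."""
    kn = softmax_over_positions(keys)
    k_dim, v_dim = len(keys[0]), len(values[0])
    return [[sum(kn[p][k] * values[p][v] for p in range(len(keys)))
             for v in range(v_dim)] for k in range(k_dim)]

def apply_lambda(lam, query):
    """Output y[v] = sum_k query[k] * lambda[k][v]: a linear function of the query."""
    return [sum(query[k] * lam[k][v] for k in range(len(query)))
            for v in range(len(lam[0]))]

# Hypothetical context: 2 in-slice neighbours plus 1 position from each adjacent slice.
keys = [[0.5, -1.0], [0.2, 0.3], [1.0, 0.0], [-0.4, 0.8]]
values = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 1.5, 1.0], [2.0, 0.5, 0.5]]
query = [0.7, -0.2]

print(apply_lambda(content_lambda(keys, values), query))
```

Because the context is summarized into a fixed-size matrix, sparse inter-slice positions and dense in-slice positions can be mixed in one context set without changing the output shape.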
Translated: 2021-04-29 12:40:51; published: 2021-04-28
# A coding theorem for the rate-distortion-perception function

A coding theorem for the rate-distortion-perception function ( http://arxiv.org/abs/2104.13662v1 )

License: see link
The rate-distortion-perception function (RDPF; Blau and Michaeli, 2019) has emerged as a useful tool for thinking about realism and distortion of reconstructions in lossy compression. Unlike the rate-distortion function, however, it is unknown whether encoders and decoders exist that achieve the rate suggested by the RDPF. Building on results by Li and El Gamal (2018), we show that the RDPF can indeed be achieved using stochastic, variable-length codes. For this class of codes, we also prove that the RDPF lower-bounds the achievable rate.
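For reference, the RDPF of Blau and Michaeli (2019) is commonly written as follows. Here $\Delta$ is a per-sample distortion measure and $d$ is a divergence between distributions, such as total variation. It differs from the classical rate-distortion function only by the perception constraint on the output distribution, so letting $P \to \infty$ recovers $R(D)$.

$$
R(D, P) = \min_{p_{\hat{X} \mid X}} I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P.
$$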
Translated: 2021-04-29 12:40:25; published: 2021-04-28
# Causes of Effects: Learning individual responses from population data

Causes of Effects: Learning individual responses from population data ( http://arxiv.org/abs/2104.13730v1 )

License: see link
The problem of individualization is recognized as crucial in almost every field. Identifying causes of effects in specific events is likewise essential for accurate decision making. However, such estimates invoke counterfactual relationships, and are therefore indeterminable from population data. For example, the probability of benefiting from a treatment concerns an individual having a favorable outcome if treated and an unfavorable outcome if untreated. Experiments conditioning on fine-grained features are fundamentally inadequate because we can't test both possibilities for an individual. Tian and Pearl provided bounds on this and other probabilities of causation using a combination of experimental and observational data. Even though those bounds were proven tight, narrower bounds, sometimes significantly so, can be achieved when structural information is available in the form of a causal model. This has the power to solve central problems, such as explainable AI, legal responsibility, and personalized medicine, all of which demand counterfactual logic. We analyze and expand on existing research by applying bounds to the probability of necessity and sufficiency (PNS) along with graphical criteria and practical applications.
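The Tian and Pearl bounds mentioned above combine two kinds of data:

- experimental quantities $P(y_x)$ and $P(y_{x'})$;
- the observational joint distribution of treatment $X$ and outcome $Y$.

A minimal sketch follows, assuming binary $X$ and $Y$ and the commonly cited form of the bounds on the probability of necessity and sufficiency (PNS). The function name and the numbers in the usage lines are made up for illustration and are not taken from the paper.

```python
def pns_bounds(p_yx, p_yx_prime, p_xy, p_xy_p, p_xp_y, p_xp_yp):
    """Tian-Pearl bounds on PNS from experimental and observational data.

    p_yx       = P(y_x)    : P(Y=y) under do(X=x)
    p_yx_prime = P(y_{x'}) : P(Y=y) under do(X=x')
    p_xy, p_xy_p, p_xp_y, p_xp_yp = observational P(x,y), P(x,y'), P(x',y), P(x',y')
    """
    p_y = p_xy + p_xp_y  # observational P(y)
    lower = max(0.0, p_yx - p_yx_prime, p_y - p_yx_prime, p_yx - p_y)
    upper = min(p_yx, 1.0 - p_yx_prime, p_xy + p_xp_yp,
                p_yx - p_yx_prime + p_xy_p + p_xp_y)
    return lower, upper

# Hypothetical inputs: P(y_x)=0.7, P(y_x')=0.3, and an observational joint table.
lo, hi = pns_bounds(0.7, 0.3, 0.2, 0.3, 0.05, 0.45)
print(round(lo, 3), round(hi, 3))
```

With experimental data alone, the bounds would run from $\max(0, P(y_x) - P(y_{x'}))$ to $\min(P(y_x), P(y'_{x'}))$, which is [0.4, 0.7] for these inputs. Adding the observational terms narrows the interval to [0.45, 0.65].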
Translated: 2021-04-29 12:39:38; published: 2021-04-28
# Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations

Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations ( http://arxiv.org/abs/2104.13907v1 )

License: see link
Learned visuomotor policies have shown considerable success as an alternative to traditional, hand-crafted frameworks for robotic manipulation tasks. Surprisingly, the extension of these methods to the multiview domain is relatively unexplored. A successful multiview policy could be deployed on a mobile manipulation platform, allowing it to complete a task regardless of its view of the scene. In this work, we demonstrate that a multiview policy can be found through imitation learning by collecting data from a variety of viewpoints. We illustrate the general applicability of the method by learning to complete several challenging multi-stage and contact-rich tasks, from numerous viewpoints, both in a simulated environment and on a real mobile manipulation platform. Furthermore, we analyze our policies to determine the benefits of learning from multiview data compared to learning with data from a fixed perspective. We show that learning from multiview data has little, if any, penalty to performance for a fixed-view task compared to learning with an equivalent amount of fixed-view data. Finally, we examine the visual features learned by the multiview and fixed-view policies. Our results indicate that multiview policies implicitly learn to identify spatially correlated features with a degree of view-invariance.
Translated: 2021-04-29 12:39:21; published: 2021-04-28
# AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries

AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries ( http://arxiv.org/abs/2104.13553v1 )

License: see link
This paper proposes a neural network that performs audio transformations on user-specified sources (e.g., vocals) of a given audio track according to a given description, while preserving other sources not mentioned in the description. Audio Manipulation on a Specific Source (AMSS) is challenging because a sound object (i.e., a waveform sample or frequency bin) is "transparent": it usually carries information from multiple sources, in contrast to a pixel in an image. To address this challenging problem, we propose AMSS-Net, which extracts latent sources and selectively manipulates them while preserving irrelevant sources. We also propose an evaluation benchmark for several AMSS tasks, and we show that AMSS-Net outperforms baselines on several AMSS tasks via objective metrics and empirical verification.
Translated: 2021-04-29 12:38:31; published: 2021-04-28
# Packet-Loss-Tolerant Split Inference for Delay-Sensitive Deep Learning in Lossy Wireless Networks

Packet-Loss-Tolerant Split Inference for Delay-Sensitive Deep Learning in Lossy Wireless Networks ( http://arxiv.org/abs/2104.13629v1 )

License: see link
The distributed inference framework is an emerging technology for real-time applications empowered by cutting-edge deep machine learning (ML) on resource-constrained Internet of Things (IoT) devices. In distributed inference, computational tasks are offloaded from the IoT device to other devices or to an edge server via lossy IoT networks. However, narrow-band and lossy IoT networks cause packet losses and retransmissions that result in non-negligible communication latency. This study addresses the additional retransmission latency caused by packet loss in lossy IoT networks. We propose a split inference with no retransmissions (SI-NR) method that achieves high accuracy without any retransmissions, even when packet loss occurs. The key idea of SI-NR is to train the ML model while emulating packet loss with dropout, which randomly drops the outputs of hidden units in a DNN layer. This makes the SI-NR system robust against packet losses. Our experimental evaluation shows that SI-NR makes accurate predictions without packet retransmission at a packet loss rate of 60%.
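The key idea can be sketched under a few assumptions:

- The intermediate feature vector is sent in fixed-size packets.
- A lost packet is replaced by zeros at the receiver, with no retransmission.

During training, the same effect can be emulated by dropping whole packet-sized groups of hidden units at random. The packet size, the loss rate, and the inverted-dropout rescaling by 1/(1 - p) are illustrative assumptions, not details from the paper. The helper name is hypothetical.

```python
import random

def drop_packets(features, packet_size, loss_rate, rng, rescale=True):
    """Zero out whole packets of a feature vector, as a lossy link would.

    With rescale=True, surviving values are divided by (1 - loss_rate), as in
    inverted dropout, so the expected value of each feature is unchanged.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - loss_rate) if rescale else 1.0
    out = []
    for start in range(0, len(features), packet_size):
        packet = features[start:start + packet_size]
        if rng.random() < loss_rate:
            out.extend([0.0] * len(packet))  # packet lost, no retransmission
        else:
            out.extend(x * scale for x in packet)
    return out

rng = random.Random(0)
hidden = [float(i) for i in range(1, 13)]  # 12 hidden-unit outputs, 3 packets of 4
print(drop_packets(hidden, packet_size=4, loss_rate=0.6, rng=rng))
```

Dropping contiguous groups rather than independent units mirrors how a real link loses data: a whole packet at a time.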
Translated: 2021-04-29 12:38:16; published: 2021-04-28
# Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System

Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System ( http://arxiv.org/abs/2104.13671v1 )

License: see link
The resurgence of near-memory processing (NMP) with the advent of big data has shifted the computation paradigm from processor-centric to memory-centric computing. To meet the bandwidth and capacity demands of memory-centric computing, 3D memory has been adopted to form a scalable memory-cube network. Along with NMP and memory system development, the mapping for placing data and guiding computation in the memory-cube network has become crucial in driving performance improvements in NMP. However, it is very challenging to design a universal optimal mapping for all applications due to unique application behavior and an intractable decision space. In this paper, we propose an artificially intelligent memory mapping scheme, AIMM, that optimizes data placement and resource utilization through page and computation remapping. Our proposed technique continuously evaluates and learns the impact of mapping decisions on system performance for any application. AIMM uses a neural network, trained with a reinforcement learning algorithm known to be effective for exploring vast design spaces, to achieve a near-optimal mapping during execution. We also provide a detailed AIMM hardware design that can be adopted as a plugin module for various NMP systems. Our experimental evaluation shows that AIMM improves the baseline NMP performance in single-program and multi-program scenarios by up to 70% and 50%, respectively.
Translated: 2021-04-29 12:37:58; published: 2021-04-28
# A Reinforcement Learning Environment for Polyhedral Optimizations

A Reinforcement Learning Environment for Polyhedral Optimizations ( http://arxiv.org/abs/2104.13732v1 )

License: see link
The polyhedral model provides a structured way of defining semantics-preserving transformations to improve the performance of a large class of loops. Finding profitable points in this space is a hard problem that is usually approached by heuristics that generalize from domain-expert knowledge. Existing problem formulations in state-of-the-art heuristics depend on the shape of particular loops, making it hard to leverage generic and more powerful optimization techniques from the machine learning domain. In this paper, we propose PolyGym, a shape-agnostic formulation of the space of legal transformations in the polyhedral model as a Markov Decision Process (MDP). Instead of using transformations, the formulation is based on an abstract space of possible schedules. In this formulation, states model partial schedules, which are constructed by actions that are reusable across different loops. With a simple heuristic to traverse the space, we demonstrate that our formulation is powerful enough to match and outperform state-of-the-art heuristics. On the Polybench benchmark suite, we found transformations that led to a speedup of 3.39x over LLVM O3, which is 1.83x better than the speedup achieved by ISL. Our generic MDP formulation enables using reinforcement learning to learn optimization policies over a wide range of loops. This also contributes to the emerging field of machine learning in compilers, as it exposes a novel problem formulation that can push the limits of existing methods.
Translated: 2021-04-29 12:37:39; published: 2021-04-28
# Sum-of-norms clustering does not separate nearby balls

Sum-of-norms clustering does not separate nearby balls ( http://arxiv.org/abs/2104.13753v1 )

License: see link
Sum-of-norms clustering is a popular convexification of $K$-means clustering. We show that, if the dataset is made of a large number of independent random variables distributed according to the uniform measure on the union of two disjoint balls of unit radius, and if the balls are sufficiently close to one another, then sum-of-norms clustering will typically fail to recover the decomposition of the dataset into two clusters. As the dimension tends to infinity, this happens even when the distance between the centers of the two balls is taken to be as large as $2\sqrt{2}$. In order to show this, we introduce and analyze a continuous version of sum-of-norms clustering, where the dataset is replaced by a general measure. In particular, we state and prove a local-global characterization of the clustering that seems to be new even in the case of discrete datapoints.
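In a common form of sum-of-norms clustering, each data point $x_i$ gets its own centroid $u_i$. The objective is $\tfrac{1}{2}\sum_i \lVert x_i - u_i\rVert^2 + \lambda \sum_{i<j} \lVert u_i - u_j\rVert$. The pairwise penalty uses a non-squared norm, so for a large enough $\lambda$ centroids fuse, and points sharing a centroid form a cluster.

The sketch below evaluates this objective and gives the closed-form minimizer for the two-point case. The paper's exact normalization may differ, and the function names are hypothetical.

```python
import math

def son_objective(points, centroids, lam):
    """0.5 * sum ||x_i - u_i||^2 + lam * sum_{i<j} ||u_i - u_j|| (Euclidean norms)."""
    fit = 0.5 * sum(math.dist(x, u) ** 2 for x, u in zip(points, centroids))
    fusion = sum(math.dist(centroids[i], centroids[j])
                 for i in range(len(centroids))
                 for j in range(i + 1, len(centroids)))
    return fit + lam * fusion

def two_point_minimizer(x1, x2, lam):
    """Closed-form minimizer of the objective above for N = 2 points."""
    d = math.dist(x1, x2)
    if lam >= d / 2:  # centroids fuse at the midpoint
        mid = tuple((a + b) / 2 for a, b in zip(x1, x2))
        return [mid, mid]
    e = tuple((a - b) / d for a, b in zip(x1, x2))  # unit vector from x2 to x1
    u1 = tuple(a - lam * c for a, c in zip(x1, e))  # each centroid moves lam
    u2 = tuple(b + lam * c for b, c in zip(x2, e))  # toward the other point
    return [u1, u2]

pts = [(0.0, 0.0), (1.0, 0.0)]
for lam in (0.2, 0.5):
    u = two_point_minimizer(pts[0], pts[1], lam)
    print(lam, u, round(son_objective(pts, u, lam), 4))
```

For two points, fusion happens exactly when $\lambda \ge \lVert x_1 - x_2 \rVert / 2$. The abstract's negative result concerns the much harder many-point regime, where whole clusters can fail to separate.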
Translated: 2021-04-29 12:37:15; published: 2021-04-28
# Weighed $\ell_1$ on the simplex: Compressive sensing meets locality

Weighed $\ell_1$ on the simplex: Compressive sensing meets locality ( http://arxiv.org/abs/2104.13894v1 )

License: see link
Sparse manifold learning algorithms combine techniques from manifold learning and sparse optimization to learn features that can be used for downstream tasks. The standard setting of compressive sensing cannot be immediately applied to this setup. Due to the intrinsic geometric structure of the data, dictionary atoms might be redundant and may not satisfy the restricted isometry property or the coherence condition. In addition, manifold learning emphasizes learning local geometry, which is not reflected in a standard $\ell_1$ minimization problem. We propose weighted $\ell_0$ and weighted $\ell_1$ metrics that encourage representation via neighborhood atoms, suited for dictionary-based manifold learning. Assuming that the data is generated from a Delaunay triangulation, we show the equivalence of weighted $\ell_1$ and weighted $\ell_0$. We discuss an optimization program that learns the dictionaries and sparse coefficients and demonstrate the utility of our regularization on synthetic and real datasets.
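Why weighting matters on the simplex: if the coefficients $c$ are nonnegative and sum to one, the plain $\ell_1$ norm equals 1 for every feasible $c$. It therefore cannot prefer one representation over another. A weighted penalty $\sum_j w_j |c_j|$, whose weights grow with the distance between the data point and atom $j$, does favour nearby atoms.

The squared-distance weights below are an illustrative assumption, not necessarily the paper's choice. The atoms and function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def l1(coeffs):
    return sum(abs(c) for c in coeffs)

def weighted_l1(x, atoms, coeffs):
    """sum_j ||x - d_j||^2 * |c_j|: penalises weight placed on far-away atoms."""
    return sum(sq_dist(x, d) * abs(c) for d, c in zip(atoms, coeffs))

def reconstruct(atoms, coeffs):
    """Dictionary combination sum_j c_j * d_j."""
    dim = len(atoms[0])
    return tuple(sum(c * d[k] for d, c in zip(atoms, coeffs)) for k in range(dim))

atoms = [(0.0, 0.0), (1.0, 0.0), (-4.0, 0.0), (5.0, 0.0)]
x = (0.5, 0.0)
local = [0.5, 0.5, 0.0, 0.0]  # uses the two atoms next to x
far = [0.0, 0.0, 0.5, 0.5]    # reconstructs x just as exactly, from distant atoms

for name, c in [("local", local), ("far", far)]:
    print(name, reconstruct(atoms, c), l1(c), weighted_l1(x, atoms, c))
```

Both coefficient vectors reconstruct $x$ exactly and have $\ell_1$ norm 1. Only the weighted penalty distinguishes them: 0.25 for the local one versus 20.25 for the far one.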
Translated: 2021-04-29 12:37:00; published: 2021-04-28
# Fair-Capacitated Clustering

Fair-Capacitated Clustering ( http://arxiv.org/abs/2104.12116v2 )

License: CC BY 4.0
Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where each cluster requires a fair representation of the groups defined by protected attributes such as gender or race. Moreover, in many applications, clusters are only useful to the end user if their cardinalities are balanced. Our motivation comes from the education domain, where studies indicate that students might learn better in diverse groups, and groups of similar cardinality are more practical, e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem, which partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fairness constraint, and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements in the merging step, while the partitioning-based approach alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining good clustering quality.
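The first step can be illustrated with fairlet decomposition in its simplest case. The sketch assumes:

- a single binary protected attribute;
- equally sized groups;
- a required balance of exactly 1:1;
- points on a line.

Each fairlet pairs one member of each group. For 1D points, matching the two sorted groups in order minimizes the total within-pair distance. The capacitated second step (hierarchical merging or knapsack-based assignment) is not shown, and the paper's fairlet computation may differ. The data and names are hypothetical.

```python
def fairlets_1d(group_a, group_b):
    """Pair each point of group A with one point of group B (balance 1:1).

    Sorting both groups and matching them in order minimises the total
    absolute distance inside the pairs for points on a line.
    """
    if len(group_a) != len(group_b):
        raise ValueError("this sketch assumes equally sized groups")
    return list(zip(sorted(group_a), sorted(group_b)))

# Hypothetical 1D student features (e.g. a prior-grade score) split by a protected attribute.
group_a = [3.1, 7.9, 5.0]
group_b = [8.2, 2.8, 5.4]
print(fairlets_1d(group_a, group_b))
```

Every cluster built as a union of such fairlets is automatically 1:1 balanced. A cluster of k fairlets also has exactly 2k members, so cardinality limits can be stated in units of fairlets.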
Translated: 2021-04-29 11:23:40; published: 2021-04-28
# A novel segmentation dataset for signatures on bank checks

A novel segmentation dataset for signatures on bank checks ( http://arxiv.org/abs/2104.12203v2 )

License: CC0 1.0
The dataset presented provides high-resolution images of real, filled-out bank checks containing various complex backgrounds, with handwritten text and signatures in the respective fields, along with both pixel-level and patch-level segmentation masks for the signatures on the checks. The images of bank checks were obtained from different sources, including other publicly available check datasets, publicly available images on the internet, and scans and images of real checks. Using the GIMP graphics software, pixel-level segmentation masks for the signatures on these checks were manually generated as binary images. An automated script was then used to generate patch-level masks. The dataset was created to train and test networks for extracting signatures from bank checks and other similar documents with very complex backgrounds.
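The abstract does not describe the script that derives patch-level masks from pixel-level masks. One plausible rule, shown here as a hypothetical sketch:

1. Split the binary mask into non-overlapping square patches.
2. Mark a patch as "signature" when the fraction of signature pixels in it reaches a threshold.

The patch size, the threshold and the function name are assumptions.

```python
def patch_mask(pixel_mask, patch, min_fraction):
    """Downsample a binary pixel mask (list of rows of 0/1) to a patch-level mask.

    Edge patches that do not fill a full patch x patch window are evaluated on
    the pixels they do contain.
    """
    rows, cols = len(pixel_mask), len(pixel_mask[0])
    result = []
    for r0 in range(0, rows, patch):
        row_out = []
        for c0 in range(0, cols, patch):
            cells = [pixel_mask[r][c]
                     for r in range(r0, min(r0 + patch, rows))
                     for c in range(c0, min(c0 + patch, cols))]
            row_out.append(1 if sum(cells) / len(cells) >= min_fraction else 0)
        result.append(row_out)
    return result

# Hypothetical 4x4 pixel mask of a signature stroke.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
print(patch_mask(mask, patch=2, min_fraction=0.25))
```

The threshold trades recall for precision. A low value marks any patch touched by ink, while a higher one keeps only patches dominated by the signature.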
Translated: 2021-04-29 11:10:46; published: 2021-04-28
# Dynamic Degradation for Image Restoration and Fusion

Dynamic Degradation for Image Restoration and Fusion ( http://arxiv.org/abs/2104.12347v2 )

License: CC0 1.0
Deep-learning-based image restoration and fusion methods have achieved remarkable results. However, existing restoration and fusion methods have paid little attention to the robustness problem caused by dynamic degradation. In this paper, we propose a novel dynamic image restoration and fusion neural network, termed DDRF-Net, which addresses two problems: static restoration and fusion, and dynamic degradation. To overcome the static fusion of existing methods, dynamic convolution is introduced to learn dynamic restoration and fusion weights. In addition, a dynamic degradation kernel is proposed to improve the robustness of image restoration and fusion. Our network framework effectively combines image degradation with image fusion tasks: the image restoration loss provides more detailed information for image fusion, and the image fusion loss in turn optimizes image restoration. Therefore, the stumbling blocks of deep learning in image fusion, e.g., static fusion weights and specifically designed network architectures, are greatly mitigated. Extensive experiments show that our method outperforms state-of-the-art methods.
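Dynamic convolution, as commonly formulated, works as follows:

- It keeps K candidate kernels.
- It mixes them with input-dependent attention weights, given by a softmax over logits predicted from the input.

The effective kernel therefore changes per input, instead of being fixed after training. Below is a minimal 1D sketch that assumes the attention logits are already given; how DDRF-Net predicts them is not described in the abstract. Kernels, logits and function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mix_kernels(kernels, logits):
    """Effective kernel = sum_k softmax(logits)[k] * kernels[k]."""
    weights = softmax(logits)
    return [sum(w * k[i] for w, k in zip(weights, kernels))
            for i in range(len(kernels[0]))]

def conv1d_valid(signal, kernel):
    """'Valid' 1D cross-correlation, which is what deep-learning 'convolution' computes."""
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(len(signal) - n + 1)]

kernels = [[1.0, 0.0, -1.0],   # edge-like kernel
           [1/3, 1/3, 1/3]]    # smoothing kernel
signal = [0.0, 1.0, 2.0, 3.0, 4.0]

# Hypothetical logits; in a dynamic-convolution layer they would be predicted from the input.
for logits in ([0.0, 0.0], [50.0, 0.0]):
    kernel = mix_kernels(kernels, logits)
    print(logits, [round(v, 3) for v in kernel], conv1d_valid(signal, kernel))
```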
Translated: 2021-04-29 11:07:40; published: 2021-04-28
# Unified Spatio-Temporal Modeling for Traffic Forecasting using Graph Neural Network

Unified Spatio-Temporal Modeling for Traffic Forecasting using Graph Neural Network ( http://arxiv.org/abs/2104.12518v2 )

License: CC BY 4.0
Research on deep learning models for forecasting traffic intensities has gained great attention in recent years due to their capability to capture the complex spatio-temporal relationships within traffic data. However, most state-of-the-art approaches design spatial-only (e.g., Graph Neural Networks) and temporal-only (e.g., Recurrent Neural Networks) modules to extract spatial and temporal features separately. We argue that such factorized modules are less effective at extracting the complex spatio-temporal relationships. Besides, most existing works predict the traffic intensity of a particular time interval based only on the traffic data of the previous hour of that day, thereby ignoring repetitive daily/weekly patterns that may exist beyond that last hour of data. Therefore, we propose a Unified Spatio-Temporal Graph Convolution Network (USTGCN) for traffic forecasting that performs both spatial and temporal aggregation through direct information propagation across different timestamp nodes with the help of spectral graph convolution on a spatio-temporal graph. Furthermore, it captures historical daily patterns from previous days as well as current-day patterns from current-day traffic data. Finally, we validate our work's effectiveness through experimental analysis, which shows that our model USTGCN can outperform state-of-the-art methods on three popular benchmark datasets from the Performance Measurement System (PeMS). Moreover, training time is reduced significantly with our proposed USTGCN model.
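Here is a hedged sketch of what "aggregation across different timestamp nodes" can look like:

1. Build one graph whose nodes are (sensor, time) pairs.
2. Connect (u, t') to (v, t) when u and v are identical or adjacent in the road graph and t' ≤ t.
3. Average features over those neighbours.

This construction is an illustrative assumption. USTGCN applies spectral graph convolution with learned weights, which the plain mean below does not include. The road graph and speeds are hypothetical.

```python
def st_neighbors(adj, n_steps):
    """Map each (node, t) to the (node, t') pairs it aggregates from.

    adj: dict node -> set of spatial neighbours (undirected road graph).
    A node at time t receives from itself and its spatial neighbours at
    every time step t' <= t inside the window.
    """
    nbrs = {}
    for v in adj:
        spatial = adj[v] | {v}
        for t in range(n_steps):
            nbrs[(v, t)] = [(u, tp) for u in sorted(spatial) for tp in range(t + 1)]
    return nbrs

def mean_aggregate(features, nbrs):
    """One unweighted aggregation step over the spatio-temporal graph."""
    return {key: sum(features[n] for n in ns) / len(ns) for key, ns in nbrs.items()}

# Hypothetical 3-sensor road segment A - B - C, window of 2 time steps (speeds in km/h).
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
speed = {("A", 0): 60.0, ("B", 0): 50.0, ("C", 0): 40.0,
         ("A", 1): 58.0, ("B", 1): 30.0, ("C", 1): 42.0}
agg = mean_aggregate(speed, st_neighbors(adj, 2))
print(agg[("A", 0)], round(agg[("B", 1)], 2))
```

A single aggregation step already mixes spatial and temporal information, rather than handling them in separate spatial and temporal modules.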
Translated: 2021-04-29 11:06:53; published: 2021-04-28
# An Attention and Prediction Guided Visual System for Small Target Motion Detection in Complex Natural Environments

An Attention and Prediction Guided Visual System for Small Target Motion Detection in Complex Natural Environments ( http://arxiv.org/abs/2104.13018v2 )

License: CC BY 4.0
Small target motion detection within complex natural environments is an extremely challenging task for autonomous robots. Surprisingly, the visual systems of insects have evolved to be highly efficient at detecting mates and tracking prey, even though targets may be as small as a few pixels in the visual field. This excellent sensitivity to small target motion relies on a class of specialized neurons called small target motion detectors (STMDs). However, existing STMD-based models are heavily dependent on visual contrast and perform poorly in complex natural environments, where small targets always exhibit extremely low contrast against neighboring backgrounds. In this paper, we propose an attention and prediction guided visual system to overcome this limitation. The proposed visual system consists of three subsystems: an attention module, an STMD-based neural network, and a prediction module. The attention module searches for potential small targets in the predicted areas of the input image and enhances their contrast against the complex background. The STMD-based neural network receives the contrast-enhanced image and discriminates small moving targets from background false positives. The prediction module foresees the future positions of the detected targets and generates a prediction map for the attention module. The three subsystems are connected in a recurrent architecture, allowing sequentially processed information to activate specific areas for small target detection. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness and superiority of the proposed visual system for detecting small, low-contrast moving targets in complex natural environments.
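The prediction step can be illustrated under simple assumptions:

- The prediction module extrapolates each detected target with a constant-velocity model.
- It marks a disc of radius r around the predicted position in a binary prediction map.
- The attention module then restricts its search to that map.

The constant-velocity model, the radius and the function names are hypothetical. The paper's prediction module may be more elaborate.

```python
def predict_next(track):
    """Constant-velocity extrapolation from the last two (x, y) detections."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

def prediction_map(height, width, centers, radius):
    """Binary map with 1s inside a disc of the given radius around each center."""
    grid = [[0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    grid[y][x] = 1
    return grid

track = [(2, 3), (4, 4)]     # hypothetical detections in frames t-1 and t
center = predict_next(track)  # expected position in frame t+1
pmap = prediction_map(10, 10, [center], radius=1)
for row in pmap:
    print("".join(str(v) for v in row))
```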
Translated: 2021-04-29 11:05:56; published: 2021-04-28
# Adapting ImageNet-scale models to complex distribution shifts with self-learning

Adapting ImageNet-scale models to complex distribution shifts with self-learning ( http://arxiv.org/abs/2104.12928v2 )

License: see link
While self-learning methods are an important component in many recent domain adaptation techniques, they have not yet been comprehensively evaluated on the ImageNet-scale datasets common in robustness research. In extensive experiments on ResNet and EfficientNet models, we find that three components are crucial for increasing performance with self-learning: (i) using short update times between the teacher and the student network, (ii) fine-tuning only a few affine parameters distributed across the network, and (iii) leveraging methods from robust classification to counteract the effect of label noise. We use these insights to obtain drastically improved state-of-the-art results on ImageNet-C (22.0% mCE), ImageNet-R (17.4% error) and ImageNet-A (14.8% error). Our techniques yield further improvements in combination with previously proposed robustification methods. Self-learning is able to reduce the top-1 error to a point where no substantial further progress can be expected. We therefore re-purpose the dataset from the Visual Domain Adaptation Challenge 2019 and use a subset of it as a new robustness benchmark (ImageNet-D), which proves to be a more challenging dataset for all current state-of-the-art models (58.2% error), to guide future research efforts at the intersection of robustness and domain adaptation on ImageNet scale.
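Point (iii) replaces plain cross-entropy on the model's own pseudo-labels with a loss that is less sensitive to wrong labels. One such robust loss is the generalized cross-entropy (GCE) of Zhang and Sabuncu (2018), $L_q = (1 - p_{\hat y}^{\,q}) / q$, where $\hat y$ is the hard pseudo-label. The abstract does not say which robust loss or settings the authors use, so treat this as an illustrative sketch. The value of q and the function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def hard_pseudo_label_losses(logits, q=0.7):
    """Return (cross-entropy, GCE) for the model's own argmax prediction.

    GCE: (1 - p^q) / q. As q -> 0 it approaches cross-entropy (-log p); q = 1
    gives 1 - p, which is bounded and so limits the pull of wrong pseudo-labels.
    """
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)  # hard pseudo-label
    p = probs[label]
    return -math.log(p), (1.0 - p ** q) / q

# Hypothetical logits for an unlabeled test image under distribution shift.
for q in (1e-6, 0.7, 1.0):
    ce, gce = hard_pseudo_label_losses([0.3, 0.1, -0.2], q=q)
    print(q, round(ce, 4), round(gce, 4))
```

Unlike cross-entropy, GCE stays bounded for low-confidence predictions, which is the behaviour wanted when many pseudo-labels are wrong.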
Translated: 2021-04-29 10:38:19; published: 2021-04-28
# Why AI is Harder Than We Think

Why AI is Harder Than We Think ( http://arxiv.org/abs/2104.12871v2 )

License: see link
Since its beginning in the 1950s, the field of artificial intelligence has cycled several times between periods of optimistic predictions and massive investment ("AI spring") and periods of disappointment, loss of confidence, and reduced funding ("AI winter"). Even with today's seemingly fast pace of AI breakthroughs, the development of long-promised technologies such as self-driving cars, housekeeping robots, and conversational companions has turned out to be much harder than many people expected. One reason for these repeating cycles is our limited understanding of the nature and complexity of intelligence itself. In this paper I describe four fallacies in common assumptions made by AI researchers, which can lead to overconfident predictions about the field. I conclude by discussing the open questions spurred by these fallacies, including the age-old challenge of imbuing machines with humanlike common sense.
Translated: 2021-04-29 10:37:49; published: 2021-04-28
# Practical Wide-Angle Portraits Correction with Deep Structured Models

Practical Wide-Angle Portraits Correction with Deep Structured Models ( http://arxiv.org/abs/2104.12464v3 )

License: see link
Wide-angle portraits often enjoy expanded views. However, they contain perspective distortions, especially noticeable when capturing group portrait photos, where the background is skewed and faces are stretched. This paper introduces the first deep learning based approach to remove such artifacts from freely-shot photos. Specifically, given a wide-angle portrait as input, we build a cascaded network consisting of a LineNet, a ShapeNet, and a transition module (TM), which corrects perspective distortions on the background, adapts to the stereographic projection on facial regions, and achieves smooth transitions between these two projections, accordingly. To train our network, we build the first perspective portrait dataset with a large diversity in identities, scenes and camera modules. For the quantitative evaluation, we introduce two novel metrics, line consistency and face congruence. Compared to the previous state-of-the-art approach, our method does not require camera distortion parameters. We demonstrate that our approach significantly outperforms the previous state-of-the-art approach both qualitatively and quantitatively.
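The two projections the network reconciles can be written in terms of the angle θ between a ray and the optical axis:

- a perspective (rectilinear) camera maps the ray to image radius $r_p = f\tan\theta$;
- the stereographic projection maps it to $r_s = 2f\tan(\theta/2)$.

The stereographic projection is conformal, so it preserves local shapes such as faces. The sketch below converts a perspective radius to its stereographic counterpart. It assumes a known focal length f in pixels, which is needed only for this illustration; the value of f and the function name are hypothetical.

```python
import math

def perspective_to_stereographic(r_p, f):
    """Map a radial distance in a perspective image to the stereographic projection."""
    theta = math.atan2(r_p, f)  # viewing angle of that pixel
    return 2.0 * f * math.tan(theta / 2.0)

f = 1000.0  # hypothetical focal length in pixels
for r_p in (0.0, 200.0, 1000.0, 2000.0):
    print(r_p, round(perspective_to_stereographic(r_p, f), 1))
```

Near the image center the two radii nearly coincide, while toward the edges the stereographic radius grows much more slowly. This is why stretched faces at the borders of a wide-angle shot look natural again after local stereographic correction.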
Translated: 2021-04-29 10:37:35; published: 2021-04-28