# Effect of Emitters on Quantum State Transfer in Coupled Cavity Arrays

Effect of Emitters on Quantum State Transfer in Coupled Cavity Arrays ( http://arxiv.org/abs/2112.05740v2 )

Eli Baum, Amelia Broman, Trevor Clarke, Natanael C. Costa, Jack Mucciaccio, Alexander Yue, Yuxi Zhang, Victoria Norman, Jesse Patton, Marina Radulaski, Richard T. Scalettar

Over the last decade, conditions for perfect state transfer in quantum spin chains have been discovered, and their experimental realizations addressed. In this paper, we consider an extension of such studies to quantum state transfer in a coupled cavity array including the effects of atoms in the cavities which can absorb and emit photons as they propagate down the array. Our model is equivalent to previously examined spin chains in the one-excitation sector and in the absence of emitters. We introduce a Monte Carlo approach to the inverse eigenvalue problem which allows the determination of the inter-cavity and cavity-emitter couplings resulting in near-perfect quantum state transfer fidelity, and examine the time dependent polariton wave function through exact diagonalization of the resulting Tavis-Cummings-Hubbard Hamiltonian. The effect of inhomogeneous emitter locations is also evaluated.
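
The equivalence to a spin chain in the one-excitation sector can be made concrete with a small numerical check. The sketch below is not the paper's Monte Carlo method: it assumes the well-known engineered couplings J_n = sqrt(n(N-n))/2 from the spin-chain literature, omits emitters entirely, and evolves a single excitation with a plain Taylor series. All function names are illustrative.

```python
import math

def coupling(n, N):
    """Assumed engineered coupling between sites n and n+1 (1-indexed) of an N-site chain."""
    return 0.5 * math.sqrt(n * (N - n))

def hamiltonian(N):
    """One-excitation hopping matrix: tridiagonal, couplings on the off-diagonals."""
    H = [[0.0] * N for _ in range(N)]
    for n in range(1, N):
        H[n - 1][n] = H[n][n - 1] = coupling(n, N)
    return H

def evolve(H, psi0, t, terms=80):
    """Apply exp(-i H t) to psi0 by summing the Taylor series term by term."""
    N = len(psi0)
    psi = list(psi0)
    term = list(psi0)
    for k in range(1, terms):
        # term_k = (-i t / k) * H * term_{k-1}
        term = [(-1j * t / k) * sum(H[r][c] * term[c] for c in range(N))
                for r in range(N)]
        psi = [p + q for p, q in zip(psi, term)]
    return psi

def transfer_fidelity(N, t):
    """Probability that an excitation starting on site 1 is found on site N at time t."""
    psi0 = [0j] * N
    psi0[0] = 1.0 + 0j
    psi = evolve(hamiltonian(N), psi0, t)
    return abs(psi[-1]) ** 2

if __name__ == "__main__":
    for N in (2, 3, 5, 8):
        print(N, round(transfer_fidelity(N, math.pi), 6))
```

With these couplings the hopping matrix coincides with a spin-(N-1)/2 angular momentum operator, which is why the excitation arrives at the far end at t = π. Adding emitters changes the spectrum, and that is the case the abstract addresses with an inverse eigenvalue search.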
# A Comprehensive Review and Evaluation on Text Predictive and Entertainment Systems

A comprehensive review and evaluation on text predictive and entertainment systems ( http://arxiv.org/abs/2201.10623v1 )

Hozan K. Hamarashid, Soran A. Saeed, Tarik A. Rashid

One of the most important ways to communicate and interact with a system is to have it predict the words most likely to follow the letters or words already typed. Word prediction helps people with disabilities who can type or enter text only at a limited, slow speed. It also benefits people with dyslexia and people who struggle with spelling. As an input technology, next-word suggestion speeds up typing, for example on smartphones: when a user types a word, the system suggests candidate next words, from which the user picks the one needed. It can also be used for entertainment as a game, for example one in which the player must reach a target word within 10 prediction attempts. In general, these systems depend on a text corpus supplied to the system to make predictions. Because writing every single word is time-consuming, it is vitally important to reduce the time and effort of text input by offering the most probable words for the user to select, which is what next-word prediction systems do. The literature contains several techniques that build next-word prediction systems using different approaches. This paper surveys miscellaneous techniques for next-word prediction systems and discusses how such systems are evaluated. It then identifies a modal technique to use for next-word prediction, judged by ease of implementation and quality of results.
# Simulating Molecules Using the VQE Algorithm on Qiskit

Simulating molecules using the VQE algorithm on Qiskit ( http://arxiv.org/abs/2201.04216v1 )

Alan Anaya, Francisco Delgado

Feynman's idea of using quantum systems to simulate other quantum systems gave rise to quantum simulation. While current quantum computers are still prone to decoherence and rely on error correction, the development of hybrid algorithms that employ both quantum and classical computation makes it possible to perform quantum simulations. Among these algorithms, the Variational Quantum Eigensolver (VQE) algorithm has made it possible to explore the electronic structure of simple atoms and molecules by exploiting the Rayleigh-Ritz variational principle. In this work we provide an implementation of the VQE algorithm for finding the ground state energy of the hydrogen molecule using the Qiskit library for Python.
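
The Rayleigh-Ritz principle behind VQE says that the energy expectation of any normalized trial state is an upper bound on the ground state energy, so a classical optimizer can search over the trial state's parameters. The sketch below illustrates only that principle in plain Python. It does not use Qiskit, and the 2x2 Hamiltonian is a made-up toy, not the hydrogen molecule Hamiltonian from the paper.

```python
import math

# Hypothetical 2x2 real symmetric Hamiltonian [[A, B], [B, D]] (illustrative numbers only).
A, B, D = -1.0, 0.4, 0.5

def energy(theta):
    """Rayleigh quotient <psi|H|psi> for the normalized trial state (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return A * c * c + 2.0 * B * c * s + D * s * s

def minimize_energy(steps=20000):
    """Classical outer loop: scan the single parameter and keep the lowest energy."""
    best_theta, best_e = 0.0, energy(0.0)
    for k in range(1, steps):
        theta = math.pi * k / steps
        e = energy(theta)
        if e < best_e:
            best_theta, best_e = theta, e
    return best_theta, best_e

def exact_ground_energy():
    """Lowest eigenvalue of the 2x2 matrix in closed form."""
    return 0.5 * (A + D) - math.sqrt((0.5 * (A - D)) ** 2 + B * B)

if __name__ == "__main__":
    theta, e_var = minimize_energy()
    print("variational:", e_var, "exact:", exact_ground_energy())
```

In an actual VQE run the `energy` function would be estimated from measurements on a parameterized quantum circuit, and the grid scan would be replaced by a classical optimizer. The upper-bound property is what makes the minimum meaningful.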
# Measuring NISQ Gate-Based Qubit Stability Using a 1+1 Field Theory and Cycle Benchmarking

Measuring NISQ Gate-Based Qubit Stability Using a 1+1 Field Theory and Cycle Benchmarking ( http://arxiv.org/abs/2201.02899v1 )

Kubra Yeter-Aydeniz, Zachary Parks, Aadithya Nair, Erik Gustafson, Alexander F. Kemper, Raphael C. Pooser, Yannick Meurice, Patrick Dreher

Some of the most problematic issues that limit the implementation of applications on Noisy Intermediate Scale Quantum (NISQ) machines are the adverse impacts of both incoherent and coherent errors. We conducted an in-depth study of coherent errors on a quantum hardware platform using a transverse field Ising model Hamiltonian as a sample user application. We report here on the results from these computations using several error mitigation protocols that profile these errors and provide an indication of the hardware qubit stability. Through a detailed set of measurements we identify inter-day and intra-day qubit calibration drift and the impacts of quantum circuit placement on groups of qubits in different physical locations on the processor. This paper also discusses how these measurements can provide a better understanding of these types of errors and how they may improve efforts to validate the accuracy of quantum computations.
# Compilation and Scaling Strategies for a Silicon Quantum Processor with Sparse Two-Dimensional Connectivity

Compilation and scaling strategies for a silicon quantum processor with sparse two-dimensional connectivity ( http://arxiv.org/abs/2201.02877v1 )

O. Crawford, J. R. Cruise, N. Mertig and M. F. Gonzalez-Zalba

Inspired by the challenge of scaling up existing silicon quantum hardware, we investigate compilation strategies for sparsely-connected 2d qubit arrangements and propose a spin-qubit architecture with minimal compilation overhead. Our architecture is based on silicon nanowire split-gate transistors which can form finite 1d chains of spin-qubits and allow the execution of two-qubit operations such as Swap gates among neighbors. Adding to this, we describe a novel silicon junction which can couple up to four nanowires into 2d arrangements via spin shuttling and Swap operations. Given these hardware elements, we propose a modular sparse 2d spin-qubit architecture with unit cells consisting of diagonally-oriented squares with nanowires along the edges and junctions on the corners. We show that this architecture allows for compilation strategies which outperform the best-in-class compilation strategy for 1d chains, not only asymptotically, but also down to the minimal structure of a single square. The proposed architecture exhibits favorable scaling properties which allow for balancing the trade-off between compilation overhead and co-location of classical control electronics within each square by adjusting the length of the nanowires. An appealing feature of the proposed architecture is its manufacturability using complementary-metal-oxide-semiconductor (CMOS) fabrication processes. Finally, we note that our compilation strategies, while being inspired by spin-qubits, are equally valid for any other quantum processor with sparse 2d connectivity.
# Chiral Anomaly in (1+1) Dimensions Revisited: Complementary Kinetic Perspective and Universality

Chiral anomaly in (1+1) dimensions revisited: complementary kinetic perspective and universality ( http://arxiv.org/abs/2201.02844v1 )

License: see the linked page
We reinvestigate the classic example of chiral anomaly in (1+1) dimensional spacetime. By reviewing the derivation of charge conservation with the semiclassical Boltzmann equation, we argue that chiral anomalies could emerge in (1+1) dimensions without Berry curvature corrections to the kinetic theory. The pivotal step depends only on the asymptotic behavior of the distribution function of the quasiparticle, and thus its dispersion relation, in the limit of $|\mathbf p|\to\pm\infty$ rather than the detailed functional form of the dispersion. We address two subjects motivated by this observation. One concerns reformulating (1+1) dimensional chiral anomaly using kinetic theory with the current algebra approach and the gradient expansion of the Dirac Lagrangian, adding a complementary perspective to the existing approaches. The other demonstrates the universality of chiral anomaly across various quasiparticle dispersions. For two-band models linear in the temporal derivative, with Fujikawa's method we show it is sufficient to have a chirality-odd strictly monotonic dispersion in order to exhibit chiral anomaly.
# Theoretical Calculation of the Quadratic Zeeman Shift Coefficient of the 3P0 Clock State for the Strontium Optical Lattice Clock

Theoretical Calculation of the Quadratic Zeeman Shift Coefficient of the 3P0 clock state for Strontium Optical Lattice Clock ( http://arxiv.org/abs/2201.02843v1 )

License: see the linked page
The quadratic Zeeman shift coefficient of the 3P0 clock state of strontium is determined in theory and experiment. In theory, we derive the expression for the quadratic Zeeman shift of the 3P0 clock state of 88Sr and 87Sr in the weak-magnetic-field approximation. The quadratic Zeeman shift coefficients were calculated using multi-configuration Dirac-Hartree-Fock theory. To check the calculated results, the quadratic Zeeman shift coefficient of the 3P0,F=9/2,MF=+/-9/2 clock state was measured in our 87Sr optical lattice clock. The calculated results C2=-23.38(5) MHz/T^2 for 88Sr and the 3P0,F=9/2,MF=+/-9/2 clock state for 87Sr agree well with other experimental and theoretical values, especially the most accurate recent measurement. Because the 1S0,F=9/2,MF=+/-5/2-3P0,F=9/2,MF=+/-3/2 transitions have been used as another clock transition owing to their lower sensitivity to magnetic field noise, we also calculated the quadratic Zeeman shift coefficients for the other magnetic states.
# A Theoretical Perspective on Molecular Polaritonics

A theoretical perspective on molecular polaritonics ( http://arxiv.org/abs/2201.02827v1 )

License: see the linked page
In the last decade, much theoretical research has focused on studying the strong coupling between organic molecules (or quantum emitters, in general) and light modes. The description and prediction of polaritonic phenomena emerging in this light-matter interaction regime have proven to be difficult tasks. The challenge originates from the enormous number of degrees of freedom that need to be taken into account, both in the organic molecules and in their photonic environment. On the one hand, the accurate treatment of the vibrational spectrum of the former is key, and simplified quantum models are not valid in many cases. On the other hand, most photonic setups have complex geometric and material characteristics, with the result that photon fields corresponding to more than just a single electromagnetic mode contribute to the light-matter interaction in these platforms. Moreover, loss and dissipation, in the form of absorption or radiation, must also be included in the theoretical description of polaritons. Here, we review and offer our own perspective on some of the work recently done in the modelling of interacting molecular and optical states with increasing complexity.
# Quantum Computing: Fundamentals, Trends and Perspectives for Chemical and Biochemical Engineers

Quantum Computing: Fundamentals, Trends and Perspectives for Chemical and Biochemical Engineers ( http://arxiv.org/abs/2201.02823v1 )

License: see the linked page
We use the benefits and components of classical computers every day. However, there are many types of problems whose computational complexity grows with problem size beyond anything classical computers will ever be able to handle. Quantum computing (QC) is a computation model that uses quantum physical properties to solve such problems. QC is at the early stage of large-scale adoption in various industry domains to take advantage of the algorithmic speed-ups it has to offer. It can be applied in a variety of areas, such as computer science, mathematics, chemical and biochemical engineering, and the financial industry. The main goal of this paper is to give an overview to chemical and biochemical researchers and engineers who may not be familiar with quantum computation. Thus, the paper begins by explaining the fundamental concepts of QC. The second contribution of this publication addresses the fact that the chemical engineering literature still lacks a comprehensive review of the recent advances in QC. Therefore, this article reviews and summarizes the state of the art to gain insight into how quantum computation can benefit and optimize chemical engineering issues. A bibliographic analysis covers the comprehensive literature in QC and analyzes quantum computing research in chemical engineering on various publication topics, using Clarivate Analytics covering the years 1990 to 2020. After the bibliographic analysis, relevant applications of QC in chemical and biochemical engineering are highlighted and a conclusion offers an outlook on future directions within the field.
# Trade-offs Between Membership Privacy and Adversarially Robust Learning

Trade-offs between membership privacy & adversarially robust learning ( http://arxiv.org/abs/2006.04622v2 )

License: see the linked page
Historically, machine learning methods have not been designed with security in mind. In turn, this has given rise to adversarial examples, carefully perturbed input samples aimed to mislead detection at test time, which have been applied to attack spam and malware classification, and more recently to attack image classification. Consequently, an abundance of research has been devoted to designing machine learning methods that are robust to adversarial examples. Unfortunately, there are desiderata besides robustness that a secure and safe machine learning model must satisfy, such as fairness and privacy. Recent work by Song et al. (2019) has shown, empirically, that there exists a trade-off between robust and private machine learning models. Models designed to be robust to adversarial examples often overfit on training data to a larger extent than standard (non-robust) models. If a dataset contains private information, then any statistical test that separates training and test data by observing a model's outputs can represent a privacy breach, and if a model overfits on training data, these statistical tests become easier. In this work, we identify settings where standard models will overfit to a larger extent in comparison to robust models, and as empirically observed in previous works, settings where the opposite behavior occurs. Thus, it is not necessarily the case that privacy must be sacrificed to achieve robustness. The degree of overfitting naturally depends on the amount of data available for training. We go on to characterize how the training set size factors into the privacy risks exposed by training a robust model on a simple Gaussian data task, and show empirically that our findings hold on image classification benchmark datasets, such as CIFAR-10 and CIFAR-100.
# Individual Privacy Accounting via a Renyi Filter

Individual Privacy Accounting via a Renyi Filter ( http://arxiv.org/abs/2008.11193v4 )

License: see the linked page
We consider a sequential setting in which a single dataset of individuals is used to perform adaptively-chosen analyses, while ensuring that the differential privacy loss of each participant does not exceed a pre-specified privacy budget. The standard approach to this problem relies on bounding a worst-case estimate of the privacy loss over all individuals and all possible values of their data, for every single analysis. Yet, in many scenarios this approach is overly conservative, especially for "typical" data points which incur little privacy loss by participation in most of the analyses. In this work, we give a method for tighter privacy loss accounting based on the value of a personalized privacy loss estimate for each individual in each analysis. To implement the accounting method we design a filter for Rényi differential privacy. A filter is a tool that ensures that the privacy parameter of a composed sequence of algorithms with adaptively-chosen privacy parameters does not exceed a pre-specified budget. Our filter is simpler and tighter than the known filter for $(\epsilon,\delta)$-differential privacy by Rogers et al. We apply our results to the analysis of noisy gradient descent and show that personalized accounting can be practical, easy to implement, and can only make the privacy-utility tradeoff tighter.
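
The filter idea described above can be sketched in a few lines: keep a running total of one individual's privacy loss and stop using their data once the next analysis would push the total past the budget. This is a minimal illustration, not the paper's construction. The class and function names are hypothetical, and the per-step loss assumes the standard Rényi divergence between two Gaussians with equal variance, with the individual's gradient norm playing the role of sensitivity.

```python
class RenyiFilter:
    """Tracks one individual's cumulative Renyi privacy loss against a fixed budget."""

    def __init__(self, budget):
        self.budget = budget
        self.spent = 0.0

    def try_spend(self, loss):
        """Admit the next analysis only if the running total stays within the budget."""
        if self.spent + loss > self.budget:
            return False
        self.spent += loss
        return True

def gaussian_renyi_loss(alpha, grad_norm, sigma):
    """Renyi divergence of order alpha between N(0, sigma^2 I) and N(g, sigma^2 I)."""
    return alpha * grad_norm ** 2 / (2.0 * sigma ** 2)

def steps_admitted(grad_norms, alpha, sigma, budget):
    """Count consecutive steps one individual participates in before the filter halts."""
    f = RenyiFilter(budget)
    count = 0
    for g in grad_norms:
        if not f.try_spend(gaussian_renyi_loss(alpha, g, sigma)):
            break  # assumption: once rejected, the individual is dropped for good
        count += 1
    return count

if __name__ == "__main__":
    # An individual with small gradients stays in for more steps than one with large gradients.
    print(steps_admitted([0.5] * 10, alpha=2.0, sigma=1.0, budget=1.0))
    print(steps_admitted([1.0] * 10, alpha=2.0, sigma=1.0, budget=1.0))
```

Worst-case accounting would charge every individual the loss of the largest possible gradient at every step. Charging each individual their own loss is what lets "typical" points participate longer under the same budget.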
# Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes

Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes ( http://arxiv.org/abs/2010.07452v2 )

License: see the linked page
In the theory of Partially Observed Markov Decision Processes (POMDPs), existence of optimal policies has in general been established via converting the original partially observed stochastic control problem to a fully observed one on the belief space, leading to a belief-MDP. However, computing an optimal policy for this fully observed model, and so for the original POMDP, using classical dynamic or linear programming methods is challenging even if the original system has finite state and action spaces, since the state space of the fully observed belief-MDP model is always uncountable. Furthermore, there exist very few rigorous value function approximation and optimal policy approximation results, as the regularity conditions needed often require a tedious study involving the spaces of probability measures leading to properties such as Feller continuity. In this paper, we study a planning problem for POMDPs where the system dynamics and measurement channel model are assumed to be known. We construct an approximate belief model by discretizing the belief space using only finite window information variables. We then find optimal policies for the approximate model and we rigorously establish near optimality of the constructed finite window control policies in POMDPs under mild non-linear filter stability conditions and the assumption that the measurement and action sets are finite (and the state space is real vector valued). We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound, where the rate of convergence is exponential under explicit and testable exponential filter stability conditions. While there exist many experimental results and few rigorous asymptotic convergence results, an explicit rate of convergence result is new in the literature, to our knowledge.
# A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images

A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images ( http://arxiv.org/abs/2010.10563v2 )

License: see the linked page
Every year physicians face an increasing demand for image-based diagnosis from patients, a problem that can be addressed with recent artificial intelligence methods. In this context, we survey works in the area of automatic report generation from medical images, with emphasis on methods using deep neural networks, with respect to: (1) Datasets, (2) Architecture Design, (3) Explainability and (4) Evaluation Metrics. Our survey identifies interesting developments, but also remaining challenges. Among them, the current evaluation of generated reports is especially weak, since it mostly relies on traditional Natural Language Processing (NLP) metrics, which do not accurately capture medical correctness.
# BIGPrior: Towards Decoupling Learned Prior Hallucination and Data Fidelity in Image Restoration

BIGPrior: Towards Decoupling Learned Prior Hallucination and Data Fidelity in Image Restoration ( http://arxiv.org/abs/2011.01406v3 )

License: see the linked page
Classic image-restoration algorithms use a variety of priors, either implicitly or explicitly. Their priors are hand-designed and their corresponding weights are heuristically assigned. Hence, deep learning methods often produce superior image restoration quality. Deep networks are, however, capable of inducing strong and hardly predictable hallucinations. Networks implicitly learn to be jointly faithful to the observed data while learning an image prior; and the separation of original data and hallucinated data downstream is then not possible. This limits their wide-spread adoption in image restoration. Furthermore, it is often the hallucinated part that is victim to degradation-model overfitting. We present an approach with decoupled network-prior based hallucination and data fidelity terms. We refer to our framework as the Bayesian Integration of a Generative Prior (BIGPrior). Our method is rooted in a Bayesian framework and tightly connected to classic restoration methods. In fact, it can be viewed as a generalization of a large family of classic restoration algorithms. We use network inversion to extract image prior information from a generative network. We show that, on image colorization, inpainting and denoising, our framework consistently improves the inversion results. Our method, though partly reliant on the quality of the generative network inversion, is competitive with state-of-the-art supervised and task-specific restoration methods. It also provides an additional metric that sets forth the degree of prior reliance per pixel relative to data fidelity.
# Knowledge Tracing: A Survey

Knowledge Tracing: A Survey ( http://arxiv.org/abs/2201.06953v1 )

License: CC BY 4.0
Humans' ability to transfer knowledge through teaching is one of the essential aspects of human intelligence. A human teacher can track the knowledge of students to customize the teaching to students' needs. With the rise of online education platforms, there is a similar need for machines to track the knowledge of students and tailor their learning experience. This is known as the Knowledge Tracing (KT) problem in the literature. Effectively solving the KT problem would unlock the potential of computer-aided education applications such as intelligent tutoring systems, curriculum learning, and learning material recommendation. Moreover, from a more general viewpoint, a student may represent any kind of intelligent agent, human or artificial. Thus, the potential of KT can be extended to any machine teaching application scenario that seeks to customize the learning experience for a student agent (i.e., a machine learning model). In this paper, we provide a comprehensive and systematic review of the KT literature. We cover a broad range of methods starting from the early attempts to the recent state-of-the-art methods using deep learning, while highlighting the theoretical aspects of models and the characteristics of benchmark datasets. Besides these, we shed light on key modelling differences between closely related methods and summarize them in an easy-to-understand format. Finally, we discuss current research gaps in the KT literature and possible future research and application directions.
# A Machine Learning Based Algorithm for Joint Improvement of Power Control, Link Adaptation, and Capacity in Beyond 5G Communication Systems

A Machine Learning Based Algorithm for Joint Improvement of Power Control, link adaptation, and Capacity in Beyond 5G Communication systems ( http://arxiv.org/abs/2201.07090v1 )

License: CC BY 4.0
In this study, we propose a novel machine learning based algorithm to improve the performance of beyond fifth generation (B5G) wireless communication systems assisted by Orthogonal Frequency Division Multiplexing (OFDM) and Non-Orthogonal Multiple Access (NOMA) techniques. The non-linear soft margin support vector machine (SVM) problem is used to provide an automatic modulation classifier (AMC) and a signal to interference plus noise ratio (SINR) estimator. The estimation results of the AMC and the SINR estimator are used to reassign the modulation type, coding rate, and transmit power across frames of eNode B connections. The AMC success rate versus SINR, total power consumption, and sum capacity are evaluated for an OFDM-NOMA assisted 5G system. Results show an improved success rate compared with some published methods. Furthermore, the algorithm directly computes SINR after the signal is detected by successive interference cancellation (SIC) and before any signal decoding. Moreover, because it senses the physical channel directly, the presented algorithm can reduce the symbols occupied by channel quality information (CQI) overhead signaling in network communication. The results also show that the proposed algorithm reduces the total power consumption and increases the sum capacity through the eNode B connections. Compared with other algorithms, simulation results show more successful AMC, a more efficient SINR estimator, easier practical implementation, less overhead signaling, lower power consumption, and higher capacity.
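
To make the quantities in this abstract concrete (SINR, SIC, sum capacity), here is a textbook two-user downlink NOMA calculation. It is only an illustration under assumed numbers: it is not the paper's system model or its SVM-based estimator, and every parameter name is hypothetical.

```python
import math

def noma_two_user(p_near, p_far, g_near, g_far, noise, bandwidth=1.0):
    """Return (sinr_near, sinr_far, sum_capacity) for a two-user downlink NOMA pair.

    Assumptions: the far (weak-channel) user decodes its own signal while treating
    the near user's signal as interference; the near user removes the far user's
    signal with successive interference cancellation (SIC) before decoding.
    """
    sinr_far = p_far * g_far / (p_near * g_far + noise)
    sinr_near = p_near * g_near / noise  # far user's signal already cancelled
    capacity = bandwidth * (math.log2(1.0 + sinr_near) + math.log2(1.0 + sinr_far))
    return sinr_near, sinr_far, capacity

if __name__ == "__main__":
    near, far, total = noma_two_user(p_near=1.0, p_far=3.0,
                                     g_near=10.0, g_far=1.0, noise=1.0)
    print(near, far, round(total, 4))
```

The capacity here is the Shannon bound per unit bandwidth. A link adaptation scheme such as the one described would instead map the estimated SINR to a concrete modulation type and coding rate.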
# Forecasting Loss of Signal in Optical Networks with Machine Learning

Forecasting Loss of Signal in Optical Networks with Machine Learning ( http://arxiv.org/abs/2201.07089v1 )

License: see the linked page
Loss of Signal (LOS) represents a significant cost for operators of optical networks. By studying large sets of real-world Performance Monitoring (PM) data collected from six international optical networks, we find that it is possible to forecast LOS events with good precision 1-7 days before they occur, albeit at relatively low recall, with supervised machine learning (ML). Our study covers twelve facility types, including 100G lines and ETH10G clients. We show that the precision for a given network improves when training on multiple networks simultaneously relative to training on an individual network. Furthermore, we show that it is possible to forecast LOS from all facility types and all networks with a single model, whereas fine-tuning for a particular facility or network only brings modest improvements. Hence our ML models remain effective for optical networks previously unknown to the model, which makes them usable for commercial applications.
# Beyond Modeling: NLP Pipeline for Efficient Environmental Policy Analysis

Beyond modeling: NLP Pipeline for efficient environmental policy analysis ( http://arxiv.org/abs/2201.07105v1 )

License: see the linked page
As we enter the UN Decade on Ecosystem Restoration, creating effective incentive structures for forest and landscape restoration has never been more critical. Policy analysis is necessary for policymakers to understand the actors and rules involved in restoration in order to shift economic and financial incentives to the right places. Classical policy analysis is resource-intensive and complex, lacks comprehensive central information sources, and is prone to overlapping jurisdictions. We propose a Knowledge Management Framework based on Natural Language Processing (NLP) techniques that would tackle these challenges and automate repetitive tasks, reducing the policy analysis process from weeks to minutes. Our framework was designed in collaboration with policy analysis experts and made to be platform-, language- and policy-agnostic. In this paper, we describe the design of the NLP pipeline, review the state-of-the-art methods for each of its components, and discuss the challenges that arise when building a framework oriented towards policy analysis.
# Impact of Stop Sets on Stopping Active Learning for Text Classification

Impact of Stop Sets on Stopping Active Learning for Text Classification ( http://arxiv.org/abs/2201.05460v1 )

License: see the linked page
Active learning is an increasingly important branch of machine learning and a powerful technique for natural language processing. The main advantage of active learning is its potential to reduce the amount of labeled data needed to learn high-performing models. A vital aspect of an effective active learning algorithm is the determination of when to stop obtaining additional labeled data. Several leading state-of-the-art stopping methods use a stop set to help make this decision. However, there has been relatively less attention given to the choice of stop set than to the stopping algorithms that are applied on the stop set. Different choices of stop sets can lead to significant differences in stopping method performance. We investigate the impact of different stop set choices on different stopping methods. This paper shows the choice of the stop set can have a significant impact on the performance of stopping methods and the impact is different for stability-based methods from that on confidence-based methods. Furthermore, the unbiased representative stop sets suggested by original authors of methods work better than the systematically biased stop sets used in recently published work, and stopping methods based on stabilizing predictions have stronger performance than confidence-based stopping methods when unbiased representative stop sets are used. We provide the largest quantity of experimental results on the impact of stop sets to date. The findings are important for helping to illuminate the impact of this important aspect of stopping methods that has been under-considered in recently published work and that can have a large practical impact on the performance of stopping methods for important semantic computing applications such as technology assisted review and text classification more broadly.
# Multi-View Non-negative Matrix Factorization Discriminant Learning via Cross Entropy Loss

Multi-View Non-negative Matrix Factorization Discriminant Learning via Cross Entropy Loss ( http://arxiv.org/abs/2201.04726v1 )

License: CC BY 4.0
Multi-view learning accomplishes classification tasks by leveraging the relationships between different views of the same object. Most existing methods focus on consistency and complementarity between multiple views, but not all of this information is useful for classification. Instead, it is the specific discriminating information that plays an important role. Zhong Zhang et al. explore the discriminative and non-discriminative information existing in the common and view-specific parts among different views via joint non-negative matrix factorization. In this paper, we improve on this algorithm by using the cross entropy loss function to better constrain the objective function. Finally, we achieve better classification performance than the original algorithm on the same data sets and show its superiority over many state-of-the-art algorithms.
# VGAER:グラフニューラルネットワークを用いたコミュニティ検出

VGAER: graph neural network reconstruction based community detection ( http://arxiv.org/abs/2201.04066v1 )

ライセンス: Link先を確認
Community detection is a fundamental and important issue in network science, but there are only a few community detection algorithms based on graph neural networks, and unsupervised ones are almost nonexistent. By fusing high-order modularity information with network features, this paper proposes, for the first time, a Variational Graph AutoEncoder Reconstruction based community detection method, VGAER, and gives its non-probabilistic version. Neither version needs any prior information. We have carefully designed the corresponding input features, decoder, and downstream tasks based on the community detection task; these designs are concise, natural, and perform well (NMI values under our design are improved by 59.1% - 565.9%). In a series of experiments with a wide range of datasets and advanced methods, VGAER achieves superior performance and shows strong competitiveness and potential with a simpler design. Finally, we report the results of algorithm convergence analysis and t-SNE visualization, which clearly depict the stable performance and powerful network modularity ability of VGAER. Our codes are available at https://github.com/qcydm/VGAER.
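
For readers unfamiliar with the modularity information mentioned above, the sketch below computes the standard Newman modularity matrix B = A - k kᵀ / (2m) and the modularity score of a given partition. Whether VGAER feeds exactly this matrix to its encoder is an assumption here; the function names are hypothetical and the linked repository remains the authoritative source.

```python
def modularity_matrix(adj):
    """Newman modularity matrix B[i][j] = A[i][j] - k_i * k_j / (2m).

    adj is a symmetric adjacency matrix given as a list of lists.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # every undirected edge is counted twice
    if two_m == 0:
        raise ValueError("graph has no edges")
    return [[adj[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
            for i in range(n)]


def modularity(adj, communities):
    """Modularity Q of a partition; communities[i] is the label of node i."""
    b = modularity_matrix(adj)
    two_m = sum(sum(row) for row in adj)
    n = len(adj)
    within = sum(b[i][j] for i in range(n) for j in range(n)
                 if communities[i] == communities[j])
    return within / two_m
```

For two disconnected edges (0-1 and 2-3), placing each edge in its own community gives Q = 0.5, while placing all four nodes in one community gives Q = 0.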
# (参考訳) 画像美的品質評価のための擬似ラベリングとメタリヘアリング学習

Pseudo-labelling and Meta Reweighting Learning for Image Aesthetic Quality Assessment ( http://arxiv.org/abs/2201.02714v1 )

ライセンス: CC BY 4.0
In the tasks of image aesthetic quality evaluation, it is difficult to reach both the high score area and low score area due to the normal distribution of aesthetic datasets. To reduce the error in labeling and solve the problem of normal data distribution, we propose a new aesthetic mixed dataset with classification and regression called AMD-CR, and we train a meta reweighting network to reweight the loss of training data differently. In addition, we provide a training strategy according to different stages, based on pseudo labels of the binary classification task, and then we use it for aesthetic training according to different stages in classification and regression tasks. In the construction of the network structure, we construct an aesthetic adaptive block (AAB) structure that can adapt to any size of the input images. We also use efficient channel attention (ECA) to strengthen the feature extraction ability of each task. Experimental results show that our method improves SROCC by 0.1112 compared with conventional methods. The method can also help to find the best aesthetic path planning for unmanned aerial vehicles (UAV) and vehicles.
# (参考訳) 機械学習による疾患診断:ビブリオメトリ分析

Machine Learning-Based Disease Diagnosis: A Bibliometric Analysis ( http://arxiv.org/abs/2201.02755v1 )

ライセンス: CC BY 4.0
Machine Learning (ML) has garnered considerable attention from researchers and practitioners as a new and adaptable tool for disease diagnosis. With the advancement of ML and the proliferation of papers and research in this field, a complete examination of Machine Learning-Based Disease Diagnosis (MLBDD) is required. From a bibliometrics standpoint, this article comprehensively studies MLBDD papers from 2012 to 2021. Using particular keywords, 1710 papers with associated information were extracted from the Scopus and Web of Science (WOS) databases and integrated into an Excel datasheet for further analysis. First, we examine the publication structures based on yearly publications and the most productive countries/regions, institutions, and authors. Second, the co-citation networks of countries/regions, institutions, authors, and articles are visualized using RStudio software. They are further examined in terms of citation structure and the most influential ones. This article gives an overview of MLBDD for researchers interested in the subject and conducts a thorough and complete study of MLBDD for those interested in conducting more research in this field.
# (参考訳) 視覚トランスフォーマーの四分木注意

QuadTree Attention for Vision Transformers ( http://arxiv.org/abs/2201.02767v1 )

ライセンス: CC BY 4.0
Transformers have been successful in many vision tasks, thanks to their capability of capturing long-range dependency. However, their quadratic computational complexity poses a major obstacle for applying them to vision tasks requiring dense predictions, such as object detection, feature matching, stereo, etc. We introduce QuadTree Attention, which reduces the computational complexity from quadratic to linear. Our quadtree transformer builds token pyramids and computes attention in a coarse-to-fine manner. At each level, the top K patches with the highest attention scores are selected, such that at the next level, attention is only evaluated within the relevant regions corresponding to these top K patches. We demonstrate that quadtree attention achieves state-of-the-art performance in various vision tasks, e.g. with 4.0% improvement in feature matching on ScanNet, about 50% FLOPs reduction in stereo matching, 0.4-1.5% improvement in top-1 accuracy on ImageNet classification, 1.2-1.8% improvement on COCO object detection, and 0.7-2.4% improvement on semantic segmentation over previous state-of-the-art transformers. The codes are available at https://github.com/Tangshitao/QuadtreeAttention.
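
The coarse-to-fine top-K selection described above can be sketched for a single query and a two-level pyramid. This is a deliberately simplified illustration, not the authors' implementation: it uses a flat token list in which every 4 consecutive fine tokens form one coarse token (average pooling), and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pool_quads(tokens):
    """Average every 4 consecutive fine tokens into one coarse token."""
    if len(tokens) % 4 != 0:
        raise ValueError("number of fine tokens must be a multiple of 4")
    dim = len(tokens[0])
    return [[sum(tokens[i + j][d] for j in range(4)) / 4 for d in range(dim)]
            for i in range(0, len(tokens), 4)]


def coarse_to_fine_attention(query, fine_keys, fine_values, k):
    """Attend only to fine tokens under the top-k coarse tokens."""
    coarse_keys = pool_quads(fine_keys)
    coarse_scores = [dot(query, c) for c in coarse_keys]
    top = sorted(range(len(coarse_scores)),
                 key=lambda i: coarse_scores[i], reverse=True)[:k]
    # Children of coarse token c are fine tokens 4c .. 4c+3.
    candidates = [4 * c + j for c in sorted(top) for j in range(4)]
    logits = [dot(query, fine_keys[i]) for i in candidates]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(fine_values[0])
    output = [sum(w * fine_values[i][d] for w, i in zip(weights, candidates)) / total
              for d in range(dim)]
    return candidates, output
```

With N fine tokens, the query is scored against N/4 coarse tokens and then against only 4k fine tokens, which is where the savings over full attention come from in this toy setting.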
# (参考訳) 逆数支配入力を用いた垂直協調学習システムへの攻撃

Attacking Vertical Collaborative Learning System Using Adversarial Dominating Inputs ( http://arxiv.org/abs/2201.02775v1 )

ライセンス: CC BY 4.0
The vertical collaborative learning system, also known as the vertical federated learning (VFL) system, has recently become prominent as a concept to process data distributed across many individual sources without the need to centralize it. Multiple participants collaboratively train models based on their local data in a privacy-preserving manner. To date, VFL has become a de facto solution to securely learn a model among organizations, allowing knowledge to be shared without compromising the privacy of any individual organization. Despite the prosperous development of VFL systems, we find that certain inputs of a participant, named adversarial dominating inputs (ADIs), can dominate the joint inference towards the direction of the adversary's will and force other (victim) participants to make negligible contributions, losing rewards that are usually offered regarding the importance of their contributions in collaborative learning scenarios. We conduct a systematic study on ADIs by first proving their existence in typical VFL systems. We then propose gradient-based methods to synthesize ADIs of various formats and exploit common VFL systems. We further launch greybox fuzz testing, guided by the resiliency score of "victim" participants, to perturb adversary-controlled inputs and systematically explore the VFL attack surface in a privacy-preserving manner. We conduct an in-depth study on the influence of critical parameters and settings in synthesizing ADIs. Our study reveals new VFL attack opportunities, promoting the identification of unknown threats before breaches and the building of more secure VFL systems.
# (参考訳) AI強化CAIツールの最大許容レイテンシの定義

Defining maximum acceptable latency of AI-enhanced CAI tools ( http://arxiv.org/abs/2201.02792v1 )

ライセンス: CC BY 4.0
Recent years have seen an increasing number of studies around the design of computer-assisted interpreting tools with integrated automatic speech processing and their use by trainees and professional interpreters. This paper discusses the role of system latency of such tools and presents the results of an experiment designed to investigate the maximum system latency that is cognitively acceptable for interpreters working in the simultaneous modality. The results show that interpreters can cope with a system latency of 3 seconds without any major impact in the rendition of the original text, both in terms of accuracy and fluency. This value is above the typical latency of available AI-based CAI tools and paves the way to experiment with larger context-based language models and higher latencies.
# (参考訳) 自動医療コーディングのための深層学習の統一的レビュー

A Unified Review of Deep Learning for Automated Medical Coding ( http://arxiv.org/abs/2201.02797v1 )

ライセンス: CC BY 4.0
Automated medical coding, an essential task for healthcare operation and delivery, makes unstructured data manageable by predicting medical codes from clinical documents. Recent advances in deep learning models in natural language processing have been widely applied to this task. However, the field lacks a unified view of the design of neural network architectures for medical coding. This review proposes a unified framework to provide a general understanding of the building blocks of medical coding models and summarizes recent advanced models under the proposed framework. Our unified framework decomposes medical coding into four main components, i.e., encoder modules for text feature extraction, mechanisms for building deep encoder architectures, decoder modules for transforming hidden representations into medical codes, and the usage of auxiliary information. Finally, we discuss key research challenges and future directions.
# (参考訳) RARA: 前景を追尾するゼロショットSim2ビジュアルナビゲーション

RARA: Zero-shot Sim2Real Visual Navigation with Following Foreground Cues ( http://arxiv.org/abs/2201.02798v1 )

ライセンス: CC BY 4.0
The gap between simulation and the real world restrains many machine learning breakthroughs in computer vision and reinforcement learning from being applicable in the real world. In this work, we tackle this gap for the specific case of camera-based navigation, formulating it as following a visual cue in the foreground with arbitrary backgrounds. The visual cue in the foreground can often be simulated realistically, such as a line, gate or cone. The challenge then lies in coping with the unknown backgrounds and integrating both. As such, the goal is to train a visual agent on data captured in a simulated environment that is empty except for this foreground cue and to test this model directly in a visually diverse real world. In order to bridge this big gap, we show it is crucial to combine the following techniques: randomized augmentation of the fore- and background, regularization with both deep supervision and triplet loss, and finally abstraction of the dynamics by using waypoints rather than direct velocity commands. The various techniques are ablated in our experimental results, both qualitatively and quantitatively, finally demonstrating a successful transfer from simulation to the real world.
# (参考訳) 注意によるクラスタリングテキスト

Clustering Text Using Attention ( http://arxiv.org/abs/2201.02816v1 )

ライセンス: CC BY 4.0
Clustering text has been an important problem in the domain of natural language processing. While there are techniques that cluster text by applying conventional clustering techniques on top of contextual or non-contextual vector space representations, this remains an active area of research open to various improvements in the performance and implementation of these techniques. This paper discusses a novel technique to cluster text using attention mechanisms. Attention mechanisms have proven to be highly effective in various NLP tasks in recent times. This paper extends the idea of the attention mechanism to the clustering space and sheds some light on a whole new area of research.
# (参考訳) 完全畳み込みネットワークによる空間多重化を実現する再構成可能なインテリジェント表面

Reconfigurable Intelligent Surface Enabled Spatial Multiplexing with Fully Convolutional Network ( http://arxiv.org/abs/2201.02834v1 )

ライセンス: CC BY 4.0
Reconfigurable intelligent surface (RIS) is an emerging technology for future wireless communication systems. In this work, we consider downlink spatial multiplexing enabled by the RIS for weighted sum-rate (WSR) maximization. In the literature, most solutions use alternating gradient-based optimization, which has moderate performance, high complexity, and limited scalability. We propose to solve this problem with a fully convolutional network (FCN), an architecture originally designed for semantic segmentation of images. The rectangular shape of the RIS and the spatial correlation of channels with adjacent RIS antennas, due to the short distance between them, encourage us to apply it to the RIS configuration. We design a set of channel features that includes both cascaded channels via the RIS and the direct channel. In the base station (BS), the differentiable minimum mean squared error (MMSE) precoder is used for pretraining, and the weighted minimum mean squared error (WMMSE) precoder, which is nondifferentiable and more complex but achieves better performance, is then applied for fine-tuning. Evaluation results show that the proposed solution has higher performance and allows for a faster evaluation than the baselines. Hence it scales better to a large number of antennas, advancing the RIS one step closer to practical deployment.
# (参考訳) uav車両再識別のための自己整合型空間特徴抽出ネットワーク

Self-aligned Spatial Feature Extraction Network for UAV Vehicle Re-identification ( http://arxiv.org/abs/2201.02836v1 )

ライセンス: CC BY 4.0
Compared with existing vehicle re-identification (ReID) tasks conducted with datasets collected by fixed surveillance cameras, vehicle ReID for unmanned aerial vehicles (UAVs) is still under-explored and could be more challenging. Vehicles with the same color and type show extremely similar appearance from the UAV's perspective, so mining fine-grained characteristics becomes necessary. Recent works tend to extract distinguishing information by regional features and component features. The former requires input images to be aligned and the latter entails detailed annotations, both of which are difficult to meet in UAV applications. In order to extract efficient fine-grained features and avoid tedious annotating work, this letter develops an unsupervised self-aligned network consisting of three branches. The network introduces a self-alignment module to convert input images with variable orientations to a uniform orientation, which is implemented under the constraint of a triple loss function designed with spatial features. On this basis, spatial features, obtained by vertical and horizontal segmentation methods, and global features are integrated to improve the representation ability in the embedded space. Extensive experiments are conducted on the UAV-VeID dataset, and our method achieves the best performance compared with recent ReID works.
# (参考訳) クロスシナリオビデオ時間グラウンドにおける学習サンプルの重要性

Learning Sample Importance for Cross-Scenario Video Temporal Grounding ( http://arxiv.org/abs/2201.02848v1 )

ライセンス: CC BY 4.0
The task of temporal grounding aims to locate a video moment in an untrimmed video, given a sentence query. This paper for the first time investigates some superficial biases that are specific to the temporal grounding task, and proposes a novel targeted solution. Most alarmingly, we observe that existing temporal grounding models heavily rely on some biases (e.g., high preference on frequent concepts or certain temporal intervals) in the visual modality. This leads to inferior performance when generalizing the model in a cross-scenario test setting. To this end, we propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model from naively memorizing the biases and force it to ground the query sentence based on the true inter-modal relationship. DebiasTLL simultaneously trains two models. By design, a large discrepancy between these two models' predictions on a sample indicates a higher probability that the sample is biased. Harnessing the informative discrepancy, we devise a data re-weighting scheme for mitigating the data biases. We evaluate the proposed model in cross-scenario temporal grounding, where the train / test data are heterogeneously sourced. Experiments show large-margin superiority of the proposed method in comparison with state-of-the-art competitors.
# (参考訳) スケルトンベース動作認識のための時空間タプル変換器

Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition ( http://arxiv.org/abs/2201.02849v1 )

ライセンス: CC BY 4.0
Capturing the dependencies between joints is critical in the skeleton-based action recognition task. The Transformer shows great potential for modeling the correlation of important joints. However, existing Transformer-based methods cannot capture the correlation of different joints between frames, which is very useful since different body parts (such as the arms and legs in "long jump") move together between adjacent frames. To address this problem, a novel spatio-temporal tuples Transformer (STTFormer) method is proposed. The skeleton sequence is divided into several parts, and several consecutive frames contained in each part are encoded. A spatio-temporal tuples self-attention module is then proposed to capture the relationship of different joints in consecutive frames. In addition, a feature aggregation module is introduced between non-adjacent frames to enhance the ability to distinguish similar actions. Compared with the state-of-the-art methods, our method achieves better performance on two large-scale datasets.
# (参考訳) 機械ビジョンを用いたフェイクヒルサ魚検出

Fake Hilsa Fish Detection Using Machine Vision ( http://arxiv.org/abs/2201.02853v1 )

ライセンス: CC BY 4.0
Hilsa is the national fish of Bangladesh. Bangladesh is earning a lot of foreign currency by exporting this fish. Unfortunately, in recent days, some unscrupulous businessmen have been selling fake Hilsa fish to gain profit. Sardines and Sardinella are the fish most often sold in the market as Hilsa. The Bangladesh Food Safety Authority, a government agency of Bangladesh, has said that these fake Hilsa fish contain high levels of cadmium and lead, which are detrimental to humans. In this research, we propose a method that can readily distinguish original Hilsa fish from fake Hilsa fish. Based on the literature available online, we are the first to do research on identifying original Hilsa fish. We have collected more than 16,000 images of original and counterfeit Hilsa fish. To classify these images, we used several deep learning-based models and compared their performance. Among those models, DenseNet201 achieved the highest accuracy of 97.02%.
# (参考訳) cryo-electron microscopyにおけるボリューム再構成のための深部生成モデル

Deep Generative Modeling for Volume Reconstruction in Cryo-Electron Microscopy ( http://arxiv.org/abs/2201.02867v1 )

ライセンス: CC BY 4.0
Recent breakthroughs in high resolution imaging of biomolecules in solution with cryo-electron microscopy (cryo-EM) have unlocked new doors for the reconstruction of molecular volumes, thereby promising further advances in biology, chemistry, and pharmacological research amongst others. Despite significant headway, the immense challenges in cryo-EM data analysis remain legion and intricately inter-disciplinary in nature, requiring insights from physicists, structural biologists, computer scientists, statisticians, and applied mathematicians. Meanwhile, recent next-generation volume reconstruction algorithms that combine generative modeling with end-to-end unsupervised deep learning techniques have shown promising results on simulated data, but still face considerable hurdles when applied to experimental cryo-EM images. In light of the proliferation of such methods and given the interdisciplinary nature of the task, we propose here a critical review of recent advances in the field of deep generative modeling for high resolution cryo-EM volume reconstruction. The present review aims to (i) compare and contrast these new methods, while (ii) presenting them from a perspective and using terminology familiar to scientists in each of the five aforementioned fields with no specific background in cryo-EM. The review begins with an introduction to the mathematical and computational challenges of deep generative models for cryo-EM volume reconstruction, along with an overview of the baseline methodology shared across this class of algorithms. Having established the common thread weaving through these different models, we provide a practical comparison of these state-of-the-art algorithms, highlighting their relative strengths and weaknesses, along with the assumptions that they rely on. This allows us to identify bottlenecks in current methods and avenues for future research.
# (参考訳) 新しいモジュールアーキテクチャを用いた強化学習における政策・損失・計画の組み合わせの評価

Assessing Policy, Loss and Planning Combinations in Reinforcement Learning using a New Modular Architecture ( http://arxiv.org/abs/2201.02874v1 )

ライセンス: CC BY 4.0
The model-based reinforcement learning paradigm, which uses planning algorithms and neural network models, has recently achieved unprecedented results in diverse applications, leading to what is now known as deep reinforcement learning. These agents are quite complex and involve multiple components, factors that can create challenges for research. In this work, we propose a new modular software architecture suited for these types of agents, and a set of building blocks that can be easily reused and assembled to construct new model-based reinforcement learning agents. These building blocks include planning algorithms, policies, and loss functions. We illustrate the use of this architecture by combining several of these building blocks to implement and test agents that are optimized to three different test environments: Cartpole, Minigrid, and Tictactoe. One particular planning algorithm, made available in our implementation and not previously used in reinforcement learning, which we called averaged minimax, achieved good results in the three tested environments. Experiments performed with this architecture have shown that the best combination of planning algorithm, policy, and loss function is heavily problem dependent. This result provides evidence that the proposed architecture, which is modular and reusable, is useful for reinforcement learning researchers who want to study new environments and techniques.
# (参考訳) 機能的細粒間ネットワークによるデフォーカスデブラル顕微鏡

Defocus Deblur Microscopy via feature interactive coarse-to-fine network ( http://arxiv.org/abs/2201.02876v1 )

ライセンス: CC0 1.0
The clarity of microscopic images is vital in biology research and diagnosis. When taking microscopy images at the cell or molecule level, mechanical drift occurs and can be difficult and expensive to counter. Such a problem could be overcome by developing an end-to-end deep learning-based workflow capable of predicting in-focus microscopic images from out-of-focus counterparts. In our model, we adopt a multi-level U-net structure, with the levels connected head-to-tail through corresponding convolution layers. In contrast to the conventional coarse-to-fine model, our model transfers the knowledge distilled from the coarse network to the finer network. We evaluate the performance of our model and, by comparing the results with existing models, find our method to be effective and to perform better.
# (参考訳) 遅延ラグランジアンによるオンライン学習の予測

Lazy Lagrangians with Predictions for Online Learning ( http://arxiv.org/abs/2201.02890v1 )

ライセンス: CC BY 4.0
We consider the general problem of online convex optimization with time-varying additive constraints in the presence of predictions for the next cost and constraint functions. A novel primal-dual algorithm is designed by combining a Follow-The-Regularized-Leader iteration with prediction-adaptive dynamic steps. The algorithm achieves $\mathcal O(T^{\frac{3-\beta}{4}})$ regret and $\mathcal O(T^{\frac{1+\beta}{2}})$ constraint violation bounds that are tunable via parameter $\beta\!\in\![1/2,1)$ and have constant factors that shrink with the quality of the predictions, eventually achieving $\mathcal O(1)$ regret for perfect predictions. Our work extends the FTRL framework to this constrained OCO setting and outperforms the respective state-of-the-art greedy-based solutions, without imposing conditions on the quality of predictions, the cost functions or the geometry of constraints, beyond convexity.
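
To make the trade-off controlled by $\beta$ concrete, the two exponents stated above can be evaluated at the ends of the allowed range. This is plain arithmetic on the bounds quoted in the abstract, not an additional result.

$$
\beta=\tfrac12:\qquad \text{regret}=\mathcal O\!\left(T^{\frac{3-1/2}{4}}\right)=\mathcal O\!\left(T^{5/8}\right),\qquad \text{violation}=\mathcal O\!\left(T^{\frac{1+1/2}{2}}\right)=\mathcal O\!\left(T^{3/4}\right)
$$

$$
\beta\to 1:\qquad \text{regret}\to\mathcal O\!\left(T^{1/2}\right),\qquad \text{violation}\to\mathcal O\!\left(T\right)
$$

Increasing $\beta$ therefore improves the regret exponent from $5/8$ toward $1/2$ while loosening the constraint-violation exponent from $3/4$ toward $1$.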
# SGUIE-Net:マルチスケール知覚を用いた意味的注意誘導水中画像強調

SGUIE-Net: Semantic Attention Guided Underwater Image Enhancement with Multi-Scale Perception ( http://arxiv.org/abs/2201.02832v1 )

ライセンス: Link先を確認
Due to the wavelength-dependent light attenuation, refraction and scattering, underwater images usually suffer from color distortion and blurred details. However, due to the limited number of paired underwater images with undistorted images as reference, training deep enhancement models for diverse degradation types is quite difficult. To boost the performance of data-driven approaches, it is essential to establish more effective learning mechanisms that mine richer supervised information from limited training sample resources. In this paper, we propose a novel underwater image enhancement network, called SGUIE-Net, in which we introduce semantic information as high-level guidance across different images that share common semantic regions. Accordingly, we propose a semantic region-wise enhancement module to perceive the degradation of different semantic regions from multiple scales and feed it back to the global attention features extracted from its original scale. This strategy helps to achieve robust and visually pleasant enhancements to different semantic objects, thanks to the guidance of semantic information for differentiated enhancement. More importantly, for those degradation types that are not common in the training sample distribution, the guidance connects them with the already well-learned types according to their semantic relevance. Extensive experiments on the publicly available datasets and our proposed dataset demonstrated the impressive performance of SGUIE-Net. The code and proposed dataset are available at: https://trentqq.github.io/SGUIE-Net.html
# 動的単画素イメージングとセンシングのための重み付き符号化最適化

Weighted Encoding Optimization for Dynamic Single-pixel Imaging and Sensing ( http://arxiv.org/abs/2201.02833v1 )

ライセンス: Link先を確認
Using single-pixel detection, an end-to-end neural network that jointly optimizes both encoding and decoding enables high-precision imaging and high-level semantic sensing. However, for varied sampling rates, the large-scale network requires retraining, which is laborious and computationally expensive. In this letter, we report a weighted optimization technique for dynamic rate-adaptive single-pixel imaging and sensing, which needs to train the network only once and is then applicable to any sampling rate. Specifically, we introduce a novel weighting scheme in the encoding process to characterize different patterns' modulation efficiency. While the network is training at a high sampling rate, the modulation patterns and corresponding weights are updated iteratively, which produces an optimally ranked encoding series upon convergence. In the experimental implementation, the patterns with the highest weights are employed for light modulation, thus achieving highly efficient imaging and sensing. The reported strategy saves the additional training of another low-rate network required by the existing dynamic single-pixel networks, which further doubles training efficiency. Experiments on the MNIST dataset validated that once the network is trained with a sampling rate of 1, the average imaging PSNR reaches 23.50 dB at 0.1 sampling rate, and the image-free classification accuracy reaches up to 95.00% at a sampling rate of 0.03 and 97.91% at a sampling rate of 0.1.
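
The ranking step described above, in which the patterns with the highest learned weights are kept for a given sampling rate, can be sketched as follows. The function name is hypothetical, and the rule that the number of patterns equals the sampling rate times the number of image pixels (rounded) is an assumption based on the usual definition of sampling rate, not a detail stated in the abstract.

```python
def select_patterns(weights, sampling_rate, num_pixels):
    """Return indices of the highest-weight patterns for a sampling rate.

    Assumption: the number of patterns used equals
    round(sampling_rate * num_pixels), capped at len(weights).
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    count = min(len(weights), max(1, round(sampling_rate * num_pixels)))
    # Rank pattern indices from highest to lowest learned weight.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:count]


# Toy example with four hypothetical learned weights.
print(select_patterns([0.1, 0.9, 0.5, 0.7], sampling_rate=0.5, num_pixels=4))  # [1, 3]
```

Because the ranking is computed once from the weights learned at the high sampling rate, lowering the rate only shortens the selected prefix and requires no retraining in this sketch.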
# 有毒なレビュー内容が製品全体の感情に及ぼす影響

Effect of Toxic Review Content on Overall Product Sentiment ( http://arxiv.org/abs/2201.02857v1 )

ライセンス: Link先を確認
Toxic content in online product reviews is a common phenomenon. Content is perceived to be toxic when it is rude, disrespectful, or unreasonable and makes individuals leave the discussion. Machine learning algorithms help the sell-side community to identify such toxic patterns and eventually moderate such inputs. Yet, the extant literature provides little information about the sentiment of a prospective consumer regarding a product after being exposed to such toxic review content. In this study, we collect a balanced data set of review comments from 18 different players segregated into three different sectors from the Google Play Store. Then we calculate the sentence-level sentiment and toxicity score of individual review content. Finally, we use structural equation modelling to quantitatively study the influence of toxic content on overall product sentiment. We observe that comment toxicity negatively influences overall product sentiment but does not exhibit a mediating effect over reviewer score to influence sector-wise relative rating.
# pre-fall activity identificationを用いた転倒警報システム

A fall alert system with prior-fall activity identification ( http://arxiv.org/abs/2201.02803v1 )

ライセンス: Link先を確認
Falling, especially among the elderly, is a critical issue for care and monitoring. There have been many studies focusing on fall detection. However, from our survey, there is still no research on identifying prior-fall activities, which we believe are strongly correlated with the intensity of the fall. The purpose of this research is to develop a fall alert system that also identifies prior-fall activities. First, we want to find a suitable location to attach a sensor to the body. We created multiple-spot on-body devices to collect various activity data. We used that dataset to train 5 different classification models. Based on a comparison of detection accuracy, we selected the XGBoost classification model for detecting prior-fall activities and the chest location for use in fall detection. We then tested 3 existing fall detection threshold algorithms for detecting falls and knees-first falls, and selected the 3-phase threshold algorithm of Chaitep and Chawachat [3] for our system. From the experiment, we found that the fall detection accuracy is 88.91%, the knees-first fall detection accuracy is 91.25%, and the average accuracy of detecting prior-fall activities is 86.25%. Although we use an activity dataset of young to middle-aged adults (18-49 years), we are confident that this system can be developed to monitor activities before the fall, especially in the elderly, so that caretakers can better manage the situation.
# web から製品仕様を抽出する -- 表やリストを超えて

Extraction of Product Specifications from the Web -- Going Beyond Tables and Lists ( http://arxiv.org/abs/2201.02896v1 )

ライセンス: Link先を確認
E-commerce product pages on the web often present product specification data in structured tabular blocks. Extraction of these product attribute-value specifications has benefited applications like product catalogue curation, search, question answering, and others. However, across different websites, a wide variety of HTML elements (like <table>, <ul>, <div>, <span>, <dl> etc.) is typically used to render these blocks, which makes their automatic extraction a challenge. Most of the current research has focused on extracting product specifications from tables and lists and, therefore, suffers from low recall when applied to a large-scale extraction setting. In this paper, we present a product specification extraction approach that goes beyond tables or lists and generalizes across the diverse HTML elements used for rendering specification blocks. Using a combination of hand-coded features and deep learned spatial and token features, we first identify the specification blocks on a product page. We then extract the product attribute-value pairs from these blocks following an approach inspired by wrapper induction. We created a labeled dataset of product specifications extracted from 14,111 diverse specification blocks taken from a range of different product websites. Our experiments show the efficacy of our approach compared to the current specification extraction models and support our claim about its application to large-scale product specification extraction.
# A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation

A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation ( http://arxiv.org/abs/2201.02779v1 )

License: see the linked page
Recently, several image segmentation methods that welcome and leverage different types of user assistance have been developed. In these methods, the user inputs can be provided by drawing bounding boxes over image objects, drawing scribbles or planting seeds that help to differentiate between image boundaries or by interactively refining the missegmented image regions. Due to the variety in the types and the amounts of these inputs, relative assessment of different segmentation methods becomes difficult. As a possible solution, we propose a simple yet effective, statistical segmentation method that can handle and utilize different input types and amounts. The proposed method is based on robust hypothesis testing, specifically the DGL test, and can be implemented with time complexity that is linear in the number of pixels and quadratic in the number of image regions. Therefore, it is suitable to be used as a baseline method for quick benchmarking and assessing the relative performance improvements of different types of user-assisted segmentation algorithms. We provide a mathematical analysis on the operation of the proposed method, discuss its capabilities and limitations, provide design guidelines and present simulations that validate its operation.
# Conditional Approximate Normalizing Flows for Joint Multi-Step Probabilistic Electricity Demand Forecasting

Conditional Approximate Normalizing Flows for Joint Multi-Step Probabilistic Electricity Demand Forecasting ( http://arxiv.org/abs/2201.02753v1 )

License: see the linked page
Some real-world decision-making problems require making probabilistic forecasts over multiple steps at once. However, methods for probabilistic forecasting may fail to capture correlations in the underlying time-series that exist over long time horizons as errors accumulate. One such application is with resource scheduling under uncertainty in a grid environment, which requires forecasting electricity demand that is inherently noisy, but often cyclic. In this paper, we introduce the conditional approximate normalizing flow (CANF) to make probabilistic multi-step time-series forecasts when correlations are present over long time horizons. We first demonstrate our method's efficacy on estimating the density of a toy distribution, finding that CANF improves the KL divergence by one-third compared to that of a Gaussian mixture model while still being amenable to explicit conditioning. We then use a publicly available household electricity consumption dataset to showcase the effectiveness of CANF on joint probabilistic multi-step forecasting. Empirical results show that conditional approximate normalizing flows outperform other methods in terms of multi-step forecast accuracy and lead to up to 10x better scheduling decisions. Our implementation is available at https://github.com/sisl/JointDemandForecasting.
# Modeling Human-AI Team Decision Making

Modeling Human-AI Team Decision Making ( http://arxiv.org/abs/2201.02759v1 )

License: see the linked page
AI and humans bring complementary skills to group deliberations. Modeling this group decision making is especially challenging when the deliberations include an element of risk and an exploration-exploitation process of appraising the capabilities of the human and AI agents. To investigate this question, we presented a sequence of intellective issues to a set of human groups aided by imperfect AI agents. A group's goal was to appraise the relative expertise of the group's members and its available AI agents, evaluate the risks associated with different actions, and maximize the overall reward by reaching consensus. We propose and empirically validate models of human-AI team decision making under such uncertain circumstances, and show the value of socio-cognitive constructs of prospect theory, influence dynamics, and Bayesian learning in predicting the behavior of human-AI groups.
# AnomMAN: Detect Anomaly on Multi-view Attributed Networks

AnomMAN: Detect Anomaly on Multi-view Attributed Networks ( http://arxiv.org/abs/2201.02822v1 )

License: see the linked page
Anomaly detection on attributed networks is widely used in web shopping, financial transactions, communication networks, and so on. However, most existing work detects anomalies on attributed networks while considering only a single kind of interaction action, and thus cannot account for the rich variety of interaction actions in multi-view attributed networks. In fact, it remains a challenging task to consider all the different kinds of interaction actions uniformly and detect anomalous instances in multi-view attributed networks. In this paper, we propose a Graph Convolution based framework, AnomMAN, to detect Anomaly on Multi-view Attributed Networks. To consider the attributes and all interaction actions jointly, we use the attention mechanism to define the importance of all views in networks. Moreover, the Graph Convolution operation cannot simply be applied to anomaly detection tasks on account of its low-pass characteristic. Therefore, AnomMAN uses a graph auto-encoder module to overcome this shortcoming and turn it into a strength. According to experiments on real-world datasets, AnomMAN outperforms state-of-the-art models and two variants of our proposed model. In addition, the Accuracy@50 indicator of AnomMAN reaches 1.000 on the dataset, which shows that the top 50 instances flagged by AnomMAN are all truly anomalous.
# Expert Knowledge-guided Geometric Representation Learning for Magnetic Resonance Imaging-based Glioma Grading

Expert Knowledge-guided Geometric Representation Learning for Magnetic Resonance Imaging-based Glioma Grading ( http://arxiv.org/abs/2201.02746v1 )

License: see the linked page
Radiomics and deep learning have become popular in automatic glioma grading. Radiomics can extract hand-crafted features that quantitatively describe the expert knowledge of glioma grades, and deep learning is powerful in extracting a large number of high-throughput features that facilitate the final classification. However, the performance of existing methods can still be improved, as their complementary strengths have not been sufficiently investigated and integrated. Furthermore, lesion maps are usually needed for the final prediction at the testing phase, which is very troublesome. In this paper, we propose an expert knowledge-guided geometric representation learning (ENROL) framework. Geometric manifolds of hand-crafted features and learned features are constructed to mine the implicit relationship between deep learning and radiomics, and thereby to uncover the mutual consent and essential representation for the glioma grades. With a specially designed manifold discrepancy measurement, the grading model can exploit the input image data and expert knowledge more effectively in the training phase and remove the requirement for lesion segmentation maps at the testing phase. The proposed framework is flexible with regard to the deep learning architecture used. Three different architectures have been evaluated and five models have been compared, which shows that our framework consistently generates promising results.
# A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval ( http://arxiv.org/abs/2201.02772v1 )

License: see the linked page
Cross-Modal Retrieval (CMR) is an important research topic across multimodal computing and information retrieval, which takes one type of data as the query to retrieve relevant data of another type, and has been widely used in many real-world applications. Recently, the vision-language pre-trained model represented by CLIP has demonstrated its superiority in learning visual and textual representations and its impressive performance on various vision and language related tasks. Although CLIP as well as the previous pre-trained models have shown great performance improvement in unsupervised CMR, the performance and impact of these pre-trained models on supervised CMR were rarely explored due to the lack of multimodal class-level associations. In this paper, we take CLIP as the current representative vision-language pre-trained model to conduct a comprehensive empirical study and provide insights on its performance and impact on supervised CMR. To this end, we first propose a novel model CLIP4CMR (CLIP For supervised Cross-Modal Retrieval) that employs pre-trained CLIP as the backbone network to perform supervised CMR. We then revisit the existing loss function design in CMR, including the most common pair-wise losses, class-wise losses and hybrid ones, and provide insights on applying CLIP. Moreover, we investigate several issues of concern in supervised CMR and provide new perspectives for this field via CLIP4CMR, including the robustness to modality imbalance and the sensitivity to hyper-parameters. Extensive experimental results show that CLIP4CMR achieves SOTA results with significant improvements on the benchmark datasets Wikipedia, NUS-WIDE, Pascal-Sentence and XmediaNet. Our data and codes are publicly available at https://github.com/zhixiongz/CLIP4CMR.
# Classification of Hyperspectral Images by Using Spectral Data and Fully Connected Neural Network

Classification of Hyperspectral Images by Using Spectral Data and Fully Connected Neural Network ( http://arxiv.org/abs/2201.02821v1 )

License: see the linked page
Deep learning methods have been observed to achieve high classification performance for one- and two-dimensional signals. In this context, most researchers have tried to classify hyperspectral images by using deep learning methods, and classification success over 90% has been achieved for these images. Deep neural networks (DNN) actually consist of two parts: i) a convolutional neural network (CNN) and ii) a fully connected neural network (FCNN). While the CNN determines the features, the FCNN is used for classification. In the classification of hyperspectral images, almost all researchers have used 2D or 3D convolution filters on the spatial data in addition to the spectral data (features). It is convenient to use convolution filters on images or time signals. In hyperspectral images, each pixel is represented by a signature vector which consists of individual features that are independent of each other. Since the order of the features in the vector can be changed, it does not make sense to use convolution filters on these features as on time signals. At the same time, since hyperspectral images do not have a textural structure, there is no need to use spatial data in addition to spectral data. In this study, the hyperspectral images of Indian Pines, Salinas, Pavia Centre, Pavia University and Botswana are classified by using only a fully connected neural network and the one-dimensional spectral data. An average accuracy of 97.5% is achieved for the test sets of all hyperspectral images.
# CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation ( http://arxiv.org/abs/2201.02831v1 )

License: see the linked page
Domain Adaptation (DA) has recently attracted strong interest in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithms for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice - VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice - VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.
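The Dice scores quoted above measure the overlap between a predicted and a reference segmentation. A minimal sketch of the standard Dice coefficient follows; the flat-list mask representation, the function name `dice_score`, and the convention for two empty masks are assumptions made for illustration, not details of the challenge's evaluation code.

```python
def dice_score(pred, truth):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks: 3 foreground voxels each, 2 of them shared.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]
score = dice_score(pred, truth)
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground voxel.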
# Low-Rank Constraints for Fast Inference in Structured Models

Low-Rank Constraints for Fast Inference in Structured Models ( http://arxiv.org/abs/2201.02715v1 )

License: see the linked page
Structured distributions, i.e. distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by the high computational and memory complexity with respect to the size of the latent representations. Common models such as Hidden Markov Models (HMMs) and Probabilistic Context-Free Grammars (PCFGs) require time and space quadratic and cubic in the number of hidden states respectively. This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models. We show that by viewing the central inference step as a matrix-vector product and using a low-rank constraint, we can trade off model expressivity and speed via the rank. Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces while providing practical speedups.
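The idea of trading expressivity for speed through a rank can be illustrated on a single HMM forward step. The sketch below assumes the transition matrix factors as A = U Vᵀ with rank r, so the step costs O(n·r) instead of O(n²); the factors `U` and `V`, the function names, and the omission of the emission term are illustrative assumptions, not the paper's parameterization.

```python
def forward_step_full(alpha, A):
    """One unnormalized forward step alpha' = alpha @ A, costing O(n^2)."""
    n = len(alpha)
    return [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(len(A[0]))]


def forward_step_low_rank(alpha, U, V):
    """Same step with A = U @ V^T, costing O(n * r)."""
    n, r = len(U), len(U[0])
    # Project the state vector onto the r rank dimensions.
    z = [sum(alpha[i] * U[i][k] for i in range(n)) for k in range(r)]
    # Expand back to the n hidden states.
    return [sum(z[k] * V[j][k] for k in range(r)) for j in range(len(V))]


# Hypothetical rank-2 factors of a 3-state, row-stochastic transition matrix.
U = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
V = [[0.6, 0.1], [0.3, 0.2], [0.1, 0.7]]
A = [[sum(U[i][k] * V[j][k] for k in range(2)) for j in range(3)] for i in range(3)]
alpha = [0.2, 0.5, 0.3]

full = forward_step_full(alpha, A)
fast = forward_step_low_rank(alpha, U, V)
```

Both functions return the same vector up to floating-point error; the low-rank version never materializes the n×n matrix, which is where the savings appear when n is much larger than r.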
# Coherence-Based Distributed Document Representation Learning for Scientific Documents

Coherence-Based Distributed Document Representation Learning for Scientific Documents ( http://arxiv.org/abs/2201.02846v1 )

License: see the linked page
Distributed document representation is one of the basic problems in natural language processing. Current distributed document representation methods mainly consider the context information of words or sentences. These methods do not take into account the coherence of the document as a whole, e.g., the relation between the paper title and abstract, between headline and description, or between adjacent bodies in the document. Coherence shows whether a document is meaningful, both logically and syntactically, especially in scientific documents (papers, patents, etc.). In this paper, we propose a coupled text pair embedding (CTPE) model to learn the representation of scientific documents, which maintains the coherence of the document with coupled text pairs formed by segmenting the document. First, we divide the document into two parts (e.g., title and abstract), which form a coupled text pair. Then, we adopt negative sampling to construct uncoupled text pairs whose two parts are from different documents. Finally, we train the model to judge whether a text pair is coupled or uncoupled and use the obtained embedding of coupled text pairs as the embedding of documents. We perform experiments on three datasets for one information retrieval task and two recommendation tasks. The experimental results verify the effectiveness of the proposed CTPE model.
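The pair-construction steps described above can be sketched as follows. This is a minimal illustration, assuming each document is already split into a (title, abstract) tuple; the function name `build_text_pairs`, the `num_negatives` parameter, and the uniform sampling of negatives are hypothetical choices, not the CTPE authors' implementation.

```python
import random


def build_text_pairs(docs, num_negatives=1, seed=0):
    """Return (first_part, second_part, label) triples.

    label 1: both parts come from the same document (coupled pair).
    label 0: the second part comes from a different document (uncoupled pair).
    """
    rng = random.Random(seed)
    pairs = []
    for i, (title, abstract) in enumerate(docs):
        pairs.append((title, abstract, 1))
        # Negative sampling: pair this title with abstracts of other documents.
        others = [j for j in range(len(docs)) if j != i]
        for j in rng.sample(others, min(num_negatives, len(others))):
            pairs.append((title, docs[j][1], 0))
    return pairs


# Toy corpus with hypothetical titles and abstracts.
docs = [
    ("Title A", "Abstract A"),
    ("Title B", "Abstract B"),
    ("Title C", "Abstract C"),
]
pairs = build_text_pairs(docs, num_negatives=1)
```

A binary classifier trained on these triples would then learn to separate coupled from uncoupled pairs, which is the training signal the abstract describes.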
# Real-time Rail Recognition Based on 3D Point Clouds

Real-time Rail Recognition Based on 3D Point Clouds ( http://arxiv.org/abs/2201.02726v1 )

License: see the linked page
Accurate rail location is a crucial part of safety monitoring in railway driving-support systems. LiDAR can obtain point clouds that carry 3D information about the railway environment, especially in darkness and severe weather conditions. In this paper, a real-time rail recognition method based on 3D point clouds is proposed to address challenges such as the disorder, uneven density, and large volume of the point clouds. A voxel down-sampling method is first presented to balance the density of railway point clouds, and a pyramid partition is designed to divide the 3D scanning area into voxels of different volumes. Then, a feature encoding module is developed to find the nearest neighbor points and aggregate their local geometric features for the center point. Finally, a multi-scale neural network is proposed to generate the prediction results for each voxel and the rail location. The experiments are conducted on 9 sequences of 3D point cloud data for the railway. The results show that the method performs well in detecting straight rails, curved rails, and rails with other complex topologies.
# Relieving Long-tailed Instance Segmentation via Pairwise Class Balance

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance ( http://arxiv.org/abs/2201.02784v1 )

License: see the linked page
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes. This imbalance causes severe bias toward the head classes (those with the majority of samples) and against the tail classes. This makes "how to appropriately define and alleviate the bias" one of the most important issues. Prior works mainly use label distribution or mean score information to indicate a coarse-grained bias. In this paper, we exploit the confusion matrix, which carries fine-grained misclassification details, to relieve pairwise biases, generalizing the coarse-grained one. To this end, we propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix that is updated during training to accumulate the ongoing prediction preferences. PCB generates fightback soft labels for regularization during training. In addition, an iterative learning paradigm is developed to support progressive and smooth regularization in such debiasing. PCB can be plugged into any existing method as a complement. Experimental results on LVIS demonstrate that our method achieves state-of-the-art performance without bells and whistles. Superior results across various architectures show its generalization ability.
# Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence

Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence ( http://arxiv.org/abs/2201.02799v1 )

License: see the linked page
Automated monitoring of dark web (DW) platforms on a large scale is the first step toward developing proactive Cyber Threat Intelligence (CTI). While there are efficient methods for collecting data from the surface web, large-scale dark web data collection is often hindered by anti-crawling measures. In particular, text-based CAPTCHA serves as the most prevalent and prohibiting type of these measures in the dark web. Text-based CAPTCHA identifies and blocks automated crawlers by forcing the user to enter a combination of hard-to-recognize alphanumeric characters. In the dark web, CAPTCHA images are meticulously designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing automated CAPTCHA breaking methods have difficulties in overcoming these dark web challenges. As such, solving dark web text-based CAPTCHA has been relying heavily on human involvement, which is labor-intensive and time-consuming. In this study, we propose a novel framework for automated breaking of dark web CAPTCHA to facilitate dark web data collection. This framework encompasses a novel generative method to recognize dark web text-based CAPTCHA with noisy background and variable character length. To eliminate the need for human involvement, the proposed framework utilizes Generative Adversarial Network (GAN) to counteract dark web background noise and leverages an enhanced character segmentation algorithm to handle CAPTCHA images with variable character length. Our proposed framework, DW-GAN, was systematically evaluated on multiple dark web CAPTCHA testbeds. DW-GAN significantly outperformed the state-of-the-art benchmark methods on all datasets, achieving over 94.4% success rate on a carefully collected real-world dark web dataset...
# Mushrooms Detection, Localization and 3D Pose Estimation using RGB-D Sensor for Robotic-picking Applications

Mushrooms Detection, Localization and 3D Pose Estimation using RGB-D Sensor for Robotic-picking Applications ( http://arxiv.org/abs/2201.02837v1 )

License: see the linked page
In this paper, we propose a mushroom detection, localization, and 3D pose estimation algorithm using RGB-D data acquired from a low-cost consumer RGB-D sensor. We use the RGB and depth information for different purposes. From the RGB image, we first extract initial contour locations of the mushrooms and then provide both the initial contour locations and the original image to an active contour model for mushroom segmentation. The segmented mushrooms are then used as input to a circular Hough transform, which detects each mushroom along with its center and radius. Once each mushroom's center position in the RGB image is known, we use the depth information to locate it in 3D space, i.e., in the world coordinate system. If depth information is missing at the detected center of a mushroom, we estimate it from the nearest available depth information within the radius of that mushroom. We also estimate the 3D pose of each mushroom using a pre-prepared upright mushroom model. For this 3D pose estimation, we use a global registration followed by a local refinement registration. From the estimated 3D pose, we use only the rotation part, expressed as a quaternion, as the orientation of each mushroom. The estimated (X,Y,Z) positions, diameters, and orientations of the mushrooms are used for robotic-picking applications. We carry out extensive experiments on both 3D printed and real mushrooms, which show that our method achieves interesting performance.
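The fallback for missing depth at a detected center can be sketched as a nearest-valid-pixel search inside the detected circle. The sketch assumes the depth map is a row-major 2D list in which `0` marks a missing reading; the function name `depth_at_center`, the `missing` sentinel, and the tie-breaking order are illustrative assumptions rather than the authors' code.

```python
def depth_at_center(depth, cx, cy, radius, missing=0):
    """Return the depth at (cx, cy), or the nearest valid depth within `radius`.

    Returns None when no valid reading exists inside the circle.
    """
    if depth[cy][cx] != missing:
        return depth[cy][cx]
    height, width = len(depth), len(depth[0])
    best_value, best_dist2 = None, None
    # Scan the bounding box of the circle, clipped to the image borders.
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            dist2 = (x - cx) ** 2 + (y - cy) ** 2
            if dist2 > radius * radius or depth[y][x] == missing:
                continue
            if best_dist2 is None or dist2 < best_dist2:
                best_value, best_dist2 = depth[y][x], dist2
    return best_value


# Toy 3x3 depth map (millimetres) with a hole at the center pixel.
depth_map = [
    [0, 0, 0],
    [0, 0, 520],
    [0, 0, 0],
]
estimate = depth_at_center(depth_map, cx=1, cy=1, radius=1)
```

Restricting the search to the detected radius keeps the estimate on the mushroom cap itself instead of borrowing depth from the background.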
# Image-based Automatic Dial Meter Reading in Unconstrained Scenarios

Image-based Automatic Dial Meter Reading in Unconstrained Scenarios ( http://arxiv.org/abs/2201.02850v1 )

License: see the linked page
The replacement of analog meters with smart meters is costly, laborious, and far from complete in developing countries. The Energy Company of Parana (Copel) (Brazil) performs more than 4 million meter readings (almost entirely of non-smart devices) per month, and we estimate that 850 thousand of them are from dial meters. Therefore, an image-based automatic reading system can reduce human errors, create a proof of reading, and enable the customers to perform the reading themselves through a mobile application. We propose novel approaches for Automatic Dial Meter Reading (ADMR) and introduce a new dataset for ADMR in unconstrained scenarios, called UFPR-ADMR-v2. Our best-performing method combines YOLOv4 with a novel regression approach (AngReg), and explores several postprocessing techniques. Compared to previous works, it decreased the Mean Absolute Error (MAE) from 1,343 to 129 and achieved a meter recognition rate (MRR) of 98.90% -- with an error tolerance of 1 Kilowatt-hour (kWh).
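The two metrics reported above can be computed as in the sketch below. It assumes readings are integers in kWh and that a meter counts as recognized when the absolute error does not exceed the tolerance; whether the paper's tolerance is inclusive is an assumption, and the function names are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true readings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def meter_recognition_rate(predicted, actual, tolerance=1):
    """Fraction of meters whose reading is within `tolerance` kWh of the truth."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual)


# Toy readings in kWh (not data from the paper).
predicted = [1234, 5678, 9012, 3455]
actual = [1234, 5679, 9112, 3455]

mae = mean_absolute_error(predicted, actual)
mrr = meter_recognition_rate(predicted, actual, tolerance=1)
```

In this toy example a single misread dial (off by 100 kWh) dominates the MAE while barely moving the recognition rate, which is one reason the two metrics are worth reporting together.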
# Decoupling Makes Weakly Supervised Local Feature Better

Decoupling Makes Weakly Supervised Local Feature Better ( http://arxiv.org/abs/2201.02861v1 )

License: see the linked page
Weakly supervised learning can help local feature methods to overcome the obstacle of acquiring a large-scale dataset with densely labeled correspondences. However, since weak supervision cannot distinguish the losses caused by the detection and description steps, directly conducting weakly supervised learning within a joint describe-then-detect pipeline suffers limited performance. In this paper, we propose a decoupled describe-then-detect pipeline tailored for weakly supervised local feature learning. Within our pipeline, the detection step is decoupled from the description step and postponed until discriminative and robust descriptors are learned. In addition, we introduce a line-to-window search strategy to explicitly use the camera pose information for better descriptor learning. Extensive experiments show that our method, namely PoSFeat (Camera Pose Supervised Feature), outperforms previous fully and weakly supervised methods and achieves state-of-the-art performance on a wide range of downstream tasks.
# A Fair and Efficient Hybrid Federated Learning Framework based on XGBoost for Distributed Power Prediction

A Fair and Efficient Hybrid Federated Learning Framework based on XGBoost for Distributed Power Prediction ( http://arxiv.org/abs/2201.02783v1 )

License: see the linked page
In a modern power system, real-time data on power generation/consumption and its relevant features are stored in various distributed parties, including household meters, transformer stations and external organizations. To fully exploit the underlying patterns of these distributed data for accurate power prediction, federated learning is needed as a collaborative but privacy-preserving training scheme. However, current federated learning frameworks are polarized towards addressing either the horizontal or vertical separation of data, and tend to overlook the case where both are present. Furthermore, in mainstream horizontal federated learning frameworks, only artificial neural networks are employed to learn the data patterns, which are considered less accurate and interpretable compared to tree-based models on tabular datasets. To this end, we propose a hybrid federated learning framework based on XGBoost, for distributed power prediction from real-time external features. In addition to introducing boosted trees to improve accuracy and interpretability, we combine horizontal and vertical federated learning, to address the scenario where features are scattered in local heterogeneous parties and samples are scattered in various local districts. Moreover, we design a dynamic task allocation scheme such that each party gets a fair share of information, and the computing power of each party can be fully leveraged to boost training efficiency. A follow-up case study is presented to justify the necessity of adopting the proposed framework. The advantages of the proposed framework in fairness, efficiency and accuracy performance are also confirmed.
# LoMar: A Local Defense Against Poisoning Attack on Federated Learning

LoMar: A Local Defense Against Poisoning Attack on Federated Learning ( http://arxiv.org/abs/2201.02873v1 )

License: see the linked page
Federated learning (FL) provides a highly efficient decentralized machine learning framework in which the training data remains distributed across remote clients in a network. Though FL enables a privacy-preserving mobile edge computing framework using IoT devices, recent studies have shown that this approach is susceptible to poisoning attacks from remote clients. To address poisoning attacks on FL, we provide a two-phase defense algorithm called Local Malicious Factor (LoMar). In phase I, LoMar scores the model update from each remote client by measuring its relative distribution over its neighbors using a kernel density estimation method. In phase II, an optimal threshold is approximated to distinguish malicious from clean updates from a statistical perspective. Comprehensive experiments on four real-world datasets have been conducted, and the experimental results show that our defense strategy can effectively protect the FL system. Specifically, the defense performance on the Amazon dataset under a label-flipping attack indicates that, compared with FG+Krum, LoMar increases the target label testing accuracy from 96.0% to 98.8%, and the overall averaged testing accuracy from 90.1% to 97.0%.
# Provable Clustering of a Union of Linear Manifolds Using Optimal Directions

Provable Clustering of a Union of Linear Manifolds Using Optimal Directions ( http://arxiv.org/abs/2201.02745v1 )

License: see the linked page
This paper focuses on the Matrix Factorization based Clustering (MFC) method, which is one of the few closed-form algorithms for the subspace clustering problem. Despite being simple, closed-form, and computationally efficient, MFC can outperform other, more sophisticated subspace clustering methods in many challenging scenarios. We reveal the connection between MFC and the Innovation Pursuit (iPursuit) algorithm, which was shown to outperform other spectral clustering based methods by a notable margin, especially when the spans of the clusters are close. A novel theoretical study is presented which sheds light on the key performance factors of both algorithms (MFC/iPursuit), and it is shown that both algorithms can be robust to notable intersections between the spans of the clusters. Importantly, in contrast to the theoretical guarantees of other algorithms, which emphasize the distance between the subspaces as the key performance factor and do not make the innovation assumption, it is shown that the performance of MFC/iPursuit mainly depends on the distance between the innovative components of the clusters.
# Optimal 1-Wasserstein Distance for WGANs

Optimal 1-Wasserstein Distance for WGANs ( http://arxiv.org/abs/2201.02824v1 )

License: see the linked page
The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.
# Hyperspectral Image Denoising Using Non-convex Local Low-rank and Sparse Separation with Spatial-Spectral Total Variation Regularization

Hyperspectral Image Denoising Using Non-convex Local Low-rank and Sparse Separation with Spatial-Spectral Total Variation Regularization ( http://arxiv.org/abs/2201.02812v1 )

License: see the linked page
In this paper, we propose a novel nonconvex approach to robust principal component analysis for HSI denoising, which focuses on simultaneously developing more accurate approximations to both rank and column-wise sparsity for the low-rank and sparse components, respectively. In particular, the new method adopts the log-determinant rank approximation and a novel $\ell_{2,\log}$ norm, to restrict the local low-rank or column-wisely sparse properties for the component matrices, respectively. For the $\ell_{2,\log}$-regularized shrinkage problem, we develop an efficient, closed-form solution, which is named $\ell_{2,\log}$-shrinkage operator. The new regularization and the corresponding operator can be generally used in other problems that require column-wise sparsity. Moreover, we impose the spatial-spectral total variation regularization in the log-based nonconvex RPCA model, which enhances the global piece-wise smoothness and spectral consistency from the spatial and spectral views in the recovered HSI. Extensive experiments on both simulated and real HSIs demonstrate the effectiveness of the proposed method in denoising HSIs.
# Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision

Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision ( http://arxiv.org/abs/2201.02885v1 )

License: see the linked page
UAV-based image retrieval in modern agriculture enables gathering large amounts of spatially referenced crop image data. In large-scale experiments, however, UAV images contain a large number of crops in a complex canopy architecture. Especially for the observation of temporal effects, this tremendously complicates the recognition of individual plants over several images and the extraction of relevant information. In this work, we present a hands-on workflow for the automated temporal and spatial identification and individualization of crop images from UAVs, abbreviated as "cataloging", based on comprehensible computer vision methods. We evaluate the workflow on two real-world datasets. One dataset was recorded for the observation of Cercospora leaf spot - a fungal disease - in sugar beet over an entire growing cycle. The other deals with harvest prediction of cauliflower plants. The plant catalog is used to extract images of single plants seen over multiple time points. This yields a large-scale spatio-temporal image dataset that in turn can be applied to train further machine learning models including various data layers. The presented approach significantly improves the analysis and interpretation of UAV data in agriculture. In validation against reference data, our method shows an accuracy that is similar to more complex deep learning-based recognition techniques. Our workflow is able to automate plant cataloging and training image extraction, especially for large datasets.
# DeHIN: 大規模な異種情報ネットワークを組み込む分散型フレームワーク

DeHIN: A Decentralized Framework for Embedding Large-scale Heterogeneous Information Networks ( http://arxiv.org/abs/2201.02757v1 )

License: see the linked page
Modeling heterogeneity by extraction and exploitation of high-order information from heterogeneous information networks (HINs) has been attracting immense research attention in recent times. Such heterogeneous network embedding (HNE) methods effectively harness the heterogeneity of small-scale HINs. However, in the real world, the size of HINs grows exponentially with the continuous introduction of new nodes and different types of links, making them billion-scale networks. Learning node embeddings on such HINs creates a performance bottleneck for existing HNE methods that are commonly centralized, i.e., complete data and the model are both on a single machine. To address large-scale HNE tasks with strong efficiency and effectiveness guarantees, we present the Decentralized Embedding Framework for Heterogeneous Information Network (DeHIN) in this paper. In DeHIN, we generate a distributed parallel pipeline that utilizes hypergraphs in order to infuse parallelization into the HNE task. DeHIN presents a context-preserving partition mechanism that innovatively formulates a large HIN as a hypergraph, whose hyperedges connect semantically similar nodes. Our framework then adopts a decentralized strategy to efficiently partition HINs through a tree-like pipeline. Then, each resulting subnetwork is assigned to a distributed worker, which employs the deep information maximization theorem to locally learn node embeddings from the partition it receives. We further devise a novel embedding alignment scheme to precisely project independently learned node embeddings from all subnetworks onto a common vector space, thus allowing for downstream tasks like link prediction and node classification.
# Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer ( http://arxiv.org/abs/2201.02761v1 )

License: see the linked page
In this paper, we follow Eftekhari's work to give a non-local convergence analysis of deep linear networks. Specifically, we consider optimizing deep linear networks which have a layer with one neuron under quadratic loss. We describe the convergent point of trajectories with an arbitrary starting point under gradient flow, including the paths which converge to one of the saddle points or the original point. We also show specific convergence rates of trajectories that converge to the global minimizer by stages. To achieve these results, this paper mainly extends the machinery in Eftekhari's work to provably identify the rank-stable set and the global minimizer convergent set. We also give specific examples to show the necessity of our definitions. Crucially, as far as we know, our results appear to be the first to give a non-local global analysis of linear neural networks from arbitrarily initialized points, rather than in the lazy training regime which has dominated the literature of neural networks, or under the restricted benign initialization in Eftekhari's work. We also note that extending our results to general linear networks without the one-hidden-neuron assumption remains a challenging open problem.
# Scaling Knowledge Graph Embedding Models

Scaling Knowledge Graph Embedding Models ( http://arxiv.org/abs/2201.02791v1 )

License: see the linked page
Developing scalable solutions for training Graph Neural Networks (GNNs) for link prediction tasks is challenging due to the high data dependencies, which entail high computational cost and a huge memory footprint. We propose a new method for scaling training of knowledge graph embedding models for link prediction to address these challenges. Towards this end, we propose the following algorithmic strategies: self-sufficient partitions, constraint-based negative sampling, and edge mini-batch training. Both the partitioning strategy and constraint-based negative sampling avoid cross-partition data transfer during training. In our experimental evaluation, we show that our scaling solution for GNN-based knowledge graph embedding models achieves a 16x speedup on benchmark datasets while maintaining model performance comparable to non-distributed methods on standard metrics.
# PocketNN: Integer-only Training and Inference of Neural Networks via Direct Feedback Alignment and Pocket Activations in Pure C++
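The idea that negative sampling can be constrained so that no cross-partition transfer is needed can be illustrated with a minimal sketch. It assumes, purely for illustration, that the constraint is "corrupt the tail only with entities already present in the local partition"; the abstract does not spell out the actual constraint, and the function name `local_negative_samples` and its parameters are hypothetical.

```python
import random


def local_negative_samples(triples, num_negatives, rng):
    """Corrupt the tail of each (head, relation, tail) triple using only
    entities that already live in this partition.

    Assumption for illustration: restricting candidates to local entities
    is what keeps sampling free of cross-partition lookups.
    """
    # Entities visible inside this partition (sorted for reproducibility).
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    known = set(triples)
    negatives = []
    for h, r, t in triples:
        # Candidate tails: local entities that do not form a known positive.
        candidates = [e for e in entities if e != t and (h, r, e) not in known]
        k = min(num_negatives, len(candidates))
        negatives.extend((h, r, e) for e in rng.sample(candidates, k))
    return negatives


partition = [("a", "likes", "b"), ("b", "likes", "c"), ("c", "knows", "a")]
negatives = local_negative_samples(partition, 1, random.Random(0))
```

Every negative triple produced this way refers only to entities stored with the partition, so a worker could score it without fetching embeddings from another machine.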

PocketNN: Integer-only Training and Inference of Neural Networks via Direct Feedback Alignment and Pocket Activations in Pure C++ ( http://arxiv.org/abs/2201.02863v1 )

License: see the linked page
Standard deep learning algorithms are implemented using floating-point real numbers. This presents an obstacle for implementing them on low-end devices which may not have dedicated floating-point units (FPUs). As a result, researchers in TinyML have considered machine learning algorithms that can train and run a deep neural network (DNN) on a low-end device using integer operations only. In this paper we propose PocketNN, a light and self-contained proof-of-concept framework in pure C++ for the training and inference of DNNs using only integers. Unlike other approaches, PocketNN directly operates on integers without requiring any explicit quantization algorithms or customized fixed-point formats. This was made possible by pocket activations, which are a family of activation functions devised for integer-only DNNs, and an emerging DNN training algorithm called direct feedback alignment (DFA). Unlike standard backpropagation (BP), DFA trains each layer independently, thus avoiding integer overflow, which is a key problem when using BP with integer-only operations. We used PocketNN to train some DNNs on two well-known datasets, MNIST and Fashion-MNIST. Our experiments show that the DNNs trained with PocketNN achieved accuracies of 96.98% and 87.7% on the MNIST and Fashion-MNIST datasets, respectively. These accuracies are very close to those of the equivalent DNNs trained using BP with floating-point real number operations: the accuracy degradations were just 1.02 and 2.09 percentage points, respectively. Finally, PocketNN has high compatibility and portability for low-end devices, as it is open source and implemented in pure C++ without any dependencies.
# Attention-based Random Forest and Contamination Model

Attention-based Random Forest and Contamination Model ( http://arxiv.org/abs/2201.02880v1 )

License: see the linked page
A new approach called ABRF (the attention-based random forest) and its modifications for applying the attention mechanism to the random forest (RF) for regression and classification are proposed. The main idea behind the proposed ABRF models is to assign attention weights with trainable parameters to decision trees in a specific way. The weights depend on the distance between an instance that falls into a given leaf of a tree and the other instances that fall into the same leaf. This idea stems from the representation of the Nadaraya-Watson kernel regression in the form of an RF. Three modifications of the general approach are proposed. The first one is based on applying Huber's contamination model and on computing the attention weights by solving quadratic or linear optimization problems. The second and the third modifications use gradient-based algorithms for computing trainable parameters. Numerical experiments with various regression and classification datasets illustrate the proposed method.
# A Sneak Attack on Segmentation of Medical Images Using Deep Neural Network Classifiers
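The link between Nadaraya-Watson kernel regression and attention can be made concrete with a small sketch. With a Gaussian kernel, the normalized kernel weights are exactly a softmax over negative scaled squared distances. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration; this is plain kernel regression, not the ABRF model itself, which assigns such weights to trees and adds trainable parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_weights(query, xs, bandwidth=1.0):
    """Gaussian-kernel weights: softmax of -||query - x||^2 / (2 * bandwidth^2)."""
    scores = [
        -sum((q - a) ** 2 for q, a in zip(query, x)) / (2.0 * bandwidth ** 2)
        for x in xs
    ]
    return softmax(scores)


def nadaraya_watson(query, xs, ys, bandwidth=1.0):
    """Predict as the attention-weighted average of the training targets."""
    weights = attention_weights(query, xs, bandwidth)
    return sum(w * y for w, y in zip(weights, ys))


# A query equidistant from both training points gets equal weights.
prediction = nadaraya_watson([0.0], [[-1.0], [1.0]], [0.0, 2.0])
```

Shrinking `bandwidth` concentrates the weight on the nearest training point, which is the sense in which distance controls attention here.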

A Sneak Attack on Segmentation of Medical Images Using Deep Neural Network Classifiers ( http://arxiv.org/abs/2201.02771v1 )

License: see the linked page
Instead of using current deep-learning segmentation models (like the UNet and variants), we approach the segmentation problem using trained Convolutional Neural Network (CNN) classifiers, which automatically extract important features from classified targets for image classification. Those extracted features can be visualized as heatmaps using Gradient-weighted Class Activation Mapping (Grad-CAM). This study tested whether the heatmaps could be used to segment the classified targets. We also proposed an evaluation method for the heatmaps; that is, to re-train the CNN classifier using images filtered by heatmaps and examine its performance. We used the mean-Dice coefficient to evaluate segmentation results. Results from our experiments show that heatmaps can locate and segment partial tumor areas. However, using only the heatmaps from CNN classifiers may not be an optimal approach for segmentation. In addition, we have verified that the predictions of CNN classifiers mainly depend on tumor areas, and that dark regions in Grad-CAM's heatmaps also contribute to classification.
Translated: 2022-01-11 14:21:41 Published: 2022-01-08
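The evaluation step can be sketched as follows: a heatmap is thresholded into a binary mask and compared with a ground-truth mask using the Dice coefficient, 2|A ∩ B| / (|A| + |B|). The threshold value of 0.5, the nested-list representation, and the function names are assumptions for illustration only; the abstract does not state how heatmaps were binarized.

```python
def heatmap_to_mask(heatmap, threshold=0.5):
    """Binarize a heatmap with values in [0, 1]; the threshold is an assumed choice."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(flat_p, flat_t) if p and t)
    total = sum(1 for p in flat_p if p) + sum(1 for t in flat_t if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pairs):
    """Average the Dice coefficient over (predicted mask, true mask) pairs."""
    scores = [dice_coefficient(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


heatmap = [[0.9, 0.2], [0.6, 0.1]]
truth = [[1, 1], [0, 0]]
score = dice_coefficient(heatmap_to_mask(heatmap), truth)
```

In this toy case the thresholded mask shares one of its two foreground pixels with the ground truth, which mirrors the partial overlap the abstract reports for tumor areas.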