Fugu-MT: arXiv Paper Translations

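This site publishes Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.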

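The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).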

Papers with a listed publication date of 20230403.

Title	Authors	Abstract	Publication date / translation date
# 生理・医療系ニューラルネットワークにおけるモデル説明可能性 Model Explainability in Physiological and Healthcare-based Neural Networks ( http://arxiv.org/abs/2304.14495v1 ) ライセンス: Link先を確認	Rohit Sharma, Abhinav Gupta, Arnav Gupta, Bo Li	(参考訳) spo2の推定とモニタリングは肺機能の評価と慢性肺疾患の治療に不可欠である。新型コロナウイルス(covid-19)のパンデミックは、spo2の変化を早期に発見することの重要性を強調した。しかし,従来のSpO2測定法は接触式センシングに頼っており,手足灌流障害患者のクロス汚染や合併症のリスクが指摘されている。加えて、パルスオキシメータは、地域社会や未開発国では利用できない。これらの制限に対処し、より快適で控えめなSpO2モニタリング方法を提供するため、最近の研究では、ビデオを用いたSpO2測定について研究されている。しかし,特にスマートフォンのカメラを用いたSpO2測定は,生理学的信号の弱さと,スマートフォンカメラセンサの光学選択性低下により困難である。システムには3つの主要なステップがある。 1) スマートフォンで撮影したビデオから手のひらと背中を含む関心領域(ROI)を抽出すること。 2)R,G,B時系列を生成するためのROIの空間平均化 3) 時系列を光生理学的に誘発されたCNNに入力し, SpO2推定を行った。提案手法は,消費者のスマートフォンから撮影したビデオを用いて,より効率的かつ正確なSpO2モニタリングを行う方法であり,遠隔医療や健康診断に特に有用である。 The estimation and monitoring of SpO2 are crucial for assessing lung function and treating chronic pulmonary diseases. The COVID-19 pandemic has highlighted the importance of early detection of changes in SpO2, particularly in asymptomatic patients with clinical deterioration. However, conventional SpO2 measurement methods rely on contact-based sensing, presenting the risk of cross-contamination and complications in patients with impaired limb perfusion. Additionally, pulse oximeters may not be available in marginalized communities and undeveloped countries. To address these limitations and provide a more comfortable and unobtrusive way to monitor SpO2, recent studies have investigated SpO2 measurement using videos. However, measuring SpO2 using cameras in a contactless way, particularly from smartphones, is challenging due to weaker physiological signals and lower optical selectivity of smartphone camera sensors. 
The system includes three main steps: 1) extraction of the region of interest (ROI), which includes the palm and back of the hand, from the smartphone-captured videos; 2) spatial averaging of the ROI to produce R, G, and B time series; and 3) feeding the time series into an optophysiology-inspired CNN for SpO2 estimation. Our proposed method can provide a more efficient and accurate way to monitor SpO2 using videos captured from consumer-grade smartphones, which can be especially useful in telehealth and health screening settings.	翻訳日:2023-05-07 16:21:59 公開日:2023-04-03
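Step 2 of the pipeline described above (spatial averaging of the ROI into R, G, and B time series) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `roi_to_rgb_series`, the `(T, H, W, 3)` array layout, and the boolean ROI mask are all assumptions made for the example.

```python
import numpy as np

def roi_to_rgb_series(frames: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average the pixels inside a region of interest for every frame.

    frames: video array of shape (T, H, W, 3), channels ordered R, G, B
            (assumed layout, not taken from the paper).
    mask:   boolean array of shape (H, W); True marks ROI pixels
            (hypothetical hand mask).
    Returns an array of shape (T, 3): one R, G, B mean per frame.
    """
    if not mask.any():
        raise ValueError("ROI mask is empty")
    # Boolean indexing over the two spatial axes yields shape (T, N_roi, 3).
    roi_pixels = frames[:, mask, :]
    # Averaging over the ROI pixels leaves one RGB triple per frame.
    return roi_pixels.mean(axis=1)
```

The resulting `(T, 3)` array is the kind of time series that a downstream model (step 3) would consume; how the ROI mask is obtained (step 1) is outside this sketch.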
# Transformer-based interpretable multi-modal data fusion による皮膚病変分類 Transformer-based interpretable multi-modal data fusion for skin lesion classification ( http://arxiv.org/abs/2304.14505v1 ) ライセンス: Link先を確認	Theodor Cheslerean-Boghiu, Melia-Evelina Fleischmann, Theresa Willem, Tobias Lasser	(参考訳) 近年、多くのディープラーニング(DL)研究が、他の要因に関わらず定量的指標の改善に重点を置いている。皮膚科における皮膚病変分類のようなヒト中心のアプリケーションでは、DL駆動の臨床決定支援システムは、意思決定プロセスの透明性が限られているため、まだ初期段階にある。さらに、訓練されたDLアルゴリズムの動作を説明するための手順が欠如しているため、臨床医の信頼はほとんど得られていない。皮膚病変の診断には、皮膚科医は疾患の視覚的評価と患者の問診(病歴聴取)から収集されたデータの両方に依存している。マルチモーダルデータを扱うデータ駆動アルゴリズムは、畳み込みアーキテクチャに必要な特徴レベルと決定レベルの融合手順の分離によって制限される。この問題に対処するため,トランスフォーマーアーキテクチャのアテンション機構を介し,単一段階のマルチモーダルデータ融合を実現し,皮膚疾患の診断に役立てる。本手法は,画像リッチおよび患者データリッチ環境において,最先端のシングルモーダルおよびマルチモーダルなDLアーキテクチャを上回る。さらに、アーキテクチャの選択により、イメージドメインとメタデータドメインの両方で、追加の修正を必要とせずに、分類タスクのネイティブ解釈サポートが可能になる。 A lot of deep learning (DL) research these days is mainly focused on improving quantitative metrics regardless of other factors. In human-centered applications, like skin lesion classification in dermatology, DL-driven clinical decision support systems are still in their infancy due to the limited transparency of their decision-making process. Moreover, the lack of procedures that can explain the behavior of trained DL algorithms leads to almost no trust from the clinical physicians. To diagnose skin lesions, dermatologists rely on both visual assessment of the disease and the data gathered from the anamnesis of the patient. Data-driven algorithms dealing with multi-modal data are limited by the separation of feature-level and decision-level fusion procedures required by convolutional architectures. To address this issue, we enable single-stage multi-modal data fusion via the attention mechanism of transformer-based architectures to aid in the diagnosis of skin diseases. Our method beats other state-of-the-art single- and multi-modal DL architectures in both image-rich and patient-data-rich environments. 
Additionally, the choice of the architecture enables native interpretability support for the classification task both in image and metadata domain with no additional modifications necessary.	翻訳日:2023-05-07 16:13:11 公開日:2023-04-03
# コンパクト支持型OEP型バランス型デュアルマルチフレームレットの構造的特徴付け A structural characterization of Compactly Supported OEP-based balanced dual multiframelets ( http://arxiv.org/abs/2305.01641v1 ) ライセンス: Link先を確認	Ran Lu	(参考訳) スカラーフレームレットと比較して、ジェネレータに対する比較的小さなサポート、高消滅モーメントなど、マルチフレームレットには一定の利点がある。マルチフレームレットのバランス特性は非常に望ましいものであり、それに対応する離散マルチフレームレット変換の下でベクトル値データがどれだけ効率的に処理できるかを反映している。バランスの取れたマルチフレームレットを研究対象とする文献の多くは、関数設定の観点からのものであるが、マルチフレームレットフィルタバンクの観点からのアプローチはほとんどない。本稿では,斜め拡張原理(OEP)の観点から,バランスの取れたデュアル・マルチフレームレットの構造的特徴付けについて考察する。 OEPはフレームレットとフィルタバンクを自然に接続するので、フレームレットの特性を分析するのに非常に便利なツールである。 OEPにより、我々は、コンパクトに支持されたバランスの取れたデュアル・マルチフレームレットを、本研究で導入する鍵となる概念であるバランスの取れたモーメント補正フィルタによって特徴付ける。本稿の結果は、バランスの取れたデュアル・マルチフレームレットが持つ本質的な構造を最も一般的な設定で示し、バランスの取れたマルチフレームレットとその基盤となる離散マルチフレームレット変換を理解するための、より完全な全体像を提供する。 Compared to scalar framelets, multiframelets have certain advantages, such as relatively smaller supports on generators, high vanishing moments, etc. The balancing property of multiframelets is highly desirable, as it reflects how efficiently vector-valued data can be processed under the corresponding discrete multiframelet transform. Most of the literature studying balanced multiframelets is from the point of view of the function setting, but very few approaches are from the aspect of multiframelet filter banks. In this paper, we study structural characterizations of balanced dual multiframelets from the point of view of the Oblique Extension Principle (OEP). The OEP naturally connects framelets with filter banks, which makes it a very handy tool for analyzing the properties of framelets. With the OEP, we shall characterize compactly supported balanced dual multiframelets through the concept of balanced moment correction filters, which is the key notion that will be introduced in our investigation. The results of this paper demonstrate what essential structures a balanced dual multiframelet has in the most general setting, and bring us a more complete picture to understand balanced multiframelets and their underlying discrete multiframelet transforms.	翻訳日:2023-05-07 15:53:52 公開日:2023-04-03
# RLサイバー操作エージェントのためのマルチエージェントサイバーバトルシム A Multiagent CyberBattleSim for RL Cyber Operation Agents ( http://arxiv.org/abs/2304.11052v1 ) ライセンス: Link先を確認	Thomas Kunz, Christian Fisher, James La Novara-Gsell, Christopher Nguyen, Li Li	(参考訳) サイバー物理的資産の強化は重要かつ労働集約的である。近年、機械学習(ML)全般、とりわけ強化学習(RL)は、本来なら人間の多大な洞察・知性を必要とするタスクの自動化に大きな可能性を示している。自律的なRLエージェントの開発には、さまざまな選択肢、特に攻撃者と防御者を対戦させるトレーニングシナリオの構成方法を迅速に評価できる、適切なトレーニング環境が必要である。 CyberBattleSimは、レッドエージェント、すなわち攻撃者のトレーニングをサポートするトレーニング環境である。我々は、ブルーエージェント、すなわち防御者を訓練する機能を追加した。本論文は,我々の変更点を説明し,ブルーエージェントを単独またはレッドエージェントと共同で訓練した際に得られた結果を報告する。その結果,ブルーエージェントの訓練は攻撃に対する防御力を高めることが判明した。特に、ブルーエージェントをレッドエージェントと共同で訓練すると、洗練されたレッドエージェントを阻止するブルーエージェントの能力が高まる。 Hardening cyber-physical assets is both crucial and labor-intensive. Recently, Machine Learning (ML) in general and Reinforcement Learning (RL) more specifically have shown great promise to automate tasks that otherwise would require significant human insight/intelligence. The development of autonomous RL agents requires a suitable training environment that allows us to quickly evaluate various alternatives, in particular how to arrange training scenarios that pit attackers and defenders against each other. CyberBattleSim is a training environment that supports the training of red agents, i.e., attackers. We added the capability to train blue agents, i.e., defenders. The paper describes our changes and reports on the results we obtained when training blue agents, either in isolation or jointly with red agents. Our results show that training a blue agent does lead to stronger defenses against attacks. In particular, training a blue agent jointly with a red agent increases the blue agent's capability to thwart sophisticated red agents.	翻訳日:2023-04-30 08:04:57 公開日:2023-04-03
# 制御行動模倣のための生成的adversarial neuroevolution Generative Adversarial Neuroevolution for Control Behaviour Imitation ( http://arxiv.org/abs/2304.12432v1 ) ライセンス: Link先を確認	Maximilien Le Clei, Pierre Bellec	(参考訳) 最近の模倣学習への関心は高まり、複雑なタスクでエージェントを訓練するために、巨大な人間のビデオゲームとロボット操作データセットが使われている。近年、深層神経進化は様々な強化学習問題における勾配に基づく技術の性能と一致することが示されているが、深層神経進化技術の模倣学習への応用はいまだに未解明である。本研究では,一般的なシミュレーション環境における行動模倣に深部神経進化が有効かどうかを検討する。我々は,OpenAI Gymの8つの状態ベース制御タスク上で,最先端のエージェントを模倣するために,標準的な深部リカレントネットワークを進化させ,その能力を評価する。あらゆる課題において、訓練済みのエージェントが獲得したスコアよりも高いスコアを達成できる最後のエリートアクターが、スコアの軌跡に忠実に追従しているのが分かる。以上の結果から,神経進化は行動エージェントの正確なエミュレーションを実現するための深層学習技術に重要な付加物となる可能性が示唆された。私たちのアプローチの汎用性とシンプルさは、ますます複雑な設定で複雑な振る舞いを模倣する道を開くと信じています。我々はgithub.com/MaximilienLC/ganeでソースコードとモデルチェックポイントと結果を提供しています。 There is a recent surge in interest for imitation learning, with large human video-game and robotic manipulation datasets being used to train agents on very complex tasks. While deep neuroevolution has recently been shown to match the performance of gradient-based techniques on various reinforcement learning problems, the application of deep neuroevolution techniques to imitation learning remains relatively unexplored. In this work, we propose to explore whether deep neuroevolution can be used for behaviour imitation on popular simulation environments. We introduce a simple co-evolutionary adversarial generation framework, and evaluate its capabilities by evolving standard deep recurrent networks to imitate state-of-the-art pre-trained agents on 8 OpenAI Gym state-based control tasks. Across all tasks, we find the final elite actor agents capable of achieving scores as high as those obtained by the pre-trained agents, all the while closely following their score trajectories. Our results suggest that neuroevolution could be a valuable addition to deep learning techniques to produce accurate emulation of behavioural agents. 
We believe that the generality and simplicity of our approach opens avenues for imitating increasingly complex behaviours in increasingly complex settings, e.g. human behaviour in real-world settings. We provide our source code, model checkpoints and results at github.com/MaximilienLC/gane.	翻訳日:2023-04-30 07:29:03 公開日:2023-04-03
# 制御課題における繰り返しアーキテクチャの神経進化 Neuroevolution of Recurrent Architectures on Control Tasks ( http://arxiv.org/abs/2304.12431v1 ) ライセンス: Link先を確認	Maximilien Le Clei, Pierre Bellec	(参考訳) 現代の人工知能の研究は通常、勾配に基づく最適化技術を用いて固定サイズのディープニューラルネットワークのパラメータを訓練する。単純な進化アルゴリズムは、強化学習の設定など、勾配に基づく技術のパフォーマンスにマッチする時に、ディープニューラルネットワークパラメータを最適化する能力も示されている。ネットワークパラメータの最適化に加えて、多くの進化的計算技術もネットワークアーキテクチャを段階的に構築することができる。しかし、基本的な進化規則からネットワークアーキテクチャを構築することは、現代の強化学習ベンチマークにスケールすることがまだ示されていない。そこで本研究では, 再帰型ニューラルネットワークのアーキテクチャを, 少数の突然変異規則に従って動的に進化させる手法を提案する。我々は並列な進化的アルゴリズムを実装し、19のOpenAI Gym状態に基づく強化学習制御タスクで実験を行う。ほとんどの場合、動的エージェントは、パラメータの桁数を桁違いに減らしながら、勾配に基づくエージェントのパフォーマンスを一致または超過する。我々は、ネットワークのコンパクトさと自律設計が重要である実生活のアプリケーションへの道を開く努力を信じている。私たちはgithub.com/MaximilienLC/nraでソースコードと最終モデルチェックポイントと完全な結果を提供しています。 Modern artificial intelligence works typically train the parameters of fixed-sized deep neural networks using gradient-based optimization techniques. Simple evolutionary algorithms have recently been shown to also be capable of optimizing deep neural network parameters, at times matching the performance of gradient-based techniques, e.g. in reinforcement learning settings. In addition to optimizing network parameters, many evolutionary computation techniques are also capable of progressively constructing network architectures. However, constructing network architectures from elementary evolution rules has not yet been shown to scale to modern reinforcement learning benchmarks. In this paper we therefore propose a new approach in which the architectures of recurrent neural networks dynamically evolve according to a small set of mutation rules. We implement a massively parallel evolutionary algorithm and run experiments on all 19 OpenAI Gym state-based reinforcement learning control tasks. We find that in most cases, dynamic agents match or exceed the performance of gradient-based agents while utilizing orders of magnitude fewer parameters. 
We believe our work to open avenues for real-life applications where network compactness and autonomous design are of critical importance. We provide our source code, final model checkpoints and full results at github.com/MaximilienLC/nra.	翻訳日:2023-04-30 07:28:40 公開日:2023-04-03
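The entry above mentions simple evolutionary algorithms that optimize network parameters without gradients. As a rough illustration of that general idea only, here is a generic truncation-selection loop with Gaussian mutation. It is not the authors' algorithm and does not evolve architectures; the function name `evolve` and every hyperparameter are assumptions made for the sketch.

```python
import numpy as np

def evolve(fitness, dim, pop_size=20, n_elite=5, sigma=0.1, generations=50, seed=0):
    """Generic elitist evolutionary loop (illustrative only).

    fitness: callable mapping a parameter vector to a score (higher is better).
    All hyperparameter names and defaults are assumptions for this sketch.
    """
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in population])
        # Keep the n_elite best individuals unchanged (elitism).
        elite = population[np.argsort(scores)[::-1][:n_elite]]
        # Refill the population with Gaussian-mutated copies of random elites.
        parents = elite[rng.integers(n_elite, size=pop_size - n_elite)]
        children = parents + sigma * rng.normal(size=parents.shape)
        population = np.vstack([elite, children])
    scores = np.array([fitness(ind) for ind in population])
    return population[np.argmax(scores)]
```

Because the elite individuals are carried over unchanged, the best score in the population never decreases from one generation to the next. In a control setting, `fitness` would typically be the episode return of a policy parameterized by the vector.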
# 忠実性ベンチマーク:視覚言語タスクにおける正確な自然言語説明に向けて Benchmarking Faithfulness: Towards Accurate Natural Language Explanations in Vision-Language Tasks ( http://arxiv.org/abs/2304.08174v1 ) ライセンス: Link先を確認	Jakob Ambsdorf	(参考訳) ディープニューラルモデルが日々の生活に浸透するにつれ、彼らの意思決定について透明で理解可能な説明が必要になる。しかし,これまで開発されたほとんどの説明手法は,日常ユーザにとって直感的に理解できない。対照的に、自然言語の説明(NLE)は、モデルの意思決定を容易に理解可能な方法でコミュニケーション可能にすることを約束する。現在のモデルは説得力のある説明を生み出すことに成功したが、NLEが実際にモデルの推論過程(忠実性と呼ばれる性質)をいかにうまく表現しているかは、明らかな疑問である。忠実度を測定するためのメトリクスの開発は、より忠実なモデルを設計するために重要であるが、現在のメトリクスはNLEに適用できないか、複数のモダリティで異なるモデルアーキテクチャを比較するように設計されていない。忠実度尺度の先行研究と詳細な理論的根拠に基づいて、帰属相似性、NLE相似性、NLE-包括性という3つの忠実度指標を提案する。本手法の有効性は,評価された説明忠実度の変化を期待する実演e-UGモデルに体系的に修正を加えることで,視覚言語NLE生成のためのe-ViLベンチマークのVQA-Xおよびe-SNLI-VEデータセットを用いて評価する。 e-snli-veデータセットでは,e-ugの説明生成モジュールへの冗長入力の削除が,帰属相似性によって測定された言語的モダリティに対するモデルの忠実性を高めることを示した。さらに,NLE-Sufficiency と -Comprehensiveness は必ずしも属性-相似性と相関しないことを示した。 With deep neural models increasingly permeating our daily lives comes a need for transparent and comprehensible explanations of their decision-making. However, most explanation methods that have been developed so far are not intuitively understandable for lay users. In contrast, natural language explanations (NLEs) promise to enable the communication of a model's decision-making in an easily intelligible way. While current models successfully generate convincing explanations, it is an open question how well the NLEs actually represent the reasoning process of the models - a property called faithfulness. Although the development of metrics to measure faithfulness is crucial to designing more faithful models, current metrics are either not applicable to NLEs or are not designed to compare different model architectures across multiple modalities. Building on prior research on faithfulness measures and based on a detailed rationale, we address this issue by proposing three faithfulness metrics: Attribution-Similarity, NLE-Sufficiency, and NLE-Comprehensiveness. 
The efficacy of the metrics is evaluated on the VQA-X and e-SNLI-VE datasets of the e-ViL benchmark for vision-language NLE generation by systematically applying modifications to the performant e-UG model for which we expect changes in the measured explanation faithfulness. We show on the e-SNLI-VE dataset that the removal of redundant inputs to the explanation-generation module of e-UG successively increases the model's faithfulness on the linguistic modality as measured by Attribution-Similarity. Further, our analysis demonstrates that NLE-Sufficiency and -Comprehensiveness are not necessarily correlated to Attribution-Similarity, and we discuss how the two metrics can be utilized to gain further insights into the explanation generation process.	翻訳日:2023-04-23 04:25:22 公開日:2023-04-03
# OutCenTR:高次元データセットにおける脆弱性の悪用を予測するための新しい半教師付きフレームワーク OutCenTR: A novel semi-supervised framework for predicting exploits of vulnerabilities in high-dimensional datasets ( http://arxiv.org/abs/2304.10511v1 ) ライセンス: Link先を確認	Hadi Eskandari, Michael Bewong, Sabih ur Rehman	(参考訳) 毎日、ますます増加する脆弱性が報告されている。しかし、これらの脆弱性はすべて同じではない。悪用される脆弱性の可能性を正しく見積もることは、システム管理者にとって重要なタスクです。これは、システム管理者が適切な脆弱性の優先順位付けとパッチを行うのに役立つ。我々の研究は、National Vulnerability Databaseのような高度に不均衡な高次元データセットで利用される可能性のある脆弱性を予測するために、外れ値検出技術を利用している。本稿では,ベースライン外乱検出モデルを強化する次元削減手法であるOutCenTRを提案する。さらに,4つのベンチマークと12の合成データセットを用いて,OutCenTRの有効性と効率を実証的に示す。実験の結果,PCA や GRP といった最先端の次元減少技術と比較して,F1 スコアの平均は5倍向上した。 An ever-growing number of vulnerabilities are reported every day. Yet these vulnerabilities are not all the same; Some are more targeted than others. Correctly estimating the likelihood of a vulnerability being exploited is a critical task for system administrators. This aids the system administrators in prioritizing and patching the right vulnerabilities. Our work makes use of outlier detection techniques to predict vulnerabilities that are likely to be exploited in highly imbalanced and high-dimensional datasets such as the National Vulnerability Database. We propose a dimensionality reduction technique, OutCenTR, that enhances the baseline outlier detection models. We further demonstrate the effectiveness and efficiency of OutCenTR empirically with 4 benchmark and 12 synthetic datasets. The results of our experiments show on average a 5-fold improvement of F1 score in comparison with state-of-the-art dimensionality reduction techniques such as PCA and GRP.	翻訳日:2023-04-23 03:59:12 公開日:2023-04-03
# グローバル量子クロックに結合した非相互作用系の重力ポテンシャルと時間拡張の発生 Emergence of Gravitational Potential and Time Dilation from Non-interacting Systems Coupled to a Global Quantum Clock ( http://arxiv.org/abs/2304.01263v1 ) ライセンス: Link先を確認	Ashmeet Singh and Oliver Friedrich	(参考訳) 時間座標をグローバルな量子自由度としてモデル化した時間座標と、局所的な量子「時計」として機能する内部自由度によってモデル化された物理系の適切な時間という2つのバージョンを考慮し、量子力学のリレーショナル時間定式化における重力バックリアクションを研究する。我々は,地球規模のホイーラー・デウィット型制約における座標時間と質量エネルギーの相互作用が重力時間拡張につながることを示した。巨大な対象が存在する場合、これはシュワルツシルト計量における時間拡張に一致する。さらに、2つの粒子が時間座標に独立に結合すると、これらの粒子間のニュートンの重力相互作用が低エネルギー限界で現れることを示す。また、高エネルギーダイバージェンスの再正規化の特徴も観察する。 We study gravitational back-reaction within relational time formulations of quantum mechanics by considering two versions of time: a time coordinate, modelled as a global quantum degree of freedom, and the proper time of a given physical system, modelled via an internal degree of freedom serving as a local quantum "clock". We show that interactions between coordinate time and mass-energy in a global Wheeler-DeWitt-like constraint lead to gravitational time dilation. In the presence of a massive object this agrees with time dilation in a Schwarzschild metric at leading order in $G$. Furthermore, if two particles couple independently to the time coordinate we show that Newtonian gravitational interaction between those particles emerges in the low energy limit. We also observe features of renormalization of high energy divergences.	翻訳日:2023-04-16 22:41:17 公開日:2023-04-03
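For orientation, the phrase "time dilation in a Schwarzschild metric at leading order in $G$" can be made concrete with the standard textbook expression for a static clock at radial coordinate $r$ outside a mass $M$. This is the general-relativity reference result, not the paper's derivation; the symbols $\tau$ (proper time), $t$ (coordinate time), $c$, and $\Phi$ (Newtonian potential) are introduced here for illustration.

```latex
\frac{d\tau}{dt}
  = \sqrt{1 - \frac{2GM}{r c^{2}}}
  = 1 - \frac{GM}{r c^{2}} + \mathcal{O}(G^{2}),
\qquad
\Phi(r) = -\frac{GM}{r}
\;\Longrightarrow\;
\frac{d\tau}{dt} \approx 1 + \frac{\Phi(r)}{c^{2}} .
```

At this order the dilation factor is fixed by the Newtonian potential alone, which is why agreement at leading order in $G$ and the emergence of a Newtonian interaction appear together in the abstract.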
# 人間行動分析のための実産業作業と伝統工芸品のモーションキャプチャベンチマーク Motion Capture Benchmark of Real Industrial Tasks and Traditional Crafts for Human Movement Analysis ( http://arxiv.org/abs/2304.03771v1 ) ライセンス: Link先を確認	Brenda Elizabeth Olivas-Padilla, Alina Glushkova and Sotiris Manitsaris	(参考訳) 人間の動き分析は、ロボット工学、バイオメカニクス、データサイエンスにおける重要な研究分野である。トラッキング、姿勢推定、運動合成などを含む。長年にわたって多くの方法論が発展してきたが、これらの手法の体系的かつ定量的な評価は、人間の3次元運動の検証可能な真理データを用いても必要である。本稿では,慣性に基づくモーションキャプチャを用いて記録した7つのデータセットについて述べる。データセットには、現場で実環境で行われる工業従事者や熟練職人によるプロフェッショナルなジェスチャーが含まれている。データセットは人間の動作モデリング、分析、生成の研究に使用されることを意図して作成された。データ収集のためのプロトコルを詳細に記述し、収集したデータの予備分析をベンチマークとして提供する。 Gesture Operational Modelは、運動記述子に基づくハイブリッド確率的バイオメカニカルアプローチであり、専門家の動きのダイナミクスをモデル化し、運動軌跡の数学的表現を作成して、身体のデキスタリティを分析し定量化する。このモデルでは、人間のプロのポーズを正確に生成することができ、作業のパフォーマンスを通じて、身体の関節がどのように協力し変化するかを直感的に記述することができた。 Human movement analysis is a key area of research in robotics, biomechanics, and data science. It encompasses tracking, posture estimation, and movement synthesis. While numerous methodologies have evolved over time, a systematic and quantitative evaluation of these approaches using verifiable ground truth data of three-dimensional human movement is still required to define the current state of the art. This paper presents seven datasets recorded using inertial-based motion capture. The datasets contain professional gestures carried out by industrial operators and skilled craftsmen performed in real conditions in-situ. The datasets were created with the intention of being used for research in human motion modeling, analysis, and generation. The protocols for data collection are described in detail, and a preliminary analysis of the collected data is provided as a benchmark. The Gesture Operational Model, a hybrid stochastic-biomechanical approach based on kinematic descriptors, is utilized to model the dynamics of the experts' movements and create mathematical representations of their motion trajectories for analysis and quantifying their body dexterity. 
The models allowed the accurate generation of professional human poses and an intuitive description of how body joints cooperate and change over time during the performance of the task.	翻訳日:2023-04-16 22:35:51 公開日:2023-04-03
# 自己回帰拡散モデルによる制御可能な運動合成と再構成 Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models ( http://arxiv.org/abs/2304.04681v1 ) ライセンス: Link先を確認	Wenjie Yin, Ruibo Tu, Hang Yin, Danica Kragic, Hedvig Kjellström, Mårten Björkman	(参考訳) データ駆動および制御可能な人間のモーション合成と予測は、インタラクティブメディアとソーシャルロボティクスにおける様々な応用を含む活発な研究分野である。これらの分野には、過去の観測から多様な動きを生成することや、不完全なポーズを扱うことに関する課題が残っている。本稿では、他のモダリティの制御コンテキストに条件付けられた動き列上の自己回帰的確率拡散モデルであるMoDiffを紹介する。本モデルは、クロスモーダルトランスフォーマーエンコーダとトランスフォーマーベースのデコーダを統合しており、これらは運動および制御モダリティにおける時間的相関を捉えるのに有効である。また,よりリッチなデータ表現とロバストな生成を実現するために,拡散の順方向過程に基づく新しいデータドロップアウト手法を導入する。2つのベースラインと比較して、移動(ロコモーション)の制御可能な動作合成におけるMoDiffの優れた性能を示し、記録データに近い高忠実度な動きの頑健な合成と再構成に対する拡散データドロップアウトの利点を示す。 Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in these fields for generating diverse motions given past observations and dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion with respect to two baselines and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.	翻訳日:2023-04-16 22:24:22 公開日:2023-04-03
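The abstract above refers to the "diffusion forward process". As background, the standard forward (noising) step used in denoising diffusion models can be sketched as below. This is the generic closed-form step, not MoDiff's data dropout method; the linear noise schedule, the function names, and the default values are assumptions chosen for the example.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t given x_0 in closed form.

    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * noise,
    with noise drawn from a standard normal distribution.
    Returns both x_t and the noise that was added.
    """
    noise = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise
```

For a motion sequence, `x0` could be an array of joint features per frame (a hypothetical layout); larger `t` replaces more of the clean signal with noise.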
# Astroformer: 分類に必要なのはデータだけではない Astroformer: More Data Might Not be All You Need for Classification ( http://arxiv.org/abs/2304.05350v1 ) ライセンス: Link先を確認	Rishit Dagli	(参考訳) 自然言語処理やコンピュータビジョンなどの分野の最近の進歩は、膨大な量の未ラベルまたは部分的にラベル付けされたデータを用いて訓練された複雑で大規模なモデルに依存しており、これらの最先端の手法をリソース制約環境で訓練またはデプロイすることは困難である。銀河形態学は銀河の形成と進化の過程を理解するために重要である。銀河の形態を分類する効率的な方法は、現代の天文学調査から物理情報を抽出するために必要である。本稿では,少ない量のデータから学習する方法を提案する。我々はCoAtNetとMaxViTの成功から多くのインスピレーションを得たハイブリッドトランスフォーマー・畳み込みアーキテクチャを提案する。具体的には、トランスフォーマー-畳み込みハイブリッドを、ネットワークのための新しいスタック設計、相対的な自己アテンション層を作成する異なる方法とともに用い、データ拡張と正規化の慎重な選択と組み合わせる。我々のアプローチは、科学的な目的である、17736枚のラベル付き画像からなるGalaxy10 DECalsデータセット上の画像からの銀河形態予測において新たな最先端を達成し、94.86%のtop-1精度で、このタスクの現在の最先端を4.62%上回る。さらに、このアプローチはCIFAR-100とTiny ImageNetの新たな最先端も設定する。また、大きなデータセットに使用するモデルやトレーニング手法は、低データ環境ではうまく動作しないことが多いことが分かりました。私たちのコードとモデルはカンファレンス前の後日リリースされます。 Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from smaller amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. 
Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images achieving $94.86\%$ top-$1$ accuracy, beating the current state-of-the-art for this task by $4.62\%$. Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime. Our code and models will be released at a later date before the conference.	翻訳日:2023-04-16 22:16:06 公開日:2023-04-03
# ナレッジ蒸留グラフニューラルネットワークによるてんかん発作のパーソナライズ Knowledge-Distilled Graph Neural Networks for Personalized Epileptic Seizure Detection ( http://arxiv.org/abs/2304.06038v1 ) ライセンス: Link先を確認	Qinyue Zheng, Arun Venkitaraman, Simona Petravic, and Pascal Frossard	(参考訳) 発作モニタリングのためのウェアラブルデバイスはてんかん患者の生活の質を著しく向上させる可能性がある。しかし、電脳波(EEG)の完全な電極セットに依存する既存のソリューションは、毎日の使用には不都合である可能性がある。そこで,本研究では,全電極から学習した高精細な検知器(教師と呼ぶ)から知識を伝達し,新しい検出器(学生と呼ぶ)を学習するための新しい知識蒸留手法を提案する。どちらも軽量な実装を提供し、脳波記録に必要な電極数を著しく削減している。本稿では,教師と生徒の発作検知器がグラフニューラルネットワーク(gnn)である場合について考察する。私たちは2つのケースを考えます (a)事前選択されたチャンネルを用いて全患者について一人の学生が学ぶ場合 b) Gumbelsoftmaxアプローチを用いて個別のチャンネル選択を行い,各患者に対して個別の学習を行う場合。テンプル大学病院脳波データコーパス(TUSZ)を用いた実験では,脳波の少ない患者において,知識蒸留とパーソナライゼーションの両方が発作検出の性能向上に重要な役割を果たすことが示された。我々は,2つのチャネルを数えることで,競争性のある発作検出性能が得られることを確認した。これは、記録が少なくても、発作を個別に監視するウェアラブルデバイスの、より現実的なシナリオにおける私たちのアプローチの可能性を示している。 Wearable devices for seizure monitoring detection could significantly improve the quality of life of epileptic patients. However, existing solutions that mostly rely on full electrode set of electroencephalogram (EEG) measurements could be inconvenient for every day use. In this paper, we propose a novel knowledge distillation approach to transfer the knowledge from a sophisticated seizure detector (called the teacher) trained on data from the full set of electrodes to learn new detectors (called the student). They are both providing lightweight implementations and significantly reducing the number of electrodes needed for recording the EEG. We consider the case where the teacher and the student seizure detectors are graph neural networks (GNN), since these architectures actively use the connectivity information. We consider two cases (a) when a single student is learnt for all the patients using preselected channels; and (b) when personalized students are learnt for every individual patient, with personalized channel selection using a Gumbelsoftmax approach. 
Our experiments on the publicly available Temple University Hospital EEG Seizure Data Corpus (TUSZ) show that both knowledge-distillation and personalization play significant roles in improving performance of seizure detection, particularly for patients with scarce EEG data. We observe that using as few as two channels, we are able to obtain competitive seizure detection performance. This, in turn, shows the potential of our approach in more realistic scenario of wearable devices for personalized monitoring of seizures, even with few recordings.	翻訳日:2023-04-16 22:07:02 公開日:2023-04-03
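The entry above mentions personalized channel selection with a Gumbel-softmax approach. The general Gumbel-softmax relaxation (a differentiable approximation to sampling one category) can be sketched as follows. This shows only the generic sampling step, not the authors' channel-selection network; the function name, the temperature `tau`, and the interpretation of `logits` as one score per EEG channel are assumptions made for illustration.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample a relaxed one-hot vector along the last axis.

    logits: unnormalized log-probabilities, e.g. one score per candidate
            channel (hypothetical use).
    tau:    temperature; smaller values push samples closer to one-hot.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    # Perturb the logits with Gumbel(0, 1) noise, then apply a tempered softmax.
    z = (logits + rng.gumbel(size=logits.shape)) / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

Each output row is a probability vector; in a trainable model the same computation would be written in an automatic-differentiation framework so that gradients can flow through the selection weights.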
# 深層q学習による量的取引 Quantitative Trading using Deep Q Learning ( http://arxiv.org/abs/2304.06037v1 ) ライセンス: Link先を確認	Soumyadip Sarkar	(参考訳) 強化学習(Reinforcement Learning、RL)は、ロボット工学、ゲームプレイ、自律システムなど、さまざまな用途で使用されている機械学習の分野である。近年、RLを量的トレーディングに適用することへの関心が高まっており、金融市場で利益のあるトレーディングを行うことが目標となっている。本稿では,量的取引におけるRLの利用について検討し,RLに基づく取引アルゴリズムのケーススタディを示す。その結果,RLは量的トレーディングの強力なツールであり,従来のトレーディングアルゴリズムより優れている可能性が示唆された。量的取引における強化学習の利用は、より高度で効果的な取引システムの開発につながる可能性がある研究の有望な領域である。将来の研究は、代替強化学習アルゴリズムの使用を探求し、追加のデータソースを取り入れ、異なるアセットクラスでシステムをテストする。本研究は, 定量的取引における強化学習の可能性を示し, この分野における継続的な研究・開発の重要性を強調した。より洗練された効果的な取引システムを開発することで、金融市場の効率を改善し、投資家のリターンを高めることができる。 Reinforcement learning (RL) is a branch of machine learning that has been used in a variety of applications such as robotics, game playing, and autonomous systems. In recent years, there has been growing interest in applying RL to quantitative trading, where the goal is to make profitable trades in financial markets. This paper explores the use of RL in quantitative trading and presents a case study of a RL-based trading algorithm. The results show that RL can be a powerful tool for quantitative trading, and that it has the potential to outperform traditional trading algorithms. The use of reinforcement learning in quantitative trading represents a promising area of research that can potentially lead to the development of more sophisticated and effective trading systems. Future work could explore the use of alternative reinforcement learning algorithms, incorporate additional data sources, and test the system on different asset classes. Overall, our research demonstrates the potential of using reinforcement learning in quantitative trading and highlights the importance of continued research and development in this area. By developing more sophisticated and effective trading systems, we can potentially improve the efficiency of financial markets and generate greater returns for investors.	翻訳日:2023-04-16 22:06:35 公開日:2023-04-03
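As background for the entry above, deep Q-learning approximates with a neural network the same temporal-difference update that tabular Q-learning applies to a table. The tabular rule is sketched below. It is not the paper's trading system; the function name and the state and action encoding (for instance, actions 0, 1, 2 standing for hold, buy, sell) are hypothetical.

```python
import numpy as np

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99, done=False):
    """One tabular Q-learning step, applied in place to the table q.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    q is an array of shape (n_states, n_actions); on terminal transitions
    (done=True) the bootstrap term is dropped.
    """
    target = reward if done else reward + gamma * np.max(q[next_state])
    q[state, action] += alpha * (target - q[state, action])
    return q
```

With `alpha=0.5`, `gamma=0.9`, a reward of 1.0 and a best next-state value of 2.0, a zero-initialized entry moves halfway toward the target 2.8.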
# 大規模画像テキスト(LIT)モデルを用いたCTマルチタスク学習 CT Multi-Task Learning with a Large Image-Text (LIT) Model ( http://arxiv.org/abs/2304.02649v1 ) ライセンス: Link先を確認	Chuang Niu and Ge Wang	(参考訳) 大規模言語モデル(LLM)は、複数の言語タスクをパワーアップするだけでなく、異なる空間にまたがる汎用インターフェースとしても機能する。これまでのところ、コンピュータビジョン分野におけるllmの成功を、高次元およびマルチモーダルな医療画像を含む医療画像分野に効果的に翻訳する方法は、まだ実証されていない。本稿では,LLMとLIMを併用した肺がん診断のためのマルチタスクCT大画像テキスト(LIT)モデルの構築の可能性について報告する。具体的には、LLMとLIMをエンコーダとして、マルチソース情報とタスク固有の患者固有の先行情報を相乗化して、最適な診断性能を実現するタスク固有のテキストプロンプトに基づいてマルチモーダル情報を知覚する。 LITモデルとそれに関連する技術の重要な要素を3次元肺CT解析に重点を置いて評価した。肺の分節, 肺結節の検出, 肺がんの分類など, LIT モデルが複数の医療業務をうまく遂行していることを示す。多様な応用における優れた医用画像と最適な患者結果のための大規模画像言語モデルの開発が進行中である。 Large language models (LLM) not only empower multiple language tasks but also serve as a general interface across different spaces. Up to now, it has not been demonstrated yet how to effectively translate the successes of LLMs in the computer vision field to the medical imaging field which involves high-dimensional and multi-modal medical images. In this paper, we report a feasibility study of building a multi-task CT large image-text (LIT) model for lung cancer diagnosis by combining an LLM and a large image model (LIM). Specifically, the LLM and LIM are used as encoders to perceive multi-modal information under task-specific text prompts, which synergizes multi-source information and task-specific and patient-specific priors for optimized diagnostic performance. The key components of our LIT model and associated techniques are evaluated with an emphasis on 3D lung CT analysis. Our initial results show that the LIT model performs multiple medical tasks well, including lung segmentation, lung nodule detection, and lung cancer classification. Active efforts are in progress to develop large image-language models for superior medical imaging in diverse applications and optimal patient outcomes.	翻訳日:2023-04-07 16:40:16 公開日:2023-04-03
# 2017年から2023年までの大規模言語モデル研究の文献的レビュー A Bibliometric Review of Large Language Models Research from 2017 to 2023 ( http://arxiv.org/abs/2304.02020v1 ) ライセンス: Link先を確認	Lizhou Fan, Lingyao Li, Zihui Ma, Sanggyu Lee, Huizi Yu, Libby Hemphill	(参考訳) LLM(Large Language Model)は、自然言語処理(NLP)タスクにまたがる卓越した性能を示す言語モデルの一種であり、人間に似た言語を生成する能力と、科学技術に革命をもたらす可能性から、非常に追求された研究領域となっている。本研究では,学術文献の書誌的・談話的分析を LLM 上で実施する。 5000以上の出版物を合成し、研究者、実践者、政策立案者が現在のLLM研究の展望をナビゲートするためのロードマップとして機能する。研究パラダイムとコラボレーションのパターンを特定し,2017年から2023年にかけての研究動向を示す。まず,LLM 研究の基本となるコアアルゴリズム開発と NLP タスクの解析から始める。次に,医学,工学,社会科学,人文科学などの分野におけるllmの応用について検討する。また,llms研究のダイナミックで高速な進化についても概説する。概して、本論文はllms研究とその応用の現状、影響、可能性について貴重な知見を提供する。 Large language models (LLMs) are a class of language models that have demonstrated outstanding performance across a range of natural language processing (NLP) tasks and have become a highly sought-after research area, because of their ability to generate human-like language and their potential to revolutionize science and technology. In this study, we conduct bibliometric and discourse analyses of scholarly literature on LLMs. Synthesizing over 5,000 publications, this paper serves as a roadmap for researchers, practitioners, and policymakers to navigate the current landscape of LLMs research. We present the research trends from 2017 to early 2023, identifying patterns in research paradigms and collaborations. We start with analyzing the core algorithm developments and NLP tasks that are fundamental in LLMs research. We then investigate the applications of LLMs in various fields and domains including medicine, engineering, social science, and humanities. Our review also reveals the dynamic, fast-paced evolution of LLMs research. Overall, this paper offers valuable insights into the current state, impact, and potential of LLMs research and its applications.	翻訳日:2023-04-06 14:33:21 公開日:2023-04-03
# Detecting Fake Job Postings Using Bidirectional LSTM ( http://arxiv.org/abs/2304.02019v1 ) License: see link	Aravind Sasidharan Pillai	Fake job postings have become prevalent in the online job market, posing significant challenges to job seekers and employers. Despite the growing need to address this problem, there is limited research that leverages deep learning techniques for the detection of fraudulent job advertisements. This study aims to fill the gap by employing a Bidirectional Long Short-Term Memory (Bi-LSTM) model to identify fake job advertisements. Our approach considers both numeric and text features, effectively capturing the underlying patterns and relationships within the data. The proposed model demonstrates superior performance, achieving a 0.91 ROC AUC score and a 98.71% accuracy rate, indicating its potential for practical applications in the online job market. The findings of this research contribute to the development of robust, automated tools that can help combat the proliferation of fake job postings and improve the overall integrity of the job search process. Moreover, we discuss challenges, future research directions, and ethical considerations related to our approach, aiming to inspire further exploration and development of practical solutions to combat online job fraud.	Translated: 2023-04-06 14:33:03 Published: 2023-04-03
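The abstract above reports both a ROC AUC and an accuracy figure. The two measure different things: accuracy depends on a decision threshold, while ROC AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below is a minimal illustration of that difference. The function names and the 0.5 threshold are assumptions made here for illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC computed as the fraction of (positive, negative) pairs
    in which the positive example gets the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold (assumed 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

On imbalanced data such as fraud labels, the two numbers can diverge considerably, which is one reason to report both.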
# Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid ( http://arxiv.org/abs/2008.09269v2 ) License: see link	Jun Gao, Zian Wang, Jinchen Xuan, Sanja Fidler	In modern computer vision, images are typically represented as a fixed uniform grid with some stride and processed via a deep convolutional neural network. We argue that deforming the grid to better align with the high-frequency image content is a more effective strategy. We introduce Deformable Grid (DefGrid), a learnable neural network module that predicts location offsets of vertices of a 2-dimensional triangular grid, such that the edges of the deformed grid align with image boundaries. We showcase our DefGrid in a variety of use cases, i.e., by inserting it as a module at various levels of processing. We utilize DefGrid as an end-to-end learnable geometric downsampling layer that replaces standard pooling methods for reducing feature resolution when feeding images into a deep CNN. We show significantly improved results at the same grid resolution compared to using CNNs on uniform grids for the task of semantic segmentation. We also utilize DefGrid at the output layers for the task of object mask annotation, and show that reasoning about object boundaries on our predicted polygonal grid leads to more accurate results over existing pixel-wise and curve-based approaches. We finally showcase DefGrid as a standalone module for unsupervised image partitioning, showing superior performance over existing approaches. Project website: http://www.cs.toronto.edu/~jungao/def-grid	Translated: 2023-04-05 20:06:13 Published: 2023-04-03
# Track clustering with a quantum annealer for primary vertex reconstruction at hadron colliders ( http://arxiv.org/abs/1903.08879v4 ) License: see link	Souvik Das, Andrew J. Wildridge, Andreas Jung	Clustering of charged particle tracks along the beam axis is the first step in reconstructing the positions of hadronic interactions, also known as primary vertices, at hadron collider experiments. We use a 2036 physical qubit D-Wave quantum annealer to perform track clustering in a limited capacity on artificial events where the positions of primary vertices and tracks resemble those measured by the Compact Muon Solenoid experiment at the Large Hadron Collider. The algorithm, which is not a classical-quantum hybrid but relies entirely on quantum annealing, is tested on a variety of event topologies. We demonstrate a deterministic graph-embedding of the problem on the D-Wave Chimera architecture, a method for optimizing the coupling strengths within logical qubits, and a method for optimizing annealing time. Further, we benchmark it against simulated annealing on a commercial CPU constrained to the same processor time per anneal as the physical annealer. We note a quantum advantage against simulated annealing up to a 56 logical qubit problem that involves 665 physical qubits on average. Our embedding and optimization methods, and the benchmarking paradigm, can be applied generally to other clustering problems on quantum annealers. This algorithm may be used as a building block for more sophisticated algorithms to reach the number of primary vertices at the LHC.	Translated: 2023-04-05 20:05:35 Published: 2023-04-03
# Communication-Efficient Federated Linear and Deep Generalized Canonical Correlation Analysis ( http://arxiv.org/abs/2109.12400v2 ) License: see link	Sagar Shrestha and Xiao Fu	Classic and deep generalized canonical correlation analysis (GCCA) algorithms seek low-dimensional common representations of data entities from multiple "views" (e.g., audio and image) using linear transformations and neural networks, respectively. When the views are acquired and stored at different computing agents (e.g., organizations and edge devices) and data sharing is undesired due to privacy or communication cost considerations, federated learning-based GCCA is well-motivated. In federated learning, the views are kept locally at the agents and only derived, limited information exchange with a central server is allowed. However, applying existing GCCA algorithms to such federated learning settings may incur prohibitively high communication overhead. This work puts forth a communication-efficient federated learning framework for both linear and deep GCCA under the maximum variance (MAX-VAR) formulation. The overhead issue is addressed by aggressively compressing (via quantization) the information exchanged between the computing agents and a central controller. Compared to the unquantized version, our empirical study shows that the proposed algorithm enjoys a substantial reduction of communication overheads with virtually no loss in accuracy and convergence speed. Rigorous convergence analyses are also presented, which is a nontrivial effort. Generic federated optimization results do not cover the special problem structure of GCCA. Our result shows that the proposed algorithms for both linear and deep GCCA converge to critical points at a sublinear rate, even under heavy quantization and stochastic approximations. In addition, in the linear MAX-VAR case, the quantized algorithm approaches a global optimum at a geometric rate under reasonable conditions. Synthetic and real-data experiments are used to showcase the effectiveness of the proposed approach.	Translated: 2023-04-05 19:33:03 Published: 2023-04-03
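The abstract above says that messages between agents and the server are compressed by quantization, but it does not specify the quantizer. As a rough sketch, assuming a uniform grid between the smallest and largest entry and randomized rounding (a common choice that keeps the quantized value unbiased), the compression step might look like the following. The function name and the `num_levels` parameter are hypothetical.

```python
import random


def stochastic_quantize(values, num_levels, rng=random):
    """Round each value to one of `num_levels` evenly spaced grid points
    between min(values) and max(values), rounding up with probability equal
    to the fractional position so that the result is unbiased in expectation.
    This is an illustrative quantizer, not the one used in the paper."""
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # nothing to compress: all entries are identical
    step = (hi - lo) / (num_levels - 1)
    quantized = []
    for v in values:
        pos = (v - lo) / step          # position measured in grid steps
        lower = int(pos)               # floor, since pos is non-negative
        frac = pos - lower
        level = lower + (1 if rng.random() < frac else 0)
        level = min(level, num_levels - 1)  # guard against float round-off
        quantized.append(lo + level * step)
    return quantized
```

With `num_levels = 4`, each entry can be sent as a 2-bit index plus the two floats `lo` and `hi` per message, which is where the communication saving would come from under these assumptions.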
# Reconstruction guided Meta-learning for Few Shot Open Set Recognition ( http://arxiv.org/abs/2108.00340v3 ) License: see link	Sayak Nag, Dripta S. Raychaudhuri, Sujoy Paul, Amit K. Roy-Chowdhury	In many applications, we are constrained to learn classifiers from very limited data (few-shot classification). The task becomes even more challenging if it is also required to identify samples from unknown categories (open-set classification). Learning a good abstraction for a class with very few samples is extremely difficult, especially under open-set settings. As a result, open-set recognition has received minimal attention in the few-shot setting. However, it is a critical task in many applications like environmental monitoring, where the number of labeled examples for each class is limited. Existing few-shot open-set recognition (FSOSR) methods rely on thresholding schemes, with some considering uniform probability for open-class samples. However, this approach is often inaccurate, especially for fine-grained categorization, and makes them highly sensitive to the choice of a threshold. To address these concerns, we propose Reconstructing Exemplar-based Few-shot Open-set ClaSsifier (ReFOCS). By using a novel exemplar reconstruction-based meta-learning strategy, ReFOCS streamlines FSOSR, eliminating the need for a carefully tuned threshold by learning to be self-aware of the openness of a sample. The exemplars act as class representatives and can be either provided in the training dataset or estimated in the feature domain. By testing on a wide variety of datasets, we show ReFOCS to outperform multiple state-of-the-art methods.	Translated: 2023-04-05 19:32:36 Published: 2023-04-03
# Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics ( http://arxiv.org/abs/2107.09822v3 ) License: see link	Krishan Rana, Vibhavari Dasagi, Jesse Haviland, Ben Talbot, Michael Milford and Niko Sünderhauf	We present Bayesian Controller Fusion (BCF): a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks involving navigation in a vast and long-horizon environment, and a complex reaching task that involves manipulability maximisation. For both these domains, simple handcrafted controllers exist that can solve the task at hand in a risk-averse manner but do not necessarily exhibit the optimal solution given limitations in analytical modelling, controller miscalibration and task variation. As exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, while substantially improving beyond the performance of the control prior, as the policy gains more experience. More importantly, given the risk-aversity of the control prior, BCF ensures safe exploration and deployment, where the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach towards combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are made publicly available at https://krishanrana.github.io/bcf.	Translated: 2023-04-05 19:32:13 Published: 2023-04-03
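The abstract above describes fusing uncertainty-aware distributional outputs from the learned policy and the control prior, without stating the fusion rule. One standard way to combine two Gaussian estimates, assumed here purely for illustration, is the normalized product of the two densities: the fused mean is a precision-weighted average, so whichever source is more certain dominates. The function name and the one-dimensional action are simplifications introduced here.

```python
def fuse_gaussians(mu_a, sigma_a, mu_b, sigma_b):
    """Normalized product of two 1-D Gaussians N(mu_a, sigma_a^2) and
    N(mu_b, sigma_b^2). Returns the mean and standard deviation of the result.
    Illustrative only; the paper's exact fusion rule may differ."""
    var_a, var_b = sigma_a ** 2, sigma_b ** 2
    fused_var = var_a * var_b / (var_a + var_b)
    fused_mu = (mu_a * var_b + mu_b * var_a) / (var_a + var_b)
    return fused_mu, fused_var ** 0.5
```

Under this rule, a policy that is highly uncertain in an unfamiliar state (large `sigma`) contributes little, and the fused action stays close to the prior, which matches the behaviour the abstract describes for states unknown to the policy.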
# Universal set of Observables for Forecasting Physical Systems through Causal Embedding ( http://arxiv.org/abs/2105.10759v3 ) License: see link	G Manjunath, A de Clercq and MJ Steynberg	We demonstrate when and how an entire left-infinite orbit of an underlying dynamical system or observations from such left-infinite orbits can be uniquely represented by a pair of elements in a different space, a phenomenon which we call causal embedding. The collection of such pairs is derived from a driven dynamical system and is used to learn a function which together with the driven system would: (i) determine a system that is topologically conjugate to the underlying system; (ii) enable forecasting the underlying system's dynamics since the conjugacy is computable and universal, i.e., it does not depend on the underlying system; (iii) guarantee an attractor containing the image of the causally embedded object even if there is an error made in learning the function. By accomplishing these we herald a new forecasting scheme that beats the existing reservoir computing schemes that often lead to poor long-term consistency as there is no guarantee of the existence of a learnable function, and overcomes the challenges of stability in Takens delay embedding. We illustrate accurate modeling of underlying systems where previously known techniques have failed.	Translated: 2023-04-05 19:31:08 Published: 2023-04-03
# Negativity Spreads Faster: A Large-Scale Multilingual Twitter Analysis on the Role of Sentiment in Political Communication ( http://arxiv.org/abs/2202.00396v3 ) License: see link	Dimosthenis Antypas, Alun Preece, Jose Camacho-Collados	Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world, where platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion. In the same vein, politicians use Twitter to express their opinions, debate among others on current topics and promote their political agendas aiming to influence voter behaviour. In this paper, we attempt to analyse tweets of politicians from three European countries and explore the virality of their tweets. Previous studies have shown that tweets conveying negative sentiment are likely to be retweeted more frequently. By utilising state-of-the-art pre-trained language models, we performed sentiment analysis on hundreds of thousands of tweets collected from members of parliament in Greece, Spain and the United Kingdom, including devolved administrations. We achieved this by systematically exploring and analysing the differences between influential and less popular tweets. Our analysis indicates that politicians' negatively charged tweets spread more widely, especially in more recent times, and highlights interesting differences between political parties as well as between politicians and the general population.	Translated: 2023-04-05 19:23:46 Published: 2023-04-03
# Implementing quantum Fourier transform using three qubits ( http://arxiv.org/abs/2110.15067v2 ) License: see link	Mouhcine Yachi, Radouan Hab-arrih, Ahmed Jellal	Using the circulant symmetry of a Hamiltonian describing three qubits, we realize the quantum Fourier transform. This symmetry allows us to construct a set of eigenvectors independently of the magnitude of the physical parameters involved in the Hamiltonian and, as a result, the entanglement will be maintained. The realization relies on trapped ions, and the gate implementation requires an adiabatic transition from each spin product state to Fourier modes. The fidelity was numerically calculated and the results show important values. Finally, we discuss the acceleration of the gate by using the counter-driving field.	Translated: 2023-04-05 19:21:24 Published: 2023-04-03
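The key claim in the abstract above is that circulant symmetry yields eigenvectors that do not depend on the size of the physical parameters. This follows from a general linear-algebra fact: every circulant matrix is diagonalized by the discrete Fourier modes, whatever its entries. The sketch below checks this numerically for a small Hermitian circulant matrix. The 3×3 size and the parameter values are assumptions chosen for illustration and do not reproduce the paper's Hamiltonian.

```python
import cmath


def circulant(first_row):
    """Build the circulant matrix whose row i is first_row shifted right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def fourier_mode(n, k):
    """k-th normalized discrete Fourier mode of length n."""
    return [cmath.exp(2j * cmath.pi * j * k / n) / n ** 0.5 for j in range(n)]


def eigen_residual(first_row, k):
    """Largest entry of |C v_k - lambda_k v_k|, where
    lambda_k = sum_m first_row[m] * exp(2*pi*i*m*k/n).
    A value near zero confirms that v_k is an eigenvector of C."""
    n = len(first_row)
    c = circulant(first_row)
    v = fourier_mode(n, k)
    lam = sum(first_row[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n))
    cv = [sum(a * b for a, b in zip(row, v)) for row in c]
    return max(abs(x - lam * y) for x, y in zip(cv, v))
```

Calling `eigen_residual` with different first rows, for example `[1.0, 0.3 + 0.2j, 0.3 - 0.2j]` and `[5.0, 2.0 - 1.0j, 2.0 + 1.0j]`, gives a residual near zero for every `k`: the eigenvalues change with the parameters, while the Fourier-mode eigenvectors do not.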
# A Closer Look at Rehearsal-Free Continual Learning ( http://arxiv.org/abs/2203.17269v2 ) License: see link	James Seale Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira	Continual learning is a setting where machine learning models learn novel concepts from continuously shifting training data, while simultaneously avoiding degradation of knowledge on previously seen classes which may disappear from the training data for extended periods of time (a phenomenon known as the catastrophic forgetting problem). Current approaches for continual learning of a single expanding task (aka class-incremental continual learning) require extensive rehearsal of previously seen data to avoid this degradation of knowledge. Unfortunately, rehearsal comes at a cost to memory, and it may also violate data-privacy. Instead, we explore combining knowledge distillation and parameter regularization in new ways to achieve strong continual learning performance without rehearsal. Specifically, we take a deep dive into common continual learning techniques: prediction distillation, feature distillation, L2 parameter regularization, and EWC parameter regularization. We first disprove the common assumption that parameter regularization techniques fail for rehearsal-free continual learning of a single, expanding task. Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation. Finally, we explore the recently popular ImageNet-R benchmark, and show that L2 parameter regularization implemented in self-attention blocks of a ViT transformer outperforms recent popular prompting for continual learning methods.	Translated: 2023-04-05 19:12:32 Published: 2023-04-03
# Network polarization, filter bubbles, and echo chambers: An annotated review of measures and reduction methods ( http://arxiv.org/abs/2207.13799v4 ) License: see link	Ruben Interian, Ruslan G. Marzo, Isela Mendoza, Celso C. Ribeiro	Polarization arises when the underlying network connecting the members of a community or society becomes characterized by highly connected groups with weak inter-group connectivity. The increasing polarization, the strengthening of echo chambers, and the isolation caused by information filters in social networks are increasingly attracting the attention of researchers from different areas of knowledge such as computer science, economics, social and political sciences. This work presents an annotated review of network polarization measures and models used to handle the polarization. Several approaches for measuring polarization in graphs and networks were identified, including those based on homophily, modularity, random walks, and balance theory. The strategies used for reducing polarization include methods that propose edge or node edits (including insertions or deletions, as well as edge weight modifications), changes in social network design, or changes in the recommendation systems embedded in these networks.	Translated: 2023-04-05 19:05:04 Published: 2023-04-03
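Of the polarization measures listed in the abstract above, modularity is the easiest to state concretely. For an undirected graph with m edges and a given split into groups, Newman's modularity is Q = Σ_c [ L_c / m − (d_c / 2m)² ], where L_c is the number of edges inside group c and d_c is the total degree of its nodes. The sketch below is a minimal pure-Python version. The function name and the input format (an edge list plus a node-to-group dictionary) are choices made here for illustration.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its group label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    intra = {}   # number of edges with both endpoints in the group
    degree = {}  # sum of node degrees per group
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

Two triangles joined by a single bridge edge, split into the two triangles, give Q = 5/14 (about 0.357), while placing every node in one group gives Q = 0. Higher values indicate more strongly separated groups, which is why modularity is used as a polarization proxy.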
# Yankee Swap: a Fast and Simple Fair Allocation Mechanism for Matroid Rank Valuations ( http://arxiv.org/abs/2206.08495v5 ) License: see link	Vignesh Viswanathan and Yair Zick	We study fair allocation of indivisible goods when agents have matroid rank valuations. Our main contribution is a simple algorithm based on the colloquial Yankee Swap procedure that computes provably fair and efficient Lorenz dominating allocations. While there exist polynomial time algorithms to compute such allocations, our proposed method improves on them in two ways. (a) Our approach is easy to understand and does not use complex matroid optimization algorithms as subroutines. (b) Our approach is scalable; it is provably faster than all known algorithms to compute Lorenz dominating allocations. These two properties are key to the adoption of algorithms in any real fair allocation setting; our contribution brings us one step closer to this goal.	Translated: 2023-04-05 19:02:45 Published: 2023-04-03
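The abstract above relies on the notion of Lorenz dominance without defining it. Under the usual definition, one utility vector Lorenz-dominates another if, after sorting both in ascending order, every prefix sum of the first is at least the corresponding prefix sum of the second. The sketch below checks only that relation. It is not the Yankee Swap algorithm itself, and the function name is an illustrative choice.

```python
from itertools import accumulate


def lorenz_dominates(u, v):
    """True if utility vector u Lorenz-dominates v: for every k, the sum of
    the k smallest utilities in u is at least the sum of the k smallest in v."""
    if len(u) != len(v):
        raise ValueError("utility vectors must have the same length")
    prefix_u = accumulate(sorted(u))
    prefix_v = accumulate(sorted(v))
    return all(a >= b for a, b in zip(prefix_u, prefix_v))
```

For instance, the utilities (2, 2, 2) dominate (1, 2, 3): both have the same total welfare, but the worst-off agents do better under the first allocation. A Lorenz dominating allocation is one that dominates every other allocation in this sense.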
# Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation ( http://arxiv.org/abs/2209.02821v4 ) License: see link	Bryan Li, Mohammad Sadegh Rasooli, Ajay Patel, Chris Callison-Burch	We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English. For the first stage, we initialize an encoder-decoder model to pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 40 languages to English. We find this model can generalize to zero-shot translations on unseen languages. For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then bidirectionally train with successive rounds of back-translation. Our approach, which we call EcXTra (English-centric Crosslingual (X) Transfer), is conceptually simple, only using a standard cross-entropy objective throughout. It is also data-driven, sequentially leveraging auxiliary parallel data and monolingual data. We evaluate unsupervised NMT results for 7 low-resource languages, and find that each round of back-translation training further refines bidirectional performance. Our final single EcXTra-trained model achieves competitive translation performance in all translation directions, notably establishing a new state-of-the-art for English-to-Kazakh (22.9 > 10.4 BLEU). Our code is available at https://github.com/manestay/EcXTra .	Translated: 2023-04-05 18:55:34 Published: 2023-04-03
# Federated Learning with Server Learning: Enhancing Performance for Non-IID Data ( http://arxiv.org/abs/2210.02614v3 ) License: see link	Van Sy Mai, Richard J. La, Tao Zhang	Federated Learning (FL) has emerged as a means of distributed learning using local data stored at clients with a coordinating server. Recent studies showed that FL can suffer from poor performance and slower convergence when training data at clients are not independent and identically distributed. Here we consider a new complementary approach to mitigating this performance degradation by allowing the server to perform auxiliary learning from a small dataset. Our analysis and experiments show that this new approach can achieve significant improvements in both model accuracy and convergence time even when the server dataset is small and its distribution differs from that of the aggregated data from all clients.	Translated: 2023-04-05 18:45:02 Published: 2023-04-03
# PU GNN: Chargeback Fraud Detection in P2E MMORPGs via Graph Attention Networks with Imbalanced PU Labels ( http://arxiv.org/abs/2211.08604v5 ) License: see link	Jiho Choi, Junghoon Park, Woocheol Kim, Jin-Hyeok Park, Yumin Suh, Minchang Sung	The recent advent of play-to-earn (P2E) systems in massively multiplayer online role-playing games (MMORPGs) has made in-game goods interchangeable with real-world values more than ever before. The goods in the P2E MMORPGs can be directly exchanged with cryptocurrencies such as Bitcoin, Ethereum, or Klaytn via blockchain networks. Unlike traditional in-game goods, once they have been written to the blockchains, P2E goods cannot be restored by the game operation teams even with chargeback fraud such as payment fraud, cancellation, or refund. To tackle the problem, we propose a novel chargeback fraud prediction method, PU GNN, which leverages graph attention networks with PU loss to capture both the players' in-game behavior and P2E token transaction patterns. With the adoption of modified GraphSMOTE, the proposed model handles the imbalanced distribution of labels in chargeback fraud datasets. The conducted experiments on three real-world P2E MMORPG datasets demonstrate that PU GNN achieves superior performances over previously suggested methods.	Translated: 2023-04-05 18:36:21 Published: 2023-04-03
# 進化アルゴリズム(movea)によるヒト脳の高分解能経頭蓋電気刺激の多目的最適化 Multi-objective optimization via evolutionary algorithm (MOVEA) for high-definition transcranial electrical stimulation of the human brain ( http://arxiv.org/abs/2211.05658v2 ) ライセンス: Link先を確認	Mo Wang, Kexin Lou, Zeming Liu, Pengfei Wei, Quanying Liu	(参考訳) 経頭蓋電気刺激(tes)戦略の設計には、目標領域の強度、焦点距離、刺激深度、回避ゾーンなど、しばしば互いに排他的である複数の目的を考慮する必要がある。異なる戦略を最適化し、これらの目標間のトレードオフを比較するための計算フレームワークは現在不足している。本稿では,TES戦略の設計における非凸最適化問題に対して,事前定義された方向のないMOVEA(Multi-Objective Optimization)を提案する。 MOVEAはパレート最適化を通じて複数の目標の同時最適化を可能にし、手動の重量調整なしでパレートフロントを生成し、より多くの目標に容易に拡張できる。このパレート前線は、強度や焦点性といった相反する目標間のトレードオフ関係を尊重しながら、様々な要求を満たす最適な解からなる。 moveaは多用途で、high definition (hd) と two-pair システムに基づく経頭蓋交互電流刺激 (tacs) と経頭蓋側時間刺激 (ttis) の両方に適している。我々は,tacsとttiの包括的比較を行った。moveaは脳領域と認知機能との因果関係の理解や疾患の治療において,特定の目的と制約に基づく tes の最適化,tti と tacs ベースのニューロモジュレーションを促進する。 MOVEAのコードはhttps://github.com/ncclabsustech/MOVEAで公開されている。 Designing a transcranial electrical stimulation (TES) strategy requires considering multiple objectives, such as intensity in the target area, focality, stimulation depth, and avoidance zone, which are often mutually exclusive. A computational framework for optimizing different strategies and comparing trade-offs between these objectives is currently lacking. In this paper, we propose a general framework called multi-objective optimization via evolutionary algorithms (MOVEA) to address the non-convex optimization problem in designing TES strategies without predefined direction. MOVEA enables simultaneous optimization of multiple targets through Pareto optimization, generating a Pareto front after a single run without manual weight adjustment and allowing easy expansion to more targets. This Pareto front consists of optimal solutions that meet various requirements while respecting trade-off relationships between conflicting objectives such as intensity and focality. 
MOVEA is versatile and suitable for both transcranial alternating current stimulation (tACS) and transcranial temporal interference stimulation (tTIS) based on high definition (HD) and two-pair systems. We performed a comprehensive comparison between tACS and tTIS in terms of intensity, focality, and steerability for targets at different depths. MOVEA facilitates the optimization of TES based on specific objectives and constraints, advancing tTIS and tACS-based neuromodulation in understanding the causal relationship between brain regions and cognitive functions and in treating diseases. The code for MOVEA is available at https://github.com/ncclabsustech/MOVEA.	翻訳日:2023-04-05 18:35:54 公開日:2023-04-03
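As a rough illustration of the Pareto front mentioned in the MOVEA abstract, the sketch below filters a set of candidate solutions down to the non-dominated ones. The two objectives (an intensity to maximize and a spread to minimize, used here as a stand-in for focality) and all candidate values are hypothetical. This is not the MOVEA implementation, which uses an evolutionary algorithm and is available at the repository given in the abstract.

```python
from typing import List, Tuple

# Each candidate is (intensity, spread): higher intensity is better,
# lower spread (a hypothetical proxy for better focality) is better.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is at least as good as b on both objectives
    and strictly better on at least one of them."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical (intensity, spread) pairs, for illustration only.
candidates = [(0.9, 0.8), (0.7, 0.4), (0.5, 0.2), (0.6, 0.5), (0.4, 0.3)]
front = pareto_front(candidates)
```

Each point kept in `front` represents a different trade-off: no other candidate improves one objective without worsening the other, which is why a single run can serve several requirements without manual weight tuning.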
# 不完全情報に基づく知識グラフの品質評価 Knowledge Graph Quality Evaluation under Incomplete Information ( http://arxiv.org/abs/2212.00994v2 ) ライセンス: Link先を確認	Xiaodong Li, Chenxin Zou, Yi Cai, Yuelong Zhu	(参考訳) 知識グラフ(KG)は多くのタスクにおける基本的な役割のため、ますます注目を集めている。したがって、KGsの品質評価は重要で不可欠である。この分野での既存の手法では、異なる次元からの新しい品質指標を提案するか、kg建設段階での性能を測定するかによってkgを評価する。しかし、これらの方法には2つの大きな問題がある。まず、KGsの内部情報を品質評価中に露出させるKGsの生データに強く依存する。第二に、ダウンストリームアプリケーションにとって後者がより重要となる能力レベルではなく、データレベルの品質についてより深く検討する。そこで本研究では,不完全情報に基づく知識グラフ品質評価フレームワーク(qeii)を提案する。品質評価タスクは、2つのKG間の逆Q&Aゲームに変換される。したがって、ゲームの勝者はより良い品質を持つと考えられる。評価プロセス中は、情報保護を保証する生データを露出しない。 4組のKGの実験結果から,QEIIはベースラインと比較して,不完全情報下での能力レベルにおいて合理的な品質評価を行うことを示した。 Knowledge graphs (KGs) have attracted more and more attentions because of their fundamental roles in many tasks. Quality evaluation for KGs is thus crucial and indispensable. Existing methods in this field evaluate KGs by either proposing new quality metrics from different dimensions or measuring performances at KG construction stages. However, there are two major issues with those methods. First, they highly rely on raw data in KGs, which makes KGs' internal information exposed during quality evaluation. Second, they consider more about the quality at data level instead of ability level, where the latter one is more important for downstream applications. To address these issues, we propose a knowledge graph quality evaluation framework under incomplete information (QEII). The quality evaluation task is transformed into an adversarial Q&A game between two KGs. Winner of the game is thus considered to have better qualities. During the evaluation process, no raw data is exposed, which ensures information protection. Experimental results on four pairs of KGs demonstrate that, compared with baselines, the QEII implements a reasonable quality evaluation at ability level under incomplete information.	翻訳日:2023-04-05 18:27:53 公開日:2023-04-03
# 一般化された少数ショットセマンティクスセグメンテーションのための強固なベースライン A Strong Baseline for Generalized Few-Shot Semantic Segmentation ( http://arxiv.org/abs/2211.14126v2 ) ライセンス: Link先を確認	Sina Hajimiri, Malik Boudiaf, Ismail Ben Ayed, Jose Dolz	(参考訳) 本稿では,簡単なトレーニングプロセスと最適化の容易な推論フェーズを備えた,一般化された少数ショットセグメンテーションフレームワークを提案する。特に、よく知られたInfoMaxの原理に基づいて、学習した特徴表現とそれに対応する予測との相互情報(MI)を最大化する、単純かつ効果的なモデルを提案する。また,MIに基づく定式化から派生した項は,知識蒸留項と組み合わされ,基礎クラスにおける知識を保持する。簡単なトレーニングプロセスにより、ベースクラスでトレーニングされた任意のセグメンテーションネットワークの上に推論モデルを適用することができる。提案手法は,人気の少数ショットセグメンテーションベンチマークであるPASCAL-$5^i$とCOCO-$20^i$に対して大幅な改善をもたらす。特に新規クラスでは、1ショットおよび5ショットのシナリオにおいて、改善幅は7%から26%(PASCAL-$5^i$)、3%から12%(COCO-$20^i$)である。さらに,パフォーマンスギャップがさらに拡大する,より困難な設定を提案する。私たちのコードはhttps://github.com/sinahmr/DIaMで公開されています。 This paper introduces a generalized few-shot segmentation framework with a straightforward training process and an easy-to-optimize inference phase. In particular, we propose a simple yet effective model based on the well-known InfoMax principle, where the Mutual Information (MI) between the learned feature representations and their corresponding predictions is maximized. In addition, the terms derived from our MI-based formulation are coupled with a knowledge distillation term to retain the knowledge on base classes. With a simple training process, our inference model can be applied on top of any segmentation network trained on base classes. The proposed inference yields substantial improvements on the popular few-shot segmentation benchmarks, PASCAL-$5^i$ and COCO-$20^i$. Particularly, for novel classes, the improvement gains range from 7% to 26% (PASCAL-$5^i$) and from 3% to 12% (COCO-$20^i$) in the 1-shot and 5-shot scenarios, respectively. Furthermore, we propose a more challenging setting, where performance gaps are further exacerbated. Our code is publicly available at https://github.com/sinahmr/DIaM.	翻訳日:2023-04-05 18:26:51 公開日:2023-04-03
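The abstract above does not spell out how the mutual information is computed, so the following is only a sketch under an assumption: a common empirical estimate takes the entropy of the average prediction minus the average entropy of the per-pixel predictions. The function names and the toy probabilities are hypothetical and are not taken from the DIaM code.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def mutual_information(pixel_probs: List[List[float]]) -> float:
    """Empirical MI estimate (an assumed form, not the paper's exact loss):
    H(mean prediction) - mean H(per-pixel prediction)."""
    n = len(pixel_probs)
    k = len(pixel_probs[0])
    # Marginal class distribution, averaged over all pixels.
    marginal = [sum(p[c] for p in pixel_probs) / n for c in range(k)]
    # Average uncertainty of the individual per-pixel predictions.
    conditional = sum(entropy(p) for p in pixel_probs) / n
    return entropy(marginal) - conditional


# Confident and class-balanced predictions give the highest estimate.
mi_confident = mutual_information([[1.0, 0.0], [0.0, 1.0]])
# Uniform predictions carry no information about the input.
mi_uniform = mutual_information([[0.5, 0.5], [0.5, 0.5]])
```

Under this estimate, maximizing MI pushes each prediction to be confident while keeping the overall class usage balanced.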
# ハイゼンベルク対共変弦 Heisenberg versus the Covariant String ( http://arxiv.org/abs/2212.07256v3 ) ライセンス: Link先を確認	Norbert Dragon and Florian Oppermann	(参考訳) 質量固有状態 $\bigl(P^2 - m^2\bigr)\Psi = 0$ のポアンカレ多重項は、$D$ 次元ベクトルの位置作用素 $X=(X_0,\dots X_{D-1})$ をもつ空間の部分空間にはなり得ない。ハイゼンベルク代数 $[P^m, X_n] = i \delta^m{}_n$ から、確定した質量をもつポアンカレ多重項はいずれも消えることが単純な議論によって導かれる。同じ結論はストーン=フォン・ノイマンの定理からも導かれる。量子論において、絶対連続スペクトルの低次元部分多様体への制約は、ディラックの対応する古典的制約に対する処理が一貫した対応する量子モデルを持つシンプレクティック部分多様体を定義するとしてもゼロとなる。そのヒルベルト空間は、制約のない理論の部分空間ではない。したがって、制約のないモデルの演算子関係は制約付きモデルに引き継がれる必要はない。この議論は相対論的粒子の量子化されたワールドラインモデルと共変量子弦の物理的状態を排除する。粒子に作用するローレンツ変換の生成元に関する誤解を正す。 A Poincaré multiplet of mass eigenstates $\bigl(P^2 - m^2\bigr)\Psi = 0$ cannot be a subspace of a space with a $D$-vector position operator $X=(X_0,\dots X_{D-1})$: the Heisenberg algebra $[P^m, X_n] = i \delta^m{}_n$ implies by a simple argument that each Poincaré multiplet of definite mass vanishes. The same conclusion follows from the Stone-von Neumann theorem. In a quantum theory the constraint of an absolutely continuous spectrum to a lower dimensional submanifold yields zero even if Dirac's treatment of the corresponding classical constraint defines a symplectic submanifold with a consistent corresponding quantum model. Its Hilbert space is not a subspace of the unconstrained theory. Hence the operator relations of the unconstrained model need not carry over to the constrained model. Our argument excludes quantized worldline models of relativistic particles and the physical states of the covariant quantum string. We correct misconceptions about the generators of Lorentz transformations acting on particles.	翻訳日:2023-04-05 18:17:41 公開日:2023-04-03
# D適応による学習時間自由学習 Learning-Rate-Free Learning by D-Adaptation ( http://arxiv.org/abs/2301.07733v3 ) ライセンス: Link先を確認	Aaron Defazio and Konstantin Mishchenko	(参考訳) d-適応(d-adaptation)は、バックトラッキングやラインサーチなしに凸リプシッツ関数を最小化するための収束率を漸近的に達成し、ステップごとに追加の関数値や勾配評価を行わない学習率を自動的に設定する手法である。本手法は,収束率に乗算的ログ係数を付加することなく,このクラスで最初のハイパーパラメータフリーメソッドである。本手法のSGDおよびAdam変種に対する広範な実験を行い,大規模ビジョンや言語問題を含む12以上の機械学習問題に対して手作業による学習率を自動的にマッチングする手法を提案する。オープンソース実装は \url{https://github.com/facebookresearch/dadaptation} で利用可能である。 D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step. Our approach is the first hyper-parameter free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. An open-source implementation is available at \url{https://github.com/facebookresearch/dadaptation}.	翻訳日:2023-04-05 18:10:20 公開日:2023-04-03
# 映像動作認識のための階層的説明 Hierarchical Explanations for Video Action Recognition ( http://arxiv.org/abs/2301.00436v3 ) ライセンス: Link先を確認	Sadaf Gulshad, Teng Long, Nanne van Noord	(参考訳) ディープニューラルネットワークを解釈するには、視覚入力を解剖し、分類の原型的な部分を見つけることが主なアプローチである。しかし、既存の手法はこれらのプロトタイプ間の階層的関係を無視することが多く、したがってより高いレベル(ウォータースポーツなど)と低いレベル(水泳など)のセマンティック概念を説明できない。本研究では,人間の認知システムに着想を得て,不確実性に対処するために階層的情報を活用する。水と人間の活動は観察されるが決定的な行動が観察されない場合,それは親クラスであるウォータースポーツとして認識できる。人が泳いでいるのを観察して初めて、それを水泳という行動に絞り込むことができる。この目的のために,プロトタイプとクラス間の階層関係を構築するための階層型プロトタイプ記述器 (HIPE) を提案する。 HIPEは、入力されたビデオフレームをクラス階層の複数のレベルで解剖することで、ビデオアクション分類の推論プロセスを可能にし、この手法は他のビデオタスクにも適用できる。本手法の忠実性は,ActivityNet と UCF-101 において精度と説明可能性のトレードオフを減らしつつ,マルチレベルな説明を提供することによって検証する。 To interpret deep neural networks, one main approach is to dissect the visual input and find the prototypical parts responsible for the classification. However, existing methods often ignore the hierarchical relationship between these prototypes, and thus cannot explain semantic concepts at both higher levels (e.g., water sports) and lower levels (e.g., swimming). In this paper, inspired by the human cognition system, we leverage hierarchical information to deal with uncertainty: when we observe water and human activity but no definitive action, the input can be recognized as the water sports parent class. Only after observing a person swimming can we definitively refine it to the swimming action. To this end, we propose HIerarchical Prototype Explainer (HIPE) to build hierarchical relations between prototypes and classes. HIPE enables a reasoning process for video action classification by dissecting the input video frames on multiple levels of the class hierarchy; our method is also applicable to other video tasks. The faithfulness of our method is verified by reducing the accuracy-explainability trade-off on ActivityNet and UCF-101 while providing multi-level explanations.	翻訳日:2023-04-05 18:08:29 公開日:2023-04-03
# 光円錐弦の平滑化について The Rough with the Smooth of the Light Cone String ( http://arxiv.org/abs/2212.14822v2 ) ライセンス: Link先を確認	Norbert Dragon and Florian Oppermann	(参考訳) ポアンカーイ群のユニタリ表現の生成元は滑らかな波動関数を滑らかな波動関数に写像する代数を生成する。この数学的結果は、以前は非有界作用素の代数的処理が正当化されると仮定した物理学者にとって非常に歓迎されている。しかし、滑らかさは、滑らかな波動関数を滑らかでない函数に写像する粗い作用素がポアンカルの対称性と矛盾する副作用を持つ:それらの生成元との積は定義できない。粗かつ滑らかな作用素は共通代数のメンバーではない。 transverse heisenberg pairs $x^i$ and $p^j$, $i,j\in \{1,\dots d-2\}$, $p_z = p^{d-1}$, $p^+=(p^0 + p_z)/\sqrt{2}$, 光円錐弦で起こるように、大まかに質量のない多重集合に作用する。それらの代数の領域は回転によって自身に写像されず、ローレンツ変換だけに留まる。これは全ての次元において真であり、ボソニック弦の臨界次元 $d=26$ の代数的計算を無意味にする: no dimension $d > 2$ では、光円錐弦はローレンツ群のユニタリ表現を許容する。無質量多重は空間的位置演算子 $\vec x$ と矛盾し、空間的モーメントの変換を生成する。 The generators of unitary representations of the Poincar\'e group generate an algebra which maps smooth wavefunctions to smooth wavefunctions. This mathematical result is highly welcome to physicists, who previously just assumed their algebraic treatment of unbounded operators be justified. The smoothness, however, has the side effect that rough operators, which map smooth wavefunctions to functions which are not smooth, are inconsistent with Poincar\'e symmetry: their product with the generators cannot be defined. Rough and smooth operators are not members of a common algebra. Transverse Heisenberg pairs $X^i$ and $P^j$, $i,j\in \{1,\dots D-2\}$, $P_z = P^{D-1}$, which commute with $P^+=(P^0 + P_z)/\sqrt{2}$, as they occur in the light cone string, act roughly on massless multiplets. The domain of their algebra is not mapped to itself by rotations, leave alone Lorentz transformations. This is true in all dimensions and makes the algebraic calculation of the critical dimension, $D=26$, of the bosonic string meaningless: in no dimension $D > 2$ does the light cone string admit a unitary representation of the Lorentz group. Massless multiplets are inconsistent with a spatial position operator $\vec X$, which generates translations of the spatial momentum.	
翻訳日:2023-04-05 18:08:08 公開日:2023-04-03
# 因果カミソリ Causal Razors ( http://arxiv.org/abs/2302.10331v2 ) ライセンス: Link先を確認	Wai-yin Lam	(参考訳) 因果発見を行う場合、真の因果メカニズムが基礎となる同時確率分布とどのように対応しているかを仮定する必要がある。これらの仮定を、本研究では因果カミソリと呼ぶ。文献に登場する多数の因果カミソリについて検討し,それらの包括的な論理的比較を行う。特に,多項因果モデルにおけるあまり注目されていない因果カミソリ,すなわちパラメータ最小性と,他のよく研究された因果カミソリとの論理的関係を精査する。我々の論理的結果は、スコアベースの因果探索アルゴリズムに対して適切なスコアリング基準を選択する際のジレンマを提起する。 When performing causal discovery, assumptions have to be made on how the true causal mechanism corresponds to the underlying joint probability distribution. These assumptions are labeled as causal razors in this work. We review numerous causal razors that appeared in the literature, and offer a comprehensive logical comparison of them. In particular, we scrutinize an unpopular causal razor, namely parameter minimality, in multinomial causal models and its logical relations with other well-studied causal razors. Our logical result poses a dilemma in selecting a reasonable scoring criterion for score-based causal search algorithms.	翻訳日:2023-04-05 18:00:40 公開日:2023-04-03
# 潜在拡散前処理によるテキスト駆動視覚合成 Text-driven Visual Synthesis with Latent Diffusion Prior ( http://arxiv.org/abs/2302.08510v2 ) ライセンス: Link先を確認	Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar and Jia-Bin Huang	(参考訳) テキストからの3Dオブジェクト合成や画像編集,カスタマイズ生成といった,汎用的な下流アプリケーションを可能にする拡散モデルによって駆動される大規模テキスト・画像合成は,大きな進歩を遂げている。本稿では,様々な視覚合成タスクにおいて,遅延拡散モデルを用いた画像先行処理を提案する。このようなプリエントを利用する既存のメソッドは、これらのモデルの完全な機能を使用しない。これを改善するための中核となるアイデアは 1) デコーダの異なるレイヤからの機能の損失をマッチングして詳細なガイダンスを提供する機能 2) 予測潜伏特性を規則化し, 訓練を安定させるKL分散損失。提案手法の有効性を,テキストから3D,スタイルGAN適応,階層画像編集の3つの異なるアプリケーションに示す。その結果,本手法はベースラインと良好に比較できることがわかった。 There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use these models' full capabilities. To improve this, our core ideas are 1) a feature matching loss between features from different layers of the decoder to provide detailed guidance and 2) a KL divergence loss to regularize the predicted latent features and stabilize the training. We demonstrate the efficacy of our approach on three different applications, text-to-3D, StyleGAN adaptation, and layered image editing. Extensive results show our method compares favorably against baselines.	翻訳日:2023-04-05 18:00:28 公開日:2023-04-03
# ChatGPT障害の分類的アーカイブ A Categorical Archive of ChatGPT Failures ( http://arxiv.org/abs/2302.03494v8 ) ライセンス: Link先を確認	Ali Borji	(参考訳) 大規模言語モデルは様々な分野で有用であることが示されている。 OpenAIが開発したChatGPTは、大量のデータを使って訓練され、コンテキストを理解し、適切な応答を生成することで人間の会話をシミュレートしている。幅広い質問に効果的に答える能力が、セキュリティと有用性の両方において、従来の公開チャットボットを上回っているため、大きな注目を集めている。しかし、ChatGPTの失敗の包括的分析は欠落しており、この研究の焦点となっている。推論、事実的エラー、数学、コーディング、バイアスを含む11の障害カテゴリが提示され、議論されている。 ChatGPTのリスク、制限、社会的意味も強調されている。本研究の目的は,将来の言語モデルやチャットボットの強化を支援することにある。 Large language models have been demonstrated to be valuable in different fields. ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation by comprehending context and generating appropriate responses. It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries, with fluent and comprehensive answers surpassing prior public chatbots in both security and usefulness. However, a comprehensive analysis of ChatGPT's failures is lacking, which is the focus of this study. Eleven categories of failures, including reasoning, factual errors, math, coding, and bias, are presented and discussed. The risks, limitations, and societal implications of ChatGPT are also highlighted. The goal of this study is to assist researchers and developers in enhancing future language models and chatbots.	翻訳日:2023-04-05 17:59:27 公開日:2023-04-03
# 対比自己スーパービジョンのための補正不変学習 Amortised Invariance Learning for Contrastive Self-Supervision ( http://arxiv.org/abs/2302.12712v2 ) ライセンス: Link先を確認	Ruchika Chavhan, Henry Gouk, Jan Stuehmer, Calum Heggan, Mehrdad Yaghoobi, Timothy Hospedales	(参考訳) 対照的な自己教師付き学習法は、異なるデータ拡張に対する不変性を学習することで、高品質な転送可能表現を作り出すことで有名である。事前学習中に確立された不変性は強い帰納バイアスと解釈できる。しかし、下流タスクの不変性要件に適合するかどうかによっては、これらは役に立たないかもしれない。このことは、事前訓練中にタスク固有の不変性を学習するいくつかの試みにつながっているが、これらの手法は高度に計算集約され、訓練に手間がかかる。対照的自己管理のための無形不分散学習の概念を導入する。事前学習の段階では,特徴抽出器のパラメータ化を,表現によって符号化された不変量を制御する可変不変超パラメータで行う。そして、ダウンストリームタスクに対して、線形読み出しとタスク固有の不変条件の両方を、勾配差により効率よく、効果的に学習することができる。 ResNets や Vision Transformers などの一般的なアーキテクチャを用いた SimCLR と MoCo-v2 と,ResNet-18 を用いた SimCLR という2つの異なる手法を用いて, 視覚と音声の2つの異なる相違点を比較検討した。我々は、一つの機能を使用し、タスク固有の事前学習を避けながら、異なる不変条件で多様な下流タスクを学習する信頼性の高い方法を提供することを示す。これは、汎用表現学習の分野での新しい地平を開くエキサイティングな視点を提供する。 Contrastive self-supervised learning methods famously produce high quality transferable representations by learning invariances to different data augmentations. Invariances established during pre-training can be interpreted as strong inductive biases. However these may or may not be helpful, depending on if they match the invariance requirements of downstream tasks or not. This has led to several attempts to learn task-specific invariances during pre-training, however, these methods are highly compute intensive and tedious to train. We introduce the notion of amortised invariance learning for contrastive self supervision. In the pre-training stage, we parameterize the feature extractor by differentiable invariance hyper-parameters that control the invariances encoded by the representation. Then, for any downstream task, both linear readout and task-specific invariance requirements can be efficiently and effectively learned by gradient-descent. 
We evaluate the notion of amortised invariances for contrastive learning over two different modalities: vision and audio, on two widely-used contrastive learning methods in vision: SimCLR and MoCo-v2 with popular architectures like ResNets and Vision Transformers, and SimCLR with ResNet-18 for audio. We show that our amortised features provide a reliable way to learn diverse downstream tasks with different invariance requirements, while using a single feature and avoiding task-specific pre-training. This provides an exciting perspective that opens up new horizons in the field of general purpose representation learning.	翻訳日:2023-04-05 17:49:39 公開日:2023-04-03
# AHPにおける安全な判断集約に向けて Towards secure judgments aggregation in AHP ( http://arxiv.org/abs/2303.15099v2 ) ライセンス: Link先を確認	Konrad Kułakowski and Jacek Szybowski and Jiri Mazurek and Sebastian Ernst	(参考訳) 意思決定手法においては、専門家が誠実かつ専門的であると仮定するのが一般的である。しかし、グループ分析階層プロセス(GAHP)のようなグループ意思決定フレームワークにおいて、1人以上の専門家が結果を自分に有利になるよう操作しようとする場合は、そうではない。本研究の目的は,GAHPの設定において2つのヒューリスティックスを導入し,操作者を検出するとともに,その重みを小さくすることでグループコンセンサスへの影響を最小化することである。最初のヒューリスティックは、操作者が、グループの他の専門家の判断に対して外れ値と見なすことができる判断を提供するという仮定に基づいている。第二のヒューリスティックは、不正直な判断はグループの平均的な一貫性よりも一貫性が低いと仮定する。どちらのアプローチも数値的な例とシミュレーションで示される。 In decision-making methods, it is common to assume that the experts are honest and professional. However, this is not the case when one or more experts in a group decision-making framework, such as the group analytic hierarchy process (GAHP), try to manipulate results in their favor. The aim of this paper is to introduce two heuristics in the GAHP setting that allow detecting manipulators and minimizing their effect on the group consensus by diminishing their weights. The first heuristic is based on the assumption that manipulators will provide judgments which can be considered outliers with respect to those of the rest of the experts in the group. The second heuristic assumes that dishonest judgments are less consistent than the average consistency of the group. Both approaches are illustrated with numerical examples and simulations.	翻訳日:2023-04-05 17:33:04 公開日:2023-04-03
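The first heuristic in the abstract above (treating a manipulator's judgments as outliers and diminishing that expert's weight) can be sketched roughly as follows. The abstract gives no formulas, so everything here is an assumption: priorities are derived with the row geometric mean method, the outlier measure is the Euclidean distance to the other experts' average, and the exponential down-weighting with a `sharpness` parameter is a hypothetical choice. The comparison matrices are invented for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def priorities(pcm: Matrix) -> List[float]:
    """Priority vector of a pairwise comparison matrix using the
    row geometric mean method, normalized to sum to 1."""
    n = len(pcm)
    gm = [math.prod(row) ** (1.0 / n) for row in pcm]
    total = sum(gm)
    return [g / total for g in gm]


def expert_weights(vectors: List[List[float]], sharpness: float = 5.0) -> List[float]:
    """Assumed weighting rule: the further an expert's priority vector is
    from the average of the other experts, the smaller the weight."""
    m, n = len(vectors), len(vectors[0])
    raw = []
    for i, v in enumerate(vectors):
        others = [w for j, w in enumerate(vectors) if j != i]
        mean = [sum(w[k] for w in others) / (m - 1) for k in range(n)]
        dist = math.sqrt(sum((v[k] - mean[k]) ** 2 for k in range(n)))
        raw.append(math.exp(-sharpness * dist))
    total = sum(raw)
    return [r / total for r in raw]


# Two broadly agreeing experts and one hypothetical manipulator who
# strongly pushes alternatives 2 and 3 over alternative 1.
expert_a = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
expert_b = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
manipulator = [[1, 1 / 9, 1 / 9], [9, 1, 1], [9, 1, 1]]

vectors = [priorities(m) for m in (expert_a, expert_b, manipulator)]
weights = expert_weights(vectors)

# Weighted arithmetic mean of the individual priority vectors.
group = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(3)]
```

In this toy case the manipulator receives the smallest weight, so the group ranking still places alternative 1 first. With only three experts the leave-one-out average is itself pulled by the outlier, so a more robust reference point such as a median may be preferable in practice.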
# 回答に会話を必要とするのはどのような質問か? AskReddit 質問のケーススタディ What Types of Questions Require Conversation to Answer? A Case Study of AskReddit Questions ( http://arxiv.org/abs/2303.17710v2 ) ライセンス: Link先を確認	Shih-Hong Huang, Chieh-Yang Huang, Ya-Fang Lin, Ting-Hao 'Kenneth' Huang	(参考訳) チャットボット、音声対話システム、スマートスピーカーなどの自動会話システムの普及は、現代のデジタル生活に大きな影響を与えている。しかし、これらのシステムは、ユーザが複雑で不明確な質問を探索するのを支援するのではなく、よく定義された質問に対する回答を提供するように設計されている。本稿では,会話を通じて最もよく答えられる,不明瞭でオープンエンドな質問のタイプを調べることにより,会話システムの境界を押し広げることを目的とする。最初にAskRedditに投稿された100万件のオープンエンドリクエストから500件の質問をサンプリングし、オンラインのクラウドワーカーを募集して、これらの質問に関する8つの設問に回答してもらった。また、オープンコーディングを行い、質問を27の異なる領域に分類した。人々が十分に解決するために会話を必要とすると考える問題は、高度に社会的で個人的なものであることが分かった。私たちの研究は、将来の研究がどのようにユーザのニーズに合わせることができるかについての洞察を提供する。 The proliferation of automated conversational systems such as chatbots, spoken-dialogue systems, and smart speakers has significantly impacted modern digital life. However, these systems are primarily designed to provide answers to well-defined questions rather than to support users in exploring complex, ill-defined questions. In this paper, we aim to push the boundaries of conversational systems by examining the types of nebulous, open-ended questions that can best be answered through conversation. We first sampled 500 questions from one million open-ended requests posted on AskReddit, and then recruited online crowd workers to answer eight inquiries about these questions. We also performed open coding to categorize the questions into 27 different domains. We found that the issues people believe require conversation to resolve satisfactorily are highly social and personal. Our work provides insights into how future research could be geared to align with users' needs.	翻訳日:2023-04-05 17:23:15 公開日:2023-04-03
# CQMに基づく組合せ問題の解法と薬物設計への応用 A CQM-based approach to solving a combinatorial problem with applications in drug design ( http://arxiv.org/abs/2303.15419v2 ) ライセンス: Link先を確認	B. Maurice Benson, Victoria M. Ingman, Abhay Agarwal, Shahar Keinan	(参考訳) D-WaveのLeap Hybrid solverの使用は、Knapsack最適化問題の解決において、ダイナーの制約に合う固定メニューから食事の組み合わせを見つけることで実証されている。これは、最適化問題をCQM(Constrained Quadratic Model)として初めて定式化し、量子アニーラーに送信することで実現される。ここでは、必要なステップと実装されたコードを強調し、ChickenとWaffleのレストランメニューからのソリューションを提供します。さらに、このモデルがどのように一般化され、多くの複雑でしばしば矛盾する構造と性質の制約のある大きな探索空間内で最適な薬物分子を見つけるかについて議論する。 The use of D-Wave's Leap Hybrid solver is demonstrated here in solving a Knapsack optimization problem: finding meal combinations from a fixed menu that fit a diner's constraints. This is done by first formulating the optimization problem as a Constrained Quadratic Model (CQM) and then submitting it to a quantum annealer. We highlight here the steps needed, as well as the implemented code, and provide solutions from a Chicken and Waffle restaurant menu. Additionally, we discuss how this model may be generalized to find optimal drug molecules within a large search space with many complex, and often contradictory, structures and property constraints.	翻訳日:2023-04-05 17:20:38 公開日:2023-04-03
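To make the knapsack formulation in the abstract above concrete, the sketch below states the same kind of problem (pick menu items that maximize a score subject to a diner's constraints) and solves it by classical exhaustive search. It deliberately does not use D-Wave's solver or any quantum annealer, and the menu items, calorie counts, prices, and satisfaction scores are all hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# (name, calories, price, satisfaction): hypothetical values.
Item = Tuple[str, int, float, int]

MENU: List[Item] = [
    ("chicken", 400, 8.0, 9),
    ("waffle", 350, 5.0, 7),
    ("fries", 300, 3.0, 4),
    ("salad", 150, 4.0, 3),
    ("soda", 150, 2.0, 2),
]


def best_meal(menu: List[Item], max_calories: int, max_price: float):
    """Try every subset of the menu and keep the feasible one with the
    highest total satisfaction (each item is chosen at most once)."""
    best, best_score = (), -1
    for r in range(len(menu) + 1):
        for combo in combinations(menu, r):
            calories = sum(item[1] for item in combo)
            price = sum(item[2] for item in combo)
            # Constraints: the diner's calorie and budget limits.
            if calories > max_calories or price > max_price:
                continue
            # Objective: total satisfaction of the selected items.
            score = sum(item[3] for item in combo)
            if score > best_score:
                best, best_score = combo, score
    return [item[0] for item in best], best_score


meal, score = best_meal(MENU, max_calories=900, max_price=15.0)
```

Exhaustive search checks all 2^n subsets, which is fine for five items but not for search spaces of the size the abstract mentions for drug design. In a CQM, the objective and the two limits above would instead be expressed over binary variables, one per item.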
# メンタルヘルス記録テキストにおける痛みの症状の特定 : 自然言語処理アプローチ Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach ( http://arxiv.org/abs/2304.01240v1 ) ライセンス: Link先を確認	Jaya Chaturvedi, Sumithra Velupillai, Robert Stewart, Angus Roberts	(参考訳) 痛みは医療資源にアクセスする一般的な理由であり、特に精神的な健康と重なる研究領域が増加している。メンタルヘルスの電子健康記録は、この重複を研究する良いデータ源である。しかし、痛みに関する多くの情報はこれらの記録の自由なテキストに保持されており、痛みに関する言及はあいまいな性質のため、独特の自然言語処理の問題をもたらす。このプロジェクトは匿名のメンタルヘルス電子健康記録データベースからのデータを利用する。データは、機械学習に基づく分類アルゴリズムを訓練し、患者の痛みについて議論するか否かを分類する。これにより、大きなデータベースから関連する痛み情報を抽出し、痛みとメンタルヘルスのさらなる研究にそのようなアウトプットを使用することが容易になる。 1,985の文書は、3つの一般的な分類アルゴリズムを訓練するために使用されるゴールドスタンダードトレーニングデータを作成するために手動で3重注釈付けされた。最高のパフォーマンスモデルはF1スコアが0.98(95% CI 0.98-0.99)に達した。 Pain is a common reason for accessing healthcare resources and is a growing area of research, especially in its overlap with mental health. Mental health electronic health records are a good data source to study this overlap. However, much information on pain is held in the free text of these records, where mentions of pain present a unique natural language processing problem due to its ambiguous nature. This project uses data from an anonymised mental health electronic health records database. The data are used to train a machine learning based classification algorithm to classify sentences as discussing patient pain or not. This will facilitate the extraction of relevant pain information from large databases, and the use of such outputs for further studies on pain and mental health. 1,985 documents were manually triple-annotated for creation of gold standard training data, which was used to train three commonly used classification algorithms. The best performing model achieved an F1-score of 0.98 (95% CI 0.98-0.99).	翻訳日:2023-04-05 17:05:17 公開日:2023-04-03
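For readers unfamiliar with the F1-score reported in the abstract above, the sketch below computes it from confusion-matrix counts for a binary sentence classifier. The counts are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the study's data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 98 pain sentences found, 2 false alarms, 2 missed.
example_f1 = f1_score(tp=98, fp=2, fn=2)
```

Because F1 ignores true negatives, it stays informative when sentences that mention pain are much rarer than those that do not.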
# 循環型ドメインシフトの連続学習によるオンライン蒸留 Online Distillation with Continual Learning for Cyclic Domain Shifts ( http://arxiv.org/abs/2304.01239v1 ) ライセンス: Link先を確認	Joachim Houyon, Anthony Cioppa, Yasir Ghunaim, Motasem Alfarra, Ana\"is Halin, Maxim Henry, Bernard Ghanem, Marc Van Droogenbroeck	(参考訳) 近年、オンライン蒸留は、遅いが正確な教師モデルを用いてリアルタイムでディープニューラルネットワークを適用するための強力な技術として出現している。しかし、オンライン蒸留における大きな課題は、学生モデルが新しいドメインのデータで更新され、それまでの知識を忘れたときに生じるドメインシフトが破滅的なことを忘れることである。本稿では,ドメインシフトの影響を低減するために連続学習手法のパワーを活用することで,この問題に対する解決策を提案する。具体的には, オンライン蒸留の文脈において, 最先端の連続学習手法をいくつか統合し, 破滅的放棄の低減効果を実証する。さらに, 環状領域シフトの場合には, 提案する解の詳細な解析を行う。実験により, オンライン蒸留の堅牢性と精度向上に対する我々のアプローチの有効性を実証し, ビデオ監視や自律運転といった分野への応用の可能性を示した。全体として、われわれの研究はオンライン蒸留と継続的学習の分野における重要な一歩であり、現実世界のアプリケーションに大きな影響を与える可能性がある。 In recent years, online distillation has emerged as a powerful technique for adapting real-time deep neural networks on the fly using a slow, but accurate teacher model. However, a major challenge in online distillation is catastrophic forgetting when the domain shifts, which occurs when the student model is updated with data from the new domain and forgets previously learned knowledge. In this paper, we propose a solution to this issue by leveraging the power of continual learning methods to reduce the impact of domain shifts. Specifically, we integrate several state-of-the-art continual learning methods in the context of online distillation and demonstrate their effectiveness in reducing catastrophic forgetting. Furthermore, we provide a detailed analysis of our proposed solution in the case of cyclic domain shifts. Our experimental results demonstrate the efficacy of our approach in improving the robustness and accuracy of online distillation, with potential applications in domains such as video surveillance or autonomous driving. Overall, our work represents an important step forward in the field of online distillation and continual learning, with the potential to significantly impact real-world applications.	
翻訳日:2023-04-05 17:05:00 公開日:2023-04-03
# Spam-T5: メールスパム検出のための大規模言語モデルのベンチマーク Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection ( http://arxiv.org/abs/2304.01238v1 ) ライセンス: Link先を確認	Maxime Labonne and Sean Moran	(参考訳) 本稿では,メールスパム検出における大規模言語モデル (LLM) の有効性について,BERT-like, Sentence Transformers, Seq2Seq の3家系の著名なモデルを比較検討した。さらに,Na\"ive Bayes や LightGBM などのスパム検出のための機械学習手法をベースライン手法として検討した。 4つの公開データセットにまたがってこれらのモデルの性能を評価し、異なる数のトレーニングサンプル(フルトレーニングセットと数ショット設定)を利用する。その結果,ほとんどのケースでllmが一般的なベースライン技術,特に少数のシナリオのパフォーマンスを上回っていることが明らかとなった。この適応性は、ラベル付きサンプルの数に制限があり、モデルは頻繁な更新を必要とするスパム検出タスクに特有のLLMをレンダリングする。さらに,eメールのスパム検出に特化・微調整されたflan-t5モデルについても紹介する。以上の結果から,Spam-T5 がベースラインモデルや他の LLM をはるかに上回っていることが明らかとなった。私たちのコードはhttps://github.com/jpmorganchase/emailspamdetectionで公開されています。 This paper investigates the effectiveness of large language models (LLMs) in email spam detection by comparing prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. Additionally, we examine well-established machine learning techniques for spam detection, such as Na\"ive Bayes and LightGBM, as baseline methods. We assess the performance of these models across four public datasets, utilizing different numbers of training samples (full training set and few-shot settings). Our findings reveal that, in the majority of cases, LLMs surpass the performance of the popular baseline techniques, particularly in few-shot scenarios. This adaptability renders LLMs uniquely suited to spam detection tasks, where labeled samples are limited in number and models require frequent updates. Additionally, we introduce Spam-T5, a Flan-T5 model that has been specifically adapted and fine-tuned for the purpose of detecting email spam. Our results demonstrate that Spam-T5 surpasses baseline models and other LLMs in the majority of scenarios, particularly when there are a limited number of training samples available. Our code is publicly available at https://github.com/jpmorganchase/emailspamdetection.	
翻訳日:2023-04-05 17:04:43 公開日:2023-04-03
# ADMG Causal Data Augmentation の実用化のためのガイドライン A Guide for Practical Use of ADMG Causal Data Augmentation ( http://arxiv.org/abs/2304.01237v1 ) ライセンス: Link先を確認	Poinsot Audrey, Leite Alessandro	(参考訳) 小規模データレジームに機械学習を適用する場合、データ拡張は不可欠である。観測されたデータ分布に従って新しいサンプルを生成し、その多様性と多様性を高め、研究者や実践者がモデルの堅牢性を改善し、現実世界にデプロイするのに役立つ。それでも、基盤となるデータメカニズムに関する事前の知識がほとんど考慮されず、生成されたデータの忠実さと多様性が制限されるため、表形式のデータでの使用は改善される必要がある。因果グラフにエンコードされた条件付き独立性に依存することにより、これらの課題に対処するための解決策として因果的データ拡張戦略が指摘されている。本稿では,ADMGの因果拡大手法を実験的に分析し,事前知識が新たなデータポイントの生成に役立っているかを理解する上で,研究者や実践者を支援するために異なる設定を考慮に入れた。その結果,研究手法が注目された。 (a) 基礎となるモデル機構とは独立である。 (b) MLモデルの精度を向上させるために、小さなデータ構造において困難となる最小限の観測値を必要とする。 (c)モデルの性能を低下させる拡張集合に異常値を伝達し、 (d)はハイパーパラメータの値に敏感である。 Data augmentation is essential when applying Machine Learning in small-data regimes. It generates new samples following the observed data distribution while increasing their diversity and variability to help researchers and practitioners improve their models' robustness and, thus, deploy them in the real world. Nevertheless, its usage in tabular data still needs to be improved, as prior knowledge about the underlying data mechanism is seldom considered, limiting the fidelity and diversity of the generated data. Causal data augmentation strategies have been pointed out as a solution to handle these challenges by relying on conditional independence encoded in a causal graph. In this context, this paper experimentally analyzed the ADMG causal augmentation method considering different settings to support researchers and practitioners in understanding under which conditions prior knowledge helps generate new data points and, consequently, enhances the robustness of their models. 
The results highlighted that the studied method (a) is independent of the underlying model mechanism, (b) requires a minimal number of observations that may be challenging in a small-data regime to improve an ML model's accuracy, (c) propagates outliers to the augmented set degrading the performance of the model, and (d) is sensitive to its hyperparameter's value.	翻訳日:2023-04-05 17:04:21 公開日:2023-04-03
# CONVolutional attENTION (ConvEntion)を用いた天文画像時系列分類 Astronomical image time series classification using CONVolutional attENTION (ConvEntion) ( http://arxiv.org/abs/2304.01236v1 ) ライセンス: Link先を確認	Anass Bairouk, Marc Chaumont, Dominique Fouchez, Jerome Paquet, Frédéric Comby, Julian Bautista	(参考訳) 目的。近年,天文画像の時系列処理が注目されている。実際、過渡的な天体に関する多くのサーベイが進行中か建設中であり、例えばヴェラ・ルービン天文台の宇宙と時間に関するレガシーサーベイ (LSST) は、これらの時系列を大量に生成する見込みである。関連する科学的トピックは、我々の銀河内の天体の研究から、宇宙の膨張を測定するための最も遠い超新星の観測まで幅広い。膨大な量のデータが得られるため、天体を検知し分類する堅牢な自動ツールの必要性は着実に高まっている。方法。この研究は、天文画像が光度曲線よりも多くの情報を含んでいるという仮定に基づいている。本稿では,画像を直接用いて異なる種類の天体を分類するための,深層学習に基づく新しい手法を提案する。われわれはこの手法をConvEntionと命名した。これはCONVolutional attENTIONの略である。これは畳み込みとトランスフォーマーに基づいており、天文画像時系列を扱うための新しいアプローチである。我々のソリューションは時空間的特徴を統合し、任意のバンド数をもつ様々な種類の画像データセットに適用できる。結果。本研究では,データセットが抱えがちな様々な問題を解決し,天文画像時系列を用いた分類について新たな結果を示す。画像時系列を用いる最先端手法と比較して13%,光度曲線を用いる手法と比較して12%の精度向上が得られた。 Aims. The treatment of astronomical image time series has gained increasing attention in recent years. Indeed, numerous surveys following up on transient objects are in progress or under construction, such as the Vera Rubin Observatory Legacy Survey of Space and Time (LSST), which is poised to produce huge amounts of these time series. The associated scientific topics are extensive, ranging from the study of objects in our galaxy to the observation of the most distant supernovae for measuring the expansion of the universe. With such a large amount of data available, the need for robust automatic tools to detect and classify celestial objects is growing steadily. Methods. This study is based on the assumption that astronomical images contain more information than light curves. In this paper, we propose a novel approach based on deep learning for classifying different types of space objects directly using images. We named our approach ConvEntion, which stands for CONVolutional attENTION. 
It is based on convolutions and transformers, which are new approaches for the treatment of astronomical image time series. Our solution integrates spatio-temporal features and can be applied to various types of image datasets with any number of bands. Results. In this work, we solved various problems the datasets tend to suffer from and we present new results for classifications using astronomical image time series with an increase in accuracy of 13%, compared to state-of-the-art approaches that use image time series, and a 12% increase, compared to approaches that use light curves.	翻訳日:2023-04-05 17:04:01 公開日:2023-04-03
# グラフマルコフニューラルネットワークの公正評価 Fair Evaluation of Graph Markov Neural Networks ( http://arxiv.org/abs/2304.01235v1 ) ライセンス: Link先を確認	Pirmin Lemberger and Antoine Saillenfest	(参考訳) グラフマルコフニューラルネットワーク(GMNN)は、半教師付きノード分類タスクにラベル依存を含めることで、正規グラフニューラルネットワーク(GNN)を改善するために最近提案されている。 GMNNは理論的に原理的にこれを行い、3種類の情報を使ってラベルを予測する。通常のgnnと同じように、ノードの特徴とグラフ構造を使用するが、さらに隣のノードのラベルの情報を活用して、予測の精度を向上させる。本稿では,wikipediaの記事を32のカテゴリに分類し,2.3mのエッジで接続した48kの相互参照のグラフを含む,wikivitalsという新しいデータセットを提案する。本研究の目的は, GMNNの予測精度, 記事の内容, 相互関係, ラベル間の相関の3つの情報ソースの寄与度を厳格に評価することである。そこで本研究では,分割に対する適切なランダム化とモデル選択とモデル評価の明確な分離を用いて,GNN性能の公正比較を行う手法を提案する。 Graph Markov Neural Networks (GMNN) have recently been proposed to improve regular graph neural networks (GNN) by including label dependencies into the semi-supervised node classification task. GMNNs do this in a theoretically principled way and use three kinds of information to predict labels. Just like ordinary GNNs, they use the node features and the graph structure but they moreover leverage information from the labels of neighboring nodes to improve the accuracy of their predictions. In this paper, we introduce a new dataset named WikiVitals which contains a graph of 48k mutually referred Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to the prediction accuracy of GMNN for this dataset: the content of the articles, their connections with each other and the correlations among their labels. For this purpose we adapt a method which was recently proposed for performing fair comparisons of GNN performance using an appropriate randomization over partitions and a clear separation of model selection and model assessment.	翻訳日:2023-04-05 17:03:39 公開日:2023-04-03
# ポテンシャル場源面(PFSS)磁気グラムへの畳み込みニューラルネットワークの適用による太陽風速の予測 Prediction of solar wind speed by applying convolutional neural network to potential field source surface (PFSS) magnetograms ( http://arxiv.org/abs/2304.01234v1 ) ライセンス: Link先を確認	Rong Lin, Zhekai Luo, Jiansen He, Lun Xie, Chuanpeng Hou, Shuwei Chen	(参考訳) 正確な太陽風速モデルは、宇宙天気予報、破滅的なイベントの警告、その他の太陽風-磁気圏相互作用に関する問題にとって重要である。本研究では,太陽-地球系のラグランジュ1(L1)点における太陽風速の予測を目的として,太陽風源面を$R_{\rm SS}=2.5R_\odot$として,畳み込みニューラルネットワーク (CNN) とポテンシャル場源面 (PFSS) 磁気グラムに基づくモデルを構築した。このモデルの入力は$R_{\rm SS}$における4つのPFSS磁気グラムからなり、それぞれ対象時刻の7, 6, 5, 4日前のものである。モデルの効率を高めるために縮小した磁気グラムを用いる。我々は、GONG(Global Oscillation Network Group)の光球磁気グラムとポテンシャル場外挿モデルを用いて、源面でのPFSS磁気グラムを生成する。このモデルは、データの時間分解能が1時間と細かい8分割の検証訓練スキームにおいて、連続したテストデータセットに対して平均相関係数(CC) 0.52,二乗平均平方根誤差(RMSE) 80.8 km/sの予測を与える。モデルはまた、太陽風の高速流を予測できる可能性があり、これは一般的な脅威スコア 0.39 で定量化される。 An accurate solar wind speed model is important for space weather predictions, catastrophic event warnings, and other issues concerning solar wind - magnetosphere interaction. In this work, we construct a model based on convolutional neural network (CNN) and Potential Field Source Surface (PFSS) magnetograms, considering a solar wind source surface of $R_{\rm SS}=2.5R_\odot$, aiming to predict the solar wind speed at the Lagrange 1 (L1) point of the Sun-Earth system. The input of our model consists of four Potential Field Source Surface (PFSS) magnetograms at $R_{\rm SS}$, which are 7, 6, 5, and 4 days before the target epoch. Reduced magnetograms are used to promote the model's efficiency. We use the Global Oscillation Network Group (GONG) photospheric magnetograms and the potential field extrapolation model to generate PFSS magnetograms at the source surface. The model provides predictions of the continuous test dataset with an averaged correlation coefficient (CC) of 0.52 and a root mean square error (RMSE) of 80.8 km/s in an eight-fold validation training scheme with the time resolution of the data as small as one hour. 
The model also has the potential to forecast high speed streams of the solar wind, which can be quantified with a general threat score of 0.39.	翻訳日:2023-04-05 17:03:22 公開日:2023-04-03
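The solar wind entry above reports three evaluation figures: a correlation coefficient (CC), a root mean square error (RMSE), and a threat score. The sketch below shows how such metrics are commonly computed; it is not the authors' code. The abstract does not define its "general threat score", so the standard threat score (critical success index, hits / (hits + misses + false alarms)) is assumed here, and the function names and boolean event lists are hypothetical.

```python
import math


def pearson_cc(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return cov / (std_p * std_o)


def rmse(pred, obs):
    """Root mean square error, in the same unit as the inputs (e.g. km/s)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def threat_score(pred_events, obs_events):
    """Standard threat score (assumed definition): hits / (hits + misses + false alarms)."""
    pairs = list(zip(pred_events, obs_events))
    hits = sum(1 for p, o in pairs if p and o)
    misses = sum(1 for p, o in pairs if not p and o)
    false_alarms = sum(1 for p, o in pairs if p and not o)
    denom = hits + misses + false_alarms
    return hits / denom if denom else 0.0
```

How a high speed stream is turned into a boolean event (for example, by thresholding the speed) is left open here, because the abstract does not specify it.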
# 救急部門におけるアウトカム予測のためのマルチモーダルPerceiver言語モデル Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department ( http://arxiv.org/abs/2304.01233v1 ) ライセンス: Link先を確認	Sabri Boughorbel, Fethi Jarray, Abdulaziz Al Homaid, Rashid Niaz, Khalid Alyafei	(参考訳) 言語モデリングは、高い精度と高いセマンティックコヒーレンスで魅力的なテキストを生成するという驚くべき進歩を示している。興味深い研究の方向性は、コンテキスト情報を用いて特定のアプリケーション向けにこれらの強力なモデルを強化することである。本稿では,医療アプリケーションのためのマルチモーダル言語モデリングについて検討する。主訴のテキスト情報とトリアージで記録されたバイタルサインに基づいて, 病院救急部門における結果予測と患者トリアージに関心がある。我々は、いくつかのアプリケーションで有望な結果を示しているモダリティに依存しないトランスフォーマーベースのモデルであるPerceiverを適応させる。バイタルサインのモダリティは表形式で表されるため,置換不変性を保証するためにPerceiverの位置符号化を修正した。 120Kの受診を含むMIMIC-IV EDデータセットを用いて,診断コード予測タスクにおけるマルチモーダル言語モデルの評価を行った。実験分析では,テキストやバイタルサインのみに基づいて学習したモデルと比較して,マルチモダリティが予測性能を向上させることを示した。マルチモダリティがパフォーマンス向上に繋がる疾患カテゴリーを特定し,これらのカテゴリにおいて,バイタルサインが予測力を高めていることを示す。クロスアテンション層を解析することにより、マルチモダリティがモデル予測にどのように貢献するかを示す。この研究は、医療アプリケーションのためのマルチモーダル言語モデルの開発に関する興味深い洞察を与える。 Language modeling has shown impressive progress in generating compelling text with good accuracy and high semantic coherence. An interesting research direction is to augment these powerful models for specific applications using contextual information. In this work, we explore multi-modal language modeling for healthcare applications. We are interested in outcome prediction and patient triage in hospital emergency departments based on text information in chief complaints and vital signs recorded at triage. We adapt Perceiver - a modality-agnostic transformer-based model that has shown promising results in several applications. Since the vital-sign modality is represented in tabular format, we modified the Perceiver position encoding to ensure permutation invariance. We evaluated the multi-modal language model for the task of diagnosis code prediction using the MIMIC-IV ED dataset on 120K visits. In the experimental analysis, we show that multi-modality improves the prediction performance compared with models trained solely on text or vital signs. 
We identified disease categories for which multi-modality leads to performance improvement and show that for these categories, vital signs have added predictive power. By analyzing the cross-attention layer, we show how multi-modality contributes to model predictions. This work gives interesting insights on the development of multi-modal language models for healthcare applications.	翻訳日:2023-04-05 17:02:53 公開日:2023-04-03
# 測定量子セルオートマトンとフロッケ符号における異常 Measurement Quantum Cellular Automata and Anomalies in Floquet Codes ( http://arxiv.org/abs/2304.01277v1 ) ライセンス: Link先を確認	David Aasen, Jeongwan Haah, Zhi Li, Roger S. K. Mong	(参考訳) パウリ測定回路における量子情報の進化について検討する。本稿では,最近導入されたFloquetトポロジカル符号に関連する1次元および2次元システムに焦点を当てる。測定回路の文脈で局所可逆性を定義し, 有限深度測定回路を有限深度ユニタリ回路と同様に扱えるようにした。ユニタリの場合とは対照的に、有限深さの局所可逆測定列は1次元の並進を実装できる。 2次元の局所可逆測定列は、境界に沿って論理情報の流れを誘導することもある。本稿では,これらの概念を統一する「測定量子セルオートマトン」を導入し,論理演算子のフローを特徴づける指標を1次元で定義する。 Floquetトポロジカル符号に対して,自明な境界を持つことへの障害を示す$\mathbb{Z}_2$バルク不変量を見出す。我々は、Hastings-Haah ハニカム符号がそのような障害のあるクラスに属することを証明する。これは、任意の境界が非局所力学、周期倍化、あるいは量子情報の境界フローのいずれかを持つ必要があることを意味する。 We investigate the evolution of quantum information under Pauli measurement circuits. We focus on the case of one- and two-dimensional systems, which are relevant to the recently introduced Floquet topological codes. We define local reversibility in the context of measurement circuits, which allows us to treat finite depth measurement circuits on a similar footing to finite depth unitary circuits. In contrast to the unitary case, a finite depth locally reversible measurement sequence can implement a translation in one dimension. A locally reversible measurement sequence in two dimensions may also induce a flow of logical information along the boundary. We introduce "measurement quantum cellular automata" which unify these ideas and define an index in one dimension to characterize the flow of logical operators. We find a $\mathbb{Z}_2$ bulk invariant for Floquet topological codes which indicates an obstruction to having a trivial boundary. We prove that the Hastings-Haah honeycomb code belongs to a class with such obstruction, which means that any boundary must either have non-local dynamics, be period doubled, or admit a boundary flow of quantum information.	翻訳日:2023-04-05 16:56:34 公開日:2023-04-03
# ノイズ量子電池からの作業抽出過程--非局所的資源の役割 Work extraction processes from noisy quantum batteries: the role of non local resources ( http://arxiv.org/abs/2304.01270v1 ) ライセンス: Link先を確認	Salvatore Tirone, Raffaele Salvia, Stefano Chessa and Vittorio Giovannetti	(参考訳) 量子バッテリモデルからの作業抽出における環境騒音の悪影響を緩和するために,非局所操作で得られる有益効果と非局所状態との非対称性を示す。具体的には、ノイズ動作後の非局所回復操作を用いることで、一般に、分離可能な(非絡み合った)入力状態であっても、バッテリから回復できる作業量を増やすことができることを示す。逆に、局所回復操作で絡み合った入力状態を採用すると、一般的にバッテリー性能は向上しない。 We demonstrate an asymmetry between the beneficial effects one can obtain using non-local operations and non-local states to mitigate the detrimental effects of environmental noise in the work extraction from quantum battery models. Specifically, we show that using non-local recovery operations after the noise action can in general increase the amount of work one can recover from the battery even with separable (i.e. non entangled) input states. On the contrary, employing entangled input states with local recovery operations will not generally improve the battery performances.	翻訳日:2023-04-05 16:56:17 公開日:2023-04-03
# ランダム量子回路における情報流の動的相転移 Dynamical phase transitions of information flow in random quantum circuits ( http://arxiv.org/abs/2304.01256v1 ) ライセンス: Link先を確認	J.-Z. Zhuang, Y.-K. Wu, L.-M. Duan	(参考訳) 本研究では,ランダムクリフォード量子回路に支配される多体力学における情報の流れを解明し,この情報の流れにおけるリッチな相転移を探索する。情報フローダイナミクスの位相遷移点と臨界指数は、有限サイズスケーリングによってよく確立される。位相遷移が、情報の位置や最終プローブ領域とどのように異なるかを調べ、これらの遷移の中でユビキタスな振る舞いを見つけ、この量子多体モデルにおける情報伝播と揺らぎに関する興味深い特性を明らかにする。古典情報と量子情報のフローは、それぞれホレボとコヒーレント情報によって測定され、同様の動的相転移挙動を示す。我々の研究は、情報フローが大規模システムの様々な相転移を伴う豊富な挙動を持つことを示し、その研究は量子多体ダイナミクスの理解に新たな光を当てている。 We study how the information flows in the many-body dynamics governed by random Clifford quantum circuits and discover a rich set of dynamical phase transitions in this information flow. The phase transition points and the critical exponents for the information flow dynamics are well established through the finite-size scaling. We investigate how the phase transitions vary with the initial position where the information is located and the final probe region, and find ubiquitous behaviors in these transitions, revealing interesting properties about the information propagation and scrambling in this quantum many-body model. The flow of both classical and quantum information, measured respectively by Holevo and coherent information, show similar dynamical phase transition behaviors. Our work shows that the information flow has rich behaviors with various phase transitions for large systems and its study sheds new light on understanding of quantum many-body dynamics.	翻訳日:2023-04-05 16:56:08 公開日:2023-04-03
# 準周期駆動量子ビットによる巨大エネルギー振動 Giant energy oscillations mediated by a quasiperiodically driven qubit ( http://arxiv.org/abs/2304.01254v1 ) ライセンス: Link先を確認	Dominik Vuina, David M. Long, Phillip J.D. Crowley, Anushya Chandran	(参考訳) 2つの非共振周波数で駆動される量子ビットは、断熱限界における量子化された平均エネルギー電流を媒介することができる。非断熱過程は、駆動間で伝達されるネットエネルギーのエネルギー電流と対応する振動の反転をもたらすことを示す。振動は有界だが巨大で、量子ビットのエネルギー分割よりもはるかに大きい。ランダウ・ツェナー解析は、振動の時間スケールがドライブの周期で指数関数的に大きいと予測する。しかし, 数値解析により, この時間スケールは周期の単調関数ではなく, 断熱限界が近づくにつれてサブ構造が増加することが明らかとなった。この非単調性は、その後のランダウ・ツェナー遷移の干渉効果から生じる。巨大エネルギー振動は、窒素空洞中心での短期実験で観測可能である。 A qubit driven by two incommensurate frequencies can mediate a quantised average energy current in the adiabatic limit. We show that non-adiabatic processes result in reversals of the energy current and corresponding oscillations in the net energy transferred between the drives. The oscillations are bounded but giant -- much larger than the qubit energy splitting. A Landau-Zener analysis predicts that the timescale of the oscillations is exponentially large in the period of the drives. However, numerical analysis reveals that this timescale is not a monotonic function of the period, and has increasing sub-structure as the adiabatic limit is approached. We show that this non-monotonicity arises from interference effects between subsequent Landau-Zener transitions. Giant energy oscillations should be observable in near-term experiments with nitrogen-vacancy centers.	翻訳日:2023-04-05 16:55:53 公開日:2023-04-03
# 統一的な画像復元・強調のための生成拡散事前分布 Generative Diffusion Prior for Unified Image Restoration and Enhancement ( http://arxiv.org/abs/2304.01247v1 ) ライセンス: Link先を確認	Ben Fei, Zhaoyang Lyu, Liang Pan, Junzhe Zhang, Weidong Yang, Tianyue Luo, Bo Zhang, Bo Dai	(参考訳) 既存の画像復元法は主に自然画像の事後分布を利用する。しかし、それらはしばしば既知の劣化を仮定し、教師付きトレーニングも必要とするため、複雑な実アプリケーションへの適応が制限される。本研究では,教師なしサンプリング方式で事後分布を効果的にモデル化する生成拡散事前分布(GDP)を提案する。 GDPは、線形逆問題、非線形問題、ブラインド問題を解決するために、事前学習済みのデノイジング拡散生成モデル(DDPM)を利用する。具体的には、GDPは条件付きガイダンスのプロトコルを体系的に探求し、これは一般的に用いられるガイダンス方法よりも実用的であることが確認されている。さらに、GDPは、デノイジング過程において劣化モデルのパラメータを最適化することに長けており、ブラインド画像復元を実現する。加えて、階層的なガイダンスとパッチベースの手法を考案し、GDPが任意の解像度の画像を生成することを可能にする。実験では,超解像,デブラー,インペインティング,カラー化などの線形問題と,低照度強調やHDR画像復元などの非線形・ブラインド問題について,複数の画像データセットでGDPの汎用性を実証した。 GDPは、再構成品質と知覚品質に関する様々なベンチマークにおいて、現在の主要な教師なし手法よりも優れている。さらに、GDPは、ImageNetトレーニングセットの分布外にある様々なタスクの任意サイズの自然画像や合成画像に対してもよく一般化する。 Existing image restoration methods mostly leverage the posterior distribution of natural images. However, they often assume known degradation and also require supervised training, which restricts their adaptation to complex real applications. In this work, we propose the Generative Diffusion Prior (GDP) to effectively model the posterior distributions in an unsupervised sampling manner. GDP utilizes a pre-trained denoising diffusion generative model (DDPM) for solving linear inverse, non-linear, or blind problems. Specifically, GDP systematically explores a protocol of conditional guidance, which is verified to be more practical than the commonly used guidance approach. Furthermore, GDP is adept at optimizing the parameters of the degradation model during the denoising process, achieving blind image restoration. Besides, we devise hierarchical guidance and patch-based methods, enabling the GDP to generate images of arbitrary resolutions. 
Experimentally, we demonstrate GDP's versatility on several image datasets for linear problems, such as super-resolution, deblurring, inpainting, and colorization, as well as non-linear and blind issues, such as low-light enhancement and HDR image recovery. GDP outperforms the current leading unsupervised methods on the diverse benchmarks in reconstruction quality and perceptual quality. Moreover, GDP also generalizes well for natural images or synthesized images with arbitrary sizes from various tasks out of the distribution of the ImageNet training set.	翻訳日:2023-04-05 16:55:42 公開日:2023-04-03
# 大規模言語モデル時代の安全性分析:ChatGPTを用いたSTPAの事例研究 Safety Analysis in the Era of Large Language Models: A Case Study of STPA using ChatGPT ( http://arxiv.org/abs/2304.01246v1 ) ライセンス: Link先を確認	Yi Qi, Xingyu Zhao, Xiaowei Huang	(参考訳) ChatGPTやBERTといった大規模言語モデル(LLM)は、多くの知識領域にわたる詳細で明確な回答を備えた人間のような会話によって、新たなAIブームを牽引している。 LLMは多くのAIアプリケーションドメインに迅速に適用されているが、我々は次の問いに関心がある。安全クリティカルなシステムの安全性分析にLLMを活用できるだろうか。この問いに答えるため,本稿では,ChatGPTを用いた自動緊急ブレーキ(AEB)システムにおけるシステム理論プロセス解析(STPA)の事例研究を行う。ハザード分析において最も普及している技術の一つであるSTPAは,高い複雑性や主観性といった限界があることが知られており,本論文はChatGPTを用いてこれらに対処することを検討する。具体的には、人間の専門家との相互作用を考慮して、ChatGPTをSTPAに組み込む3つの方法(単発の単方向対話,反復的な単方向対話,反復的な双方向対話)を検討した。比較の結果,(i)人間の専門家の介入なしにChatGPTを使用することは、LLMの信頼性および精度の問題により不十分となりうること, (ii)ChatGPTと人間の専門家との相互作用を増やすほど良い結果が得られる可能性があること, (iii)STPAにおいてChatGPTを十分注意して使用すれば,ベースラインとの既存の比較手法を再利用して示されるように,人間の安全専門家単独を上回りうることが明らかになった。安全分析にLLMを適用する最初の試みであることに加えて,今後の研究に向けた重要な課題(LLMの信頼性に関する懸念,標準化の必要性など)も挙げる。 Large Language Models (LLMs), such as ChatGPT and BERT, are leading a new AI heatwave due to their human-like conversations with detailed and articulate answers across many domains of knowledge. While LLMs are being quickly applied to many AI application domains, we are interested in the following question: Can safety analysis for safety-critical systems make use of LLMs? To answer, we conduct a case study of Systems Theoretic Process Analysis (STPA) on Automatic Emergency Brake (AEB) systems using ChatGPT. STPA, one of the most prevalent techniques for hazard analysis, is known to have limitations such as high complexity and subjectivity, which this paper explores using ChatGPT to address. Specifically, three ways of incorporating ChatGPT into STPA are investigated by considering its interaction with human experts: one-off simplex interaction, recurring simplex interaction, and recurring duplex interaction. 
Comparative results reveal that: (i) using ChatGPT without human experts' intervention can be inadequate due to reliability and accuracy issues of LLMs; (ii) more interactions between ChatGPT and human experts may yield better results; and (iii) using ChatGPT in STPA with extra care can outperform human safety experts alone, as demonstrated by reusing an existing comparison method with baselines. In addition to making the first attempt to apply LLMs in safety analysis, this paper also identifies key challenges (e.g., concerns about the trustworthiness of LLMs and the need for standardisation) for future research in this direction.	翻訳日:2023-04-05 16:55:03 公開日:2023-04-03
# 自律型サイバーエージェントのための統一エミュレーション・シミュレーション訓練環境 Unified Emulation-Simulation Training Environment for Autonomous Cyber Agents ( http://arxiv.org/abs/2304.01244v1 ) ライセンス: Link先を確認	Li Li, Jean-Pierre S. El Rami, Adrian Taylor, James Hailing Rao, and Thomas Kunz	(参考訳) 自律型サイバーエージェントは、エージェントが代表的な環境で訓練される強化学習および深層強化学習(RL/DRL)を適用して開発することができる。トレーニング環境は、エージェントが探索しようとするネットワークサイバーオペレーション(CyOp)を高忠実度でシミュレートする必要がある。ネットワークCyOpの複雑さを考えると、良いシミュレータの実現は難しい。本研究は,Cyber Gym for Intelligent Learning (CyGIL)において,高忠実度シミュレータを自動生成する体系的な手法を提案する。表現学習と継続学習を通じて、CyGILは、エミュレートされたCyGIL-EがシミュレートされたCyGIL-Sを自動生成する統一されたCyOpトレーニング環境を提供する。シミュレータ生成はエージェントトレーニングプロセスと統合され、必要なエージェントトレーニング時間を更に短縮する。 CyGIL-Sで訓練されたエージェントはCyGIL-Eに直接転送可能であり、エミュレートされた「リアル」ネットワークへの完全な転送性を示す。CyGILの訓練性能を実証する実験結果を示す。オフラインRLを可能にするCyGILソリューションは、現実のサイバーネットワークでRLエージェントを活用するためのsim-to-realに向けた有望な方向を示す。 Autonomous cyber agents may be developed by applying reinforcement and deep reinforcement learning (RL/DRL), where agents are trained in a representative environment. The training environment must simulate with high fidelity the network Cyber Operations (CyOp) that the agent aims to explore. Given the complexity of network CyOps, a good simulator is difficult to achieve. This work presents a systematic solution to automatically generate a high-fidelity simulator in the Cyber Gym for Intelligent Learning (CyGIL). Through representation learning and continuous learning, CyGIL provides a unified CyOp training environment where an emulated CyGIL-E automatically generates a simulated CyGIL-S. The simulator generation is integrated with the agent training process to further reduce the required agent training time. The agent trained in CyGIL-S is transferrable directly to CyGIL-E showing full transferability to the emulated "real" network. Experimental results are presented to demonstrate the CyGIL training performance. Enabling offline RL, the CyGIL solution presents a promising direction towards sim-to-real for leveraging RL agents in real-world cyber networks.	
翻訳日:2023-04-05 16:54:29 公開日:2023-04-03
# CoReFusion: ガイド付き熱画像超解像のための対照正則化融合 CoReFusion: Contrastive Regularized Fusion for Guided Thermal Super-Resolution ( http://arxiv.org/abs/2304.01243v1 ) ライセンス: Link先を確認	Aditya Kasliwal, Pratinav Seth, Sriya Rallabandi and Sanchit Singhal	(参考訳) サーマルイメージングは低照度環境でもよく機能するため、通常の可視域撮像に比べて多くの利点がある。超解像アプローチは、低コスト・低解像度の熱センサによる測定から正確な高解像度熱画像を再現することで、その有用性を広げることができる。画像間のスペクトル範囲のミスマッチのため、可視域画像を用いた熱画像のガイド付き超解像は困難である。しかし、可視域画像の取得に失敗すると、重要な領域でのアプリケーションの動作が妨げられる可能性がある。熱画像のガイド付き超解像のための新しいデータ融合フレームワークと正則化手法を提案する。提案するアーキテクチャは,計算コストが低く軽量であり,高解像度RGB画像または低解像度熱画像のいずれかのモダリティが欠落しても性能を維持でき,欠落データの存在下でも堅牢であるように設計されている。提案手法は,実世界のシナリオにおいてしばしば発生するモダリティ欠落問題に対する有望な解決法である。コードは https://github.com/Kasliwal17/CoReFusion で入手できる。 Thermal imaging has numerous advantages over regular visible-range imaging since it performs well in low-light circumstances. Super-Resolution approaches can broaden their usefulness by replicating accurate high-resolution thermal pictures using measurements from low-cost, low-resolution thermal sensors. Because of the spectral range mismatch between the images, Guided Super-Resolution of thermal images utilizing visible range images is difficult. However, failure to capture visible-range images can prevent applications in critical areas from operating. We present a novel data fusion framework and regularization technique for Guided Super Resolution of Thermal images. The proposed architecture is computationally inexpensive and lightweight with the ability to maintain performance despite missing one of the modalities, i.e., high-resolution RGB image or the lower-resolution thermal image, and is designed to be robust in the presence of missing data. The proposed method presents a promising solution to the frequently occurring problem of missing modalities in a real-world scenario. Code is available at https://github.com/Kasliwal17/CoReFusion.	翻訳日:2023-04-05 16:54:08 公開日:2023-04-03
# マルチチャネル不均一学習によるエビデンスグラフを用いた臨床エビデンス勧告の強化 Enhancing Clinical Evidence Recommendation with Multi-Channel Heterogeneous Learning on Evidence Graphs ( http://arxiv.org/abs/2304.01242v1 ) ライセンス: Link先を確認	Maolin Luo, and Xiang Zhang	(参考訳) 臨床証拠には、患者の関連や影響、介入(薬物や理学療法など)、問題、結果が含まれる。臨床エビデンスを推奨する目標は、医療従事者に意思決定プロセスを支援するための関連する情報を提供し、新たな証拠を生み出すことである。具体的なタスクは、臨床問題に基づくエビデンスを推奨することに集中します。しかし、特定の臨床的問題と関連する証拠との直接的なつながりは、しばしば疎結合の課題となる。また、適切な証拠を推薦するには、証拠間のトポロジカルな関係とそれらを記述するテキスト情報の両方を共同利用することが不可欠である。これらの課題に対処するために,エビデンス共参照グラフとエビデンステキストグラフという2つの知識グラフを定義し,各要素間のトポロジ的および言語的関係を表現する。また,エビデンス・レコメンデーションにおける共参照テキストの不均質性を扱うために,多チャンネル不均質学習モデルと融合注意機構を提案する。実験により,オープンデータ上での最先端手法に勝ることを示す。 Clinical evidence encompasses the associations and impacts between patients, interventions (such as drugs or physiotherapy), problems, and outcomes. The goal of recommending clinical evidence is to provide medical practitioners with relevant information to support their decision-making processes and to generate new evidence. Our specific task focuses on recommending evidence based on clinical problems. However, the direct connections between certain clinical problems and related evidence are often sparse, creating a challenge of link sparsity. Additionally, to recommend appropriate evidence, it is essential to jointly exploit both topological relationships among evidence and textual information describing them. To address these challenges, we define two knowledge graphs: an Evidence Co-reference Graph and an Evidence Text Graph, to represent the topological and linguistic relations among evidential elements, respectively. We also introduce a multi-channel heterogeneous learning model and a fusional attention mechanism to handle the co-reference-text heterogeneity in evidence recommendation. Our experiments demonstrate that our model outperforms state-of-the-art methods on open data.	翻訳日:2023-04-05 16:53:52 公開日:2023-04-03
# ドラヴィダ語におけるホモフォビア・トランスフォビアの検出:深層学習手法の探索 Detection of Homophobia & Transphobia in Dravidian Languages: Exploring Deep Learning Methods ( http://arxiv.org/abs/2304.01241v1 ) ライセンス: Link先を確認	Deepawali Sharma, Vedika Gupta, Vivek Kumar Singh	(参考訳) オンラインソーシャルメディアプラットフォームにおける乱用コンテンツの増加は、オンラインユーザーの社会生活に影響を与えている。攻撃的・憎悪的な言葉の使用は、ソーシャルメディアを有害なものにしている。ホモフォビアとトランスフォビアはLGBT+コミュニティに対する攻撃的なコメントを構成している。これらのコメントを検知して対処し、タイムリーにフラグを立てたり、そのような行為を行うユーザーに警告を発したりすることが必須になる。しかし,このようなコンテンツの自動検出は難しい課題であり,低リソース言語として認識されるドラヴィダ語ではなおさらである。そこで本論文は,マラヤーラム語とタミル語のソーシャルメディアコメントをホモフォビック,トランスフォビック,非反LGBT+コンテンツとして分類するために,異なるディープラーニングモデルの適用性を検討する。一般的なディープラーニングモデルである畳み込みニューラルネットワーク(CNN)、GloVe埋め込みを用いたLong Short Term Memory(LSTM)、トランスフォーマーベース学習モデル(Multilingual BERTおよびIndicBERT)を分類問題に適用する。その結果、IndicBERTはマラヤーラム語とタミル語でそれぞれ0.86と0.77の重み付き平均F1スコアを達成し、他の実装モデルよりも優れていた。したがって本研究は,選択したドラヴィダ語における与えられたタスクに対するIndicBERTのより高い性能を確認した。 The increase in abusive content on online social media platforms is impacting the social life of online users. Use of offensive and hate speech has been making social media toxic. Homophobia and transphobia constitute offensive comments against LGBT+ community. It becomes imperative to detect and handle these comments, to timely flag or issue a warning to users indulging in such behaviour. However, automated detection of such content is a challenging task, more so in Dravidian languages which are identified as low resource languages. Motivated by this, the paper attempts to explore applicability of different deep learning models for classification of the social media comments in Malayalam and Tamil languages as homophobic, transphobic and non-anti-LGBT+content. The popularly used deep learning models- Convolutional Neural Network (CNN), Long Short Term Memory (LSTM) using GloVe embedding and transformer-based learning models (Multilingual BERT and IndicBERT) are applied to the classification problem. 
Results obtained show that IndicBERT outperforms the other implemented models, with weighted average F1-scores of 0.86 and 0.77 for Malayalam and Tamil, respectively. Therefore, the present work confirms the higher performance of IndicBERT on the given task in the selected Dravidian languages.	翻訳日:2023-04-05 16:53:34 公開日:2023-04-03
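The entry above reports weighted average F1-scores. As a minimal sketch of that metric (not the authors' evaluation code): compute F1 per class, then average with each class weighted by its share of the true labels. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy example using the three classes named in the abstract.
labels_true = ["homophobic", "homophobic", "transphobic", "non-anti-LGBT+"]
labels_pred = ["homophobic", "transphobic", "transphobic", "non-anti-LGBT+"]
print(weighted_f1(labels_true, labels_pred))
```

Weighting by support matters when the classes are imbalanced, because frequent classes then contribute more to the final score than rare ones.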
# 非生成エネルギーベースモデル Non-Generative Energy Based Models ( http://arxiv.org/abs/2304.01297v1 ) ライセンス: Link先を確認	Jacob Piland and Christopher Sweet and Priscila Saboia and Charles Vardeman II and Adam Czajka	(参考訳) エネルギーベースモデル(EBM)はコンピュータビジョンの中でますます人気が高まっている。 EBMはディープニューラルネットワーク(DNN)のトレーニングに確率的アプローチをもたらし、キャリブレーション、分布外検出、敵対的耐性といった分野でのパフォーマンスを向上させることが示されている。しかし、これらの利点は入力データの確率を推定するコストを伴う。この推定には通常、Stochastic Gradient Langevin Dynamics (SGLD) のようなランジュバンベースの手法が用いられ、計算コストが増大し、パラメータ化や効率化のためのキャッシング手法が必要となり、安定性とスケーリングの問題に陥ることがある。 EBMは動的手法を用いて、ネットワークの現在の状態によって定義された確率密度関数(PDF)からサンプルを抽出し、最大対数尤度アプローチを用いてそれらをトレーニングデータと比較することで、正しいPDFを学習する。本稿では,Grathwohlらによって同定された「Approximate Mass」を損失項として訓練を導く非生成的な訓練手法,Non-Generative EBM(NG-EBM)を提案する。我々のNG-EBMトレーニング戦略は、従来の手法の計算複雑性やオーバーヘッドを伴わずに、キャリブレーション、分布外検出、敵対的耐性におけるEBMの利点の多くを維持していることを示す。特に、NG-EBMアプローチは、従来の方法で訓練されたモデルと比較して、期待校正誤差(Expected Calibration Error)をCIFAR10で2.5倍、CIFAR100で7.5倍改善する。 Energy-based models (EBM) have become increasingly popular within computer vision. EBMs bring a probabilistic approach to training deep neural networks (DNN) and have been shown to enhance performance in areas such as calibration, out-of-distribution detection, and adversarial resistance. However, these advantages come at the cost of estimating input data probabilities, usually using a Langevin based method such as Stochastic Gradient Langevin Dynamics (SGLD), which brings additional computational costs, requires parameterization and caching methods for efficiency, and can run into stability and scaling issues. EBMs use dynamical methods to draw samples from the probability density function (PDF) defined by the current state of the network and compare them to the training data using a maximum log likelihood approach to learn the correct PDF. We propose a non-generative training approach, Non-Generative EBM (NG-EBM), that utilizes the Approximate Mass, identified by Grathwohl et al., as a loss term to direct the training. 
We show that our NG-EBM training strategy retains many of the benefits of EBM in calibration, out-of-distribution detection, and adversarial resistance, but without the computational complexity and overhead of the traditional approaches. In particular, the NG-EBM approach improves the Expected Calibration Error by a factor of 2.5 for CIFAR10 and 7.5 times for CIFAR100, when compared to traditionally trained models.	翻訳日:2023-04-05 16:47:23 公開日:2023-04-03
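The entry above reports improvements in Expected Calibration Error (ECE). The sketch below uses the commonly used binned definition: group predictions into equal-width confidence bins and take the sample-weighted mean of |accuracy - mean confidence| per bin. The bin count of 10 and the function name are assumptions, and the paper may use a different binning scheme.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE over equal-width confidence bins on [0, 1]."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated model has an ECE of 0; under this definition, lowering ECE by a factor of 2.5 means the average gap between stated confidence and observed accuracy shrinks by that factor.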
# purkinje画像とmlアルゴリズムを用いた動的調節計測 Dynamic Accommodation Measurement using Purkinje Images and ML Algorithms ( http://arxiv.org/abs/2304.01296v1 ) ライセンス: Link先を確認	Faik Ozan Ozhan, Arda Gulersoy, Ugur Aygun, Afsun Sahin, Hakan Urey	(参考訳) 本研究では,ARおよび眼科応用に適した4つのPurkinjeリフレクション(PR)に基づく動的視線および調節測定装置の試作を行った。 PR1&2とPR3&4は、それぞれ正確な視線測定と調節測定に使用される。眼模型はZEMAXで開発され,実験結果とよく一致した。モデルは、0.25d以上の精度で4つのディプターから1つのディプターへの調節を予測している。再現性テストを行い,被験者から正確な視線と調節推定値を得た。我々は物理的に正確なモデルと機械学習を用いて大規模な合成データセットを作成している。 We developed a prototype device for dynamic gaze and accommodation measurements based on 4 Purkinje reflections (PR) suitable for use in AR and ophthalmology applications. PR1&2 and PR3&4 are used for accurate gaze and accommodation measurements, respectively. Our eye model was developed in ZEMAX and matches the experiments well. Our model predicts the accommodation from 4 diopters to 1 diopter with better than 0.25D accuracy. We performed repeatability tests and obtained accurate gaze and accommodation estimations from subjects. We are generating a large synthetic data set using physically accurate models and machine learning.	翻訳日:2023-04-05 16:46:55 公開日:2023-04-03
# Prompt-Tuning を用いた会話課題の言語間移動学習の効率化 Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning ( http://arxiv.org/abs/2304.01295v1 ) ライセンス: Link先を確認	Lifu Tu, Jin Qu, Semih Yavuz, Shafiq Joty, Wenhao Liu, Caiming Xiong, Yingbo Zhou	(参考訳) 英語のような高リソース言語で訓練された言語モデルの言語間移動は、多くのNLPタスクで広く研究されているが、会話タスクに焦点が当てられているのは比較的限られている。これは、非英語の会話データを取得するコストが高いためであり、カバー範囲は限られている。本稿では、英語のみのスキーマガイド対話(SGD)データセット(Rastogi et al., 2020)を105言語に翻訳することで、並列かつ大規模多言語会話データセットであるXSGDを紹介する。 xsgdは言語毎に約330k発話を含む。そこで我々は,アライメントプロンプトを学習する効率的なプロンプトチューニング手法を開発した。また、NLIベースとバニラ分類器の2つの異なる分類器と、アライメントされたプロンプトによって可能となる言語間のテスト機能についても検討する。我々は,2つの会話タスク(スロットフィルングとインテント分類)における言語横断的一般化能力を評価する。提案手法は,NLIに基づく分類器のモデリング能力の強化と,アライメントプロンプトによる言語間移動の大幅な改善,特に数ショット設定において実現された。 Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks, but focus on conversational tasks has been rather limited. This is partly due to the high cost of obtaining non-English conversational data, which results in limited coverage. In this work, we introduce XSGD, a parallel and large-scale multilingual conversation dataset that we created by translating the English-only Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2020) into 105 other languages. XSGD contains approximately 330k utterances per language. To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts. We also investigate two different classifiers: NLI-based and vanilla classifiers, and test cross-lingual capability enabled by the aligned prompts. We evaluate our model's cross-lingual generalization capabilities on two conversation tasks: slot-filling and intent classification. Our results demonstrate the strong and efficient modeling ability of NLI-based classifiers and the large cross-lingual transfer improvements achieved by our aligned prompts, particularly in few-shot settings.	
翻訳日:2023-04-05 16:46:48 公開日:2023-04-03
# ガウス過程による非線形PDE求解のためのスパースコレスキー分解 Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes ( http://arxiv.org/abs/2304.01294v1 ) ライセンス: Link先を確認	Yifan Chen, Houman Owhadi, Florian Schäfer	(参考訳) 一般の非線形偏微分方程式(PDE)を解くためのガウス過程(GP)フレームワークの計算スケーラビリティについて検討する。この枠組みはPDEの求解を非線形制約付きの2次最適化問題の求解に変換する。その計算量のボトルネックは、GPの共分散カーネルとその偏導関数のコロケーション点における点ごとの評価から得られる密なカーネル行列の計算にある。ディラック測定と微分測定の新しい順序付けの下でのコレスキー因子の近似的なスパース性に基づいて、そのようなカーネル行列に対するスパースコレスキー分解アルゴリズムを提案する。我々は,スパース性パターンを厳密に同定し,Kullback-Leiblerダイバージェンスにおいて最適であるGPの対応するVecchia近似の指数収束精度を定量化する。これにより、空間計算量 $O(N\log^d(N/\epsilon))$,時間計算量 $O(N\log^{2d}(N/\epsilon))$ でカーネル行列の$\epsilon$-近似逆コレスキー因子を計算できる。スパース因子により、勾配に基づく最適化手法はスケーラブルになる。さらに、しばしばより効率的なガウス・ニュートン法を用いることができ、その線形系を解くために、縮小されたカーネル行列のスパース因子を前処理として共役勾配アルゴリズムを適用する。非線形楕円型方程式, バーガース方程式, モンジュ・アンペール方程式といった幅広い非線形PDEに対して, アルゴリズムのほぼ線形の空間・時間計算量を数値的に示す。要約すると、GPで一般のPDEを解くための高速でスケーラブルで正確な方法を提供する。 We study the computational scalability of a Gaussian process (GP) framework for solving general nonlinear partial differential equations (PDEs). This framework transforms solving PDEs into solving a quadratic optimization problem with nonlinear constraints. Its complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel of the GP and its partial derivatives at collocation points. We present a sparse Cholesky factorization algorithm for such kernel matrices based on the near-sparsity of the Cholesky factor under a new ordering of Diracs and derivative measurements. We rigorously identify the sparsity pattern and quantify the exponentially convergent accuracy of the corresponding Vecchia approximation of the GP, which is optimal in the Kullback-Leibler divergence. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. With the sparse factors, gradient-based optimization methods become scalable. 
Furthermore, we can use the oftentimes more efficient Gauss-Newton method, for which we apply the conjugate gradient algorithm with the sparse factor of a reduced kernel matrix as a preconditioner to solve the linear system. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations. In summary, we provide a fast, scalable, and accurate method for solving general PDEs with GPs.	翻訳日:2023-04-05 16:46:28 公開日:2023-04-03
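The near-sparsity of inverse Cholesky factors that this abstract relies on can be seen in a toy one-dimensional case. The sketch below is only an illustration under assumptions that are not from the paper: an exponential kernel, points ordered left to right, and a Vecchia-style factor in which each point is conditioned on its single predecessor. The helper names (`exp_kernel_matrix`, `vecchia_one_neighbor`) are hypothetical. For this particular kernel the process is Markov, so the bidiagonal factor reproduces the inverse kernel matrix up to rounding; the paper's ordering, sparsity pattern, and derivative measurements are not reproduced here.

```python
import math


def exp_kernel_matrix(ts, length=1.0):
    """Dense kernel matrix K[i][j] = exp(-|t_i - t_j| / length) (assumed toy kernel)."""
    return [[math.exp(-abs(a - b) / length) for b in ts] for a in ts]


def vecchia_one_neighbor(K):
    """Lower-bidiagonal U such that U^T U approximates K^{-1}.

    Row i encodes the conditional law of x_i given only x_{i-1}:
    mean coefficient c and conditional variance var.
    """
    n = len(K)
    U = [[0.0] * n for _ in range(n)]
    U[0][0] = 1.0 / math.sqrt(K[0][0])
    for i in range(1, n):
        c = K[i][i - 1] / K[i - 1][i - 1]      # regression coefficient on the predecessor
        var = K[i][i] - c * K[i][i - 1]        # conditional variance
        s = 1.0 / math.sqrt(var)
        U[i][i] = s
        U[i][i - 1] = -c * s
    return U


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


ts = [0.1 * i for i in range(8)]               # ordered collocation points (toy example)
K = exp_kernel_matrix(ts)
U = vecchia_one_neighbor(K)
P = matmul(transpose(U), U)                    # sparse approximation of K^{-1}
R = matmul(P, K)                               # should be close to the identity
max_err = max(
    abs(R[i][j] - (1.0 if i == j else 0.0))
    for i in range(len(ts))
    for j in range(len(ts))
)
```

With smoother kernels or higher dimensions the one-neighbor factor would only be approximate, which is where larger neighbor sets and a carefully chosen ordering come in.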
# ウェアラブルセンサを用いた社会不安者の社会的文脈におけるマルチモーダル生理的反応 Wearable Sensor-based Multimodal Physiological Responses of Socially Anxious Individuals across Social Contexts ( http://arxiv.org/abs/2304.01293v1 ) ライセンス: Link先を確認	Emma R. Toner, Mark Rucker, Zhiyuan Wang, Maria A. Larrazabal, Lihua Cai, Debajyoti Datta, Elizabeth Thompson, Haroon Lone, Mehdi Boukhechba, Bethany A. Teachman, and Laura E. Barnes	(参考訳) 受動的に装着されたセンサーから個人の社会的コンテキストを正しく識別することは、社会的不安障害の治療にジャスト・イン・タイム適応的介入(JITAI)を提供するうえで有望である。本研究では,被験者内実験で受動的に収集したデータを用いて,異なる社会的文脈(単独対他者と一緒),社会的フェーズ(相互作用の前後対相互作用中),社会的相互作用の規模(二者間対グループ),社会的脅威のレベル(暗黙的対明示的な社会的評価)にわたる生理的反応を評価した結果を示す。この研究の参加者($N=46$)は、社会的相互作用不安尺度(80点中34点以上)で評価された中程度から重度の社会不安症状を報告した。単変量の対応のある差の検定,多変量ランダムフォレストモデル,フォローアップのクラスター分析を用いて,社会・非社会の異なる状況における生理的反応パターンを検討した。以上の結果から,社会的文脈は,社会的フェーズ,グループサイズ,社会的脅威のレベルよりも確実に区別できるが,これらの区別可能な文脈の中にも,生理的反応パターンにかなりの変動があることが示唆された。実世界のコンテキスト検出とJITAIの展開について論じる。 Correctly identifying an individual's social context from passively worn sensors holds promise for delivering just-in-time adaptive interventions (JITAIs) to treat social anxiety disorder. In this study, we present results using passively collected data from a within-subject experiment that assessed physiological response across different social contexts (i.e., alone vs. with others), social phases (i.e., pre- and post-interaction vs. during an interaction), social interaction sizes (i.e., dyadic vs. group interactions), and levels of social threat (i.e., implicit vs. explicit social evaluation). Participants in the study ($N=46$) reported moderate to severe social anxiety symptoms as assessed by the Social Interaction Anxiety Scale ($\geq$34 out of 80). Univariate paired difference tests, multivariate random forest models, and follow-up cluster analyses were used to explore physiological response patterns across different social and non-social contexts. 
Our results suggest that social context is more reliably distinguishable than social phase, group size, or level of social threat, but that there is considerable variability in physiological response patterns even among these distinguishable contexts. Implications for real-world context detection and deployment of JITAIs are discussed.	翻訳日:2023-04-05 16:46:00 公開日:2023-04-03
# 知覚者による3次元における境界ボックスによる単眼3次元物体検出 Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver ( http://arxiv.org/abs/2304.01289v1 ) ライセンス: Link先を確認	Xianpeng Liu, Ce Zheng, Kelvin Cheng, Nan Xue, Guo-Jun Qi, Tianfu Wu	(参考訳) 単眼3次元物体検出の主な課題は3次元中心の正確な位置決めである。理想的な場合において,この課題を3次元空間の局所グリッド探索方式で修復できるという新たな強みに感化されて,2次元から3次元までの情報フローと3次元から2次元のコンテキストを重畳した3次元から2次元までの情報フローを組み合わせたステージワイズ手法を提案する。具体的には、まず、市販のバックボーン単分子3D検出器から最初の提案を得る。次に,初期提案から局所グリッドサンプリングにより3次元アンカー空間を生成する。最後に、3D-to-2D提案の検証段階で3Dバウンディングボックスをデノナイズする。本稿では,重なり合う提案を識別する識別的特徴を効果的に学習するために,Perceiver I/Oモデルを用いて3次元から2次元の幾何学的情報と2次元の外観情報を融合させる手法を提案する。提案のエンコードされた潜在表現により、検証ヘッドは自己アテンションモジュールで実装される。提案手法はMonoXiverと命名され, 背骨単分子3D検出器に容易に適用可能である。確立されたKITTIデータセットと挑戦的な大規模Waymoデータセットの実験結果から、MonoXiverは計算オーバーヘッドを制限して一貫して改善を達成している。 The main challenge of monocular 3D object detection is the accurate localization of 3D center. Motivated by a new and strong observation that this challenge can be remedied by a 3D-space local-grid search scheme in an ideal case, we propose a stage-wise approach, which combines the information flow from 2D-to-3D (3D bounding box proposal generation with a single 2D image) and 3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. Specifically, we first obtain initial proposals from off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor space by local-grid sampling from the initial proposals. Finally, we perform 3D bounding box denoising at the 3D-to-2D proposal verification stage. To effectively learn discriminative features for denoising highly overlapped proposals, this paper presents a method of using the Perceiver I/O model to fuse the 3D-to-2D geometric information and the 2D appearance information. With the encoded latent representation of a proposal, the verification head is implemented with a self-attention module. Our method, named as MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detectors. 
Experimental results on the well-established KITTI dataset and the challenging large-scale Waymo dataset show that MonoXiver consistently achieves improvement with limited computation overhead.	翻訳日:2023-04-05 16:45:38 公開日:2023-04-03
# x-time:camsによる表データ機械学習を高速化するインメモリエンジン X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs ( http://arxiv.org/abs/2304.01285v1 ) ライセンス: Link先を確認	Giacomo Pedretti, John Moon, Pedro Bruel, Sergey Serebryakov, Ron M. Roth, Luca Buonanno, Tobias Ziegler, Cong Xu, Martin Foltin, Jim Ignowski, Catherine E. Graves	(参考訳) データ構造は、データ科学において最も一般的な形式である。ディープラーニングモデルは、画像や音声などの非構造化データから学習することが証明されているが、表データから学習する場合の単純なアプローチよりも正確ではない。対照的に、現代的なツリーベース機械学習(ML)モデルでは、構造化データから関連する情報を抽出する。データサイエンスにおける必須要件は、例えば、科学的な発見を加速するためにシミュレーションを伴うクローズドループでモデルが使用される場合のモデル推論レイテンシを低減することである。しかしながら、ハードウェアアクセラレーションコミュニティは、主にディープニューラルネットワークに焦点を当てており、他の機械学習形式を無視している。これまでの研究では、ランダムフォレストを効率的にマッピングするためにアナログコンテンツアドレスメモリ(CAM)コンポーネントが用いられてきた。本研究では,XGBoostやCatBoostといった最先端のツリーベースMLモデルの推論を可能にする,新たな精度向上型アナログCAMと,チップ上のプログラマブルネットワークを実装した,アナログデジタルアーキテクチャ全般に焦点をあてる。 16nm技術で1チップで評価した結果、最先端のGPUと比較して119倍のレイテンシが9740倍、ピーク電力は19Wであった。 Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. 
In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM and a programmable network on chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19W peak power consumption.	翻訳日:2023-04-05 16:45:15 公開日:2023-04-03
# 信念・知識・証拠 Belief, knowledge and evidence ( http://arxiv.org/abs/2304.01283v1 ) ライセンス: Link先を確認	Steffen Lewitzka and Vinícius Pinto	(参考訳) 本稿では,信念と知識というよく知られた古典的認識論的概念と,直感的原理「証拠は信念と知識をもたらす」が満たされる証拠の概念を組み合わせた論理体系を提案する。我々のアプローチは、第一著者の先行研究 \cite{lewjlc2, lewigpl, lewapal} に依拠している。これらの研究は、直観主義的真理(すなわち証明)に関する推論のための$S5$スタイルの原理を含む様相体系を導入し、\cite{artpro} に触発されて、その体系を直観主義的な信念と知識の概念と組み合わせた。我々は、この組み合わせた体系を考察し、構成的な概念である証明を、古典的な概念である証拠に置き換える。この結果、様相体系$S5$と古典的な認識論的原理を組み合わせた論理となり、$\square\varphi$は認識論的な意味で「$\varphi$は明白である」と読む。\cite{lewapal} に触発され、文献に見られる通常の可能世界意味論とは対照的に、信念と知識が到達可能性関係ではなく、命題の集合(世界の集合の集合)として直接モデル化される、関係的なフレームに基づく意味論を提案する。 We present a logical system that combines the well-known classical epistemic concepts of belief and knowledge with a concept of evidence such that the intuitive principle 'evidence yields belief and knowledge' is satisfied. Our approach relies on previous works of the first author \cite{lewjlc2, lewigpl, lewapal} who introduced a modal system containing $S5$-style principles for the reasoning about intuitionistic truth (i.e., proof) and, inspired by \cite{artpro}, combined that system with concepts of intuitionistic belief and knowledge. We consider that combined system and replace the constructive concept of proof with a classical notion of evidence. This results in a logic that combines modal system $S5$ with classical epistemic principles where $\square\varphi$ reads as '$\varphi$ is evident' in an epistemic sense. Inspired by \cite{lewapal}, and in contrast to the usual possible worlds semantics found in the literature, we propose here a relational, frame-based semantics where belief and knowledge are not modeled via accessibility relations but directly as sets of propositions (sets of sets of worlds).	翻訳日:2023-04-05 16:44:54 公開日:2023-04-03
# PEACH:半教師付き擬似パラレル文書生成による翻訳のための事前学習シーケンスとシーケンスの多言語モデル PEACH: Pre-Training Sequence-to-Sequence Multilingual Models for Translation with Semi-Supervised Pseudo-Parallel Document Generation ( http://arxiv.org/abs/2304.01282v1 ) ライセンス: Link先を確認	Alireza Salemi, Amirhossein Abaskohi, Sara Tavakoli, Yadollah Yaghoobzadeh, Azadeh Shakery	(参考訳) 多言語プレトレーニングは、機械翻訳を含む多言語nlpタスクを著しく改善する。既存の手法の多くは、モノリンガルデータに基づくマスク付き言語モデリングとテキストデノベーションの目的に基づくものである。モノリンガルデータに対する多言語事前学習は、多くの言語ペアにおける並列データの可用性を無視する。また、利用可能な人間の生成した並列翻訳データを事前学習に組み込む研究もある。この種の並列データは間違いなく役に立つが、高リソースの言語ペアであっても制限されている。本稿では,多言語事前学習のための高品質な擬似並列データを生成する,新しい半教師付きSPDGを提案する。まず、単語の順序付け、追加、削除、置換のために単言語データに対して述語モデルを事前訓練し、予め学習した文書の品質を高める。そして、単語間翻訳のための辞書を用いて事前学習文書ごとに異なる擬似翻訳を生成し、事前学習された復調モデルを適用する。次に、擬似並列データを用いて、多言語列列列モデルのPEACHを事前学習する。 PEACHは, 教師付き, ゼロショット, 少数ショットのシナリオを含む様々な翻訳タスクにおいて, mT5 と mBART のトレーニングに使用されている既存手法よりも優れていることを示す。さらに、PEACHが類似言語間で知識を伝達する能力は、低リソース言語に特に有用である。 PEACHは,精度の高い擬似並列を生成するための高品質な辞書を用いて,低リソース言語に有用であることを示す。 Multilingual pre-training significantly improves many multilingual NLP tasks, including machine translation. Most existing methods are based on some variants of masked language modeling and text-denoising objectives on monolingual data. Multilingual pre-training on monolingual data ignores the availability of parallel data in many language pairs. Also, some other works integrate the available human-generated parallel translation data in their pre-training. This kind of parallel data is definitely helpful, but it is limited even in high-resource language pairs. This paper introduces a novel semi-supervised method, SPDG, that generates high-quality pseudo-parallel data for multilingual pre-training. First, a denoising model is pre-trained on monolingual data to reorder, add, remove, and substitute words, enhancing the pre-training documents' quality. 
Then, we generate different pseudo-translations for each pre-training document using dictionaries for word-by-word translation and applying the pre-trained denoising model. The resulting pseudo-parallel data is then used to pre-train our multilingual sequence-to-sequence model, PEACH. Our experiments show that PEACH outperforms existing approaches used in training mT5 and mBART on various translation tasks, including supervised, zero- and few-shot scenarios. Moreover, PEACH's ability to transfer knowledge between similar languages makes it particularly useful for low-resource languages. Our results demonstrate that with high-quality dictionaries for generating accurate pseudo-parallel, PEACH can be valuable for low-resource languages.	翻訳日:2023-04-05 16:44:24 公開日:2023-04-03
# 知識抽出による自己異種統合による長期視覚認識 Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation ( http://arxiv.org/abs/2304.01279v1 ) ライセンス: Link先を確認	Yan Jin, Mengke LI, Yang Lu, Yiu-ming Cheung, Hanzi Wang	(参考訳) 深層ニューラルネットワークは、ここ数十年で大きな進歩を遂げている。しかしながら、現実世界のデータはしばしば長い尾の分布を示すため、バニラディープモデルは多数派に大きく偏っている傾向にある。この問題に対処するため、最先端の手法は通常、ロングテール分布の異なる部分に焦点を当てるために専門家(moe)の混合を採用する。これらの手法のエキスパートはモデル深度が同じであり、異なるクラスが異なる深さのモデルに適合するように異なる好みを持つという事実を無視する。そこで本研究では,知識抽出を用いた自己異種統合法(SHIKE)を提案する。まず,異なる浅い部分と1つのネットワークの深い部分の間で特徴を融合するために,dkf(deep-wise knowledge fusion)を提案する。 dkfに基づき、我々はさらに、moeフレームワークのテールクラスに無視できない影響を持つ最も難しい負のクラスの影響を減らすために、動的知識伝達(dkt)を提案します。その結果、特に尾のクラスにおいて、長い尾のデータの分類精度を著しく向上させることができる。 SHIKEはCIFAR100-LT (IF100), ImageNet-LT, iNaturalist 2018, Places-LTで56.3%, 60.3%, 75.4%, 41.9%の最先端性能を達成した。 Deep neural networks have made huge progress in the last few decades. However, as the real-world data often exhibits a long-tailed distribution, vanilla deep models tend to be heavily biased toward the majority classes. To address this problem, state-of-the-art methods usually adopt a mixture of experts (MoE) to focus on different parts of the long-tailed distribution. Experts in these methods are with the same model depth, which neglects the fact that different classes may have different preferences to be fit by models with different depths. To this end, we propose a novel MoE-based method called Self-Heterogeneous Integration with Knowledge Excavation (SHIKE). We first propose Depth-wise Knowledge Fusion (DKF) to fuse features between different shallow parts and the deep part in one network for each expert, which makes experts more diverse in terms of representation. Based on DKF, we further propose Dynamic Knowledge Transfer (DKT) to reduce the influence of the hardest negative class that has a non-negligible impact on the tail classes in our MoE framework. 
As a result, the classification accuracy of long-tailed data can be significantly improved, especially for the tail classes. SHIKE achieves the state-of-the-art performance of 56.3%, 60.3%, 75.4%, and 41.9% on CIFAR100-LT (IF100), ImageNet-LT, iNaturalist 2018, and Places-LT, respectively.	翻訳日:2023-04-05 16:44:00 公開日:2023-04-03
# $\delta$相互作用によって修正されたシュレーディンガー作用素について On Schrödinger Operators Modified by $\delta$ Interactions ( http://arxiv.org/abs/2304.01326v1 ) ライセンス: Link先を確認	Kaya Güven Akbaş, Fatih Erman, O. Teoman Turgut	(参考訳) デルタ相互作用によって修正されたシュレーディンガー作用素 $H_0$ のスペクトル特性を研究し、新しいグリーン関数の極が元の $H_0$ のグリーン関数の極に対してどのように再配置されるかを明示的に示す。我々は、新しい束縛状態エネルギーが元の束縛状態エネルギーの間に交互に並ぶこと、およびデルタ相互作用が引力的であれば基底状態エネルギーは常に低下することを証明する。また,結合定数が小さいという仮定の下で,やや発見的な方法により,束縛状態エネルギーと波動関数を求める別の摂動的手法も導出する。さらに,これらの結果が繰り込み処理を必要とする場合にも拡張可能であることを示す。最後に, 多中心の場合, 曲線上に台を持つデルタ相互作用の場合, および粒子がコンパクトな2次元多様体内をデルタ相互作用の影響下で運動する場合への拡張の可能性を検討する。 We study the spectral properties of a Schrödinger operator $H_0$ modified by delta interactions and show explicitly how the poles of the new Green's function are rearranged relative to the poles of the original Green's function of $H_0$. We prove that the new bound state energies are interlaced between the old ones and the ground state energy is always lowered if the delta interaction is attractive. We also derive an alternative perturbative way of finding the bound state energies and wave functions under the assumption of small coupling constant in a somewhat heuristic way. We further show that these results can be extended to the case where a renormalization process is required. Finally, we consider the possible extensions of our results to the multi-center case, to delta interaction supported on curves, and to the case where the particle is moving in a compact two-dimensional manifold under the influence of a delta interaction.	翻訳日:2023-04-05 16:37:51 公開日:2023-04-03
# 任意の2次元閉じ込めを伴うシュレーディンガー問題にアプローチするディープラーニングニューラルネットワーク Deep learning neural network for approaching Schrödinger problems with arbitrary two-dimensional confinement ( http://arxiv.org/abs/2304.01325v1 ) ライセンス: Link先を確認	Adrian Radu, Carlos A. Duque	(参考訳) 本稿では,ニューラルネットワークを用いた自動学習法に基づく2次元シュレーディンガー方程式へのアプローチを提案する。これは、多数の任意のサンプル問題の解に関する知識を出発点として、任意の2次元ポテンシャルに閉じ込められた粒子の基底状態を決定することを目的としている。基底状態の波動関数とエネルギーを予測するために,二つの隠れ層を持つネットワークアーキテクチャを提案する。ニューラルネットワークが提供する推定値を検証するために、いくつかの精度インジケータが提案されている。トレーニングされたネットワークは、学習プロセスで使用されるものと異なる大量の閉じ込めポテンシャルに適用することでテストされた。対称ポテンシャルを持つ特定の事例を具体例として解決し,良好なネットワーク予測精度が得られた。 This article presents an approach to the two-dimensional Schrödinger equation based on automatic learning methods with neural networks. It is intended to determine the ground state of a particle confined in any two-dimensional potential, starting from the knowledge of the solutions to a large number of arbitrary sample problems. A network architecture with two hidden layers is proposed to predict the wave function and energy of the ground state. Several accuracy indicators have been proposed to validate the estimates provided by the neural network. The trained network was tested by applying it to a large set of confinement potentials different from those used in the learning process. Some particular cases with symmetrical potentials were solved as concrete examples, and a good network prediction accuracy was found.	翻訳日:2023-04-05 16:37:31 公開日:2023-04-03
# PALI:ペルソ・アラビア文字の言語識別ベンチマーク PALI: A Language Identification Benchmark for Perso-Arabic Scripts ( http://arxiv.org/abs/2304.01322v1 ) ライセンス: Link先を確認	Sina Ahmadi and Milind Agarwal and Antonios Anastasopoulos	(参考訳) ペルソ・アラビア文字(Perso-Arabic script)は、世界中の様々な言語コミュニティで広く採用され使用されている文字群である。このようなスクリプトを使って様々な言語を識別することは、言語技術にとって重要であり、低リソースのセットアップでは困難である。そこで本稿では,ペルソ・アラビア文字を用いた言語検出の課題について,特に「非従来的」な表記が行われるバイリンガル・コミュニティで取り上げる。これを解決するために,教師付き手法を用いて文を言語に分類する。また,これらに基づいて,分類器によって混同されることが多い言語群を対象とする階層モデルを提案する。私たちの実験結果は、ソリューションの有効性を示しています。 The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various linguistic communities around the globe. Identifying various languages using such scripts is crucial to language technologies and challenging in low-resource setups. As such, this paper sheds light on the challenges of detecting languages using Perso-Arabic scripts, especially in bilingual communities where "unconventional" writing is practiced. To address this, we use a set of supervised techniques to classify sentences into their languages. Building on these, we also propose a hierarchical model that targets clusters of languages that are more often confused by the classifiers. Our experimental results indicate the effectiveness of our solutions.	翻訳日:2023-04-05 16:37:20 公開日:2023-04-03
# 低リソース言語技術のためのコーパス作成へのアプローチ--南クルド語とラキ語を事例として Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki ( http://arxiv.org/abs/2304.01319v1 ) ライセンス: Link先を確認	Sina Ahmadi and Zahra Azin and Sara Belelli and Antonios Anastasopoulos	(参考訳) 言語技術において過度に表現され、絶滅危惧されている言語コミュニティが直面する大きな課題の1つは、言語データの欠如または欠如である。これはまた、クルド語とラキ語の南の品種で、道具の実質的な進歩とともに非常に限られた資源が利用可能である場合でもある。そこで本稿では、クルド南部でコンテンツを放送するローカルラジオ局であるローカルニュースサイトと、ラキのフィールドワークに依存するいくつかのアプローチを提案する。本稿では,このような未表現言語の課題,特に文書作成と標準化,そして南クルド語とラキ語のためのコーパスを作成するために,データソースの検索や手書きコンテンツの遡及といった課題について述べる。さらに,クルド語およびザザ・ゴラーニ語以外の変種に照らして,言語識別の課題について検討した。 One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data. This is also the case of the Southern varieties of the Kurdish and Laki languages for which very limited resources are available with insubstantial progress in tools. To tackle this, we provide a few approaches that rely on the content of local news websites, a local radio station that broadcasts content in Southern Kurdish and fieldwork for Laki. In this paper, we describe some of the challenges of such under-represented languages, particularly in writing and standardization, and also, in retrieving sources of data and retro-digitizing handwritten content to create a corpus for Southern Kurdish and Laki. In addition, we study the task of language identification in light of the other variants of Kurdish and Zaza-Gorani languages.	翻訳日:2023-04-05 16:37:07 公開日:2023-04-03
# matched machine learning: 学習メトリクスを用いた治療効果推論の一般化フレームワーク Matched Machine Learning: A Generalized Framework for Treatment Effect Inference With Learned Metrics ( http://arxiv.org/abs/2304.01316v1 ) ライセンス: Link先を確認	Marco Morucci, Cynthia Rudin, Alexander Volfovsky	(参考訳) 本稿では,機械学習ブラックボックスの柔軟性と,観測因果推論における長年のツールであるマッチングの解釈可能性を組み合わせたフレームワークであるmatched machine learningを紹介する。因果推論の多くの高リスク応用において、解釈可能性が最も重要である。平均的および個別化された治療効果の非パラメトリック推定のための現在のツールは、人間の見積もりの監査を許さないブラックボックスである。我々のフレームワークは、機械学習を使用して、ユニットのマッチングと結果の推定に最適なメトリクスを学習し、解釈可能でありながら機械学習ブラックボックスのパフォーマンスを達成する。我々の一般的なフレームワークは、いくつかの出版作品を特別なケースとして包含している。提案するフレームワークの漸近的推論理論により,個々の治療効果と平均治療効果の両面から近似した信頼区間を構築できる。一致機械学習のインスタンスはブラックボックスの機械学習手法と同等に動作し、類似した問題に対する既存のマッチング手法よりも優れていることを示す。最後に、我々のアプリケーションでは、共変量データが非常に複雑である場合でも、どのようにマッチング機械学習を用いて因果推論を行うかを示す。 We introduce Matched Machine Learning, a framework that combines the flexibility of machine learning black boxes with the interpretability of matching, a longstanding tool in observational causal inference. Interpretability is paramount in many high-stakes application of causal inference. Current tools for nonparametric estimation of both average and individualized treatment effects are black-boxes that do not allow for human auditing of estimates. Our framework uses machine learning to learn an optimal metric for matching units and estimating outcomes, thus achieving the performance of machine learning black-boxes, while being interpretable. Our general framework encompasses several published works as special cases. We provide asymptotic inference theory for our proposed framework, enabling users to construct approximate confidence intervals around estimates of both individualized and average treatment effects. We show empirically that instances of Matched Machine Learning perform on par with black-box machine learning methods and better than existing matching methods for similar problems. 
Finally, in our application we show how Matched Machine Learning can be used to perform causal inference even when covariate data are highly complex: we study an image dataset, and produce high quality matches and estimates of treatment effects.	翻訳日:2023-04-05 16:36:51 公開日:2023-04-03
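The matching idea in this abstract can be sketched in a few lines. Everything below is an illustrative assumption rather than the authors' method: the distance weights are fixed by hand as a stand-in for a metric that would be learned by a machine learning model, the data are made up, and the function names (`weighted_sq_distance`, `matched_att`) are hypothetical. The sketch estimates the average treatment effect on the treated by pairing each treated unit with its nearest control under the weighted metric.

```python
def weighted_sq_distance(a, b, weights):
    """Weighted squared Euclidean distance; the weights stand in for a learned metric."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def matched_att(units, weights):
    """Average treated-minus-matched-control outcome (1-nearest-neighbor matching)."""
    treated = [u for u in units if u["t"] == 1]
    controls = [u for u in units if u["t"] == 0]
    diffs = []
    for u in treated:
        match = min(
            controls,
            key=lambda c: weighted_sq_distance(u["x"], c["x"], weights),
        )
        diffs.append(u["y"] - match["y"])
    return sum(diffs) / len(diffs)


# Made-up data: the first covariate drives the outcome, the second is noise.
units = [
    {"x": (1.0, 5.0), "t": 1, "y": 3.0},
    {"x": (2.0, -3.0), "t": 1, "y": 5.0},
    {"x": (1.1, 0.0), "t": 0, "y": 1.0},
    {"x": (2.1, 9.0), "t": 0, "y": 2.0},
    {"x": (5.0, 5.0), "t": 0, "y": 10.0},
]

att_weighted = matched_att(units, weights=[1.0, 0.0])   # metric ignores the noise covariate
att_euclid = matched_att(units, weights=[1.0, 1.0])     # plain Euclidean matching
```

Because every estimate is built from explicit matched pairs, each pair can be inspected by hand, which is the interpretability argument the abstract makes for matching. On this toy data the two metrics pick different controls, so the choice of metric changes the estimate.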
# 強化学習における実証設計 Empirical Design in Reinforcement Learning ( http://arxiv.org/abs/2304.01315v1 ) ライセンス: Link先を確認	Andrew Patterson, Samuel Neumann, Martha White, Adam White	(参考訳) 強化学習における実証設計は簡単な作業ではない。優れた実験を行うには、詳細や時には重要な計算資源に注意する必要がある。ドル当たりの計算資源は急速に増え続けているが、強化学習における典型的な実験の規模も大きい。今や、数百万のパラメータを持つエージェントが数十のタスクに対して、それぞれ30日間の経験と同等のパラメータでベンチマークするのも一般的だ。これらの実験の規模は、特にアルゴリズムを比較する際に、適切な統計証拠の必要性と相反することが多い。最近の研究は、一般的なアルゴリズムがハイパーパラメータの設定や実装の詳細にどのように敏感であるかを強調しており、一般的な経験的実践は弱い統計的証拠をもたらす(Machado et al., 2018; Henderson et al., 2018)。ここでは、これを一歩進める。この原稿は、行動を呼びかけることと、強化学習で良い実験を行うための包括的なリソースの両方を表しています。特に、共通性能測定の基礎となる統計的仮定、性能変動と安定性を適切に評価する方法、仮説テスト、複数のエージェントの比較のための特別な考察、ベースラインとイラストラティブな例構築、ハイパーパラメータと実験者バイアスの扱いについて述べる。全体を通して、文献に見られる一般的な誤りと、事例実験による統計的結果を強調します。この文書の目的は、我々の前例のない計算を使って強化学習に優れた科学を学べるか、また、経験的設計における潜在的な落とし穴への警告を与えることである。 Empirical design in reinforcement learning is no small task. Running good experiments requires attention to detail and at times significant computational resources. While compute resources available per dollar have continued to grow rapidly, so have the scale of typical experiments in reinforcement learning. It is now common to benchmark agents with millions of parameters against dozens of tasks, each using the equivalent of 30 days of experience. The scale of these experiments often conflict with the need for proper statistical evidence, especially when comparing algorithms. Recent studies have highlighted how popular algorithms are sensitive to hyper-parameter settings and implementation details, and that common empirical practice leads to weak statistical evidence (Machado et al., 2018; Henderson et al., 2018). Here we take this one step further. This manuscript represents both a call to action, and a comprehensive resource for how to do good experiments in reinforcement learning. 
In particular, we cover: the statistical assumptions underlying common performance measures, how to properly characterize performance variation and stability, hypothesis testing, special considerations for comparing multiple agents, baseline and illustrative example construction, and how to deal with hyper-parameters and experimenter bias. Throughout we highlight common mistakes found in the literature and the statistical consequences of those in example experiments. The objective of this document is to provide answers on how we can use our unprecedented compute to do good science in reinforcement learning, as well as stay alert to potential pitfalls in our empirical design.	翻訳日:2023-04-05 16:36:35 公開日:2023-04-03
# 実践における知識グラフのユーザ,課題,可視化の必要性 Characterizing the Users, Challenges, and Visualization Needs of Knowledge Graphs in Practice ( http://arxiv.org/abs/2304.01311v1 ) ライセンス: Link先を確認	Harry Li, Gabriel Appleby, Camelia Daniela Brumar, Remco Chang, Ashley Suh	(参考訳) 本研究は、企業と学術の両方で幅広いユースケースで働いている19人の知識グラフ実践者へのインタビューから得られた知見を提示する。本研究では,視覚的デザインによって緩和できるKGの作成,探索,分析において,KG実践者が経験した重要な課題を明らかにする。以上の結果から,kg実践者のうち,kg製作者,アナリスト,消費者の3人がそれぞれ独自の専門知識とニーズを持っていることが明らかとなった。我々は、KGビルダーがスキーマインクルーダーの恩恵を受けることを発見した。一方、KGアナリストは、中間クエリ結果を提供するカスタマイズ可能なクエリビルダーが必要である。 kg ユーザに対しては,ノードリンク図の有効性の欠如,および kg の採用と理解を促進するためのドメイン固有可視化の必要性が指摘されている。最後に、KGを効果的に実践するには、現在のツールや技術、コラボレーションワークフローに対処しない、技術的および社会的ソリューションの両方が必要です。インタビューの分析から,消化可能性と発見可能性のバランスをとる知識カード,時間的変化を追跡するタイムラインビュー,有機的発見をサポートするインターフェース,AIと機械学習予測のセマンティック説明など,KGのユーザビリティ向上のための可視化研究の方向性を抽出した。 This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each of whom have their own distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. 
From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.	翻訳日:2023-04-05 16:36:09 公開日:2023-04-03
# 2バウンス非視線イメージングにおける過渡応答の役割 Role of Transients in Two-Bounce Non-Line-of-Sight Imaging ( http://arxiv.org/abs/2304.01308v1 ) ライセンス: Link先を確認	Siddharth Somasundaram, Akshat Dave, Connor Henley, Ashok Veeraraghavan, Ramesh Raskar	(参考訳) 非視線イメージング(NLOS)の目的は、多重散乱光を用いてカメラの視野から隠された物体を撮像することである。最近の研究は、レーザーを走査し、2つのリレー面を持つシーンにおいて遮蔽された物体が落とす影を測定することにより、2バウンス(2B)のNLOSイメージングの実現可能性を示している。本研究では,多重化照明下の2B-NLOSにおける飛行時間(ToF)測定, すなわち過渡応答の役割を検討した。具体的には,ToF情報によって形状復元に必要な計測数と空間分解能をどのように低減できるかを検討する。SNRと復元可能性をシステムパラメータの関数として調べることで,(1)時間分解能,(2)空間分解能,(3)画像キャプチャ数のトレードオフに関する知見を示す。これにより、2Bライダーの数学的制約の形式的定義が導かれる。我々の研究は、特にToFセンサーがますます普及する中で、将来のNLOSイメージングシステムの設計に向けた解析的基盤を築くものと考えている。 The goal of non-line-of-sight (NLOS) imaging is to image objects occluded from the camera's field of view using multiply scattered light. Recent works have demonstrated the feasibility of two-bounce (2B) NLOS imaging by scanning a laser and measuring cast shadows of occluded objects in scenes with two relay surfaces. In this work, we study the role of time-of-flight (ToF) measurements, i.e., transients, in 2B-NLOS under multiplexed illumination. Specifically, we study how ToF information can reduce the number of measurements and spatial resolution needed for shape reconstruction. We present our findings with respect to tradeoffs in (1) temporal resolution, (2) spatial resolution, and (3) number of image captures by studying SNR and recoverability as functions of system parameters. This leads to a formal definition of the mathematical constraints for 2B lidar. We believe that our work lays an analytical groundwork for design of future NLOS imaging systems, especially as ToF sensors become increasingly ubiquitous.	翻訳日:2023-04-05 16:35:45 公開日:2023-04-03
# 並列テンパリングにおける混合時間境界の改善 Improved Bound for Mixing Time of Parallel Tempering ( http://arxiv.org/abs/2304.01303v1 ) ライセンス: Link先を確認	Holden Lee, Zeyu Shen	(参考訳) サンプリングアルゴリズムの分野では、直接サンプリングが不可能な場合にはMCMC(Markov Chain Monte Carlo)法が広く用いられている。しかし、ターゲット分布の多様性はしばしば収束と混合を遅くする。一般的な解決策は並列テンパリングである。実際には極めて効果的であるが、その性能に関する理論的保証は限られている。本稿では,各パラメータに多項式依存を持つスペクトルギャップ上での並列テンパリングのための新しい下限を,$(L + 1)$がレベル数であるような$\log L$を除いて提示する。これにより、モード数に指数関数的に依存する最善のバウンダリが改善される。さらに、スペクトルギャップ上の仮定上界は$\log L$に指数関数依存しており、ある意味では我々の境界は密であることを示す。 In the field of sampling algorithms, MCMC (Markov Chain Monte Carlo) methods are widely used when direct sampling is not possible. However, multimodality of target distributions often leads to slow convergence and mixing. One common solution is parallel tempering. Though highly effective in practice, theoretical guarantees on its performance are limited. In this paper, we present a new lower bound for parallel tempering on the spectral gap that has a polynomial dependence on all parameters except $\log L$, where $(L + 1)$ is the number of levels. This improves the best existing bound which depends exponentially on the number of modes. Moreover, we complement our result with a hypothetical upper bound on spectral gap that has an exponential dependence on $\log L$, which shows that, in some sense, our bound is tight.	翻訳日:2023-04-05 16:35:26 公開日:2023-04-03
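For readers unfamiliar with the algorithm whose spectral gap is bounded here, the following is a minimal sketch of parallel tempering on a bimodal one-dimensional target. The target (an equal mixture of two unit-variance Gaussians), the geometric ladder of inverse temperatures, the proposal scale, and the function names are all assumptions chosen for illustration; none of them come from the paper. The list `betas` has $L + 1$ entries, matching the number of levels in the abstract.

```python
import math
import random


def log_target(x, mu=4.0):
    """Unnormalized log-density of an equal mixture of N(-mu, 1) and N(+mu, 1)."""
    a = -0.5 * (x - mu) ** 2
    b = -0.5 * (x + mu) ** 2
    m = max(a, b)                                   # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def parallel_tempering(n_iter, betas, seed=0):
    """Return the trajectory of the coldest replica (betas[0] is assumed to be 1.0)."""
    rng = random.Random(seed)
    xs = [-4.0] * len(betas)                        # every replica starts in the left mode
    cold_samples = []
    for _ in range(n_iter):
        # Random-walk Metropolis move at each level, targeting p(x) ** beta.
        for k, beta in enumerate(betas):
            prop = xs[k] + rng.gauss(0.0, 1.0 / math.sqrt(beta))
            log_acc = beta * (log_target(prop) - log_target(xs[k]))
            if rng.random() < math.exp(min(0.0, log_acc)):
                xs[k] = prop
        # Propose swapping the states of one random pair of adjacent levels.
        k = rng.randrange(len(betas) - 1)
        log_swap = (betas[k] - betas[k + 1]) * (
            log_target(xs[k + 1]) - log_target(xs[k])
        )
        if rng.random() < math.exp(min(0.0, log_swap)):
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
        cold_samples.append(xs[0])
    return cold_samples


betas = [1.0, 0.5, 0.25, 0.125, 0.0625]             # L + 1 = 5 levels (assumed ladder)
samples = parallel_tempering(10000, betas, seed=1)
```

The hot replicas cross the low-density region between the modes easily, and the swap moves pass those states down to the cold replica, which a plain random-walk chain started in one mode would rarely manage on its own.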
# 微分プライベート学習のためのカーネルアフィンハルマシン Kernel Affine Hull Machines for Differentially Private Learning ( http://arxiv.org/abs/2304.01300v1 ) ライセンス: Link先を確認	Mohit Kumar, Bernhard A. Moser, Lukas Fischer	(参考訳) 本稿では,データ空間を,個々のデータ点に関するプライバシーに敏感な情報を隠蔽する幾何学体に分割し,元の学習課題の構造を保ちながら,学習を通してデータを表現する手段として,アフィンの点の殻を用いる方法について検討する。この目的のために,カーネルアフィンハルマシン (kernel affine hull machine, kahm) を導入する。 KAHMは、広範囲かつ深いオートエンコーダにおいて重要なビルディングブロックであり、分類アプリケーションのためのデータ表現学習を可能にする。プライバシ保存学習を確実にするために,変換プロセスを通じて差分プライベートなデータサンプルを平滑化することを含む,新しいデータ生成法を提案する。その結果生成されたデータにより、差分プライバシーが保証されるだけでなく、KAHMモデリングエラーが元のトレーニングデータサンプルよりも大きくないことも保証される。また, 試作データを用いて, 微分プライベート分類器で発生する精度損失問題にも対処する。このアプローチにより、メンバーシップ推論攻撃のリスクは大幅に低減されるが、精度の限界損失しか生じない。応用として,大域的分類器の評価が局所的に計算された距離測定のみを必要とすることを特徴とする,KAHMに基づく微分プライベートフェデレーション学習方式を導入する。本研究は,プライバシー保護学習と分類のための効果的なツールとして,KAHMの可能性を示すものである。 This paper explores the use of affine hulls of points as a means of representing data via learning in Reproducing Kernel Hilbert Spaces (RKHS), with the goal of partitioning the data space into geometric bodies that conceal privacy-sensitive information about individual data points, while preserving the structure of the original learning problem. To this end, we introduce the Kernel Affine Hull Machine (KAHM), which provides an effective way of computing a distance measure from the resulting bounded geometric body. KAHM is a critical building block in wide and deep autoencoders, which enable data representation learning for classification applications. To ensure privacy-preserving learning, we propose a novel method for generating fabricated data, which involves smoothing differentially private data samples through a transformation process. The resulting fabricated data guarantees not only differential privacy but also ensures that the KAHM modeling error is not larger than that of the original training data samples. We also address the accuracy-loss issue that arises with differentially private classifiers by using fabricated data. 
This approach results in a significant reduction in the risk of membership inference attacks while incurring only a marginal loss of accuracy. As an application, a KAHM-based differentially private federated learning scheme is introduced in which evaluating the global classifier requires only locally computed distance measures. Overall, our findings demonstrate the potential of KAHM as an effective tool for privacy-preserving learning and classification.	翻訳日:2023-04-05 16:35:14 公開日:2023-04-03
# メモリ効率とロバストな単調演算子学習(MOL)を用いた並列MRIの高速化 Accelerated parallel MRI using memory efficient and robust monotone operator learning (MOL) ( http://arxiv.org/abs/2304.01351v1 ) ライセンス: Link先を確認	Aniket Pramanik, Mathews Jacob	(参考訳) 画像物理と学習正則化を併用したモデルベースディープラーニング手法が,並列MRI加速のための強力なツールとして登場してきた。本稿では, 並列MRIにおける単調演算子学習(MOL)フレームワークの有用性について検討する。 MOLアルゴリズムは、モノトン畳み込みニューラルネットワーク(CNN)と共役勾配アルゴリズムを用いて勾配降下ステップを交互に行い、データの一貫性を促進する。このアプローチの利点は、一意性、収束性、安定性を含む圧縮センシングアルゴリズムと同様の保証を含んでいる。提案手法を,静的および動的設定のための高速化並列mriの文脈で,異なる未ロールアルゴリズムと比較することにより検証する。 Model-based deep learning methods that combine imaging physics with learned regularization priors have been emerging as powerful tools for parallel MRI acceleration. The main focus of this paper is to determine the utility of the monotone operator learning (MOL) framework in the parallel MRI setting. The MOL algorithm alternates between a gradient descent step using a monotone convolutional neural network (CNN) and a conjugate gradient algorithm to encourage data consistency. The benefits of this approach include similar guarantees as compressive sensing algorithms including uniqueness, convergence, and stability, while being significantly more memory efficient than unrolled methods. We validate the proposed scheme by comparing it with different unrolled algorithms in the context of accelerated parallel MRI for static and dynamic settings.	翻訳日:2023-04-05 16:28:53 公開日:2023-04-03
# 化学タンパク質相互作用抽出のためのエンド・ツー・エンドモデル:トークン化とスパンベースのパイプライン戦略の改善 End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies ( http://arxiv.org/abs/2304.01344v1 ) ライセンス: Link先を確認	Xuguang Ai and Ramakanth Kavuluru	(参考訳) エンド・ツー・エンド関係抽出(E2ERE)は情報抽出において重要な課題である。 e2ereは通常、エンティティ(または名前付きエンティティ認識(ner))と関連する関係を識別するが、ほとんどのreタスクは単にエンティティが前もって提供され、最終的に関係分類を行うと仮定する。 E2EREは、NERの雪玉効果がREにより多くの誤差をもたらす可能性を考えると、RE単独よりも本質的に困難である。バイオメディカルE2EREの複雑なデータセットはChemProtデータセット(BioCreative VI, 2017)であり、科学文献における化学物質と遺伝子/タンパク質の関係を識別する。 ChemProtはBLUE、BLURB、BigBioを含む最近のバイオメディカル自然言語処理ベンチマークに含まれている。しかしながら、これらのベンチマークや他の別々の取り組みでは、通常はエンドツーエンドではなく、例外が少ない。この取り組みでは、ChemProtデータセット上で新しい最先端のE2EREパフォーマンスを生成するために、スパンベースのパイプラインアプローチを採用しています。以上の結果から,e2ereでは,特に複雑な名前付きエンティティの扱いに関して,スパンベースのアプローチが優れていることを示す。私たちのエラー解析では、ChemProt用のE2EREのいくつかの重要な障害モードも特定しています。 End-to-end relation extraction (E2ERE) is an important task in information extraction, more so for biomedicine as scientific literature continues to grow exponentially. E2ERE typically involves identifying entities (or named entity recognition (NER)) and associated relations, while most RE tasks simply assume that the entities are provided upfront and end up performing relation classification. E2ERE is inherently more difficult than RE alone given the potential snowball effect of errors from NER leading to more errors in RE. A complex dataset in biomedical E2ERE is the ChemProt dataset (BioCreative VI, 2017) that identifies relations between chemical compounds and genes/proteins in scientific literature. ChemProt is included in all recent biomedical natural language processing benchmarks including BLUE, BLURB, and BigBio. However, its treatment in these benchmarks and in other separate efforts is typically not end-to-end, with few exceptions. 
In this effort, we employ a span-based pipeline approach to achieve new state-of-the-art E2ERE performance on the ChemProt dataset, resulting in a $> 4\%$ improvement in F1-score over the prior best effort. Our results indicate that a straightforward fine-grained tokenization scheme helps span-based approaches excel in E2ERE, especially with regard to handling complex named entities. Our error analysis also identifies a few key failure modes in E2ERE for ChemProt.	翻訳日:2023-04-05 16:28:07 公開日:2023-04-03
# ビデオ中の効率的なデータ収集のためのスケール不変トラジェクトリ簡易化法 A Scale-Invariant Trajectory Simplification Method for Efficient Data Collection in Videos ( http://arxiv.org/abs/2304.01340v1 ) ライセンス: Link先を確認	Yang Liu, Luiz Gustavo Hafemann	(参考訳) トレーニングデータは機械学習タスクにとって重要な要件であり、ラベル付きトレーニングデータは取得に高価であり、手動または半自動のデータ収集パイプラインを必要とすることが多い。アプリケーションを追跡するために、データ収集は各フレームの関心クラスの周りにバウンディングボックスを描画し、同じ"インスタンス"の検出をフレーム上で関連付ける。半自動データ収集パイプラインでは、ベースライン検出とトラッキングアルゴリズムを実行し、各フレームにバウンディングボックスを追加/削除/変更するための手作業による修正と、フレーム(トラックスイッチ)上のアソシエーションエラーの解決によって、これを実現することができる。本稿では,この半自動化シナリオにおいて,より効率的に地中データを生成するためのデータ補正パイプラインを提案する。提案手法は追跡システムからのトラジェクタを単純化し,アノテータがサンプルしたキーフレーム内のオブジェクトの検証と修正を行う。キーフレーム内のオブジェクトが修正されると、他のフレームのバウンディングボックスは補間によって取得される。本手法は,手動補正を必要とするフレーム数を大幅に削減する。 MOTデータセットでは、HOTAスコア89.61%を維持しながらフレーム数を30倍に削減する。さらに、サッカーネットデータセットでは79.24%、ダンストラックデータセットでは85.79%のホタスコアを達成しつつ、フレーム数を10倍に削減する。プロジェクトコードとデータはhttps://github.com/foreverYoungGitHub/trajectory-simplify-benchmarkで公開されている。 Training data is a critical requirement for machine learning tasks, and labeled training data can be expensive to acquire, often requiring manual or semi-automated data collection pipelines. For tracking applications, the data collection involves drawing bounding boxes around the classes of interest on each frame, and associate detections of the same "instance" over frames. In a semi-automated data collection pipeline, this can be achieved by running a baseline detection and tracking algorithm, and relying on manual correction to add/remove/change bounding boxes on each frame, as well as resolving errors in the associations over frames (track switches). In this paper, we propose a data correction pipeline to generate ground-truth data more efficiently in this semi-automated scenario. Our method simplifies the trajectories from the tracking systems and let the annotator verify and correct the objects in the sampled keyframes. Once the objects in the keyframes are corrected, the bounding boxes in the other frames are obtained by interpolation. 
Our method achieves a substantial reduction in the number of frames requiring manual correction. On the MOT dataset, it reduces the number of frames by 30x while maintaining a HOTA score of 89.61%. Moreover, it reduces the number of frames by a factor of 10 while achieving a HOTA score of 79.24% on the SoccerNet dataset and 85.79% on the DanceTrack dataset. The project code and data are publicly released at https://github.com/foreverYoungGitHub/trajectory-simplify-benchmark.	翻訳日:2023-04-05 16:27:40 公開日:2023-04-03
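The keyframe workflow described in the abstract above (simplify a track, correct only the keyframes, interpolate the remaining boxes) can be sketched roughly as follows. This is an illustration only, not the authors' algorithm: the Douglas-Peucker-style recursion, the linear interpolation, and the size-normalized error in `relative_error` are assumptions made for this sketch, and all function names are hypothetical.

```python
def interpolate_box(kf_a, kf_b, frame):
    """Linearly interpolate an (x, y, w, h) box between two (frame, box) keyframes."""
    (fa, box_a), (fb, box_b) = kf_a, kf_b
    t = (frame - fa) / (fb - fa)
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


def relative_error(pred, true):
    """Largest coordinate deviation divided by the box size.

    Dividing by the box size is an assumed way to make the error
    independent of object scale; the paper may define it differently.
    """
    scale = max(true[2], true[3])
    return max(abs(p - q) for p, q in zip(pred, true)) / scale


def simplify(track, tol):
    """Pick keyframes from a frame-sorted list of (frame, box) pairs.

    Recursively keeps the frame whose box deviates most from the
    interpolation between the two end keyframes, until every dropped
    frame is reproduced within `tol`.
    """
    if len(track) <= 2:
        return list(track)
    first, last = track[0], track[-1]
    worst_i, worst_err = None, 0.0
    for i in range(1, len(track) - 1):
        frame, box = track[i]
        err = relative_error(interpolate_box(first, last, frame), box)
        if err > worst_err:
            worst_i, worst_err = i, err
    if worst_i is None or worst_err <= tol:
        return [first, last]
    left = simplify(track[:worst_i + 1], tol)
    right = simplify(track[worst_i:], tol)
    return left[:-1] + right  # drop the duplicated split keyframe
```

In such a scheme an annotator would only need to check the frames returned by `simplify`; boxes for every other frame would come from `interpolate_box` applied to the neighbouring keyframes.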
# 熱的騒音によるニューラルネットワーク景観の地形図の作成 Charting the Topography of the Neural Network Landscape with Thermal-Like Noise ( http://arxiv.org/abs/2304.01335v1 ) ライセンス: Link先を確認	Theo Jules, Gal Brener, Tal Kachman, Noam Levi, Yohai Bar-Sinai	(参考訳) ニューラルネットワークのトレーニングは複雑で高次元で非凸でノイズの多い最適化問題であり、理論的理解は応用的視点と基本的な理由の両方から興味深い。主な課題は、最適化を導く景観の幾何学と地形を理解することである。本研究では,Langevin dynamics を用いた位相空間探索という標準的な統計力学手法を用いて,ランダムデータに基づく分類タスクを実行する過度パラメータ付き完全連結ネットワークについて,この景観を考察する。一定温度における熱力学に類似したゆらぎの統計を解析し、低損失領域の明確な幾何学的記述を推定する。揺らぎから容易に次元が得られるような低次元多様体であることが分かる。さらに、この次元は、分類決定境界付近に存在するデータポイントの数によって制御される。重要なことは、決定境界の指数的性質と低損失領域の平坦性により、最小付近での損失の2次近似が根本的に不適切であることである。これにより、より高温で曲率の高い領域にダイナミクスを生じさせ、任意の温度で二次的な統計を発生させる。解析的に解析可能で観測されたゆらぎ統計を再現した簡易損失モデルを用いて,この挙動を説明する。 The training of neural networks is a complex, high-dimensional, non-convex and noisy optimization problem whose theoretical understanding is interesting both from an applicative perspective and for fundamental reasons. A core challenge is to understand the geometry and topography of the landscape that guides the optimization. In this work, we employ standard Statistical Mechanics methods, namely, phase-space exploration using Langevin dynamics, to study this landscape for an over-parameterized fully connected network performing a classification task on random data. Analyzing the fluctuation statistics, in analogy to thermal dynamics at a constant temperature, we infer a clear geometric description of the low-loss region. We find that it is a low-dimensional manifold whose dimension can be readily obtained from the fluctuations. Furthermore, this dimension is controlled by the number of data points that reside near the classification decision boundary. Importantly, we find that a quadratic approximation of the loss near the minimum is fundamentally inadequate due to the exponential nature of the decision boundary and the flatness of the low-loss region. 
This causes the dynamics to sample regions with higher curvature at higher temperatures, while producing quadratic-like statistics at any given temperature. We explain this behavior by a simplified loss model which is analytically tractable and reproduces the observed fluctuation statistics.	翻訳日:2023-04-05 16:27:15 公開日:2023-04-03
# 深層学習による素数可除性について On the Prime Number Divisibility by Deep Learning ( http://arxiv.org/abs/2304.01333v1 ) ライセンス: Link先を確認	Da Wu, Jingye Yang, Mian Umair Ahsan, Kai Wang	(参考訳) 与えられた整数が2、3または他の素数で割り切れるかどうかを決定するようなタスクは、人間にとって自明であるが、事前に特定されたアルゴリズムがなければ、コンピュータにとって簡単ではない。本稿では,複数のディープラーニングアーキテクチャと特徴量エンジニアリングのアプローチを検証し,小さな素数による大きな有限整数(最大$2^{32}$)の可除性を決定するシナリオを評価した。その結果、ネットワークフレームワークやネットワーク構造(CNN、RNN、Transformerなど)の複雑さに関わらず、素数による可除性を予測する能力は、ディープラーニングモデルに供給される特徴空間に決定的に依存することがわかった。また、Amazon、Google、Microsoftから入手可能なAutomated Machine Learning (AutoML)パイプラインを評価し、適切に設計された特徴量を提供しない限り、この問題に対処できないことを示した。さらに、フーリエ級数基底ベクトル上の通常の線形回帰を用いて、問題の閉形式解を提案し、その成功を示した。最後に,ChatGPTを用いたプロンプトベースの学習を評価し,小さな素数での成功と,大きな素数での明らかな失敗を実証した。特徴量エンジニアリングは、AutoMLや大規模言語モデル(LLM)の時代においても、パフォーマンスの向上、解釈可能性の向上、機械学習/深層学習モデルの複雑さの低減のために引き続き重要な課題である、と結論付けている。 Certain tasks such as determining whether a given integer can be divided by 2, 3, or other prime numbers may be trivial for human beings, but can be less straightforward for computers in the absence of pre-specified algorithms. In this paper, we tested multiple deep learning architectures and feature engineering approaches, and evaluated the scenario of determining divisibility of large finite integers (up to $2^{32}$) by small prime numbers. It turns out that, regardless of the network frameworks or the complexity of the network structures (CNN, RNN, Transformer, etc.), the ability to predict the prime number divisibility critically depends on the feature space fed into the deep learning models. We also evaluated commercially available Automated Machine Learning (AutoML) pipelines from Amazon, Google and Microsoft, and demonstrated that they failed to address this issue unless appropriately engineered features were provided. We further proposed a closed form solution to the problem using the ordinary linear regression on Fourier series basis vectors, and showed its success. 
Finally, we evaluated prompt-based learning using ChatGPT and demonstrated its success on small primes and apparent failures on larger primes. We conclude that feature engineering remains an important task to improve the performance, increase the interpretability, and reduce the complexity of machine learning/deep learning models, even in the era of AutoML and large-language models (LLMs).	翻訳日:2023-04-05 16:26:55 公開日:2023-04-03
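To make the remark about Fourier features in the abstract above concrete, here is a minimal sketch of why divisibility becomes a linear problem in such a basis. It relies on a standard identity: for a prime p, the average of cos(2πkn/p) over k = 0, ..., p-1 is 1 when p divides n and 0 otherwise. The function names are hypothetical, and the uniform weights 1/p are one exact solution a linear regression could recover; the paper's actual feature construction may differ.

```python
import math


def fourier_features(n, p):
    """Cosine features cos(2*pi*k*n/p) for k = 0..p-1 (an assumed feature set)."""
    return [math.cos(2 * math.pi * k * n / p) for k in range(p)]


def divisibility_score(n, p):
    """Linear model with uniform weights 1/p on the Fourier features.

    By the roots-of-unity identity the score is 1 when p divides n and 0
    otherwise, up to floating-point error.
    """
    return sum(fourier_features(n, p)) / p


def is_divisible(n, p):
    """Threshold the linear score at 0.5."""
    return divisibility_score(n, p) > 0.5
```

The point of the sketch is that the target is an exact linear function of these features, which is consistent with the abstract's conclusion that the choice of feature space matters more than the model architecture.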
# 辞書なしでカスタムイベントデータを作成する:bag-of-tricks Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks ( http://arxiv.org/abs/2304.01331v1 ) ライセンス: Link先を確認	Andrew Halterman, Philip A. Schrodt, Andreas Beger, Benjamin E. Bagozzi, Grace I. Scarborough	(参考訳) テキストから自動的に抽出される「who did what to whom(誰が誰に何をしたか)」の構造化された記録であるイベントデータは、国際政治学者にとって重要なデータ源である。新しいイベントデータセットの開発コストは高く、特に手作り辞書に依存する自動システムを使用する場合に顕著であるため、ほとんどの研究者は、特定の研究課題に最適化されたカスタマイズされたイベントデータセットを開発するのではなく、ICEWSのような大規模な既存のデータセットに頼っている。本稿では,自然言語処理(NLP)の最近の進歩を活かし,カスタマイズされたイベントデータセットを迅速に作成可能にする,効率的なカスタムイベントデータ生成のための「bag of tricks」について述べる。本稿では,能動学習によるイベントカテゴリ分類器の訓練,大規模言語モデル・標準的な機械学習分類器・NLPの事前訓練済み「質問応答」モデルを用いたテキスト中のアクターと行為の受け手の識別,およびアクターの言及をWikipediaの記事に解決して分類する手法を紹介する。これらのテクニックがICEWSに代わる,新たなPOLECATグローバルイベントデータセットをどのように生成したかと,より小型でカスタムなイベントデータセットを学者が迅速に生成する方法の例について説明する。新しいテクニックを実装するためのサンプルコードとモデルを公開する。 Event data, or structured records of "who did what to whom" that are automatically extracted from text, is an important source of data for scholars of international politics. The high cost of developing new event datasets, especially using automated systems that rely on hand-built dictionaries, means that most researchers draw on large, pre-existing datasets such as ICEWS rather than developing tailor-made event datasets optimized for their specific research question. This paper describes a "bag of tricks" for efficient, custom event data production, drawing on recent advances in natural language processing (NLP) that allow researchers to rapidly produce customized event datasets. The paper introduces techniques for training an event category classifier with active learning, identifying actors and the recipients of actions in text using large language models and standard machine learning classifiers and pretrained "question-answering" models from NLP, and resolving mentions of actors to their Wikipedia article to categorize them. 
We describe how these techniques produced the new POLECAT global event dataset that is intended to replace ICEWS, along with examples of how scholars can quickly produce smaller, custom event datasets. We publish example code and models to implement our new techniques.	翻訳日:2023-04-05 16:26:31 公開日:2023-04-03
# 文書類似性アルゴリズムの比較 A Comparison of Document Similarity Algorithms ( http://arxiv.org/abs/2304.01330v1 ) ライセンス: Link先を確認	Nicholas Gahman and Vinayak Elangovan	(参考訳) 文書類似性は自然言語処理の重要な部分であり、最も一般的には盗作検出やテキスト要約に使われる。したがって、最も効果的な文書類似性アルゴリズムを見つけることは、自然言語処理の分野に大きな影響を与える可能性がある。本報告では,多数の文書類似性アルゴリズムについて検討し,どれが最も有用かを決定する。統計アルゴリズム、ニューラルネットワーク、コーパス/知識ベースのアルゴリズムの3つのタイプの文書類似性アルゴリズムに分類することで、最も効果的な文書類似性アルゴリズムに対処する。各カテゴリでもっとも効果的なアルゴリズムは、各アルゴリズムが利用できるあらゆる可能な領域をテストする一連のベンチマークデータセットと評価を使用して、我々の研究で比較されます。 Document similarity is an important part of Natural Language Processing and is most commonly used for plagiarism-detection and text summarization. Thus, finding the overall most effective document similarity algorithm could have a major positive impact on the field of Natural Language Processing. This report sets out to examine the numerous document similarity algorithms, and determine which ones are the most useful. It addresses the most effective document similarity algorithm by categorizing them into 3 types of document similarity algorithms: statistical algorithms, neural networks, and corpus/knowledge-based algorithms. The most effective algorithms in each category are also compared in our work using a series of benchmark datasets and evaluations that test every possible area that each algorithm could be used in.	翻訳日:2023-04-05 16:26:06 公開日:2023-04-03
# ニューラル遅延微分方程式を用いた遅延学習 Learning the Delay Using Neural Delay Differential Equations ( http://arxiv.org/abs/2304.01329v1 ) ライセンス: Link先を確認	Maria Oprea and Mark Walth and Robert Stephany and Gabriella Torres Nothaft and Arnaldo Rodriguez-Gonzalez and William Clark	(参考訳) 機械学習と動的システムの交点は最近かなりの関心を集めている。ニューラルネットワークの常微分方程式(ノード)は、これらのフィールド間の重なりが豊富である。本稿では,遅延微分方程式(ddes)に基づく連続時間ニューラルネットワーク手法を提案する。本モデルでは,データからモデルパラメータと遅延を直接学習するために随伴感度法を用いる。我々のアプローチはNODEにインスパイアされ、遅延の値が先行値であることが仮定された初期のニューラルDDEモデルを拡張します。我々は,提案手法の感度解析を行い,ベンチマークシステムからddeパラメータを学習する能力を示す。我々は今後の方向性と応用の可能性で議論を終える。 The intersection of machine learning and dynamical systems has generated considerable interest recently. Neural Ordinary Differential Equations (NODEs) represent a rich overlap between these fields. In this paper, we develop a continuous time neural network approach based on Delay Differential Equations (DDEs). Our model uses the adjoint sensitivity method to learn the model parameters and delay directly from data. Our approach is inspired by that of NODEs and extends earlier neural DDE models, which have assumed that the value of the delay is known a priori. We perform a sensitivity analysis on our proposed approach and demonstrate its ability to learn DDE parameters from benchmark systems. We conclude our discussion with potential future directions and applications.	翻訳日:2023-04-05 16:25:55 公開日:2023-04-03
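As background for the abstract above, a delay differential equation makes the derivative depend on a past state, x'(t) = f(x(t), x(t - tau)), so the delay tau acts as a model parameter alongside those of f. The sketch below only shows a forward simulation with a fixed-step Euler scheme and a history buffer; it does not implement the adjoint sensitivity method or any learning, and the function name, step size, and rounding of tau to a whole number of steps are assumptions of this illustration.

```python
def simulate_dde(f, history, tau, t_end, dt=1e-3):
    """Forward-Euler integration of x'(t) = f(x(t), x(t - tau)).

    `history(t)` supplies x on [-tau, 0]; tau is rounded to a whole
    number of steps. Returns the samples of x at t = 0, dt, ..., t_end.
    """
    lag = round(tau / dt)
    # xs[i] stores x at time (i - lag) * dt, so xs[:lag + 1] covers [-tau, 0].
    xs = [history((i - lag) * dt) for i in range(lag + 1)]
    steps = round(t_end / dt)
    for _ in range(steps):
        x_now = xs[-1]
        x_delayed = xs[-1 - lag]  # state one delay ago
        xs.append(x_now + dt * f(x_now, x_delayed))
    return xs[lag:]
```

With `tau=0.0` the scheme reduces to an ordinary differential equation solver, which gives a simple sanity check; a learning method such as the one described in the abstract would additionally need gradients of a loss with respect to both the parameters of `f` and `tau`.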
# チープフェイク検出のグランドチャレンジ Grand Challenge On Detecting Cheapfakes ( http://arxiv.org/abs/2304.01328v1 ) ライセンス: Link先を確認	Duc-Tien Dang-Nguyen and Sohail Ahmed Khan and Cise Midoglu and Michael Riegler and P{\aa}l Halvorsen and Minh-Son Dao	(参考訳) Cheapfake(チープフェイク)は、マルチメディアコンテンツの非AI(チープ)操作を含む最近作られた用語である。チープフェイクはディープフェイクよりも一般的であることが知られている。画像/ビデオ操作のための編集ソフトウェアを使って、あるいはソフトウェアを使わずに、単にメディアを誤解を招くクレームと共有することで、画像/ビデオのコンテキストを変更することで、安価なフェイクメディアを作成できる。このコンテキストの変更は、メディアのout-of-context(ooc)誤用と呼ばれる。 OOCメディアは、画像やビデオが改ざんされないため、偽メディアよりもずっと検出が難しい。本稿では,OOC画像の検出に焦点をあてるとともに,ニュース記事中の画像キャプションと矛盾する実画像の誤用に着目した。この課題の目的は、最近コンパイルされたCOSMOSデータセットに基づいて、与えられたサンプル(新しい画像と関連するキャプション)がOOCであるかどうかを検出できるモデルを開発し、ベンチマークすることである。 Cheapfake is a recently coined term that encompasses non-AI ("cheap") manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse of media. OOC media is much harder to detect than fake media, since the images and videos are not tampered. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can be used to detect whether given samples (news image and associated captions) are OOC, based on the recently compiled COSMOS dataset.	翻訳日:2023-04-05 16:25:46 公開日:2023-04-03
# LiDARによる動的物体の3次元追跡と状態推定 Lidar based 3D Tracking and State Estimation of Dynamic Objects ( http://arxiv.org/abs/2304.01396v1 ) ライセンス: Link先を確認	Patil Shubham Suresh, Gautham Narayan Narasimhan	(参考訳) 対向車の状態推定: これまでの研究は、自車(ego-vehicle)の位置、速度、方向、角速度などの状態の決定に基づいている。提案手法は,運動計画や意思決定に不可欠な非自車の状態推定に重点を置いている。動的シーンに基づくローカライゼーション: 私たちのプロジェクトは、移動する自車や非自車を含むような動的シーンを対象とする。以前の手法は静的環境に重点を置いていた。 State estimation of oncoming vehicles: Earlier research has been based on determining states such as the position, velocity, orientation, and angular velocity of the ego-vehicle. Our approach focuses on estimating the states of non-ego vehicles, which is crucial for motion planning and decision-making. Dynamic scene based localization: Our project will work on dynamic scenes, such as those with moving ego (self) and non-ego vehicles. Previous methods were focused on static environments.	翻訳日:2023-04-05 16:19:31 公開日:2023-04-03
# クラスタ化システム同定によるパーソナライズモデル学習 Learning Personalized Models with Clustered System Identification ( http://arxiv.org/abs/2304.01395v1 ) ライセンス: Link先を確認	Leonardo F. Toso, Han Wang, James Anderson	(参考訳) 線形系モデルを異なる系力学から複数の軌道を観測することから学習する問題に対処する。このフレームワークは、システムの類似性に応じて、複数のシステムが彼らのダイナミクスをクラスタに分割する、協調的なシナリオを含んでいる。したがって、同じクラスタ内のシステムは、他のクラスタによる観測の恩恵を受けることができる。この枠組みを考慮して,各システムがクラスタのアイデンティティを交互に推定し,そのダイナミクスを推定するアルゴリズムを提案する。そして、これを集約して各クラスタのモデルを更新する。軽度の仮定では,クラスタのアイデンティティを正確に推定し,クラスタ内のシステム数と逆スケールする近似的なサンプル複雑性を実現し,より効率的かつパーソナライズされたシステム識別プロセスを実現する。 We address the problem of learning linear system models from observing multiple trajectories from different system dynamics. This framework encompasses a collaborative scenario where several systems seeking to estimate their dynamics are partitioned into clusters according to their system similarity. Thus, the systems within the same cluster can benefit from the observations made by the others. Considering this framework, we present an algorithm where each system alternately estimates its cluster identity and performs an estimation of its dynamics. This is then aggregated to update the model of each cluster. We show that under mild assumptions, our algorithm correctly estimates the cluster identities and achieves an approximate sample complexity that scales inversely with the number of systems in the cluster, thus facilitating a more efficient and personalized system identification process.	翻訳日:2023-04-05 16:19:21 公開日:2023-04-03
# グラフ上の反事実学習:調査 Counterfactual Learning on Graphs: A Survey ( http://arxiv.org/abs/2304.01391v1 ) ライセンス: Link先を確認	Zhimeng Guo, Teng Xiao, Charu Aggarwal, Hui Liu, Suhang Wang	(参考訳) グラフ構造化データは、ソーシャルネットワーク、分子グラフ、トランザクションネットワークなど現実世界の至るところに存在する。グラフニューラルネットワーク(GNN)は、グラフでの表現学習において大きな成功を収め、さまざまな下流タスクを促進してきた。しかし、GNNには、解釈可能性の欠如や、トレーニングデータのバイアスを容易に受け継ぐこと、因果関係をモデル化できないことといった欠点がいくつかある。近年,グラフ上の反事実学習は,これらの欠点を緩和する有望な結果を示している。グラフ上の反事実的公正性、説明可能性、リンク予測などに対する様々なグラフ反事実学習手法が提案されている。この有望な方向性の展開を促進するため,本調査では,グラフ反事実学習に関する論文を分類し,総合的にレビューする。既存の手法を研究課題に基づいて4つのカテゴリに分けた。それぞれのカテゴリについて、背景と動機づけとなる例、既存の研究を要約する一般的なフレームワーク、そしてこれらの研究の詳細なレビューを提供する。我々は,グラフ構造化データ,反事実学習,実世界のアプリケーションとの交点における将来の研究の方向性を指摘する。今後の研究のためのリソースの総合的なビューを提供するため、オープンソース実装、パブリックデータセット、そして一般的に使用される評価指標のコレクションをまとめる。この調査は、グラフ反事実学習のカテゴリと現在のリソースの統一的な理解を構築するための「one-stop-shop」として機能することを目的としている。また、論文やリソースのリポジトリも維持しており、リポジトリ https://github.com/TimeLovercc/Awesome-Graph-Causal-Learning の更新を続ける。 Graph-structured data are pervasive in the real world, for example in social networks, molecular graphs and transaction networks. Graph neural networks (GNNs) have achieved great success in representation learning on graphs, facilitating various downstream tasks. However, GNNs have several drawbacks, such as lacking interpretability, easily inheriting the bias of the training data, and being unable to model causal relations. Recently, counterfactual learning on graphs has shown promising results in alleviating these drawbacks. Various graph counterfactual learning approaches have been proposed for counterfactual fairness, explainability, link prediction and other applications on graphs. To facilitate the development of this promising direction, in this survey, we categorize and comprehensively review papers on graph counterfactual learning. We divide existing methods into four categories based on research problems studied. 
For each category, we provide background and motivating examples, a general framework summarizing existing works and a detailed review of these works. We point out promising future research directions at the intersection of graph-structured data, counterfactual learning, and real-world applications. To offer a comprehensive view of resources for future studies, we compile a collection of open-source implementations, public datasets, and commonly-used evaluation metrics. This survey aims to serve as a ``one-stop-shop'' for building a unified understanding of graph counterfactual learning categories and current resources. We also maintain a repository for papers and resources and will keep updating the repository https://github.com/TimeLovercc/Awesome-Graph-Causal-Learning.	翻訳日:2023-04-05 16:19:08 公開日:2023-04-03
# カイラル有効場論演算子を持つ$A = 3$核の磁気モーメント Magnetic moments of $A = 3$ nuclei with chiral effective field theory operators ( http://arxiv.org/abs/2304.01389v1 ) ライセンス: Link先を確認	Soham Pal (1), Shiplu Sarker (1), Patrick J. Fasano (2), Pieter Maris (1), James P. Vary (1), Mark A. Caprio (2) ((1) Iowa State University, (2) University of Notre-Dame)	(参考訳) カイラル有効場理論($\chi$EFT)は、第一原理から体系的に改善可能な方法で核子間相互作用を得るための枠組みを提供するとともに、一貫した電弱カレント演算子の導出も可能にする。本研究では,一貫して導出された相互作用とカレントを,$A=3$系であるトリトンとヘリウム3の磁気双極子モーメントの計算に適用する。ここでは半局所座標空間(SCS)正則化を用いて得られるLENPIC相互作用に着目する。 LENPIC $\chi$EFTベクトルカレントの運動量空間表現から出発し、N2LOまでのSCS正則化磁気双極子演算子を導出する。次に,$\chi$EFTのN2LOにおけるSCS LENPIC相互作用を用いてトリトンおよびヘリウム3系のno-core殻模型計算を行い,一貫して導出された一核子および二核子電磁カレントを用いて得られる磁気双極子モーメントを評価する。$\chi$EFTカレントを用いた先行結果から予想されるように、N2LOまでのカレント補正は、トリトンとヘリウム3の磁気双極子モーメントについて実験との一致を改善するが、まだ完全な一致には至らない。 Chiral effective field theory ($\chi$EFT) provides a framework for obtaining internucleon interactions in a systematically improvable fashion from first principles, while also providing for the derivation of consistent electroweak current operators. In this work, we apply consistently derived interactions and currents towards calculating the magnetic dipole moments of the $A=3$ systems Triton and Helium-3. We focus here on LENPIC interactions obtained using semilocal coordinate-space (SCS) regularization. Starting from the momentum-space representation of the LENPIC $\chi$EFT vector current, we derive the SCS-regularized magnetic dipole operator up through N2LO. We then carry out no-core shell model calculations for Triton and Helium-3 systems, using the SCS LENPIC interaction at N2LO in $\chi$EFT, and evaluate the magnetic dipole moments obtained using the consistently derived one-nucleon and two-nucleon electromagnetic currents. As anticipated by prior results with $\chi$EFT currents, the current corrections through N2LO provide improved, but not yet complete, agreement with experiment for the Triton and Helium-3 magnetic dipole moments.	翻訳日:2023-04-05 16:18:44 公開日:2023-04-03
# PoseMatcher: 深部特徴マッチングによる1ショット6Dオブジェクトポーズ推定 PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching ( http://arxiv.org/abs/2304.01382v1 ) ライセンス: Link先を確認	Pedro Castro, Tae-Kyun Kim	(参考訳) 見えないオブジェクトのポーズを推定することは、挑戦的なワンショットポーズ推定タスクの目標である。これまでの手法は特徴マッチングに大きく依存し、大きな成功を収めてきた。しかし、これらの手法は、ポーズ推定のために特に設計されていない事前訓練されたモデルに依存しているため、しばしば非効率で制限される。本稿では,これらの制約を克服した高精度なモデルフリーのワンショットオブジェクトポーズ推定器PoseMatcherを提案する。クエリと正・負のテンプレートからなる3ビューシステムに基づいて、オブジェクトと画像のマッチングのための新しいトレーニングパイプラインを作成した。このシンプルで効果的なアプローチは、トレーニング中に完全なオブジェクトポイントクラウドの近似を安価に構築することで、テスト時のシナリオをエミュレートする。本稿では,PoseMatcherが画像とポイントクラウドという異なる入力モダリティに注意を向けられるようにするために,入力間の自己注意と相互注意を効率的に収容する新しい注意層であるIO-Layerを導入する。さらに,対象オブジェクトの冗長領域を反復的に除去し,精度を維持しつつ,ネットワークの複雑さやノイズをさらに低減するプルーニング戦略を提案する。最後に、一般的に用いられるポーズリファインメント戦略であるズームと2Dオフセットのリファインメントを再設計し、それらをワンショットパラダイムに適応させた。 LinemodとYCB-Vのデータセット上で,従来のすべてのワンショットポーズ推定手法を上回り,最近のインスタンスレベル手法に匹敵する結果を得る。ソースコードとモデルはhttps://github.com/PedroCastro/PoseMatcherで入手できる。 Estimating the pose of an unseen object is the goal of the challenging one-shot pose estimation task. Previous methods have heavily relied on feature matching with great success. However, these methods are often inefficient and limited by their reliance on pre-trained models that have not been designed specifically for pose estimation. In this paper we propose PoseMatcher, an accurate model-free one-shot object pose estimator that overcomes these limitations. We create a new training pipeline for object to image matching based on a three-view system: a query with positive and negative templates. This simple yet effective approach emulates test time scenarios by cheaply constructing an approximation of the full object point cloud during training. To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer, a new attention layer that efficiently accommodates self and cross attention between the inputs. 
Moreover, we propose a pruning strategy where we iteratively remove redundant regions of the target object to further reduce the complexity and noise of the network while maintaining accuracy. Finally, we redesign commonly used pose refinement strategies, zoom and 2D offset refinements, and adapt them to the one-shot paradigm. We outperform all prior one-shot pose estimation methods on the Linemod and YCB-V datasets and achieve results rivaling recent instance-level methods. The source code and models are available at https://github.com/PedroCastro/PoseMatcher.	翻訳日:2023-04-05 16:18:24 公開日:2023-04-03
# 機械学習を用いたパッシブ光ネットワークにおける故障分岐同定 Faulty Branch Identification in Passive Optical Networks using Machine Learning ( http://arxiv.org/abs/2304.01376v1 ) ライセンス: Link先を確認	Khouloud Abdelli, Carsten Tropschug, Helmut Griesser, and Stephan Pachnicke	(参考訳) パッシブ光ネットワーク(PON)は有望なブロードバンドアクセスネットワークソリューションとなっている。信頼できる送信を確実にし、サービスレベルの合意を満たすためには、ネットワーク障害を迅速に識別しローカライズするために、ponシステムを常に監視する必要がある。通常、PONシステムにおけるサービス中断は、主にファイバカットと光ネットワークユニット(ONU)の送信機/受信機故障に起因する。 ONUが光線端末(OLT)と異なる距離にある場合、記録された光時間領域反射率(OTDR)トレースを分析して故障したONUまたは分岐を特定することができる。しかし、同じ長さの2つ以上の枝に由来する反射が重なり合うと、故障枝の分離が非常に困難になるため、大域的な後方散乱信号による故障枝の判別が困難になる。近年、機械学習(ML)に基づくアプローチは、PONシステムにおける光学的欠陥を管理する大きな可能性を示している。このようなテクニックは、同じPONシステムから派生したデータでトレーニングやテストを行うときによく機能する。しかし、ponシステム(トレーニングデータの生成に採用)が変化した場合、例えば、より多くのブランチを追加したり、隣り合う2つのブランチの長さ差を変更したりすることで、パフォーマンスが著しく低下する可能性がある。などネットワーク変更毎にmlモデルを再トレーニングする必要があるため、時間を要する可能性がある。本稿では,ネットワークアーキテクチャとは独立に学習した汎用MLアプローチを提案し,近接長の分岐に対してOTDR信号が与えられたPONシステムの障害分岐を特定する。このようなアプローチは、ネットワークの変更毎に再トレーニングされることなく、任意のPONシステムに適用することができる。提案手法はPONシステムから得られた実験データを用いて検証する。 Passive optical networks (PONs) have become a promising broadband access network solution. To ensure a reliable transmission, and to meet service level agreements, PON systems have to be monitored constantly in order to quickly identify and localize networks faults. Typically, a service disruption in a PON system is mainly due to fiber cuts and optical network unit (ONU) transmitter/receiver failures. When the ONUs are located at different distances from the optical line terminal (OLT), the faulty ONU or branch can be identified by analyzing the recorded optical time domain reflectometry (OTDR) traces. However, faulty branch isolation becomes very challenging when the reflections originating from two or more branches with similar length overlap, which makes it very hard to discriminate the faulty branches given the global backscattered signal. 
Recently, machine learning (ML) based approaches have shown great potential for managing optical faults in PON systems. Such techniques perform well when trained and tested with data derived from the same PON system. However, their performance may severely degrade if the PON system (adopted for the generation of the training data) has changed, e.g. by adding more branches or varying the length difference between two neighboring branches. The ML models then have to be re-trained for each network change, which can be time-consuming. In this paper, to overcome the aforementioned issues, we propose a generic ML approach trained independently of the network architecture for identifying the faulty branch in PON systems given OTDR signals for the cases of branches with close lengths. Such an approach can be applied to an arbitrary PON system without needing to be re-trained for each change of the network. The proposed approach is validated using experimental data derived from a PON system.	翻訳日:2023-04-05 16:18:00 公開日:2023-04-03
# Pythia: トレーニングとスケーリングを対象とする大規模言語モデル分析スイート Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling ( http://arxiv.org/abs/2304.01373v1 ) ライセンス: Link先を確認	Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal	(参考訳) 大規模言語モデル(LLM)は、トレーニングの過程でどのように発展し進化するのか? モデルがスケールするにつれて、これらのパターンはどのように変化するのか? これらの疑問に答えるために、我々は、まったく同じ順序で提示される公開データでトレーニングされた、70Mから12Bパラメータの16のLLMからなるスイートPythiaを紹介する。 16モデルそれぞれについて154のチェックポイントを公開し、さらなる研究のために、正確なトレーニングデータローダをダウンロードして再構築するためのツールも提供する。我々は,Pythiaが様々な分野の研究を促進することを意図しており,記憶に関する新規な結果,few-shot性能に対する用語頻度の影響,性別バイアスの低減など,いくつかの事例研究を示す。この高度に制御されたセットアップは、LLMとそのトレーニングダイナミクスに対する新たな洞察を得るために利用できることを実証する。トレーニングされたモデル、分析コード、トレーニングコード、トレーニングデータは https://github.com/EleutherAI/pythia にある。 How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.	翻訳日:2023-04-05 16:17:32 公開日:2023-04-03
# 閉曲線に対するガウスモデル Gaussian model for closed curves ( http://arxiv.org/abs/2304.01367v1 ) ライセンス: Link先を確認	Krzysztof Byrski, Przemys{\l}aw Spurek, Jacek Tabor	(参考訳) ガウス混合モデル(GMM)は、曲線データや強い非線形データにうまく適応しない。しかし、この問題を解くために、曲線座標系においてガウス的を用いることができる。さらに、そのような解は、関数の族によって定義される複雑な形状へのクラスタの適応を可能にする。しかしそれでも、クラスタを閉じた曲線(円、楕円など)としてモデル化することは困難である。本研究では,データ中の複雑なテンプレートを検出するために使用できる閉曲線の密度表現を提案する。この目的のために、閉曲線をモデル化するための新しい確率分布を定義する。そして、そのような分布の混合を構築し、一次元閉曲線の場合、効果的に訓練できることを示す。 Gaussian Mixture Models (GMM) do not adapt well to curved and strongly nonlinear data. However, we can use Gaussians in the curvilinear coordinate systems to solve this problem. Moreover, such a solution allows for the adaptation of clusters to the complicated shapes defined by the family of functions. But still, it is challenging to model clusters as closed curves (e.g., circles, ellipses, etc.). In this work, we propose a density representation of the closed curve, which can be used to detect the complicated templates in the data. For this purpose, we define a new probability distribution to model closed curves. Then we construct a mixture of such distributions and show that it can be effectively trained in the case of the one-dimensional closed curves.	翻訳日:2023-04-05 16:17:16 公開日:2023-04-03
# 自律型サイバーエージェントのためのネットワークAIジムの実現 Enabling A Network AI Gym for Autonomous Cyber Agents ( http://arxiv.org/abs/2304.01366v1 ) ライセンス: Link先を確認	Li Li, Jean-Pierre S. El Rami, Adrian Taylor, James Hailing Rao, Thomas Kunz	(参考訳) 本研究の目的は、強化学習および深層強化学習(RL/DRL)を適用して、ネットワークサイバーオペレーション(CyOps)のための自律エージェントを実現することである。必要となるRLトレーニング環境は、実ネットワークのエミュレーションによって最もよく達成される高忠実度の必要性と、シミュレーションによって最もよく達成される多数のトレーニングエピソードを実行する必要性とのバランスをとらなければならないため、特に困難である。エミュレートされたCyGIL-EがシミュレートされたCyGIL-Sを自動生成する統合トレーニング環境、Cyber Gym for Intelligent Learning (CyGIL)を開発する。予備実験の結果から、CyGIL-Eでは数日を要するのに対し、CyGIL-Sは数分でエージェントを訓練できる。 CyGIL-Sで訓練されたエージェントはCyGIL-Eに直接転送可能であり、エミュレートされた「実」ネットワークで完全な意思決定能力を示す。オフラインRLを可能にするCyGILソリューションは、現実のサイバーネットワークでRLエージェントを活用するためのsim-to-realに向けた有望な方向を示す。 This work aims to enable autonomous agents for network cyber operations (CyOps) by applying reinforcement and deep reinforcement learning (RL/DRL). The required RL training environment is particularly challenging, as it must balance the need for high-fidelity, best achieved through real network emulation, with the need for running large numbers of training episodes, best achieved using simulation. A unified training environment, namely the Cyber Gym for Intelligent Learning (CyGIL) is developed where an emulated CyGIL-E automatically generates a simulated CyGIL-S. From preliminary experimental results, CyGIL-S is capable to train agents in minutes compared with the days required in CyGIL-E. The agents trained in CyGIL-S are transferrable directly to CyGIL-E showing full decision proficiency in the emulated "real" network. Enabling offline RL, the CyGIL solution presents a promising direction towards sim-to-real for leveraging RL agents in real-world cyber networks.	翻訳日:2023-04-05 16:17:07 公開日:2023-04-03
# 言語横断剽窃検出の簡便かつ効果的な方法 A Simple and Effective Method of Cross-Lingual Plagiarism Detection ( http://arxiv.org/abs/2304.01352v1 ) ライセンス: Link先を確認	Karen Avetisyan, Arthur Malajyan, Tsolak Ghukasyan	(参考訳) 本稿では,多数の言語に適用可能な単純な言語間剽窃検出手法を提案する。提案手法は,候補検索タスクにオープンな多言語シソーラスを,詳細な解析に事前訓練された多言語BERTベースの言語モデルを利用する。この方法は、使用時に機械翻訳や語義曖昧性解消に依存しないため、低リソース言語を含む多数の言語に適している。提案手法の有効性は、いくつかの既存および新しいベンチマークで実証され、フランス語、ロシア語、アルメニア語で最先端の結果が得られた。 We present a simple cross-lingual plagiarism detection method applicable to a large number of languages. The presented approach leverages open multilingual thesauri for candidate retrieval task and pre-trained multilingual BERT-based language models for detailed analysis. The method does not rely on machine translation and word sense disambiguation when in use, and therefore is suitable for a large number of languages, including under-resourced languages. The effectiveness of the proposed approach is demonstrated for several existing and new benchmarks, achieving state-of-the-art results for French, Russian, and Armenian languages.	翻訳日:2023-04-05 16:16:49 公開日:2023-04-03
# Maxwell's Demon for Emergent Page Curve and Split Property Maxwell's Demon for Emergent Page Curve and Split Property ( http://arxiv.org/abs/2304.01414v1 ) ライセンス: Link先を確認	Yang An	(参考訳) 緊急重力の適切な状況を求めて,最近我々は,島の発達の非重力結合構造に類似した極端表面が変化する場合にのみエントロピー機構が発生することを明らかにした。本稿では,これらのユークリッド状態の進化を見出すためには,外力が$F_{ex}\propto T_{H}\delta A(\mu_a)$である必要がある。我々は、降下傾向を第2法則傾向に類似させ、ホーキング放射の潜在的傾向を探索する。この類似性はマクスウェルの悪魔の役割を想起させ、近接宇宙においてページ曲線が現れるための浴槽のメカニズムを解釈する。これは、ラジュによって提起された量子重力に関するスプリット問題を解決できる。 Seeking for the proper situation of Emergent Gravity, we recent reveal that the entropic mechanism only happens when extremal surfaces are varied, which is similar to the non-gravitational-bath-coupled setup of the island development. In this paper, we consider perturbing thin shell state outside horizon during equilibrating, to find the evolution of these Euclidean states requires an external force to be $ F_{ex}\propto T_{H}\delta A(\mu_a)$, proportional to area variation of the apparent horizon which could transform into the actual event horizon as extremal surface. We analogize the falling tendency to the 2nd law tendency, and then explore the potential tendency violation of Hawking radiation. This analogy recalls the role of Maxwell's Demon to interpret the mechanism of the bath for the Page curve to emerge in a close universe. It could reconcile the Split Problem concerning quantum gravity raised by Raju.	翻訳日:2023-04-05 16:10:13 公開日:2023-04-03
# 量子等化に対するコヒーレントLQGアプローチ A Coherent LQG approach to Quantum Equalization ( http://arxiv.org/abs/2304.01413v1 ) ライセンス: Link先を確認	Rebbecca TY Thien, Shanon L. Vuglar and Ian R. Petersen	(参考訳) 量子等化問題を解くために,サブオプティカルでコヒーレントな量子lqgコントローラを設計する手法を提案する。本手法では,制御問題として問題を再構成し,古典的なLQGコントローラを設計し,量子システムとして実装する。例としては、アクティブシステムとパッシブシステムの両方のアルゴリズム、すなわち、運動量演算子と運動量演算子の両方で力学が記述されるシステムと、消滅演算子のみの力学を持つシステムを示す。 We propose a method to design a suboptimal, coherent quantum LQG controller to solve a quantum equalization problem. Our method involves reformulating the problem as a control problem and then designing a classical LQG controller and implementing it as a quantum system. Illustrative examples are included which demonstrate the algorithm for both active and passive systems, i.e., systems where the dynamics are described in terms of both position and momentum operators and systems with dynamics in terms of annihilation operators only.	翻訳日:2023-04-05 16:09:54 公開日:2023-04-03
# statcan dialogue dataset: 真の意図による会話によるデータテーブルの検索 The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents ( http://arxiv.org/abs/2304.01412v1 ) ライセンス: Link先を確認	Xing Han Lu, Siva Reddy, Harm de Vries	(参考訳) 我々は、StatCan Dialogue Datasetを導入し、カナダ統計局で働いているエージェントと、公開データテーブルを探しているオンラインユーザとの間で19,379の会話を交わした。会話は本質的な意図に起因し、英語やフランス語で行われ、5000以上の複雑なデータテーブルの1つを取得するエージェントに繋がる。このデータセットに基づいて,(1)現在進行中の会話に基づく関連表の自動検索,(2)各ターンにおける適切なエージェント応答の自動生成の2つのタスクを提案する。我々は,強いベースラインを確立することで各タスクの難しさを調査する。時間的データ分割の実験では、検証からテストセットに移行するとき、両方のタスク間でパフォーマンスが大幅に低下するのを観察するため、すべてのモデルが将来の会話に一般化するのに苦労していることが明らかになりました。さらに、応答生成モデルは、いつテーブルを返すかを決定するのに苦労している。タスクが既存のモデルに重大な課題をもたらすことを考慮し、私たちはコミュニティにタスクのためのモデル開発を奨励します。 We introduce the StatCan Dialogue Dataset consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of relevant tables based on a on-going conversation, and (2) automatic generation of appropriate agent responses at each turn. We investigate the difficulty of each task by establishing strong baselines. Our experiments on a temporal data split reveal that all models struggle to generalize to future conversations, as we observe a significant drop in performance across both tasks when we move from the validation to the test set. In addition, we find that response generation models struggle to decide when to return a table. Considering that the tasks pose significant challenges to existing models, we encourage the community to develop models for our task, which can be directly used to help knowledge workers find relevant tables for live chat users.	翻訳日:2023-04-05 16:09:43 公開日:2023-04-03
# キャビティ媒介型集団運動量交換相互作用 Cavity-Mediated Collective Momentum-Exchange Interactions ( http://arxiv.org/abs/2304.01411v1 ) ライセンス: Link先を確認	Chengyi Luo, Haoqing Zhang, Vanessa P. W. Koh, John D. Wilson, Anjun Chu, Murray J. Holland, Ana Maria Rey, and James K. Thompson	(参考訳) 量子シミュレーションと量子センシングは、複雑な相互作用系の理解から未発見の物理の探索まで、自然に対する新たな洞察をもたらすと大いに期待されている。光子を媒介とする無限レンジ相互作用によって相互作用するレーザー冷却原子の大規模なアンサンブルは、その両方にとって強力なプラットフォームである。ここでは、原子が共通のキャビティモードからの光子の集団的な放出と吸収を通じて運動量状態を交換する運動量交換相互作用を初めて実現する。運動量交換相互作用は、物質波干渉計において全対全のイジング型相互作用として観測され、これはエンタングルメント生成に有用である。多体エネルギーギャップも出現し、メスバウアー分光と同様に、干渉計の物質波パケットを効果的に結合してドップラー位相緩和を抑制する。調整可能な運動量交換相互作用は、量子相互作用によって強化された物質波干渉法や、超伝導体や動的ゲージ場のシミュレーションを含むエキゾチックな挙動の実現のための新しい能力を提供する。 Quantum simulation and sensing hold great promise for providing new insights into nature, from understanding complex interacting systems to searching for undiscovered physics. Large ensembles of laser-cooled atoms interacting via infinite-range photon mediated interactions are a powerful platform for both endeavours. Here, we realize for the first time momentum-exchange interactions in which atoms exchange their momentum states via collective emission and absorption of photons from a common cavity mode. The momentum-exchange interaction leads to an observed all-to-all Ising-like interaction in a matter-wave interferometer, which is useful for entanglement generation. A many-body energy gap also emerges, effectively binding interferometer matter-wave packets together to suppress Doppler dephasing, akin to Mössbauer spectroscopy. The tunable momentum-exchange interaction provides a new capability for quantum interaction-enhanced matter-wave interferometry and for realizing exotic behaviors including simulations of superconductors and dynamical gauge fields.	翻訳日:2023-04-05 16:09:23 公開日:2023-04-03
# 実現可能性保証付き2段直流最適潮流の効率的な学習型解法 An Efficient Learning-Based Solver for Two-Stage DC Optimal Power Flow with Feasibility Guarantees ( http://arxiv.org/abs/2304.01409v1 ) ライセンス: Link先を確認	Ling Zhang, Daniel Tabas and Baosen Zhang	(参考訳) 本稿では,負荷が不確実性に直面している場合の最適かつ信頼性の高いディスパッチのためのシナリオベース2段階直流最適電力流(OPF)問題を考察する。この問題は線形プログラムであるが、不確実性を正確に表わすのに必要な多数のシナリオのため、計算的に解決が難しいままである。計算問題を軽減するため、第2段階の決定をより効率的に処理できるように、多くの手法が提案されている。第二段階の決定を近似する適切なポリシーを見つける上での課題は、これらのソリューションが実現可能である必要があることである。そこで本稿では,この2段階問題をより効率的かつ最適な方法で解くための学習法を提案する。ゲージマップと呼ばれる手法が学習アーキテクチャ設計に組み込まれ、学習したソリューションがネットワーク制約に対して実現可能であることを保証する。すなわち、実現可能なソリューションのみを出力するフォワード関数をフィードするポリシーを設計できる。標準IEEEシステムにおけるシミュレーション結果から, 反復解法や広く用いられているアフィンポリシと比較して, 提案手法は良質な解を学習するだけでなく, 桁違いの計算を高速化することを示した。 In this paper, we consider the scenario-based two-stage stochastic DC optimal power flow (OPF) problem for optimal and reliable dispatch when the load is facing uncertainty. Although this problem is a linear program, it remains computationally challenging to solve due to the large number of scenarios needed to accurately represent the uncertainties. To mitigate the computational issues, many techniques have been proposed to approximate the second-stage decisions so they can dealt more efficiently. The challenge of finding good policies to approximate the second-stage decisions is that these solutions need to be feasible, which has been difficult to achieve with existing policies. To address these challenges, this paper proposes a learning method to solve the two-stage problem in a more efficient and optimal way. A technique called the gauge map is incorporated into the learning architecture design to guarantee the learned solutions' feasibility to the network constraints. Namely, we can design policies that are feed forward functions that only output feasible solutions. 
Simulation results on standard IEEE systems show that, compared to iterative solvers and the widely used affine policy, our proposed method not only learns solutions of good quality but also accelerates the computation by orders of magnitude.	翻訳日:2023-04-05 16:09:07 公開日:2023-04-03
# 目標情報の拡張による学習:フィードバックアライメントの代替理論 Learning with augmented target information: An alternative theory of Feedback Alignment ( http://arxiv.org/abs/2304.01406v1 ) ライセンス: Link先を確認	Huzi Cheng, Joshua W. Brown	(参考訳) エラーバックプロパゲーション(bp)は、ほぼ全ての現代のニューラルネットワークのトレーニングを長い間支配してきたが、対称ウェイト要件や同期更新など、いくつかの生物学的な可能性の問題に悩まされている。フィードバックアライメント(FA)はBPの代替として提案され、様々なタスクやネットワークアーキテクチャに有効であることが示されている。その単純さと有効性にもかかわらず、さまざまなアーキテクチャでFAがどのように機能するかという満足のいく説明はまだ欠けている。本稿では、FAが情報理論のレンズを通してどのように機能するかという新しいアーキテクチャに依存しない理論を提案する: BPが計算した勾配を同じパラメータで近似する代わりに、FAはトレーニング対象情報をニューラルネットワークに埋め込むことで効果的な表現を学習する。理想的な設定におけるFAダイナミクスの分析と、一連の実験を通してこれを示す。この理論の意義に基づき、我々は3種類のFAを設計し、複数のタスクで同等の性能を示す。これらの変種は、予測符号化や表現の漂流のような神経科学のいくつかの現象や理論も説明できる。 While error backpropagation (BP) has dominated the training of nearly all modern neural networks for a long time, it suffers from several biological plausibility issues such as the symmetric weight requirement and synchronous updates. Feedback Alignment (FA) was proposed as an alternative to BP to address those dilemmas and has been demonstrated to be effective on various tasks and network architectures. Despite its simplicity and effectiveness, a satisfying explanation of how FA works across different architectures is still lacking. Here we propose a novel, architecture-agnostic theory of how FA works through the lens of information theory: Instead of approximating gradients calculated by BP with the same parameter, FA learns effective representations by embedding target information into neural networks to be trained. We show this through the analysis of FA dynamics in idealized settings and then via a series of experiments. Based on the implications of this theory, we designed three variants of FA and show their comparable performance on several tasks. These variants also account for some phenomena and theories in neuroscience such as predictive coding and representational drift.	翻訳日:2023-04-05 16:08:47 公開日:2023-04-03
# ワークミーティングにおけるアバター--フォトリアリズムとアピールの関係 Avatars in Work Meetings: Correlation Between Photorealism and Appeal ( http://arxiv.org/abs/2304.01405v1 ) ライセンス: Link先を確認	Vrushank Phadnis, Kristin Moore and Mar Gonzalez Franco	(参考訳) 職場会議におけるアバターの受容性に及ぼすリアリズムの影響を検討した。 2509人の知識労働者を対象に、アニメーションGIFを用いて5レベルのフォトリアリズムを検証した。アバターのスタイルは、マネージャ、既知の同僚、未知の同僚によって使用された。すべてのシナリオにおいて、より高いリアリズムが好まれることがわかったが、完全に現実的なアバターは時々参加者に不利であると認識された。調査結果をセグメンテーションして,調査回答の年齢層と組織パターンを調査した。最後に,オープンエンド反応を評価し,アバターの選択に影響を及ぼす要因の質的評価を行う。その結果,光リアリズムは作業アバターの選択における重要な属性であることがわかった。しかし、アバターを使った仕事仲間との地域選好や関係も役割を担っている可能性がある。アバター選択に影響を与える他の要因の探索は、職場でのアバター使用の影響をさらに理解するために必要である。 We investigated the effects of realism on acceptability of avatars for work meetings. Our survey of 2509 knowledge workers tested five levels of photorealism using animated GIFs. Avatar styles were rated for usage by: a manager, known colleague and unknown colleague. In all scenarios, we found that higher realism was favored; however fully realistic avatars were sometimes perceived as uncanny by participants. We segmented our results to uncover demographic and firmographic patterns in the survey responses. Lastly, we caveat our findings by evaluating open end responses to provide a qualitative evaluation of factors influencing avatar choices for work meetings. In conclusion, our findings suggest that photorealism is a key attribute in selecting work avatars. However, regional preferences and relationship with work colleagues using the avatar may also play a role. Exploration of other factors influencing work avatar selection is needed to further understand the implications of avatar use in the workplace.	翻訳日:2023-04-05 16:08:29 公開日:2023-04-03
# アクティブトランスファー学習に基づくレベルセット推定による材料表面の適応的欠陥領域同定 Adaptive Defective Area Identification in Material Surface Using Active Transfer Learning-based Level Set Estimation ( http://arxiv.org/abs/2304.01404v1 ) ライセンス: Link先を確認	Shota Hozumi, Kentaro Kutsukake, Kota Matsui, Syunya Kusakawa, Toru Ujihara, Ichiro Takeuchi	(参考訳) 材料キャラクタリゼーションでは、材料表面上の欠陥領域の同定が基本である。従来のアプローチでは、表面上の所定のメッシュグリッドポイントにおける関連物理特性をポイント単位で測定し、その特性が所望のレベルに達しない領域を決定する。より効率的に欠陥領域を同定するために,測定資源を優先的に使用して欠陥領域の境界を検出する適応マッピング手法を提案する。我々はこの問題をレベルセット推定(LSE)問題のアクティブラーニング(AL)として解釈する。 AL-based LSEの目標は、表面で定義される物理特性関数のレベルセットをできるだけ少数の測定値で決定することである。さらに, 同様の仕様の材料が繰り返し生産される状況に対処するため, 以前に作成された材料の情報を効果的に活用できるように, 転写学習手法を導入する。概念実証として,提案手法をシリコンウェハの赤帯推定問題に適用し,従来の手法よりもかなり低い測定コストで欠陥領域を同定できることを実証した。 In material characterization, identifying defective areas on a material surface is fundamental. The conventional approach involves measuring the relevant physical properties point-by-point at the predetermined mesh grid points on the surface and determining the area at which the property does not reach the desired level. To identify defective areas more efficiently, we propose adaptive mapping methods in which measurement resources are used preferentially to detect the boundaries of defective areas. We interpret this problem as an active-learning (AL) of the level set estimation (LSE) problem. The goal of AL-based LSE is to determine the level set of the physical property function defined on the surface with as small number of measurements as possible. Furthermore, to handle the situations in which materials with similar specifications are repeatedly produced, we introduce a transfer learning approach so that the information of previously produced materials can be effectively utilized. As a proof-of-concept, we applied the proposed methods to the red-zone estimation problem of silicon wafers and demonstrated that we could identify the defective areas with significantly lower measurement costs than those of conventional methods.	
翻訳日:2023-04-05 16:08:17 公開日:2023-04-03
# U-Netmer: 医用画像セグメンテーションのためのU-NetとTransformerの融合 U-Netmer: U-Net meets Transformer for medical image segmentation ( http://arxiv.org/abs/2304.01401v1 ) ライセンス: Link先を確認	Sheng He, Rina Bao, P. Ellen Grant, Yangming Ou	(参考訳) U-NetベースのディープラーニングモデルとTransformerの組み合わせは、医用画像セグメンテーションの新しいトレンドである。 U-Netは詳細な局所的意味情報やテクスチャ情報を抽出でき、Transformerは入力画像中の画素間の長距離依存関係を学習することができる。しかし、セグメンテーションのためにTransformerを直接適用すると、「token-flatten」問題(局所パッチを1Dトークンに平坦化し、局所パッチ内の画素間の相互作用を失う)と「scale-sensitivity」問題(入力画像を局所パッチに分割するために固定スケールを使用する)が生じる。そこで本研究では,U-NetとTransformerを直接結合するのではなく,この2つの問題を解決するために,U-NetとTransformerをグローバル・ローカル方式で組み合わせたU-Netmerを提案する。提案するU-Netmerは入力画像を局所パッチに分割する。局所パッチ間のグローバルコンテキスト情報はTransformerの自己アテンション機構によって学習され、U-Netは各局所パッチをトークンに平坦化する代わりにセグメンテーションすることで「token-flatten」問題を解決する。U-Netmerは、同一の構造と同じパラメータで、異なるパッチサイズで入力画像をセグメンテーションできる。したがって、U-Netmerは「scale-sensitivity」問題を解決するために、異なるパッチサイズで訓練することができる。 7つの臓器(脳, 心臓, 乳房, 肺, ポリープ, 膵臓, 前立腺)と4つの画像モダリティ(MRI, CT, 超音波, 内視鏡)にわたる7つの公開データセットで広範な実験を行い, 提案したU-Netmerが医用画像セグメンテーションの精度向上に広く適用可能であることを示す。これらの実験結果から,U-Netmerはベースラインや他のモデルと比較して最先端の性能を提供することがわかった。また、異なるスケールのU-Netmerの出力間の不一致はセグメンテーション精度と線形に相関しており、正解ラベルなしでテスト画像を難易度順にランク付けするための信頼度スコアとみなすことができる。 The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has "token-flatten" problem (flattens the local patches into 1D tokens which losses the interaction among pixels within local patches) and "scale-sensitivity" problem (uses a fixed scale to split the input image into local patches). Compared to directly combining U-Net and Transformer, we propose a new global-local fashion combination of U-Net and Transformer, named U-Netmer, to solve the two problems. The proposed U-Netmer splits an input image into local patches. 
The global-context information among local patches is learnt by the self-attention mechanism in Transformer and U-Net segments each local patch instead of flattening into tokens to solve the "token-flatten" problem. The U-Netmer can segment the input image with different patch sizes with the identical structure and the same parameter. Thus, the U-Netmer can be trained with different patch sizes to solve the "scale-sensitivity" problem. We conduct extensive experiments in 7 public datasets on 7 organs (brain, heart, breast, lung, polyp, pancreas and prostate) and 4 imaging modalities (MRI, CT, ultrasound, and endoscopy) to show that the proposed U-Netmer can be generally applied to improve accuracy of medical image segmentation. These experimental results show that U-Netmer provides state-of-the-art performance compared to baselines and other models. In addition, the discrepancy among the outputs of U-Netmer with different scales is linearly correlated to the segmentation accuracy which can be considered as a confidence score to rank test images by difficulty without ground-truth.	翻訳日:2023-04-05 16:08:01 公開日:2023-04-03
# 皮膚科医の信頼向上へのフィードバックに基づく皮膚病変分類のための説明可能なCNNの微調整 Fine-tuning of explainable CNNs for skin lesion classification based on dermatologists' feedback towards increasing trust ( http://arxiv.org/abs/2304.01399v1 ) ライセンス: Link先を確認	Md Abdul Kadir, Fabrizio Nunnari, Daniel Sonntag	(参考訳) 本稿では,分類そのものと分類の視覚的説明という2つの出力を同時にフィードバックできるCNNファインチューニング手法を提案する。皮膚病変分類タスクにおけるこのフィードバック戦略の効果を示し、CNNが2種類のユーザフィードバックにどう反応するかを測定する。このアプローチを実現するために,学習ループにおけるモデル決定を説明するため,Grad-CAM技術を統合した新しいCNNアーキテクチャを提案する。シミュレーションされたユーザフィードバックを用いて,分類と説明の両方を微調整することで,分類精度を保ちながら視覚的説明が向上し,CNNベースの皮膚病変分類器の信頼性が向上する可能性が示唆された。 In this paper, we propose a CNN fine-tuning method which enables users to give simultaneous feedback on two outputs: the classification itself and the visual explanation for the classification. We present the effect of this feedback strategy in a skin lesion classification task and measure how CNNs react to the two types of user feedback. To implement this approach, we propose a novel CNN architecture that integrates the Grad-CAM technique for explaining the model's decision in the training loop. Using simulated user feedback, we found that fine-tuning our model on both classification and explanation improves visual explanation while preserving classification accuracy, thus potentially increasing the trust of users in using CNN-based skin lesion classifiers.	翻訳日:2023-04-05 16:07:16 公開日:2023-04-03
# 特殊Q-ロスによる非線形MPCからの模倣学習とそのガウスニュートン近似 Imitation Learning from Nonlinear MPC via the Exact Q-Loss and its Gauss-Newton Approximation ( http://arxiv.org/abs/2304.01782v1 ) ライセンス: Link先を確認	Andrea Ghezzi, Jasper Hoffman, Jonathan Frey, Joschka Boedecker, Moritz Diehl	(参考訳) 本稿では, 模倣学習による非線形モデル予測制御方針学習のための新しい損失関数を提案する。模倣学習の標準的なアプローチは、専門家に関する情報を無視し、専門家と学習したコントロールの間の距離に基づいた損失関数を採用する。そこで本研究では,提案する最適制御問題(ocp)の性能目標と制約満足度を直接埋め込んだq関数に基づく損失を提案する。しかし、ニューラルネットワークをQ-lossでトレーニングするには、新しいサンプルごとに関連するOCPを解決する必要がある。計算負荷を軽減するため,OCPのガウス・ニュートン近似に基づいて第2のQ損失を導出し,学習時間を短縮する。我々は,制約のある非線形システムの制御において,模倣学習の標準的アプローチである行動クローンに対する損失を検証する。最終結果は、Q関数に基づく損失は、同等あるいはより良い閉ループコストを達成する一方で、制約違反の量を大幅に減少させることを示した。 This work presents a novel loss function for learning nonlinear Model Predictive Control policies via Imitation Learning. Standard approaches to Imitation Learning neglect information about the expert and generally adopt a loss function based on the distance between expert and learned controls. In this work, we present a loss based on the Q-function directly embedding the performance objectives and constraint satisfaction of the associated Optimal Control Problem (OCP). However, training a Neural Network with the Q-loss requires solving the associated OCP for each new sample. To alleviate the computational burden, we derive a second Q-loss based on the Gauss-Newton approximation of the OCP resulting in a faster training time. We validate our losses against Behavioral Cloning, the standard approach to Imitation Learning, on the control of a nonlinear system with constraints. The final results show that the Q-function-based losses significantly reduce the amount of constraint violations while achieving comparable or better closed-loop costs.	翻訳日:2023-04-05 13:49:13 公開日:2023-04-03
# 音声認識におけるウェークワードスポッティングのためのデュアルアテンションニューラルトランスデューサ Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition ( http://arxiv.org/abs/2304.01905v1 ) ライセンス: Link先を確認	Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann	(参考訳) 本稿では,ウェイクワード(WW)認識を強化し,音声認識タスクにおける推論時のレイテンシを改善するよう設計されたアーキテクチャであるdual-attention neural biasingを提案する。このアーキテクチャは、WWスポッティングを利用して、入力オーディオフレームに対してアテンションネットワークのどのブランチを実行するかを選択することで、実行時の計算パスの動的な切り替えを可能にする。提案手法では,浮動小数点演算(FLOPs)で定義される実行時の計算コストを削減しつつ,WWスポッティング精度を効果的に向上する。社内の匿名化データセットを用いて,提案するデュアルアテンションネットワークが,パラメータ数をわずか1%増やすだけで,WWオーディオフレームの計算コストを90%削減できることを実証する。このアーキテクチャは、ベースラインと比較してWWのF1スコアを相対的に16%改善し、一般的なレアワードの誤り率を相対的に3%改善する。 We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks. This architecture enables a dynamic switch for its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW spotting accuracy while saving runtime compute cost as defined by floating point operations (FLOPs). Using an in-house de-identified dataset, we demonstrate that the proposed dual-attention network can reduce the compute cost by 90% for WW audio frames, with only 1% increase in the number of parameters. This architecture improves WW F1 score by 16% relative and improves generic rare word error rate by 3% relative compared to the baselines.	翻訳日:2023-04-05 13:12:03 公開日:2023-04-03
# TransPimLib: メモリ内処理システムにおける効率的な超越関数ライブラリ TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems ( http://arxiv.org/abs/2304.01951v1 ) ライセンス: Link先を確認	Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu	(参考訳) プロセッシング・イン・メモリ(PIM)は、現代のコンピューティングシステムにおけるデータ移動のボトルネックを軽減することを約束する。しかし、現在の実世界のPIMシステムには、メモリの近くや内部に処理要素を構築するのが困難でコストがかかるため、ハードウェアが従来のプロセッサ(CPU、GPU)よりも制約が強いという固有の欠点がある。その結果、汎用PIMアーキテクチャは、かなり限られた命令セットしかサポートせず、超越関数やその他の計算困難な演算(例えば平方根)といった複雑な操作の実行に苦労する。これらの操作は、機械学習アプリケーションにおける活性化関数など、一部の現代的なワークロードにおいて特に重要である。汎用PIMシステムにおいて超越関数(およびその他の計算困難な関数)のサポートを提供するため,三角関数,双曲線関数,指数関数,対数,平方根などに対してCORDICベースおよびLUTベースの手法を提供するライブラリであるTransPimLibを提案する。 UPMEM PIMアーキテクチャ向けにTransPimLibの実装を開発し、マイクロベンチマークと3つのフルワークロード(Blackscholes, Sigmoid, Softmax)を用いて、TransPimLibの手法を性能と精度の面で徹底的に評価する。私たちは、すべてのコードとデータセットを https://github.com/CMU-SAFARI/transpimlib でオープンソースとして公開しています。 Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than in conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present TransPimLib, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. 
We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at https://github.com/CMU-SAFARI/transpimlib.	翻訳日:2023-04-05 13:02:54 公開日:2023-04-03
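As a rough illustration of what a CORDIC-based method for trigonometric functions looks like, here is a minimal Python sketch of CORDIC in rotation mode. It is not TransPimLib's code or API: the function name `cordic_sin_cos` and the `iterations` parameter are hypothetical, and floating-point arithmetic is used for readability, whereas a constrained PIM core would more plausibly use fixed-point values where each multiplication by 2^-i becomes a bit shift.

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Approximate (sin(theta), cos(theta)) for |theta| <= pi/2 with CORDIC rotations."""
    # Elementary rotation angles atan(2^-i); hardware would store these in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); pre-divide by the total gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta  # z holds the angle that still has to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0  # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x  # (sine, cosine)

if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, c, math.sin(0.5), math.cos(0.5))
```

The loop needs only additions, sign tests, and scalings by powers of two, which is why this family of methods suits hardware with a limited instruction set; a lookup-table (LUT) method would instead trade memory for fewer operations.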
# 適応測定フィルタ:量子マルコフ連鎖の最適推定のための効率的な戦略 Adaptive measurement filter: efficient strategy for optimal estimation of quantum Markov chains ( http://arxiv.org/abs/2204.08964v5 ) ライセンス: Link先を確認	Alfred Godley and Madalin Guta	(参考訳) 連続時間計測は、量子工学と量子制御における多くのタスクに役立ち、環境を通じて監視される開量子システムの動的パラメータの推定を含む。しかし、そのような測定は出力状態で利用できる情報の最大量を抽出しないので、代替の最適測定戦略を見つけることが大きな課題である。本稿では、離散時間入力出力量子マルコフ連鎖の設定においてこの問題を解決する。本稿では,「計測フィルタ」演算子を更新し,出力単位の連続的な測定基準を決定する反復的な手順からなる一次元動的パラメータの最適推定アルゴリズムを提案する。このスキームの重要な要素は、システムとの相互作用後に出力を後処理する方法としてコヒーレント量子吸収器を使用することである。これは、結合系と吸収体定常状態が基準パラメータ値で純粋であるように適応的に設計される。このスキームは、最適連続時間適応測定のエキサイティングな展望を提供するが、現実的な実用的な実装を見つけるにはより多くの作業が必要である。 Continuous-time measurements are instrumental for a multitude of tasks in quantum engineering and quantum control, including the estimation of dynamical parameters of open quantum systems monitored through the environment. However, such measurements do not extract the maximum amount of information available in the output state, so finding alternative optimal measurement strategies is a major open problem. In this paper we solve this problem in the setting of discrete-time input-output quantum Markov chains. We present an efficient algorithm for optimal estimation of one-dimensional dynamical parameters which consists of an iterative procedure for updating a `measurement filter' operator and determining successive measurement bases for the output units. A key ingredient of the scheme is the use of a coherent quantum absorber as a way to post-process the output after the interaction with the system. This is designed adaptively such that the joint system and absorber stationary state is pure at a reference parameter value. The scheme offers an exciting prospect for optimal continuous-time adaptive measurements, but more work is needed to find realistic practical implementations.	翻訳日:2023-04-05 10:48:36 公開日:2023-04-03
# 最大公約数とプライベート集合交叉に対するセキュアな多要素量子計算 Secure multiparty quantum computations for greatest common divisor and private set intersection ( http://arxiv.org/abs/2303.17196v3 ) ライセンス: Link先を確認	Muhammad Imran	(参考訳) 本稿では,Liu,Yang,LiによるPSU(quantum multiparty private set union)に基づいて,最大共通因子(GCD)を計算するためのセキュアなマルチパーティ量子計算(MPQC)を提案する。最初のステップとして、Liu と Li による最小共通倍数 (LCM) 計算のための MPQC プロトコルのセキュリティを改善し、標準 (確率) Shor の量子周期フィニングアルゴリズム (QPA) の代わりに、効率的な正確な量子周期フィニングアルゴリズム (EQPA) をサブルーチンとして構築する。標準QPAの代わりにEQPAを使用することは、繰り返しなしでプロトコルの正確性を保証する。 LCMプロトコルの改良により、計算用LCMに基づくプライベート・セット・ユニオンプロトコルも改善される。最後に、PSUプロトコルの同じ考え方を用いて、PSI問題をGCD計算問題に変換することにより、量子多元的プライベートセット交差点(PSI)を構築する。性能解析により,半正直モデルにおける正当性と無条件のセキュリティは,サブルーチンプロトコル(LCMおよびPSUプロトコル)の正当性とセキュリティから直接保証されることが示された。さらに,提案プロトコルの複雑さは,秘密入力の大きさとパーティ数における多項式であることを示す。 We present a secure multiparty quantum computation (MPQC) for computing greatest common divisor (GCD) based on quantum multiparty private set union (PSU) by Liu, Yang, and Li. As the first step, we improve the security of the MPQC protocol for computing least common multiple (LCM) by Liu and Li by constructing an efficient exact quantum period-finding algorithm (EQPA) as a subroutine instead of the standard (probabilistic) Shor's quantum period-finding algorithm (QPA). The use of EQPA instead of the standard QPA guarantees the correctness of the protocol without repetitions. The improvement of LCM protocol also improves the private set union protocol which is based on computing LCM. Finally, using the same idea of the PSU protocol, we construct a quantum multiparty private set intersection (PSI) by transforming the PSI problem into the problem of computing GCD. Performance analysis shows that the correctness and the unconditional security in the semihonest model are guaranteed directly from the correctness and the security of the subroutine protocols (LCM and PSU protocols). 
Moreover, we show that the complexity of the proposed protocols is polynomial in the size of the secret inputs and the number of parties.	翻訳日:2023-04-05 10:38:04 公開日:2023-04-03
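To make the link between set operations and GCD/LCM concrete, the following Python sketch shows one classical way such a reduction can work: each element of a public universe is mapped to a distinct prime, a set is encoded as the product of its primes, and then the GCD of the encodings decodes to the intersection while the LCM decodes to the union. The prime encoding and all function names here are assumptions made for illustration, and the sketch is neither quantum nor private; the protocols in the abstract are what keep each party's input secret.

```python
from functools import reduce
from math import gcd

def first_primes(n):
    """Return the first n primes by trial division (fine for small universes)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def encode(subset, prime_of):
    """Encode a set as the product of the primes assigned to its elements."""
    value = 1
    for element in subset:
        value *= prime_of[element]
    return value

def decode(value, prime_of):
    """Recover the set of elements whose prime divides the encoded value."""
    return {element for element, p in prime_of.items() if value % p == 0}

def lcm(a, b):
    return a // gcd(a, b) * b

def gcd_all(values):
    return reduce(gcd, values)

def lcm_all(values):
    return reduce(lcm, values)

if __name__ == "__main__":
    universe = ["a", "b", "c", "d", "e"]
    prime_of = dict(zip(universe, first_primes(len(universe))))
    parties = [{"a", "b", "c"}, {"b", "c", "d"}, {"b", "c", "e"}]
    codes = [encode(s, prime_of) for s in parties]
    print(sorted(decode(gcd_all(codes), prime_of)))  # ['b', 'c']
    print(sorted(decode(lcm_all(codes), prime_of)))  # ['a', 'b', 'c', 'd', 'e']
```

The encodings grow with the number of elements, which is consistent with measuring complexity in the size of the secret inputs and the number of parties.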
# MSC: StarCraft IIのマクロ管理のためのデータセット MSC: A Dataset for Macro-Management in StarCraft II ( http://arxiv.org/abs/1710.03131v3 ) ライセンス: Link先を確認	Huikai Wu, Yanqi Zong, Junge Zhang, Kaiqi Huang	(参考訳) マクロ管理はstarcraftの重要な問題であり、長い間研究されてきた。さまざまなデータセットとさまざまなメソッドがここ数年で提案されている。しかしこれらのデータセットには、学術研究や産業研究の促進にいくつかの欠陥がある。 1) 標準的な事前処理、解析、機能抽出の手順も、いくつかのデータセットで事前に定義されたトレーニング、検証、テストセットもない。 2)いくつかのデータセットはマクロ管理の特定のタスクに対してのみ指定される。 3) 一部のデータセットは小さすぎるか、ディープニューラルネットワークのような現代の機械学習アルゴリズムに十分なラベル付きデータを持っていない。したがって、以前のほとんどのメソッドはさまざまな機能でトレーニングされ、同じまたは異なるデータセットの異なるテストセットで評価されるため、直接比較することが困難になる。 StarCraftにおけるマクロ管理の研究を促進するため、SC2LEプラットフォームに基づく新しいデータセットMSCをリリースする。 mscはよく設計された特徴ベクトル、事前定義されたハイレベルアクション、各マッチの最終結果で構成される。また,MSCをトレーニング,検証,テストセットに分割し,評価と比較の便宜を図る。データセットの他に,グローバル状態評価とビルド順序予測のためのベースラインモデルと最初のベースライン結果を提案し,マクロ管理における2つの重要なタスクである。また、StarCraft IIにおけるマクロ管理の研究のために、さまざまな下流タスクやデータセットの分析も記述されている。ホームページ:https://github.com/wuhuikai/MSC Macro-management is an important problem in StarCraft, which has been studied for a long time. Various datasets together with assorted methods have been proposed in the last few years. But these datasets have some defects for boosting the academic and industrial research: 1) There're neither standard preprocessing, parsing and feature extraction procedures nor predefined training, validation and test set in some datasets. 2) Some datasets are only specified for certain tasks in macro-management. 3) Some datasets are either too small or don't have enough labeled data for modern machine learning algorithms such as deep neural networks. So most previous methods are trained with various features, evaluated on different test sets from the same or different datasets, making it difficult to be compared directly. To boost the research of macro-management in StarCraft, we release a new dataset MSC based on the platform SC2LE. MSC consists of well-designed feature vectors, pre-defined high-level actions and final result of each match. 
We also split MSC into training, validation and test sets for the convenience of evaluation and comparison. Besides the dataset, we propose a baseline model and present initial baseline results for global state evaluation and build order prediction, which are two of the key tasks in macro-management. Various downstream tasks and analyses of the dataset are also described for the sake of research on macro-management in StarCraft II. Homepage: https://github.com/wuhuikai/MSC.	翻訳日:2023-04-05 02:41:44 公開日:2023-04-03
# 軌跡推論の数学的理論に向けて Towards a mathematical theory of trajectory inference ( http://arxiv.org/abs/2102.09204v2 ) ライセンス: Link先を確認	Hugo Lavenant, Stephen Zhang, Young-Heon Kim, Geoffrey Schiebinger	(参考訳) 確率過程の軌跡を時間的辺縁のサンプルから推測するための理論的枠組みと数値的手法を考案する。この問題は、細胞状態の高次元計測を提供するが、経時的に細胞の軌道を追跡できない単細胞rna配列データの解析において生じる。確率過程のクラスにおいて,各時点における時間的辺縁の限られたサンプルから基底真理軌道を復元することが可能であることが証明され,実際に行うための効率的なアルゴリズムが提供される。開発したGlobal Waddington-OT (gWOT) は, エントロピー規則化された最適輸送を含む全時間点において, 円滑な凸最適化問題である。そこで本研究では,本課題を効率的に解決できることを示すとともに,いくつかの合成データと実データを用いて,良好な再構成を実現する。 We devise a theoretical framework and a numerical method to infer trajectories of a stochastic process from samples of its temporal marginals. This problem arises in the analysis of single cell RNA-sequencing data, which provide high dimensional measurements of cell states but cannot track the trajectories of the cells over time. We prove that for a class of stochastic processes it is possible to recover the ground truth trajectories from limited samples of the temporal marginals at each time-point, and provide an efficient algorithm to do so in practice. The method we develop, Global Waddington-OT (gWOT), boils down to a smooth convex optimization problem posed globally over all time-points involving entropy-regularized optimal transport. We demonstrate that this problem can be solved efficiently in practice and yields good reconstructions, as we show on several synthetic and real datasets.	翻訳日:2023-04-05 02:38:25 公開日:2023-04-03
# 中心対称性を持つ形状不変ポテンシャルの統一スキーム A Unified Scheme of Shape Invariant Potentials with Central Symmetry ( http://arxiv.org/abs/2001.02068v3 ) ライセンス: Link先を確認	Taha Koohrokhi and Abdolmajid Izadpanah and Mitra Gerayloo	(参考訳) 古典的あるいは量子力学的にせよ、ほとんどの物理系は球対称である。保存量として、粒子が中心力の場を移動するとき、遠心ポテンシャルの数値に角運動量が現れる。本研究は、解ける中心ポテンシャルを1つの超ポテンシャルに統一する統一因子の役割を角運動量が果たすフォーマリズムを導入する。特定の$\ell$と$r$の依存関係に基づいて、超ポテンシャルは3次元高調波発振器(3-DHO)、クーロン(Culomb)、逆向きの3DHO電位などの形状不変ポテンシャルの集合を生成する。任意の$D$次元への P\"{o}schl-Teller ポテンシャルの一般化も導出され、これを "central P\"{o}schl-Teller" と呼び、その階層について論じる。さらに、超対称性が破られたり破られたりした条件を決定するための超ポテンシャルの性質についても論じる。驚くべきことに、統一されたスキームは、2つの荷電粒子(クーロン)と2つの核子(核)結合系を同じ枠組みで解明することができる。最終的に、この形式主義は重陽子に対する新しい効果的なポテンシャルを特定するために適用される。 Most physical systems, whether classical or quantum mechanical, are subjected to spherical symmetry. As a conserved quantity, angular momentum appears in the numerator of centrifugal potential when a particle moves in the field of a central force. The present work introduces a formalism in which angular momentum plays a unifying factor role that unifies solvable central potentials into one superpotential. Based on particular $\ell$ and $r$ dependencies, the superpotential generates a set of shape invariant potentials, such as the 3-dimensional harmonic oscillator (3-DHO), Coulomb, and upside-down 3-DHO potentials. A generalization of the P\"{o}schl-Teller potential to an arbitrary $D$ dimension is also derived, which we called "central P\"{o}schl-Teller", and its hierarchy is discussed. Furthermore, we discuss properties of the superpotential to determine conditions supersymmetry is broken or unbroken. Surprisingly, the unified scheme is also able to elucidate the two charged particles (Coulomb) as well as the two-nucleon (nuclear) bound systems in the same framework. Ultimately, this formalism is applied to specify a new effective potential for deuteron.	翻訳日:2023-04-05 02:37:06 公開日:2023-04-03
# バッチ非同期確率近似の収束と強化学習への応用 Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning ( http://arxiv.org/abs/2109.03445v4 ) ライセンス: Link先を確認	Rajeeva L. Karandikar and M. Vidyasagar	(参考訳) 確率近似(英: stochastic approximation、SA)アルゴリズムは、関数のノイズ測定のみが利用可能であるとき、ベクトル値の試行の零点または定点を見つけるために広く用いられる確率的手法である。これまでの文献では、‘synchronous’ の更新と、現在の推測のすべてのコンポーネントが毎回更新される `synchronous'' の更新とを区別し、1つのコンポーネントだけが更新される。本稿では,現在推定されている解のコンポーネントを瞬時に更新する,‘batch asynchronous stochastic approximation’(basa)と呼ばれる中間状態について検討する。 BASAにより、ユーザーはメモリ要件を時間的複雑さと引き換えることができる。このようなアルゴリズムが研究中の写像の不動点に収束することを示す一般的な方法を開発した。これらの収束証明は、既存の結果よりも弱い仮説を用いる。具体的には、既存の収束証明は、測定ノイズがゼロ平均i.i.d\列またはマルティンゲール差分列である必要がある。本稿では,非ゼロ条件平均の計測ノイズについて,偏りの測定を許可する。また、すべての収束結果は、確率的ステップサイズがよく知られたロビンズ・モンロ条件の確率的類似性を満たすと仮定している。この仮定を,マルコフ過程の既約性に関する純粋決定論的条件に置き換える。 Reinforcement Learning への具体的な応用として、時間差分アルゴリズム $TD(\lambda)$ for value iteration と、最適なアクション値関数を見つけるための $Q$-learning アルゴリズムを解析する。どちらの場合も、既存の文献よりも穏やかな条件下でこれらのアルゴリズムの収束を確立する。 The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a zero or a fixed point of a vector-valued function, when only noisy measurements of the function are available. In the literature to date, one makes a distinction between ``synchronous'' updating, whereby every component of the current guess is updated at each time, and ``asynchronous'' updating, whereby only one component is updated. In this paper, we study an intermediate situation that we call ``batch asynchronous stochastic approximation'' (BASA), in which, at each time instant, \textit{some but not all} components of the current estimated solution are updated. BASA allows the user to trade off memory requirements against time complexity. We develop a general methodology for proving that such algorithms converge to the fixed point of the map under study. These convergence proofs make use of weaker hypotheses than existing results. 
Specifically, existing convergence proofs require that the measurement noise is a zero-mean i.i.d. sequence or a martingale difference sequence. In the present paper, we permit biased measurements, that is, measurement noises that have nonzero conditional mean. Also, all convergence results to date assume that the stochastic step sizes satisfy a probabilistic analog of the well-known Robbins-Monro conditions. We replace this assumption with a purely deterministic condition on the irreducibility of the underlying Markov processes. As specific applications to Reinforcement Learning, we analyze the temporal difference algorithm $TD(\lambda)$ for value iteration, and the $Q$-learning algorithm for finding the optimal action-value function. In both cases, we establish the convergence of these algorithms, under milder conditions than in the existing literature.	翻訳日:2023-04-05 02:02:32 公開日:2023-04-03
# 心臓血管疾患に対するAIを用いた大動脈血管木切開術 AI-based Aortic Vessel Tree Segmentation for Cardiovascular Diseases Treatment: Status Quo ( http://arxiv.org/abs/2108.02998v2 ) ライセンス: Link先を確認	Yuan Jin, Antonio Pepe, Jianning Li, Christina Gsaxner, Fen-hua Zhao, Kelsey L. Pomykala, Jens Kleesiek, Alejandro F. Frangi, Jan Egger	(参考訳) 大動脈管木は大動脈とその分岐動脈から構成され、全身に血液を供給する上で重要な役割を果たす。動脈瘤や解離などの大動脈疾患は大動脈破裂を引き起こすことがあるが、開腹手術による治療は非常に危険である。したがって、患者は、画像による血管の定期的な検査を必要とする定常的な監視の下で、一般的に薬物治療を受ける。診断とモニタリングのための標準的な画像モダリティはCT(CT)であり、CT血管造影(CT angiography)と呼ばれる造影剤で完成すれば、大動脈とその分岐血管の詳細な画像を提供することができる。最適に、連続ctasからの大動脈血管ツリー全体形状をオーバーレイ比較する。これにより、大動脈の変化を検出するだけでなく、一次病理学や新規に開発された枝も検出できる。この再建には、手作業で行う場合、スライスをスライスする作業が必要であり、1本の大動脈管木で一日を要し、臨床での使用は不可能である。しかし、自動または半自動の容器木分割アルゴリズムは、このタスクを手動実行時間のごく一部で完了し、臨床医の臨床ルーチンと並行して実行することができる。本稿では,大動脈管ツリーの自動的および半自動的なセグメンテーションのための計算手法を体系的に検討する。このレビューは、これらの最先端のアプローチが臨床実践への応用にどの程度近いか、そしてこの研究分野がどれほど活発であるかについて、出版物、データセット、課題の数を考慮して詳細に議論することで締めくくっている。 The aortic vessel tree is composed of the aorta and its branching arteries, and plays a key role in supplying the whole body with blood. Aortic diseases, like aneurysms or dissections, can lead to an aortic rupture, whose treatment with open surgery is highly risky. Therefore, patients commonly undergo drug treatment under constant monitoring, which requires regular inspections of the vessels through imaging. The standard imaging modality for diagnosis and monitoring is computed tomography (CT), which can provide a detailed picture of the aorta and its branching vessels if completed with a contrast agent, called CT angiography (CTA). Optimally, the whole aortic vessel tree geometry from consecutive CTAs is overlaid and compared. This allows not only detection of changes in the aorta, but also of its branches, caused by the primary pathology or newly developed. 
When performed manually, this reconstruction requires slice by slice contouring, which could easily take a whole day for a single aortic vessel tree, and is therefore not feasible in clinical practice. Automatic or semi-automatic vessel tree segmentation algorithms, however, can complete this task in a fraction of the manual execution time and run in parallel to the clinical routine of the clinicians. In this paper, we systematically review computing techniques for the automatic and semi-automatic segmentation of the aortic vessel tree. The review concludes with an in-depth discussion on how close these state-of-the-art approaches are to an application in clinical practice and how active this research field is, taking into account the number of publications, datasets and challenges.	翻訳日:2023-04-05 02:02:03 公開日:2023-04-03
# fl-market: 連合学習におけるプライベートモデル取引 FL-Market: Trading Private Models in Federated Learning ( http://arxiv.org/abs/2106.04384v4 ) ライセンス: Link先を確認	Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa, Huizhong Li, Qiang Yan	(参考訳) 十分な量のトレーニングデータを取得することの難しさは、機械学習(ML)ベースのデータ分析の大きなボトルネックである。近年、ML指向データ取得の経済的かつ適度なソリューションとして、MLモデルのコモディティ化が提案されている。しかし、既存のモデルマーケットプレイスでは、ブローカーがデータ所有者のプライベートトレーニングデータにアクセスできると仮定している。本稿では,MLタスクに対する信頼性の高いデータ取得を促進するために,モデル購入者だけでなく,信頼できないブローカーに対してプライバシを保護するローカルプライベートモデルマーケットプレースであるFL-Marketを提案する。 fl-marketは、データオーナがローカル勾配をアップロードしてmlモデルを協調的にトレーニングする、新たなプライバシ保存型mlパラダイムであるfederated learningを使用して、ブローカ側でトレーニングデータを集中的に収集する必要性からmlを分離する(モデル更新のためにグローバル勾配に集約される)。そして、fl-marketは、データ所有者がローカルなディファレンシャルプライバシによって勾配を局所的に摂動させることを可能にし、プライバシーリスクをさらに防ぐ。 FL-Marketを駆動するために,局所勾配の摂動レベルをインテリジェントに決定する深層学習型オークション機構と,摂動勾配を集約する最適集約機構を提案する。当社のオークションとアグリゲーション機構は,モデル購入者の実用性を最適化するグローバルグラデーションの精度を共同で最大化することができる。提案手法の有効性を検証する実験を行った。 The difficulty in acquiring a sufficient amount of training data is a major bottleneck for machine learning (ML) based data analytics. Recently, commoditizing ML models has been proposed as an economical and moderate solution to ML-oriented data acquisition. However, existing model marketplaces assume that the broker can access data owners' private training data, which may not be realistic in practice. In this paper, to promote trustworthy data acquisition for ML tasks, we propose FL-Market, a locally private model marketplace that protects privacy not only against model buyers but also against the untrusted broker. FL-Market decouples ML from the need to centrally gather training data on the broker's side using federated learning, an emerging privacy-preserving ML paradigm in which data owners collaboratively train an ML model by uploading local gradients (to be aggregated into a global gradient for model updating). Then, FL-Market enables data owners to locally perturb their gradients by local differential privacy and thus further prevents privacy risks. 
To drive FL-Market, we propose a deep learning-empowered auction mechanism for intelligently deciding the local gradients' perturbation levels and an optimal aggregation mechanism for aggregating the perturbed gradients. Our auction and aggregation mechanisms can jointly maximize the global gradient's accuracy, which optimizes model buyers' utility. Our experiments verify the effectiveness of the proposed mechanisms.	翻訳日:2023-04-05 02:00:40 公開日:2023-04-03
# DNNにおけるスパース概念の創発的定義と定量化 Defining and Quantifying the Emergence of Sparse Concepts in DNNs ( http://arxiv.org/abs/2111.06206v6 ) ライセンス: Link先を確認	Jie Ren, Mingjie Li, Qirui Chen, Huiqi Deng, Quanshi Zhang	(参考訳) 本稿では,DNNの学習における概念創出現象を説明することを目的とする。具体的には、DNNの推論スコアを、いくつかのインタラクティブな概念の影響に結びつけることができる。これらの概念は、DNNの説明である疎いシンボリック因果グラフの因果パターンとして理解することができる。このような因果グラフを用いてdnnを説明する忠実性は理論的に保証される。なぜなら、因果グラフは指数関数的な数の異なるマスク標本上のdnnの出力をうまく模倣できるからである。さらに、そのような因果グラフは、多くの説明精度を失うことなく、さらに単純化され、And-Orグラフ(AOG)として書き直される。 This paper aims to illustrate the concept-emerging phenomenon in a trained DNN. Specifically, we find that the inference score of a DNN can be disentangled into the effects of a few interactive concepts. These concepts can be understood as causal patterns in a sparse, symbolic causal graph, which explains the DNN. The faithfulness of using such a causal graph to explain the DNN is theoretically guaranteed, because we prove that the causal graph can well mimic the DNN's outputs on an exponential number of different masked samples. Besides, such a causal graph can be further simplified and re-written as an And-Or graph (AOG), without losing much explanation accuracy.	翻訳日:2023-04-05 01:53:22 公開日:2023-04-03
# 公平性に配慮した連合学習に向けて Towards Fairness-Aware Federated Learning ( http://arxiv.org/abs/2111.01872v3 ) ライセンス: Link先を確認	Yuxin Shi, Han Yu, Cyril Leung	(参考訳) フェデレーション学習(fl)の最近の進歩は、パフォーマンスとデータのプライバシの保証を備えた大規模分散クライアントに、大規模な機械学習の機会をもたらした。しかし、現在のほとんどの作品は、flにおけるセントラルコントローラの関心に焦点をあて、flクライアントの利益を見落としている。これは、学習プロセスに積極的に参加することを妨げ、flエコシステムの持続性を損なうクライアントの不公平な扱いにつながる可能性がある。したがって、flにおける公平性を確保するという話題は、多くの研究の関心を集めている。近年、異なる視点からflの公平性を達成するために、多様な公正性認識fl(fafl)アプローチが提案されている。しかし、この学際分野に対する読者の洞察を得るための総合的な調査は行われていない。本稿ではそのような調査を行うことを目的とする。本研究は,本分野において既存文献で採用されている公正性の概念と基本的かつ単純化された仮定を考察し,クライアント選択,最適化,貢献評価,インセンティブ分布など,FLの主要なステップをカバーするFAFLアプローチの分類法を提案する。さらに,FAFLアプローチの性能を実験的に評価するための主要な指標について考察し,今後のFAFL研究の方向性を示唆する。 Recent advances in Federated Learning (FL) have brought large-scale collaborative machine learning opportunities for massively distributed clients with performance and data privacy guarantees. However, most current works focus on the interest of the central controller in FL, and overlook the interests of the FL clients. This may result in unfair treatment of clients that discourages them from actively participating in the learning process and damages the sustainability of the FL ecosystem. Therefore, the topic of ensuring fairness in FL is attracting a great deal of research interest. In recent years, diverse Fairness-Aware FL (FAFL) approaches have been proposed in an effort to achieve fairness in FL from different perspectives. However, there is no comprehensive survey that helps readers gain insight into this interdisciplinary field. This paper aims to provide such a survey. By examining the fundamental and simplifying assumptions, as well as the notions of fairness adopted by existing literature in this field, we propose a taxonomy of FAFL approaches covering major steps in FL, including client selection, optimization, contribution evaluation and incentive distribution. 
In addition, we discuss the main metrics for experimentally evaluating the performance of FAFL approaches, and suggest promising future research directions towards FAFL.	翻訳日:2023-04-05 01:52:24 公開日:2023-04-03
# 前向きSDE理論を用いたSchr\"odinger Bridgeの模擬訓練 Likelihood Training of Schr\"odinger Bridge using Forward-Backward SDEs Theory ( http://arxiv.org/abs/2110.11291v5 ) ライセンス: Link先を確認	Tianrong Chen, Guan-Horng Liu, Evangelos A. Theodorou	(参考訳) Schr\"odinger Bridge (SB) はエントロピー規則化された最適輸送問題であり、Scored-based Generative Model (SGM) と比較して、その数学的柔軟性のために深部生成モデルに注目が集まっている。しかし、SBの最適化原理が、ログライクな目的の構築にしばしば依存する深層生成モデルの近代的な訓練と関係しているかどうかは不明であり、このことは、生成的応用の原則的な代替としてSBモデルの適合性に関する疑問を提起する。本稿では,SBの最適条件を一組のSDEに変換する確率的最適制御に現れる数学的方法論として,前向き確率微分方程式理論に基づくSBモデルの確率的トレーニングのための新しい計算フレームワークを提案する。重要なことに、これらのSDEはSBの潜在的目的を構築するために使用することができ、驚くべきことに、SGMの目的を特別なケースとして一般化することができる。これにより、現代の生成訓練技術の応用を損なうことなく、sbの最適性を継承する新しい最適化原理が導かれるとともに、mnist、celeba、cifar10上の現実的な画像を生成するのに匹敵する結果が得られることを示した。私たちのコードはhttps://github.com/ghliu/SB-FBSDE で利用可能です。 Schr\"odinger Bridge (SB) is an entropy-regularized optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Score-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing log-likelihood objectives. This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded on Forward-Backward Stochastic Differential Equations Theory - a mathematical methodology that appears in stochastic optimal control and transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalize the ones for SGM as special cases. 
This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10. Our code is available at https://github.com/ghliu/SB-FBSDE.	翻訳日:2023-04-05 01:52:03 公開日:2023-04-03
# 大規模並列ベイズ最適化へのポートフォリオアプローチ A portfolio approach to massively parallel Bayesian optimization ( http://arxiv.org/abs/2110.09334v2 ) ライセンス: Link先を確認	Mickael Binois (ACUMES, JAD), Nicholson Collier (ANL), Jonathan Ozik (ANL)	(参考訳) 最適化研究の実施時間を短縮する一つの方法は、一度に1回ではなく、並列に設計を評価することである。高価な評価ブラックボックスでは、ベイズ最適化のバッチバージョンが提案されている。彼らはブラックボックスのサロゲートモデルを構築し、インフィル基準によって複数のデザインを同時に選択する。それでも、大規模並列性を実現するコンピューティングリソースの可用性は高まっているが、数桁の並列設計を選択して評価を行う戦略は、より多くの設計を選択する複雑さのために制限される。ブラックボックスがうるさい場合にはさらに重要であり、より多くの評価と繰り返しの実験が必要である。ここでは,大規模なバッチ処理をネイティブに処理し,探索/探索のトレードオフとポートフォリオ割り当てに着目したスケーラブルな戦略を提案する。単目的および多目的最適化タスクにおいて,ノイズ関数に関する関連手法との比較を行った。これらの実験は、類似またはより良い性能を持つ既存手法よりも桁違いの速度向上を示す。 One way to reduce the time of conducting optimization studies is to evaluate designs in parallel rather than just one-at-a-time. For expensive-to-evaluate black-boxes, batch versions of Bayesian optimization have been proposed. They work by building a surrogate model of the black-box to simultaneously select multiple designs via an infill criterion. Still, despite the increased availability of computing resources that enable large-scale parallelism, the strategies that work for selecting a few tens of parallel designs for evaluations become limiting due to the complexity of selecting more designs. It is even more crucial when the black-box is noisy, necessitating more evaluations as well as repeating experiments. Here we propose a scalable strategy that can keep up with massive batching natively, focused on the exploration/exploitation trade-off and a portfolio allocation. We compare the approach with related methods on noisy functions, for mono and multi-objective optimization tasks. These experiments show orders of magnitude speed improvements over existing methods with similar or better performance.	翻訳日:2023-04-05 01:51:42 公開日:2023-04-03
# OpenFed: 包括的でVersatileなオープンソースフェデレーションラーニングフレームワーク OpenFed: A Comprehensive and Versatile Open-Source Federated Learning Framework ( http://arxiv.org/abs/2109.07852v3 ) ライセンス: Link先を確認	Dengsheng Chen, Vince Tan, Zhilin Lu and Jie Hu	(参考訳) 近年の人工知能技術の発展により、商業的・工業的な場面で応用が成功している。しかし、これらの技術は大量のデータを集中的に集約し、データの機密性やデータ転送コストが禁じられるシナリオに適用性を高める必要がある。フェデレーション学習は、モデルトレーニングの分散化によってこれらの問題を緩和し、データ転送と集約の必要性をなくす。連合学習の採用を進めるためには、いくつかの重要なオープン問題に対処するために、さらなる研究と開発が必要である。本研究では,エンドツーエンドのフェデレート学習のためのオープンソースソフトウェアフレームワークであるOpenFedを提案する。 OpenFedは、既存の痛点を標的に除去することで、フェデレートラーニングの研究者と下流ユーザーの両方の参入障壁を減らす。研究者にとって、OpenFedは、広範なベンチマークスイートに対して、新しいメソッドを簡単に実装し、かなり評価できるフレームワークを提供する。 openfedは,ダウンストリームユーザに対して,さまざまなサブジェクトマッターコンテキスト内でフェデレーション学習をプラグインしてプレイ可能にすることで,フェデレーション学習における深い専門知識の必要性をなくす。 Recent developments in Artificial Intelligence techniques have enabled their successful application across a spectrum of commercial and industrial settings. However, these techniques require large volumes of data to be aggregated in a centralized manner, forestalling their applicability to scenarios wherein the data is sensitive or the cost of data transmission is prohibitive. Federated Learning alleviates these problems by decentralizing model training, thereby removing the need for data transfer and aggregation. To advance the adoption of Federated Learning, more research and development needs to be conducted to address some important open questions. In this work, we propose OpenFed, an open-source software framework for end-to-end Federated Learning. OpenFed reduces the barrier to entry for both researchers and downstream users of Federated Learning by the targeted removal of existing pain points. For researchers, OpenFed provides a framework wherein new methods can be easily implemented and fairly evaluated against an extensive suite of benchmarks. 
For downstream users, OpenFed allows Federated Learning to be used in a plug-and-play manner within different subject-matter contexts, removing the need for deep expertise in Federated Learning.	翻訳日:2023-04-05 01:51:07 公開日:2023-04-03
# 最適化における非同期イテレーション:新しいシーケンス結果とシャーパアルゴリズム保証 Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees ( http://arxiv.org/abs/2109.04522v2 ) ライセンス: Link先を確認	Hamid Reza Feyzmahdavian and Mikael Johansson	(参考訳) 本稿では並列および分散最適化アルゴリズムの解析に現れる非同期反復に対する新しい収束結果を提案する。結果は簡単に適用でき、非同期度が反復の収束率にどのように影響するかを明確に見積もることができる。その結果,既存の非同期最適化手法の収束証明の短縮,合理化,強化が可能となり,これまで完全に理論的理解を欠いていた一般的なアルゴリズムに対する収束保証を確立することができた。 We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates. Our results shorten, streamline and strengthen existing convergence proofs for several asynchronous optimization methods and allow us to establish convergence guarantees for popular algorithms that were thus far lacking a complete theoretical understanding. 
Specifically, we use our results to derive better iteration complexity bounds for proximal incremental aggregated gradient methods, to obtain tighter guarantees depending on the average rather than maximum delay for the asynchronous stochastic gradient descent method, to provide less conservative analyses of the speedup conditions for asynchronous block-coordinate implementations of Krasnoselskii-Mann iterations, and to quantify the convergence rates for totally asynchronous iterations under various assumptions on communication delays and update rates.	翻訳日:2023-04-05 01:50:29 公開日:2023-04-03
# 単語の「エゴネットワーク」における構造的不変性と意味的指紋 Structural invariants and semantic fingerprints in the "ego network" of words ( http://arxiv.org/abs/2203.00588v2 ) ライセンス: Link先を確認	Kilian Ollivier and Chiara Boldrini and Andrea Passarella and Marco Conti	(参考訳) 人類学的に確立された認知モデルは、社会的相互作用の「バンド幅」を制限する認知的制約のため、人間は通常の構造に従って社会的関係を組織することを示した。本研究では,言語生産など他の認知過程に類似した規則性が存在することを仮定する。この主張を調査するために、Twitterユーザ(正規ユーザとプロのライター)の不均一なグループのつぶやきを含むデータセットを分析した。確立された社会的認知の制約を明らかにするために用いられる方法論に類似した手法を利用することで、構造的および意味的両方のレベルで規則性を見出す。前者では、同心的な階層構造(言葉のエゴネットワーク、社会関係のエゴネットワークと類似)が、個人が使用する単語をどう整理するかをうまく捉えている。この構造内の層の大きさは、外向きに移動すると定期的に増加し(前回に比べて約2〜3倍)、2つの垂直な外部層は、ユーザの総層数に関係なく、使用語の約60%と30%を一貫して占める。意味分析のために、各egoネットワークの各リングは、そのリング内の単語に関連するトピックをキャプチャするセマンティックプロファイルによって記述される。環 #1 がモデルに特別な役割を果たすことが分かる。意味的に最も異なっており、環の中でも最も多様である。また、最内側のリングにおいて重要なトピックは、他のリングとエゴネットワーク全体において、それぞれに支配的な特徴を持つことも示している。この点において、環 #1 は単語の ego ネットワークの意味的指紋と見なすことができる。 Well-established cognitive models coming from anthropology have shown that, due to the cognitive constraints that limit our "bandwidth" for social interactions, humans organize their social relations according to a regular structure. In this work, we postulate that similar regularities can be found in other cognitive processes, such as those involving language production. In order to investigate this claim, we analyse a dataset containing tweets of a heterogeneous group of Twitter users (regular users and professional writers). Leveraging a methodology similar to the one used to uncover the well-established social cognitive constraints, we find regularities at both the structural and semantic level. At the former, we find that a concentric layered structure (which we call ego network of words, in analogy to the ego network of social relationships) very well captures how individuals organise the words they use. 
The size of the layers in this structure grows regularly (approximately 2-3 times with respect to the previous one) when moving outwards, and the two penultimate external layers consistently account for approximately 60% and 30% of the used words, irrespective of the user's total number of layers. For the semantic analysis, each ring of each ego network is described by a semantic profile, which captures the topics associated with the words in the ring. We find that ring #1 has a special role in the model. It is semantically the most dissimilar and the most diverse among the rings. We also show that the topics that are important in the innermost ring also have the characteristic of being predominant in each of the other rings, as well as in the entire ego network. In this respect, ring #1 can be seen as the semantic fingerprint of the ego network of words.	翻訳日:2023-04-05 01:44:12 公開日:2023-04-03
# ディープリニアネットワークの厳密解 Exact Solutions of a Deep Linear Network ( http://arxiv.org/abs/2202.04777v6 ) ライセンス: Link先を確認	Liu Ziyin, Botao Li, Xiangming Meng	(参考訳) この研究は、ニューラルネットワークの風景を理解するための基礎モデルである、重崩壊と確率ニューロンを持つディープ線形ネットワークの大域的ミニマの解析的表現を発見する。その結果、ゼロはディープニューラルネットワークアーキテクチャの特別なポイントであることがわかった。重みの減衰はモデルアーキテクチャと強く相互作用し、わずか1ドルの隠れ層しか持たないネットワークと質的に異なる1ドル以上の隠れ層を持つネットワークにおいて、ゼロで悪いミニマを生成できることを示します。その結果,一般的なディープラーニング初期化手法では,ニューラルネットワークの最適化が容易でないことがわかった。 This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that zero is a special point in deep neural network architecture. We show that weight decay strongly interacts with the model architecture and can create bad minima at zero in a network with more than $1$ hidden layer, qualitatively different from a network with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.	翻訳日:2023-04-05 01:43:46 公開日:2023-04-03
# コンパクト性スコア:教師なし特徴選択のための高速フィルタ法 Compactness Score: A Fast Filter Method for Unsupervised Feature Selection ( http://arxiv.org/abs/2201.13194v3 ) ライセンス: Link先を確認	Peican Zhu, Xin Hou, Keke Tang, Zhen Wang, Feiping Nie	(参考訳) 情報時代の繁栄とともに、大量のデータが日々生成される。これらのデータの大規模かつ高次元的な特性のため、実用的なアプリケーションにおいてより良い意思決定をすることがしばしば困難である。そのため,効率的なビッグデータ分析手法が必要である。特徴工学においては、特徴選択は、候補から優れた特徴を選択することが期待される重要な研究内容であると考えられる。次元の縮小、モデル効果の改善、モデル性能の向上など、機能選択によって異なる機能を実現することができる。多くの分類タスクにおいて、研究者は、同じクラスに属している場合、データが互いに近接しているように見えるので、局所的コンパクト性は特徴を評価する上で非常に重要であることを発見した。本稿では,CSUFS (Compactness Score) と呼ばれる高速な教師なし特徴選択手法を提案する。効率と精度を示すために、広範囲な実験を行い、いくつかのデータセットが選択される。その後,クラスタリングタスクに対処し,提案手法の有効性と優位性を明らかにする。ここで、パフォーマンスはいくつかのよく知られた評価指標で示され、効率は対応する実行時間によって反映される。シミュレーション結果から明らかになったように,提案アルゴリズムは既存のアルゴリズムよりも正確かつ効率的であると考えられる。 Along with the flourishing of the information age, massive amounts of data are generated day by day. Due to the large-scale and high-dimensional characteristics of these data, it is often difficult to achieve better decision-making in practical applications. Therefore, an efficient big data analytics method is urgently needed. In feature engineering, feature selection is an important research topic whose goal is to select "excellent" features from candidate ones. Different functions can be realized through feature selection, such as dimensionality reduction, model effect improvement, and model performance improvement. In many classification tasks, researchers found that data are usually close to each other if they are from the same class; thus, local compactness is of great importance for the evaluation of a feature. In this manuscript, we propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features. To demonstrate the efficiency and accuracy, several data sets are chosen with extensive experiments being performed. 
Later, the effectiveness and superiority of our method are revealed through addressing clustering tasks. Here, the performance is indicated by several well-known evaluation metrics, while the efficiency is reflected by the corresponding running time. As revealed by the simulation results, our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.	翻訳日:2023-04-05 01:43:23 公開日:2023-04-03
# 集約関数を用いた輸入則を満たすmiso階層型推論エンジン MISO hierarchical inference engine satisfying the law of importation with aggregation functions ( http://arxiv.org/abs/2112.12808v4 ) ライセンス: Link先を確認	Dechao Li and Qiannan Guo	(参考訳) ファジィ推論エンジンはファジィ系の最も重要な構成要素の一つであり、ファジィ論理推論法を用いて入力空間上のファジィ集合とファジィ規則基底から有意義な出力を得ることができる。本稿では,多入出力ファジィシステムにおけるファジィ推論エンジンの計算効率を高めるために,集約関数(LIA)による輸入法則を満たすファジィ含意に基づく3つのMISOファジィ階層推論エンジンを主に検討することを目的とする。まず、よく知られたファジィ含意の集合関数を、それらが満足する(LIA)ように見つけ出す。そして、所定の集約関数に対して、この集約関数に満足するファジィ含意(LIA)を特徴付ける。最後に,上記の理論的展開を応用したmisoファジィシステムにおいて,ファジィ階層推論エンジンを3つ構成する。 Fuzzy inference engine, as one of the most important components of fuzzy systems, can obtain some meaningful outputs from fuzzy sets on input space and fuzzy rule base using fuzzy logic inference methods. In order to enhance the computational efficiency of fuzzy inference engine in multi-input-single-output (MISO) fuzzy systems, this paper aims mainly to investigate three MISO fuzzy hierarchical inference engines based on fuzzy implications satisfying the law of importation with aggregation functions (LIA). We firstly find some aggregation functions for well-known fuzzy implications such that they satisfy (LIA). For a given aggregation function, the fuzzy implication which satisfies (LIA) with this aggregation function is then characterized. Finally, we construct three fuzzy hierarchical inference engines in MISO fuzzy systems applying aforementioned theoretical developments.	翻訳日:2023-04-05 01:42:40 公開日:2023-04-03
# banmo: カジュアルなビデオから3dニューラルモデルを作る BANMo: Building Animatable 3D Neural Models from Many Casual Videos ( http://arxiv.org/abs/2112.12761v3 ) ライセンス: Link先を確認	Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo	(参考訳) 関節型3d形状再構成の作業は、しばしば特殊なセンサー(例えば、同期マルチカメラシステム)や、事前構築された3d変形可能なモデル(例えば、smalやsmpl)に依存する。このようなメソッドは、野生のさまざまなオブジェクトセットにスケールできない。本稿では,特殊なセンサや事前定義されたテンプレート形状を必要としないBANMoを提案する。 BANMoは、多くのモノクロカジュアルビデオから高忠実な3Dモデル(形状とアニマタブルなスキンウェイトを含む)を、異なるレンダリングフレームワークで構築する。多くのビデオを使用することで、カメラのビューやオブジェクトの調音をより広範にカバーできる一方で、背景や照明条件の異なるシーン間での対応を確立する上での重要な課題がもたらされる。我々は,(1)関節骨とブレンドスキンを用いた古典的変形可能な形状モデル,(2)勾配に基づく最適化に寄与する体積神経放射場(NeRF),(3)ピクセルと関節モデルとの対応を生成する正準埋め込みの3つの学派を融合させることを考察した。ニューラルブレンドスキンモデルを導入し, 可微分変形と可逆変形を可能にした。標準埋め込みと組み合わせることで、サイクル整合性で自己教師できるビデオ間の密接な対応を確立することができる。リアルと合成のデータセットでは、BANMoは人間や動物の以前の作品よりも忠実な3D再構成を示しており、新しい視点やポーズからリアルな画像をレンダリングすることができる。プロジェクトWebページ: banmo-www.github.io Prior work for articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems), or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods are not able to scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articulated 3D models (including shape and animatable skinning weights) from many monocular casual videos in a differentiable rendering framework. While the use of many videos provides more coverage of camera views and object articulations, they introduce significant challenges in establishing correspondence across scenes with different backgrounds, illumination conditions, etc. 
Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model. We introduce neural blend skinning models that allow for differentiable and invertible articulated deformations. When combined with canonical embeddings, such models allow us to establish dense correspondences across videos that can be self-supervised with cycle consistency. On real and synthetic datasets, BANMo shows higher-fidelity 3D reconstructions than prior works for humans and animals, with the ability to render realistic images from novel viewpoints and poses. Project webpage: banmo-www.github.io	翻訳日:2023-04-05 01:42:24 公開日:2023-04-03
# 因果モデルを用いた機械学習同定バイオマーカーの一般化:免疫受容体診断法の検討 Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics ( http://arxiv.org/abs/2204.09291v2 ) ライセンス: Link先を確認	Milena Pavlovi\'c, Ghadi S. Al Hajj, Chakravarthi Kanduri, Johan Pensar, Mollie Wood, Ludvig M. Sollid, Victor Greiff, Geir Kjetil Sandve	(参考訳) 機械学習は、高次元の分子データから診断と予後のバイオマーカーを発見するためにますます使われている。しかしながら、実験設計に関連するさまざまな要因が、一般化可能な臨床応用診断の学習能力に影響を与える可能性がある。ここでは,因果的視点がこれらの課題の同定を改善し,機械学習に基づく診断の堅牢性と一般化との関係を定式化する。具体的には,最近確立された高次元バイオマーカーであるadaptive immune receptor repertoires (airrs) に注目した。シミュレーションにより、airrドメインの生物学的および実験的要因が学習バイオマーカーにどのように影響するかを示す。結論として, 因果モデリングは, 変数間の安定な関係を同定し, 個体群間で変化する関係と変数の調整を導くことにより, 機械学習に基づくバイオマーカーのロバスト性を向上させる。 Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here, we argue that a causal perspective improves the identification of these challenges and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To make for a concrete discussion, we focus on a specific, recently established high-dimensional biomarker - adaptive immune receptor repertoires (AIRRs). Through simulations, we illustrate how major biological and experimental factors of the AIRR domain may influence the learned biomarkers. In conclusion, we argue that causal modeling improves machine learning-based biomarker robustness by identifying stable relations between variables and by guiding the adjustment of the relations and variables that vary between populations.	翻訳日:2023-04-05 01:35:16 公開日:2023-04-03
# ベイジアンイメージングのための条件付きインジェクティブフロー Conditional Injective Flows for Bayesian Imaging ( http://arxiv.org/abs/2204.07664v3 ) ライセンス: Link先を確認	AmirEhsan Khorashadizadeh, Konik Kothari, Leonardo Salsi, Ali Aghababaei Harandi, Maarten de Hoop, Ivan Dokmani\'c	(参考訳) 計算画像のためのほとんどのディープラーニングモデルは、単一の再構成されたイメージを回帰する。しかし、実際には、不合理性、非線形性、モデルミスマッチ、ノイズはしばしばそのような推定を誤解させるか、あるいは不十分にする。ベイズアプローチは、画像と(ノイズ)計測を共同分散ランダムベクトルとしてモデル化し、未知の後方分布を近似することを目的としている。条件付き正規化フローに基づく最近の変分推論手法は従来のMCMC法に代わる有望な代替手段であるが, 過大なメモリと高解像度画像に対する計算要求, ハード非線形問題に対する性能低下といった欠点が生じる。本研究では,画像問題に特化して設計された条件付きインジェクティブフローであるC-Trumpetsを提案する。インジェクティビティは、固定体積変化層やスキップ接続revnet層といったアーキテクチャ革新とともに、低次元潜在空間におけるメモリフットプリントとトレーニング時間を削減し、C-Trumpetsは、コンピュータとメモリの予算を低く抑えながら、様々な画像および画像復元タスクにおいて、通常の条件フローモデルより優れている。 c-trumpetsは、mmseやmapのような点推定の高速近似と、物理的に測定可能な不確実性定量化を可能にする。 Most deep learning models for computational imaging regress a single reconstructed image. In practice, however, ill-posedness, nonlinearity, model mismatch, and noise often conspire to make such point estimates misleading or insufficient. The Bayesian approach models images and (noisy) measurements as jointly distributed random vectors and aims to approximate the posterior distribution of unknowns. Recent variational inference methods based on conditional normalizing flows are a promising alternative to traditional MCMC methods, but they come with drawbacks: excessive memory and compute demands for moderate to high resolution images and underwhelming performance on hard nonlinear problems. In this work, we propose C-Trumpets -- conditional injective flows specifically designed for imaging problems, which greatly diminish these challenges. 
Injectivity reduces memory footprint and training time. With a low-dimensional latent space and architectural innovations such as fixed-volume-change layers and skip-connection revnet layers, C-Trumpets outperform regular conditional flow models on a variety of imaging and image restoration tasks, including limited-view CT and nonlinear inverse scattering, with a lower compute and memory budget. C-Trumpets enable fast approximation of point estimates such as MMSE or MAP as well as physically meaningful uncertainty quantification.	翻訳日:2023-04-05 01:34:59 公開日:2023-04-03
# ゼロショット質問生成による経路検索の改善 Improving Passage Retrieval with Zero-Shot Question Generation ( http://arxiv.org/abs/2204.07496v4 ) ライセンス: Link先を確認	Devendra Singh Sachan and Mike Lewis and Mandar Joshi and Armen Aghajanyan and Wen-tau Yih and Joelle Pineau and Luke Zettlemoyer	(参考訳) オープンな質問応答における経路検索を改善するための,単純かつ効果的な手法を提案する。再ランカは、学習済み言語モデルを用いて、検索されたパスに条件付けられた入力質問の確率を算出するゼロショット質問生成モデルを用いて、検索されたパスを再スコアする。このアプローチは、任意の検索方法(例えば、ニューラルネットワークやキーワードベース)の上に適用でき、ドメイン固有のトレーニングやタスク固有のトレーニングを必要としない(従って、データ分散シフトをより一般化することが期待されている)。複数のオープンドメイン検索データセットで評価すると,上位20項目の検索精度では,6%-18%の絶対および強い教師付きモデルによって,強い教師なし検索モデルが最大12%向上する。さらに,既存のモデルに新たな再ランク付けを追加するだけで,完全なオープンドメイン質問応答に関する新たな最新結果を得ることができた。 We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or keyword-based), does not require any domain- or task-specific training (and therefore is expected to generalize better to data distribution shifts), and provides rich cross-attention between query and passage (i.e. it must explain every token in the question). When evaluated on a number of open-domain retrieval datasets, our re-ranker improves strong unsupervised retrieval models by 6%-18% absolute and strong supervised models by up to 12% in terms of top-20 passage retrieval accuracy. We also obtain new state-of-the-art results on full open-domain question answering by simply adding the new re-ranker to existing models with no further changes.	翻訳日:2023-04-05 01:34:33 公開日:2023-04-03
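The re-ranking idea in the entry above (score each retrieved passage by the probability of the question given that passage, then sort) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract uses a pre-trained language model as the scorer, whereas `question_log_likelihood` here is a hypothetical stand-in based on an add-one-smoothed unigram model so that the example runs with the standard library only. The function names, the `alpha` smoothing parameter, and the per-token averaging are all assumptions.

```python
import math
from collections import Counter


def question_log_likelihood(question, passage, alpha=1.0):
    """Toy stand-in for log p(question | passage).

    Assumption: a smoothed unigram model built from the passage replaces
    the pre-trained language model described in the abstract.
    """
    q_tokens = question.lower().split()
    p_tokens = passage.lower().split()
    counts = Counter(p_tokens)
    vocab = set(p_tokens) | set(q_tokens)
    denom = len(p_tokens) + alpha * len(vocab)
    total = sum(math.log((counts[t] + alpha) / denom) for t in q_tokens)
    # Average over question tokens so the score does not depend on length.
    return total / max(len(q_tokens), 1)


def rerank(question, passages):
    """Return passages sorted by descending question likelihood."""
    scored = [(question_log_likelihood(question, p), p) for p in passages]
    return [p for _, p in sorted(scored, key=lambda sp: sp[0], reverse=True)]


if __name__ == "__main__":
    retrieved = [
        "the eiffel tower is tall",
        "berlin is the capital of germany",
        "paris is the capital of france",
    ]
    for passage in rerank("what is the capital of france", retrieved):
        print(passage)
```

Because the scorer only sees the question and one passage at a time, the same wrapper can sit on top of any first-stage retriever, which is the property the abstract emphasizes.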
# 不織布の曇り評価 Assessing cloudiness in nonwovens ( http://arxiv.org/abs/2204.06275v2 ) ライセンス: Link先を確認	Michael Godehardt and Ali Moghiseh and Christine Oetjen and Joachim Ohser and Simon Ringger and Katja Schladitz and Ingo Windschiegel	(参考訳) フィルター媒体の均質性は, 特定の重量(固有グラム)と局所重量分布とともに, 材料選択と品質管理に重要である。曇り (cloudiness) または形成 ( formation) は、フィルタ媒体における均質性からの逸脱を記述するために用いられる概念である。我々は,選択した周波数範囲に結合した相対的局所的アレルウェイトのパワースペクトルから曇り指数を求める。パワースペクトルは広いスペクトル範囲のエネルギー密度を捕捉する。さらに、ある条件下では、非織布の構造は、アレンジ重量、局所アレンジ重量のばらつき、パワースペクトルによって完全に特徴づけられる。したがって、パワースペクトルは、曇りを排他的に反映するパラメータである。ここでは,実用的応用から生じる課題について述べる。最も顕著なのはスペクトルバンドの選択である。それは確かに特徴的な「雲の大きさ」に依存するが、画像のサイズと横分解能によって制限される。本研究は, 相対的局所軸重みのパワースペクトルに基づく曇り指数が理論的に良好に確立され, 画像データから頑健に測定できることを示す。スペクトル帯を選択することで、視覚的に知覚されたり、製品特性に決定的であったりする曇りを捉えることができる。そのため、技術標準を構築するのに適している。 The homogeneity of filter media is important for material selection and quality control, along with the specific weight (nominal grammage) and the distribution of the local weight. Cloudiness or formation is a concept used to describe deviations from homogeneity in filter media. We suggest to derive the cloudiness index from the power spectrum of the relative local areal weight, integrated over a selected frequency range. The power spectrum captures the energy density in a broad spectral range. Moreover, under certain conditions, the structure of a nonwoven is fully characterized by the areal weight, the variance of the local areal weight, and the power spectrum. Consequently, the power spectrum is the parameter that exclusively reflects the cloudiness. Here, we address questions arising from practical application. The most prominent is the choice of the spectral band. It certainly depends on the characteristic "size of the clouds", but is limited by the size and lateral resolution of the images. We show that the cloudiness index based on the power spectrum of the relative local areal weight is theoretically well founded and can be robustly measured from image data. 
Choosing the spectral band makes it possible to capture the cloudiness that is either visually perceived or found to be decisive for product properties. The index is thus well suited as the basis for a technical standard.	翻訳日:2023-04-05 01:33:54 公開日:2023-04-03
# 野生における効率的な舗装距離検出・認識のためのパッチラベル推論ネットワーク Weakly Supervised Patch Label Inference Networks for Efficient Pavement Distress Detection and Recognition in the Wild ( http://arxiv.org/abs/2203.16782v2 ) ライセンス: Link先を確認	Sheng Huang and Wenhao Tang and Guixin Huang and Luwen Huangfu and Dan Yang	(参考訳) 自動的な画像ベース舗装災害検出と認識は、舗装維持と管理に不可欠である。しかし,既存のディープ・ラーニング・ベースの手法は,高精細度や低救難面積比などの舗装画像の特徴をほとんど省略しており,エンドツーエンドの訓練ができない。本稿では,Wakly Supervised Patch Label Inference Networks (WSPLIN) という,これらのタスクを様々なアプリケーション環境下で効率的に処理するための,シンプルで効果的なエンドツーエンドディープラーニング手法を提案する。 WSPLINは、完全に教師付き舗装画像分類問題を弱教師付き舗装画像分類問題に変換する。具体的には、WSPLINはまず異なるスケールの舗装画像を異なるコレクション戦略のパッチに分割し、次にパッチのラベルを推測するためにパッチラベル推論ネットワーク(PLIN)を使用し、解像度とスケール情報をフルに活用する。特に,難易度分布の事前知識に基づいてパッチラベルの空間性制約を設計し,包括的決定ネットワーク(CDN)を利用してPLINのトレーニングを弱教師付きで指導する。したがって、PLINが生成するパッチラベルは、粗い位置や苦痛の種類などの解釈可能な中間情報を提供する。 CQU-BPDDとCrack500-PDD(Crack500-PDD)データセットを新たに構築したCrack500-PDD(Crack500-PDD)データセットを用いて評価を行った。その結果,本手法は性能と効率の両方において,ベースラインよりも優れていることが示された。 WSPLINのソースコードはhttps://github.com/DearCaat/wsplin.comで公開されている。 Automatic image-based pavement distress detection and recognition are vital for pavement maintenance and management. However, existing deep learning-based methods largely omit the specific characteristics of pavement images, such as high image resolution and low distress area ratio, and are not end-to-end trainable. In this paper, we present a series of simple yet effective end-to-end deep learning approaches named Weakly Supervised Patch Label Inference Networks (WSPLIN) for efficiently addressing these tasks under various application settings. WSPLIN transforms the fully supervised pavement image classification problem into a weakly supervised pavement patch classification problem for solutions. 
Specifically, WSPLIN first divides the pavement image under different scales into patches with different collection strategies and then employs a Patch Label Inference Network (PLIN) to infer the labels of these patches to fully exploit the resolution and scale information. Notably, we design a patch label sparsity constraint based on the prior knowledge of distress distribution and leverage the Comprehensive Decision Network (CDN) to guide the training of PLIN in a weakly supervised way. Therefore, the patch labels produced by PLIN provide interpretable intermediate information, such as the rough location and the type of distress. We evaluate our method on a large-scale bituminous pavement distress dataset named CQU-BPDD and the augmented Crack500 (Crack500-PDD) dataset, which is a newly constructed pavement distress detection dataset augmented from the Crack500. Extensive results demonstrate the superiority of our method over baselines in both performance and efficiency. The source codes of WSPLIN are released on https://github.com/DearCaat/wsplin.	翻訳日:2023-04-05 01:33:33 公開日:2023-04-03
# 映画物語の合成:ストーリー理解のためのビデオ言語データセット Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding ( http://arxiv.org/abs/2203.05711v2 ) ライセンス: Link先を確認	Yidan Sun, Qin Chao, Yangfeng Ji and Boyang Li	(参考訳) 最近のaiの進歩にもかかわらず、ストーリー理解はオープンで未調査の問題だ。我々は、人気映画やテレビシリーズの5,193本のビデオ要約を含むビデオ言語ストーリーデータセットSYMON(Synopses of Movie Narratives)を収集、前処理、公開する。 SYMONは、人間のクリエイターが作った人間のオーディエンスのための自然主義的なストーリーテリングビデオを撮影する。原型的で自然主義的なストーリーデータセットとして、SYMONは多モーダルなストーリーイベント、豊富な精神状態の記述、視覚とテキストのモダリティの間に大きな意味的ギャップを特徴としている。我々は,映像要約ビデオにおけるビデオテキスト検索とゼロショットアライメントのベンチマークを構築し,ストーリー理解におけるドメイン内データの重要性を示す。 SYMONでは、マルチモーダルなストーリー理解の進展の基礎を築きたいと考えています。 Despite recent advances of AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives (SYMON), containing 5,193 video summaries of popular movies and TV series. SYMON captures naturalistic story-telling videos for human audience made by human creators. As a prototypical and naturalistic story dataset, SYMON features high coverage of multimodal story events, abundant mental-state descriptions, and large semantic gaps between the visual and the textual modalities. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos, which showcase the importance of in-domain data in story understanding. With SYMON, we hope to lay the groundwork for progress in multimodal story understanding.	翻訳日:2023-04-05 01:32:47 公開日:2023-04-03
# 分類における良性過剰--大きなモデルでラベルノイズに対抗できる Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models ( http://arxiv.org/abs/2206.00501v2 ) ライセンス: Link先を確認	Kaiyue Wen, Jiaye Teng, Jingzhao Zhang	(参考訳) 良性過剰フィッティングの研究は、過剰パラメータのディープラーニングモデルの成功のための洞察を提供する。本研究では,実世界の分類タスクにおいて過剰適合が真に有益であるかどうかを検討する。まず、ResNetモデルがCifar10に優越するが、ImageNetに優越しないという観察から始める。 ImageNet実験でベニグオーバーフィッティングが失敗する理由を理解するために,パラメータ数がデータポイント数より大きくないような制限的な設定でベニグオーバーフィッティングを理論的に解析する。この軽度な過パラメータ化設定の下で、我々の分析は相変化を識別する:以前の重度過パラメータ化設定とは異なり、良性過適合はラベルノイズの存在下で失敗する。我々の分析は経験的観察を説明し、ResNetsによる一連の制御実験によって検証される。我々の研究は、将来の方向性として不適合な体制における暗黙のバイアスを理解することの重要性を強調します。 Studies on benign overfitting provide insights for the success of overparameterized deep learning models. In this work, we examine whether overfitting is truly benign in real-world classification tasks. We start with the observation that a ResNet model overfits benignly on Cifar10 but not benignly on ImageNet. To understand why benign overfitting fails in the ImageNet experiment, we theoretically analyze benign overfitting under a more restrictive setup where the number of parameters is not significantly larger than the number of data points. Under this mild overparameterization setup, our analysis identifies a phase change: unlike in the previous heavy overparameterization settings, benign overfitting can now fail in the presence of label noise. Our analysis explains our empirical observations, and is validated by a set of control experiments with ResNets. Our work highlights the importance of understanding implicit bias in underfitting regimes as a future direction.	翻訳日:2023-04-05 01:25:14 公開日:2023-04-03
# 窒素空孔中心を持つ大領域における高分解能NMR分光 High-Resolution NMR Spectroscopy at Large Fields with Nitrogen Vacancy Centers ( http://arxiv.org/abs/2205.04150v2 ) ライセンス: Link先を確認	C. Munuera-Javaloy, A. Tobalina, and J. Casanova	(参考訳) 窒素空洞(NV)中心のアンサンブルは、室温でミクロンサイズの試料からNMR信号を検出するセンサーとして使用される。このシナリオでは、大きな磁場の体制が特に興味深いのは、化学シフトとJカップリングがよりアクセスしやすくなる一方で、核熱偏極が大きく、低濃度のサンプルでも強いセンサー応答をもたらすためである。しかしながら、この体制は、高周波核信号とNVベースのセンサーを混在させることが困難であるため、ほとんど未解明のままである。そこで本研究では, センサに伝達される誘導核スピン信号の振幅における関連するエネルギーシフトをマッピングする手法を用いて, この問題を回避する。この段階は、センサーが関与しないサンプル核スピンの自由沈降期間と交差する。したがって、この方法は、核スピン信号のコヒーレンスによって最終的に制限される高いスペクトル分解能をもたらす。 Ensembles of nitrogen-vacancy (NV) centers are used as sensors to detect NMR signals from micron-sized samples at room temperature. In this scenario, the regime of large magnetic fields is especially interesting as it leads to a large nuclear thermal polarisation -- thus, to a strong sensor response even in low concentration samples -- while chemical shifts and J-couplings become more accessible. Nevertheless, this regime remains largely unexplored owing to the difficulties to couple NV-based sensors with high-frequency nuclear signals. In this work, we circumvent this problem with a method that maps the relevant energy shifts in the amplitude of an induced nuclear spin signal that is subsequently transferred to the sensor. This stage is interspersed with free-precession periods of the sample nuclear spins where the sensor does not participate. Thus, our method leads to high spectral resolutions ultimately limited by the coherence of the nuclear spin signal.	翻訳日:2023-04-05 01:23:57 公開日:2023-04-03
# 暗号通貨ポンプ・ダンプのシーケンスベースターゲットコイン予測 Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump ( http://arxiv.org/abs/2204.12929v2 ) ライセンス: Link先を確認	Sihao Hu, Zhen Zhang, Shengliang Lu, Bingsheng He, Zhao Li	(参考訳) 暗号通貨市場におけるポンプ・アンド・ダンプ・スキーム(P&D)の普及に伴い、そのような不正行為を事前に検出し、潜在的な投資家に警告することが義務づけられる。本稿では,スケジュールされたポンプ時間の前に,対象交換所に記載された全てのコインのポンプ確率を予測することに焦点を当て,これを目標コイン予測タスクと呼ぶ。まず、2019年1月から2022年1月までテレグラムで組織された最新の709のp&dイベントを総合的に調査する。実験結果から,p&dにはチャネル内均一性とチャネル間不均質性を示すような興味深いパターンがみられた。ここでのチャンネルは、しばしばp&dイベントのコーディネートに使用されるテレグラムのグループの形式を指す。この観察により、チャネルのP&Dイベント履歴を位置注意機構を介してシーケンス表現にエンコードし、予測精度を高める、SNNと呼ばれる新しいシーケンスベースのニューラルネットワークを開発することができる。位置注意は有用な情報を抽出し、特にシーケンスの長さが長い場合にはノイズを軽減するのに役立つ。大規模実験により提案手法の有効性と一般化性を検証する。 https://github.com/Bayi-Hu/Pump-and-Dump-detection-on-Cryptocurrency.comでコードとP&Dデータセットをリリースし、定期的にデータセットを更新します。 With the proliferation of pump-and-dump schemes (P&Ds) in the cryptocurrency market, it becomes imperative to detect such fraudulent activities in advance to alert potentially susceptible investors. In this paper, we focus on predicting the pump probability of all coins listed in the target exchange before a scheduled pump time, which we refer to as the target coin prediction task. Firstly, we conduct a comprehensive study of the latest 709 P&D events organized in Telegram from Jan. 2019 to Jan. 2022. Our empirical analysis reveals some interesting patterns of P&Ds, such as that pumped coins exhibit intra-channel homogeneity and inter-channel heterogeneity. Here channel refers a form of group in Telegram that is frequently used to coordinate P&D events. This observation inspires us to develop a novel sequence-based neural network, dubbed SNN, which encodes a channel's P&D event history into a sequence representation via the positional attention mechanism to enhance the prediction accuracy. Positional attention helps to extract useful information and alleviates noise, especially when the sequence length is long. 
Extensive experiments verify the effectiveness and generalizability of proposed methods. Additionally, we release the code and P&D dataset on GitHub: https://github.com/Bayi-Hu/Pump-and-Dump-Detection-on-Cryptocurrency, and regularly update the dataset.	翻訳日:2023-04-05 01:23:09 公開日:2023-04-03
# ノイズのあるスキルラベルから職種の類似性を学習する Learning Job Titles Similarity from Noisy Skill Labels ( http://arxiv.org/abs/2207.00494v3 ) ライセンス: Link先を確認	Rabih Zbib, Lucas Alvarez Lacasa, Federico Retyk, Rus Poves, Juan Aizpuru, Hermenegildo Fabregat, Vaidotas Simkus, and Emilia García-Casademont	(参考訳) 職名間のセマンティックな類似度を測定することは、仕事の自動推薦に不可欠な機能である。このタスクは通常、同等の肩書きペアの形式でトレーニングデータを必要とする教師付き学習技術を使ってアプローチされる。そこで本稿では,ノイズのあるスキルラベルを用いた職名類似性モデルの学習のための教師なし表現学習手法を提案する。テキストのランク付けや仕事の正規化といったタスクに非常に効果的であることを示す。 Measuring semantic similarity between job titles is an essential functionality for automatic job recommendations. This task is usually approached using supervised learning techniques, which require training data in the form of equivalent job title pairs. In this paper, we instead propose an unsupervised representation learning method for training a job title similarity model using noisy skill labels. We show that it is highly effective for tasks such as text ranking and job normalization.	翻訳日:2023-04-05 01:16:13 公開日:2023-04-03
# wnet: 訓練可能な再構成層を有するスパースビューctのためのデータ駆動型デュアルドメインデノイジングモデル WNet: A data-driven dual-domain denoising model for sparse-view computed tomography with a trainable reconstruction layer ( http://arxiv.org/abs/2207.00400v2 ) ライセンス: Link先を確認	Theodor Cheslerean-Boghiu, Felix C. Hofmann, Manuel Schulthei{\ss}, Franz Pfeiffer, Daniela Pfeiffer, Tobias Lasser	(参考訳) ディープラーニングベースのソリューションは、さまざまなアプリケーションでうまく実装されています。中でも注目すべきは、臨床ユースケースの関心が高まり、過去数年間に提案された最先端のデータ駆動アルゴリズムの主要な推進役となったことだ。 sparse-view tomographic reconstructionsのようなアプリケーションでは、取得時間を短く、放射線線量が少ない状態に保つために測定データの量が小さい場合、ストレッチアーティファクトの削減は、フルスキャンデータのサブセットのみを使用して診断可能な画像を取得することを主な目標としたデータ駆動デノイジングアルゴリズムの開発を促している。本稿では,sparse-viewアーティファクトをデノージングするためのトレーニング可能な再構築層を含むデータ駆動型デュアルドメインデノージングモデルであるwnetを提案する。 2つのエンコーダデコーダネットワークは、シングラムと再構成ドメインを同時にデノナイズする一方、フィルタバックプロジェクションアルゴリズムを実装する第3の層は、第1の2つの間に挟み込み、再構成操作を行う。胸部CTスキャンにおけるネットワークの性能について検討し,従来の固定層よりもトレーニング可能な再構成層を持つことのメリットを強調した。我々は2つの臨床的に関連のあるデータセットを用いてネットワークをトレーニングし、その結果を3種類のスパースビューCTと再構成アルゴリズムと比較した。 Deep learning based solutions are being succesfully implemented for a wide variety of applications. Most notably, clinical use-cases have gained an increased interest and have been the main driver behind some of the cutting-edge data-driven algorithms proposed in the last years. For applications like sparse-view tomographic reconstructions, where the amount of measurement data is small in order to keep acquisition time short and radiation dose low, reduction of the streaking artifacts has prompted the development of data-driven denoising algorithms with the main goal of obtaining diagnostically viable images with only a subset of a full-scan data. We propose WNet, a data-driven dual-domain denoising model which contains a trainable reconstruction layer for sparse-view artifact denoising. 
Two encoder-decoder networks perform denoising in both sinogram- and reconstruction-domain simultaneously, while a third layer implementing the Filtered Backprojection algorithm is sandwiched between the first two and takes care of the reconstruction operation. We investigate the performance of the network on sparse-view chest CT scans, and we highlight the added benefit of having a trainable reconstruction layer over the more conventional fixed ones. We train and test our network on two clinically relevant datasets and we compare the obtained results with three different types of sparse-view CT denoising and reconstruction algorithms.	翻訳日:2023-04-05 01:16:06 公開日:2023-04-03
# AnoShift: 教師なし異常検出のための分布シフトベンチマーク AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2206.15476v4 ) ライセンス: Link先を確認	Marius Dragoi, Elena Burceanu, Emanuela Haller, Andrei Manolache and Florin Brad	(参考訳) データの分布シフトを分析することは、機械学習(ML)における研究の方向性の高まりであり、MLモデルの一般化特性を研究するのに適したシナリオを提供することに焦点を当てた、新たなベンチマークへとつながる。既存のベンチマークは教師あり学習にフォーカスしており、我々の知る限り、教師なし学習向けのものは存在しない。そこで本稿では,ネットワーク侵入検知のためのトラフィックデータセットである Kyoto-2006+ 上に構築された、時間とともにシフトするデータを用いた教師なし異常検出ベンチマークを導入する。このタイプのデータは、入力分布がシフトするという前提を満たしている: 長い期間(10年)をカバーし、時間とともに自然に生じる変化(ユーザの振る舞いパターンの変更やソフトウェアのアップデートなど)を含んでいる。まず、基本的な特徴量ごとの分析、t-SNE、および年ごとの分布間距離を測定するための最適輸送(Optimal Transport)手法を用いて、データの非定常的性質を強調する。次に、IID、NEAR、FARテスト分割でデータを分割するプロトコルであるAnoShiftを提案する。従来のアプローチからディープラーニングまで,さまざまなモデルで時間の経過とともにパフォーマンスの低下を検証する。最後に,分布シフト問題を認識し,適切な対応を行うことで,独立同分布のデータを前提とした古典的トレーニングと比較して,パフォーマンスの向上が期待できることを示した。データセットとコードはhttps://github.com/bit-ml/AnoShift/で入手できる。 Analyzing the distribution shift of data is a growing research direction in today's machine learning (ML), leading to new benchmarks that focus on providing a suitable scenario for studying the generalization properties of ML models. The existing benchmarks are focused on supervised learning, and to the best of our knowledge, there are none for unsupervised learning. Therefore, we introduce an unsupervised anomaly detection benchmark with data that shifts over time, built over Kyoto-2006+, a traffic dataset for network intrusion detection. This type of data meets the premise of shifting the input distribution: it covers a large time span ($10$ years), with naturally occurring changes over time (e.g., users modifying their behavior patterns and software updates). We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. Next, we propose AnoShift, a protocol splitting the data into IID, NEAR, and FAR testing splits. 
We validate the performance degradation over time with diverse models, ranging from classical approaches to deep learning. Finally, we show that by acknowledging the distribution shift problem and properly addressing it, the performance can be improved compared to the classical training which assumes independent and identically distributed data (on average, by up to $3\%$ for our approach). Dataset and code are available at https://github.com/bit-ml/AnoShift/.	翻訳日:2023-04-05 01:15:38 公開日:2023-04-03
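The IID / NEAR / FAR protocol named in the entry above amounts to splitting records by how far in time they lie from the training period. The sketch below illustrates that idea only. The year boundaries, the `iid_every` hold-out rule, and the function name `split_by_time` are hypothetical choices made for this example and are not taken from the abstract.

```python
from collections import defaultdict

# Hypothetical year boundaries, chosen only for illustration.
TRAIN_YEARS = range(2006, 2011)  # training period; IID test is held out from it
NEAR_YEARS = range(2011, 2014)   # test data shortly after the training period
FAR_YEARS = range(2014, 2016)    # test data long after the training period


def split_by_time(records, iid_every=5):
    """Split (year, features) records into train / IID / NEAR / FAR sets.

    Every `iid_every`-th record from the training years is held out as the
    IID test split (an assumed rule); later years go to NEAR or FAR.
    """
    splits = defaultdict(list)
    for index, (year, features) in enumerate(records):
        if year in TRAIN_YEARS:
            name = "iid_test" if index % iid_every == 0 else "train"
        elif year in NEAR_YEARS:
            name = "near_test"
        elif year in FAR_YEARS:
            name = "far_test"
        else:
            continue  # outside the covered time span
        splits[name].append((year, features))
    return dict(splits)


if __name__ == "__main__":
    toy_records = [(2006 + i % 10, {"id": i}) for i in range(100)]
    for name, rows in sorted(split_by_time(toy_records).items()):
        print(name, len(rows))
```

Evaluating one model on the three test splits in turn then gives a direct view of the performance degradation over time that the abstract reports.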
# 質問は、密集した通路のレトリバーを訓練するしかないか? Questions Are All You Need to Train a Dense Passage Retriever ( http://arxiv.org/abs/2206.10658v4 ) ライセンス: Link先を確認	Devendra Singh Sachan and Mike Lewis and Dani Yogatama and Luke Zettlemoyer and Joelle Pineau and Manzil Zaheer	(参考訳) ラベル付きトレーニングデータを必要としない高密度検索モデルをトレーニングするための,新しいコーパスレベルの自動エンコーディング手法であるartを紹介する。高度な検索は、open qaのようなオープンドメインタスクの中心的な課題であり、最先端の手法では、カスタムのハード負のマイニングとポジティブな例の否定を伴う大規模な教師ありデータセットを必要とする。対照的にARTは、未解決の入力や出力(質問や潜在的な回答文書など)へのアクセスのみを必要とする。新たな文書リトライバル自動エンコーディング方式を用いて,(1)証拠文書の集合を検索するために入力質問を使用し,(2)文書を用いて元の質問を再構築する確率を計算する。質問再構成に基づく検索の訓練は、文書と質問エンコーダの効果的な教師なし学習を可能にし、後から完全なオープンQAシステムに組み込むことができる。広範囲な実験により、ARTは事前訓練された言語モデルからのみ汎用的な初期化を行い、ラベル付きデータやタスク固有の損失を除去し、複数のQA検索ベンチマークで最先端の結果を得ることができた。 We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents). It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses.	翻訳日:2023-04-05 01:15:13 公開日:2023-04-03
# スパイクニューラルネットワークのためのシナプス閾値シナジスティック学習手法 A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks ( http://arxiv.org/abs/2206.06129v3 ) ライセンス: Link先を確認	Hongze Sun, Wuque Cai, Baoxin Yang, Yan Cui, Yang Xia, Dezhong Yao, Daqing Guo	(参考訳) スパイキングニューラルネットワーク(SNN)は、さまざまなインテリジェントなシナリオにおいて優れた機能を示している。既存のsnsの訓練方法はシナプス可塑性の概念に基づいているが、現実的脳での学習はニューロンの非シナプス機構も活用している。生体ニューロンのスパイク閾値は、ミリ秒の時間スケールでリッチなダイナミクスを示す重要な内在神経の特徴であり、神経情報処理の基盤となるメカニズムとして提案されている。本研究では,SNNにおけるシナプス重みとスパイク閾値を同時に学習する新しいシナジー学習手法を開発する。シナプス・スレッショルド・シナジスティック・ラーニング(STL-SNN)で訓練されたSNNは、2つの非生成シングルラーニングモデルで訓練されたSNNよりも、様々な静的およびニューロモルフィックなデータセットで大幅に優れた性能を発揮する。トレーニング中、シナジスティック学習アプローチは神経閾値を最適化し、適切な発射率で安定した信号伝達を提供する。さらに分析した結果、STL-SNNはノイズの多いデータに対して堅牢であり、深層ネットワーク構造に対する低エネルギー消費を示すことが示された。さらに、一般化された共同決定フレームワークを導入することにより、STL-SNNの性能をさらに向上することができる。以上の結果から, シナプスと内因性非シナプス機構の相乗効果は, SNN学習法の開発に有効である可能性が示唆された。 Spiking neural networks (SNNs) have demonstrated excellent capabilities in various intelligent scenarios. Most existing methods for training SNNs are based on the concept of synaptic plasticity; however, learning in the realistic brain also utilizes intrinsic non-synaptic mechanisms of neurons. The spike threshold of biological neurons is a critical intrinsic neuronal feature that exhibits rich dynamics on a millisecond timescale and has been proposed as an underlying mechanism that facilitates neural information processing. In this study, we develop a novel synergistic learning approach that involves simultaneously training synaptic weights and spike thresholds in SNNs. SNNs trained with synapse-threshold synergistic learning~(STL-SNNs) achieve significantly superior performance on various static and neuromorphic datasets than SNNs trained with two degenerated single-learning models. During training, the synergistic learning approach optimizes neural thresholds, providing the network with stable signal transmission via appropriate firing rates. 
Further analysis indicates that STL-SNNs are robust to noisy data and exhibit low energy consumption for deep network structures. Additionally, the performance of STL-SNN can be further improved by introducing a generalized joint decision framework. Overall, our findings indicate that biologically plausible synergies between synaptic and intrinsic non-synaptic mechanisms may provide a promising approach for developing highly efficient SNN learning methods.	翻訳日:2023-04-05 01:14:13 公開日:2023-04-03
# sagnacに基づく量子エンタングルメントのコヒーレンス解釈 Coherence interpretation of the Sagnac-based quantum entanglement ( http://arxiv.org/abs/2206.05358v4 ) ライセンス: Link先を確認	Byoung S. Ham	(参考訳) ベルの不等式違反は量子エンタングルメントの定量的測定ツールである。量子エンタングルメントは量子情報科学の中心であり、リモートで分離された局所検出器間の非局所相関は量子力学のユニークな性質を示す。過去数十年間、量子相関の基礎物理学と量子技術への潜在的な応用に関する集中的な研究が続けられてきた。そこで,コヒーレンス法による遅延型量子消去器の干渉計において,非局所相関に対する一致検出の役割について検討した。非局所量子特性を理解するために、干渉計の2つの出力光子間の一致測定から局所検出器間の結合パラメータ関係をコヒーレントに誘導する。このコヒーレンスアプローチに基づいて, 解の妥当性について, 量子エンタングルメント生成の逆直観的コヒーレンスバージョンを提案する。 Bell inequality violation is a quantitative measurement tool for quantum entanglement. Quantum entanglement is the heart of quantum information science, in which the resulting nonlocal correlation between remotely separated local detectors shows a unique property of quantum mechanics. Over the last few decades, intensive research has been conducted on the basic physics of quantum correlation as well as its potential applications to quantum technologies. Here, the role of coincidence detection is investigated for the nonlocal correlation in a simple interferometer of the delayed-choice quantum eraser using a coherence approach. To understand the nonlocal quantum feature, a joint-parameter relation between local detectors is coherently induced from coincidence measurements between two output photons of an interferometer. Based on this coherence approach, a counterintuitive coherence version of the quantum entanglement generation is proposed for the validity of the solution.	翻訳日:2023-04-05 01:13:47 公開日:2023-04-03
# menli: 自然言語推論によるロバストな評価指標 MENLI: Robust Evaluation Metrics from Natural Language Inference ( http://arxiv.org/abs/2208.07316v2 ) ライセンス: Link先を確認	Yanran Chen and Steffen Eger	(参考訳) 最近提案されたBERTベースのテキスト生成評価指標は、標準的なベンチマークでよく機能するが、情報正当性などの敵攻撃に弱い。これは、それらが意味的類似性のモデルであるという事実に由来する(一部)。対照的に、我々は自然言語推論(NLI)に基づく評価指標を開発し、より適切なモデリングを行う。我々は、嗜好ベースの敵攻撃フレームワークを設計し、我々のNLIベースのメトリクスが最近のBERTベースのメトリクスよりも攻撃に対してより堅牢であることを示す。標準ベンチマークでは、NLIベースのメトリクスは既存の要約の指標よりも優れていますが、SOTA MTの指標よりは劣ります。しかし、既存のメトリクスとNLIのメトリクスを組み合わせると、標準ベンチマーク(+5%から30%)で測定された高い逆の堅牢性(15%から30%)と高品質のメトリクスの両方が得られます。 Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information correctness. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling. We design a preference-based adversarial attack framework and show that our NLI based metrics are much more robust to the attacks than the recent BERT-based metrics. On standard benchmarks, our NLI based metrics outperform existing summarization metrics, but perform below SOTA MT metrics. However, when combining existing metrics with our NLI metrics, we obtain both higher adversarial robustness (15%-30%) and higher quality metrics as measured on standard benchmarks (+5% to 30%).	翻訳日:2023-04-05 01:06:49 公開日:2023-04-03
# 可積分スピン鎖の量子シミュレーションにおける保存電荷 Conserved charges in the quantum simulation of integrable spin chains ( http://arxiv.org/abs/2208.00576v2 ) ライセンス: Link先を確認	Kazunobu Maruyoshi, Takuya Okuda, Juan William Pedersen, Ryo Suzuki, Masahito Yamazaki, Yutaka Yoshida	(参考訳) デジタル量子コンピュータ上での量子多体系の時間進化をシミュレーションすると、時間離散化による量子ノイズとトロッター誤差の課題に直面している。可積分スピンチェーンのトロッター誤差は、離散時間発展が可積分性を維持するならば制御できる。この研究において、実量子コンピュータと古典シミュレータ上で、スピン-1/2ハイゼンベルクXXXスピン鎖の可積分トロッター化を実装した。量子ノイズがいくつかの保存電荷の時間発展にどのように影響するかを研究し、期待値の減衰を観測する。さらに、将来量子デバイスやアルゴリズムのベンチマークに使用可能な、時間発展の初期の挙動についても研究しています。また,保存電荷を高い順序で効率的に生成する方法を提供する。 When simulating the time evolution of quantum many-body systems on a digital quantum computer, one faces the challenges of quantum noise and of the Trotter error due to time discretization. The Trotter error in integrable spin chains can be under control if the discrete time evolution preserves integrability. In this work we implement, on a real quantum computer and on classical simulators, the integrable Trotterization of the spin-1/2 Heisenberg XXX spin chain. We study how quantum noise affects the time evolution of several conserved charges, and observe the decay of the expectation values. We in addition study the early time behaviors of the time evolution, which can potentially be used to benchmark quantum devices and algorithms in the future. We also provide an efficient method to generate the conserved charges at higher orders.	翻訳日:2023-04-05 01:06:03 公開日:2023-04-03
# コヒーレンス、重ね合わせ、L\"{o}wdin対称直交化 Coherence, superposition, and L\"{o}wdin symmetric orthogonalization ( http://arxiv.org/abs/2209.03746v3 ) ライセンス: Link先を確認	G\"okhan Torun	(参考訳) コヒーレンスと重ね合わせの概念は概念的には同じであるが、資源理論の定式化には重要な違いがある。すなわち、基底状態はコヒーレンスの資源理論では直交するが、重ね合わせの資源理論では必ずしも直交するとは限らない。非直交性のため、重ね合わせ状態の操作と特徴づけにはかなりの努力が必要である。ここでは、L\"{o}wdin symmetric orthogonalization (LSO) 法が純粋重ね合わせ状態の特徴づけに有用な手段であることを示す。 LSOの主な性質は、元の非直交基底状態の構造と対称性が極端に保存されていることである。特に、極大にコヒーレントな状態は、lsoの助けを借りて極大重ね合わせで状態になる:言い換えれば、それらは対称直交化の作用の下で同値である。この結果から,LSOが主なツールであるコヒーレンスと重ね合わせの接続が促進される。 The notions of coherence and superposition are conceptually the same; however, an important distinction exists between their resource-theoretic formulations. Namely, while basis states are orthogonal in the resource theory of coherence, they are not necessarily orthogonal in the resource theory of superposition. Owing to the nonorthogonality, the manipulation and characterization of superposition states require significant efforts. Here, we demonstrate that the L\"{o}wdin symmetric orthogonalization (LSO) method offers a useful means for characterizing pure superposition states. The principal property of LSO is that the structure and symmetry of the original nonorthogonal basis states are preserved to the greatest extent possible, which prompts us to study the role of LSO in identifying the hierarchical relations of resource states. Notably, we reveal that the maximally coherent states turn into the states with maximal superposition with the help of LSO: in other words, they are equivalent under the action of symmetric orthogonalization. Our results facilitate further connections between coherence and superposition, where LSO is the main tool.	翻訳日:2023-04-05 00:58:35 公開日:2023-04-03
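The abstract above leans on Löwdin symmetric orthogonalization (LSO). As a rough illustration only, and not the paper's own code, the textbook construction multiplies a nonorthogonal basis by the inverse square root of its overlap (Gram) matrix S. The sketch below is restricted to two normalized real vectors, where S^{-1/2} has a closed form; the function names `lowdin_two_states` and `dot` are hypothetical helpers introduced here.

```python
import math


def dot(x, y):
    """Real inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def lowdin_two_states(v1, v2):
    """Symmetrically orthogonalize two normalized, linearly independent real vectors.

    Assumption for this sketch: only the two-state, real-valued case is handled.
    The overlap matrix S = [[1, s], [s, 1]] has eigenvalues 1 + s and 1 - s with
    eigenvectors (1, 1) and (1, -1), so S^{-1/2} = [[a, b], [b, a]].
    """
    s = dot(v1, v2)
    if abs(s) >= 1.0:
        raise ValueError("vectors must be normalized and linearly independent")
    p = 1.0 / math.sqrt(1.0 + s)
    m = 1.0 / math.sqrt(1.0 - s)
    a = 0.5 * (p + m)
    b = 0.5 * (p - m)
    # New basis: each output mixes both inputs with the same pair of weights,
    # which is what makes the construction symmetric.
    u1 = [a * x + b * y for x, y in zip(v1, v2)]
    u2 = [b * x + a * y for x, y in zip(v1, v2)]
    return u1, u2


# Two unit vectors separated by 60 degrees (overlap 0.5).
v1 = [1.0, 0.0]
v2 = [math.cos(math.pi / 3), math.sin(math.pi / 3)]
u1, u2 = lowdin_two_states(v1, v2)
print("overlap after LSO:", dot(u1, u2))
print("overlap of each new vector with its original:", dot(u1, v1), dot(u2, v2))
```

Each orthogonalized vector keeps the same overlap with its original partner, which is one concrete sense in which the symmetry of the nonorthogonal basis is preserved. For more than two states, a general matrix inverse square root would be needed.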
# 確率的プロトコルによる漸近状態変換のエントロピー制限の克服 Overcoming entropic limitations on asymptotic state transformations through probabilistic protocols ( http://arxiv.org/abs/2209.03362v3 ) ライセンス: Link先を確認	Bartosz Regula, Ludovico Lami, Mark M. Wilde	(参考訳) 量子相対エントロピー(quantum relative entropy)は、一般の資源理論的設定における量子状態の漸近的変換可能性を決定する上で重要な役割を果たすことが知られている。量子相対エントロピーは漸近状態変換の速度を特徴づけるには不十分であり、ヒルベルト射影計量の正則化に基づく新しいエントロピー量(英語版)が成立する。このようなシナリオは、一般に与えられた状態のコピー数として取られる量子状態の変換に関連するコストが、プロトコルを実現するのに必要な量子メモリのサイズと同一視されるような設定によって動機付けられる。提案手法では,相対エントロピーによって課されるよりも厳密に高いレートを実現する変換プロトコルを構築することができる。資源蒸留の課題に焦点をあて,確率的蒸留プロトコルの漸近速度の強い逆境界を広く適用し,非エンタングリング操作による絡み込み蒸留など,関連する状況において厳密であることを示す。これは決定論的プロトコルにのみ適用される既知の制限を一般化し拡張する。提案手法は, 確率的ワンショット変換の最近の結果と, 射影的相対エントロピーに対する新しい漸近的平衡特性に基づく。 The quantum relative entropy is known to play a key role in determining the asymptotic convertibility of quantum states in general resource-theoretic settings, often constituting the unique monotone that is relevant in the asymptotic regime. We show that this is no longer the case when one allows stochastic protocols that may only succeed with some probability, in which case the quantum relative entropy is insufficient to characterize the rates of asymptotic state transformations, and a new entropic quantity based on a regularization of the Hilbert projective metric comes into play. Such a scenario is motivated by a setting where the cost associated with transformations of quantum states, typically taken to be the number of copies of a given state, is instead identified with the size of the quantum memory needed to realize the protocol. Our approach allows for constructing transformation protocols that achieve strictly higher rates than those imposed by the relative entropy. 
Focusing on the task of resource distillation, we give broadly applicable strong converse bounds on the asymptotic rates of probabilistic distillation protocols, and show them to be tight in relevant settings such as entanglement distillation with non-entangling operations. This generalizes and extends previously known limitations that only applied to deterministic protocols. Our methods are based on recent results for probabilistic one-shot transformations as well as a new asymptotic equipartition property for the projective relative entropy.	翻訳日:2023-04-05 00:58:16 公開日:2023-04-03
# 潜在力学から有意義な表現へ From latent dynamics to meaningful representations ( http://arxiv.org/abs/2209.00905v2 ) ライセンス: Link先を確認	Dedi Wang, Yihang Wang, Luke Evans and Pratyush Tiwary	(参考訳) 表現学習は機械学習と人工知能の台頭の中心であるが、学習した表現を意味のあるものにすることが重要な問題である。このため、典型的なアプローチは、事前確率分布を通じて学習表現を正則化することである。しかし、そのような事前処理は通常使用できないかアドホックである。これに対応するために,動的制約付き表現学習フレームワークを提案する。事前定義された確率を用いる代わりに、動的システムにおける表現学習のより自然な制約である特定のダイナミクスに従うために潜在表現を制限します。我々の信念は、異なる系は異なる限界化された確率分布を持つことができるが、ニュートン方程式やシュロディンガー方程式のような同じ力学に従うという物理学の基本的な観察に由来する。我々は,現実の蛍光DNA映画データセットを含む様々なシステムに対する枠組みを検証する。本アルゴリズムは,非相関,等尺,有意な潜在表現を一意に識別できることを示す。 While representation learning has been central to the rise of machine learning and artificial intelligence, a key problem remains in making the learnt representations meaningful. For this the typical approach is to regularize the learned representation through prior probability distributions. However such priors are usually unavailable or ad hoc. To deal with this, we propose a dynamics-constrained representation learning framework. Instead of using predefined probabilities, we restrict the latent representation to follow specific dynamics, which is a more natural constraint for representation learning in dynamical systems. Our belief stems from a fundamental observation in physics that though different systems can have different marginalized probability distributions, they typically obey the same dynamics, such as Newton's and Schrodinger's equations. We validate our framework for different systems including a real-world fluorescent DNA movie dataset. We show that our algorithm can uniquely identify an uncorrelated, isometric and meaningful latent representation.	翻訳日:2023-04-05 00:57:52 公開日:2023-04-03
# フォトニック量子プロセッサのための量子体積 Quantum Volume for Photonic Quantum Processors ( http://arxiv.org/abs/2208.11724v2 ) ライセンス: Link先を確認	Yuxuan Zhang, Daoheng Niu, Alireza Shabani, Hassan Shapourian	(参考訳) 短期量子コンピューティングプロセッサのメトリクスを定義することは、量子ハードウェアの研究と開発に不可欠である。このような定量的特徴は、進捗の報告や異なる量子プラットフォームの比較に有用であるだけでなく、ボトルネックの特定や技術ロードマップの設計にも不可欠である。ランダム化ベンチマークや量子ボリュームのようなほとんどのメトリクスは、もともと回路ベースの量子コンピュータに導入され、フォトニックデバイスのような測定ベースの量子コンピューティング(MBQC)プロセッサにはすぐには適用されなかった。本稿では,MBQCプロセスの物理ノイズと不完全性を等価量子回路の論理誤差にマッピングする枠組みを提示することにより,このギャップを解消する。本稿では,光量子コンピューティングの短期的候補として符号化されたGottesman-Kitaev-Preskill(GKP)に基づく連続可変クラスタ状態について検討し,実効論理ゲート誤差チャネルを導出し,GKPのスクイーズと光子損失率の観点から量子量を算出する。 Defining metrics for near-term quantum computing processors has been an integral part of the quantum hardware research and development efforts. Such quantitative characteristics are not only useful for reporting the progress and comparing different quantum platforms, but also essential for identifying the bottlenecks and designing a technology roadmap. Most metrics such as randomized benchmarking and quantum volume were originally introduced for circuit-based quantum computers and were not immediately applicable to measurement-based quantum computing (MBQC) processors such as in photonic devices. In this paper, we close this gap by presenting a framework to map physical noises and imperfections in MBQC processes to logical errors in equivalent quantum circuits, whereby enabling the well-known metrics to characterize MBQC. To showcase our framework, we study a continuous-variable cluster state based on the Gottesman-Kitaev-Preskill (GKP) encoding as a near-term candidate for photonic quantum computing, and derive the effective logical gate error channels and calculate the quantum volume in terms of the GKP squeezing and photon loss rate.	翻訳日:2023-04-05 00:56:09 公開日:2023-04-03
# 機密コンピューティングによる機械学習: 知識の体系化 Machine Learning with Confidential Computing: A Systematization of Knowledge ( http://arxiv.org/abs/2208.10134v2 ) ライセンス: Link先を確認	Fan Mo, Zahra Tarkhani, Hamed Haddadi	(参考訳) 機械学習(ML)におけるプライバシとセキュリティの課題は、MLの広範な開発と、最近の大規模な攻撃面のデモとともに、ますます深刻になっている。成熟したシステム指向のアプローチとして、Confidential Computingは、さまざまなMLシナリオにおけるプライバシとセキュリティの問題を軽減するために、学術と産業の両方で使用されている。本稿では,ML と Confidential Computing の連携について検討する。機密情報処理支援ML技術に関する先行研究を体系化する。一秘密の保証及び保証二完全性保証及びその先進的な特徴及び欠点について議論すること。重要な課題はさらに特定され、既存の信頼実行環境(tee)システムのmlユースケースに対する制限を専門的に分析する。最後に、クローズドループ保護のための基盤となるプライバシー定義、効率的なMLのパーティショニングされた実行、ML専用のTEEアシストデザイン、TEE対応ML、ML完全なパイプライン保証などについて論じる。知識の体系化にこれらの潜在的なソリューションを提供することで、計算やシステムコストを導入することなく、より強力なTEE対応MLをプライバシ保証のために実現するために、ブリッジを構築することを目指している。 Privacy and security challenges in Machine Learning (ML) have become increasingly severe, along with ML's pervasive development and the recent demonstration of large attack surfaces. As a mature system-oriented approach, Confidential Computing has been utilized in both academia and industry to mitigate privacy and security issues in various ML scenarios. In this paper, the conjunction between ML and Confidential Computing is investigated. We systematize the prior work on Confidential Computing-assisted ML techniques that provide i) confidentiality guarantees and ii) integrity assurances, and discuss their advanced features and drawbacks. Key challenges are further identified, and we provide dedicated analyses of the limitations in existing Trusted Execution Environment (TEE) systems for ML use cases. Finally, prospective works are discussed, including grounded privacy definitions for closed-loop protection, partitioned executions of efficient ML, dedicated TEE-assisted designs for ML, TEE-aware ML, and ML full pipeline guarantees. 
By providing these potential solutions in our systematization of knowledge, we aim to build a bridge toward much stronger TEE-enabled ML with privacy guarantees, without introducing computation and system costs.	翻訳日:2023-04-05 00:55:48 公開日:2023-04-03
# クロスモーダルトランスフォーマーを用いたダンススタイルトランスファー Dance Style Transfer with Cross-modal Transformer ( http://arxiv.org/abs/2208.09406v3 ) ライセンス: Link先を確認	Wenjie Yin, Hang Yin, Kim Baraka, Danica Kragic, and M{\aa}rten Bj\"orkman	(参考訳) そこで本研究では,あるダンススタイルにおける既存のモーションクリップを,ダンスのモーションコンテキストを保ちつつ,別のダンススタイルのモーションクリップに変換する,ダンススタイル転送システムであるcycledanceを提案する。提案手法は,既存のCycleGANアーキテクチャを拡張して音声シーケンスをモデル化し,マルチモーダルトランスフォーマーエンコーダを統合する。シーケンス長に基づくカリキュラム学習を採用し,トレーニングを安定化する。本手法は,移動フレーム間のリッチかつ長期的関係を捉え,移動伝達と合成作業において共通の課題である。さらに,ダンス動作の文脈において,移動強度とコンテンツ保存の指標を新たに導入する。 5年以上のダンス経験を持つ30人を対象に,広範囲にわたるアブレーション研究と人間による研究を行った。その結果, サイクルダンスは, 自然性, 伝達強度, コンテンツ保存において, ベースラインのサイクルガンを著しく上回って, ターゲットスタイルで現実的な動きを生じさせることがわかった。 We present CycleDance, a dance style transfer system to transform an existing motion clip in one dance style to a motion clip in another dance style while attempting to preserve motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for music context. We adopt sequence length-based curriculum learning to stabilize training. Our approach captures rich and long-term intra-relations between motion frames, which is a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study including 30 participants with 5 or more years of dance experience. The results demonstrate that CycleDance generates realistic movements with the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation.	翻訳日:2023-04-05 00:55:28 公開日:2023-04-03
# リーマン型PDE-G-CNNの解析 Analysis of (sub-)Riemannian PDE-G-CNNs ( http://arxiv.org/abs/2210.00935v4 ) ライセンス: Link先を確認	Gijs Bellaard, Daan L. J. Bon, Gautam Pai, Bart M. N. Smets, Remco Duits	(参考訳) グループ同変畳み込みニューラルネットワーク(G-CNN)は幾何学的深層学習に成功している。通常、G-CNNはCNNに対して、ネットワーク内でハードコードされたはずのトレーニング対称性にネットワーク容量を浪費しないという利点がある。最近導入されたPDEベースのG-CNN(PDE-G-CNN)フレームワークはG-CNNを一般化している。 PDE-G-CNNは、それらが同時に持つコアアドバンテージを持つ 1)ネットワークの複雑さを減らす。 2)分類性能の向上、及び 3)幾何学的解釈性を提供する。それらの実装は、主に核との線形および形態的畳み込みからなる。本稿では,前述した近似的形態素核が必ずしも正確な核を正確に近似するとは限らないことを示す。より具体的には、リーマン計量の空間異方性(英語版)に依存するので、準リーマン近似に頼らなければならない。異方性に関係なく動作する新しい近似カーネルを提供することでこの問題を解決する。近似核のより優れた誤差推定を持つ新しい定理を提供し、それらがすべて正確なものと同じ反射対称性を持っていることを証明する。 PDE-G-CNNフレームワークにおける複数の近似カーネルの有効性を2つのデータセットで検証し、新しい近似カーネルによる改善を観察する。我々は、PDE-G-CNNは、G-CNNとCNNの2つのデータセットに比較して、ネットワークの複雑さを著しく低減する。さらに、PDE-G-CNNはG-CNNよりも優れた幾何学的解釈可能性を持つ。 Group equivariant convolutional neural networks (G-CNNs) have been successfully applied in geometric deep learning. Typically, G-CNNs have the advantage over CNNs that they do not waste network capacity on training symmetries that should have been hard-coded in the network. The recently introduced framework of PDE-based G-CNNs (PDE-G-CNNs) generalises G-CNNs. PDE-G-CNNs have the core advantages that they simultaneously 1) reduce network complexity, 2) increase classification performance, and 3) provide geometric interpretability. Their implementations primarily consist of linear and morphological convolutions with kernels. In this paper we show that the previously suggested approximative morphological kernels do not always accurately approximate the exact kernels accurately. More specifically, depending on the spatial anisotropy of the Riemannian metric, we argue that one must resort to sub-Riemannian approximations. We solve this problem by providing a new approximative kernel that works regardless of the anisotropy. 
We provide new theorems with better error estimates of the approximative kernels, and prove that they all carry the same reflectional symmetries as the exact ones. We test the effectiveness of multiple approximative kernels within the PDE-G-CNN framework on two datasets, and observe an improvement with the new approximative kernels. We report that the PDE-G-CNNs again allow for a considerable reduction of network complexity while having comparable or better performance than G-CNNs and CNNs on the two datasets. Moreover, PDE-G-CNNs have the advantage of better geometric interpretability over G-CNNs, as the morphological kernels are related to association fields from neurogeometry.	翻訳日:2023-04-05 00:48:28 公開日:2023-04-03
# 室内環境におけるポイントゴールナビゲーションのための教師なしビジュアルオドメトリーとアクション統合 Unsupervised Visual Odometry and Action Integration for PointGoal Navigation in Indoor Environment ( http://arxiv.org/abs/2210.00413v2 ) ライセンス: Link先を確認	Yijun Cao, Xianshi Zhang, Fuya Luo, Chuan Lin, and Yongjie Li	(参考訳) 屋内環境におけるポイントゴールナビゲーションは、個人ロボットが特定の地点に向かうための基本的なタスクである。最近の研究は、ノイズのない動作とGPSとコンパスセンサによる完璧な位置決めの仮定の下で、フォトリアリスティックシミュレート環境でほぼ完璧に近い成功率でこのポイントゴールナビゲーションタスクを解決した。しかし、実際の屋内環境で正確なGPS信号を得るのは難しい。 GPS信号無しでポイントゴールナビゲーション精度を向上させるために,視覚オドメトリ(VO)を用い,教師なしで訓練された新しいアクション統合モジュール(AIM)を提案する。教師なしVOは、2つの隣接するフレームの再投射誤差からエージェントの相対的なポーズを計算し、正確なGPS信号を経路積分に置き換える。 VOによって推定される擬似位置は、エージェントが位置に対する内部認識を更新し、ナビゲーションの成功率を向上させるためのアクション統合の訓練に使用される。トレーニングと推論プロセスは、RGB、深さ、衝突、および自己行動情報のみを使用する。実験の結果,提案システムは良好な結果が得られ,Gibsonデータセット上で部分的に教師付き学習アルゴリズムよりも優れていた。 PointGoal navigation in indoor environments is a fundamental task for personal robots to navigate to a specified point. Recent studies solved this PointGoal navigation task with near-perfect success rate in photo-realistically simulated environments, under the assumptions of noiseless actuation and, most importantly, perfect localization with GPS and compass sensors. However, an accurate GPS signal is difficult to obtain in real indoor environments. To improve the PointGoal navigation accuracy without a GPS signal, we use visual odometry (VO) and propose a novel action integration module (AIM) trained in an unsupervised manner. Specifically, unsupervised VO computes the relative pose of the agent from the re-projection error of two adjacent frames, and then replaces the accurate GPS signal with the path integration. The pseudo position estimated by VO is used to train action integration, which helps the agent update its internal perception of location and improves the success rate of navigation. The training and inference processes use only RGB, depth, collision, and self-action information. 
The experiments show that the proposed system achieves satisfactory results and outperforms the partially supervised learning algorithms on the popular Gibson dataset.	翻訳日:2023-04-05 00:48:02 公開日:2023-04-03
# $10^{-14}$レベルの系統的不確実性を有するテラヘルツ振動分子時計 A terahertz vibrational molecular clock with systematic uncertainty at the $10^{-14}$ level ( http://arxiv.org/abs/2209.10864v4 ) ライセンス: Link先を確認	K. H. Leung, B. Iritani, E. Tiberi, I. Majewska, M. Borkowski, R. Moszynski, T. Zelevinsky	(参考訳) 光学格子中の中性量子吸収体は、精巧な分光分解能を持つ時計を実現するための主要なプラットフォームとして登場した。しかし、これらの時計の研究とその体系的な変化は、これまで原子に限られてきた。ここでは、この構造を二原子分子のアンサンブルに拡張し、純粋な分子振動に基づく正確な格子時計を実験的に実現する。非線形トラップ誘起光シフトのキャラクタリゼーションを含む主要な系統評価を行い,総系統的不確実性は$4.6\times10^{-14}$である。振動分割の絶対周波数は31 825 183 207 592.8(5.1) Hzと測定され、分子の解離エネルギーは記録精度で決定される。この結果は分子分光とTHz周波数標準の重要なマイルストーンであり、分子量子電気力学や新しい相互作用の探索を含む基礎物理学への応用により、他の中性分子種に一般化される可能性がある。 Neutral quantum absorbers in optical lattices have emerged as a leading platform for achieving clocks with exquisite spectroscopic resolution. However, the studies of these clocks and their systematic shifts have so far been limited to atoms. Here, we extend this architecture to an ensemble of diatomic molecules and experimentally realize an accurate lattice clock based on pure molecular vibration. We evaluate the leading systematics, including the characterization of nonlinear trap-induced light shifts, achieving a total systematic uncertainty of $4.6\times10^{-14}$. The absolute frequency of the vibrational splitting is measured to be 31 825 183 207 592.8(5.1) Hz, enabling the dissociation energy of our molecule to be determined with record accuracy. Our results represent an important milestone in molecular spectroscopy and THz-frequency standards, and may be generalized to other neutral molecular species with applications for fundamental physics, including tests of molecular quantum electrodynamics and the search for new interactions.	翻訳日:2023-04-05 00:47:16 公開日:2023-04-03
# r\'{e}nyiダイバージェンス深層相互学習 R\'{e}nyi Divergence Deep Mutual Learning ( http://arxiv.org/abs/2209.05732v4 ) ライセンス: Link先を確認	Weipeng Huang, Junjie Tao, Changbo Deng, Ming Fan, Wenqiang Wan, Qi Xiong, Guangyuan Piao	(参考訳) 本稿では、単純で効果的な計算パラダイムであるDeep Mutual Learning (DML)を再考する。我々は、より柔軟で調整可能なKL分散の代わりにR\'{e}nyi分散を用いて、バニラDMLを改善することを提案する。この修正により、バニラDMLよりもパフォーマンスを継続的に改善できる。提案したパラダイムの収束特性を理論的に解析し,非凸最適化タスクの最悪の場合において,定常学習率の確率勾配 Descent を $\mathcal{O}(1)$-bias に収束させることを示した。つまり、学習は近くの最適な場所に到達するが、境界の範囲内を探索し続けることで、過度な適合を軽減できる。最後に,dmlとr\'{e}nyiの発散を組み合わせることで,一般化したモデルをさらに改善できることを示す。 This paper revisits Deep Mutual Learning (DML), a simple yet effective computing paradigm. We propose using R\'{e}nyi divergence instead of the KL divergence, which is more flexible and tunable, to improve vanilla DML. This modification is able to consistently improve performance over vanilla DML with limited additional complexity. The convergence properties of the proposed paradigm are analyzed theoretically, and Stochastic Gradient Descent with a constant learning rate is shown to converge with $\mathcal{O}(1)$-bias in the worst case scenario for nonconvex optimization tasks. That is, learning will reach nearby local optima but continue searching within a bounded scope, which may help mitigate overfitting. Finally, our extensive empirical results demonstrate the advantage of combining DML and R\'{e}nyi divergence, which further improves generalized models.	翻訳日:2023-04-05 00:46:37 公開日:2023-04-03
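The entry above replaces the KL divergence with the Rényi divergence because the latter has a tunable order. A minimal sketch of that relationship for discrete distributions follows; it illustrates only the standard definition D_alpha(P||Q) = log(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1), not the paper's training loss. The function names, the example distributions, and the assumption that q is strictly positive are all introduced here for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha > 0 between discrete distributions.

    Falls back to the KL divergence at alpha == 1, which is the limiting case.
    Assumes q_i > 0 for every i (an assumption of this sketch).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-12:
        return kl_divergence(p, q)
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)


# Hypothetical softmax outputs of two peer networks on one input.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
for alpha in (0.5, 1.0, 2.0):
    print(alpha, renyi_divergence(p, q, alpha))
```

The order alpha acts as the extra knob: values near 1 recover KL behaviour, while larger values penalize disagreement on high-probability classes more heavily.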
# BioGPT: バイオメディカルテキスト生成とマイニングのための生成事前学習型トランス BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining ( http://arxiv.org/abs/2210.10341v3 ) ライセンス: Link先を確認	Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu	(参考訳) 事前学習された言語モデルは、一般的な自然言語領域での成功に触発されて、生物医学領域で注目を集めている。一般言語領域における事前訓練された言語モデルの2つの主要分野、すなわちBERT(とその変種)とGPT(およびその変種)のうち、最初のものはBioBERTやPubMedBERTといった生物医学領域で広く研究されている。彼らは様々な差別的な下流のバイオメディカルなタスクで大きな成功を収めてきたが、生成能力の欠如はアプリケーションの範囲を制限している。本稿では,大規模生物医学文献に基づくドメイン固有生成型トランスフォーマー言語モデルであるBioGPTを提案する。バイオGPTを6つのNLPタスクで評価し、我々のモデルが多くのタスクで過去のモデルより優れていることを示す。特に、BC5CDRで44.98%、38.42%、40.76%のF1スコア、KD-DTIとDDIのエンドツーエンド関係抽出タスクで78.2%、PubMedQAで78.2%の精度で新しい記録を作成した。テキスト生成のケーススタディは、バイオメディカル文献におけるバイオGPTの利点をさらに示し、バイオメディカル用語の流動的な記述を生成する。コードはhttps://github.com/microsoft/BioGPTで入手できる。 Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e., BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature. We evaluate BioGPT on six biomedical NLP tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA, creating a new record. 
Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms. Code is available at https://github.com/microsoft/BioGPT.	翻訳日:2023-04-05 00:40:30 公開日:2023-04-03
# 野生におけるカテゴリーレベル6次元物体ポーズ推定のための自己教師あり幾何対応 Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild ( http://arxiv.org/abs/2210.07199v3 ) ライセンス: Link先を確認	Kaifeng Zhang, Yang Fu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang	(参考訳) 6dオブジェクトポーズ推定はコンピュータビジョンとロボティクスに幅広く応用されているが、アノテーションの欠如によって解決されるには程遠い。カテゴリレベルの6dポーズに移行することで、この問題はさらに難しくなります。現在のアプローチは、シミュレーションや人間からの収集からアノテーションを活用することで制限されている。本稿では,カテゴリーレベルの6次元ポーズ推定のために,大規模現実世界のオブジェクトビデオを直接学習する自己教師型学習手法を導入することで,この障壁を克服する。本フレームワークは,対象カテゴリの正準3次元形状を再構成し,入力画像と正準形状との密接な対応を表面埋め込みにより学習する。トレーニングのために,2次元3次元空間,異なるインスタンス,異なる時間ステップにまたがるサイクルを構成する新しい幾何学的サイクル整合性損失を提案する。学習した対応は、6次元ポーズ推定やキーポイント転送などの下流タスクに適用できる。驚いたことに、この手法は人間のアノテーションやシミュレータを使わずに、以前の監視または半監視された画像のメソッドよりも、ほぼあるいはそれ以上の性能を達成できます。私たちのプロジェクトページは以下のとおりです。 While 6D object pose estimation has wide applications across computer vision and robotics, it remains far from being solved due to the lack of annotations. The problem becomes even more challenging when moving to category-level 6D pose, which requires generalization to unseen instances. Current approaches are restricted by leveraging annotations from simulation or collected from humans. In this paper, we overcome this barrier by introducing a self-supervised learning approach trained directly on large-scale real-world object videos for category-level 6D pose estimation in the wild. Our framework reconstructs the canonical 3D shape of an object category and learns dense correspondences between input images and the canonical shape via surface embedding. For training, we propose novel geometrical cycle-consistency losses which construct cycles across 2D-3D spaces, across different instances and different time steps. The learned correspondence can be applied for 6D pose estimation and other downstream tasks such as keypoint transfer. 
Surprisingly, our method, without any human annotations or simulators, can achieve on-par or even better performance than previous supervised or semi-supervised methods on in-the-wild images. Our project page is: https://kywind.github.io/self-pose .	翻訳日:2023-04-05 00:39:25 公開日:2023-04-03
# 自己誘導拡散モデル Self-Guided Diffusion Models ( http://arxiv.org/abs/2210.06462v2 ) ライセンス: Link先を確認	Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek	(参考訳) 拡散モデルは、特に生成過程を制御するためのガイダンスを使用する場合、画像生成品質の顕著な進歩を示した。しかし、指導にはトレーニングのために大量の画像注釈ペアが必要であり、その可用性、正確性、偏りに依存する。本稿では,自己誘導拡散モデルのためのフレームワークの設計に自己超越信号の柔軟性を活用することで,このようなアノテーションの必要性を解消する。特徴抽出関数と自己アノテーション関数を活用することで,全体像のレベルからオブジェクトボックス,さらにはセグメンテーションマスクまで,さまざまな画像粒度のガイダンス信号を提供する。シングルラベルおよびマルチラベル画像データセットを用いた実験により,自己ラベル誘導は,常にガイダンス無しの拡散モデルよりも優れており,特に不均衡データにおいて,接地ラベルに基づくガイダンスを超越する可能性も示された。自己教師付きボックスやマスクプロポーザルを備える場合、クラス、ボックス、セグメントラベルアノテーションを必要とせず、視覚的に多様で意味的に一貫性のある画像を生成する。自己誘導拡散はシンプルで柔軟性があり、大規模展開で利益を期待できる。 Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness and unbiasedness. In this paper, we eliminate the need for such annotation by instead leveraging the flexibility of self-supervision signals to design a framework for self-guided diffusion models. By leveraging a feature extraction function and a self-annotation function, our method provides guidance signals at various image granularities: from the level of holistic images to object boxes and even segmentation masks. Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance and may even surpass guidance based on ground-truth labels, especially on unbalanced data. When equipped with self-supervised box or mask proposals, our method further generates visually diverse yet semantically consistent images, without the need for any class, box, or segment label annotation. Self-guided diffusion is simple, flexible and expected to profit from deployment at scale.	翻訳日:2023-04-05 00:39:06 公開日:2023-04-03
# 連続最適化によるプログラム合成 Synthesizing Programs with Continuous Optimization ( http://arxiv.org/abs/2211.00828v2 ) ライセンス: Link先を確認	Shantanu Mandal, Todd A. Anderson, Javier Turek, Justin Gottschlich, Abdullah Muzahid	(参考訳) いくつかの仕様に基づく自動ソフトウェア生成はプログラム合成として知られている。既存の手法の多くは、離散パラメータを持つ探索問題としてプログラム合成を定式化する。本稿では,プログラム合成を連続最適化問題として新たに定式化し,Covariance Matrix Adaptation Evolution Strategyとして知られる最先端の進化的アプローチを用いて解決する。次に,連続定式化を実際のプログラムに変換するマッピングスキームを提案する。我々は、GENESYSと呼ばれるシステムと、近年のプログラム合成技術(離散領域と連続領域の両方)を比較し、GENESYSが既存のスキームよりも固定時間内により多くのプログラムを合成していることを示す。例えば、長さ10のプログラムでは、GENESYSは既存の計画よりも28%多くのプログラムを同時に合成する。 Automatic software generation based on some specification is known as program synthesis. Most existing approaches formulate program synthesis as a search problem with discrete parameters. In this paper, we present a novel formulation of program synthesis as a continuous optimization problem and use a state-of-the-art evolutionary approach, known as Covariance Matrix Adaptation Evolution Strategy to solve the problem. We then propose a mapping scheme to convert the continuous formulation into actual programs. We compare our system, called GENESYS, with several recent program synthesis techniques (in both discrete and continuous domains) and show that GENESYS synthesizes more programs within a fixed time budget than those existing schemes. For example, for programs of length 10, GENESYS synthesizes 28% more programs than those existing schemes within the same time budget.	翻訳日:2023-04-05 00:30:17 公開日:2023-04-03
# 映像生成タスクのための連続表現空間INR-V INR-V: A Continuous Representation Space for Video-based Generative Tasks ( http://arxiv.org/abs/2210.16579v2 ) ライセンス: Link先を確認	Bipasha Sen, Aditya Agarwal, Vinay P Namboodiri, C. V. Jawahar	(参考訳) ビデオの生成は複雑な作業であり、フレームごとに時間的にコヒーレントな画像を生成する。これにより、ビデオの表現性は、ネットワーク設計を必要とする個々のビデオフレーム上でのみの画像ベースの操作に制限される。本稿では,映像生成タスクの連続的な空間を学習する映像表現ネットワークINR-Vを提案する。 inr-vは、ビデオの各入力画素のrgb値を予測する多層パーセプトロンである暗黙的ニューラルネットワーク(inrs)を使用して、ビデオをパラメータ化する。 INRは、複数のビデオインスタンスの神経表現に基づいてトレーニングされたハイパーネットワークであるメタネットワークを使用して予測される。その後、メタネットワークをサンプル化し、様々な新しいビデオを生成することで、下流のビデオベースの生成タスクを実現できる。興味深いことに、条件付き正規化とプログレッシブウェイト初期化は、INR-Vを得る上で重要な役割を果たす。 INR-Vによって学習された表現空間は、既存の作品では不可能な多くの興味深い性質を示す画像空間よりも表現性が高い。例えば、inr-vは、既知のビデオインスタンス間(中間id、表情、ポーズなど)の中間ビデオをスムーズに補間することができる。また、ビデオの欠落部分を塗りつぶして、一時的にコヒーレントなフルビデオを復元することもできる。本研究では,INR-Vが学習した映像補間,新規映像生成,映像インバージョン,既存のベースラインに対する映像インペインティングなど,多様な生成タスクの空間を評価する。 INR-Vはこれらのいくつかの実証されたタスクのベースラインを著しく上回り、明らかに提案された表現空間の可能性を示している。 Generating videos is a complex task that is accomplished by generating a set of temporally coherent images frame-by-frame. This limits the expressivity of videos to only image-based operations on the individual video frames needing network designs to obtain temporally coherent trajectories in the underlying image space. We propose INR-V, a video representation network that learns a continuous space for video-based generative tasks. INR-V parameterizes videos using implicit neural representations (INRs), a multi-layered perceptron that predicts an RGB value for each input pixel location of the video. The INR is predicted using a meta-network which is a hypernetwork trained on neural representations of multiple video instances. Later, the meta-network can be sampled to generate diverse novel videos enabling many downstream video-based generative tasks. Interestingly, we find that conditional regularization and progressive weight initialization play a crucial role in obtaining INR-V. 
The representation space learned by INR-V is more expressive than an image space showcasing many interesting properties not possible with the existing works. For instance, INR-V can smoothly interpolate intermediate videos between known video instances (such as intermediate identities, expressions, and poses in face videos). It can also in-paint missing portions in videos to recover temporally coherent full videos. In this work, we evaluate the space learned by INR-V on diverse generative tasks such as video interpolation, novel video generation, video inversion, and video inpainting against the existing baselines. INR-V significantly outperforms the baselines on several of these demonstrated tasks, clearly showcasing the potential of the proposed representation space.	翻訳日:2023-04-05 00:30:05 公開日:2023-04-03
# グラフト視覚変換器 Grafting Vision Transformers ( http://arxiv.org/abs/2210.15943v2 ) ライセンス: Link先を確認	Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo	(参考訳) ビジョントランスフォーマー(ViT)は近年、多くのコンピュータビジョンタスクにおける最先端技術となっている。畳み込みネットワーク(CNN)とは対照的に、ViTはネットワークの浅い層、すなわち高解像度の機能でもグローバルな情報共有を可能にする。しかし、後にスウィントランス(swin transformer)のようなピラミッドアーキテクチャが成功し、パフォーマンスと複雑さのトレードオフが向上した。本稿では,ネットワーク全体のグローバル依存性とマルチスケール情報を考慮した簡易かつ効率的なアドオンコンポーネント(グラフト)を提案する。任意の深さで分岐する柔軟性があり、バックボーンのパラメータと計算のほとんどを共有する。 GrafTは、ハイブリッドトランスフォーマー型と純粋なトランスフォーマー型の両方、均質構造とピラミッド構造の両方、そして様々な自己注意法を含む、よく知られたモデルに対して一貫した利得を示す。特に、ハイレベルなセマンティクスを提供することで、モバイルサイズのモデルに大きく貢献する。 ImageNet-1kデータセットでは、DeiT-T、Swin-T、MobileViT-XXSに+3.9%、+1.4%、+1.9%の精度改善が提供されている。私たちのコードとモデルは利用可能になります。 Vision Transformers (ViTs) have recently become the state-of-the-art across many computer vision tasks. In contrast to convolutional networks (CNNs), ViTs enable global information sharing even within shallow layers of a network, i.e., among high-resolution features. However, this perk was later overlooked with the success of pyramid architectures such as Swin Transformer, which show better performance-complexity trade-offs. In this paper, we present a simple and efficient add-on component (termed GrafT) that considers global dependencies and multi-scale information throughout the network, in both high- and low-resolution features alike. It has the flexibility of branching out at arbitrary depths and shares most of the parameters and computations of the backbone. GrafT shows consistent gains over various well-known models which includes both hybrid and pure Transformer types, both homogeneous and pyramid structures, and various self-attention methods. In particular, it largely benefits mobile-size models by providing high-level semantics. On the ImageNet-1k dataset, GrafT delivers +3.9%, +1.4%, and +1.9% top-1 accuracy improvement to DeiT-T, Swin-T, and MobileViT-XXS, respectively. Our code and models will be made available.	
翻訳日:2023-04-05 00:29:36 公開日:2023-04-03
# 帰納的行動推論 Abductive Action Inference ( http://arxiv.org/abs/2210.13984v2 ) ライセンス: Link先を確認	Clement Tan, Chai Kiat Yeo, Cheston Tan, Basura Fernando	(参考訳) 帰納的推論(abductive reasoning)は、与えられた不完全な観測集合の最も可能性の高い推論を行うことを目的としている。本研究では,「現在の状態に着くためには,どのような行動が人間によって実行されたのか?」という疑問に答える,帰納的行動推論(abductive action inference)という新しいタスクを提案する。状態が与えられた場合,行動セット予測,行動シーケンス予測,帰納的行動検証という3つの帰納的推論問題を調査する。我々は、Transformer、Graph Neural Network、CLIP、BLIP、エンドツーエンドトレーニングされたSlow-Fast、Resnet50-3Dモデルなど、いくつかのSOTAモデルをベンチマークする。今回提案するobject-relational bigedモデルは,アクションゲノムデータセットにおけるこの困難なタスクにおいて,他のすべての手法を上回っている。コードは利用可能になる。 Abductive reasoning aims to make the most likely inference for a given set of incomplete observations. In this work, we propose a new task called abductive action inference, in which given a situation, the model answers the question `what actions were executed by the human in order to arrive in the current state?'. Given a state, we investigate three abductive inference problems: action set prediction, action sequence prediction, and abductive action verification. We benchmark several SOTA models such as Transformers, Graph neural networks, CLIP, BLIP, end-to-end trained Slow-Fast, and Resnet50-3D models. Our newly proposed object-relational BiGED model outperforms all other methods on this challenging task on the Action Genome dataset. Codes will be made available.	翻訳日:2023-04-05 00:28:53 公開日:2023-04-03
# I$^2$-GNNを用いたグラフニューラルネットワークのサイクルカウントパワー向上 Boosting the Cycle Counting Power of Graph Neural Networks with I$^2$-GNNs ( http://arxiv.org/abs/2210.13978v2 ) ライセンス: Link先を確認	Yinan Huang, Xingang Peng, Jianzhu Ma, Muhan Zhang	(参考訳) メッセージパッシングニューラルネットワーク(英: Message Passing Neural Networks、MPNN)は、グラフニューラルネットワーク(GNN)の一種。 MPNNの限られた表現力は、証明可能な強力なGNNアーキテクチャの研究を刺激する。しかし、あるモデルを知ることは、あるモデルが表現できる機能やできない機能についての洞察をほとんど与えない。これらのモデルが、生物学、化学、社会ネットワーク分析の応用に不可欠な、特定のグラフ部分構造を数えるといった特定の関数を近似できるかどうかはまだ不明である。そこで本研究では,各ノードのルート付きサブグラフを抽出し,ルートノードにユニークな識別子を割り当て,ルートノードの表現をそのルート付きサブグラフ内にエンコードする,GNNモデルの最近の人気クラスであるSubgraph MPNNのカウント能力について検討する。具体的には、サブグラフmpnnがノードレベルで4サイクル以上を数えることができないことを証明し、ノード表現が4原子以上の環系のような周囲の部分構造を正しくエンコードできないことを示唆する。この制限を克服するため、各サブグラフ内のルートノードとその隣人に異なる識別子を割り当てることで、サブグラフMPNNを拡張するためのI$^2$-GNNを提案する。 I$^2$-GNNsの識別力は、サブグラフMPNNよりも強く、3WLテストより部分的に強いことが示されている。さらに重要なことは、I$^2$-GNNは3, 4, 5, 6サイクル全てを数えることができ、有機化学におけるベンゼン環のような一般的なサブ構造をカバーし、線形複雑性を維持している。我々の知る限りでは、理論的な保証とともに6サイクルを数えられる最初の線形時間GNNモデルである。サイクルカウントタスクにおけるカウント能力を検証するとともに,分子予測ベンチマークにおける競合性能を示す。 Message Passing Neural Networks (MPNNs) are a widely used class of Graph Neural Networks (GNNs). The limited representational power of MPNNs inspires the study of provably powerful GNN architectures. However, knowing one model is more powerful than another gives little insight about what functions they can or cannot express. It is still unclear whether these models are able to approximate specific functions such as counting certain graph substructures, which is essential for applications in biology, chemistry and social network analysis. Motivated by this, we propose to study the counting power of Subgraph MPNNs, a recent and popular class of powerful GNN models that extract rooted subgraphs for each node, assign the root node a unique identifier and encode the root node's representation within its rooted subgraph. 
Specifically, we prove that Subgraph MPNNs fail to count more-than-4-cycles at node level, implying that node representations cannot correctly encode the surrounding substructures like ring systems with more than four atoms. To overcome this limitation, we propose I$^2$-GNNs to extend Subgraph MPNNs by assigning different identifiers for the root node and its neighbors in each subgraph. I$^2$-GNNs' discriminative power is shown to be strictly stronger than Subgraph MPNNs and partially stronger than the 3-WL test. More importantly, I$^2$-GNNs are proven capable of counting all 3, 4, 5 and 6-cycles, covering common substructures like benzene rings in organic chemistry, while still keeping linear complexity. To the best of our knowledge, it is the first linear-time GNN model that can count 6-cycles with theoretical guarantees. We validate its counting power in cycle counting tasks and demonstrate its competitive performance in molecular prediction benchmarks.	翻訳日:2023-04-05 00:28:40 公開日:2023-04-03
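To make the notion of "counting cycles" concrete, the following is a minimal sketch of classical, non-neural cycle counting from an adjacency matrix. It is not the I$^2$-GNN method from the abstract; it only illustrates the target quantities (graph-level and node-level cycle counts) that such a model would be asked to reproduce. The helper names are hypothetical, and the graph is assumed to be simple and undirected (symmetric 0/1 adjacency matrix, no self-loops).

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


def count_triangles(adj):
    """Graph-level 3-cycle count: each triangle yields 6 closed walks of length 3."""
    a3 = matmul(matmul(adj, adj), adj)
    return trace(a3) // 6


def node_triangles(adj):
    """Node-level 3-cycle count: each triangle through node i yields 2 closed walks from i."""
    a3 = matmul(matmul(adj, adj), adj)
    return [a3[i][i] // 2 for i in range(len(adj))]


def count_4_cycles(adj):
    """Graph-level 4-cycle count.

    Closed walks of length 4 also include back-and-forth walks along one or
    two edges; those contribute 2 * sum(d_i^2) - sum(d_i) and are subtracted
    before dividing by the 8 walks that each 4-cycle generates.
    """
    a2 = matmul(adj, adj)
    a4 = matmul(a2, a2)
    degrees = [sum(row) for row in adj]
    degenerate = 2 * sum(d * d for d in degrees) - sum(degrees)
    return (trace(a4) - degenerate) // 8


# Complete graph on 4 nodes and a plain square (4-cycle) as small examples.
K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
SQUARE = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
```

Matrix powers of this kind take cubic time in the number of nodes, which is the sort of cost a linear-time message-passing model aims to avoid.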
# 神経理論とは? 大規模LMにおける社会知能の限界について Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs ( http://arxiv.org/abs/2210.13312v2 ) ライセンス: Link先を確認	Maarten Sap, Ronan LeBras, Daniel Fried, Yejin Choi	(参考訳) 社会的インテリジェンスと心の理論(ToM)、すなわち、関係するすべての人々の異なる精神状態、意図、反応を推論する能力によって、人間は日々の社会的相互作用を効果的にナビゲートし理解することができる。 NLPシステムはますます複雑な社会状況において使用されるため、社会的ダイナミクスを理解する能力は重要である。本研究では,現代NLPシステムにおける社会的知能と心の理論のオープンな問題について,実証的・理論的観点から検討する。現在の最大の言語モデル(gpt-3, brown et al., 2020)の1つには,2つのタスク - socialiqa (sap et al., 2019) という,モデルが社会的インタラクションの参加者の意図や反応を理解する能力を測定するもの - と,モデルがメンタル状態や参加者の現実を推測できるかどうかを測定する tomi (le et al., 2019) がある。以上の結果から,socialiqa と tomi はそれぞれ 55% と 60% の well-below-human accuracies である。結論として,データやニューラルネットワーク,トレーニングパラダイムに起因する制限を調べることで,大規模言語モデルの欠点を文脈化するために,実用学からの理論を導出する。スケールしか必要としない一般的な物語に従えば、人中心のNLPアプローチがマインドの神経理論に対してより効果的である可能性が示唆される。更新版では、ニューラルToMのための新しい命令チューニングとRLFHモデルも分析した。その結果,ChatGPT や GPT-4 でさえ創発的心の理論を示さず,GPT-4 でさえ精神状態や現実に関する ToMi の質問に対して 60% の精度しか達成していないことがわかった。 Social intelligence and Theory of Mind (ToM), i.e., the ability to reason about the different mental states, intents, and reactions of all people involved, allow humans to effectively navigate and understand everyday social interactions. As NLP systems are used in increasingly complex social situations, their ability to grasp social dynamics becomes crucial. In this work, we examine the open question of social intelligence and Theory of Mind in modern NLP systems from an empirical and theory-based perspective. We show that one of today's largest language models (GPT-3; Brown et al., 2020) lacks this kind of social intelligence out-of-the box, using two tasks: SocialIQa (Sap et al., 2019), which measures models' ability to understand intents and reactions of participants of social interactions, and ToMi (Le et al., 2019), which measures whether models can infer mental states and realities of participants of situations. 
Our results show that models struggle substantially at these Theory of Mind tasks, with well-below-human accuracies of 55% and 60% on SocialIQa and ToMi, respectively. To conclude, we draw on theories from pragmatics to contextualize this shortcoming of large language models, by examining the limitations stemming from their data, neural architecture, and training paradigms. Challenging the prevalent narrative that only scale is needed, we posit that person-centric NLP approaches might be more effective towards neural Theory of Mind. In our updated version, we also analyze newer instruction-tuned and RLHF models for neural ToM. We find that even ChatGPT and GPT-4 do not display emergent Theory of Mind; strikingly, even GPT-4 achieves only 60% accuracy on the ToMi questions related to mental states and realities.	翻訳日:2023-04-05 00:28:09 公開日:2023-04-03
# aiMotive Dataset:長距離知覚を用いたロバスト自動運転のためのマルチモーダルデータセット aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception ( http://arxiv.org/abs/2211.09445v2 ) ライセンス: Link先を確認	Tam\'as Matuszka, Iv\'an Barton, \'Ad\'am Butykai, P\'eter Hajas, D\'avid Kiss, Domonkos Kov\'acs, S\'andor Kuns\'agi-M\'at\'e, P\'eter Lengyel, G\'abor N\'emeth, Levente Pet\H{o}, Dezs\H{o} Ribli, D\'avid Szeghy, Szabolcs Vajna, B\'alint Varga	(参考訳) 自動運転はコンピュータビジョン研究コミュニティで人気のある研究分野である。自動運転車は安全性が極めて重要であるため、現実の展開には堅牢性を保証することが不可欠である。いくつかの公共のマルチモーダルデータセットはアクセス可能であるが、主に悪天候に適さない2つのセンサーモード(カメラ、LiDAR)で構成されている。さらに、長距離アノテーションが欠如しているため、自動運転車の高速道路アシスタント機能の基盤となるニューラルネットワークのトレーニングが困難になる。そこで本稿では,長距離認識による頑健な自律運転のためのマルチモーダルデータセットを提案する。データセットは176のシーンで構成され、同期して校正されたLiDAR、カメラ、レーダーセンサーが360度視野をカバーする。収集したデータは、昼間、夜間、雨季に高速道路、都市、郊外で撮影され、フレーム間に一貫した識別子を持つ3D境界ボックスで注釈付けされている。さらに,3次元物体検出のためのユニモーダルベースラインモデルとマルチモーダルベースラインモデルを訓練した。データは \url{https://github.com/aimotive/aimotive_dataset} で入手できる。 Autonomous driving is a popular research area within the computer vision research community. Since autonomous vehicles are highly safety-critical, ensuring robustness is essential for real-world deployment. While several public multimodal datasets are accessible, they mainly comprise two sensor modalities (camera, LiDAR) which are not well suited for adverse weather. In addition, they lack far-range annotations, making it harder to train neural networks that are the base of a highway assistant function of an autonomous vehicle. Therefore, we introduce a multimodal dataset for robust autonomous driving with long-range perception. The dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view. The collected data was captured in highway, urban, and suburban areas during daytime, night, and rain and is annotated with 3D bounding boxes with consistent identifiers across frames. 
Furthermore, we trained unimodal and multimodal baseline models for 3D object detection. Data are available at \url{https://github.com/aimotive/aimotive_dataset}.	翻訳日:2023-04-05 00:21:25 公開日:2023-04-03
# マルチエージェント強化学習のための説明可能な行動助言 Explainable Action Advising for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2211.07882v2 ) ライセンス: Link先を確認	Yue Guo, Joseph Campbell, Simon Stepputtis, Ruiyu Li, Dana Hughes, Fei Fang, Katia Sycara	(参考訳) 行動アドバイスは教師-学生パラダイムに基づく強化学習のための知識伝達技術である。専門教師は、学生のサンプル効率と政策性能を改善するために、訓練中に生徒にアドバイスを提供する。このようなアドバイスは一般に状態-行動対の形で与えられる。しかし、この形式では、学生がアドバイスを推論に用いたり、新たな状態に適用したりすることは困難である。本稿では,教師が行動アドバイスを提示する説明可能な行動助言と,行動が選択された理由を示す説明を紹介する。これにより、生徒は学習したものを自己反映することができ、アドバイスの一般化が可能になり、教師が最適でない環境でもサンプルの効率と学習性能が向上する。我々は,単一エージェントシナリオと複数エージェントシナリオの両方において,我々のフレームワークが有効であることを実証的に示す。 Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm. An expert teacher provides advice to a student during training in order to improve the student's sample efficiency and policy performance. Such advice is commonly given in the form of state-action pairs. However, this form makes it difficult for the student to reason with the advice and apply it to novel states. We introduce Explainable Action Advising, in which the teacher provides action advice as well as associated explanations indicating why the action was chosen. This allows the student to self-reflect on what it has learned, enabling advice generalization and leading to improved sample efficiency and learning performance, even in environments where the teacher is sub-optimal. We empirically show that our framework is effective in both single-agent and multi-agent scenarios, yielding improved policy returns and convergence rates when compared to state-of-the-art methods.	翻訳日:2023-04-05 00:20:30 公開日:2023-04-03
# ラベルノイズがフェデレーション学習に及ぼす影響の定量化 Quantifying the Impact of Label Noise on Federated Learning ( http://arxiv.org/abs/2211.07816v7 ) ライセンス: Link先を確認	Shuqi Ke, Chao Huang, Xin Liu	(参考訳) Federated Learning(FL)は、クライアントがローカル(ヒューマン生成)データセットを使用してモデルを協調的にトレーニングする分散機械学習パラダイムである。既存の研究では、クライアント間のデータ不均一性に取り組むためのFLアルゴリズムの開発に焦点が当てられているが、FLにおけるデータ品質(ラベルノイズなど)の重要な問題は見過ごされている。本稿では,FLにおけるラベルノイズの影響を定量的に検討することにより,このギャップを埋めることを目的とする。クライアントのラベルノイズレベルにおいて線形な一般化誤差の上限を導出する。次に,様々なFLアルゴリズムを用いて,MNISTとCIFAR-10データセットの実験を行った。実験の結果,ノイズレベルが増加すると,大域モデル精度は線形に減少し,理論解析と一致することがわかった。さらに,ラベルノイズがflトレーニングの収束を遅くし,ノイズレベルが高い場合にはグローバルモデルが過剰に適合する傾向がみられた。 Federated Learning (FL) is a distributed machine learning paradigm where clients collaboratively train a model using their local (human-generated) datasets. While existing studies focus on FL algorithm development to tackle data heterogeneity across clients, the important issue of data quality (e.g., label noise) in FL is overlooked. This paper aims to fill this gap by providing a quantitative study on the impact of label noise on FL. We derive an upper bound for the generalization error that is linear in the clients' label noise level. Then we conduct experiments on MNIST and CIFAR-10 datasets using various FL algorithms. Our empirical results show that the global model accuracy linearly decreases as the noise level increases, which is consistent with our theoretical analysis. We further find that label noise slows down the convergence of FL training, and the global model tends to overfit when the noise level is high.	翻訳日:2023-04-05 00:20:15 公開日:2023-04-03
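The experimental setup described above (clients with noisy labels, a server that aggregates their models) can be sketched in a few lines. This is an illustrative sketch only: the abstract does not state which noise model or aggregation rule was used, so symmetric label flipping and size-weighted FedAvg averaging are assumptions here, and the function names `flip_labels` and `fedavg` are hypothetical.

```python
import random


def flip_labels(labels, noise_rate, num_classes, rng):
    """Symmetric label noise (assumed model): with probability `noise_rate`,
    replace a label with a different class chosen uniformly at random."""
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            alternatives = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(y)
    return noisy


def fedavg(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg-style aggregation)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Example: one client with 20% label noise, then aggregation of two toy models.
rng = random.Random(0)
clean = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] * 10
noisy = flip_labels(clean, noise_rate=0.2, num_classes=10, rng=rng)
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Sweeping `noise_rate` per client and plotting global test accuracy against it is one way to check the linear trend the abstract reports.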
# 単一ステップrydbergブロックゲートによる全光量子情報処理 All optical quantum information processing via a single step Rydberg blockade gate ( http://arxiv.org/abs/2211.06998v3 ) ライセンス: Link先を確認	Mohammadsadegh Khazali	(参考訳) 量子インターネットの実現における重要な要素の1つは決定論的2光子ゲートである。この$CZ$フォトニックゲートは、全光学量子情報処理のためのユニバーサルゲートのセットも完成する。本稿では、非リドバーグ電磁誘導透過(eit)を用いた原子アンサンブルに制御光子とターゲット光子の両方を格納し、グローバルレーザーを用いた高速単段リドバーグ励起を行い、cz$フォトニックゲートを実現する手法について述べる。提案方式は、ライドバーグ励起に用いられる2つのレーザーの相対強度変調によって動作する。従来の$\pi$-gap-$\pi$スキームを回避して、提案手法では環境ノイズからライドバーグ原子を連続的にレーザーで保護する。閉塞半径内の貯蔵光子の完全な空間的重なりは、光学的深さを最適化し、実験を単純化する。ここでのコヒーレント操作は、以前のRydberg EITスキームで散逸した領域で行われる。主な不完全性源,すなわちRydbergの自然放出と中間レベル,集団回転誤差,遷移線のドップラー拡大,保存・検索効率,原子熱運動誘起デコヒーレンスを考慮し,現実的な実験パラメータ 99.7 % の忠実性は達成可能であると結論づける。 One of the critical elements in the realization of the quantum internet are deterministic two-photon gates. This $CZ$ photonic gate also completes a set of universal gates for all-optical quantum information processing. This article discusses an approach to realize high fidelity $CZ$ photonic gate by storing both control and target photons within an atomic ensemble using non-Rydberg electromagnetically induced transparency (EIT) followed by a fast, single-step Rydberg excitation with global lasers. The proposed scheme operates by relative intensity modulation of two lasers used in Rydberg excitation. Circumventing the conventional $\pi$-gap-$\pi$ schemes, the proposed operation features continuous laser protection of the Rydberg atoms from the environment noise. The complete spatial overlap of stored photons inside the blockade radius optimizes the optical depth and simplifies the experiment. The coherent operation here is performed in the region that was dissipative in the previous Rydberg EIT schemes. Encountering the main imperfection sources, i.e. 
the spontaneous emission of the Rydberg and intermediate levels, population rotation errors, Doppler broadening of the transition lines, storage/retrieval efficiency, and atomic thermal motion induced decoherence, this article concludes that with realistic experimental parameters 99.7% fidelity is achievable.	翻訳日:2023-04-05 00:19:59 公開日:2023-04-03
# ニューラルネットワーク表現の人間のアライメント Human alignment of neural network representations ( http://arxiv.org/abs/2211.01201v4 ) ライセンス: Link先を確認	Lukas Muttenthaler, Jonas Dippel, Lorenz Linhardt, Robert A. Vandermeulen, Simon Kornblith	(参考訳) 今日のコンピュータビジョンモデルは、多種多様なビジョンタスクで人間またはほぼ人間レベルのパフォーマンスを達成する。しかし、彼らのアーキテクチャ、データ、学習アルゴリズムは、人間のビジョンを生み出すものとは様々な点で異なる。本稿では,ニューラルネットワークが学習した表現と行動応答から推定される人間の心的表現のアライメントに影響を与える要因について検討する。モデルスケールとアーキテクチャは基本的に人間の行動応答に影響を及ぼさないが、トレーニングデータセットと客観的関数はどちらもはるかに大きな影響を与える。これらの結果は、2つの異なるタスクを用いて収集された3つの人間類似性判定データセット間で一致している。 1つのデータセットからの行動応答から学習したニューラルネットワーク表現の線形変換は、他の2つのデータセット上の人間の類似性判定とのアライメントを大幅に改善する。さらに, 食物や動物などの人間の概念はニューラルネットワークによってよく表現されているのに対し, ロイヤルやスポーツ関連の物体はそうではない。全体として、より大きく多様なデータセットでトレーニングされたモデルは、ImageNetだけでトレーニングされたモデルよりも人間との整合性が向上するが、我々の結果は、スケーリング単独では、人間が使用するモデルと一致する概念的な表現でニューラルネットワークをトレーニングするのに十分ではないことを示唆している。 Today's computer vision models achieve human or near-human level performance across a wide variety of vision tasks. However, their architectures, data, and learning algorithms differ in numerous ways from those that give rise to human vision. In this paper, we investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses. We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses, whereas the training dataset and objective function both have a much larger impact. These findings are consistent across three datasets of human similarity judgments collected using two different tasks. Linear transformations of neural network representations learned from behavioral responses from one dataset substantially improve alignment with human similarity judgments on the other two datasets. In addition, we find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not. 
Overall, although models trained on larger, more diverse datasets achieve better alignment with humans than models trained on ImageNet alone, our results indicate that scaling alone is unlikely to be sufficient to train neural networks with conceptual representations that match those used by humans.	翻訳日:2023-04-05 00:18:26 公開日:2023-04-03
# 画像復調のための適応動的フィルタリングネットワーク Adaptive Dynamic Filtering Network for Image Denoising ( http://arxiv.org/abs/2211.12051v3 ) ライセンス: Link先を確認	Hao Shen, Zhong-Qiu Zhao, Wandi Zhang	(参考訳) 画像デノーミングネットワークでは、機能スケーリングは受動的フィールドサイズを拡大し、計算コストを削減するために広く利用されている。しかし、この慣行は高周波情報の損失を招き、大規模な特性を考慮できない。近年、動的畳み込みは高周波情報(エッジ、コーナー、テクスチャなど)の処理において強力な能力を発揮しているが、従来の作品はフィルタ生成における十分な空間的コンテクスト情報を欠いている。これらの問題を緩和するため,我々は動的畳み込みを用いて高周波・マルチスケール特徴の学習を改善することを提案する。具体的には,動的畳み込みを改善するために空間的に拡張されたカーネル生成(sekg)モジュールを設計し,計算量が非常に少ない空間的コンテキスト情報の学習を可能にした。 SEKG モジュールをベースとして,動的畳み込みブロック (DCB) とマルチスケール動的畳み込みブロック (MDCB) を提案する。前者は動的畳み込みにより高周波情報を強化し、スキップ接続を介して低周波情報を保存する。後者は、共有適応動的カーネルと拡張畳み込みの概念を利用して、効率的なマルチスケール特徴抽出を実現する。提案するマルチディメンジョン機能統合(MFI)機構は,マルチスケール機能をさらに融合させ,正確かつコンテキストに富んだ特徴表現を提供する。最後に,adfnet と呼ばれる dcb と mdcb を用いた効率的な分別ネットワークを構築する。実世界および合成ガウスノイズデータセットにおける計算複雑性の低い性能を実現する。ソースコードはhttps://github.com/it-hao/ADFNetで入手できる。 In image denoising networks, feature scaling is widely used to enlarge the receptive field size and reduce computational costs. This practice, however, also leads to the loss of high-frequency information and fails to consider within-scale characteristics. Recently, dynamic convolution has exhibited powerful capabilities in processing high-frequency information (e.g., edges, corners, textures), but previous works lack sufficient spatial contextual information in filter generation. To alleviate these issues, we propose to employ dynamic convolution to improve the learning of high-frequency and multi-scale features. Specifically, we design a spatially enhanced kernel generation (SEKG) module to improve dynamic convolution, enabling the learning of spatial context information with a very low computational complexity. Based on the SEKG module, we propose a dynamic convolution block (DCB) and a multi-scale dynamic convolution block (MDCB). The former enhances the high-frequency information via dynamic convolution and preserves low-frequency information via skip connections. 
The latter utilizes shared adaptive dynamic kernels and the idea of dilated convolution to achieve efficient multi-scale feature extraction. The proposed multi-dimension feature integration (MFI) mechanism further fuses the multi-scale features, providing precise and contextually enriched feature representations. Finally, we build an efficient denoising network with the proposed DCB and MDCB, named ADFNet. It achieves better performance with low computational complexity on real-world and synthetic Gaussian noisy datasets. The source code is available at https://github.com/it-hao/ADFNet.	翻訳日:2023-04-05 00:12:30 公開日:2023-04-03
# ESLAM:符号付き距離場のハイブリッド表現に基づく高効率高密度SLAMシステム ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields ( http://arxiv.org/abs/2211.11704v2 ) ライセンス: Link先を確認	Mohammad Mahdi Johari, Camilla Carta, François Fleuret	(参考訳) 同時位置推定・地図作成(SLAM)のための効率的な暗黙的ニューラル表現法である ESLAM を提案する。 ESLAMは、未知のカメラポーズでRGB-Dフレームを順次読み出し、シーン内の現在のカメラ位置を推定しながらシーン表現を漸進的に再構築する。ニューラルラジアンス場(NeRF)の最新の進歩をSLAMシステムに組み込んだ結果,高効率かつ高精度なビジュアルSLAM法が実現した。シーン表現は、連続空間の各点に対して、補間された特徴をTruncated Signed Distance Field (TSDF) と RGB の値にデコードする多重スケールの軸整列垂直特徴平面と浅いデコーダから構成される。 Replica、ScanNet、TUM RGB-Dの3つの標準データセットに対する広範な実験により、ESLAMは最先端の高密度視覚SLAM法の精度を50%以上向上する一方で、最大10倍高速で、事前トレーニングを必要としないことが示された。 We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM). ESLAM reads RGB-D frames with unknown camera poses in a sequential manner and incrementally reconstructs the scene representation while estimating the current camera position in the scene. We incorporate the latest advances in Neural Radiance Fields (NeRF) into a SLAM system, resulting in an efficient and accurate dense visual SLAM method. Our scene representation consists of multi-scale axis-aligned perpendicular feature planes and shallow decoders that, for each point in the continuous space, decode the interpolated features into Truncated Signed Distance Field (TSDF) and RGB values. Our extensive experiments on three standard datasets, Replica, ScanNet, and TUM RGB-D show that ESLAM improves the accuracy of 3D reconstruction and camera localization of state-of-the-art dense visual SLAM methods by more than 50%, while it runs up to 10 times faster and does not require any pre-training.	翻訳日:2023-04-05 00:12:01 公開日:2023-04-03
# グローバル最適2D-3次元形状マッチングのための共役製品グラフ Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching ( http://arxiv.org/abs/2211.11589v2 ) ライセンス: Link先を確認	Paul Roetzer and Zorah L\"ahner and Florian Bernard	(参考訳) 2次元輪郭と3次元メッシュの連続的および非厳密なマッチングを求める問題を考察する。このような問題は、両方の形状の間の積グラフの最も短い経路を見つけることによって大域的最適性に解決できるが、既存の解は縮退した解を避けるために非現実的な事前仮定に強く依存している(例えば、2次元輪郭の各点が一致する3次元形状の領域の知識)。そこで本稿では,2次元輪郭と3次元形状の共役積グラフに基づく新しい2d-3次元形状マッチング形式を提案する。そうすることで、シングルエッジで定義されたコストとは対照的に、初めて高次のコスト、すなわちエッジチェーンで定義されるコストを考えることができます。これによって柔軟性が大幅に向上し、先に局所的な剛性を取り込むことができます。これにより, 1次元特徴記述子のみを用いても, 効率よく退化解を回避し, より滑らかで現実的なマッチングが得られる。提案手法は, グローバルに最適かつ連続的な2D-3Dマッチングを行い, 従来の手法と同じ漸近的複雑性を持ち, 形状マッチングの最先端結果を生成し, 部分形状のマッチングも可能である。私たちのコードは公開されている(https://github.com/paul0noah/sm-2d3d)。 We consider the problem of finding a continuous and non-rigid matching between a 2D contour and a 3D mesh. While such problems can be solved to global optimality by finding a shortest path in the product graph between both shapes, existing solutions heavily rely on unrealistic prior assumptions to avoid degenerate solutions (e.g. knowledge to which region of the 3D shape each point of the 2D contour is matched). To address this, we propose a novel 2D-3D shape matching formalism based on the conjugate product graph between the 2D contour and the 3D shape. Doing so allows us for the first time to consider higher-order costs, i.e. defined for edge chains, as opposed to costs defined for single edges. This offers substantially more flexibility, which we utilise to incorporate a local rigidity prior. By doing so, we effectively circumvent degenerate solutions and thereby obtain smoother and more realistic matchings, even when using only a one-dimensional feature descriptor. Overall, our method finds globally optimal and continuous 2D-3D matchings, has the same asymptotic complexity as previous solutions, produces state-of-the-art results for shape matching and is even capable of matching partial shapes. 
Our code is publicly available (https://github.com/paul0noah/sm-2D3D).	翻訳日:2023-04-05 00:11:39 公開日:2023-04-03
# RobustLoc:運転環境におけるロバストカメラポッドの回帰 RobustLoc: Robust Camera Pose Regression in Challenging Driving Environments ( http://arxiv.org/abs/2211.11238v3 ) ライセンス: Link先を確認	Sijie Wang, Qiyu Kang, Rui She, Wee Peng Tay, Andreas Hartmannsgruber, Diego Navarro Navarro	(参考訳) カメラのリローカライゼーションは自動運転に様々な応用がある。従来のカメラポーズ回帰モデルは、環境摂動がほとんどない理想的なシナリオのみを考える。季節, 天気, 照明, 不安定な物体の存在に変化をもたらす可能性のある運転環境に対処するため, ニューラル微分方程式からの摂動に対する頑健さを導出するRobostLocを提案する。本モデルでは,多視点画像から特徴地図を抽出する畳み込みニューラルネットワーク,インタラクティブに情報を拡散するロバストなニューラルネットワーク方程式拡散ブロックモジュール,多層トレーニングによる分岐ポーズデコーダを用いて車両のポーズ推定を行う。実験により、ロバストロックは現在の最先端カメラの回帰モデルを超え、様々な環境で堅牢な性能を達成することが示された。私たちのコードは、https://github.com/sijieaaa/RobustLocでリリースされています。 Camera relocalization has various applications in autonomous driving. Previous camera pose regression models consider only ideal scenarios where there is little environmental perturbation. To deal with challenging driving environments that may have changing seasons, weather, illumination, and the presence of unstable objects, we propose RobustLoc, which derives its robustness against perturbations from neural differential equations. Our model uses a convolutional neural network to extract feature maps from multi-view images, a robust neural differential equation diffusion block module to diffuse information interactively, and a branched pose decoder with multi-layer training to estimate the vehicle poses. Experiments demonstrate that RobustLoc surpasses current state-of-the-art camera pose regression models and achieves robust performance in various environments. Our code is released at: https://github.com/sijieaaa/RobustLoc	翻訳日:2023-04-05 00:11:13 公開日:2023-04-03
# 複数のイグジットが必要:Unified Vision Language Modelの高速化のための動的早期イグジット You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model ( http://arxiv.org/abs/2211.11152v2 ) ライセンス: Link先を確認	Shengkun Tang, Yaqing Wang, Zhenglun Kong, Tianchi Zhang, Yao Li, Caiwen Ding, Yanzhi Wang, Yi Liang, Dongkuan Xu	(参考訳) 大規模なトランスフォーマーモデルは、統一アーキテクチャによるダウンストリームビジョン言語タスクに大幅な改善をもたらす。性能改善はモデルサイズの増大を伴い、推論速度の低下とサービングコストの増大を招く。ある種の予測は大規模モデルの完全な複雑さから恩恵を受けるが、全ての入力が実行するのに同じ量の計算を必要とするわけではない。この課題に対処するために、入力複雑性の観点から計算パワーを適応的に割り当て、推論効率を向上させる早期退避が提案されている。既存のアーリーエグジット戦略は、通常、中間層に基づく出力信頼度を入力複雑性のプロキシとして採用し、次の層をスキップするという決定を導き出す。しかし、エンコーダの出力信頼度推定が困難であるため、エンコーダとデコーダの両方を持つ広く使われている統一アーキテクチャのエンコーダには、このような戦略は適用できない。エンコーダコンポーネントの早期終了を無視することは、計算量の節約という点で最適ではない。この課題に対処するために,入力の層ごとの類似性に基づいてエンコーダとデコーダの層を同時に動的にスキップし,複数回の早期退避を行う,統一視覚言語モデルのための新しい早期退避戦略MuEを提案する。エンコーダのイメージとテキストのモダリティを分解することで、MuEは柔軟性があり、モダリティの観点から異なるレイヤをスキップでき、性能低下を最小限に抑えながら推論効率を向上できる。 SNLI-VEとMS COCOデータセットを用いた実験では,提案手法により予測推論時間を最大50%,40%まで短縮でき,それぞれ99%,96%の性能を維持した。 Large-scale Transformer models bring significant improvements for various downstream vision language tasks with a unified architecture. The performance improvements come with increasing model size, resulting in slow inference speed and increased cost for serving. While certain predictions benefit from the full complexity of the large-scale model, not all inputs need the same amount of computation, potentially leading to wasted computation resources. To handle this challenge, early exiting has been proposed to adaptively allocate computational power in terms of input complexity to improve inference efficiency. The existing early exiting strategies usually adopt output confidence based on intermediate layers as a proxy of input complexity to decide whether to skip the following layers. 
However, such strategies cannot be applied to the encoder in the widely used unified architecture with both encoder and decoder, because output confidence is difficult to estimate in the encoder. Ignoring early exiting in the encoder component is suboptimal in terms of saving computation. To handle this challenge, we propose a novel early exiting strategy for unified vision language models, named MuE, which allows layers in the encoder and decoder to be skipped dynamically and simultaneously based on layer-wise input similarities, with multiple early exits. By decomposing the image and text modalities in the encoder, MuE is flexible and can skip different layers per modality, advancing the inference efficiency while minimizing performance drop. Experiments on the SNLI-VE and MS COCO datasets show that the proposed approach MuE can reduce expected inference time by up to 50% and 40% while maintaining 99% and 96% performance, respectively.	翻訳日:2023-04-05 00:10:32 公開日:2023-04-03
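The idea of exiting based on layer-wise similarity, rather than output confidence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the MuE implementation: layers are modeled as plain Python callables on a single feature vector, cosine similarity between the input and output of a layer is used as the saturation signal, and `run_with_early_exit` and `threshold` are hypothetical names.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def run_with_early_exit(layers, x, threshold):
    """Apply layers in order and stop once a layer barely changes its input.

    Returns the final hidden vector and the number of layers actually run.
    """
    hidden = x
    for used, layer in enumerate(layers, start=1):
        new_hidden = layer(hidden)
        similarity = cosine_similarity(hidden, new_hidden)
        hidden = new_hidden
        if similarity >= threshold:
            # Representation has saturated: skip the remaining layers.
            return hidden, used
    return hidden, len(layers)


# Toy stack: a rotation (large change), a rescaling (no change in direction),
# and another rotation that is skipped when the threshold is reached.
toy_layers = [
    lambda h: [h[1], -h[0]],
    lambda h: [2 * a for a in h],
    lambda h: [h[1], -h[0]],
]
output, layers_used = run_with_early_exit(toy_layers, [1.0, 0.0], threshold=0.99)
```

Because the signal depends only on hidden states, the same check can run inside an encoder, where no output distribution is available to compute a confidence score.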
# Castling-ViT: 視覚変換器推論における線形角アテンションへの切り替えによる自己注意の圧縮 Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference ( http://arxiv.org/abs/2211.10526v2 ) ライセンス: Link先を確認	Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin	(参考訳) 視覚変換器(ViT)は優れた性能を示しているが、畳み込みニューラルネットワーク(CNN)と比較して高い計算コストを必要とする。既存の効率的なViTは局所的な注意(Swinなど)や線形的な注意(Performerなど)を採用しており、これはViTがグローバルまたはローカルなコンテキストをキャプチャする能力を犠牲にする。この研究において、vitsは、推論中により効率的でありながら、グローバルコンテキストとローカルコンテキストの両方を学ぶことができるか? そこで本稿では,VT を線形角注意とマスク付きソフトマックス2次注意の両方を用いて訓練する Castling-ViT というフレームワークを提案する。当社のcastling-vitは角カーネルを利用して,クエリとキーの類似度をスペクトル角で測定します。 And we further simplify it with two techniques: (1) a novel linear-angular attention mechanism: we decompose the angular kernels into linear terms and high-order residuals, and only keep the linear terms; and (2) we adopt two parameterized modules to approximate high-order residuals: a depthwise convolution and an auxiliary masked softmax attention to help learn both global and local information, where the masks for softmax attention are regularized to gradually become zeros and thus incur no overhead during ViT inference. 3つのタスクに関する広範な実験とアブレーションの研究は、提案するキャスティング・ヴィットの有効性を一貫して検証している。例えば、画像ネットの分類において最大1.8%の精度と40%のmacs削減を達成し、同等のフロップでcoco検出時の1.2倍のマップを、バニラソフトマックスに基づくvitsと比較した。 Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost as compared to convolutional neural networks (CNNs), one reason is that ViTs' attention measures global similarities and thus has a quadratic complexity with the number of input tokens. Existing efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g., Performer), which sacrifice ViTs' capabilities of capturing either global or local context. In this work, we ask an important research question: Can ViTs learn both global and local context while being more efficient during inference? 
To this end, we propose a framework called Castling-ViT, which trains ViTs using both linear-angular attention and masked softmax-based quadratic attention, but then switches to having only linear angular attention during ViT inference. Our Castling-ViT leverages angular kernels to measure the similarities between queries and keys via spectral angles. And we further simplify it with two techniques: (1) a novel linear-angular attention mechanism: we decompose the angular kernels into linear terms and high-order residuals, and only keep the linear terms; and (2) we adopt two parameterized modules to approximate high-order residuals: a depthwise convolution and an auxiliary masked softmax attention to help learn both global and local information, where the masks for softmax attention are regularized to gradually become zeros and thus incur no overhead during ViT inference. Extensive experiments and ablation studies on three tasks consistently validate the effectiveness of the proposed Castling-ViT, e.g., achieving up to a 1.8% higher accuracy or 40% MACs reduction on ImageNet classification and 1.2 higher mAP on COCO detection under comparable FLOPs, as compared to ViTs with vanilla softmax-based attentions.	翻訳日:2023-04-05 00:08:40 公開日:2023-04-03
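The cost difference between quadratic and linear attention comes down to the order of matrix multiplication. The sketch below shows this for a generic, unnormalized, softmax-free attention; it is not the linear-angular kernel of Castling-ViT, whose exact form is not given in the abstract. All function names are hypothetical, and matrices are plain lists of lists with N tokens and feature dimension d.

```python
def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def quadratic_attention(q, k, v):
    """Compute (Q K^T) V: builds an N x N similarity matrix, O(N^2 d) work."""
    scores = matmul(q, transpose(k))
    return matmul(scores, v)


def linear_attention(q, k, v):
    """Compute Q (K^T V): builds only a d x d summary, O(N d^2) work."""
    summary = matmul(transpose(k), v)
    return matmul(q, summary)


# Three tokens with two features each.
Q = [[1, 2], [3, 4], [5, 6]]
K = [[1, 0], [0, 1], [1, 1]]
V = [[1, 2], [3, 4], [5, 6]]
quad = quadratic_attention(Q, K, V)
lin = linear_attention(Q, K, V)
```

The two results agree only because no softmax sits between the products; softmax attention cannot be reordered this way, which is why linear variants replace it with a kernel whose similarity factorizes.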
# neurallift-360: 360{\deg}ビューで3dオブジェクトに2d写真を持ち上げる NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360{\deg} Views ( http://arxiv.org/abs/2211.16431v2 ) ライセンス: Link先を確認	Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Yi Wang, Zhangyang Wang	(参考訳) 仮想現実と拡張現実(XR)は、3Dコンテンツの需要を増大させる。しかし、高品質な3Dコンテンツを作成するには、人間の専門家がしなければならない面倒な作業が必要です。本研究では,1枚の画像を1枚の3Dオブジェクトに持ち上げるという課題について検討し,360{\deg}ビューを持つ可視3Dオブジェクトを与えられた参照画像とよく一致する形で生成できることを初めて実証する。参照画像に条件を付けることで,画像から物体の新しい視点を合成する,永遠の好奇心を満たすことができる。私たちの技術は、3DアーティストやXRデザイナーのワークフローを緩和する有望な方向性に光を当てています。我々は,NeuralLift-360という,深度認識型ニューラル放射率表現(NeRF)を利用した新しいフレームワークを提案する。我々のNeuralLift-360は、ランキングの損失を発生させることで、荒々しい深さを推定できる。また,コヒーレントガイダンスを提供する前に,CLIP誘導サンプリング戦略を採用した。大規模な実験により、我々のNeuralLift-360は既存の最先端のベースラインを大幅に上回っていることが示された。プロジェクトページ: https://vita-group.github.io/neurallift-360/ Virtual reality and augmented reality (XR) bring increasing demand for 3D content. However, creating high-quality 3D content requires tedious work that a human expert must do. In this work, we study the challenging task of lifting a single image to a 3D object and, for the first time, demonstrate the ability to generate a plausible 3D object with 360{\deg} views that correspond well with the given reference image. By conditioning on the reference image, our model can fulfill the everlasting curiosity for synthesizing novel views of objects from images. Our technique sheds light on a promising direction of easing the workflows for 3D artists and XR designers. We propose a novel framework, dubbed NeuralLift-360, that utilizes a depth-aware neural radiance representation (NeRF) and learns to craft the scene guided by denoising diffusion models. By introducing a ranking loss, our NeuralLift-360 can be guided with rough depth estimation in the wild. We also adopt a CLIP-guided sampling strategy for the diffusion prior to provide coherent guidance. 
Extensive experiments demonstrate that our NeuralLift-360 significantly outperforms existing state-of-the-art baselines. Project page: https://vita-group.github.io/NeuralLift-360/	翻訳日:2023-04-05 00:02:45 公開日:2023-04-03
# 教師なし画像セマンティックセグメンテーションにおけるアライメントと均一性の再考 Rethinking Alignment and Uniformity in Unsupervised Image Semantic Segmentation ( http://arxiv.org/abs/2211.14513v2 ) ライセンス: Link先を確認	Daoan Zhang, Chenming Li, Haoquan Li, Wenjian Huang, Lingyun Huang, Jianguo Zhang	(参考訳) 教師なし画像セマンティクスセグメンテーション(uiss)は、外部の監督なしに低レベルの視覚特徴と意味レベルの表現をマッチングすることを目的としている。本稿では,UISSモデルにおける特徴アライメントと特徴均一性の観点から,重要な特性について述べる。また,UISSと画像表現学習の比較を行った。本分析に基づき, 既存のMI法は表現崩壊に悩まされていると論じる。そこで,本稿では,意味的注意(semantic attention network,san)と呼ばれるロバストなネットワークを提案し,新たなモジュールである意味的注意(semantic attention,seat)を提案し,ピクセル毎および意味的特徴を動的に生成する。複数のセマンティクスセグメンテーションベンチマークの実験結果は、教師なしセグメンテーションフレームワークがセマンティクス表現のキャッチを専門としていることを示している。 Unsupervised image semantic segmentation(UISS) aims to match low-level visual features with semantic-level representations without outer supervision. In this paper, we address the critical properties from the view of feature alignments and feature uniformity for UISS models. We also make a comparison between UISS and image-wise representation learning. Based on the analysis, we argue that the existing MI-based methods in UISS suffer from representation collapse. By this, we proposed a robust network called Semantic Attention Network(SAN), in which a new module Semantic Attention(SEAT) is proposed to generate pixel-wise and semantic features dynamically. Experimental results on multiple semantic segmentation benchmarks show that our unsupervised segmentation framework specializes in catching semantic representations, which outperforms all the unpretrained and even several pretrained methods.	翻訳日:2023-04-05 00:01:42 公開日:2023-04-03
# イベントカメラのためのデータ駆動型特徴追跡 Data-driven Feature Tracking for Event Cameras ( http://arxiv.org/abs/2211.12826v2 ) ライセンス: Link先を確認	Nico Messikommer, Carter Fang, Mathias Gehrig, Davide Scaramuzza	(参考訳) 高時間分解能、動きのぼかしに対するレジリエンスの増大、そして非常に少ない出力のため、イベントカメラは挑戦的なシナリオであっても低レイテンシで低帯域幅の特徴追跡に最適であることが示されている。既存のイベントカメラの特徴追跡手法は手作りか第一原理から派生しているが、広範なパラメータチューニングが必要であり、ノイズに敏感であり、非モデル化効果のために異なるシナリオに一般化しない。これらの欠陥に対処するために、グレースケールフレームで検出された特徴を追跡するために、低レイテンシイベントを活用するイベントカメラ用の最初のデータ駆動機能トラッカーを導入する。特徴トラック間で情報を共有する新しいフレームアテンションモジュールにより,ロバストな性能を実現する。合成データから実データへのゼロショットを直接転送することで、データ駆動トラッカーは、相対的特徴年齢における既存のアプローチを最大120%上回り、低レイテンシを実現する。この性能ギャップはさらに130%増加し、トラッカーを新たな自己超越戦略で実データに適用する。 Because of their high temporal resolution, increased resilience to motion blur, and very sparse output, event cameras have been shown to be ideal for low-latency and low-bandwidth feature tracking, even in challenging scenarios. Existing feature tracking methods for event cameras are either handcrafted or derived from first principles but require extensive parameter tuning, are sensitive to noise, and do not generalize to different scenarios due to unmodeled effects. To tackle these deficiencies, we introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in a grayscale frame. We achieve robust performance via a novel frame attention module, which shares information across feature tracks. By directly transferring zero-shot from synthetic to real data, our data-driven tracker outperforms existing approaches in relative feature age by up to 120% while also achieving the lowest latency. This performance gap is further increased to 130% by adapting our tracker to real data with a novel self-supervision strategy.	翻訳日:2023-04-04 23:59:39 公開日:2023-04-03
# ResFormer:マルチリゾリューショントレーニングによるViTのスケーリング ResFormer: Scaling ViTs with Multi-Resolution Training ( http://arxiv.org/abs/2212.00776v2 ) ライセンス: Link先を確認	Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang	(参考訳) 視覚トランスフォーマー(vits)は圧倒的な成功を収めているが、それらは脆弱な解像度のスケーラビリティ、すなわち、トレーニング中に目に見えない入力解像度が提示されると、パフォーマンスが大幅に低下する。 resformerはマルチレゾリューショントレーニングという独創的なアイデアに基づいて構築されたフレームワークで、幅広い範囲(ほとんど見えない)のテスト解像度のパフォーマンス向上を目的としています。特に、resformerは異なる解像度の複製された画像を操作し、異なるスケールでインタラクティブな情報を扱うためにスケール一貫性の損失を強制する。さらに,様々な解像度,特に新しい解像度を効果的に交互にテストするために,入力サイズに応じてスムーズに変化するグローバルローカルな位置埋め込み戦略を提案する。 ImageNet上で画像分類のための広範な実験を行う。この結果は、resformerが幅広い解像度に向けたスケーリング能力を持っているという強力な定量的証拠を提供する。例えば、ResFormer-B-MRは、比較的低解像度と高解像度(96と640)で評価すると、Top-1の精度が75.86%と81.72%に達する(DeiT-Bより48%と7.49%良い)。また,resformerは柔軟であり,意味セグメンテーション,オブジェクト検出,ビデオアクション認識にも容易に拡張できることを示す。コードはhttps://github.com/ruitian12/resformerで入手できる。 Vision Transformers (ViTs) have achieved overwhelming success, yet they suffer from vulnerable resolution scalability, i.e., the performance drops drastically when presented with input resolutions that are unseen during training. We introduce, ResFormer, a framework that is built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions. In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss to engage interactive information across different scales. More importantly, to alternate among varying resolutions effectively, especially novel ones in testing, we propose a global-local positional embedding strategy that changes smoothly conditioned on input sizes. We conduct extensive experiments for image classification on ImageNet. The results provide strong quantitative evidence that ResFormer has promising scaling abilities towards a wide range of resolutions. 
For instance, ResFormer-B-MR achieves Top-1 accuracies of 75.86% and 81.72% when evaluated at relatively low and high resolutions (96 and 640, respectively), which are 48% and 7.49% better than DeiT-B. Moreover, we demonstrate that ResFormer is flexible and can be easily extended to semantic segmentation, object detection, and video action recognition. Code is available at https://github.com/ruitian12/resformer.	翻訳日:2023-04-04 23:51:43 公開日:2023-04-03
# 流れの正常化 Taming Normalizing Flows ( http://arxiv.org/abs/2211.16488v2 ) ライセンス: Link先を確認	Shimon Malnick, Shai Avidan, Ohad Fried	(参考訳) フローモデルの正規化 - モデルが特定の画像や画像カテゴリを生成する確率を変化させるアルゴリズムを提案する。与えられた画像の正確な生成確率を計算できるので、フローの正規化にフォーカスする。我々は、多くの興味深いプライバシーとバイアスを考慮したサブドメインである人間の顔を生成するモデルを用いて、改ざんを実証する。本手法は,プライバシの文脈,例えば,モデルの出力から特定の人物を取り除いたり,特定の対象分布に応じて特定の画像カテゴリを出力させたりすることで、デバイアスの文脈で利用することができる。モデリングは、モデルをスクラッチからトレーニングすることなく、高速な微調整プロセスで達成され、数分で目標を達成する。提案手法を定性的かつ定量的に評価し, 所望の変化を適用しつつ, 生成品質が持続することを示す。 We propose an algorithm for taming Normalizing Flow models - changing the probability that the model will produce a specific image or image category. We focus on Normalizing Flows because they can calculate the exact generation probability likelihood for a given image. We demonstrate taming using models that generate human faces, a subdomain with many interesting privacy and bias considerations. Our method can be used in the context of privacy, e.g., removing a specific person from the output of a model, and also in the context of debiasing by forcing a model to output specific image categories according to a given target distribution. Taming is achieved with a fast fine-tuning process without retraining the model from scratch, achieving the goal in a matter of minutes. We evaluate our method qualitatively and quantitatively, showing that the generation quality remains intact, while the desired changes are applied.	翻訳日:2023-04-04 23:49:26 公開日:2023-04-03
# Biomarker Activation Mapによる糖尿病網膜症の診断 Interpretable Diabetic Retinopathy Diagnosis based on Biomarker Activation Map ( http://arxiv.org/abs/2212.06299v2 ) ライセンス: Link先を確認	Pengxiao Zang, Tristan T. Hormel, Jie Wang, Yukun Guo, Steven T. Bailey, Christina J. Flaxel, David Huang, Thomas S. Hwang, and Yali Jia	(参考訳) 深層学習分類器は、光学コヒーレンス断層撮影(oct)とその血管造影(octa)に基づいて糖尿病網膜症(dr)を自動的に診断する最も正確な手段を提供する。これらのモデルのパワーは、部分的には、望ましいタスクを達成するのに必要な複雑さを提供する隠されたレイヤを含めることに起因する。しかし、隠れた層はアルゴリズムの出力を解釈しにくくする。本稿では, 臨床医が分類器の意思決定を検証・理解するための, 生成的敵対学習に基づく新しいバイオマーカー活性化マップ(BAM)フレームワークを提案する。 456個の黄斑スキャンを含むデータセットを、現在の臨床基準に基づいて非参照型または参照型DRとして評価した。 BAMを評価するのに使われたDR分類器は、このデータセットに基づいて最初に訓練された。 BAM生成フレームワークは、2つのU字型ジェネレータを組み合わせて設計され、この分類器に意味のある解釈性を提供する。メインジェネレータは、参照可能なスキャンを入力として取り、分類器によって非参照可能な出力を生成するように訓練された。次に、bamを主発電機の出力と入力との差分画像として構成する。 BAMが分類器を利用したバイオマーカーのみを強調するようにするために、アシスタントジェネレータは反対に行うように訓練され、参照できないスキャンから分類器によって参照可能なスキャンを生成する。生成したBAMは非灌流領域や網膜液を含む既知の病態の特徴を強調した。これらのハイライトに基づいて完全に解釈可能な分類器は、臨床医が自動DR診断をよりよく活用し、検証するのに役立ちます。 Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to interpret. Here we introduce a novel biomarker activation map (BAM) framework based on generative adversarial learning that allows clinicians to verify and understand a classifier's decision-making. A data set of 456 macular scans was graded as non-referable or referable DR based on current clinical standards. A DR classifier that was used to evaluate our BAM was first trained based on this data set. The BAM generation framework was designed by combining two U-shaped generators to provide meaningful interpretability to this classifier. 
The main generator was trained to take referable scans as input and produce an output that would be classified by the classifier as non-referable. The BAM is then constructed as the difference image between the output and input of the main generator. To ensure that the BAM only highlights classifier-utilized biomarkers, an assistant generator was trained to do the opposite, producing scans that would be classified as referable by the classifier from non-referable scans. The generated BAMs highlighted known pathologic features including nonperfusion area and retinal fluid. A fully interpretable classifier based on these highlights could help clinicians better utilize and verify automated DR diagnosis.	翻訳日:2023-04-04 23:44:17 公開日:2023-04-03
# 検出選択アルゴリズム : 物体検出のためのポスト処理を行う確率ベース最適化手法 Detection Selection Algorithm: A Likelihood based Optimization Method to Perform Post Processing for Object Detection ( http://arxiv.org/abs/2212.05706v2 ) ライセンス: Link先を確認	Angzhi Fan, Benjamin Ticknor and Yali Amit	(参考訳) 物体検出では、非最大抑圧(NMS)のような後処理法が広く用いられている。 NMSは偽陽性の検出回数を大幅に減らすことができるが、目標値の低いいくつかの検出を維持できる可能性がある。画像中のオブジェクトとそのラベルの正確な数を求めるため,NMSや関連手法の後に使用される検出選択アルゴリズム(DSA)と呼ばれるポスト処理手法を提案する。 DSAは検出されたバウンディングボックスのサブセットを優雅に選択し、オブジェクトの閉塞を考慮した画像全体の解釈を最も高い確率で行う完全なオブジェクト再構成を行う。アルゴリズムは4つの要素からなる。まず、オブジェクト間の閉塞関係を得るために、より高速なR-CNNに閉塞分岐を追加する。第2に,我々がデコーダと呼ぶ訓練済み生成ネットワークの潜在変数の最適化に基づいて,その可視部分から物体全体の外観を再構築できる単一再構成アルゴリズムを開発した。第3に, 咬合順序を考慮した仮説的解釈により, 全物体の同時再構成を行う全再構成アルゴリズムを提案する。最後に,リストから検出を漸進的に追加または削除し,対応する解釈の可能性を最大化する欲望アルゴリズムを提案する。 NMS や Soft-NMS を用いた DSA は NMS や Soft-NMS よりも優れた結果が得られる。 In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post processing method called Detection Selection Algorithm (DSA) which is used after NMS or related methods. DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions that give the interpretation of the whole image with highest likelihood, taking into account object occlusions. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm which can reconstruct the whole appearance of an object given its visible part, based on the optimization of latent variables of a trained generative network which we call the decoder. 
Third, we propose a whole reconstruction algorithm which generates the joint reconstruction of all objects in a hypothesized interpretation, taking into account occlusion ordering. Finally, we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA with NMS or Soft-NMS can achieve better results than NMS or Soft-NMS themselves, as illustrated in our experiments on synthetic images with multiple 3D objects.	翻訳日:2023-04-04 23:42:56 公開日:2023-04-03
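The abstract above treats Non-maximum Suppression as the baseline post-processing step that DSA runs after. As a minimal sketch of standard greedy NMS (not the authors' DSA, whose details the abstract does not give), the following may help; the `(x1, y1, x2, y2)` box format, the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit detections from the highest to the lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.3]
    print(nms(boxes, scores))
```

In the usage example the second box is suppressed because it overlaps the first, while the isolated third box survives despite its low score. That is the kind of low-objectness detection the abstract says NMS may still keep.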
# イベントカメラを用いた物体検出用リカレントビジョントランス Recurrent Vision Transformers for Object Detection with Event Cameras ( http://arxiv.org/abs/2212.05598v2 ) ライセンス: Link先を確認	Mathias Gehrig and Davide Scaramuzza	(参考訳) イベントカメラを用いた物体検出のための新しいバックボーンであるリカレントビジョントランス (RVT) を提案する。イベントカメラは、高ダイナミックレンジでミリ秒以下のレイテンシで視覚情報を提供する。これらのユニークな特性は、時間クリティカルなシナリオにおける低レイテンシオブジェクトの検出と追跡に大きな可能性を提供します。イベントベースのビジョンでの以前の作業は、優れた検出性能を達成しているが、実質的な推論時間(通常は40ミリ秒以上)のコストで達成されている。リカレントビジョンバックボーンのハイレベルな設計を再検討することにより、同様のパフォーマンスを維持しつつ推論時間を6倍に短縮する。これを実現するために,各段階において3つの重要な概念,すなわち条件付き位置埋め込みと見なすことができる畳み込み前処理を用いる多段階設計を探索する。第2に、局所的および拡張的グローバル自己注意による空間的特徴の相互作用第3に、時間情報を保持しながらレイテンシを最小限に抑えるために、繰り返し時間的特徴集約。 RVTは、Gen1オートマチックデータセット上で47.2%のmAPを達成するイベントベースのオブジェクト検出において、最先端のパフォーマンスに到達するために、ゼロからトレーニングすることができる。同時に、RVTは高速な推論(T4 GPU上では12ミリ秒)と好ましいパラメータ効率(先行技術より5倍少ない)を提供する。私たちの研究は、イベントベースのビジョンを超えた研究に役立ち得る効果的な設計選択に対する新たな洞察をもたらします。 We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras. Event cameras provide visual information with sub-millisecond latency at a high-dynamic range and with strong robustness against motion blur. These unique properties offer great potential for low-latency object detection and tracking in time-critical scenarios. Prior work in event-based vision has achieved outstanding detection performance but at the cost of substantial inference time, typically beyond 40 milliseconds. By revisiting the high-level design of recurrent vision backbones, we reduce inference time by a factor of 6 while retaining similar performance. To achieve this, we explore a multi-stage design that utilizes three key concepts in each stage: First, a convolutional prior that can be regarded as a conditional positional embedding. Second, local and dilated global self-attention for spatial feature interaction. Third, recurrent temporal feature aggregation to minimize latency while retaining temporal information. 
RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection - achieving an mAP of 47.2% on the Gen1 automotive dataset. At the same time, RVTs offer fast inference (<12 ms on a T4 GPU) and favorable parameter efficiency (5 times fewer than prior art). Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.	翻訳日:2023-04-04 23:42:31 公開日:2023-04-03
# REVEAL:マルチソースマルチモーダル知識メモリによる検索拡張ビジュアルランゲージ事前学習 REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory ( http://arxiv.org/abs/2212.05221v2 ) ライセンス: Link先を確認	Ziniu Hu and Ahmet Iscen and Chen Sun and Zirui Wang and Kai-Wei Chang and Yizhou Sun and Cordelia Schmid and David A. Ross and Alireza Fathi	(参考訳) 本稿では,世界の知識を大規模メモリにエンコードし,知識集約型クエリに答えるために,エンド・ツー・エンドで検索可能なビジュアル言語モデル(reveal)を提案する。 REVEALは、メモリ、エンコーダ、レシーバー、ジェネレータの4つのキーコンポーネントで構成されている。大規模メモリは、統一エンコーダを介して多様世界知識(画像テキストペア、質問応答ペア、知識グラフトリプレットなど)の様々なソースを符号化する。取得者はメモリ内の最も関連する知識エントリを見つけ、取得した知識と入力クエリを融合して出力を生成する。このアプローチの重要な特徴は、メモリ、エンコーダ、レトリバー、ジェネレータはすべて、大量のデータに対して、エンドツーエンドで事前訓練されていることです。さらに,本手法では多様なマルチモーダル・ナレッジ・ソースを利用できるため,大きな利得が得られている。本稿では,REVEALが視覚的質問応答と画像キャプションの最先端化を実現していることを示す。 In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries. REVEAL consists of four key components: the memory, the encoder, the retriever and the generator. The large-scale memory encodes various sources of multimodal world knowledge (e.g. image-text pairs, question answering pairs, knowledge graph triplets, etc) via a unified encoder. The retriever finds the most relevant knowledge entries in the memory, and the generator fuses the retrieved knowledge with the input query to produce the output. A key novelty in our approach is that the memory, encoder, retriever and generator are all pre-trained end-to-end on a massive amount of data. Furthermore, our approach can use a diverse set of multimodal knowledge sources, which is shown to result in significant gains. We show that REVEAL achieves state-of-the-art results on visual question answering and image captioning.	翻訳日:2023-04-04 23:42:10 公開日:2023-04-03
# cepha29: 自動脳波ランドマーク検出チャレンジ2023 CEPHA29: Automatic Cephalometric Landmark Detection Challenge 2023 ( http://arxiv.org/abs/2212.04808v2 ) ライセンス: Link先を確認	Muhammad Anwaar Khalid, Kanwal Zulfiqar, Ulfat Bashir, Areeba Shaheen, Rida Iqbal, Zarnab Rizwan, Ghina Rizwan, Muhammad Moazam Fraz	(参考訳) 定量的脳計測分析は、現代の矯正治療において最も広く用いられている臨床および研究ツールである。脳波ランドマークの正確な位置決定は解剖学的異常の定量化と分類を可能にするが、これらのランドマークをマークする従来の手作業は非常に退屈な作業である。自動頭蓋計測による目印検出システムの開発は、常に行われているが、矯正治療には不十分である。基本的な理由は、これらのデータセットでトレーニング用に提供される画像だけでなく、公開されているデータセットの量は、aiモデルがうまく機能しないためである。形態計測解析のための堅牢なAIソリューションの開発を容易にするため, IEEE International Symposium on Biomedical Imaging (ISBI 2023) と共同で, CEPHA29 Automatic Cephalometric Landmark Detection Challengeを開催する。この文脈では、1000個の頭部X線画像からなる、最も広く公開されているデータセットを提供する。我々は、私たちの挑戦が、自動頭脳計測のランドマーク識別の研究と革新を先導するだけでなく、この分野の新しい時代の始まりを示唆することを期待している。 Quantitative cephalometric analysis is the most widely used clinical and research tool in modern orthodontics. Accurate localization of cephalometric landmarks enables the quantification and classification of anatomical abnormalities, however, the traditional manual way of marking these landmarks is a very tedious job. Endeavours have constantly been made to develop automated cephalometric landmark detection systems but they are inadequate for orthodontic applications. The fundamental reason for this is that the amount of publicly available datasets as well as the images provided for training in these datasets are insufficient for an AI model to perform well. To facilitate the development of robust AI solutions for morphometric analysis, we organise the CEPHA29 Automatic Cephalometric Landmark Detection Challenge in conjunction with IEEE International Symposium on Biomedical Imaging (ISBI 2023). In this context, we provide the largest known publicly available dataset, consisting of 1000 cephalometric X-ray images. 
We hope that our challenge will not only drive forward research and innovation in automatic cephalometric landmark identification but will also signal the beginning of a new era in the discipline.	翻訳日:2023-04-04 23:41:52 公開日:2023-04-03
# Genie: 量子化のデータを見せてください Genie: Show Me the Data for Quantization ( http://arxiv.org/abs/2212.04780v2 ) ライセンス: Link先を確認	Yongkweon Jeon, Chungman Lee, Ho-young Kim	(参考訳) ゼロショット量子化は、プライバシに関連するコストや問題など、さまざまな理由からデータがアクセスできない場合に、軽量なディープニューラルネットワークを開発する上で有望なアプローチである。 FP32事前学習モデルにおけるバッチ正規化層の学習パラメータ($\mu$と$\sigma$)を利用することで、ゼロショット量子化スキームは合成データの生成に焦点を当てる。その後、事前学習されたモデル(教師)から量子化モデル(学生)への知識を蒸留し、量子化モデルに合成データセットを最適化する。しかし、これまでのゼロショット量子化は、タスク固有の損失と長期最適化を必要とする量子化対応トレーニング手法の文脈で主に議論されてきた。そこで我々は,高品質な量子化ネットワークを数時間で生成するゼロショット量子化のための後学習量子化方式を提案する。さらに,量子化に適したデータを生成する Genie というフレームワークを提案する。 Genieによって合成されたデータにより、実際のデータセットを使わずに堅牢な量子化モデルを作成できる。また,学習後の量子化アルゴリズムを提案し,量子化モデルの性能を向上させる。これらを組み合わせることで、ゼロショットと少数ショットの量子化のギャップを埋めることができ、既存のアプローチと比べて量子化性能を著しく改善することができる。言い換えれば、ユニークな最先端ゼロショット量子化アプローチを得ることができる。 Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible owing to various reasons, including cost and issues related to privacy. By exploiting the learned parameters ($\mu$ and $\sigma$) of batch normalization layers in an FP32-pre-trained model, zero-shot quantization schemes focus on generating synthetic data. Subsequently, they distill knowledge from the pre-trained model (teacher) to the quantized model (student) such that the quantized model can be optimized with the synthetic dataset. However, thus far, zero-shot quantization has primarily been discussed in the context of quantization-aware training methods, which require task-specific losses and long-term optimization as much as retraining. We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours. Furthermore, we propose a framework called Genie that generates data suited for quantization. With the data synthesized by Genie, we can produce robust quantized models without real datasets, which is comparable to few-shot quantization. 
We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining them, we can bridge the gap between zero-shot and few-shot quantization while significantly improving the quantization performance compared to that of existing approaches. In other words, we can obtain a unique state-of-the-art zero-shot quantization approach.	翻訳日:2023-04-04 23:41:32 公開日:2023-04-03
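The abstract above says that zero-shot schemes generate synthetic data by exploiting the batch-normalization parameters $\mu$ and $\sigma$ of the FP32 model. One common way to write such an objective in the data-free literature is a statistics-matching loss, sketched below. This is an assumption for illustration: the abstract does not state Genie's actual loss, and the symbols $\tilde{\mu}_l$, $\tilde{\sigma}_l$ and $L$ are introduced here.

```latex
\begin{equation}
\mathcal{L}_{\mathrm{BN}}(x) \;=\; \sum_{l=1}^{L}
  \left( \left\| \tilde{\mu}_l(x) - \mu_l \right\|_2^2
       + \left\| \tilde{\sigma}_l(x) - \sigma_l \right\|_2^2 \right)
\end{equation}
```

Here $\mu_l$ and $\sigma_l$ are the statistics stored in the $l$-th batch-normalization layer of the pre-trained model, and $\tilde{\mu}_l(x)$ and $\tilde{\sigma}_l(x)$ are the batch statistics that a synthetic batch $x$ induces at the same layer. Minimizing $\mathcal{L}_{\mathrm{BN}}$ over $x$, or over the parameters of a generator that produces $x$, pushes the synthetic data toward the activation statistics seen during the original training.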
# ニューラルネットワークモデルにおけるモンタギュー意味論と修飾子一貫性測定 Montague semantics and modifier consistency measurement in neural language models ( http://arxiv.org/abs/2212.04310v2 ) ライセンス: Link先を確認	Danilo S. Carvalho, Edoardo Manino, Julia Rozanova, Lucas Cordeiro, André Freitas	(参考訳) 近年,分布型言語表現モデルは非常に実践的な成功を収めている。同時に、解釈可能性の必要性は、固有の特性と能力に関する疑問を提起している。重要なことに、分布モデルはしばしば自然言語の合成現象を扱う際に矛盾し、それがそれらの安全性と公正性に重大な影響を及ぼす。それにもかかわらず、構成性に関する最近の研究は、類似性タスクのみの性能を改善することを目的としている。本研究は異なるアプローチを採り、現代言語モデルにおける構成行動を測定する手法を提案する。具体的には,形容詞・名詞句における形容詞修飾現象に注目した。モンタギュー意味論に触発された作曲行動の3つの新しいテストを紹介する。実験の結果,現在のニューラルランゲージモデルは,期待される言語理論に限定して振る舞うことが示された。このことは、これらの言語モデルが私たちが評価した意味的性質を捉えられないのか、あるいはモンゴルの伝統からの言語理論が分布モデルの期待する能力と一致しないのかという疑問を提起する。 In recent years, distributional language representation models have demonstrated great practical success. At the same time, the need for interpretability has elicited questions on their intrinsic properties and capabilities. Crucially, distributional models are often inconsistent when dealing with compositional phenomena in natural language, which has significant implications for their safety and fairness. Despite this, most current research on compositionality is directed towards improving their performance on similarity tasks only. This work takes a different approach, and proposes a methodology for measuring compositional behavior in contemporary language models. Specifically, we focus on adjectival modifier phenomena in adjective-noun phrases. We introduce three novel tests of compositional behavior inspired by Montague semantics. Our experimental results indicate that current neural language models behave according to the expected linguistic theories to a limited extent only. This raises the question of whether these language models are not able to capture the semantic properties we evaluated, or whether linguistic theories from Montagovian tradition would not match the expected capabilities of distributional models.	翻訳日:2023-04-04 23:40:39 公開日:2023-04-03
# FunkNN: 機能生成のための神経補間 FunkNN: Neural Interpolation for Functional Generation ( http://arxiv.org/abs/2212.14042v2 ) ライセンス: Link先を確認	AmirEhsan Khorashadizadeh, Anadi Chaman, Valentin Debarnot, Ivan Dokmanić	(参考訳) スケールをまたいで一般化し、任意の座標で評価し、正確な微分の計算を認め、概念的に単純である連続生成モデルを構築することができるか? 既存のMLPベースのアーキテクチャは、良好な畳み込み誘導バイアスを持つグリッドベースのジェネレータよりも悪いサンプルを生成する。異なるスケールで画像を生成することに焦点を当てたモデルの方が優れているが、画像やデリバティブの継続的な評価のために設計されていない複雑なアーキテクチャを採用する。信号処理の観点から、サンプルからの補間として連続画像生成を扱う。実際、正しくサンプリングされた離散画像は、低空間周波数に関する全ての情報を含んでいる。問題は、上記の設計基準を満たしながら、データ駆動方式でスペクトルを外挿する方法である。われわれの答えはfunknn ― 任意の座標で連続画像を再構築する方法を学び、任意の画像データセットに適用できる新しい畳み込みネットワーク。離散生成モデルと組み合わさって、連続的な不正な逆問題に先行して作用する関数生成器となる。 funknnは高品質な連続画像を生成し,パッチベースの設計により,高い分散性能を示す。さらに,空間的微分を持つ数種類のスタイリッシュな逆問題において,その性能を示す。 Can we build continuous generative models which generalize across scales, can be evaluated at any coordinate, admit calculation of exact derivatives, and are conceptually simple? Existing MLP-based architectures generate worse samples than the grid-based generators with favorable convolutional inductive biases. Models that focus on generating images at different scales do better, but employ complex architectures not designed for continuous evaluation of images and derivatives. We take a signal-processing perspective and treat continuous image generation as interpolation from samples. Indeed, correctly sampled discrete images contain all information about the low spatial frequencies. The question is then how to extrapolate the spectrum in a data-driven way while meeting the above design criteria. Our answer is FunkNN -- a new convolutional network which learns how to reconstruct continuous images at arbitrary coordinates and can be applied to any image dataset. Combined with a discrete generative model it becomes a functional generator which can act as a prior in continuous ill-posed inverse problems. 
We show that FunkNN generates high-quality continuous images and exhibits strong out-of-distribution performance thanks to its patch-based design. We further showcase its performance in several stylized inverse problems with exact spatial derivatives.	翻訳日:2023-04-04 23:33:02 公開日:2023-04-03
# タイムゲート光子検出による非ガウス状態生成 Non-Gaussian state generation with time-gated photon detection ( http://arxiv.org/abs/2212.13335v2 ) ライセンス: Link先を確認	Tatsuki Sonoyama, Kazuma Takahashi, Baramee Charoensombutamon, Sachiko Takasu, Kaori Hattori, Daiji Fukuda, Kosuke Fukui, Kan Takase, Warit Asavanant, Jun-ichi Yoshikawa, Mamoru Endo, Akira Furusawa	(参考訳) フォールトトレラントで普遍的な光学量子計算に必須である非ガウス状態は、一般的に光子検出器を用いたヘラルドスキームによって生成される。近年,光子検出器の大きなタイミングジッタが生成する非ガウス状態の純度を低下させることが理論的に示されている [T. Sonoyama, $\textit{et al}$., Phys. Rev. A $\textbf{105}$, 043714 (2022)]。本研究では, 時間差光子検出により, ウィグナー負性を持つ非ガウス状態を生成する。我々は,50 nsから10 nsまでの遷移エッジセンサに基づく光子数分解検出器のタイミングジッタを効果的に改善するために,タイムゲーティングに高速光スイッチを用いる。その結果、時間ゲート光子検出法なしでは観測できないウィグナー負性$-0.011\pm 0.004$の非ガウス状態を生成する。これらの結果は,非ガウシアン状態生成に対するタイミングジッタの効果を初めて実験的に確認し,高純度非ガウシアン状態生成の有望な方法を提供する。 Non-Gaussian states of light, which are essential in fault-tolerant and universal optical quantum computation, are typically generated by a heralding scheme using photon detectors. Recently, it was theoretically shown that the large timing jitter of the photon detectors deteriorates the purity of the generated non-Gaussian states [T. Sonoyama, $\textit{et al}$., Phys. Rev. A $\textbf{105}$, 043714 (2022)]. In this study, we generate non-Gaussian states with Wigner negativity by time-gated photon detection. We use a fast optical switch for time gating to effectively improve the timing jitter of a photon-number-resolving detector based on a transition edge sensor from 50 ns to 10 ns. As a result, we generate non-Gaussian states with Wigner negativity of $-0.011\pm 0.004$, which cannot be observed without the time-gated photon detection method. These results confirm the effect of the timing jitter on non-Gaussian state generation experimentally for the first time and provide a promising method for high-purity non-Gaussian state generation.	翻訳日:2023-04-04 23:32:44 公開日:2023-04-03
# 物理インフォームドガウス過程回帰は線形PDE解を一般化する Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers ( http://arxiv.org/abs/2212.12474v3 ) ライセンス: Link先を確認	Marvin Pförtner and Ingo Steinwart and Philipp Hennig and Jonathan Wenger	(参考訳) 線形偏微分方程式(英: Linear partial differential equation, PDEs)は、熱伝達、電磁気、波動伝播などの物理過程を記述する重要な力学モデルのクラスである。実際には、離散化に基づく特殊数値法を用いてPDEを解く。一般に、未知のモデルパラメータの見積もりと、可能であれば初期化の物理的測定を用いる。このような解法はしばしば下流の応用でより大きな科学的モデルに埋め込まれ、エラー定量化が重要な役割を果たす。しかし、パラメータや測定の不確かさを無視することで、古典的なPDEソルバはその固有近似誤差の一貫した推定を導出できない可能性がある。本研究では、線形PDEを物理インフォームドガウス過程(GP)回帰として解釈することで、この問題を原理的にアプローチする。我々のフレームワークは、任意の有界線型作用素による観測に対するガウス過程推論定理の鍵となる一般化に基づいている。この確率論的視点は、(1)固有の離散化誤差の定量化、(2)モデルパラメータの不確かさを解に伝播させ、(3)ノイズ測定の条件を与える。この定式化の強さを実証し、重み付け残差法、コロケーション、有限体積、擬スペクトル、および有限要素法やスペクトル法のような(一般化)ガレルキン法を含むPDEソルバの中心クラスを厳密に一般化することを証明する。したがって、このクラスは構造化誤差推定を直接装備することができる。要約すると, 数値解析とベイズ推定の境界を曖昧にすることで, モジュラービルディングブロックとしての機械モデルと確率モデルとのシームレスな統合が可能となる。 Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models with a downstream application and thus error quantification plays a key role. However, by ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of the Gaussian process inference theorem to observations made via an arbitrary bounded linear operator. 
Crucially, this probabilistic viewpoint allows us to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models by blurring the boundaries between numerical analysis and Bayesian inference.	翻訳日:2023-04-04 23:32:03 public日:2023-04-03
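The idea of conditioning a Gaussian process on observations made through a linear operator, as described in the abstract above, can be sketched in the finite-dimensional case. The notation below ($m$, $k$, $\mathcal{L}$, $\Sigma$, $y$, $G$) is an assumption introduced for illustration, and the theorem in the paper covers a more general setting than this sketch.

```latex
\begin{align}
u &\sim \mathcal{GP}(m, k), \qquad y = \mathcal{L}[u] + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Sigma), \qquad \mathcal{L}[u] \in \mathbb{R}^{n}, \\
G &= \mathcal{L} k \mathcal{L}^{*} + \Sigma \;\in\; \mathbb{R}^{n \times n}, \\
m_{\mathrm{post}}(x) &= m(x) + \left(\mathcal{L}\, k(\cdot, x)\right)^{\top} G^{-1} \left( y - \mathcal{L}[m] \right), \\
k_{\mathrm{post}}(x, x') &= k(x, x') - \left(\mathcal{L}\, k(\cdot, x)\right)^{\top} G^{-1} \left(\mathcal{L}\, k(\cdot, x')\right).
\end{align}
```

For a linear PDE $\mathcal{D}[u] = f$, choosing $\mathcal{L}[u] = \left(\mathcal{D}[u](x_i)\right)_{i=1}^{n}$ and $y = \left(f(x_i)\right)_{i=1}^{n}$ at points $x_1, \dots, x_n$ gives a collocation-style solver whose posterior mean plays the role of the numerical solution and whose posterior covariance $k_{\mathrm{post}}$ serves as the structured error estimate.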
# VolRecon: 一般化可能な多視点再構成のための符号付き距離関数のボリュームレンダリング VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction ( http://arxiv.org/abs/2212.08067v2 ) ライセンス: Link先を確認	Yufan Ren, Fangjinhua Wang, Tong Zhang, Marc Pollefeys and Sabine Süsstrunk	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)が新しいビュー合成で成功し、研究者はニューラル・暗黙のシーン再構成を提案するようになった。しかし、既存のほとんどの暗黙的再構成手法はシーンごとのパラメータを最適化し、新しいシーンへの一般化性に欠ける。本稿では,SRDF(Signed Ray Distance Function)を用いた新しい一般化可能な暗黙的再構成手法であるVolReconを紹介する。細かいディテールとノイズが少ないシーンを再構築するために、volreconはマルチビュー特徴から集約された投影特徴と、粗いグローバル特徴量から補間されたボリューム特徴を組み合わせる。放射光変換器を用いて試料点のSRDF値を算出し,色と深さを描画する。 DTUデータセットでは、VolReconはスパースビュー再構築においてSparseNeuSを約30%上回り、フルビュー再構築においてMVSNetと同等の精度を達成する。さらに,提案手法は大規模ETH3Dベンチマークにおいて優れた一般化性能を示す。 The success of the Neural Radiance Fields (NeRF) in novel view synthesis has inspired researchers to propose neural implicit scene reconstruction. However, most existing neural implicit reconstruction methods optimize per-scene parameters and therefore lack generalizability to new scenes. We introduce VolRecon, a novel generalizable implicit reconstruction method with Signed Ray Distance Function (SRDF). To reconstruct the scene with fine details and little noise, VolRecon combines projection features aggregated from multi-view features, and volume features interpolated from a coarse global feature volume. Using a ray transformer, we compute SRDF values of sampled points on a ray and then render color and depth. On the DTU dataset, VolRecon outperforms SparseNeuS by about 30% in sparse view reconstruction and achieves accuracy comparable to MVSNet in full view reconstruction. Furthermore, our approach exhibits good generalization performance on the large-scale ETH3D benchmark.	翻訳日:2023-04-04 23:31:12 公開日:2023-04-03
# ANNNIモデルにおける量子クエンチの簡単な理論 A simple theory for quantum quenches in the ANNNI model ( http://arxiv.org/abs/2301.04070v2 ) ライセンス: Link先を確認	Jacob H. Robertson, Riccardo Senese and Fabian H. L. Essler	(参考訳) Haldar et al. (Phys. X 11, 031062) による最近の数値研究において、近位量子臨界点のシグネチャは特定の量子クエンチの後に早期および中間の時間で観測できることが示されている。この研究は、主に軸方向のnext-nearest nearby ising(annni)モデルに焦点をあてた。ここでは単純な時間依存平均場理論を構築し,これらのクエンチの定量的な記述を短時間で得られるようにした。本手法は, 量子臨界点検出におけるクエンチダイナミクスによる基本的な限界に加えて, 報告された数値結果を理解するための簡単な枠組みを提供する。さらに,長期間の有界状態の形成から生じる様々な観測物に見られる特異な振動挙動の起源を説明する。 In a recent numerical study by Haldar et al. (Phys. Rev. X 11, 031062) it was shown that signatures of proximate quantum critical points can be observed at early and intermediate times after certain quantum quenches. Said work focused mainly on the case of the axial next-nearest neighbour Ising (ANNNI) model. Here we construct a simple time-dependent mean-field theory that allows us to obtain a quantitatively accurate description of these quenches at short times, which for reasons we explain remains a fair approximation at late times (with some caveats). Our approach provides a simple framework for understanding the reported numerical results as well as fundamental limitations on detecting quantum critical points through quench dynamics. We moreover explain the origin of the peculiar oscillatory behaviour seen in various observables as arising from the formation of a long-lived bound state.	翻訳日:2023-04-04 23:24:09 公開日:2023-04-03
# Lightweight Salient Object Detection in Optical Remote-Sensing Images via Semantic Matching and Edge Alignment ( http://arxiv.org/abs/2301.02778v2 ) License: see link	Gongyang Li, Zhi Liu, Xinpeng Zhang, Weisi Lin	Recently, many methods relying on convolutional neural networks (CNNs) have been proposed for salient object detection in optical remote sensing images (ORSI-SOD). However, most methods ignore the large parameter counts and computational cost brought by CNNs, and only a few pay attention to portability and mobility. To facilitate practical applications, in this paper, we propose a novel lightweight network for ORSI-SOD based on semantic matching and edge alignment, termed SeaNet. Specifically, SeaNet includes a lightweight MobileNet-V2 for feature extraction, a dynamic semantic matching module (DSMM) for high-level features, an edge self-alignment module (ESAM) for low-level features, and a portable decoder for inference. First, the high-level features are compressed into semantic kernels. Then, the semantic kernels are used to activate salient object locations in two groups of high-level features through dynamic convolution operations in DSMM. Meanwhile, in ESAM, cross-scale edge information extracted from two groups of low-level features is self-aligned through an L2 loss and used for detail enhancement. Finally, starting from the highest-level features, the decoder infers salient objects based on the accurate locations and fine details contained in the outputs of the two modules. Extensive experiments on two public datasets demonstrate that our lightweight SeaNet not only outperforms most state-of-the-art lightweight methods but also yields accuracy comparable to state-of-the-art conventional methods, while having only 2.76M parameters and running with 1.7G FLOPs for 288x288 inputs. Our code and results are available at https://github.com/MathLee/SeaNet.	Translated: 2023-04-04 23:23:52 Published: 2023-04-03
# CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior ( http://arxiv.org/abs/2301.02379v2 ) License: see link	Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong	Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature and scarcity of audio-visual data. Existing works typically formulate the cross-modal mapping as a regression task, which suffers from the regression-to-mean problem and leads to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty. The codebook is learned by self-reconstruction over real facial motions and is thus embedded with realistic facial motion priors. Over the discrete motion space, a temporal autoregressive model is employed to sequentially synthesize facial motions from the input speech signal, which guarantees lip-sync as well as plausible facial expressions. We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. A user study further supports its superiority in perceptual quality.	Translated: 2023-04-04 23:23:10 Published: 2023-04-03
# MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology ( http://arxiv.org/abs/2301.02228v3 ) License: see link	Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie	In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, and evaluate on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model demonstrates strong performance compared with prior methods on disease classification and grounding.	Translated: 2023-04-04 23:22:51 Published: 2023-04-03
# Random-depth Quantum Amplitude Estimation ( http://arxiv.org/abs/2301.00528v3 ) License: see link	Xi Lu and Hongwei Lin	Quantum amplitude estimation is a critical task in quantum computing and the foundation of quantum numerical integration. The maximum likelihood amplitude estimation (MLAE) algorithm is a practical solution to the quantum amplitude estimation problem, with a theoretically quadratic speedup over the classical Monte Carlo method. Since MLAE does not require the quantum Fourier transform (QFT), it is more likely to be widely used in the near future than QFT-based algorithms. However, we find that MLAE is not unbiased due to the so-called critical points, which is one of the major causes of its inaccuracy. We propose a random-depth quantum amplitude estimation (RQAE) to avoid critical points. We also run numerical experiments to show that our algorithm is approximately unbiased and outperforms MLAE and other quantum amplitude estimation algorithms.	Translated: 2023-04-04 23:22:18 Published: 2023-04-03
# Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models ( http://arxiv.org/abs/2212.14704v2 ) License: see link	Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, Xiaohu Qie, Shenghua Gao	Recent CLIP-guided 3D optimization methods, such as DreamFields and PureCLIPNeRF, have achieved impressive results in zero-shot text-to-3D synthesis. However, because they train from scratch with random initialization and no prior knowledge, these methods often fail to generate accurate and faithful 3D structures that conform to the input text. In this paper, we make the first attempt to introduce explicit 3D shape priors into the CLIP-guided 3D optimization process. Specifically, we first generate a high-quality 3D shape from the input text in the text-to-shape stage as a 3D shape prior. We then use it as the initialization of a neural radiance field and optimize it with the full prompt. To address the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between the images synthesized by the text-to-image diffusion model and the shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy compared to state-of-the-art methods.	Translated: 2023-04-04 23:21:49 Published: 2023-04-03
# Aggregation of Disentanglement: Reconsidering Domain Variations in Domain Generalization ( http://arxiv.org/abs/2302.02350v4 ) License: see link	Daoan Zhang, Mingkai Chen, Chenming Li, Lingyun Huang, Jianguo Zhang	Domain Generalization (DG) is a fundamental challenge for machine learning models, which aims to improve model generalization on various domains. Previous methods focus on generating domain-invariant features from various source domains. However, we argue that the domain variations also contain useful information, i.e., classification-aware information, for downstream tasks, which has been largely ignored. Instead of learning domain-invariant features from source domains, we decouple the input images into Domain Expert Features and noise. The proposed domain expert features lie in a learned latent space where the images in each domain can be classified independently, enabling the implicit use of classification-aware domain variations. Based on this analysis, we propose a novel paradigm called Domain Disentanglement Network (DDN) to disentangle the domain expert features from the source domain images and aggregate the source domain expert features for representing the target test domain. We also propose a new contrastive learning method to guide the domain expert features to form a more balanced and separable feature space. Experiments on the widely used benchmarks PACS, VLCS, OfficeHome, DomainNet, and TerraIncognita demonstrate the competitive performance of our method compared to recently proposed alternatives.	Translated: 2023-04-04 21:41:10 Published: 2023-04-03
# Regulating ChatGPT and other Large Generative AI Models ( http://arxiv.org/abs/2302.02337v5 ) License: see link	Philipp Hacker, Andreas Engel, Marco Mauer	Large generative AI models (LGAIMs), such as ChatGPT or Stable Diffusion, are rapidly transforming the way we communicate, illustrate, and create. However, AI regulation, in the EU and beyond, has primarily focused on conventional AI models, not LGAIMs. This paper situates these new generative models in the current debate on trustworthy AI regulation, and asks how the law can be tailored to their capabilities. After laying technical foundations, the legal part of the paper proceeds in four steps, covering (1) direct regulation, (2) data protection, (3) content moderation, and (4) policy proposals. It suggests a novel terminology to capture the AI value chain in LGAIM settings by differentiating between LGAIM developers, deployers, professional and non-professional users, as well as recipients of LGAIM output. We tailor regulatory duties to these different actors along the value chain and suggest four strategies to ensure that LGAIMs are trustworthy and deployed for the benefit of society at large. Rules in the AI Act and other direct regulation must match the specificities of pre-trained models. In particular, regulation should focus on concrete high-risk applications, and not the pre-trained model itself, and should include (i) obligations regarding transparency and (ii) risk management. Non-discrimination provisions (iii) may, however, apply to LGAIM developers. Lastly, (iv) the core of the DSA content moderation rules should be expanded to cover LGAIMs. This includes notice and action mechanisms, and trusted flaggers. In all areas, regulators and lawmakers need to act fast to keep pace with the dynamics of ChatGPT et al.	Translated: 2023-04-04 21:40:47 Published: 2023-04-03
# Learning Optimal Features via Partial Invariance ( http://arxiv.org/abs/2301.12067v2 ) License: see link	Moulik Choraria, Ibtihal Ferwana, Ankur Mani, Lav R. Varshney	Learning models that are robust to distribution shifts is a key concern in the context of their real-life applicability. Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments. The success of IRM requires an important assumption: the underlying causal mechanisms/features remain invariant across environments. When this assumption is not satisfied, we show that IRM can over-constrain the predictor; to remedy this, we propose a relaxation via $\textit{partial invariance}$. In this work, we theoretically highlight the sub-optimality of IRM and then demonstrate how learning from a partition of training domains can help improve invariant models. Several experiments, conducted both in linear settings and with deep neural networks on tasks over both language and image data, allow us to verify our conclusions.	Translated: 2023-04-04 21:39:42 Published: 2023-04-03
# Semidefinite Relaxations for Robust Multiview Triangulation ( http://arxiv.org/abs/2301.11431v3 ) License: see link	Linus Härenstam-Nielsen, Niclas Zeller, Daniel Cremers	We propose an approach based on convex relaxations for certifiably optimal robust multiview triangulation. To this end, we extend existing relaxation approaches to non-robust multiview triangulation by incorporating a least squares cost function. We propose two formulations, one based on epipolar constraints and one based on fractional reprojection constraints. The first is lower dimensional and remains tight under moderate noise and outlier levels, while the second is higher dimensional and therefore slower but remains tight even under extreme noise and outlier levels. We demonstrate through extensive experiments that the proposed approaches allow us to compute provably optimal reconstructions even under significant noise and a large percentage of outliers.	Translated: 2023-04-04 21:39:19 Published: 2023-04-03
# Analog de Sitter universe in quantum Hall systems with an expanding edge ( http://arxiv.org/abs/2301.09270v2 ) License: see link	Yasusada Nambu and Masahiro Hotta	Expanding edges in quantum Hall systems can serve as simulators of quantum 1+1 dimensional expanding universes. In these systems, edge excitations are represented as a chiral scalar field in curved spacetimes. We investigate the Hawking radiation and entanglement behavior predicted by this model, assuming that the expansion law of the edge region corresponds to a de Sitter universe. As observable quantities for the quantum field, local spatial modes associated with detection regions are introduced using window functions for the field, and their correlations are evaluated. We find an impact of the Hawking radiation caused by the edge expansion on the auto-correlation functions of the local modes, and confirm that entanglement death due to Hawking radiation occurs. This behavior of entanglement is related to the "quantum to classical transition" in cosmic inflation.	Translated: 2023-04-04 21:38:18 Published: 2023-04-03
# Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning ( http://arxiv.org/abs/2301.05169v2 ) License: see link	Yuejiang Liu, Alexandre Alahi, Chris Russell, Max Horn, Dominik Zietlow, Bernhard Schölkopf, Francesco Locatello	Recent years have seen a surge of interest in learning high-level causal representations from low-level image pairs under interventions. Yet, existing efforts are largely limited to simple synthetic settings that are far from real-world problems. In this paper, we present Causal Triplet, a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works: (i) an actionable counterfactual setting, where only certain object-level variables allow for counterfactual observations whereas others do not; (ii) an interventional downstream task with an emphasis on out-of-distribution robustness from the independent causal mechanisms principle. Through extensive experiments, we find that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, indicating substantial challenges and opportunities for future work. Our code and datasets will be available at https://sites.google.com/view/causaltriplet.	Translated: 2023-04-04 21:38:06 Published: 2023-04-03
# Guaranteed Dynamic Scheduling of Ultra-Reliable Low-Latency Traffic via Conformal Prediction ( http://arxiv.org/abs/2302.07675v2 ) License: see link	Kfir M. Cohen, Sangwoo Park, Osvaldo Simeone, Petar Popovski, and Shlomo Shamai (Shitz)	The dynamic scheduling of ultra-reliable and low-latency traffic (URLLC) in the uplink can significantly enhance the efficiency of coexisting services, such as enhanced mobile broadband (eMBB) devices, by allocating resources only when necessary. The main challenge is posed by the uncertainty in the process of URLLC packet generation, which mandates the use of predictors for URLLC traffic in the coming frames. In practice, such prediction may overestimate or underestimate the amount of URLLC data to be generated, yielding either an excessive or an insufficient amount of resources to be pre-emptively allocated for URLLC packets. In this paper, we introduce a novel scheduler for URLLC packets that provides formal guarantees on reliability and latency irrespective of the quality of the URLLC traffic predictor. The proposed method leverages recent advances in online conformal prediction (CP), and follows the principle of dynamically adjusting the amount of allocated resources so as to meet reliability and latency requirements set by the designer.	Translated: 2023-04-04 21:32:11 Published: 2023-04-03
# Extensible Motion-based Identification of XR Users using Non-Specific Motion Data ( http://arxiv.org/abs/2302.07517v2 ) License: see link	Christian Schell, Konstantin Kobs, Tamara Fernando, Andreas Hotho, Marc Erich Latoschik	In this paper, we combine the strengths of distance-based and classification-based approaches for the task of identifying extended reality users by their movements. For this we present an embedding-based approach that leverages deep metric learning. We train the model on a dataset of users playing the VR game "Half-Life: Alyx" and conduct multiple experiments and analyses using a state-of-the-art classification-based model as a baseline. The results show that the embedding-based method 1) is able to identify new users from non-specific movements using only a few minutes of enrollment data, 2) can enroll new users within seconds, while retraining the baseline approach takes almost a day, 3) is more reliable than the baseline approach when only little enrollment data is available, 4) can be used to identify new users from another dataset recorded with different VR devices. Altogether, our solution is a foundation for easily extensible XR user identification systems, applicable to a wide range of user motions. It also paves the way for production-ready models that could be used by XR practitioners without the requirements of expertise, hardware, or data for training deep learning models.	Translated: 2023-04-04 21:31:52 Published: 2023-04-03
# Is Distance Matrix Enough for Geometric Deep Learning? ( http://arxiv.org/abs/2302.05743v2 ) License: see link	Zian Li, Xiyuan Wang, Yinan Huang, Muhan Zhang	Graph Neural Networks (GNNs) are often used for tasks involving the geometry of a given graph, such as molecular dynamics simulation. Although the distance matrix of a geometric graph contains complete geometric information, it has been demonstrated that Message Passing Neural Networks (MPNNs) are insufficient for learning this geometry. In this work, we expand on the families of counterexamples that MPNNs are unable to distinguish from their distance matrices, by constructing families of novel and symmetric geometric graphs. We then propose $k$-DisGNNs, which can effectively exploit the rich geometry contained in the distance matrix. We demonstrate the high expressive power of our models and prove that some existing well-designed geometric models can be unified by $k$-DisGNNs as special cases. Most importantly, we establish a connection between geometric deep learning and traditional graph representation learning, showing that those highly expressive GNN models originally designed for graph structure learning can also be applied to geometric deep learning problems with impressive performance, and that existing complex, equivariant models are not the only solution. Experimental results verify our theory.	Translated: 2023-04-04 21:30:56 Published: 2023-04-03
# An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning ( http://arxiv.org/abs/2302.04419v2 ) License: see link	Jaesik Yoon, Yi-Fu Wu, Heechul Bae, and Sungjin Ahn	Unsupervised object-centric representation (OCR) learning has recently drawn attention as a new paradigm of visual representation. This is because of its potential to be an effective pre-training technique for various downstream tasks in terms of sample efficiency, systematic generalization, and reasoning. Although image-based reinforcement learning (RL) is one of the most important and thus most frequently mentioned of these downstream tasks, the benefit in RL has surprisingly not been investigated systematically thus far. Instead, most of the evaluations have focused on rather indirect metrics such as segmentation quality and object property prediction accuracy. In this paper, we investigate the effectiveness of OCR pre-training for image-based reinforcement learning via empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as "Does OCR pre-training improve performance on object-centric tasks?" and "Can OCR pre-training help with out-of-distribution generalization?". Our results provide empirical evidence on the effectiveness of OCR pre-training for RL and on the potential limitations of its use in certain scenarios. Additionally, this study examines the critical aspects of incorporating OCR pre-training in RL, including performance in a visually complex environment and the appropriate pooling layer to aggregate the object representations.	Translated: 2023-04-04 21:30:12 Published: 2023-04-03
# Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking ( http://arxiv.org/abs/2302.03802v2 ) License: see link	Ziqi Pang, Jie Li, Pavel Tokmakov, Dian Chen, Sergey Zagoruyko, Yu-Xiong Wang	This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, our method adapts the "tracking by attention" framework and represents tracked instances coherently over time with object queries. To explicitly use historical cues, our "Past Reasoning" module learns to refine the tracks and enhance the object features by cross-attending to queries from previous frames and other objects. The "Future Reasoning" module digests historical information and predicts robust future trajectories. In the case of long-term occlusions, our method maintains the object positions and enables re-association by integrating motion predictions. On the nuScenes dataset, our method improves AMOTA by a large margin and reduces ID-Switches by 90% compared to prior approaches, an order of magnitude fewer. The code and models are made available at https://github.com/TRI-ML/PF-Track.	Translated: 2023-04-04 21:29:50 Published: 2023-04-03
# Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy ( http://arxiv.org/abs/2302.03542v2 ) License: see link	Blake Woodworth (SIERRA), Konstantin Mishchenko, Francis Bach (SIERRA, PSL)	We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $\delta$-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.	Translated: 2023-04-04 21:29:33 Published: 2023-04-03
# 星-三角関係からの可積分量子回路 Integrable Quantum Circuits from the Star-Triangle Relation ( http://arxiv.org/abs/2302.12675v2 ) ライセンス: Link先を確認	Yuan Miao, Eric Vernier	(参考訳) 星-三角関係は、古典的な2次元統計力学モデルに対して正確な結果を提供する、正確に解けるモデルの領域において重要な役割を果たす。本稿では、星-三角関係を用いた可積分量子回路を構築する。この構成は、星-三角関係によって解かれた統計力学モデルに対して相互に可換な2パラメータ転送行列の族に依存しており、Yang-Baxter可積分頂点モデルに基づく既知の構成とは異なる。スペクトルパラメータの特別な値において、転送行列は可積分量子回路にマッピングされ、そこでは局所保存電荷の無限の族が導出される。我々は、$Q$状態quditの鎖に作用する回路の2つの例として、最近Lotkovらによって可積分性が予想された$Q$状態ポッツ回路と、我々の知る限り新規な$\mathbb{Z}_Q$回路を示す。最初の例では、$Q=3$の場合にZamolodchikov-Fateev 19頂点モデルとの関連を示す。 The star-triangle relation plays an important role in the realm of exactly solvable models, offering exact results for classical two-dimensional statistical mechanical models. In this article, we construct integrable quantum circuits using the star-triangle relation. Our construction relies on families of mutually commuting two-parameter transfer matrices for statistical mechanical models solved by the star-triangle relation, and differs from previously known constructions based on Yang-Baxter integrable vertex models. At special value of the spectral parameter, the transfer matrices are mapped into integrable quantum circuits, for which infinite families of local conserved charges can be derived. We demonstrate the construction by giving two examples of circuits acting on a chain of $Q$-state qudits: $Q$-state Potts circuits, whose integrability has been conjectured recently by Lotkov et al., and $\mathbb{Z}_Q$ circuits, which are novel to our knowledge. In the first example, we present for $Q=3$ a connection to the Zamolodchikov-Fateev 19-vertex model.	翻訳日:2023-04-04 21:22:43 公開日:2023-04-03
# 単一物体追跡におけるトランスフォーマー : 実験的検討 Transformers in Single Object Tracking: An Experimental Survey ( http://arxiv.org/abs/2302.11867v2 ) ライセンス: Link先を確認	Janani Thangavel, Thanikasalam Kokul, Amirthalingam Ramanan, and Subha Fernando	(参考訳) シングルオブジェクトトラッキングは、コンピュータビジョンにおいてよく知られ、挑戦的な研究トピックである。過去20年間、多くの研究者がこの問題を解くために様々なアルゴリズムを提案し、有望な結果を得た。近年、トランスフォーマーベースのトラッキングアプローチは、追跡ロバスト性が優れているため、単一オブジェクトトラッキングの新しい時代を告げている。トラッカの性能分析のための調査研究がいくつか行われているが、単一物体追跡におけるトランスフォーマーの導入後、別の調査研究が必要である。本研究では,トランスフォーマー追跡手法の文献と性能を分析することを目的とした。そこで我々は、Transformer Trackingアプローチの詳細な文献分析を行い、その追跡堅牢性と計算効率を、挑戦的なベンチマークデータセット上で評価する。さらに、異なるトラッキングシナリオでパフォーマンスを測定して、その強みと弱点を見つけました。我々の調査は、Transformer Trackingアプローチの基礎となる原則、直面している課題、今後の方向性に関する洞察を提供する。 Single object tracking is a well-known and challenging research topic in computer vision. Over the last two decades, numerous researchers have proposed various algorithms to solve this problem and achieved promising results. Recently, Transformer-based tracking approaches have ushered in a new era in single object tracking due to their superior tracking robustness. Although several survey studies have been conducted to analyze the performance of trackers, there is a need for another survey study after the introduction of Transformers in single object tracking. In this survey, we aim to analyze the literature and performances of Transformer tracking approaches. Therefore, we conduct an in-depth literature analysis of Transformer tracking approaches and evaluate their tracking robustness and computational efficiency on challenging benchmark datasets. In addition, we have measured their performances on different tracking scenarios to find their strengths and weaknesses. Our survey provides insights into the underlying principles of Transformer tracking approaches, the challenges they face, and their future directions.	翻訳日:2023-04-04 21:22:03 公開日:2023-04-03
# 将来を考慮した持続可能なオンデマンドライドプールの価格設定とマッチング Future Aware Pricing and Matching for Sustainable On-demand Ride Pooling ( http://arxiv.org/abs/2302.10510v2 ) ライセンス: Link先を確認	Xianjie Zhang and Pradeep Varakantham and Hao Jiang	(参考訳) オンデマンドのライドプーリングの人気は、顧客(低価格)、タクシードライバー(高い収入)、環境(車両数の減少によるカーボンフットプリントの低減)、そしてUberのような集約企業(高い収入)に提供される利点によるものである。これらの利点を達成するには、2つの重要な相互リンク課題を効果的に解決する必要がある。 (a)価格 --タクシーの顧客要求に価格を設定すること (b)マッチング -- タクシー・車への顧客(価格を受け入れた)の割り当て。伝統的に、これら2つの課題は、将来の要求に対する現在のマッチングの影響を考慮せずに、個別に研究され、(現在の要求のみを考慮した)近視眼的なアプローチを用いている。本稿では,価格とマッチングの問題を取り扱うとともに,価格とマッチング決定の今後の影響も考慮しながら,新たな枠組みを提案する。実世界のタクシーデータセットにおける実験結果では、固定収入の取得に必要な車両数(最大14%、平均10.6%)と、車両の走行距離(最大11.1%、平均3.7%)を削減し、持続的に収益(最大17%、平均6.4%)を大幅に改善できることを実証した。つまり、顧客、ドライバー、アグリゲータ(ライドプール会社)に対して高い収益を得ると同時に、環境(道路上の車両の数が少なく、燃料消費も少ないため)に適している、すべての利害関係者(顧客、ドライバー、アグリゲータ、環境)に理想的なウィンウィンシナリオを提供することができるのです。 The popularity of on-demand ride pooling is owing to the benefits offered to customers (lower prices), taxi drivers (higher revenue), environment (lower carbon footprint due to fewer vehicles) and aggregation companies like Uber (higher revenue). To achieve these benefits, two key interlinked challenges have to be solved effectively: (a) pricing -- setting prices to customer requests for taxis; and (b) matching -- assignment of customers (that accepted the prices) to taxis/cars. Traditionally, both these challenges have been studied individually and using myopic approaches (considering only current requests), without considering the impact of current matching on addressing future requests. In this paper, we develop a novel framework that handles the pricing and matching problems together, while also considering the future impact of the pricing and matching decisions. 
In our experimental results on a real-world taxi dataset, we demonstrate that our framework can significantly improve revenue (up to 17% and on average 6.4%) in a sustainable manner by reducing the number of vehicles (up to 14% and on average 10.6%) required to obtain a given fixed revenue and the overall distance travelled by vehicles (up to 11.1% and on average 3.7%). That is to say, we are able to provide an ideal win-win scenario for all stakeholders (customers, drivers, aggregator, environment) involved by obtaining higher revenue for customers, drivers, aggregator (ride pooling company) while being good for the environment (due to fewer number of vehicles on the road and lesser fuel consumed).	翻訳日:2023-04-04 21:21:29 公開日:2023-04-03
# 物体検出における連続的領域適応のための領域ギャップの評価 Assessing Domain Gap for Continual Domain Adaptation in Object Detection ( http://arxiv.org/abs/2302.10396v2 ) ライセンス: Link先を確認	Anh-Dzung Doan and Bach Long Nguyen and Surabhi Gupta and Ian Reid and Markus Wagner and Tat-Jun Chin	(参考訳) 自律システムにおける信頼できる物体検出を確保するために、検出器は、日時、天候、季節などの環境要因による外観の変化に対応できなければならない。これらの変更を継続的に取り入れることは有望な解決策であるが、計算コストはかかる。提案手法は,現在のトレーニングデータと同じ分布を持たない新しいデータを用いて,必要なときにのみ検出器を選択的に適応させることである。この目的のために、ドメインギャップ評価のための3つの一般的なメトリクスを調査し、ドメインギャップと検出精度との間に相関があることを見出した。そこで, 領域ギャップを基準として, 検出器の適応時期を決定する。提案手法は, 環境条件が周期的に変化する現実のシナリオにおいて, 検出器全体の性能を犠牲にすることなく, 検出器の動作効率を向上させる可能性を秘めている。私たちのコードはhttps://github.com/dadung/DGE-CDAで公開されています。 To ensure reliable object detection in autonomous systems, the detector must be able to adapt to changes in appearance caused by environmental factors such as time of day, weather, and seasons. Continually adapting the detector to incorporate these changes is a promising solution, but it can be computationally costly. Our proposed approach is to selectively adapt the detector only when necessary, using new data that does not have the same distribution as the current training data. To this end, we investigate three popular metrics for domain gap evaluation and find that there is a correlation between the domain gap and detection accuracy. Therefore, we apply the domain gap as a criterion to decide when to adapt the detector. Our experiments show that our approach has the potential to improve the efficiency of the detector's operation in real-world scenarios, where environmental conditions change in a cyclical manner, without sacrificing the overall performance of the detector. Our code is publicly available at https://github.com/dadung/DGE-CDA.	翻訳日:2023-04-04 21:21:01 公開日:2023-04-03
# ブートストラップ the original latent: ブラックボックスモデルからプライベートモデルを学ぶ Bootstrap The Original Latent: Learning a Private Model from a Black-box Model ( http://arxiv.org/abs/2303.03709v4 ) ライセンス: Link先を確認	Shuai Wang, Daoan Zhang, Jianguo Zhang, Weiwei Zhang, and Rui Li	(参考訳) 本稿では,モデル所有者とユーザニーズのデータ/モデルプライバシのバランスを考慮し,ブラックボックス基盤/ソースモデルのバックプロパゲーション結果のガイダンスを用いて,ユーザがプライベートモデルをより良いトレーニングを行うためのBack-Propagated Black-Box Adaptation (BPBA)を提案する。私たちの設定は、ファンデーション/ソースモデルの使用を容易にし、ファンデーション/ソースモデルの漏洩や誤用を防ぎます。さらに,基盤/ソースモデルを完全に活用するためのBootstrap The Original Latent(BTOL)という新たなトレーニング戦略を提案する。当社の戦略はドメインアダプタとフリーズ・アンド・ザウ戦略で構成されています。 3つのデータセットに対してBPBAとBlack-box UDA設定でBTOLを適用します。実験の結果,手作業による拡張を伴わずに,戦略が効率的かつ堅牢であることが確認された。 In this paper, considering the balance of data/model privacy of model owners and user needs, we propose a new setting called Back-Propagated Black-Box Adaptation (BPBA) for users to better train their private models via the guidance of the back-propagated results of a Black-box foundation/source model. Our setting can ease the usage of foundation/source models as well as prevent the leakage and misuse of foundation/source models. Moreover, we also propose a new training strategy called Bootstrap The Original Latent (BTOL) to fully utilize the foundation/source models. Our strategy consists of a domain adapter and a freeze-and-thaw strategy. We apply our BTOL under BPBA and Black-box UDA settings on three different datasets. Experiments show that our strategy is efficient and robust in various settings without manual augmentations.	翻訳日:2023-04-04 21:12:41 公開日:2023-04-03
# 量子コンピュータを用いた高次元離散時間結晶のシミュレーション Simulation of Higher Dimensional Discrete Time Crystals on a Quantum Computer ( http://arxiv.org/abs/2303.02727v2 ) ライセンス: Link先を確認	Christopher Sims	(参考訳) 位相秩序状態の研究は、量子物質における対称性保護状態への関心が高まっている。近年、この理論は低温での秩序状態を示す量子多体系に拡張されている。この例は離散時間結晶(DTC)であり、実際の量子コンピュータや駆動システムで実証されている。これらの状態は周期的であり、ある程度の障害に対して保護されている。一般に、DTCは安定な多体局在状態(MBL)と不規則な熱状態の2つの段階に分けられる。本研究は, DTCを2次元に一般化することにより, 熱雑音の低減, MBL範囲の動作範囲の増大を実証する。 The study of topologically ordered states have given rise to a growing interest in symmetry protected states in quantum matter. Recently, this theory has been extended to quantum many body systems which demonstrate ordered states at low temperature. An example of this is the discrete time crystal (DTC) which has been demonstrated in a real quantum computer and in driven systems. These states are periodic in time and are protected to disorder to a certain extent. In general, DTC can be classified into two phases, the stable many body localization (MBL) state, and the disordered thermal state. This work demonstrates by generalizing DTC to 2 dimensions, there is an decrease in thermal noise and an increase in the operating range of the MBL range in the presence of disorder.	翻訳日:2023-04-04 21:12:03 公開日:2023-04-03
# 画像テキスト検索のための共通知識最適化型スタイルトランス The style transformer with common knowledge optimization for image-text retrieval ( http://arxiv.org/abs/2303.00448v2 ) ライセンス: Link先を確認	Wenrui Li, Zhengyu Ma, Jinqiao Shi, Xiaopeng Fan	(参考訳) 異なるモダリティを関連付ける画像テキスト検索は,その優れた研究価値と広い実世界の応用により,広く注目を集めている。しかし、既存の手法のほとんどは、高レベルの意味的関係(スタイル埋め込み)とマルチモーダルからの共通知識を十分に考慮していない。そこで本稿では,画像テキスト検索のための共通知識最適化(CKSTN)を備えた新しいスタイルトランスフォーマーネットワークを提案する。主なモジュールは共通知識適応器 (CKA) であり、スタイル埋め込み抽出器 (SEE) と共通知識最適化 (CKO) モジュールの両方がある。具体的には、SEEはシーケンシャルアップデート戦略を使用して、SEEの異なるステージの特徴を効果的に接続します。 CKOモジュールは、様々なモダリティから共通知識の潜在概念を動的に捉えるために導入された。さらに、時間的共通知識を一般化するために、SEE内の異なるレイヤの特徴を従来の共通特徴ユニットと効果的に統合するためのシーケンシャルな更新戦略を提案する。 CKSTNは、MSCOCOおよびFlickr30Kデータセット上の画像テキスト検索における最先端手法の優位性を実証する。さらに、CKSTNは、より優れた性能と低いパラメータのため、実際のシーンに適用するためにより便利で実用的な軽量トランスフォーマーに基づいて構築される。 Image-text retrieval which associates different modalities has drawn broad attention due to its excellent research value and broad real-world application. However, most of the existing methods haven't taken the high-level semantic relationships ("style embedding") and common knowledge from multi-modalities into full consideration. To this end, we introduce a novel style transformer network with common knowledge optimization (CKSTN) for image-text retrieval. The main module is the common knowledge adaptor (CKA) with both the style embedding extractor (SEE) and the common knowledge optimization (CKO) modules. Specifically, the SEE uses the sequential update strategy to effectively connect the features of different stages in SEE. The CKO module is introduced to dynamically capture the latent concepts of common knowledge from different modalities. Besides, to get generalized temporal common knowledge, we propose a sequential update strategy to effectively integrate the features of different layers in SEE with previous common feature units. CKSTN demonstrates the superiorities of the state-of-the-art methods in image-text retrieval on MSCOCO and Flickr30K datasets. 
Moreover, CKSTN is constructed based on the lightweight transformer which is more convenient and practical for the application of real scenes, due to the better performance and lower parameters.	翻訳日:2023-04-04 21:10:50 公開日:2023-04-03
# 最先端とは何か？機械学習ベンチマーク性能における多重性の考慮 What is the state of the art? Accounting for multiplicity in machine learning benchmark performance ( http://arxiv.org/abs/2303.07272v2 ) ライセンス: Link先を確認	Kajsa M{\o}llersen and Einar Holsb{\o}	(参考訳) 機械学習手法は一般に、公開リポジトリのデータセットにおける性能によって評価・比較される。これにより、しばしば数千のメソッドが同じ条件下で、時間にわたって評価される。ある問題における最上位の成績は「最先端(SOTA)パフォーマンス」と呼ばれ、新しい手法を公表するための基準点などとして用いられる。最上位の性能をSOTAの推定値として用いることは偏りのある推定であり、過度に楽観的な結果を与える。ここで働いているのは多重性(multiplicity)のメカニズムであり、多重比較や多重検定の文脈ではよく研究されているトピックであるが、著者たちが認識している限り、SOTAの推定に関する議論からほとんど欠落している。楽観的な最先端推定値が新しい手法を評価するための基準として用いられるため、結果が著しく劣る手法は容易に見過ごされる。本稿では、複数の分類器の場合の確率分布を与え、既知の解析手法を適用できるようにし、より優れたSOTA推定値を提供する。独立な分類器を用いた模擬例により多重性の影響を実証する。分類器間の依存性が分散にどのように影響するかを示すとともに,精度が高い場合には影響が限定されることを示した。最後に,2020年のKaggleコンペティションという実例について論じる。 Machine learning methods are commonly evaluated and compared by their performance on data sets from public repositories. This allows for multiple methods, oftentimes several thousands, to be evaluated under identical conditions and across time. The highest ranked performance on a problem is referred to as state-of-the-art (SOTA) performance, and is used, among other things, as a reference point for publication of new methods. Using the highest-ranked performance as an estimate for SOTA is a biased estimator, giving overly optimistic results. The mechanisms at play are those of multiplicity, a topic that is well-studied in the context of multiple comparisons and multiple testing, but has, as far as the authors are aware of, been nearly absent from the discussion regarding SOTA estimates. The optimistic state-of-the-art estimate is used as a standard for evaluating new methods, and methods with substantially inferior results are easily overlooked. In this article, we provide a probability distribution for the case of multiple classifiers so that known analysis methods can be engaged and a better SOTA estimate can be provided. 
We demonstrate the impact of multiplicity through a simulated example with independent classifiers. We show how classifier dependency impacts the variance, but also that the impact is limited when the accuracy is high. Finally, we discuss a real-world example; a Kaggle competition from 2020.	翻訳日:2023-04-04 21:03:10 公開日:2023-04-03
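The multiplicity effect described in the entry above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' analysis: every classifier is assumed to be independent and to share the same true accuracy, and the function name `simulate_max_accuracy` and its parameters are hypothetical, chosen for illustration. The best observed score among many such classifiers tends to sit above the true accuracy, which is the optimistic bias the abstract refers to.

```python
import random


def simulate_max_accuracy(num_classifiers, test_size, true_accuracy, trials, seed=0):
    """Average, over `trials` repetitions, of the best observed test accuracy
    among `num_classifiers` independent classifiers that all have the same
    true accuracy (an assumption made for this illustration)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best = 0.0
        for _ in range(num_classifiers):
            # Each test example is classified correctly with probability true_accuracy.
            correct = sum(rng.random() < true_accuracy for _ in range(test_size))
            best = max(best, correct / test_size)
        total += best
    return total / trials


if __name__ == "__main__":
    # The more classifiers compete, the further the top score drifts
    # above the shared true accuracy of 0.90.
    for m in (1, 10, 100):
        estimate = simulate_max_accuracy(m, test_size=500, true_accuracy=0.90, trials=20)
        print(f"classifiers={m:4d}  mean best accuracy={estimate:.4f}")
```

With a single classifier the average stays close to the true accuracy; with many classifiers the top score overstates it even though no classifier is actually better than the others.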
# 動的環境におけるディジタルツインベースv2x通信を実現するマルチモーダルシミュレーションフレームワーク A Multi-Modal Simulation Framework to Enable Digital Twin-based V2X Communications in Dynamic Environments ( http://arxiv.org/abs/2303.06947v2 ) ライセンス: Link先を確認	Lorenzo Cazzella, Francesco Linsalata, Maurizio Magarini, Matteo Matteucci, Umberto Spagnolini	(参考訳) 近年,物理無線環境のためのDigital Twins (DT) が,物理通信機器における多層決定を可能にする伝搬環境の正確な仮想表現として提案されている。高周波帯では、DTは車体環境を特徴とする高移動環境において生じる課題を克服するのに役立つ。本稿では,V2X通信シナリオのDT作成のための新しいデータ駆動ワークフローと,現実的なセンサデータと正確なmmWave/sub-THz無線チャネルを生成するためのマルチモーダルシミュレーションフレームワークを提案する。提案手法は,Unreal Engineゲームエンジンと正確なレイトレーシングチャネルシミュレータに基づく,自動車シミュレーションおよびテストフレームワークを活用する。都市シナリオのシミュレーションでは、達成可能な現実的なセンサーとチャネルがインフラとエゴ車両の両方でモデル化されている。 Digital Twins (DTs) for physical wireless environments have been recently proposed as accurate virtual representations of the propagation environment that can enable multi-layer decisions at the physical communication equipment. At high frequency bands, DTs can help to overcome the challenges emerging in the high mobility conditions featuring vehicular environments. In this paper, we propose a novel data-driven workflow for the creation of the DT of a Vehicle-to-Everything (V2X) communication scenario and a multi-modal simulation framework for the generation of realistic sensor data and accurate mmWave/sub-THz wireless channels. The proposed method leverages an automotive simulation and testing framework based on the Unreal Engine game engine and an accurate ray-tracing channel simulator. Simulations over an urban scenario show the achievable realistic sensor and channel modelling both at the infrastructure and at an ego-vehicle.	翻訳日:2023-04-04 21:02:49 公開日:2023-04-03
# CoGANPPIS:タンパク質-タンパク質相互作用サイト予測のための共進化強化グローバルアテンションニューラルネットワーク CoGANPPIS: Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction ( http://arxiv.org/abs/2303.06945v3 ) ライセンス: Link先を確認	Jiaxing Guo, Xuening Zhu, Zixin Hu, Xiaoxi Hu	(参考訳) タンパク質とタンパク質の相互作用は生化学的プロセスにおいて必須である。タンパク質-タンパク質相互作用部位(PPIs)の正確な予測は、生物学的メカニズムの理解を深め、新しい医薬品設計に不可欠である。しかし、従来のPPIs予測実験手法はコストと時間を要するため、近年多くの計算手法、特にMLベースの手法が開発されている。これらの手法は満足度の高い結果を得たものの, 2つの限界が残っている。(1) 多くのモデルは有用な入力特徴を発掘しているが, 残基間の関係の手がかりとなりうる共進化的特徴を考慮していない。(2) 注意ベースモデルは近隣残基に対してのみ注意重みを割り当て, 対象残基から遠く離れた残基も重要でありうることを無視している。我々は,CoGANPPISと呼ばれるPPIs予測のためのシーケンスベースディープラーニングモデルである,共進化強化グローバルアテンションニューラルネットワークを提案する。 特徴抽出のために3つの層を並列に用いる。(1) 近傍残基の特徴を集約する局所レベル表現集約層、(2) 新しい共進化強化グローバルアテンション機構により同一タンパク質配列上のすべての残基に注意重みを割り当てるグローバルレベル表現学習層、(3) 共進化情報にCNNとプーリングを適用して共進化プロファイル表現を得る共進化情報学習層である。 そして、3つの出力が連結され、最終予測のために複数の完全連結層に渡される。 2つのベンチマークデータセットへの適用により、このモデルの最先端の性能が実証された。ソースコードはhttps://github.com/Slam1423/CoGANPPIS_source_codeで公開されている。 Protein-protein interactions are essential in biochemical processes. Accurate prediction of the protein-protein interaction sites (PPIs) deepens our understanding of biological mechanism and is crucial for new drug design. However, conventional experimental methods for PPIs prediction are costly and time-consuming so that many computational approaches, especially ML-based methods, have been developed recently. 
Although these approaches have achieved gratifying results, there are still two limitations: (1) Most models have excavated some useful input features, but failed to take coevolutionary features into account, which could provide clues for inter-residue relationships; (2) The attention-based models only allocate attention weights for neighboring residues, instead of doing it globally, neglecting that some residues being far away from the target residues might also matter. We propose a coevolution-enhanced global attention neural network, a sequence-based deep learning model for PPIs prediction, called CoGANPPIS. It utilizes three layers in parallel for feature extraction: (1) Local-level representation aggregation layer, which aggregates the neighboring residues' features; (2) Global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all the residues on the same protein sequences; (3) Coevolutionary information learning layer, which applies CNN & pooling to coevolutionary information to obtain the coevolutionary profile representation. Then, the three outputs are concatenated and passed into several fully connected layers for the final prediction. Application on two benchmark datasets demonstrated a state-of-the-art performance of our model. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code.	翻訳日:2023-04-04 21:02:33 公開日:2023-04-03
# テンソル分解における実対数正準閾値の上界とベイズ推定への応用 Upper Bound of Real Log Canonical Threshold of Tensor Decomposition and its Application to Bayesian Inference ( http://arxiv.org/abs/2303.05731v2 ) ライセンス: Link先を確認	Naoki Yoshida and Sumio Watanabe	(参考訳) テンソル分解は現在、データ分析、情報圧縮、知識回復に使われている。しかし、テンソル分解の数学的性質は特異学習機の1つであるため、まだ完全には解明されていない。本稿では,代数幾何学的手法を用いてテンソル分解の実対数正準閾値(RLCT)の上界を与え,ベイズ一般化誤差を理論的に導出する。また,その数学的性質を数値実験によって考察する。 Tensor decomposition is now being used for data analysis, information compression, and knowledge recovery. However, the mathematical property of tensor decomposition is not yet fully clarified because it is one of singular learning machines. In this paper, we give the upper bound of its real log canonical threshold (RLCT) of the tensor decomposition by using an algebraic geometrical method and derive its Bayesian generalization error theoretically. We also give considerations about its mathematical property through numerical experiments.	翻訳日:2023-04-04 21:00:54 公開日:2023-04-03
# SUD$^2$:画像再構成のための拡散モデルによるスーパービジョン SUD$^2$: Supervision by Denoising Diffusion Models for Image Reconstruction ( http://arxiv.org/abs/2303.09642v2 ) ライセンス: Link先を確認	Matthew A. Chan, Sean I. Young, Christopher A. Metzler	(参考訳) 多くのイメージング逆問題(画像依存のin-paintingやdehazingなど)は、前方モデルが未知あるいは未知の潜在パラメータに依存しているため困難である。膨大な量のペアトレーニングデータでニューラルネットワークをトレーニングすることで、そのような問題を解決することができるが、ペアトレーニングデータはしばしば利用できない。本稿では,ペアトレーニングデータが少ない場合に,画像再構成ネットワークをトレーニングするための汎用フレームワークを提案する。特に,画像デノイジングアルゴリズム,ひいてはデノイジング拡散モデルが,ペアトレーニングデータがない場合にネットワークトレーニングを教師できることを示す。 Many imaging inverse problems, such as image-dependent in-painting and dehazing, are challenging because their forward models are unknown or depend on unknown latent parameters. While one can solve such problems by training a neural network with vast quantities of paired training data, such paired training data is often unavailable. In this paper, we propose a generalized framework for training image reconstruction networks when paired training data is scarce. In particular, we demonstrate the ability of image denoising algorithms and, by extension, denoising diffusion models to supervise network training in the absence of paired training data.	翻訳日:2023-04-04 20:54:31 公開日:2023-04-03
# 分割定数近似を超える形状パルスのシミュレーションと設計 Simulation and design of shaped pulses beyond the piecewise-constant approximation ( http://arxiv.org/abs/2303.09458v3 ) ライセンス: Link先を確認	Uluk Rasulov, Anupama Acharya, Marina Carravetta, Ilya Kuprov	(参考訳) 共振回路の応答関数は、入力が急速に変化するとリングアーティファクトを生成する。電磁分光学の物理的限界を探索すると、2種類の問題が発生する。まず、シミュレーション: システムは応答のトランジェントごとに正確に伝達されなければならず、計算コストがかかる。第二に、最適制御:回路応答を考慮に入れなければならない;そのような歪みに耐性のあるパルスを設計することが有利である。両問題の根源は回転するフレームの制御シーケンスに対する一般的な分割定数近似であり、磁気共鳴では初期から持続し、市販のハードウェアに絡み付いている。本稿では,スムーズな制御シーケンスを効率的にシミュレートし最適化できる最近のリー群法の実装とベンチマークについて報告する。 Response functions of resonant circuits create ringing artefacts if their input changes rapidly. When physical limits of electromagnetic spectroscopies are explored, this creates two types of problems. Firstly, simulation: the system must be propagated accurately through every response transient, this may be computationally expensive. Secondly, optimal control: circuit response must be taken into account; it may be advantageous to design pulses that are resilient to such distortions. At the root of both problems is the popular piecewise-constant approximation for control sequences in the rotating frame; in magnetic resonance it has persisted since the earliest days and has become entrenched in the commercially available hardware. In this paper, we report an implementation and benchmarks of recent Lie-group methods that can efficiently simulate and optimise smooth control sequences.	翻訳日:2023-04-04 20:54:18 公開日:2023-04-03
# タンゴまで1回はかかるが、もっとトラブルを起こすのか? 文脈内学習に必要な実演数 It Takes One to Tango but More Make Trouble? The Number of Demonstrations Needed for In-Context Learning ( http://arxiv.org/abs/2303.08119v2 ) ライセンス: Link先を確認	Jiuhai Chen, LiChang Chen, Chen Zhu, Tianyi Zhou	(参考訳) 大規模言語モデル(LLM)は、インコンテキスト学習(ICL)によっていくつかの入出力デモ(demo)が提供されると複雑な推論を行うことができ、デモの中間的推論ステップ(CoT)が与えられるとより強力になる。 ICLでマルチデモを使う必要はあるか? 本稿では,\cite{wei2022chain} のタスクにおいて,各テストクエリに用いるデモを減らした ICL について検討する。驚いたことに、ランダムに選択した1つのデモのみを使用する場合でも、大きな劣化は観察されない。この現象を研究するために、各テストクエリに対して、デモを正しい回答を導く"正しいデモ"と、誤った回答に導く"間違ったデモ"に分類する。私たちの分析では、これらの広く研究されているデータセットに固有のバイアスが示されています。ほとんどのデモは、テストクエリの大部分に対して正しいものであり、これがランダムな1つのデモを用いた場合の良好な性能を説明する。さらに、ICL(with and w/o CoT)は1つの正しいデモのみを使用した場合、これまでのほとんどの研究で採用されていた全デモICLよりも大幅に優れており、入力クエリに対する正しいデモ(s)を見つける際のLLMの弱点を示している。この弱点はバイアス付きデータセットでは評価が難しい。さらに,マルチデモを用いたICLでは,正しい(誤った)デモを多く与えるほど精度が低下(向上)するという,直観に反する挙動が観察される。これは、ICLがデモ間の干渉とそれらのスプリアス相関によって容易に誤解されることを意味する。我々の分析では、LLMのトレーニング、ICL、ベンチマーク設計で対処する必要があるいくつかの基本的な課題を取り上げている。 Large language models (LLMs) are capable of performing complex reasoning by in-context learning (ICL) when provided with a few input-output demonstrations (demos) and more powerful when intermediate reasoning steps ("chain of thoughts (CoT)") of the demos are given. Is it necessary to use multi-demo in ICL? In this paper, we study ICL using fewer demos for each test query on the tasks in~\cite{wei2022chain}. Surprisingly, we do not observe significant degradation when using only one randomly chosen demo. To study this phenomenon, for each test query, we categorize demos into "correct demos" leading to the correct answer, and "wrong demos" resulting in wrong answers. Our analysis reveals an inherent bias in those widely studied datasets: most demos are correct for a majority of test queries, which explains the good performance of using one random demo. 
Moreover, ICL (with and w/o CoT) using only one correct demo significantly outperforms all-demo ICL adopted by most previous works, indicating the weakness of LLMs in finding correct demo(s) for input queries, which is difficult to evaluate on the biased datasets. Furthermore, we observe a counterintuitive behavior of ICL using multi-demo, i.e., its accuracy degrades(improves) when given more correct(wrong) demos. This implies that ICL can be easily misguided by interference among demos and their spurious correlations. Our analyses highlight several fundamental challenges that need to be addressed in LLMs training, ICL, and benchmark design.	翻訳日:2023-04-04 20:52:51 公開日:2023-04-03
# RepoCoder: 反復検索と生成によるリポジトリレベルのコード補完 RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation ( http://arxiv.org/abs/2303.12570v2 ) ライセンス: Link先を確認	Fengji Zhang, Bei Chen, Yue Zhang, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen	(参考訳) リポジトリレベルのコード補完のタスクは、リポジトリのより広いコンテキストに基づいて未完成のコードを書き続けることです。自動化されたコード補完ツールでは、異なるファイルに散在する有用な情報を利用するのは難しい。この課題に対処するためのシンプルで汎用的で効果的なフレームワークであるRepoCoderを提案する。類似度ベースのレトリバーと事前学習されたコード言語モデルを組み合わせて、リポジトリレベルのコード補完プロセスを合理化し、コード補完にリポジトリレベルの情報の有効利用を可能にし、様々なレベルの粒度でコードを生成する機能を提供する。さらに、RepoCoderは、検索コンテキストと目的とする完了目標とのギャップを埋める、新しい反復検索生成パラダイムを利用する。また、ライン、API呼び出し、ファンクションボディ補完シナリオをカバーする最新かつ高品質な現実世界リポジトリで構成される新しいベンチマークRepoEvalを提案する。コードレトリバーとジェネレータの様々な組み合わせを用いて,レポコーダの性能をテストする。実験の結果,レポコーダはゼロショットコード補完ベースラインを全設定で10%以上向上させ,バニラ検索によるコード補完アプローチを一貫して上回っていることがわかった。さらに,RepoCoderの有効性を総合分析により検証し,今後の研究に有用な知見を提供する。 The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in different files. We propose RepoCoder, a simple, generic, and effective framework to address the challenge. It streamlines the repository-level code completion process by incorporating a similarity-based retriever and a pre-trained code language model, which allows for the effective utilization of repository-level information for code completion and grants the ability to generate code at various levels of granularity. Furthermore, RepoCoder utilizes a novel iterative retrieval-generation paradigm that bridges the gap between retrieval context and the intended completion target. We also propose a new benchmark RepoEval, which consists of the latest and high-quality real-world repositories covering line, API invocation, and function body completion scenarios. We test the performance of RepoCoder by using various combinations of code retrievers and generators. 
Experimental results indicate that RepoCoder significantly improves the zero-shot code completion baseline by over 10% in all settings and consistently outperforms the vanilla retrieval-augmented code completion approach. Furthermore, we validate the effectiveness of RepoCoder through comprehensive analysis, providing valuable insights for future research.	翻訳日:2023-04-04 20:45:29 公開日:2023-04-03
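The iterative retrieval-generation idea in the RepoCoder entry above can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the released implementation: the retriever is a simple Jaccard overlap over identifier tokens chosen for brevity, `generate` is a hypothetical callable standing in for a pre-trained code language model, and all function names (`tokens`, `jaccard`, `retrieve`, `iterative_completion`) are made up for this example. The point it shows is that the draft completion from one round is appended to the query of the next round, so later retrieval can surface snippets closer to the intended completion.

```python
import re


def tokens(text):
    """Set of identifier-like tokens in a piece of code."""
    return set(re.findall(r"[A-Za-z_]\w*", text))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two code strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(query, snippets, top_k=2):
    """Return the top_k repository snippets most similar to the query."""
    ranked = sorted(snippets, key=lambda s: jaccard(query, s), reverse=True)
    return ranked[:top_k]


def iterative_completion(unfinished_code, snippets, generate, iterations=2, top_k=2):
    """Alternate retrieval and generation for a fixed number of rounds.

    `generate` is a hypothetical stand-in for a code language model:
    it takes a prompt string and returns a completion string.
    """
    query = unfinished_code
    completion = ""
    for _ in range(iterations):
        context = retrieve(query, snippets, top_k)
        prompt = "\n".join(context) + "\n" + unfinished_code
        completion = generate(prompt)
        # The next retrieval round also sees the model's own draft.
        query = unfinished_code + "\n" + completion
    return completion


if __name__ == "__main__":
    repo_snippets = [
        "def load_config(path):\n    return parse_yaml(read_file(path))",
        "def parse_yaml(text):\n    return yaml_to_dict(text)",
        "def unrelated():\n    pass",
    ]

    def toy_generate(prompt):
        # Placeholder "model": always proposes the same continuation.
        return "parse_yaml(text)"

    print(iterative_completion("cfg = load_config(", repo_snippets, toy_generate, top_k=1))
```

In this toy setup the first round retrieves the `load_config` snippet, while the second round, whose query includes the draft, ranks the `parse_yaml` snippet first.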
# MEGA: 生成AIの多言語評価 MEGA: Multilingual Evaluation of Generative AI ( http://arxiv.org/abs/2303.12528v2 ) ライセンス: Link先を確認	Kabir Ahuja and Rishav Hada and Millicent Ochieng and Prachi Jain and Harshita Diddee and Samuel Maina and Tanuja Ganu and Sameer Segal and Maxamed Axmed and Kalika Bali and Sunayana Sitaram	(参考訳) 生成AIモデルは、言語理解、推論、言語生成など、多くの自然言語処理タスクにおいて印象的なパフォーマンスを持つ。今日のAIコミュニティから求められている最も重要な質問の1つは、これらのモデルの能力と限界についてであり、生成的AIを評価することが非常に難しいことは明らかである。生成型大言語モデル(LLM)の研究のほとんどは英語に限られており、これらのモデルが他言語をいかに理解し生成できるかは不明である。そこで本研究では,生成型LLMの初の包括的ベンチマークであるMEGAを提示する。MEGAは標準的なNLPベンチマークでモデルを評価し,8つの多様なタスクと類型論的に多様な33の言語を網羅する。また, これらのタスクにおいて生成型LLMの性能を最先端(SOTA)の非自己回帰モデルと比較し, 前世代のLLMと比べて生成型モデルがどの程度の性能を示すかを検討した。本稿では, 言語間でのモデルの性能を徹底的に分析し, 生成型LLMが現在すべての言語に最適でない理由について論じる。我々は,多言語設定における生成型LLMの評価フレームワークを作成し,今後の発展に向けての方向性を提供する。 Generative AI models have impressive performance on many Natural Language Processing tasks such as language understanding, reasoning and language generation. One of the most important questions that is being asked by the AI community today is about the capabilities and limits of these models, and it is clear that evaluating generative AI is very challenging. Most studies on generative Large Language Models (LLMs) are restricted to English and it is unclear how capable these models are at understanding and generating other languages. We present the first comprehensive benchmarking of generative LLMs - MEGA, which evaluates models on standard NLP benchmarks, covering 8 diverse tasks and 33 typologically diverse languages. We also compare the performance of generative LLMs to State of the Art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform compared to the previous generation of LLMs. We present a thorough analysis of the performance of models across languages and discuss some of the reasons why generative LLMs are currently not optimal for all languages. 
We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field.	翻訳日:2023-04-04 20:45:06 公開日:2023-04-03
# ニューラルラジアンスフィールドの対話的幾何学的編集 Interactive Geometry Editing of Neural Radiance Fields ( http://arxiv.org/abs/2303.11537v2 ) ライセンス: Link先を確認	Shaoxu Li and Ye Pan	(参考訳) 本稿では,神経放射場操作のためのインタラクティブな幾何学的編集を可能にする手法を提案する。シーンの編集には2つのプロキシケージ(インナーケージと外部ケージ)を使用します。インナーケージは操作対象を定義し、アウターケージは調整空間を定義する。 2つのケージには様々な操作が適用される。ケージ選択後、インナーケージの操作は、インナーケージの所望の変換と外ケージの調整につながる。ユーザーは翻訳、回転、スケーリング、組み合わせでシーンを編集できる。角の操作やケージの端の操作もサポートされている。我々の手法は明示的な3次元幾何表現を必要としない。インタラクティブな幾何編集は、暗黙の神経放射場に直接適用される。広範な実験結果から,本手法の有効性が示された。 In this paper, we propose a method that enables interactive geometry editing for neural radiance fields manipulation. We use two proxy cages(inner cage and outer cage) to edit a scene. The inner cage defines the operation target, and the outer cage defines the adjustment space. Various operations apply to the two cages. After cage selection, operations on the inner cage lead to the desired transformation of the inner cage and adjustment of the outer cage. Users can edit the scene with translation, rotation, scaling, or combinations. The operations on the corners and edges of the cage are also supported. Our method does not need any explicit 3D geometry representations. The interactive geometry editing applies directly to the implicit neural radiance fields. Extensive experimental results demonstrate the effectiveness of our approach.	翻訳日:2023-04-04 20:44:12 公開日:2023-04-03
# 絡み合った送信機を有するマルチアクセスチャネル The Multiple-Access Channel with Entangled Transmitters ( http://arxiv.org/abs/2303.10456v2 ) ライセンス: Link先を確認	Uzi Pereg, Christian Deppe, and Holger Boche	(参考訳) 古典的なマルチアクセスチャネル(MAC)と絡み合いリソースとの通信を考慮し,通信開始前に2つの送信機で絡み合いリソースを共有する。 Leditzky et al. (2020) は、疑似テレパシーゲームで定義される古典的なMACの例を示し、絡み合った送信機との和率は、そのようなリソースのない最高の達成可能な和率よりも厳密に高いことを示した。ここでは,一般MACのキャパシティ領域とエンタングル送信器の完全なキャパシティ特性を導出し,この結果が特別な場合として得られることを示す。有限次元の補助変数とアンシラを含む単一の文字公式が確立される。これにより、このレート領域を達成するのに十分な絡み合い率が得られる。さらに、メッセージ平均誤差基準の下での古典的なMACの容量領域は、最大誤差基準よりも厳密に大きくなりうることが長年知られている(Dueck, 1978)。絡み合った資源が与えられた場合、その領域は一致する。 Communication over a classical multiple-access channel (MAC) with entanglement resources is considered, whereby two transmitters share entanglement resources a priori before communication begins. Leditzky et al. (2020) presented an example of a classical MAC, defined in terms of a pseudo telepathy game, such that the sum rate with entangled transmitters is strictly higher than the best achievable sum rate without such resources. Here, we derive a full characterization of the capacity region for the general MAC with entangled transmitters, and show that the previous result can be obtained as a special case. A single letter formula is established involving auxiliary variables and ancillas of finite dimensions. This, in turn, leads to a sufficient entanglement rate to achieve the rate region. Furthermore, it has long been known that the capacity region of the classical MAC under a message-average error criterion can be strictly larger than with a maximal error criterion (Dueck, 1978). We observe that given entanglement resources, the regions coincide.	翻訳日:2023-04-04 20:42:38 公開日:2023-04-03
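For orientation, the textbook capacity region of a two-user classical MAC $p(y\mid x_1,x_2)$ without entanglement can be written as below. This is standard background only, stated under the assumption of independent inputs and a time-sharing (convex hull) operation; it is not the entanglement-assisted single-letter formula that the abstract refers to.

```latex
\mathcal{C}_{\mathrm{MAC}}
= \overline{\operatorname{conv}}
  \bigcup_{p(x_1)\,p(x_2)}
  \left\{ (R_1,R_2) \;:\;
  \begin{aligned}
    R_1 &\le I(X_1;Y \mid X_2),\\
    R_2 &\le I(X_2;Y \mid X_1),\\
    R_1 + R_2 &\le I(X_1,X_2;Y)
  \end{aligned}
  \right\}
```

The sum-rate comparison in the abstract concerns the third constraint: with entangled transmitters the achievable $R_1+R_2$ can exceed the best value obtainable from this classical region.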
# TRAK: スケールでのモデル行動への貢献 TRAK: Attributing Model Behavior at Scale ( http://arxiv.org/abs/2303.14186v2 ) ライセンス: Link先を確認	Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry	(参考訳) データ帰属の目的は、モデルの予測をトレーニングデータに遡ることである。この目標への長い努力にもかかわらず、データ帰属に対する既存のアプローチは、ユーザに計算の扱いやすさと有効性を選択させる傾向がある。すなわち、計算可能な手法は、非凸設定(ディープニューラルネットワークの文脈など)におけるモデル予測の正確な帰属に苦労するが、そのような手法では、数千のモデルを訓練する必要があるため、大規模モデルやデータセットでは実用的でない。本稿では,大規模で微分可能なモデルに対して,有効かつ計算的に抽出可能なデータ帰属法であるTRAK(Tracing with the Randomly-projected After Kernel)を紹介する。特に、わずかに訓練されたモデルを活用することで、TRAKは何千ものモデルのトレーニングを必要とする属性メソッドのパフォーマンスにマッチすることができる。我々は、イメージネットで訓練された画像分類器、視覚言語モデル(CLIP)、言語モデル(BERT、mT5)のTRAKの有用性を実証する。私たちは https://github.com/MadryLab/trak で TRAK を使用するためのコードを提供しています。 The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak .	翻訳日:2023-04-04 20:36:02 公開日:2023-04-03
# Make-It-3D:拡散前の単一画像からの高忠実度3D創出 Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior ( http://arxiv.org/abs/2303.14184v2 ) ライセンス: Link先を確認	Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, Dong Chen	(参考訳) 本研究では,1枚の画像のみから高忠実度3Dコンテンツを作成する問題について検討する。基本的には下層の3d幾何学を推定し、目に見えないテクスチャを同時に幻覚させる。この課題に対処するために,訓練された2次元拡散モデルからの事前知識を活用し,3次元生成のための3次元認識監督を行う。提案手法であるMake-It-3Dは,2段階の最適化パイプラインを用いており,第1段階は前部からの基準画像からの制約を取り入れ,第2段階は粗いモデルをテクスチャ化された点雲に変換し,第2段階は参照画像から高品質なテクスチャを活用しながら,拡散により現実性を高める。広汎な実験により,本手法は先行研究よりも大きなマージンを達成し,忠実な再建と印象的な視覚的品質を実現した。本手法は,汎用オブジェクトの単一画像から高品質な3D作成を実現するための最初の試みであり,テキスト・ツー・3D作成やテクスチャ編集などの様々な応用を可能にする。 In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by incorporating constraints from the reference image at the frontal view and diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further elevates the realism with diffusion prior while leveraging the high-quality textures from the reference image. Extensive experiments demonstrate that our method outperforms prior works by a large margin, resulting in faithful reconstructions and impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects and enables various applications such as text-to-3D creation and texture editing.	翻訳日:2023-04-04 20:35:39 公開日:2023-04-03
# 物理インフォームドポイントネット:不規則な幾何の測地を同時に解くことができるか? 線形弾性への応用 Physics-informed PointNet: On how many irregular geometries can it solve an inverse problem simultaneously? Application to linear elasticity ( http://arxiv.org/abs/2303.13634v2 ) ライセンス: Link先を確認	Ali Kashefi, Leonidas J. Guibas, Tapan Mukerji	(参考訳) 正規物理情報ニューラルネットワーク(PINN)はスパースラベル付きデータを用いた偏微分方程式の解を1つの領域で予測する。一方、完全に教師付き学習モデルは通常、既知のソリューション(ラベル付きデータ)を持つ数千以上のドメインで訓練され、数百の未知のドメインでそのソリューションを予測する。物理インフォームドポイントネット(PIPN)は、PINN(弱教師付き学習モデル)と完全教師付き学習モデルの間のギャップを埋めるように設計されている。本稿では、PIPNが数百の領域に対して所望の偏微分方程式の解を同時に予測し、スパースラベル付きデータのみを使用することを示した。このフレームワークは、ラベル付きデータしか利用できない業界で高速な幾何学的設計の恩恵を受ける。特に, pipnは, 異なる地形を持つ500以上の領域において, 平面応力問題の解を同時に予測することを示した。さらに,顕著なバッチサイズの概念(すなわち,各サブエポックで pipn に供給されるジオメトリの数)を pipn に実装する先駆者でもある。具体的には,7,14,19,38,76,133のバッチサイズを試す。さらに、損失関数におけるスパースラベルデータの構成成分に対するPIPNサイズ、PIPNアーキテクチャにおける対称関数、および静的および動的重みの影響について検討した。 Regular physics-informed neural networks (PINNs) predict the solution of partial differential equations using sparse labeled data but only over a single domain. On the other hand, fully supervised learning models are first trained usually over a few thousand domains with known solutions (i.e., labeled data) and then predict the solution over a few hundred unseen domains. Physics-informed PointNet (PIPN) is primarily designed to fill this gap between PINNs (as weakly supervised learning models) and fully supervised learning models. In this article, we demonstrate that PIPN predicts the solution of desired partial differential equations over a few hundred domains simultaneously, while it only uses sparse labeled data. This framework benefits fast geometric designs in the industry when only sparse labeled data are available. Particularly, we show that PIPN predicts the solution of a plane stress problem over more than 500 domains with different geometries, simultaneously. 
Moreover, we pioneer implementing the concept of remarkable batch size (i.e., the number of geometries fed into PIPN at each sub-epoch) into PIPN. Specifically, we try batch sizes of 7, 14, 19, 38, 76, and 133. Additionally, the effect of the PIPN size, symmetric function in the PIPN architecture, and static and dynamic weights for the component of the sparse labeled data in the loss function are investigated.	翻訳日:2023-04-04 20:34:33 公開日:2023-04-03
# アクティブサンプリングを用いた病理組織学におけるデータ効率の良いコントラスト学習 Data Efficient Contrastive Learning in Histopathology using Active Sampling ( http://arxiv.org/abs/2303.16247v2 ) ライセンス: Link先を確認	Tahsin Reasat and David S. Smith	(参考訳) ディープラーニングに基づく診断システムは、デジタル病理学において正確で堅牢な定量的分析を提供することができる。これらのアルゴリズムは大量の注釈付きトレーニングデータを必要とするが、病理組織像は高解像度であるため、病理学では実用的でない。そこで,アドホックなプレテキストタスクを用いて特徴を学習するための自己教師あり手法が提案されている。自己教師あり学習のプロセスは時間がかかり、学習した特徴空間に対する制約が欠如しているため、特にデータ不均衡の下で顕著に、しばしば劣った特徴表現につながる。本研究では,少数のラベルと小さなプロキシネットワークを用いてトレーニングセットを能動的にサンプリングし,サンプル要求を93%削減し,トレーニング時間を99%削減することを提案する。 Deep Learning based diagnostics systems can provide accurate and robust quantitative analysis in digital pathology. These algorithms require large amounts of annotated training data which is impractical in pathology due to the high resolution of histopathological images. Hence, self-supervised methods have been proposed to learn features using ad-hoc pretext tasks. The self-supervised training process is time-consuming and often leads to subpar feature representation due to a lack of constraints on the learnt feature space, particularly prominent under data imbalance. In this work, we propose to actively sample the training set using a handful of labels and a small proxy network, decreasing sample requirement by 93% and training time by 99%.	翻訳日:2023-04-04 20:26:20 公開日:2023-04-03
# 群不変多様体の拡散写像 Diffusion Maps for Group-Invariant Manifolds ( http://arxiv.org/abs/2303.16169v2 ) ライセンス: Link先を確認	Paulina Hoyos and Joe Kileel	(参考訳) 本稿では、コンパクトリー群$K$の作用の下でデータセットが不変であるときの多様体学習問題を考察する。私たちのアプローチは、既存のデータポイントの$k$-orbitsを統合してデータ誘発グラフラプラシアンを増強することで、$k$-invariantグラフラプラシアン$l$を得るというものです。 l$ は、k$ のユニタリ既約表現行列を用いて対角化できることを証明し、その固有値と固有関数を計算するための明示的な公式を提供する。さらに、正規化されたラプラシア作用素 $L_N$ がデータ多様体のラプラス・ベルトラミ作用素に収束し、収束率が向上し、対称性群 $K$ の次元で改善が増加することを示す。この研究は、Landa と Shkolnisky のステアブルグラフ Laplacian フレームワークを $\operatorname{SO}(2)$ の場合には任意のコンパクトリー群に拡張する。 In this article, we consider the manifold learning problem when the data set is invariant under the action of a compact Lie group $K$. Our approach consists in augmenting the data-induced graph Laplacian by integrating over the $K$-orbits of the existing data points, which yields a $K$-invariant graph Laplacian $L$. We prove that $L$ can be diagonalized by using the unitary irreducible representation matrices of $K$, and we provide an explicit formula for computing its eigenvalues and eigenfunctions. In addition, we show that the normalized Laplacian operator $L_N$ converges to the Laplace-Beltrami operator of the data manifold with an improved convergence rate, where the improvement grows with the dimension of the symmetry group $K$. This work extends the steerable graph Laplacian framework of Landa and Shkolnisky from the case of $\operatorname{SO}(2)$ to arbitrary compact Lie groups.	翻訳日:2023-04-04 20:25:48 公開日:2023-04-03
# PADME-SoSci: 社会科学のための分析と分散機械学習のためのプラットフォーム PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences ( http://arxiv.org/abs/2303.18200v2 ) ライセンス: Link先を確認	Zeyd Boukhers and Arnim Bleier and Yeliz Ucer Yediel and Mio Hienstorfer-Heitmann and Mehrshad Jaberansary and Adamantios Koumpis and Oya Beyan	(参考訳) データプライバシと所有権は、社会データ科学において重要であり、法的および倫理的な懸念を提起する。異なるパーティがデータの一部を所有している場合、データの共有と分析は難しい。この課題に対するアプローチは、分析のために収集する前にデータに非識別または匿名化技術を適用することである。しかし、これによりデータの有用性が低下し、再識別のリスクが高まる。これらの制約に対処するため,モデル実装とトレーニングを連携させる分散分析ツールであるPADMEを提案する。 PADMEは、モデルをすべてのパーティによって実装し、デプロイするフェデレートされたアプローチを使用して、トレーニングのために各データロケーションを漸進的に訪問する。これにより、すべてのデータが単一の場所にあるかのようにモデルをトレーニングしながら、ロケーションをまたいだデータ分析が可能になる。元の場所でデータに基づいてモデルをトレーニングすることは、データのオーナシップを保存する。さらに、すべてのデータロケーションで分析が完了するまで結果が提供されず、プライバシを確保し、結果のバイアスを回避する。 Data privacy and ownership are significant in social data science, raising legal and ethical concerns. Sharing and analyzing data is difficult when different parties own different parts of it. An approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach where the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, the results are not provided until the analysis is completed on all data locations to ensure privacy and avoid bias in the results.	翻訳日:2023-04-04 20:17:30 公開日:2023-04-03
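The incremental, location-by-location training described in the PADME abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the linear model, the function names `local_update` and `incremental_training`, and the toy data are hypothetical and are not taken from PADME itself. The only idea carried over from the abstract is that the model travels to each data location in turn while the data stays where it is.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature x, target y)


def local_update(w: float, b: float, data: List[Sample],
                 lr: float = 0.05, epochs: int = 20) -> Tuple[float, float]:
    """Run plain SGD for y ~ w*x + b on one site's private data."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def incremental_training(sites: List[List[Sample]],
                         rounds: int = 50) -> Tuple[float, float]:
    """Send the model parameters from site to site; raw data never moves."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        for data in sites:  # only (w, b) is passed on to the next site
            w, b = local_update(w, b, data)
    return w, b


# Hypothetical toy data: three sites, all consistent with y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (-1.0, -1.0)],
    [(0.5, 2.0), (1.5, 4.0)],
]
w, b = incremental_training(sites)
```

In this sketch the final `(w, b)` would only be reported after the last site has been visited, which mirrors the abstract's point that results are withheld until the analysis has run on all locations.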
# 今日の連続学習アルゴリズムはどの程度効率的か? How Efficient Are Today's Continual Learning Algorithms? ( http://arxiv.org/abs/2303.18171v2 ) ライセンス: Link先を確認	Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Christopher Kanan	(参考訳) Supervised Continual Learningでは、ラベル付きデータのストリームからディープニューラルネットワーク(DNN)を更新する。ほとんどの研究は破滅的な忘れを克服することに重点を置いているが、継続的学習の背景にある大きな動機の1つは、トレーニングデータセットをスクラッチからトレーニングするのではなく、新しい情報でネットワークを効率的に更新できることだ。最近の連続的な学習手法は破滅的な忘れ問題を主に解決しているが、これらのアルゴリズムの効率性にはほとんど注意が払われていない。本稿では,近年のインクリメンタルなクラス学習手法について検討し,計算,メモリ,記憶の面では非常に非効率であることを示す。スクラッチからトレーニングするよりも多くの計算を必要とするメソッドもあります! 連続学習が現実の応用性を持つためには、研究コミュニティはこれらのアルゴリズムが使用するリソースを無視できない。破滅的な忘れを和らげるより連続的な学習がある。 Supervised Continual learning involves updating a deep neural network (DNN) from an ever-growing stream of labeled data. While most work has focused on overcoming catastrophic forgetting, one of the major motivations behind continual learning is being able to efficiently update a network with new information, rather than retraining from scratch on the training dataset as it grows over time. Despite recent continual learning methods largely solving the catastrophic forgetting problem, there has been little attention paid to the efficiency of these algorithms. Here, we study recent methods for incremental class learning and illustrate that many are highly inefficient in terms of compute, memory, and storage. Some methods even require more compute than training from scratch! We argue that for continual learning to have real-world applicability, the research community cannot ignore the resources used by these algorithms. There is more to continual learning than mitigating catastrophic forgetting.	翻訳日:2023-04-04 20:17:14 公開日:2023-04-03
# Poster: トレーニングDNNにおけるバイアス、ノード感度、ロングテール分布の関連性 Poster: Link between Bias, Node Sensitivity and Long-Tail Distribution in trained DNNs ( http://arxiv.org/abs/2303.16589v2 ) ライセンス: Link先を確認	Mahum Naseer and Muhammad Shafique	(参考訳) 優れた学習(と再学習)能力のため、ディープニューラルネットワーク(DNN)は多くの現実世界のアプリケーションで使われている。しかし、これらのデータ駆動機械学習モデルの学習は、トレーニングで利用できるデータと同じくらい一般的に優れている。したがって、長いテール分布を持つトレーニングデータセットは、異なる出力クラス間で異なるレベルの分類性能を提供する可能性があるため、dnnにとって課題となる。このようなネットワークの全体的なバイアスはすでに既存の研究で強調されているが、この研究は異なる出力クラスに対するノードの感度の変化につながるノードバイアスを特定する。私たちの知る限りでは、これはDNNにおけるこのユニークな課題を強調し、その可能性について議論し、この新しい研究の方向性にオープンな課題を提供する最初の作品です。実世界のデータセットでトレーニングされたネットワークの実証的なケーススタディを用いて、推論を支援する。 Owing to their remarkable learning (and relearning) capabilities, deep neural networks (DNNs) find use in numerous real-world applications. However, the learning of these data-driven machine learning models is generally as good as the data available to them for training. Hence, training datasets with long-tail distribution pose a challenge for DNNs, since the DNNs trained on them may provide a varying degree of classification performance across different output classes. While the overall bias of such networks is already highlighted in existing works, this work identifies the node bias that leads to a varying sensitivity of the nodes for different output classes. To the best of our knowledge, this is the first work highlighting this unique challenge in DNNs, discussing its probable causes, and providing open challenges for this new research direction. We support our reasoning using an empirical case study of the networks trained on a real-world dataset.	翻訳日:2023-04-04 20:16:22 公開日:2023-04-03
# グラフニューラルネットワークの事前トレーニングはいつか? データ生成の観点からの答え! When to Pre-Train Graph Neural Networks? An Answer from Data Generation Perspective! ( http://arxiv.org/abs/2303.16458v2 ) ライセンス: Link先を確認	Yuxuan Cao, Jiarong Xu, Carl Yang, Jiaan Wang, Yunchao Zhang, Chunping Wang, Lei Chen, Yang Yang	(参考訳) 近年,グラフ事前学習が注目されており,グラフデータから伝達可能な知識を学習して下流の性能を向上させることを目指している。これらの最近の試みにもかかわらず、下流タスクにグラフ事前学習モデルを適用する場合、負の転送は大きな問題である。既存の作業は、事前トレーニングの方法と、多数のグラフ事前トレーニングと微調整戦略を設計することで、事前トレーニングの方法の問題に多大な努力を払っていた。しかし、戦略がどんなに進歩しても、「事前訓練と微調整」のパラダイムは依然として明確な利益を得られないケースがある。本稿では,事前トレーニングや微調整を行う前に,事前トレーニングをいつ行うか(つまり,どのような状況でグラフ事前トレーニングを活用できるか)という重要な質問に答える汎用フレームワークw2pgnnを紹介する。まず,新しい視点から,事前学習データから下流データへの複雑な生成メカニズムを探索する。特に、w2pgnnは、まず事前トレーニングされたデータをgraphonベースに適合させ、graphon基底(すなわちgraphon)の各要素は、事前トレーニングされたグラフの集合によって共有される基本的な転送可能なパターンを識別する。グラフェン塩基のすべての凸結合は生成空間を生じさせ、そこから生成されたグラフは、事前学習の恩恵を受ける下流データのための解空間を形成する。これにより、発電機空間内の任意の発電機からの下流データの生成確率として事前学習の実現可能性を定量化することができる。 W2PGNNは、グラフ事前トレーニングモデルの適用範囲の提供、事前トレーニングの実行可能性の定量化、事前トレーニングデータの選択による下流のパフォーマンス向上など、幅広い3つのアプリケーションを提供している。後者の2つの応用について, 理論上, 合理的な解法と広範な経験的正当性を与える。 Recently, graph pre-training has attracted wide research attention, which aims to learn transferable knowledge from unlabeled graph data so as to improve downstream performance. Despite these recent attempts, the negative transfer is a major issue when applying graph pre-trained models to downstream tasks. Existing works made great efforts on the issue of what to pre-train and how to pre-train by designing a number of graph pre-training and fine-tuning strategies. However, there are indeed cases where no matter how advanced the strategy is, the "pre-train and fine-tune" paradigm still cannot achieve clear benefits. This paper introduces a generic framework W2PGNN to answer the crucial question of when to pre-train (i.e., in what situations could we take advantage of graph pre-training) before performing effortful pre-training or fine-tuning. 
We start from a new perspective to explore the complex generative mechanisms from the pre-training data to downstream data. In particular, W2PGNN first fits the pre-training data into graphon bases, each element of graphon basis (i.e., a graphon) identifies a fundamental transferable pattern shared by a collection of pre-training graphs. All convex combinations of graphon bases give rise to a generator space, from which graphs generated form the solution space for those downstream data that can benefit from pre-training. In this manner, the feasibility of pre-training can be quantified as the generation probability of the downstream data from any generator in the generator space. W2PGNN provides three broad applications, including providing the application scope of graph pre-trained models, quantifying the feasibility of performing pre-training, and helping select pre-training data to enhance downstream performance. We give a theoretically sound solution for the first application and extensive empirical justifications for the latter two applications.	翻訳日:2023-04-04 20:16:09 公開日:2023-04-03
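The notion of a generator space built from convex combinations of graphon bases can be sketched as follows. This is a minimal illustration under stated assumptions: graphons are approximated by small step-function (block) matrices, the two bases and the mixing weights are hypothetical, and the functions `convex_combination` and `sample_graph` are illustrative names rather than part of W2PGNN. The fitting of bases to pre-training graphs and the estimation of the generation probability are not shown.

```python
import random
from typing import List

Graphon = List[List[float]]  # k x k step-function graphon of edge probabilities


def convex_combination(bases: List[Graphon], weights: List[float]) -> Graphon:
    """Mix graphon bases with non-negative weights that sum to 1."""
    assert all(a >= 0 for a in weights) and abs(sum(weights) - 1.0) < 1e-9
    k = len(bases[0])
    return [[sum(a * B[i][j] for a, B in zip(weights, bases)) for j in range(k)]
            for i in range(k)]


def sample_graph(W: Graphon, n: int, rng: random.Random) -> List[List[int]]:
    """Sample an undirected graph: u_i ~ U[0,1), edge (i, j) w.p. W(u_i, u_j)."""
    k = len(W)
    # Each latent position u_i falls into one of k equal-width blocks.
    blocks = [min(int(rng.random() * k), k - 1) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W[blocks[i]][blocks[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj


# Two hypothetical bases: a community-like pattern and a bipartite-like pattern.
community = [[0.9, 0.1], [0.1, 0.9]]
bipartite = [[0.0, 0.6], [0.6, 0.0]]
generator = convex_combination([community, bipartite], [0.7, 0.3])
graph = sample_graph(generator, n=8, rng=random.Random(0))
```

Each choice of weights picks one generator in the generator space; a downstream graph that is likely under some such generator is, in the abstract's terms, one that could benefit from pre-training.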
# 安定拡散に対するクエリフリー逆攻撃に関するパイロット研究 A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion ( http://arxiv.org/abs/2303.16378v2 ) ライセンス: Link先を確認	Haomin Zhuang, Yihua Zhang and Sijia Liu	(参考訳) 安定拡散によるテキスト・トゥ・イメージ(T2I)生成における記録破りのパフォーマンスにもかかわらず、その逆の堅牢性には研究の注意が払われていない。本研究では,安定拡散に対する対角攻撃生成の問題について検討し,エンドツーエンドのモデルクエリがなくても,逆方向のテキストプロンプトが得られるかどうかを問う。結果の問題を「クエリフリーアタック生成」と呼ぶ。この問題を解決するために、T2Iモデルの脆弱性は、テキストエンコーダの堅牢性の欠如、例えば、安定拡散攻撃に使用されるCLIPテキストエンコーダに根ざしていることを示す。このような知見に基づいて,前者がテキスト埋め込み空間において最も影響力のある次元に基づいて構築され,我々は「ステアブルキー次元」と呼んでいる,非ターゲットのクエリフリーアタックとターゲットのクエリフリーアタックの両方を提案する。提案する攻撃を活用し,テキストプロンプトに対する5文字の摂動のみが,安定な拡散を用いて合成画像の重要コンテンツシフトを誘発できることを実証的に示す。さらに,提案するターゲット攻撃は拡散モデルを正確に制御し,対象画像コンテンツをスクラブし,非対象画像コンテンツに大きな変化を生じさせないことを示す。私たちのコードは https://github.com/OPTML-Group/QF-Attack で利用可能です。 Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable Diffusion, less research attention is paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask if an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem 'query-free attack generation'. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion. Based on such insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that only a five-character perturbation to the text prompt is able to cause the significant content shift of synthesized images using Stable Diffusion. Moreover, we show that the proposed target attack can precisely steer the diffusion model to scrub the targeted image content without causing much change in untargeted image content. 
Our code is available at https://github.com/OPTML-Group/QF-Attack.	翻訳日:2023-04-04 20:15:38 公開日:2023-04-03
# YOLO-v7特徴量を用いたVVC符号化ビデオにおける物体検出精度の向上 Accuracy Improvement of Object Detection in VVC Coded Video Using YOLO-v7 Features ( http://arxiv.org/abs/2304.00689v1 ) ライセンス: Link先を確認	Takahiro Shindo, Taiju Watanabe, Kein Yamada, Hiroshi Watanabe	(参考訳) ディープラーニングに基づく画像認識技術の進歩に伴い、人工知能による自動ビデオ解析が普及している。画像認識に使用される映像の量が増加するにつれて、このような映像データの効率的な圧縮方法が必要となる。一般的に、画像符号化により画質が劣化すると、画像認識精度も低下する。そこで本稿では,符号化映像に後処理を適用することにより,画像認識精度,特に物体検出精度を向上させるニューラルネットワークに基づく手法を提案する。 Versatile Video Coding (VVC) は, ビデオ圧縮法として, 最高の符号化性能を有する最新のビデオ符号化法である。ニューラルネットワークは、最新のオブジェクト検出モデルであるYOLO-v7の特徴を使ってトレーニングされている。 VVCをビデオ符号化法とし、YOLO-v7を検出モデルとし、低ビットレートでも高い物体検出精度を実現する。実験の結果,提案手法とvvcの組み合わせにより,対象検出精度が通常のvvcよりも高い符号化性能が得られることがわかった。 With advances in image recognition technology based on deep learning, automatic video analysis by Artificial Intelligence is becoming more widespread. As the amount of video used for image recognition increases, efficient compression methods for such video data are necessary. In general, when the image quality deteriorates due to image encoding, the image recognition accuracy also falls. Therefore, in this paper, we propose a neural-network-based approach to improve image recognition accuracy, especially the object detection accuracy by applying post-processing to the encoded video. Versatile Video Coding (VVC) will be used for the video compression method, since it is the latest video coding method with the best encoding performance. The neural network is trained using the features of YOLO-v7, the latest object detection model. By using VVC as the video coding method and YOLO-v7 as the detection model, high object detection accuracy is achieved even at low bit rates. Experimental results show that the combination of the proposed method and VVC achieves better coding performance than regular VVC in object detection accuracy.	翻訳日:2023-04-04 16:55:34 公開日:2023-04-03
# マクロな低損失フォノンキャビティによる最小長の制約の改善 Improved Constraints on the Minimum Length with a Macroscopic Low Loss Phonon Cavity ( http://arxiv.org/abs/2304.00688v1 ) ライセンス: Link先を確認	William M. Campbell and Michael E. Tobar and Serge Galliou and Maxim Goryachev	(参考訳) 重力の量子的記述を定式化しようとする多くの理論は、基本的な最小長スケールの存在を示唆している。この最小長を組み込む一般的な方法は、一般化不確実性原理(generalized uncertainty principle, gup)として知られるハイゼンベルクの不確実性原理の修正である。複合システムに適用されたGUPの実験実験は、機械共振器モードの誘導周波数摂動を探索することにより、特定のシナリオにおける最小長の度合いを制限できる。本研究は, 従来の機械式共振器による制約を, 極低温クォーツバルク波共振器を用いて3桁の精度で改善するものである。純粋な機械的共振モードだけでなく、ハイブリッド電気機械的反共振モードも検討し、同じGUP誘発効果に敏感であることを示した。 Many theories that attempt to formulate a quantum description of gravity suggest the existence of a fundamental minimum length scale. A popular method for incorporating this minimum length is through a modification of the Heisenberg uncertainty principle known as the generalised uncertainty principle (GUP). Experimental tests of the GUP applied to composite systems can be performed by searching for the induced frequency perturbations of the modes of mechanical resonators, thus constraining the degree of minimum length in certain scenarios. In this work previous constraints made with mechanical resonators are improved upon by three orders of magnitude, via the utilisation of a cryogenic quartz bulk acoustic wave resonator. As well as purely mechanical resonant modes; hybrid electromechanical anti-resonant modes are investigated, and shown to be sensitive to the same GUP induced effects.	翻訳日:2023-04-04 16:55:18 公開日:2023-04-03
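As background for the minimum-length claim, one commonly used form of the generalised uncertainty principle is sketched below. The quadratic form and the dimensionless parameter $\beta_0$ are assumptions made for illustration; the abstract does not say which GUP model the paper constrains. Here $M_P$ is the Planck mass and $\ell_P = \hbar/(M_P c)$ the Planck length.

```latex
[\hat{x},\hat{p}] = i\hbar\left(1 + \beta_0\,\frac{\hat{p}^{2}}{(M_P c)^{2}}\right)
\;\;\Longrightarrow\;\;
\Delta x \;\ge\; \frac{\hbar}{2}\left(\frac{1}{\Delta p} + \frac{\beta_0\,\Delta p}{(M_P c)^{2}}\right)
\quad (\langle \hat{p}\rangle = 0),
\qquad
\Delta x_{\min} = \frac{\hbar\sqrt{\beta_0}}{M_P c} = \sqrt{\beta_0}\,\ell_P .
```

The bound on $\Delta x$ is minimised at $\Delta p = M_P c/\sqrt{\beta_0}$, which gives the minimum length shown. In this parametrisation, an experimental upper limit on $\beta_0$, such as one derived from resonator frequency shifts, translates directly into an upper limit on $\Delta x_{\min}$.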
# 視覚タスクのための視覚言語モデル:調査 Vision-Language Models for Vision Tasks: A Survey ( http://arxiv.org/abs/2304.00685v1 ) ライセンス: Link先を確認	Jingyi Zhang, Jiaxing Huang, Sheng Jin and Shijian Lu	(参考訳) ほとんどの視覚認識研究は、ディープニューラルネットワーク(DNN)トレーニングにおけるクラウドラベルデータに大きく依存しており、それらは通常、単一の視覚認識タスクごとにDNNを訓練し、手間と時間を要する視覚認識パラダイムへと繋がる。この2つの課題に対処するため、視覚言語モデル(VLM)は近年、インターネット上でほぼ無限に利用できるWebスケールの画像テキストペアからリッチな視覚言語相関を学習し、単一のVLMを用いて様々な視覚認識タスクのゼロショット予測を可能にする、集中的に研究されている。 本稿では,様々な視覚認識タスクのための視覚言語モデルを体系的にレビューする。具体的には,(1)視覚認識パラダイムの発展を紹介する背景,(2)広く採用されているネットワークアーキテクチャ,事前学習の目的,下流タスクをまとめたVLMの基礎,(3)VLMの事前学習と評価で広く用いられるデータセット,(4)既存のVLM事前学習手法,VLM転移学習手法,VLM知識蒸留手法のレビューと分類,(5)レビューした手法のベンチマーク,分析,議論,(6)視覚認識のための今後のVLM研究で追求しうるいくつかの研究課題と潜在的な研究方向,を扱う。 この調査に関連するプロジェクトは https://github.com/jingyi0000/VLM_survey で作成されている。 Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm. To address the two challenges, Vision-Language Models (VLMs) have been intensively investigated recently, which learns rich vision-language correlation from web-scale image-text pairs that are almost infinitely available on the Internet and enables zero-shot predictions on various visual recognition tasks with a single VLM. 
This paper provides a systematic review of visual language models for various visual recognition tasks, including: (1) the background that introduces the development of visual recognition paradigms; (2) the foundations of VLM that summarize the widely-adopted network architectures, pre-training objectives, and downstream tasks; (3) the widely-adopted datasets in VLM pre-training and evaluations; (4) the review and categorization of existing VLM pre-training methods, VLM transfer learning methods, and VLM knowledge distillation methods; (5) the benchmarking, analysis and discussion of the reviewed methods; (6) several research challenges and potential research directions that could be pursued in the future VLM studies for visual recognition. A project associated with this survey has been created at https://github.com/jingyi0000/VLM_survey.	翻訳日:2023-04-04 16:55:05 公開日:2023-04-03
# 超伝導量子ビットの最適リセット Optimizing resetting of superconducting qubits ( http://arxiv.org/abs/2304.00684v1 ) ライセンス: Link先を確認	Ciro M. Diniz, Rogerio J. de Assis, Norton G. de Almeida and Celso J. Villas-Boas	(参考訳) 多くの量子アルゴリズムは、信頼できる統計結果を得るために多数の繰り返しを要求する。したがって、それぞれの繰り返しにおいて、量子ビットを可能な限り短時間で効率よく正確にリセットする必要があるため、量子コンピュータは古典的よりも有利である。本研究では,超伝導量子ビットにおける情報リセットのための3種類のモデルについて詳細な解析を行う。我々の実験装置は、主量子ビットの情報を消去するために使用される、異なる補助散逸系に結合された主量子ビットで構成されている。解析の結果,主キュービットのリセット時間を削減するために補助系に関連する結合や散逸率を増加させるには不十分であり,各研究手法のパラメータの最適集合を見出すことが動機となり,解析した3つのモデルのリセット時間を大幅に減少させることができた。 Many quantum algorithms demand a large number of repetitions to obtain reliable statistical results. Thus, at each repetition it is necessary to reset the qubits efficiently and precisely in the shortest possible time, so that quantum computers actually have advantages over classical ones. In this work, we perform a detailed analysis on three different models for information resetting in superconducting qubits. Our experimental setup consists of a main qubit coupled to different auxiliary dissipative systems, that are employed in order to perform the erasing of the information of the main qubit. Our analysis shows that it is not enough to increase the coupling and the dissipation rate associated with the auxiliary systems to decrease the resetting time of the main qubit, a fact that motivates us to find the optimal set of parameters for each studied approach, allowing a significant decrease in the reset time of the three models analyzed.	翻訳日:2023-04-04 16:54:46 公開日:2023-04-03
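The statement that raising the coupling and the dissipation rate is not sufficient to shorten the reset time can be illustrated with a textbook toy model. The model is an assumption and not necessarily one of the three schemes analysed in the paper: a qubit exchanges a single excitation at rate $g$ with one resonant auxiliary mode that loses energy at rate $\kappa$. Writing $c_q$ and $c_a$ for the excitation amplitudes of the qubit and the auxiliary mode:

```latex
\dot{c}_q = -i g\, c_a, \qquad
\dot{c}_a = -i g\, c_q - \frac{\kappa}{2}\, c_a
\;\;\Longrightarrow\;\;
\lambda_{\pm} = -\frac{\kappa}{4} \pm \sqrt{\frac{\kappa^{2}}{16} - g^{2}} ,

\Gamma_{\mathrm{reset}} \simeq
\begin{cases}
  \kappa/2, & \kappa < 4g \quad \text{(underdamped envelope)},\\[4pt]
  4g^{2}/\kappa, & \kappa \gg 4g \quad \text{(overdamped limit)} .
\end{cases}
```

Here $\Gamma_{\mathrm{reset}}$ is the decay rate of the qubit population. In this toy model it is capped at $\kappa/2$ however large $g$ becomes, and it falls as $4g^{2}/\kappa$ once $\kappa$ is increased past $4g$, so the fastest reset occurs near $\kappa \approx 4g$ rather than at the largest available rates. This is consistent with the abstract's motivation for searching for an optimal parameter set.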
# 極性超強結合:基底状態における量子絡み合い Polaritonic Ultrastrong Coupling: Quantum Entanglement in Ground State ( http://arxiv.org/abs/2304.00680v1 ) ライセンス: Link先を確認	Qingtian Miao and G.S. Agarwal	(参考訳) 物質の基本励起と微小キャビティモードの超強結合は、完全に解析的な量子力学理論の枠組みで研究されている。初等励起はフォノン、励起子、プラズモンなどである。ハミルトニアンの対角化から、我々はポラリトンハミルトニアンの基底状態を得る。グラウンドステートはガウスクラスに属する。ガウスの性質を用いて基底状態における量子交絡を計算する。量子エンタングルメントには、エンタングルメントエントロピーと対数的負のパラメータの2つの異なる測度を使い、エンタングルメント測度に対してかなり単純な解析式を得る。以上の結果から,超強結合系では基底状態の量子絡み合い量が非常に大きいことがわかった。偏光子周波数の測定から得られる。 The ultrastrong coupling between the elementary excitations of matter and microcavity modes is studied in a fully analytical quantum-mechanical theoretical framework. The elementary excitation could be phonons, excitons, plasmons, etc. From the diagonalization of the Hamiltonian, we obtain the ground state of the polariton Hamiltonian. The ground state belongs to the Gaussian class. Using the Gaussian property we calculate the quantum entanglement in the ground state. We use two different measures for quantum entanglement -- entanglement entropy and the logarithmic negativity parameter and obtain rather simple analytical expressions for the entanglement measures. Our findings show that the amount of quantum entanglement in the ground state is quite significant in the ultrastrong coupling regime. It can be obtained from the measurement of the polariton frequencies.	翻訳日:2023-04-04 16:54:33 公開日:2023-04-03
# cv2x-loca:自律走行車のための路側ユニット対応協調ローカライズフレームワーク CV2X-LOCA: Roadside Unit-Enabled Cooperative Localization Framework for Autonomous Vehicles ( http://arxiv.org/abs/2304.00676v1 ) ライセンス: Link先を確認	Zilin Huang, Sikai Chen, Yuzhuang Pian, Zihao Sheng, Soyoung Ahn, and David A. Noyce	(参考訳) 都市部での安全な運転を可能にするために、正確なロバストな位置決めシステムは自動運転車(AV)にとって不可欠である。既存のグローバルナビゲーション衛星システム(GNSS)ベースの手法はオープンスキー地域での車両の配置に有効であるが、多層橋の下層や高層道路、トンネルなどの都市キャニオンでの高精度の位置決めは依然として課題である。本稿では,セルラーV2X(C-V2X)無線通信がGNSS環境下でのAVのローカライズ性能を向上させる可能性について検討する。具体的には,C-V2Xチャネル状態情報のみを用いてレーンレベルの位置決め精度を実現する,第1の道路側ユニット(RSU)対応協調ローカライゼーションフレームワーク,CV2X-LOCAを提案する。 CV2X-LOCAは、データ処理モジュール、粗い位置決めモジュール、環境パラメータ修正モジュール、車両軌道フィルタリングモジュールの4つの重要な部分から構成されている。これらのモジュールは、動的C-V2Xネットワークに存在する課題を共同で処理する。 CV2X-LOCAは, 高速走行, スパースRSUのカバー環境において, 騒音条件下であっても, 車両位置決めの最先端性能を実現する。この研究結果は、rsusの費用対効果に関する運輸機関の今後の投資決定に関する洞察も提供する。 An accurate and robust localization system is crucial for autonomous vehicles (AVs) to enable safe driving in urban scenes. While existing global navigation satellite system (GNSS)-based methods are effective at locating vehicles in open-sky regions, achieving high-accuracy positioning in urban canyons such as lower layers of multi-layer bridges, streets beside tall buildings, tunnels, etc., remains a challenge. In this paper, we investigate the potential of cellular-vehicle-to-everything (C-V2X) wireless communications in improving the localization performance of AVs under GNSS-denied environments. Specifically, we propose the first roadside unit (RSU)-enabled cooperative localization framework, namely CV2X-LOCA, that only uses C-V2X channel state information to achieve lane-level positioning accuracy. CV2X-LOCA consists of four key parts: data processing module, coarse positioning module, environment parameter correcting module, and vehicle trajectory filtering module. These modules jointly handle challenges present in dynamic C-V2X networks. 
Extensive simulations and field experiments show that CV2X-LOCA achieves state-of-the-art vehicle localization performance even under noisy conditions, high-speed movement, and sparse RSU coverage. The results also offer transportation agencies insights for future investment decisions on deploying RSUs cost-effectively.	翻訳日:2023-04-04 16:54:19 公開日:2023-04-03
# フィルタインバージョンによる部分ビューオブジェクトビュー合成 Partial-View Object View Synthesis via Filtered Inversion ( http://arxiv.org/abs/2304.00673v1 ) ライセンス: Link先を確認	Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber	(参考訳) 本研究では,1つか数つの部分ビューからレンダリング可能な3dオブジェクト表現を予測する学習フレームワークおよび最適化プロセスであるfiltering inversion(finv)を提案する。 FINVは、部分的な観察からオブジェクトの新たなビューを合成するという課題に対処する。これを達成するため、finvは3次元生成モデルを訓練して形状事前学習を行う。推測において、新しい現実世界のオブジェクトの1つ以上のビューが与えられたとき、FINVはまず、生成モデルを複数の初期シードから反転させることで、オブジェクトの潜在コードを見つける。潜伏符号のセットの維持、finvフィルタの検証、およびパーティクルフィルタリングのような新しい観察を受けた後の再サンプリング。次にジェネレータは、利用可能なビューの各潜在コードに対して微調整され、新しいオブジェクトに適応する。 FINVは, 合成対象にのみ訓練された場合でも, 現実の物体(例えば, 椅子, テーブル, 車)の新規な視点を合成することに成功した。 sim-to-real問題に対処する能力により、FINVは実際のデータセットなしでオブジェクトカテゴリに使用できる。 FINVは、複数の実世界のデータセット上で最先端のパフォーマンスを達成し、部分的およびスパースなビューからオブジェクトの形状とテクスチャを回復し、閉塞に対して堅牢であり、より多くの観測でその表現を漸進的に改善することができる。 We propose Filtering Inversion (FINV), a learning framework and optimization process that predicts a renderable 3D object representation from one or few partial views. FINV addresses the challenge of synthesizing novel views of objects from partial observations, spanning cases where the object is not entirely in view, is partially occluded, or is only observed from similar views. To achieve this, FINV learns shape priors by training a 3D generative model. At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds. Maintaining the set of latent codes, FINV filters and resamples them after receiving each new observation, akin to particle filtering. The generator is then finetuned for each latent code on the available views in order to adapt to novel objects. 
We show that FINV successfully synthesizes novel views of real-world objects (e.g., chairs, tables, and cars), even if the generative prior is trained only on synthetic objects. The ability to address the sim-to-real problem allows FINV to be used for object categories without real-world datasets. FINV achieves state-of-the-art performance on multiple real-world datasets, recovers object shape and texture from partial and sparse views, is robust to occlusion, and is able to incrementally improve its representation with more observations.	翻訳日:2023-04-04 16:53:54 公開日:2023-04-03
# CRN: 高精度でロバストで効率的な3D知覚のためのカメラレーダネット CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception ( http://arxiv.org/abs/2304.00670v1 ) ライセンス: Link先を確認	Youngseok Kim, Sanmin Kim, Juyeb Shin, Jun Won Choi, Dongsuk Kum	(参考訳) 自律運転には、3Dオブジェクトの検出、追跡、セグメンテーションを含む正確で高速な3D知覚システムが必要である。最近の低コストカメラベースのアプローチは有望な結果を示しているが、照明の悪さや悪天候の影響を受けやすいため、局所誤差が大きい。したがって、精密な長距離測定を提供し、すべての環境で確実に作動する低コストのレーダーカメラは有望であるが、まだ十分に調査されていない。本稿では,様々なタスクに対して,意味的にリッチで空間的に正確なbird's-eye-view(bev)特徴マップを生成する,新しいカメラ・レーダー融合フレームワークであるcamer radar net(crn)を提案する。画像中の空間情報の欠如を克服するため、視線ビュー画像の特徴をスパースで正確なレーダーポイントの助けを借りてBEVに変換する。入力間の空間的不一致に対処するために設計されたマルチモーダル変形可能な注意を用いて,bevにおける画像とレーダ特徴マップをさらに集約する。リアルタイム設定のCRNは20FPSで動作し、nuScenes上のLiDAR検出器と同等の性能を達成し、100m設定で遠くでも性能を向上する。さらに、オフライン設定のCRNは、nuScenesテストセットで62.4%のNDS、57.5%のmAPを出力し、全カメラおよびカメラレーダー3Dオブジェクト検出器の中で第1位である。 Autonomous driving requires an accurate and fast 3D perception system that includes 3D object detection, tracking, and segmentation. Although recent low-cost camera-based approaches have shown promising results, they are susceptible to poor illumination or bad weather conditions and have a large localization error. Hence, fusing camera with low-cost radar, which provides precise long-range measurement and operates reliably in all environments, is promising but has not yet been thoroughly investigated. In this paper, we propose Camera Radar Net (CRN), a novel camera-radar fusion framework that generates a semantically rich and spatially accurate bird's-eye-view (BEV) feature map for various tasks. To overcome the lack of spatial information in an image, we transform perspective view image features to BEV with the help of sparse but accurate radar points. We further aggregate image and radar feature maps in BEV using multi-modal deformable attention designed to tackle the spatial misalignment between inputs. 
In its real-time setting, CRN runs at 20 FPS while achieving performance comparable to LiDAR detectors on nuScenes, and even outperforms them at long range in the 100 m setting. Moreover, in its offline setting, CRN yields 62.4% NDS and 57.5% mAP on the nuScenes test set and ranks first among all camera and camera-radar 3D object detectors.	翻訳日:2023-04-04 16:53:28 公開日:2023-04-03
# SAR ATRにおけるディープラーニングの非因性発見と説明 Discovering and Explaining the Non-Causality of Deep Learning in SAR ATR ( http://arxiv.org/abs/2304.00668v1 ) ライセンス: Link先を確認	Weijie Li, Wei Yang, Li Liu, Wenpeng Zhang, Yongxiang Liu	(参考訳) 合成開口レーダ自動目標認識(SAR ATR)は、SAR画像解釈において重要な技術の一つであり、軍事・民間分野で重要な応用分野である。この分野ではディープラーニングが広く使われており、近年ではベンチマークデータセット上で優れた認識率を達成している。しかし、ベンチマークデータセットは単一のデータ収集条件のため、データ選択バイアスに悩まされる。このデータバイアスは、深層学習モデルを強化し、非因果的背景クラッタを過度に適合させる。また,既存の手法ではモデル因果関係を定性的に分析し,このデータバイアスを深く分析していない。本稿では,データ選択バイアスがモデルの非因果性やclutterのスプリアス相関につながることを示す。まず,Shapley値を用いて,学習過程における目標領域,乱れ領域,影領域の寄与を定量化する。乱雑な貢献は、トレーニングプロセス中に大きな割合を占める。第2に、SAR ATRにおけるディープラーニングの非因果性の原因は、データ選択バイアスとモデルテクスチャバイアスである。データ選択バイアスはクラス関連クラッタと偽の特徴表現をもたらす。さらに,トレーニングセットとテストセットの類似した信号対クラッタ比(scr)からクラッタのスプリアス相関が生じる。最後に,クラッタのオーバーフィットを低減するためのランダムscr再重み付け手法を提案する。しかし、モデルのテクスチャバイアスは、データバイアスを取り除いた後にモデルの複雑さとともに増加する。ベンチマークMSTARデータセットの標準動作条件下での異なるモデルの実験結果から,上記の結論が得られた。 Synthetic aperture radar automatic target recognition (SAR ATR) is one of the critical technologies for SAR image interpretation, which has an important application prospect in military and civilian fields. Deep learning has been widely used in this area and achieved an excellent recognition rate on the benchmark dataset in recent years. However, the benchmark dataset suffers from data selection bias due to a single data collection condition. This data bias enhances deep learning models to overfit non-causal background clutter. Moreover, existing methods qualitatively analyze the model causality and do not deeply analyze this data bias. In this paper, we explicitly show that the data selection bias leads to the non-causality of the model and spurious correlation of clutter. First, we quantify the contribution of the target, clutter, and shadow regions during the training process through the Shapley value. The clutter contribution has a large proportion during the training process. 
Second, we identify data selection bias and model texture bias as the causes of the non-causality of deep learning in SAR ATR. Data selection bias results in class-related clutter and false feature representations. Furthermore, the spurious correlation of clutter arises from the similar signal-to-clutter ratios (SCRs) of the training and test sets. Finally, we propose a random SCR re-weighting method to reduce overfitting to clutter. However, once the data bias is removed, the model texture bias increases with model complexity. Experimental results for different models under the standard operating condition of the benchmark MSTAR dataset support these conclusions.	翻訳日:2023-04-04 16:53:01 公開日:2023-04-03
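The Shapley-value attribution over target, clutter, and shadow regions mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the value function `toy_score` and every number in `TOY_SCORES` are hypothetical stand-ins for a classifier's confidence when only the listed image regions are kept.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical scores (assumption): model confidence when only these regions are visible.
TOY_SCORES = {
    frozenset(): 0.10,
    frozenset({"target"}): 0.55,
    frozenset({"clutter"}): 0.40,
    frozenset({"shadow"}): 0.20,
    frozenset({"target", "clutter"}): 0.85,
    frozenset({"target", "shadow"}): 0.65,
    frozenset({"clutter", "shadow"}): 0.50,
    frozenset({"target", "clutter", "shadow"}): 0.95,
}


def toy_score(coalition):
    """Look up the hypothetical score of a set of regions."""
    return TOY_SCORES[frozenset(coalition)]


REGIONS = ("target", "clutter", "shadow")
contributions = shapley_values(REGIONS, toy_score)
```

By construction the contributions sum to the score of the full image minus the score of the empty image, so a large clutter share signals that the model leans on background rather than on the target.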
# 人工境界条件を用いた量子力学の量子シミュレーション Quantum Simulation for Quantum Dynamics with Artificial Boundary Conditions ( http://arxiv.org/abs/2304.00667v1 ) ライセンス: Link先を確認	Shi Jin and Nana Liu and Xiantao Li and Yue Yu	(参考訳) 量子ダイナミクス(quantum dynamics)は、通常エルミートなハミルトニアンをもつ時間依存シュレーディンガー方程式の形で表され、量子コンピューティングの自然な応用である。しかし、電子の放出を伴う量子ダイナミクスをシミュレートする際には、計算を固定領域内に限定するために人工境界条件(ABC)を用いる必要がある。 ABCの導入は力学のハミルトン構造を変え、進化がもはやユニタリではないため、既存の量子アルゴリズムを直接適用することはできない。本稿では,非エルミート力学をシュレーディンガー形式に変換する,最近導入されたSchrödingerisation法(Jin et al., arXiv:2212.13969 および arXiv:2212.14703)を人工境界問題に用いる。本手法は,複素吸収ポテンシャル法,完全整合層法,Dirichlet-to-Neumann法の3種類のABCに対して実装する。これらのアルゴリズムの問合せ複雑性を分析し,数値実験を行い,その妥当性を検証した。これは、非有界領域における量子ダイナミクスの利用可能な量子アルゴリズムと計算モデルの間のギャップを埋めるのに役立つ。 Quantum dynamics, typically expressed in the form of a time-dependent Schrödinger equation with a Hermitian Hamiltonian, is a natural application for quantum computing. However, when simulating quantum dynamics that involves the emission of electrons, it is necessary to use artificial boundary conditions (ABCs) to confine the computation within a fixed domain. The introduction of ABCs alters the Hamiltonian structure of the dynamics, and existing quantum algorithms cannot be directly applied since the evolution is no longer unitary. The current paper applies a recently introduced Schrödingerisation method (Jin et al., arXiv:2212.13969 and arXiv:2212.14703), which converts non-Hermitian dynamics to a Schrödinger form, to artificial boundary problems. We implement this method for three types of ABCs: the complex absorbing potential technique, perfectly matched layer methods, and the Dirichlet-to-Neumann approach. We analyze the query complexity of these algorithms and perform numerical experiments to demonstrate the validity of this approach. This helps to bridge the gap between available quantum algorithms and computational models for quantum dynamics in unbounded domains.	翻訳日:2023-04-04 16:52:43 公開日:2023-04-03
# テキスト駆動型ソフトマスクによるマルチモーダル表現学習 Multi-Modal Representation Learning with Text-Driven Soft Masks ( http://arxiv.org/abs/2304.00719v1 ) ライセンス: Link先を確認	Jaeyoo Park, Bohyung Han	(参考訳) 本稿では,新しい操作,損失,データ拡張戦略を導入することにより,自己教師付き学習フレームワーク内で視覚言語表現学習手法を提案する。まず、画像中の特定の単語に最も関係のある領域をソフトマスキングすることで、画像テキストマッチング(itm)タスクの多様な特徴を生成する。本フレームワークは細かなアノテーションを伴わない画像キャプチャペアのみに依存するため,マルチモーダルエンコーダを用いて単語条件の視覚的注意を演算することにより,各単語の関連領域を識別する。第2に,画像テキストコントラスト学習(image-text contrastive learning, itc)の目的に対して焦点損失を提示することで,ハードだが多様な例に焦点を合わせることを奨励する。最後に,テキストのマスキングと画像の歪みのレンダリングにより,様々な例をマイニングすることで,自己教師あり学習のためのマルチモーダルデータ拡張を行う。これらの3つのイノベーションの組み合わせは、事前学習されたモデルを学ぶのに効果的であり、複数の視覚言語下流タスクにおいて優れたパフォーマンスをもたらす。 We propose a visual-linguistic representation learning approach within a self-supervised learning framework by introducing a new operation, loss, and data augmentation strategy. First, we generate diverse features for the image-text matching (ITM) task via soft-masking the regions in an image, which are most relevant to a certain word in the corresponding caption, instead of completely removing them. Since our framework relies only on image-caption pairs with no fine-grained annotations, we identify the relevant regions to each word by computing the word-conditional visual attention using multi-modal encoder. Second, we encourage the model to focus more on hard but diverse examples by proposing a focal loss for the image-text contrastive learning (ITC) objective, which alleviates the inherent limitations of overfitting and bias issues. Last, we perform multi-modal data augmentations for self-supervised learning via mining various examples by masking texts and rendering distortions on images. We show that the combination of these three innovations is effective for learning a pretrained model, leading to outstanding performance on multiple vision-language downstream tasks.	翻訳日:2023-04-04 16:46:25 公開日:2023-04-03
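The idea of putting a focal loss on the image-text contrastive objective can be sketched for a single anchor as below. This is a generic focal-weighted InfoNCE term under stated assumptions, not the formulation from the paper: the function name, the `temperature` and `gamma` defaults, and the toy similarity values are all illustrative choices.

```python
import math


def focal_contrastive_loss(similarities, positive_index, temperature=0.07, gamma=2.0):
    """Focal-weighted contrastive loss for one anchor (illustrative sketch).

    similarities: similarity between the anchor and every candidate in the batch.
    positive_index: position of the matching candidate.
    """
    logits = [s / temperature for s in similarities]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    p_pos = exps[positive_index] / sum(exps)
    # (1 - p)^gamma shrinks the loss of pairs that are already matched confidently.
    return -((1.0 - p_pos) ** gamma) * math.log(p_pos)


easy_pair = focal_contrastive_loss([0.9, 0.1, 0.0], positive_index=0)
hard_pair = focal_contrastive_loss([0.3, 0.35, 0.2], positive_index=0)
```

With `gamma=0` the term reduces to the ordinary softmax cross-entropy, so `gamma` controls how strongly hard examples are emphasized.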
# minirbt:中国製2段蒸留小型プリトレーニングモデル MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model ( http://arxiv.org/abs/2304.00717v1 ) ライセンス: Link先を確認	Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang	(参考訳) 自然言語処理では、事前訓練された言語モデルが重要な基盤となっている。しかしながら、これらのモデルは、大きなサイズ、長い推論時間、困難なデプロイメントといった問題に悩まされることが多い。さらに、ほとんどの主流の事前訓練モデルは英語に焦点を合わせており、小さな中国の事前訓練モデルについての研究は不十分である。本稿では,中国語の自然言語処理の研究を進めることを目的とした,中国語事前学習モデルMiniRBTを紹介する。 MiniRBTは狭く深い学生モデルを採用し、事前訓練中に全単語のマスキングと2段階の蒸留を取り入れ、下流の作業に適している。機械読解とテキスト分類タスクに関する実験により,MiniRBTはRoBERTaと比較して94%の性能を実現し,6.8倍の高速化を実現した。 In natural language processing, pre-trained language models have become essential infrastructures. However, these models often suffer from issues such as large size, long inference time, and challenging deployment. Moreover, most mainstream pre-trained models focus on English, and there are insufficient studies on small Chinese pre-trained models. In this paper, we introduce MiniRBT, a small Chinese pre-trained model that aims to advance research in Chinese natural language processing. MiniRBT employs a narrow and deep student model and incorporates whole word masking and two-stage distillation during pre-training to make it well-suited for most downstream tasks. Our experiments on machine reading comprehension and text classification tasks reveal that MiniRBT achieves 94% performance relative to RoBERTa, while providing a 6.8x speedup, demonstrating its effectiveness and efficiency.	翻訳日:2023-04-04 16:46:03 公開日:2023-04-03
# 量子チャネルと量子状態のいくつかの絶対性質 Quantum channels and some absolute properties of quantum states ( http://arxiv.org/abs/2304.00711v1 ) ライセンス: Link先を確認	Tapaswini Patro, Kaushiki Mukherjee, Nirman Ganguly	(参考訳) 環境相互作用は、量子情報処理プロトコルの実際の応用においてユビキタスである。このような相互作用は量子資源の枯渇をもたらす。量子情報の文脈における2つの重要なメリットは、完全に絡み合った分数(FEF)と複合量子系の条件エントロピーである。 FEFはテレポーテーションのようなタスクで重要な役割を担います。一方、条件エントロピーは特定の量子状態に対して負となりうるので、負性は密度の高い符号化や状態の融合といったタスクの資源として残っている。 2f $ > 1/d $ a $ d \otimes d $ quantum system は重要なしきい値であるが、いくつかの量子状態において、グローバルユニタリ操作においても閾値以下であり、結果として絶対完全絡み合い分数(afef)を持つ状態として知られる。条件付きフォン・ノイマンエントロピーを含む状態は、大域的ユニタリ作用の下で条件付きエントロピーの非負性を保持する状態があり、絶対的条件付きフォン・ノイマンエントロピー非負性状態 (ACVENN) と呼ばれる。本稿では,量子チャネルの作用を2つの量子ビットと2つのquditで検証し,ある量子状態が非絶対的状態から絶対的状態へと作用することを示す。グローバルなユニタリ操作は絶対的でない状態に戻すことができないため、絡み合いスワッピングネットワークを用いた検索のための処方料を提供する。さらに、絶対性の概念を条件R'enyiエントロピーに拡張し、絶対条件R'enyiエントロピー非負性(ACRENN)を持つ状態に必要な条件を求める。次に、三成分系の限界を含むように作業を拡張し、上記の絶対性に関してそれらの特徴付けを提供する。 Environmental interactions are ubiquitous in any real-world application of a quantum information processing protocol. Such interactions result in depletion of quantum resources. Two important figure of merits in the context of quantum information are the fully entangled fraction (FEF) and conditional entropy of a composite quantum system. FEF has a key role to play in tasks like teleportation. Conditional entropy on the other hand can be negative for certain quantum states and thus the negativity remains a resource for tasks like dense coding and state merging. FEF $ > 1/d $ for a $ d \otimes d $ quantum system is a significant threshold, however for some quantum states it remains less than the threshold even with global unitary operations, consequently being known as states having absolute fully entangled fraction (AFEF). 
Regarding the conditional von Neumann entropy, some states retain the nonnegativity of the conditional entropy under global unitary action; these are called states with the absolute conditional von Neumann entropy nonnegative (ACVENN) property. In the present submission, we probe the action of some quantum channels on two-qubit and two-qudit systems and find that some quantum states move from the non-absolute regime to the absolute regime under this action. Since global unitary operations cannot bring them back to the non-absolute regime, we provide a prescription for their retrieval using an entanglement swapping network. Furthermore, we extend the notion of absoluteness to conditional Rényi entropies and find the condition required for a state to have the absolute conditional Rényi entropy non-negative (ACRENN) property. We then extend the work to include the marginals of a tripartite system and characterize them with respect to the aforementioned absolute properties.	翻訳日:2023-04-04 16:45:48 公開日:2023-04-03
# ユニバーサルブレイディング量子ゲート Universal Braiding Quantum Gates ( http://arxiv.org/abs/2304.00710v1 ) ライセンス: Link先を確認	David Lovitz	(参考訳) ヤン・バクスター方程式とその様々な形式は、統計力学、結び目理論、量子情報など多くの分野に応用されている。ブレイド型ヤン・バクスター方程式のユニタリ解は、位相量子コンピュータの量子ゲートとして特に興味深い。量子計算においてユニタリかつ普遍的である任意の次元の解に対する単純な構成を示す。また、ある一般化されたヤン・バクスター方程式に対する解の族を完全に分類し、この方程式の特定の例は恒等作用素のスカラー倍である解しか持たないことを証明する。 The Yang-Baxter equation and its various forms have applications in many fields, including statistical mechanics, knot theory, and quantum information. Unitary solutions of the braided Yang-Baxter equation are of particular interest as quantum gates for topological quantum computers. We demonstrate a simple construction for solutions in any dimension that are both unitary and universal for quantum computation. We also fully classify a family of solutions to certain generalized Yang-Baxter equations and prove that certain instances of the equation only have solutions that are scalar multiples of the identity.	翻訳日:2023-04-04 16:45:13 公開日:2023-04-03
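For readers unfamiliar with the braided Yang-Baxter equation, the relation (R ⊗ I)(I ⊗ R)(R ⊗ I) = (I ⊗ R)(R ⊗ I)(I ⊗ R) can be checked numerically for a candidate two-qubit gate. The sketch below is only a verification helper for the qubit case, with helper names chosen here for illustration; it does not reproduce the construction from the paper. SWAP and CNOT are used as standard test gates.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


IDENTITY_2 = [[1, 0], [0, 1]]


def satisfies_braid_relation(r, tol=1e-12):
    """Check (R x I)(I x R)(R x I) == (I x R)(R x I)(I x R) for a 4x4 matrix R."""
    r1 = kron(r, IDENTITY_2)  # R acting on qubits 1 and 2
    r2 = kron(IDENTITY_2, r)  # R acting on qubits 2 and 3
    lhs = matmul(matmul(r1, r2), r1)
    rhs = matmul(matmul(r2, r1), r2)
    return all(abs(lhs[i][j] - rhs[i][j]) <= tol for i in range(8) for j in range(8))


SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

swap_is_braiding = satisfies_braid_relation(SWAP)
cnot_is_braiding = satisfies_braid_relation(CNOT)
```

SWAP satisfies the relation but is not entangling, while CNOT is entangling but fails the relation, which hints at why solutions that are both unitary and universal are of interest.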
# 調整可能な確率的再構成誤差と平均シフトアウトリアースコアを用いたオートエンコーダに基づくアウトリアー検出の改善 Improving Autoencoder-based Outlier Detection with Adjustable Probabilistic Reconstruction Error and Mean-shift Outlier Scoring ( http://arxiv.org/abs/2304.00709v1 ) ライセンス: Link先を確認	Xu Tan, Jiawei Yang, Junqi Chen, Sylwan Rahardja, Susanto Rahardja	(参考訳) オートエンコーダは多くの機械学習タスクで、強力な学習能力のおかげで広く使われており、異常検出の分野で研究者の間で大きな関心を集めている。しかし,従来のオートエンコーダ方式には2つの側面があった。これにより、異常検出のパフォーマンスが制限された。まず,従来のオートエンコーダにおける平均二乗誤差は,その表現能力を制限したオートエンコーダの判定の不確実性を無視した。第2に、オートエンコーダは異常なリコンストラクション問題に苦しめられ、いくつかのアウトリアーは予期せぬほどうまく再構築され、インリアーからの識別が困難になる。上記の問題を緩和するため,本論文では2つの新しい手法を提案する。まず, 復元バイアスと判断不確実性の両方を考慮し, 確率的再構成誤差(pre)という新しい損失関数を構築した。これら2つの因子のトレードオフをさらに制御するために、前生成型確率的再構成誤差(apre)において2つの重みが導入された。第二に、平均シフト(MSS)に基づく概念的に新しい外れ値スコアリング法が提案され、オートエンコーダによって生じる誤りのインリエを低減する。 32個の実世界の外れ値検出データセットの実験により,提案手法の有効性が確認された。提案手法の組み合わせは, 最良ベースラインと比較して, 性能向上率の41%を達成した。 MSSは複数のオートエンコーダベースのアウトリア検出器の性能を平均20%改善した。提案する2つの手法は, 異常検出におけるオートエンコーダの開発を促進する可能性を秘めている。コードはwww.outliernet.comで再現可能である。 Autoencoders were widely used in many machine learning tasks thanks to their strong learning ability which has drawn great interest among researchers in the field of outlier detection. However, conventional autoencoder-based methods lacked considerations in two aspects. This limited their performance in outlier detection. First, the mean squared error used in conventional autoencoders ignored the judgment uncertainty of the autoencoder, which limited their representation ability. Second, autoencoders suffered from the abnormal reconstruction problem: some outliers can be unexpectedly reconstructed well, making them difficult to identify from the inliers. To mitigate the aforementioned issues, two novel methods were proposed in this paper. First, a novel loss function named Probabilistic Reconstruction Error (PRE) was constructed to factor in both reconstruction bias and judgment uncertainty. 
To further control the trade-off between these two factors, two weights were introduced into PRE, producing the Adjustable Probabilistic Reconstruction Error (APRE), which benefited outlier detection in different applications. Second, a conceptually new outlier scoring method based on mean-shift (MSS) was proposed to reduce the false inliers caused by the autoencoder. Experiments on 32 real-world outlier detection datasets demonstrated the effectiveness of the proposed methods. The combination of the proposed methods achieved a 41% relative performance improvement over the best baseline. The MSS improved the performance of multiple autoencoder-based outlier detectors by an average of 20%. The two proposed methods have the potential to advance the development of autoencoders in outlier detection. The code is available at www.OutlierNet.com for reproducibility.	翻訳日:2023-04-04 16:45:04 公開日:2023-04-03
# 滑らかな共分散を伴うオンライン最小二乗SGDの高次元スケーリング限界と揺らぎ High-dimensional scaling limits and fluctuations of online least-squares SGD with smooth covariance ( http://arxiv.org/abs/2304.00707v1 ) ライセンス: Link先を確認	Krishnakumar Balasubramanian, Promit Ghosal, Ye He	(参考訳) オンライン最小二乗確率勾配降下(sgd)アルゴリズムの高次元スケーリング限界とゆらぎを,データ生成モデルの特性を明示的に考慮して導出する。提案手法では,SGDを相互作用粒子系として繰り返し処理し,その相互作用は入力の共分散構造によって特徴づけられる。 8階までのモーメント上の滑らか性条件を仮定し、ガウス性を明確に仮定することなく、無限次元常微分方程式(odes)または確率微分方程式(sdes)の形で高次元のスケーリング限界とゆらぎを確立する。その結果,イテレートの正確な3段階の相転移が明らかになった。弾道性から拡散性,そしてノイズのばらつきが低レベルから中程度に,そして極端に高いノイズ設定へと変化する。低雑音環境では、(スケールした)反復の正確なゆらぎを無限次元のSDEとして特徴づける。また、導出制限ODEとSDEに対する解の存在と特異性を示す。その結果, 限界平均二乗推定や予測誤差のキャラクタリゼーションや, 限界方程式を解析的あるいは数値的に解くことで得られる変動など, いくつかの応用が得られた。 We derive high-dimensional scaling limits and fluctuations for the online least-squares Stochastic Gradient Descent (SGD) algorithm by taking the properties of the data generating model explicitly into consideration. Our approach treats the SGD iterates as an interacting particle system, where the expected interaction is characterized by the covariance structure of the input. Assuming smoothness conditions on moments of order up to eight orders, and without explicitly assuming Gaussianity, we establish the high-dimensional scaling limits and fluctuations in the form of infinite-dimensional Ordinary Differential Equations (ODEs) or Stochastic Differential Equations (SDEs). Our results reveal a precise three-step phase transition of the iterates; it goes from being ballistic, to diffusive, and finally to purely random behavior, as the noise variance goes from low, to moderate and finally to very-high noise setting. In the low-noise setting, we further characterize the precise fluctuations of the (scaled) iterates as infinite-dimensional SDEs. We also show the existence and uniqueness of solutions to the derived limiting ODEs and SDEs. 
Our results have several applications, including characterization of the limiting mean-square estimation or prediction errors and their fluctuations which can be obtained by analytically or numerically solving the limiting equations.	翻訳日:2023-04-04 16:44:38 公開日:2023-04-03
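The algorithm analyzed in this abstract, online least-squares SGD, uses each sample once and takes a gradient step on its squared error. A minimal sketch follows; the isotropic Gaussian inputs, the step size, and the names `synthetic_stream` and `THETA_STAR` are assumptions made for illustration, whereas the paper treats inputs with a general smooth covariance structure in the high-dimensional limit.

```python
import random


def online_least_squares_sgd(stream, dim, step_size):
    """One pass of SGD on the squared loss 0.5 * (y - <theta, x>)^2, one sample per step."""
    theta = [0.0] * dim
    for x, y in stream:
        residual = y - sum(t * xi for t, xi in zip(theta, x))
        theta = [t + step_size * residual * xi for t, xi in zip(theta, x)]
    return theta


def synthetic_stream(theta_star, n_samples, noise_std, seed=0):
    """Yield (x, y) pairs with y = <theta_star, x> + Gaussian noise (toy data model)."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in theta_star]
        y = sum(t * xi for t, xi in zip(theta_star, x)) + rng.gauss(0.0, noise_std)
        yield x, y


THETA_STAR = [1.0, -2.0, 0.5]
estimate = online_least_squares_sgd(
    synthetic_stream(THETA_STAR, 5000, noise_std=0.0), dim=3, step_size=0.05
)
```

Raising `noise_std` in this toy setup keeps the iterates fluctuating around `THETA_STAR` instead of converging, which is the kind of noise-dependent behavior the scaling limits describe.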
# d-score:突然変異演算子に基づくcnnのホワイトボックス診断スコア D-Score: A White-Box Diagnosis Score for CNNs Based on Mutation Operators ( http://arxiv.org/abs/2304.00697v1 ) ライセンス: Link先を確認	Xin Zhang and Yuqi Song and Xiaofeng Wang and Fei Zuo	(参考訳) 畳み込みニューラルネットワーク(cnns)は、自動運転や医療診断など、多くの安全クリティカルな領域に広く適用されている。標準テスト方法はテストセットにおけるモデルのパフォーマンスを評価するが、低品質で不十分なテストセットは信頼性の低い評価結果につながり、予期せぬ結果をもたらす可能性がある。したがって、cnnを総合的に評価する方法と、評価結果に基づいて、信頼度を高める方法が緊急対応すべき重要な課題である。以前の研究では、cnnのテストセットを評価するために突然変異試験を用いた。しかし、評価スコアはブラックボックスであり、テスト対象として十分に明示されていない。本稿では,突然変異演算子と画像変換を用いてモデルの特徴と注意分布を算出し,さらに,モデルのロバスト性とデータセットへの適合性を反映したd-scoreという診断スコアを提示するホワイトボックス診断手法を提案する。また,D-Scoreに基づくデータ拡張手法を提案し,CNNの性能を翻訳や再スケーリングに拡張する。広く使われている2つのデータセットと3つのCNNに関する総合的な実験は、我々のアプローチの有効性を実証している。 Convolutional neural networks (CNNs) have been widely applied in many safety-critical domains, such as autonomous driving and medical diagnosis. However, concerns have been raised with respect to the trustworthiness of these models: The standard testing method evaluates the performance of a model on a test set, while low-quality and insufficient test sets can lead to unreliable evaluation results, which can have unforeseeable consequences. Therefore, how to comprehensively evaluate CNNs and, based on the evaluation results, how to enhance their trustworthiness are the key problems to be urgently addressed. Prior work has used mutation tests to evaluate the test sets of CNNs. However, the evaluation scores are black boxes and not explicit enough for what is being tested. In this paper, we propose a white-box diagnostic approach that uses mutation operators and image transformation to calculate the feature and attention distribution of the model and further present a diagnosis score, namely D-Score, to reflect the model's robustness and fitness to a dataset. We also propose a D-Score based data augmentation method to enhance the CNN's performance to translations and rescalings. 
Comprehensive experiments on two widely used datasets and three commonly adopted CNNs demonstrate the effectiveness of our approach.	翻訳日:2023-04-04 16:44:14 公開日:2023-04-03
# 熱拡散関数(TSF):物理誘導材料分類 Thermal Spread Functions (TSF): Physics-guided Material Classification ( http://arxiv.org/abs/2304.00696v1 ) ライセンス: Link先を確認	Aniket Dashpute, Vishwanath Saragadam, Emma Alexander, Florian Willomitzer, Aggelos Katsaggelos, Ashok Veeraraghavan, Oliver Cossairt	(参考訳) ロバストで非破壊的な物質分類は、多くの視覚応用において難しいが重要な第一歩である。本研究では,物体の熱特性に依存する物理誘導材料分類フレームワークを提案する。我々の重要な観察は、物体の加熱と冷却の速度が、材料の固有の性質、すなわち放射率と拡散率に依存することである。熱カメラが加熱・冷却過程の計測を捉えている間、この観察を低出力レーザーで一定期間温め、それをオフにすることで活用する。次に、この空間的および時間的「熱拡散関数」(TSF)を用いて、有限差分法による逆熱方程式を解き、空間的に微分率と放射率を推定する。これらのタプルは、各空間画素で微細な材料ラベルを生成する分類器の訓練に使用される。提案手法は小型光源(低出力レーザー)とサーマルカメラのみを極端に必要とし,16クラスで86%の精度でロバストな分類結果を生成する。 Robust and non-destructive material classification is a challenging but crucial first-step in numerous vision applications. We propose a physics-guided material classification framework that relies on thermal properties of the object. Our key observation is that the rate of heating and cooling of an object depends on the unique intrinsic properties of the material, namely the emissivity and diffusivity. We leverage this observation by gently heating the objects in the scene with a low-power laser for a fixed duration and then turning it off, while a thermal camera captures measurements during the heating and cooling process. We then take this spatial and temporal "thermal spread function" (TSF) to solve an inverse heat equation using the finite-differences approach, resulting in a spatially varying estimate of diffusivity and emissivity. These tuples are then used to train a classifier that produces a fine-grained material label at each spatial pixel. Our approach is extremely simple requiring only a small light source (low power laser) and a thermal camera, and produces robust classification results with 86% accuracy over 16 classes.	翻訳日:2023-04-04 16:43:56 公開日:2023-04-03
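The abstract relies on a finite-difference treatment of the heat equation. As a hedged illustration of the forward model only, the sketch below steps the 1D equation u_t = α u_xx with an explicit scheme; the paper solves the inverse problem on thermal video to estimate diffusivity and emissivity, which is not reproduced here. The grid, time step, and diffusivity values are illustrative assumptions.

```python
def diffuse_1d(temperature, diffusivity, dx, dt, steps):
    """Explicit finite-difference solution of u_t = diffusivity * u_xx with insulated ends."""
    ratio = diffusivity * dt / dx ** 2
    if ratio > 0.5:
        raise ValueError("explicit scheme is unstable for diffusivity*dt/dx^2 > 0.5")
    u = list(temperature)
    for _ in range(steps):
        # Mirrored ghost cells enforce zero heat flux at both ends.
        padded = [u[0]] + u + [u[-1]]
        u = [
            padded[i] + ratio * (padded[i + 1] - 2 * padded[i] + padded[i - 1])
            for i in range(1, len(padded) - 1)
        ]
    return u


# A single heated point, standing in for a laser spot on the material.
initial = [0.0] * 21
initial[10] = 1.0
slow = diffuse_1d(initial, diffusivity=0.1, dx=1.0, dt=1.0, steps=50)
fast = diffuse_1d(initial, diffusivity=0.4, dx=1.0, dt=1.0, steps=50)
```

The material with the higher diffusivity ends up with a lower, wider temperature peak after the same time, which is the kind of material-dependent spread that a classifier can exploit.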
# マトリックスプロファイルによるリチウムイオン電池オンライン膝のオンセット検出 Lithium-ion Battery Online Knee Onset Detection by Matrix Profile ( http://arxiv.org/abs/2304.00691v1 ) ライセンス: Link先を確認	Kate Qi Zhou, Yan Qin, Chau Yuen	(参考訳) リチウムイオン電池(LiBs)は膝の発症までわずかに劣化し、その後劣化は寿命(EOL)に加速する。加速劣化速度の開始を示す膝の発症は、電池の性能変化を早期に警告する上で重要である。しかし、オンライン膝の特定に関する文献は限られている。また、簡便に収集した測定値を用いてその識別を行うことが好ましい。これらの課題を解決するために、放電データ内の時間情報を利用してオンライン膝のオンセット識別法を開発した。第1に、わずかな劣化段階から放電電圧サイクルに埋め込まれた時間的ダイナミクスを動的時間ゆがみによって抽出する。第2に、サブシーケンス類似性探索中に、異常をマトリックスプロファイルで露呈する。新しいサイクルの時間的ダイナミクスが制御限界を超え、プロファイル指標がレジームの変化を示すと、膝の発症が検出される。最後に、識別された膝のオンセットを使用して、電池のEOLサイクルとの強い相関により、バッテリーを長距離または短距離のカテゴリに分類する。電池分類と同一統計分布下で得られたトレーニングデータのサポートにより,提案したSOH推定モデルは,ルート平均2乗誤差を0.22%以下に向上した推定結果が得られる。 Lithium-ion batteries (LiBs) degrade slightly until the knee onset, after which the deterioration accelerates to end of life (EOL). The knee onset, which marks the initiation of the accelerated degradation rate, is crucial in providing an early warning of the battery's performance changes. However, there is only limited literature on online knee onset identification. Furthermore, it is good to perform such identification using easily collected measurements. To solve these challenges, an online knee onset identification method is developed by exploiting the temporal information within the discharge data. First, the temporal dynamics embedded in the discharge voltage cycles from the slight degradation stage are extracted by the dynamic time warping. Second, the anomaly is exposed by Matrix Profile during subsequence similarity search. The knee onset is detected when the temporal dynamics of the new cycle exceed the control limit and the profile index indicates a change in regime. Finally, the identified knee onset is utilized to categorize the battery into long-range or short-range categories by its strong correlation with the battery's EOL cycles. 
Supported by this battery categorization and by training data drawn from the same statistical distribution, the proposed state-of-health (SOH) estimation model achieves improved estimates, with a root mean squared error as low as 0.22%.	翻訳日:2023-04-04 16:43:38 公開日:2023-04-03
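The Matrix Profile step named in this abstract records, for every subsequence of a series, the distance to its nearest non-overlapping neighbour; a large value marks a subsequence unlike anything else. Below is a naive, quadratic-time sketch on a toy periodic series with an injected anomaly. It does not include the dynamic time warping stage or the control limits from the paper, and the series values are made up for illustration.

```python
import math


def z_normalize(window):
    """Rescale a window to zero mean and unit standard deviation."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def matrix_profile(series, m):
    """Naive matrix profile: distance from each length-m window to its nearest non-trivial match."""
    windows = [z_normalize(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = m // 2  # skip near-identical overlapping windows
    profile = []
    for i, a in enumerate(windows):
        best = math.inf
        for j, b in enumerate(windows):
            if abs(i - j) <= exclusion:
                continue
            best = min(best, math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))
        profile.append(best)
    return profile


PATTERN = [0, 1, 2, 3, 4, 3, 2, 1]
series = PATTERN * 8
series[40:44] = [4, 0, 4, 0]  # injected anomaly (toy data)
profile = matrix_profile(series, m=8)
discord_start = max(range(len(profile)), key=profile.__getitem__)
```

The index with the largest profile value falls on a window that overlaps the injected anomaly, while windows drawn from the regular pattern have a profile value of zero.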
# 野生における3次元セマンティックセマンティックセグメンテーション--逆導電点雲の一般モデル学習 3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds ( http://arxiv.org/abs/2304.00690v1 ) ライセンス: Link先を確認	Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing	(参考訳) 全天候条件下でのロバストポイントクラウド解析は、自動運転におけるレベル5の自律性に不可欠である。しかしながら、一般的な3Dセマンティックセグメンテーション(DSS)モデルを学習する方法はほとんど無視されている。我々は,ポイントレベルの密接なアノテーションを提供し,様々な気象条件下で3dsを解析可能な,悪天候のポイントクラウドデータセットであるsemanticstfを紹介する。全天候3DSSモデリングを2つの設定で検討する。 1) 正常ウェザーデータから悪ウェザーデータに適応するドメイン適応型3DSS 2) ドメイン一般化可能な3DSSは, 通常の天候データから全天候3DSSモデルを学習する。本研究は,既存の3DSS手法が悪天候データに遭遇する際の課題を明らかにするものである。さらに,点雲の幾何学的スタイルをランダム化し,それらの埋め込みを集約するドメインランダム化手法を考案し,その結果,様々な悪天候下で3dsを効果的に改善できる一般化モデルを構築した。 SemanticSTFと関連するコードは、 \url{https://github.com/xiaoaoran/SemanticSTF}で入手できる。 Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model is largely neglected as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and allows to study 3DSS under various adverse weather conditions. We study all-weather 3DSS modeling under two setups: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalizable 3DSS that learns all-weather 3DSS models from normal-weather data. Our studies reveal the challenge while existing 3DSS methods encounter adverse-weather data, showing the great value of SemanticSTF in steering the future endeavor along this very meaningful research direction. 
In addition, we design a domain randomization technique that alternately randomizes the geometry styles of point clouds and aggregates their embeddings, ultimately yielding a generalizable model that effectively improves 3DSS under various adverse weather conditions. SemanticSTF and the related code are available at https://github.com/xiaoaoran/SemanticSTF.	翻訳日:2023-04-04 16:43:19 公開日:2023-04-03
# 変分オートエンコーダを用いたデバイス画像-IVマッピングによる逆設計と前方予測 Device Image-IV Mapping using Variational Autoencoder for Inverse Design and Forward Prediction ( http://arxiv.org/abs/2304.00738v1 ) ライセンス: Link先を確認	Thomas Lu, Albert Lu, and Hiu Yung Wong	(参考訳) 本稿では,変分オートエンコーダ(vae)に基づく新しい枠組みを用いて,デバイス構造画像を対応する電流電圧(iv)特性にマッピングすることで,基礎となるデバイス物理の学習を実証する。 VAEは使用されるため、ドメインの専門知識は必要とせず、フレームワークはどんな新しいデバイスや測定にも素早くデプロイできる。これは、デバイス横断画像と電気的特性しか利用できない場合(例えば、新しい新興メモリ)に、新しいデバイスのコンパクトなモデリングに有用であることが期待される。実演には技術コンピュータ支援設計(tcad)と手描きの金属酸化物半導体(mos)デバイス画像とノイズドレイン電流ゲート電圧曲線(idvg)を用いた。このフレームワークは2つのVAE(画像多様体学習用とIDVG多様体学習用)を積み重ねて形成され、潜在変数を介して相互に通信する。異なる強度を持つ5つの独立変数が使用される。逆設計(所定のIDVGの設計構造を生成する)と前方予測(所定の構造画像に対する予測IDVG)をうまく行うことができ、画像がデバイスパラメータとして扱われる場合のコンパクトなモデリングに使用できる。多様体学習が用いられるため、機械は入力(手書き画像とノイズIDVG曲線)のノイズに対して頑健であり、弱い独立変数と無関係な独立変数に混同されない。 This paper demonstrates the learning of the underlying device physics by mapping device structure images to their corresponding Current-Voltage (IV) characteristics using a novel framework based on variational autoencoders (VAE). Since VAE is used, domain expertise is not required and the framework can be quickly deployed on any new device and measurement. This is expected to be useful in the compact modeling of novel devices when only device cross-sectional images and electrical characteristics are available (e.g. novel emerging memory). Technology Computer-Aided Design (TCAD) generated and hand-drawn Metal-Oxide-Semiconductor (MOS) device images and noisy drain-current-gate-voltage curves (IDVG) are used for the demonstration. The framework is formed by stacking two VAEs (one for image manifold learning and one for IDVG manifold learning) which communicate with each other through the latent variables. Five independent variables with different strengths are used. 
It is shown that it can perform inverse design (generate a design structure for a given IDVG) and forward prediction (predict IDVG for a given structure image, which can be used for compact modeling if the image is treated as device parameters) successfully. Since manifold learning is used, the machine is shown to be robust against noise in the inputs (i.e. using hand-drawn images and noisy IDVG curves) and not confused by weak and irrelevant independent variables.	翻訳日:2023-04-04 16:37:43 公開日:2023-04-03
# SparDL: 効率的なスパース通信による分散ディープラーニングトレーニング SparDL: Distributed Deep Learning Training with Efficient Sparse Communication ( http://arxiv.org/abs/2304.00737v1 ) ライセンス: Link先を確認	Minjun Zhao, Yichen Yin, Yuren Mao, Lu Chen, Yunjun Gao	(参考訳) Top-$k$スパース化は近年,分散ディープラーニングにおける通信量削減に広く利用されているが,Gradient Accumulation (GA) ジレンマにより,Top-$k$スパース化の性能は依然として限られている。 GAジレンマの処理にはいくつかの方法が提案されているが,(1)大量の余分な通信を導入するため通信複雑性が高いこと,(2)ワーカー数が2のべき乗でない場合に柔軟に対応できないこと,の2つの欠点がある。これら2つの問題を解決するために,SparDLと呼ばれるフレキシブルで効率的なスパース通信フレームワークを提案する。 SparDLはSpar-Reduce-Scatterアルゴリズムを用いて、追加の通信操作なしでGAジレンマを解き、任意のワーカー数に柔軟に対応する。さらに,通信複雑性をさらに低減し,通信複雑性におけるレイテンシと帯域幅コストの比率を調整するために,SparDLの一部としてSpar-All-Gatherアルゴリズムを提案する。広範な実験によりSparDLの優位性を検証する。 Top-$k$ sparsification has recently been widely used to reduce the communication volume in distributed deep learning; however, due to the Gradient Accumulation (GA) dilemma, the performance of top-$k$ sparsification is still limited. Several methods have been proposed to handle the GA dilemma but have two drawbacks: (1) they are frustrated by the high communication complexity as they introduce a large amount of extra transmission; (2) they are not flexible for non-power-of-two numbers of workers. To solve these two problems, we propose a flexible and efficient sparse communication framework, dubbed SparDL. SparDL uses the Spar-Reduce-Scatter algorithm to solve the GA dilemma without additional communication operations and is flexible to any number of workers. Besides, to further reduce the communication complexity and adjust the proportion of latency and bandwidth cost in communication complexity, we propose the Spar-All-Gather algorithm as part of SparDL. Extensive experiments validate the superiority of SparDL.	翻訳日:2023-04-04 16:37:17 公開日:2023-04-03
# フェムト秒分解能における量子エミッタのマルチポーラロンダイナミクス検出のための液相単粒子分光法 Solution-phase single-particle spectroscopy for probing multi-polaronic dynamics in quantum emitters at femtosecond resolution ( http://arxiv.org/abs/2304.00735v1 ) ライセンス: Link先を確認	Jiaojian Shi, Yuejun Shen, Feng Pan, Weiwei Sun, Anudeep Mangu, Cindy Shi, Amy McKeown-Green, Parivash Moradifar, Moungi G. Bawendi, William E. Moerner, Jennifer A. Dionne, Fang Liu, Aaron M. Lindenberg	(参考訳) 多くの光量子技術の発展は、ほぼ完全な光コヒーレンスを持つ固体単一量子エミッタの可用性に依存する。しかしながら、系統的な改善を制限するスタンディング問題は、単一のエミッタレベルと超高速時間スケールでの微視的エネルギーフローの重大なサンプルの不均一性と機械的な理解の欠如である。フェムト秒分解能で前例のない明快さで単一分子および/または欠陥状態におけるサンプル平均ダイナミクスをキャプチャする光子相関検出を用いた溶液相単粒子ポンププローブ分光を開発した。我々はこの手法を2次元六方晶窒化ホウ素の単一量子エミッタに適用し, 高い不均一性と低い量子効率に苦しむ。ミリ秒からナノ秒の時間スケールでは、翻訳拡散、準安定状態関連肩、回転ダイナミクス、および反バンキング特性は、それぞれの異なる光子相関時間スケールによって切り離され、正規化された2光子放出量子収率を定量化する。フェムト秒分解能、スペクトル選択率、超低ノイズ(固体法よりも2桁改善)を活用することで、単一欠陥レベルで時間領域における電子-フォノンカップリングを可視化し、多電子励起によるポーラロン形成の加速を検出する。理論的ポーラロンモデルの結果と合致して、サンプル平均光子忠実性がカスケード放出効率と光デコヒーレンス時間にどのように変換されるかを示す。我々の研究は、単一エミッタ、分子、欠陥の超高速分光のための枠組みを提供し、量子情報応用のための超大規模キャラクタリゼーションと合成改善の新たな道を開く。 The development of many optical quantum technologies depends on the availability of solid-state single quantum emitters with near-perfect optical coherence. However, a standing issue that limits systematic improvement is the significant sample heterogeneity and lack of mechanistic understanding of microscopic energy flow at the single emitter level and ultrafast timescales. Here we develop solution-phase single-particle pump-probe spectroscopy with photon correlation detection that captures sample-averaged dynamics in single molecules and/or defect states with unprecedented clarity at femtosecond resolution. We apply this technique to single quantum emitters in two-dimensional hexagonal boron nitride, which suffers from significant heterogeneity and low quantum efficiency. 
From millisecond to nanosecond timescales, the translation diffusion, metastable-state-related bunching shoulders, rotational dynamics, and antibunching features are disentangled by their distinct photon-correlation timescales, which collectively quantify the normalized two-photon emission quantum yield. Leveraging its femtosecond resolution, spectral selectivity and ultralow noise (two orders of magnitude improvement over solid-state methods), we visualize electron-phonon coupling in the time domain at the single defect level, and discover the acceleration of polaronic formation driven by multi-electron excitation. Corroborated with results from a theoretical polaron model, we show how this translates to sample-averaged photon fidelity characterization of cascaded emission efficiency and optical decoherence time. Our work provides a framework for ultrafast spectroscopy in single emitters, molecules, or defects prone to photoluminescence intermittency and heterogeneity, opening new avenues of extreme-scale characterization and synthetic improvements for quantum information applications.	翻訳日:2023-04-04 16:36:58 公開日:2023-04-03
# 重力誘起低温原子の絡み合い Gravitationally-induced entanglement in cold atoms ( http://arxiv.org/abs/2304.00734v1 ) ライセンス: Link先を確認	Richard Howl, Nathan Cooper, Lucia Hackermüller	(参考訳) 実験室で量子重力をテストするための有望なルートは、2つ以上の量子物質間の重力誘起絡み合い(GIE)を探すことである。主に、N00N状態や高スクイーズ状態のような非古典状態のマイクロソリッドシステムを用いている。ここでは、初めて、2つの冷たい原子ガス間のGIEを量子重力のテストとして考える。本稿では、2つの原子干渉計を並列に配置し、GIEと量子重力の証拠として出力ポートにおける原子数の相関関係を求める。 N00N や Schrödinger cat のような挑戦的なマクロな重ね合わせ状態はなく、代わりに原子の古典的な「コヒーレント」状態がある。これにより、原子干渉計の総質量はプランク質量スケールと長い積分時間でなければならない。しかし、現在最先端の量子スクイーズでは、質量スケールは接近可能なレベルに還元でき、そのような質量スケールが近い将来どのように達成されるかについて概説する。 A promising route to testing quantum gravity in the laboratory is to look for gravitationally-induced entanglement (GIE) between two or more quantum matter systems. Principally, proposals for such tests have used microsolid systems, with highly non-classical states, such as N00N states or highly-squeezed states. Here, we consider, for the first time, GIE between two cold atomic gasses as a test of quantum gravity. We propose placing two atom interferometers next to each other in parallel and looking for correlations in the number of atoms at the output ports as evidence of GIE and quantum gravity. There are no challenging macroscopic superposition states, such as N00N or Schrödinger cat states, instead classical-like 'coherent' states of atoms. This requires the total mass of the atom interferometers to be on the Planck mass scale, and long integration times. With current state-of-the-art quantum squeezing in cold atoms, however, we argue that the mass scale can be reduced to approachable levels and outline how such a mass scale can be achieved in the near future.	翻訳日:2023-04-04 16:36:26 公開日:2023-04-03
# ビデオにおける未バイアスシーングラフ生成 Unbiased Scene Graph Generation in Videos ( http://arxiv.org/abs/2304.00733v1 ) ライセンス: Link先を確認	Sayak Nag, Kyle Min, Subarna Tripathi, Amit K. Roy Chowdhury	(参考訳) 映像からの動的シーングラフ生成(SGG)の課題は、シーン固有のダイナミクス、モデル予測の時間的変動、画像ベースSGGの既存の課題に加えて、視覚的関係の長期分布などにより複雑かつ困難である。動的sggの既存の手法は、上述の課題、特に長期にわたる関係の分散に対処せずに、複雑なアーキテクチャを用いて時空間的コンテキストを捉えることに重点を置いている。これはしばしばバイアス付きシーングラフの生成につながる。これらの課題に対処するために,我々はテンプラと呼ばれる新しいフレームワークを紹介している。 TEMPURAは、トランスフォーマーに基づくシーケンスモデリングによりオブジェクトレベルの時間的整合性を採用し、メモリ誘導学習を用いて非バイアス関係表現を合成し、ガウス混合モデル(GMM)を用いて視覚関係の予測的不確実性を減衰させる。広範囲な実験により,既存の手法に比べて,より偏りのないシーングラフの生成において,性能が大幅に向上すること(場合によっては最大10%)を実証した。 The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of model predictions, and the long-tailed distribution of the visual relationships in addition to the already existing challenges in image-based SGG. Existing methods for dynamic SGG have primarily focused on capturing spatio-temporal context using complex architectures without addressing the challenges mentioned above, especially the long-tailed distribution of relationships. This often leads to the generation of biased scene graphs. To address these challenges, we introduce a new framework called TEMPURA: TEmporal consistency and Memory Prototype guided UnceRtainty Attenuation for unbiased dynamic SGG. TEMPURA employs object-level temporal consistencies via transformer-based sequence modeling, learns to synthesize unbiased relationship representations using memory-guided training, and attenuates the predictive uncertainty of visual relations using a Gaussian Mixture Model (GMM). Extensive experiments demonstrate that our method achieves significant (up to 10% in some cases) performance gain over existing methods highlighting its superiority in generating more unbiased scene graphs.	翻訳日:2023-04-04 16:36:06 公開日:2023-04-03
# 時空間流体プロセスの適応サンプリングのための予測モデルの利用 Leveraging Predictive Models for Adaptive Sampling of Spatiotemporal Fluid Processes ( http://arxiv.org/abs/2304.00732v1 ) ライセンス: Link先を確認	Sandeep Manjanna and Tom Z. Jiahao and M. Ani Hsieh	(参考訳) 時空間流体プロセスの永続的なモニタリングには、データのサンプリングと監視中のプロセスの予測モデルが必要である。本稿では,時空間過程の予測モデルに基づく適応サンプリングを行うPASSTアルゴリズムを提案する。 PASSTは、予測モデルを活用する適応型ロボットサンプリングアルゴリズムで、特定の領域における流体プロセスの効率的かつ永続的な監視を行う。本アルゴリズムは,学習した予測モデルから予測を活用し,自律走行車両が関心領域を適応的かつ効率的にサーベイする経路を計画する。次に、サンプルデータを用いて予測モデルに更新初期状態を与えることにより、より良い予測を得る。予測モデルの場合、流体過程のモデルを訓練するために知識に基づく神経常微分方程式を用いる。これらのモデルのサイズは桁違いに小さく、流体過程やその他の計算流体モデルを記述する偏微分方程式の直接数値シミュレーションから得られた流体データよりもずっと高速である。経路計画には、フィールド予測を報酬関数として使用する強化学習に基づく計画アルゴリズムを用いる。数値シミュレーションされた流体データと実世界の海流データの両方に対する適応的サンプリング経路計画アルゴリズムを評価し,与えられた領域の時空間場を長時間地平線でサンプリングできることを示した。また,学習モデルの学習レパートリーにない流体プロセスからサンプルを得るためのパストアルゴリズムの一般化能力を評価する。 Persistent monitoring of a spatiotemporal fluid process requires data sampling and predictive modeling of the process being monitored. In this paper we present PASST algorithm: Predictive-model based Adaptive Sampling of a Spatio-Temporal process. PASST is an adaptive robotic sampling algorithm that leverages predictive models to efficiently and persistently monitor a fluid process in a given region of interest. Our algorithm makes use of the predictions from a learned prediction model to plan a path for an autonomous vehicle to adaptively and efficiently survey the region of interest. In turn, the sampled data is used to obtain better predictions by giving an updated initial state to the predictive model. For predictive model, we use Knowledged-based Neural Ordinary Differential Equations to train models of fluid processes. These models are orders of magnitude smaller in size and run much faster than fluid data obtained from direct numerical simulations of the partial differential equations that describe the fluid processes or other comparable computational fluids models. 
For path planning, we use reinforcement-learning-based planning algorithms that use the field predictions as reward functions. We evaluate our adaptive sampling path planning algorithm on both numerically simulated fluid data and real-world nowcast ocean flow data to show that we can sample the spatiotemporal field in the given region of interest for long time horizons. We also evaluate the PASST algorithm's generalization ability to sample from fluid processes that are not in the training repertoire of the learned models.	翻訳日:2023-04-04 16:35:45 公開日:2023-04-03
# 規則表現学習者に基づく解釈可能なローン債権評価方法 An Interpretable Loan Credit Evaluation Method Based on Rule Representation Learner ( http://arxiv.org/abs/2304.00731v1 ) ライセンス: Link先を確認	Zihao Chen, Xiaomeng Wang, Yuanjiang Huang, Tao Jia	(参考訳) モデルの解釈性は、ハイステイクフィールドにおける幅広い応用の障害の1つとなっている。解釈可能性を得る一般的な方法は、まずブラックボックスを構築し、次にポストホックメソッドを使って説明することである。しかし,ポストホック法による説明は必ずしも信頼できない。代わりに、レンディングクラブデータセットのためのrrl(rule representation learner)に基づいた本質的な解釈可能なモデルを設計する。具体的には、特徴はそれぞれの特性に応じて3つのカテゴリに分けられ、それぞれ3つのサブネットワークを構築することができ、それぞれが単一の隠蔽層を持つニューラルネットワークに似ているが、等価にルールのセットに変換できる。トレーニング中、私たちは以前の研究からバイナリ重みを効果的にトレーニングするためのトリックを学びました。最後に,本モデルと木モデルを比較した。その結果, 金融機関と借入業者の両方にとって実用上重要なブラックボックスに近い, 性能上, 解釈可能な決定木よりも, はるかに優れたモデルが得られた。さらに,本モデルはポストホック法で生成された説明の正確性をテストするために用いられ,ポストホック法が必ずしも信頼できるとは限らないことを示す。 The interpretability of model has become one of the obstacles to its wide application in the high-stake fields. The usual way to obtain interpretability is to build a black-box first and then explain it using the post-hoc methods. However, the explanations provided by the post-hoc method are not always reliable. Instead, we design an intrinsically interpretable model based on RRL(Rule Representation Learner) for the Lending Club dataset. Specifically, features can be divided into three categories according to their characteristics of themselves and build three sub-networks respectively, each of which is similar to a neural network with a single hidden layer but can be equivalently converted into a set of rules. During the training, we learned tricks from previous research to effectively train binary weights. Finally, our model is compared with the tree-based model. The results show that our model is much better than the interpretable decision tree in performance and close to other black-box, which is of practical significance to both financial institutions and borrowers. 
More importantly, our model is used to test the correctness of the explanations generated by the post-hoc method; the results show that the post-hoc method is not always reliable.	翻訳日:2023-04-04 16:35:23 公開日:2023-04-03
# CG-3DSRGAN:低線量PET画像からの画質回復のための分類ガイド付き3次元生成対向ネットワーク CG-3DSRGAN: A classification guided 3D generative adversarial network for image quality recovery from low-dose PET images ( http://arxiv.org/abs/2304.00725v1 ) ライセンス: Link先を確認	Yuxin Xue, Yige Peng, Lei Bi, and Dagan Feng, Jinman Kim	(参考訳) ポジトロン・エミッション・トモグラフィー (PET) は, 現代医療において, 最も感度の高い分子イメージング法である。注入トレーサー線量による高放射能はPET画像における主要な関心事であり、臨床応用を制限している。しかし、投与量を減少させると、画像品質が不適切な診断に繋がる。低線量で高画質な画像を作成する必要性から、低線量で高画質なPET合成のための畳み込みニューラルネットワーク(CNN)ベースの手法が開発されている。従来のCNNによる研究は通常、低用量PETを異なる線量還元レベルを考慮せずに特徴空間に直接マッピングする。本研究では,CG-3DSRGAN (Classification-Guided Generative Adversarial Network with Super Resolution Refinement) という新しい手法を提案する。具体的には、分類ヘッドによって誘導されるマルチタスク粗いジェネレータにより、低線量データに存在するノイズレベルの特徴をより包括的に理解し、画像合成を改善することができる。さらに,標準PETの空間的詳細を回復するために,第2段階のトレーニングとして補助的な超解像ネットワークであるContextual-Netを提案し,粗い予測と標準PETのギャップを狭める。本手法を全身PETにおける各線量低減因子 (DRF) の経時的変化と比較した。実験により、我々の手法は全てのDRFにおいて他よりも優れることを示した。 Positron emission tomography (PET) is the most sensitive molecular imaging modality routinely applied in our modern healthcare. High radioactivity caused by the injected tracer dose is a major concern in PET imaging and limits its clinical applications. However, reducing the dose leads to inadequate image quality for diagnostic practice. Motivated by the need to produce high quality images with minimum low-dose, Convolutional Neural Networks (CNNs) based methods have been developed for high quality PET synthesis from its low-dose counterparts. Previous CNNs-based studies usually directly map low-dose PET into features space without consideration of different dose reduction level. In this study, a novel approach named CG-3DSRGAN (Classification-Guided Generative Adversarial Network with Super Resolution Refinement) is presented. Specifically, a multi-tasking coarse generator, guided by a classification head, allows for a more comprehensive understanding of the noise-level features present in the low-dose data, resulting in improved image synthesis. 
Moreover, to recover spatial details of standard PET, an auxiliary super resolution network - Contextual-Net - is proposed as a second training stage to narrow the gap between coarse prediction and standard PET. We compared our method to the state-of-the-art methods on whole-body PET with different dose reduction factors (DRFs). Experiments demonstrate our method can outperform others on all DRFs.	翻訳日:2023-04-04 16:35:07 公開日:2023-04-03
# 参照自由テキスト品質評価における大規模言語モデルの利用を探る:予備的実証的研究 Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: A Preliminary Empirical Study ( http://arxiv.org/abs/2304.00723v1 ) ライセンス: Link先を確認	Yi Chen, Rui Wang, Haiyun Jiang, Shuming Shi, Ruifeng Xu	(参考訳) 自然言語処理において,生成テキストの品質評価は難しい課題である。この困難は本文の複雑さと多様性から生じる。最近では,openaiの大規模言語モデル(llm)であるchatgptが,さまざまなタスクのパフォーマンス向上によって注目を浴びている。そこで本報告では,LLM,特にChatGPTの有効性について検討し,テキスト品質評価におけるそれらの使用方法を検討する。 chatgptまたは類似のllmに基づく3種類の参照フリー評価手法を比較した。実験の結果,ChatGPTは様々な視点からテキスト品質を効果的に評価でき,既存の自動メトリクスよりも優れた性能を示すことがわかった。特に,ChatGPTを用いてテキスト品質を計測する数値スコアを生成するExplicit Scoreは,この3つの手法の中で最も効果的で信頼性の高い手法である。しかし、ChatGPTを用いて2つのテキストの品質を直接比較することは、最適以下の結果をもたらす可能性がある。本稿では,ChatGPT などの LLM を用いたテキスト品質評価手法の選択について,貴重な知見を提供する。 Evaluating the quality of generated text is a challenging task in natural language processing. This difficulty arises from the inherent complexity and diversity of text. Recently, OpenAI's ChatGPT, a powerful large language model (LLM), has garnered significant attention due to its impressive performance in various tasks. Therefore, we present this report to investigate the effectiveness of LLMs, especially ChatGPT, and explore ways to optimize their use in assessing text quality. We compared three kinds of reference-free evaluation methods based on ChatGPT or similar LLMs. The experimental results prove that ChatGPT is capable to evaluate text quality effectively from various perspectives without reference and demonstrates superior performance than most existing automatic metrics. In particular, the Explicit Score, which utilizes ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable method among the three exploited approaches. However, directly comparing the quality of two texts using ChatGPT may lead to suboptimal results. We hope this report will provide valuable insights into selecting appropriate methods for evaluating text quality with LLMs such as ChatGPT.	
翻訳日:2023-04-04 16:34:44 公開日:2023-04-03
# 集合的エミッションの確立における貯水池記憶の影響--非マルコビアン性- Reservoir Memory Effect in the Establishment of Collective Emission: Non-Markovianity beyond Retardation ( http://arxiv.org/abs/2304.00722v1 ) ライセンス: Link先を確認	Yu-Xiang Zhang	(参考訳) 集団放出を確立するために、アンサンブル内の原子は仮想光子を交換することでその挙動を調整しなければならない。我々は、この非マルコフ過程を1次元(1次元)導波路に結合したサブ波長原子鎖で研究し、非マルコフ性の唯一の原因ではないことを発見した。もう1つの要因は光子環境の記憶であり、そこでは1つの励起原子は二次崩壊であるゼノレジームから指数崩壊に至る有限の時間を必要とする。導波路の設定では、このクロスオーバーは遅延よりも長い時間スケールを持ち、それによって集団行動の構築に影響を及ぼす。完全な量子処理と遅延効果のみを組み込んだ近似を比較することで、原子励起の集団によって特徴づけられるフィールドメモリ効果は、単一光子超放射において単一原子の崩壊よりもずっと顕著であることが分かる。 To establish a collective emission, the atoms in an ensemble must coordinate their behavior by exchanging virtual photons. We study this non-Markovian process in a subwavelength atom chain coupled to a one-dimensional (1D) waveguide and find that retardation is not the only cause of non-Markovianity. The other factor is the memory of the photonic environment, for which a single excited atom needs a finite time to cross from quadratic decay, the Zeno regime, to exponential decay. In waveguide setup, this crossover has a time scale longer than the retardation, thus impacts on the building up of collective behavior. By comparing a full quantum treatment with an approximation incorporating only the retardation effect, we find that field memory effect, characterized by the population of atomic excitation, is much more pronounced in single-photon superradiance than that in the decay of a single atom.	翻訳日:2023-04-04 16:34:27 公開日:2023-04-03
# 例外点近傍のピーターマン因子と位相剛性 Petermann factors and phase rigidities near exceptional points ( http://arxiv.org/abs/2304.00764v1 ) ライセンス: Link先を確認	Jan Wiersig	(参考訳) ピーターマン因子と位相剛性は、摂動に対するエネルギー固有値の感度やレーザーにおける量子過剰ノイズの大きさなど、オープン量子および波動系の様々な側面に便利な尺度である。非エルミート退化に近い2つの重要な量の挙動を議論する。小型の一般摂動の場合、例外点のスペクトル応答強度との関係を示す解析的明示的な公式を導出する。一般理論の予測は、おもちゃモデルの解析解と比較に成功している。さらに, ピーターマン係数とスペクトル応答強度との関係は, 後者を計算するための効率的な数値計算の基礎となることが示されている。 The Petermann factor and the phase rigidity are convenient measures for various aspects of open quantum and wave systems, such as the sensitivity of energy eigenvalues to perturbations or the magnitude of quantum excess noise in lasers. We discuss the behavior of these two important quantities near non-Hermitian degeneracies, so-called exceptional points. For small generic perturbations, we derive analytically explicit formulas which reveal a relation to the spectral response strength of the exceptional point. The predictions of the general theory are successfully compared to analytical solutions of a toy model. Moreover, it is demonstrated that the connection between Petermann factor and spectral response strength provides the basis for an efficient numerical scheme to calculate the latter.	翻訳日:2023-04-04 16:27:52 公開日:2023-04-03
# BOLLWM: インドの綿畑からボルワーム害虫をモニタリングする現実世界のデータセット BOLLWM: A real-world dataset for bollworm pest monitoring from cotton fields in India ( http://arxiv.org/abs/2304.00763v1 ) ライセンス: Link先を確認	Jerome White, Chandan Agrawal, Anmol Ojha, Apoorv Agnihotri, Makkunda Sharma, Jigar Doshi	(参考訳) 本稿では,インド全土の数千人の小規模農家や農業普及員が5年間にわたって撮影した農業害虫画像のデータセットについて述べる。このデータセットは、農家の害虫管理の意思決定を人工知能で支援するモバイルアプリケーションをサポートするために使用されている。作成は、組織化されたデータ収集と、より制御の緩いモバイルアプリケーションの利用との組み合わせによって行われた。これにより、データセットは害虫検出コミュニティ内でユニークになり、他の非農業分野の物体検出データセットに近い多くの特徴を示す。これは、データセットを将来の害虫管理アプリケーションに適用できるだけでなく、他の様々な研究課題への扉を開く。 This paper presents a dataset of agricultural pest images captured over five years by thousands of smallholder farmers and farming extension workers across India. The dataset has been used to support a mobile application that relies on artificial intelligence to assist farmers with pest management decisions. Creation came from a mix of organized data collection, and from mobile application usage that was less controlled. This makes the dataset unique within the pest detection community, exhibiting a number of characteristics that place it closer to other non-agricultural object detection datasets. This not only makes the dataset applicable to future pest management applications, it opens the door for a wide variety of other research agendas.	翻訳日:2023-04-04 16:27:40 公開日:2023-04-03
# 3d衣料アニメーションのための学習アンカー変換 Learning Anchor Transformations for 3D Garment Animation ( http://arxiv.org/abs/2304.00761v1 ) ライセンス: Link先を確認	Fang Zhao, Zekun Li, Shaoli Huang, Junwu Weng, Tianfei Zhou, Guo-Sen Xie, Jue Wang, Ying Shan	(参考訳) 本稿では,アンカーに基づく変形モデル,すなわちアンカーDEFを提案し,身体動作シーケンスから3次元衣料アニメーションを予測する。これは、余分な非線形変位を持つ剛性変換の混合により、衣料メッシュテンプレートを変形させる。メッシュ表面を囲む一連のアンカーは、剛性変換行列の学習を導くために導入された。アンカー変換が見つかると、衣服テンプレートの頂点ごとの非線形変位を正準空間に回帰させることができるため、変形空間学習の複雑さが軽減される。変換されたアンカーを位置、正規および方向の成分を満たすように明示的に制約することにより、空間における学習されたアンカー変換の物理的意味がより一般化するために保証される。さらに,代表的アンカー変換を学習するために,局所メッシュトポロジを意識してアンカー位置を最適化するアダプティブアンカー更新を提案する。異なる種類の衣服の質的および定量的実験により、アンコールDEFは、特にゆるやかな衣服において、動作中の3次元衣服の変形予測における最先端の性能を達成することを示した。 This paper proposes an anchor-based deformation model, namely AnchorDEF, to predict 3D garment animation from a body motion sequence. It deforms a garment mesh template by a mixture of rigid transformations with extra nonlinear displacements. A set of anchors around the mesh surface is introduced to guide the learning of rigid transformation matrices. Once the anchor transformations are found, per-vertex nonlinear displacements of the garment template can be regressed in a canonical space, which reduces the complexity of deformation space learning. By explicitly constraining the transformed anchors to satisfy the consistencies of position, normal and direction, the physical meaning of learned anchor transformations in space is guaranteed for better generalization. Furthermore, an adaptive anchor updating is proposed to optimize the anchor position by being aware of local mesh topology for learning representative anchor transformations. Qualitative and quantitative experiments on different types of garments demonstrate that AnchorDEF achieves the state-of-the-art performance on 3D garment deformation prediction in motion, especially for loose-fitting garments.	翻訳日:2023-04-04 16:27:30 公開日:2023-04-03
# FedIN: モデル不均一性のためのフェデレーション中間層学習 FedIN: Federated Intermediate Layers Learning for Model Heterogeneity ( http://arxiv.org/abs/2304.00759v1 ) ライセンス: Link先を確認	Chan Yun-Hin, Jiang Zhihan, Deng Jing, Ngai C.-H. Edith	(参考訳) フェデレートラーニング(FL)は、エッジデバイスがローカルおよびプライベートにトレーニングデータを維持しながら、グローバルな共有モデルを協調的にトレーニングすることを促進する。しかし、FLにおける一般的だが非現実的な仮定は、参加するエッジデバイスは同じリソースを持ち、同じグローバルモデルアーキテクチャを共有することである。本研究では,FedIN(Federated Intermediate Layers Learning)と呼ばれる新しいFL手法を提案する。 FedINのトレーニングモデルは、抽出器、中間層、分類器を含む3つの部分に分けられる。抽出器と分類器のモデルアーキテクチャは、中間層の特徴の一貫性を維持するためにすべてのデバイスで同じであるが、中間層のアーキテクチャはリソース容量に応じて異種デバイスに対して異なる。特徴から知識を生かすため、我々は、他のクライアントの機能に合わせて中間層を訓練し、訓練することを提案する。さらに,INトレーニングと局所トレーニングの競合によって引き起こされる勾配分散問題を緩和するため,凸最適化問題を定式化し,解決する。実験結果から,FedINは異種モデル環境において,最先端のアルゴリズムと比較して最高の性能を発揮することが示された。さらに,本研究では,イントレーニングの有効性と凸最適化問題に対する解法を示す。 Federated learning (FL) facilitates edge devices to cooperatively train a global shared model while maintaining the training data locally and privately. However, a common but impractical assumption in FL is that the participating edge devices possess the same required resources and share identical global model architecture. In this study, we propose a novel FL method called Federated Intermediate Layers Learning (FedIN), supporting heterogeneous models without utilizing any public dataset. The training models in FedIN are divided into three parts, including an extractor, the intermediate layers, and a classifier. The model architectures of the extractor and classifier are the same in all devices to maintain the consistency of the intermediate layer features, while the architectures of the intermediate layers can vary for heterogeneous devices according to their resource capacities. To exploit the knowledge from features, we propose IN training, training the intermediate layers in line with the features from other clients. 
Additionally, we formulate and solve a convex optimization problem to mitigate the gradient divergence problem induced by the conflicts between the IN training and the local training. The experiment results show that FedIN achieves the best performance in the heterogeneous model environment compared with the state-of-the-art algorithms. Furthermore, our ablation study demonstrates the effectiveness of IN training and the solution to the convex optimization problem.	翻訳日:2023-04-04 16:27:10 公開日:2023-04-03
# Spot-the-Camel: 安全な道路のためのコンピュータビジョン Spot-the-Camel: Computer Vision for Safer Roads ( http://arxiv.org/abs/2304.00757v1 ) ライセンス: Link先を確認	Khalid Alnujaidi, Ghada Alhabib, Abdulaziz Alodhieb	(参考訳) 人口が増加し、土地が都市化に利用されていくにつれて、私たちの道路や車によって生態系は混乱している。このインフラストラクチャーの拡大は野生生物の領域を分断し、多くの野生動物と車両の衝突(WVC)を引き起こしている。これらのWVCの事例は、グローバルな社会経済的影響を持つ世界的な問題であり、数十億ドルの財産損害と、時には自動車乗員の死亡をもたらす。サウジアラビアでもこの問題は同様であり、ラクダと車両の衝突(CVC)はラクダの体が大きいため特に致命的で、致死率は25%に達する[1]。この研究の焦点は、道路上でラクダを検出するタスクに基づいて、異なる物体検出モデルをテストすることである。実験で使用されるDeep Learning(DL)オブジェクト検出モデルは、CenterNet、EfficientDet、Faster R-CNN、SSD、YOLOv8である。実験の結果, YOLOv8の精度は最高であり, トレーニングでは最も効率的であった。将来的には、田舎道をより安全にするシステムを開発することで、この研究を拡張する計画である。 As the population grows and more land is being used for urbanization, ecosystems are disrupted by our roads and cars. This expansion of infrastructure cuts through wildlife territories, leading to many instances of Wildlife-Vehicle Collision (WVC). These instances of WVC are a global issue that is having a global socio-economic impact, resulting in billions of dollars in property damage and, at times, fatalities for vehicle occupants. In Saudi Arabia, this issue is similar, with instances of Camel-Vehicle Collision (CVC) being particularly deadly due to the large size of camels, which results in a 25% fatality rate [1]. The focus of this work is to test different object detection models on the task of detecting camels on the road. The Deep Learning (DL) object detection models used in the experiments are: CenterNet, EfficientDet, Faster R-CNN, SSD, and YOLOv8. Results of the experiments show that YOLOv8 performed the best in terms of accuracy and was the most efficient in training. In the future, the plan is to expand on this work by developing a system to make countryside roads safer.	翻訳日:2023-04-04 16:26:50 公開日:2023-04-03
# 構造情報原理による効果的で安定な役割ベース多エージェント協調 Effective and Stable Role-Based Multi-Agent Collaboration by Structural Information Principles ( http://arxiv.org/abs/2304.00755v1 ) ライセンス: Link先を確認	Xianghua Zeng, Hao Peng, Angsheng Li	(参考訳) ロールベース学習はマルチエージェント強化学習(marl)の性能を向上させるための有望なアプローチである。しかしながら、現在のロールベースのメソッドでは、事前に定義されたロール構造か、ハイパーパラメータを選択するための実践的な経験のいずれかを前提として、複雑なタスクを効果的に分解する一連のロールを安定して発見することは保証できない。本稿では、SIRDという数学的構造情報原理に基づくロールディスカバリ手法を提案し、マルチエージェント協調のためのSIRD最適化MARLフレームワーク、SR-MARLを提案する。 SIRDはロール発見を階層的なアクション空間クラスタリングに変換する。具体的には、SIRDは構造化、スパーシフィケーション、最適化モジュールで構成され、最適なエンコーディングツリーを生成して、役割を発見するための抽象化を実行する。 SIRDは特定のMARLアルゴリズムに非依存であり、様々な値関数分解アプローチと柔軟に統合される。 StarCraft IIマイクロマネジメントベンチマークの実証的な評価は、最先端のMARLアルゴリズムと比較して、SR-MARLフレームワークは平均テストの勝利率を0.17%、6.08%、3.24%改善し、容易でハードなシナリオ下では16.67%、30.80%、66.30%の偏差を減少させることを示した。 Role-based learning is a promising approach to improving the performance of Multi-Agent Reinforcement Learning (MARL). Nevertheless, without manual assistance, current role-based methods cannot guarantee stably discovering a set of roles to effectively decompose a complex task, as they assume either a predefined role structure or practical experience for selecting hyperparameters. In this article, we propose a mathematical Structural Information principles-based Role Discovery method, namely SIRD, and then present a SIRD optimizing MARL framework, namely SR-MARL, for multi-agent collaboration. The SIRD transforms role discovery into a hierarchical action space clustering. Specifically, the SIRD consists of structuralization, sparsification, and optimization modules, where an optimal encoding tree is generated to perform abstracting to discover roles. The SIRD is agnostic to specific MARL algorithms and flexibly integrated with various value function factorization approaches. 
Empirical evaluations on the StarCraft II micromanagement benchmark demonstrate that, compared with state-of-the-art MARL algorithms, the SR-MARL framework improves the average test win rate by 0.17%, 6.08%, and 3.24%, and reduces the deviation by 16.67%, 30.80%, and 66.30%, under easy, hard, and super hard scenarios.	翻訳日:2023-04-04 16:26:26 公開日:2023-04-03
# 小さくても強力: 3DポイントクラウドのセマンティックセグメンテーションをU-Nextフレームワークで強化する Small but Mighty: Enhancing 3D Point Clouds Semantic Segmentation with U-Next Framework ( http://arxiv.org/abs/2304.00749v1 ) ライセンス: Link先を確認	Ziyin Zeng and Qingyong Hu and Zhong Xie and Jian Zhou and Yongyang Xu	(参考訳) 大規模3次元点雲のセマンティックセグメンテーションの問題点を考察する。近年,局所的特徴集約,損失関数の改善,サンプリング戦略など,多くの研究が進められている。一方で、ポイントクラウドセマンティックセグメンテーションの基本的なフレームワークはほとんど見過ごされており、既存のアプローチのほとんどはデフォルトでU-Netアーキテクチャに依存している。本稿では,ポイントクラウドセマンティクスセグメンテーション用に設計された,小型だが強力なフレームワークであるU-Nextを提案する。このフレームワークの鍵は、意味的に類似した特徴写像からマルチスケール階層表現を学ぶことである。具体的には,複数のU-Net $L^1$コーデックをネストした高密度な方法で積み重ねることで,セマンティックギャップを最小限に抑えるとともに,機能マップをスケールにわたって融合させて,細部を効果的に回復する。また,よりスムーズな勾配伝搬とネットワーク最適化を実現するため,マルチレベル深層監視機構を考案した。 S3DIS、Toronto3D、SensatUrbanの3つの大規模ベンチマークで実施された大規模な実験は、提案したU-Nextアーキテクチャの優位性と有効性を示している。我々のU-Nextアーキテクチャは、さまざまなタスクやベースラインモデルにわたって一貫した目に見える性能向上を示し、将来の研究の一般的なフレームワークとして機能する可能性を示している。 We study the problem of semantic segmentation of large-scale 3D point clouds. In recent years, significant research efforts have been directed toward local feature aggregation, improved loss functions and sampling strategies. However, the fundamental framework of point cloud semantic segmentation has been largely overlooked, with most existing approaches relying on the U-Net architecture by default. In this paper, we propose U-Next, a small but mighty framework designed for point cloud semantic segmentation. The key to this framework is to learn multi-scale hierarchical representations from semantically similar feature maps. Specifically, we build our U-Next by stacking multiple U-Net $L^1$ codecs in a nested and densely arranged manner to minimize the semantic gap, while simultaneously fusing the feature maps across scales to effectively recover the fine-grained details. We also devised a multi-level deep supervision mechanism to further smooth gradient propagation and facilitate network optimization. 
Extensive experiments conducted on three large-scale benchmarks including S3DIS, Toronto3D, and SensatUrban demonstrate the superiority and the effectiveness of the proposed U-Next architecture. Our U-Next architecture shows consistent and visible performance improvements across different tasks and baseline models, indicating its great potential to serve as a general framework for future research.	翻訳日:2023-04-04 16:25:56 公開日:2023-04-03
# OTS: 歴史的文書におけるテキストスポッティングのワンショット学習手法 OTS: A One-shot Learning Approach for Text Spotting in Historical Manuscripts ( http://arxiv.org/abs/2304.00746v1 ) ライセンス: Link先を確認	Wen-Bo Hu, Hong-Jian Zhan, Cong Liu, Bing Yin, Yue Lu	(参考訳) 歴史文書処理は、限定的な注釈付きトレーニングデータや新しいクラスの出現といった課題を提起する。そこで本研究では,新しい文字を1つの注釈付きサポートサンプルで正確にかつ確実に検出する,ワンショット学習ベースのテキストスポッティング(OTS)手法を提案する。認知研究からインスピレーションを得た空間アライメントモジュールを導入し、一つの支援画像に基づいてクエリ画像の最も識別性の高い空間領域を探索し、注目し、学習する。特に,低リソーススポッティングタスクは,例えば不均衡の問題に直面することが多いため,距離計量の埋め込み空間をより識別可能な,トーラス損失と呼ばれる新しい損失関数を提案する。我々のアプローチは非常に効率的で、わずかなトレーニングサンプルしか必要とせず、新しい文字やシンボルを扱う素晴らしい能力を示しています。データセットの多様性を高めるために、古代ドンバヒエログリフィクス(dbh)を含む新しい写本データセットを作成する。我々は、利用可能なVML-HD、TKH、NCデータセット、新しいDBHデータセットについて実験を行う。実験の結果,OTSは1ショットテキストスポッティングにおいて最先端の手法よりも優れていた。提案手法は,歴史写本のテキストスポッティング分野における有望な応用を提供する。 Historical manuscript processing poses challenges like limited annotated training data and novel class emergence. To address this, we propose a novel One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters with just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on one support image. Especially, since the low-resource spotting task often faces the problem of example imbalance, we propose a novel loss function called torus loss which can make the embedding space of distance metric more discriminative. Our approach is highly efficient and requires only a few training samples while exhibiting the remarkable ability to handle novel characters, and symbols. To enhance dataset diversity, a new manuscript dataset that contains the ancient Dongba hieroglyphics (DBH) is created. We conduct experiments on publicly available VML-HD, TKH, NC datasets, and the new proposed DBH dataset. 
The experimental results demonstrate that OTS outperforms the state-of-the-art methods in one-shot text spotting. Overall, our proposed method offers promising applications in the field of text spotting in historical manuscripts.	翻訳日:2023-04-04 16:25:31 公開日:2023-04-03
# DeGPR:マルチクラス細胞検出・カウントのためのディープガイド付き事後正則化 DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting ( http://arxiv.org/abs/2304.00741v1 ) ライセンス: Link先を確認	Aayush Kumar Tyagi, Chirag Mohapatra, Prasenjit Das, Govind Makharia, Lalita Mehra, Prathosh AP, Mausam	(参考訳) マルチクラス細胞検出とカウントは多くの病理診断に必須の課題である。手動計数は面倒で、しばしば病理学者の間での観察者間差につながる。複数の汎用的深層学習に基づく物体検出と計数方法が存在するが、限られたデータ、小さな重なり合った物体の存在、複数の細胞タイプ、重篤なクラス不均衡、細胞のサイズと形状の微妙な違いなどにより、医療画像中の細胞の検出と計数に容易に移行できない可能性がある。そこで本研究では,細胞間における識別的特徴を活かし,物体検出を支援するガイド付き事後正則化(DeGPR)を提案する。これらの特徴は、病理学者によって提供されたり、視覚データから直接推測される。我々は2つの公開データセット(CoNSePとMoNuSAC)と、コントリビュートした新しいデータセットであるMuCeDでモデルを検証した。 MuCeDは、セリアック病を予測するためのヒト十二指腸の生検画像55枚からなる。 3つのデータセットで3つのオブジェクト検出ベースラインで広範な実験を行い、DeGPRがモデルに依存せず、ベースラインを一貫して改善し、最大9%(絶対値)のmAP向上が得られることを示した。 Multi-class cell detection and counting is an essential task for many pathological diagnoses. Manual counting is tedious and often leads to inter-observer variations among pathologists. While there exist multiple, general-purpose, deep learning-based object detection and counting methods, they may not readily transfer to detecting and counting cells in medical images, due to the limited data, presence of tiny overlapping objects, multiple cell types, severe class-imbalance, minute differences in size/shape of cells, etc. In response, we propose guided posterior regularization (DeGPR), which assists an object detector by guiding it to exploit discriminative features among cells. The features may be pathologist-provided or inferred directly from visual data. We validate our model on two publicly available datasets (CoNSeP and MoNuSAC), and on MuCeD, a novel dataset that we contribute. MuCeD consists of 55 biopsy images of the human duodenum for predicting celiac disease. We perform extensive experimentation with three object detection baselines on three datasets to show that DeGPR is model-agnostic, and consistently improves baselines obtaining up to 9% (absolute) mAP gains.	
翻訳日:2023-04-04 16:25:11 公開日:2023-04-03
# 言語モデルにおける知識表現の測定と操作 Measuring and Manipulating Knowledge Representations in Language Models ( http://arxiv.org/abs/2304.00740v1 ) ライセンス: Link先を確認	Evan Hernandez, Belinda Z. Li, Jacob Andreas	(参考訳) ニューラルネットワークモデル(lms)は、テキストで記述された世界の事実を表す。しばしばこれらの事実は訓練データに由来する(ほとんどのlmsではバナナという言葉の表現はバナナが果物であるという事実を象徴している)。時々、事実は入力テキスト自体に由来する("I poured the bottle"という文の表現は、ボトルが空になったという事実をエンコードしている)。 LMファクト表現の検査と修正を行うツールは、世界が変化した時に更新したり、バイアスのソースをローカライズしたり削除したり、生成されたテキストのエラーを識別したりできる。 LMにおける事実知識のクエリと修正のためのアプローチであるREMEDIについて述べる。 REMEDIは、LMの内部表現システムにおいて、テキストクエリから事実エンコーディングへのマップを学習する。これらのエンコーディングは知識エディタとして使用できる。lm隠れ表現に追加することで、下流生成を変更でき、新しい事実と一致させることができる。 REMEDIエンコーディングは、モデルプローブとしても使用することができる: LM表現と比較することで、LMが言及したエンティティにどの特性があるかを確認し、背景知識や入力テキストと矛盾する出力を生成するタイミングを予測することができる。したがって、REMEDIは、探索、プロンプト、モデル編集の研究をリンクし、LMにおける知識のきめ細かい検査と制御のための一般的なツールへのステップを提供する。 Neural language models (LMs) represent facts about the world described by text. Sometimes these facts derive from training data (in most LMs, a representation of the word banana encodes the fact that bananas are fruits). Sometimes facts derive from input text itself (a representation of the sentence "I poured out the bottle" encodes the fact that the bottle became empty). Tools for inspecting and modifying LM fact representations would be useful almost everywhere LMs are used: making it possible to update them when the world changes, to localize and remove sources of bias, and to identify errors in generated text. We describe REMEDI, an approach for querying and modifying factual knowledge in LMs. REMEDI learns a map from textual queries to fact encodings in an LM's internal representation system. These encodings can be used as knowledge editors: by adding them to LM hidden representations, we can modify downstream generation to be consistent with new facts. 
REMEDI encodings can also be used as model probes: by comparing them to LM representations, we can ascertain what properties LMs attribute to mentioned entities, and predict when they will generate outputs that conflict with background knowledge or input text. REMEDI thus links work on probing, prompting, and model editing, and offers steps toward general tools for fine-grained inspection and control of knowledge in LMs.	翻訳日:2023-04-04 16:24:49 公開日:2023-04-03
# ソース不要のドメイン適応に必要な微調整は少ない Few-shot Fine-tuning is All You Need for Source-free Domain Adaptation ( http://arxiv.org/abs/2304.00792v1 ) ライセンス: Link先を確認	Suho Lee, Seungwon Seo, Jihyo Kim, Yejin Lee, Sangheum Hwang	(参考訳) 近年、ラベル付きソースデータが常にアクセス可能であると仮定するアン教師なしドメイン適応(UDA)と比較して、ソースフリーなアン教師なしドメイン適応(SFUDA)が実用的で実現可能なアプローチとして出現している。しかし、SFUDAアプローチに関連する重要な制限はしばしば見過ごされ、現実のアプリケーションにおける実用性を制限する。これらの制限には、最適なハイパーパラメータを決定するための原則的な方法の欠如と、未ラベルのターゲットデータが、ソースデータに対するクローズドセットや同一ラベルの分布のような特定の要件を満たすことができない場合のパフォーマンス劣化が含まれる。これらの制限はすべて、SFUDAが完全にラベルのないターゲットデータに依存しているという事実に由来する。実世界のシナリオにおける既存のsfudaメソッドの限界を実証し、対象データへの分散やラベルの分散シフトを実証し、これらの方法が現実世界の設定に安全に適用できないことを検証した。実験結果から,SFUDAの限界を回避するために,ラベル付きデータ(例:1-または3-shot)で事前訓練したソースモデルを微調整することが,実用的で信頼性の高いソリューションであると主張している。一般的な信念とは対照的に、注意深い微調整モデルでは、ラベル付きデータのみをトレーニングしても過度な適合に悩まされず、サンプリングバイアスによるパフォーマンスの変化もほとんどない。様々なドメイン適応ベンチマークにおける実験結果から, マイナショットの微調整手法は標準sfuda設定で比較し, 現実的なシナリオで比較手法を上回った。私たちのコードはhttps://github.com/daintlab/fewshot-SFDAで利用可能です。 Recently, source-free unsupervised domain adaptation (SFUDA) has emerged as a more practical and feasible approach compared to unsupervised domain adaptation (UDA) which assumes that labeled source data are always accessible. However, significant limitations associated with SFUDA approaches are often overlooked, which limits their practicality in real-world applications. These limitations include a lack of principled ways to determine optimal hyperparameters and performance degradation when the unlabeled target data fail to meet certain requirements such as a closed-set and identical label distribution to the source data. All these limitations stem from the fact that SFUDA entirely relies on unlabeled target data. We empirically demonstrate the limitations of existing SFUDA methods in real-world scenarios including out-of-distribution and label distribution shifts in target data, and verify that none of these methods can be safely applied to real-world settings. 
Based on our experimental results, we claim that fine-tuning a source pretrained model with a few labeled data (e.g., 1- or 3-shot) is a practical and reliable solution to circumvent the limitations of SFUDA. Contrary to common belief, we find that carefully fine-tuned models do not suffer from overfitting even when trained with only a few labeled data, and also show little change in performance due to sampling bias. Our experimental results on various domain adaptation benchmarks demonstrate that the few-shot fine-tuning approach performs comparably under the standard SFUDA settings, and outperforms comparison methods under realistic scenarios. Our code is available at https://github.com/daintlab/fewshot-SFDA .	翻訳日:2023-04-04 16:18:49 公開日:2023-04-03
# リアルタイムWindowsによる動的車両ルーティング問題を解決するための機械学習の組合せ最適化 Combinatorial Optimization enriched Machine Learning to solve the Dynamic Vehicle Routing Problem with Time Windows ( http://arxiv.org/abs/2304.00789v1 ) ライセンス: Link先を確認	L\'eo Baty, Kai Jungel, Patrick S. Klein, Axel Parmentier, Maximilian Schiffer	(参考訳) eコマースの台頭と顧客要求の増加により、ロジスティクスサービスプロバイダは日々の計画において新たな複雑さに直面している。既存のマルチステージ確率最適化アプローチは、基礎となる動的車両ルーティングの問題を解決するには、オンライン設定のアプリケーションには計算コストがかかりすぎるか、あるいは強化学習の場合、高次元の組合せ問題にうまく対応できない。これらの欠点を緩和するために,組合せ最適化層を組み込んだ新しい機械学習パイプラインを提案する。最近,EURO Meets NeurIPS Vehicle Routing Competition at NeurIPS 2022で推進されているディスパッチ波を用いた動的車両ルーティング問題に適用した。提案手法は,提案した動的車両経路問題の解法において,他の全ての手法よりも優れていた。本研究は,提案するパイプラインの有効性とメリットを,例えば,未確認のインスタンスやシナリオに対して符号化されたポリシの堅牢性を示すことで,競争で達成された結果を超えて強調する。 With the rise of e-commerce and increasing customer requirements, logistics service providers face a new complexity in their daily planning, mainly due to efficiently handling same day deliveries. Existing multi-stage stochastic optimization approaches that allow to solve the underlying dynamic vehicle routing problem are either computationally too expensive for an application in online settings, or -- in the case of reinforcement learning -- struggle to perform well on high-dimensional combinatorial problems. To mitigate these drawbacks, we propose a novel machine learning pipeline that incorporates a combinatorial optimization layer. We apply this general pipeline to a dynamic vehicle routing problem with dispatching waves, which was recently promoted in the EURO Meets NeurIPS Vehicle Routing Competition at NeurIPS 2022. Our methodology ranked first in this competition, outperforming all other approaches in solving the proposed dynamic vehicle routing problem. 
With this work, we provide a comprehensive numerical study that further highlights the efficacy and benefits of the proposed pipeline beyond the results achieved in the competition, e.g., by showcasing the robustness of the encoded policy against unseen instances and scenarios.	翻訳日:2023-04-04 16:18:21 公開日:2023-04-03
# 3次元アノテーションを伴わないオープンボキャブラリポイントクラウド物体検出 Open-Vocabulary Point-Cloud Object Detection without 3D Annotation ( http://arxiv.org/abs/2304.00788v1 ) ライセンス: Link先を確認	Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang	(参考訳) open-vocabulary detectionの目的は、任意のテキスト記述に基づいて新しいオブジェクトを識別することである。本稿では,オープンな3次元ポイントクラウド検出を分割・コンカレンス戦略により解決する。 1)各種オブジェクトのローカライズのための汎用表現を学習可能なポイントクラウド検出器の開発 2)テキスト表現とポイントクラウド表現を接続することで,検出者がテキストプロンプトに基づいて新たなオブジェクトカテゴリを分類できる。具体的には、2dプリトレーニングされた検出器から予測された2dバウンディングボックスの監督下で、ポイントクラウド検出器がオブジェクトのローカライズを学習するリッチイメージプリトレーニングモデルを用いる。さらに,画像,点雲,テキストのモダリティを結合し,視覚言語による事前学習モデル(CLIP)の恩恵を受けるために,非偏差三重項比較学習を提案する。ポイントクラウド検出器に画像と視覚言語を事前訓練した新しいモデルを使用することで、3Dアノテーションを必要とせずにオープンな3Dオブジェクト検出が可能になる。実験により,ScanNet および SUN RGB-D データセット上での幅広いベースラインに対して,少なくとも 3.03 点と 7.47 点の改善が得られた。さらに,アプローチが機能する理由を説明するために,包括的な分析を行う。 The goal of open-vocabulary detection is to identify novel objects based on arbitrary textual descriptions. In this paper, we address open-vocabulary 3D point-cloud detection by a dividing-and-conquering strategy, which involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting. Specifically, we resort to rich image pre-trained models, by which the point-cloud detector learns localizing objects under the supervision of predicted 2D bounding boxes from 2D pre-trained detectors. Moreover, we propose a novel de-biased triplet cross-modal contrastive learning to connect the modalities of image, point-cloud and text, thereby enabling the point-cloud detector to benefit from vision-language pre-trained models,i.e.,CLIP. The novel use of image and vision-language pre-trained models for point-cloud detectors allows for open-vocabulary 3D object detection without the need for 3D annotations. 
Experiments demonstrate that the proposed method improves by at least 3.03 points and 7.47 points over a wide range of baselines on the ScanNet and SUN RGB-D datasets, respectively. Furthermore, we provide a comprehensive analysis to explain why our approach works.	翻訳日:2023-04-04 16:17:58 公開日:2023-04-03
# 画像マッティングのための分離型事前学習 Disentangled Pre-training for Image Matting ( http://arxiv.org/abs/2304.00784v1 ) ライセンス: Link先を確認	Yanda Li, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao	(参考訳) 画像マッティングは、近年の文献における深層モデルのトレーニングを支援するために、高品質なピクセルレベルの人間のアノテーションを必要とする。このようなアノテーションは費用がかかり、スケールが難しく、研究の発展を著しく妨げている。本研究では,無限個のデータを利用してマッティング性能を向上させる自己教師付き事前学習手法を提案することで,この問題への最初の試みを行う。事前学習タスクは、ランダムなトリマップとアルファマットを生成して画像分離(disentanglement)目標を達成する、画像マッティングと似た方法で設計される。次に、事前訓練されたモデルは、微調整のための下流マッティングタスクの初期化として使用される。広範な実験評価により,提案手法は最先端のマッティング法と他の自己教師付き初期化手法を大差で上回ることがわかった。また,異なるバックボーンアーキテクチャ上で提案手法の堅牢性を示す。コードとモデルは一般公開される予定だ。 Image matting requires high-quality pixel-level human annotations to support the training of a deep model in recent literature. However, such annotation is costly and hard to scale, significantly holding back the development of the research. In this work, we make the first attempt towards addressing this problem, by proposing a self-supervised pre-training approach that can leverage infinite numbers of data to boost the matting performance. The pre-training task is designed in a similar manner as image matting, where random trimap and alpha matte are generated to achieve an image disentanglement objective. The pre-trained model is then used as an initialisation of the downstream matting task for fine-tuning. Extensive experimental evaluations show that the proposed approach outperforms both the state-of-the-art matting methods and other alternative self-supervised initialisation approaches by a large margin. We also show the robustness of the proposed approach over different backbone architectures. The code and models will be publicly available.	翻訳日:2023-04-04 16:17:35 公開日:2023-04-03
# NeMF: Neural Microflake Fieldを用いた逆ボリュームレンダリング NeMF: Inverse Volume Rendering with Neural Microflake Field ( http://arxiv.org/abs/2304.00782v1 ) ライセンス: Link先を確認	Youjia Zhang, Teng Xu, Junqing Yu, Yuteng Ye, Junle Wang, Yanqing Jing, Jingyi Yu, Wei Yang	(参考訳) 未知の照明下で撮影された画像から物体の外観の物理的特性を復元することは、困難ではあるが、写真リアルなレンダリングには不可欠である。最近の手法は、新たに登場した暗黙的シーン表現を採用し、印象的な結果を示している。しかし、それらはいずれも表面ベースの表現を採用しているため、非常に複雑な形状や半透明物体などを含むシーンをうまく扱えない。本稿では、表面ベースとは対照的に、マイクロフレークボリュームでシーンを表現することで逆ボリュームレンダリングを行うことを提案する。マイクロフレークボリュームでは、空間が無限に小さいフレークで満たされ、各空間位置で光がマイクロフレーク分布に従って反射または散乱すると仮定する。我々はさらに、マイクロフレークボリュームを暗黙的にエンコードする座標ネットワークを採用し、原理的にエンド・ツー・エンドでネットワークをトレーニングするための微分可能なマイクロフレークボリュームレンダラを開発する。我々のNeMFは、高度に複雑な幾何学や散乱物体の外観特性を効果的に回復し、高品質なリライティング、素材編集を可能にし、特に表面ベースアプローチでは不可能な散乱などのボリュームレンダリング効果をシミュレートする。 Recovering the physical attributes of an object's appearance from its images captured under an unknown illumination is challenging yet essential for photo-realistic rendering. Recent approaches adopt the emerging implicit scene representations and have shown impressive results. However, they unanimously adopt a surface-based representation, and hence cannot adequately handle scenes with very complex geometry, translucent objects, etc. In this paper, we propose to conduct inverse volume rendering, in contrast to surface-based, by representing a scene using microflake volume, which assumes the space is filled with infinitely small flakes and light reflects or scatters at each spatial location according to microflake distributions. 
We further adopt the coordinate networks to implicitly encode the microflake volume, and develop a differentiable microflake volume renderer to train the network in an end-to-end way in principle. Our NeMF enables effective recovery of appearance attributes for highly complex geometry and scattering objects, enables high-quality relighting, material editing, and especially simulates volume rendering effects, such as scattering, which is infeasible for surface-based approaches.	翻訳日:2023-04-04 16:17:22 公開日:2023-04-03
# デンス予測のための確率的プロンプト学習 Probabilistic Prompt Learning for Dense Prediction ( http://arxiv.org/abs/2304.00779v1 ) ライセンス: Link先を確認	Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn	(参考訳) 決定論的プロンプト学習の最近の進歩は、様々な下流視覚タスクの有望な代替手段となり、事前学習された視覚言語モデルの助けを借りて、モデルが強力な視覚表現を学習できるようになる。しかしながら、このアプローチは、単一の決定論的記述が画像全体を十分に表現できないため、より複雑で多様なオブジェクトを扱う必要のある密な予測タスクのパフォーマンスを制限している。本稿では,密な予測タスクにおいて視覚言語知識を十分に活用するための新しい確率的プロンプト学習を提案する。まず,オブジェクトクラス全体の共通属性を記述するために,学習可能なクラス非依存属性プロンプトを導入する。属性は、クラス固有のテキスト分布を定義するために、クラス情報と視覚コンテキスト知識とを組み合わせる。テキスト表現をサンプリングし、確率的画素テキストマッチング損失を用いて密な予測タスクを導くことで、提案手法の安定性と一般化能力を高める。様々な密な予測タスクとアブレーション研究の広範な実験により,提案手法の有効性が示された。 Recent progress in deterministic prompt learning has become a promising alternative to various downstream vision tasks, enabling models to learn powerful visual representations with the help of pre-trained vision-language models. However, this approach results in limited performance for dense prediction tasks that require handling more complex and diverse objects, since a single and deterministic description cannot sufficiently represent the entire image. In this paper, we present a novel probabilistic prompt learning to fully exploit the vision-language knowledge in dense prediction tasks. First, we introduce learnable class-agnostic attribute prompts to describe universal attributes across the object class. The attributes are combined with class information and visual-context knowledge to define the class-specific textual distribution. Text representations are sampled and used to guide the dense prediction task using the probabilistic pixel-text matching loss, enhancing the stability and generalization capability of the proposed method. Extensive experiments on different dense prediction tasks and ablation studies demonstrate the effectiveness of our proposed method.	翻訳日:2023-04-04 16:17:03 公開日:2023-04-03
# 思考連鎖予測制御 Chain-of-Thought Predictive Control ( http://arxiv.org/abs/2304.00776v1 ) ライセンス: Link先を確認	Zhiwei Jia, Fangchen Liu, Vineet Thumuluri, Linghao Chen, Zhiao Huang, Hao Su	(参考訳) 複雑な低レベル制御タスク(コンタクトリッチオブジェクト操作など)の実証から、一般化可能なポリシー学習を研究する。本稿では,時間的抽象概念と階層的RL(HRL)の計画能力を,新規かつ効果的な方法で組み込んだ模倣学習手法を提案する。意思決定基盤モデルへのステップとして、当社の設計はスケーラブルで、高度に最適化されたデモを活用できます。具体的には、デモの短い部分列、すなわち CoT は、タスクのサブゴールの完了を示すことでそれらの階層構造を反映する。本モデルでは,CoT全体を協調的かつ構造化された長期アクションガイダンスとして動的に予測し,典型的な2段階のサブゴール条件のポリシーを一貫して上回っている。一方、このようなCoTは、デモ間で共有される決定パターン(重騒音やランダム性のあるものでさえ)を実証するため、一般化可能な政策学習を促進する。提案手法であるChain-of-Thought Predictive Control (CoTPC) は,スケーラブルかつ高度に最適化されたデモから,低レベルの操作タスクに挑戦する上で,既存のものよりも優れています。 We study generalizable policy learning from demonstrations for complex low-level control tasks (e.g., contact-rich object manipulations). We propose an imitation learning method that incorporates the idea of temporal abstraction and the planning capabilities from Hierarchical RL (HRL) in a novel and effective manner. As a step towards decision foundation models, our design can utilize scalable, albeit highly sub-optimal, demonstrations. Specifically, we find certain short subsequences of the demos, i.e. the chain-of-thought (CoT), reflect their hierarchical structures by marking the completion of subgoals in the tasks. Our model learns to dynamically predict the entire CoT as coherent and structured long-term action guidance and consistently outperforms typical two-stage subgoal-conditioned policies. On the other hand, such CoT facilitates generalizable policy learning as they exemplify the decision patterns shared among demos (even those with heavy noises and randomness). Our method, Chain-of-Thought Predictive Control (CoTPC), significantly outperforms existing ones on challenging low-level manipulation tasks from scalable yet highly sub-optimal demos.	翻訳日:2023-04-04 16:16:46 公開日:2023-04-03
# MRIスキャンによるMGMTプロモーターメチル化状態の予測深層学習モデルの広範な実験評価 MGMT promoter methylation status prediction using MRI scans? An extensive experimental evaluation of deep learning models ( http://arxiv.org/abs/2304.00774v1 ) ライセンス: Link先を確認	Numan Saeed, Muhammad Ridzuan, Hussain Alasmawi, Ikboljon Sobirov, Mohammad Yaqub	(参考訳) 医学的診断のための深層学習の研究は増えており、これらのシステムは臨床医を上回っているとしばしば主張されている。しかし、医療効果を示すシステムはごくわずかである。この観点から,高齢者の致死性脳腫瘍であるグリオブラスト腫(glioblastoma)に対する幅広い深層学習アルゴリズムについて検討した。手術、化学療法、放射線療法はグリオブラスト腫の標準的な治療である。腫瘍に特異的な遺伝子配列であるmgmtプロモーターのメチル化状態は化学療法の効果に影響する。 MGMTプロモーターメチル化は、いくつかのがんにおける化学療法反応と生存を改善する。 MGMTプロモーターメチル化は腫瘍組織生検によって決定され、遺伝子検査される。この長期かつ侵襲的な処置は、感染やその他の合併症のリスクを高める。そこで、研究者は深層学習モデルを用いて、脳MRIスキャンから腫瘍を調べ、MGMTプロモーターのメチル化状態を決定する。 MRIスキャンを用いてMGMTプロモーターのメチル化状態を予測するため,深層学習モデルと585人の参加者の公開MRIデータセットの1つを用いた。我々はこれらのモデルをGrad-CAM、オクルージョン感度、特徴可視化、学習損失景観を用いてテストする。以上の結果から, 癌診断における深層学習システムの精度と信頼性を確保するために, 外部コホートデータを用いてこれらのモデルの性能を検証すべきであることが示唆された。 The number of studies on deep learning for medical diagnosis is expanding, and these systems are often claimed to outperform clinicians. However, only a few systems have shown medical efficacy. From this perspective, we examine a wide range of deep learning algorithms for the assessment of glioblastoma - a common brain tumor in older adults that is lethal. Surgery, chemotherapy, and radiation are the standard treatments for glioblastoma patients. The methylation status of the MGMT promoter, a specific genetic sequence found in the tumor, affects chemotherapy's effectiveness. MGMT promoter methylation improves chemotherapy response and survival in several cancers. MGMT promoter methylation is determined by a tumor tissue biopsy, which is then genetically tested. This lengthy and invasive procedure increases the risk of infection and other complications. Thus, researchers have used deep learning models to examine the tumor from brain MRI scans to determine the MGMT promoter's methylation state. 
We employ deep learning models and one of the largest public MRI datasets of 585 participants to predict the methylation status of the MGMT promoter in glioblastoma tumors using MRI scans. We test these models using Grad-CAM, occlusion sensitivity, feature visualizations, and training loss landscapes. Our results show no correlation between these two, indicating that external cohort data should be used to verify these models' performance to assure the accuracy and reliability of deep learning systems in cancer diagnosis.	翻訳日:2023-04-04 16:16:25 公開日:2023-04-03
# オンライン確率ニュートン法による幾何中央値の推定と応用 Online stochastic Newton methods for estimating the geometric median and applications ( http://arxiv.org/abs/2304.00770v1 ) ライセンス: Link先を確認	Antoine Godichon-Baggioni (LPSM (UMR\_8001)), Wei Lu (LMI)	(参考訳) 大規模なサンプルの場合、少数の個体が平均のような基本的な統計指標を損なうことがある。非定型的個人を自動的に検出することは困難であり、別の戦略は堅牢なアプローチである。本稿では,中心傾向のロバスト指標である確率変数の幾何学的中央値の推定に着目する。逐次到着するデータの大量のサンプルを扱うために,幾何中央値の推定を行うオンライン確率ニュートンアルゴリズムを導入し,その収束率を示す。中央値とヘッセン行列の推定値を再帰的に更新できるので、任意の指定された方向における中央値の信頼区間を決定し、オンライン統計検査を行う。 In the context of large samples, a small number of individuals might spoil basic statistical indicators like the mean. It is difficult to detect automatically these atypical individuals, and an alternative strategy is using robust approaches. This paper focuses on estimating the geometric median of a random variable, which is a robust indicator of central tendency. In order to deal with large samples of data arriving sequentially, online stochastic Newton algorithms for estimating the geometric median are introduced and we give their rates of convergence. Since estimates of the median and those of the Hessian matrix can be recursively updated, we also determine confidence intervals of the median in any designated direction and perform online statistical tests.	翻訳日:2023-04-04 16:16:05 公開日:2023-04-03
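To make concrete why the geometric median is called a robust indicator in the abstract above, here is a minimal Python sketch. It is an illustration only: it uses the classical batch Weiszfeld iteration, not the online stochastic Newton algorithm the paper introduces, and the function name `geometric_median`, its parameters, and the sample points are all assumptions made up for this example.

```python
import math


def geometric_median(points, iterations=200, eps=1e-12):
    """Approximate the geometric median with the batch Weiszfeld iteration.

    Illustrative stand-in only; this is NOT the paper's online Newton method.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    estimate = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iterations):
        weighted_sum = [0.0] * dim
        weight_total = 0.0
        for p in points:
            # Each point is weighted by the inverse of its distance to the
            # current estimate; eps guards against division by zero.
            weight = 1.0 / max(math.dist(p, estimate), eps)
            weight_total += weight
            for k in range(dim):
                weighted_sum[k] += weight * p[k]
        estimate = [s / weight_total for s in weighted_sum]
    return estimate


# Hypothetical data: four corners of the unit square plus one far outlier.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean = [sum(p[k] for p in sample) / len(sample) for k in range(2)]
median = geometric_median(sample)
```

In this sample the single outlier drags the mean to (20.4, 20.4), while the geometric median stays inside the unit square. That is the kind of robustness to atypical individuals the abstract refers to.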
# トポロジー行動による電力グリッドの管理--高度なルールベースと強化学習エージェントの比較研究 Managing power grids through topology actions: A comparative study between advanced rule-based and reinforcement learning agents ( http://arxiv.org/abs/2304.00765v1 ) ライセンス: Link先を確認	Malte Lehna and Jan Viebahn and Christoph Scholz and Antoine Marot and Sven Tomforde	(参考訳) 電力網の運用は、現在の変革と再生可能エネルギー生産の増加により、ますます複雑になっている。その結果、アクティブグリッド管理は従来のアプローチで限界に達している。Learning to Run a Power Network チャレンジの文脈において、強化学習(RL)は効率良く信頼性の高いアプローチであり、自動グリッド操作の可能性がかなり高いことが示されている。本稿では、Binbinchenから提出されたエージェントを分析し、RLとルールベースのアプローチの両方において、エージェントを改善するための新しい戦略を提供する。主な改善点はN-1戦略であり、1行が切断されてもグリッドを安定に保つトポロジー作用を考える。さらに、元のグリッドへのトポロジーの回帰も提案するが、これは有益であることが証明された。改善は、チャレンジテストセットの参照アプローチに対してテストされ、ルールベースのエージェントのパフォーマンスを27%向上することができる。ルールベースとRLエージェントを直接比較すると、同様の性能が得られる。しかし、RLエージェントには明確な計算上の利点がある。また、サンプルケースの振る舞いをより詳細に分析して、さらなる洞察を与えます。ここでは,N-1戦略を通じて,エージェントの行動がより多様化するのを観察した。 The operation of electricity grids has become increasingly complex due to the current upheaval and the increase in renewable energy production. As a consequence, active grid management is reaching its limits with conventional approaches. In the context of the Learning to Run a Power Network challenge, it has been shown that Reinforcement Learning (RL) is an efficient and reliable approach with considerable potential for automatic grid operation. In this article, we analyse the submitted agent from Binbinchen and provide novel strategies to improve the agent, both for the RL and the rule-based approach. The main improvement is an N-1 strategy, where we consider topology actions that keep the grid stable, even if one line is disconnected. Moreover, we propose a topology reversion to the original grid, which proved to be beneficial. The improvements are tested against reference approaches on the challenge test sets and are able to increase the performance of the rule-based agent by 27%. In direct comparison between rule-based and RL agent we find similar performance. 
However, the RL agent has a clear computational advantage. We also analyse the behaviour in an exemplary case in more detail to provide additional insights. Here, we observe that through the N-1 strategy, the actions of the agents become more diversified.	翻訳日:2023-04-04 16:15:54 公開日:2023-04-03
# 文書レベル関係抽出のための識別性とロバスト性の統合に向けて Towards Integration of Discriminability and Robustness for Document-Level Relation Extraction ( http://arxiv.org/abs/2304.00824v1 ) ライセンス: Link先を確認	Jia Guo, Stanley Kok, Lidong Bing	(参考訳) ドキュメントレベル関係抽出(docre)は、ドキュメントの長距離コンテキスト依存推論に依存するエンティティペアの関係を予測します。典型的なマルチラベル分類問題として、docreは、少数のポジティブな関係と多くのネガティブな関係を効果的に区別するという課題に直面している。この課題は、データセットにかなりの数のアノテーションエラーがある場合、さらに克服が困難になる。本研究では,DocRE問題に対する差別性と堅牢性の両方をよりよく統合することを目指している。具体的には,まず,確率的出力と内部表現の両方に対して高い識別性を与える効果的な損失関数を設計する。我々は,エントロピー最小化と教師付きコントラスト学習を革新的にカスタマイズした。ラベル誤りの影響を改善するため,本手法はモデルのロバスト性を高めるために,新しい負のラベルサンプリング戦略を導入した。さらに,アノテーションエラーを伴うより現実的なシナリオを模倣する2つの新しいデータレジームを導入し,サンプリング戦略を評価する。実験により,各コンポーネントの有効性を検証し,提案手法がDocREDデータセット,最近クリーン化したRe-DocRED,提案したデータレシスタンスにおいて,新たな最先端結果を実現することを示す。 Document-level relation extraction (DocRE) predicts relations for entity pairs that rely on long-range context-dependent reasoning in a document. As a typical multi-label classification problem, DocRE faces the challenge of effectively distinguishing a small set of positive relations from the majority of negative ones. This challenge becomes even more difficult to overcome when there exists a significant number of annotation errors in the dataset. In this work, we aim to achieve better integration of both the discriminability and robustness for the DocRE problem. Specifically, we first design an effective loss function to endow high discriminability to both probabilistic outputs and internal representations. We innovatively customize entropy minimization and supervised contrastive learning for the challenging multi-label and long-tailed learning problems. To ameliorate the impact of label errors, we equipped our method with a novel negative label sampling strategy to strengthen the model robustness. In addition, we introduce two new data regimes to mimic more realistic scenarios with annotation errors and evaluate our sampling strategy. 
Experimental results verify the effectiveness of each component and show that our method achieves new state-of-the-art results on the DocRED dataset, its recently cleaned version, Re-DocRED, and the proposed data regimes.	翻訳日:2023-04-04 16:08:36 公開日:2023-04-03
# 水中気泡の非線形振動による音楽の創造性 Musical creativity enabled by nonlinear oscillations of a bubble in water ( http://arxiv.org/abs/2304.00822v1 ) ライセンス: Link先を確認	Ivan S. Maksymov	(参考訳) オリジナルとアレンジされた既存の音楽成果は、習得に何年もの学習と実践を要する芸術である。しかし、aiによる音楽創造の分野における絶え間ない進歩にもかかわらず、質の高い音楽結果の生産は、まだ人間の前兆である。ここでは,水中の1つの気泡が,古典音楽の一片を符号化する音響圧力信号の下で非線形に振動するときに,創造的な音楽結果を生み出すために使用できることを実証する。バブルの応答の音声信号は、オリジナルの作曲のエレキギターバージョンに似ている。我々は,このバブルの性質が,音楽の配置と構成において人間の創造性をシミュレートできる物理に着想を得たAIシステムの構築に有効である,という理論的支持論を提案し,提案する。 Producing original and arranging existing musical outcomes is an art that takes years of learning and practice to master. Yet, despite the constant advances in the field of AI-powered musical creativity, production of quality musical outcomes remains a prerogative of the humans. Here we demonstrate that a single bubble in water can be used to produce creative musical outcomes, when it nonlinearly oscillates under an acoustic pressure signal that encodes a piece of classical music. The audio signal of the response of the bubble resembles an electric guitar version of the original composition. We suggest, and provide plausible theoretical supporting arguments, that this property of the bubble can be used to create physics-inspired AI systems capable of simulating human creativity in arrangement and composition of music.	翻訳日:2023-04-04 16:08:15 公開日:2023-04-03
# 適応的メッシュリファインメントのためのSwarm強化学習 Swarm Reinforcement Learning For Adaptive Mesh Refinement ( http://arxiv.org/abs/2304.00818v1 ) ライセンス: Link先を確認	Niklas Freymuth, Philipp Dahlinger, Tobias W\"urth, Luise K\"arger, Gerhard Neumann	(参考訳) アダプティブメッシュリファインメント(AMR)は、メッシュの解像度を動的に調整し、計算コストとシミュレーション精度のトレードオフを可能にするため、メッシュベースのシミュレーションには不可欠である。しかし、既存のAMRの方法はタスク依存のヒューリスティックス、高価なエラー推定器を使うか、より大きなメッシュやより複雑な問題にうまくスケールしない。本稿では、AMRをSwarm Reinforcement Learning問題として定式化し、メッシュの各要素を単純で均一なエージェントの協調システムの一部として見る。この問題の定式化とエージェントワイド報酬関数とグラフニューラルネットワークを組み合わせることで、任意の方程式系の信頼性とスケーラブルな洗練戦略を学習することができる。複雑なシミュレーションの精度と効率を改善するためのアプローチの有効性を実験的に実証した。その結果,学習ベースラインを上回って,推論中にエラーインジケータを必要とせず,従来のエラーベースのamrリファインメント戦略と同等のリファインメント品質を達成できた。 Adaptive Mesh Refinement (AMR) is crucial for mesh-based simulations, as it allows for dynamically adjusting the resolution of a mesh to trade off computational cost with the simulation accuracy. Yet, existing methods for AMR either use task-dependent heuristics, expensive error estimators, or do not scale well to larger meshes or more complex problems. In this paper, we formalize AMR as a Swarm Reinforcement Learning problem, viewing each element of a mesh as part of a collaborative system of simple and homogeneous agents. We combine this problem formulation with a novel agent-wise reward function and Graph Neural Networks, allowing us to learn reliable and scalable refinement strategies on arbitrary systems of equations. We experimentally demonstrate the effectiveness of our approach in improving the accuracy and efficiency of complex simulations. Our results show that we outperform learned baselines and achieve a refinement quality that is on par with a traditional error-based AMR refinement strategy without requiring error indicators during inference.	翻訳日:2023-04-04 16:08:00 公開日:2023-04-03
# 暗黙の談話関係をクラウドソーシングするための設計選択--タスク設計によるバイアスの顕在化 Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design ( http://arxiv.org/abs/2304.00815v1 ) ライセンス: Link先を確認	Valentina Pyatkin, Frances Yung, Merel C.J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg	(参考訳) 自然言語アノテーションの識別は、アノテーションやアノテーションフレームワークによって導入されたバイアスの観点から研究されている。そこで,本研究では,自然言語を用いて名詞の解釈を導出するクラウドソース言語アノテーションに対して,特に強い影響を与えるタスク設計バイアス(task design bias)を提案する。この目的のために,関係の曖昧さから繰り返し難易度が示された暗黙の談話関係アノテーションについて考察する。 2つの異なるアノテーションタスクを用いて得られた1200の談話関係のアノテーションを比較し、4つの異なるドメインにわたって両方のメソッドのバイアスを定量化する。どちらのメソッドもクラウドソーシング用に設計された自然言語アノテーションタスクである。タスク設計は、特定の関係に注釈者を押し付けることができ、いくつかの談話関係感覚は、一方または他方のアノテーションアプローチによりよりよく導かれることが示される。また、トレーニングやテストモデルでは、このようなバイアスを考慮するべきだと結論付けています。 Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relations senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.	翻訳日:2023-04-04 16:07:41 公開日:2023-04-03
# ディープニューラルネットワークのモデル非依存的到達性解析 Model-Agnostic Reachability Analysis on Deep Neural Networks ( http://arxiv.org/abs/2304.00813v1 ) ライセンス: Link先を確認	Chi Zhang, Wenjie Ruan, Fu Wang, Peipei Xu, Geyong Min, Xiaowei Huang	(参考訳) 検証は安全クリティカルシステムの形式解析において重要な役割を果たす。現在の検証手法の多くは、ディープニューラルネットワーク(DNN)に取り組む際に、特定の要件を持っている。それらは、例えばフィードフォワードニューラルネットワーク(FNN)のような特定のネットワークカテゴリや、特定のアクティベーション機能を持つネットワーク、例えばReLUをターゲットにしている。本稿では、DeepAgnと呼ばれるモデルに依存しない検証フレームワークを開発し、FNN、リカレントニューラルネットワーク(RNN)、あるいは両者の混合に適用可能であることを示す。リプシッツ連続性の仮定の下で、DeepAgnは、グローバル収束を保証する新しい最適化スキームに基づいて、DNNの到達可能性を分析する。レイヤやパラメータといったネットワークの内部構造にアクセスする必要はない。到達可能性解析により、DeepAgnは与えられた入力に対する最大安全半径を計算し、グラウンドトゥルースの敵対的サンプルを生成するなど、よく知られた堅牢性問題に取り組むことができる。我々はまた、最先端の検証アプローチよりも、非常に深い層と数百万のニューロンを持つFNNとRNNを含む、より広いクラスのディープニューラルネットワークを扱うDeepAgnの優れた能力と効率を実証的に示す。 Verification plays an essential role in the formal analysis of safety-critical systems. Most current verification methods have specific requirements when working on Deep Neural Networks (DNNs). They either target one particular network category, e.g., Feedforward Neural Networks (FNNs), or networks with specific activation functions, e.g., ReLU. In this paper, we develop a model-agnostic verification framework, called DeepAgn, and show that it can be applied to FNNs, Recurrent Neural Networks (RNNs), or a mixture of both. Under the assumption of Lipschitz continuity, DeepAgn analyses the reachability of DNNs based on a novel optimisation scheme with a global convergence guarantee. It does not require access to the network's internal structures, such as layers and parameters. Through reachability analysis, DeepAgn can tackle several well-known robustness problems, including computing the maximum safe radius for a given input, and generating the ground-truth adversarial examples. We also empirically demonstrate DeepAgn's superior capability and efficiency in handling a broader class of deep neural networks, including both FNNs, and RNNs with very deep layers and millions of neurons, than other state-of-the-art verification approaches.	翻訳日:2023-04-04 16:07:24 公開日:2023-04-03
# メソスコピックキャビティ-QEDシステムにおける深強結合光・物質相互作用の非摂動効果 Non-perturbative effects of deep-strong light-matter interaction in a mesoscopic cavity-QED system ( http://arxiv.org/abs/2304.00805v1 ) ライセンス: Link先を確認	Andrey Kudlis, Denis Novokreschenov, Ivan Iorsh, Ilya Tokatly	(参考訳) 量子ダイマーの2つの群を共通の電磁空洞に配置し、その群のいずれかに静的外部電位を選択的に印加することにより制御するシステムを考える。真空電磁ゆらぎへの深強結合領域において、二量体間の創発的な光子アシスト相互作用は、第2群に適用されるポテンシャルに対する、バイアスを印加していない第1の二量体群の強い非線形量子化クロスポーラライゼーション応答をもたらすことを示す。全体分極は、数と位置が群内の二量体の数のパリティに依存するような、ほぼ理想的なステップの連続を示す。この非摂動効果は、有限個のダイマーからなるメソスコピック系の特徴的な特徴であり、一般化されたディッケモデルの記述によく用いられる熱力学的極限で消失する。 We consider a system comprising two groups of quantum dimers placed in a common electromagnetic cavity, and controlled by selectively applying a static external potential to one of the groups. We show that in the regime of deep strong coupling to vacuum electromagnetic fluctuations, the emergent photon-assisted interaction between the dimers leads to a strongly non-linear quantized cross-polarization response of the first, unbiased group of dimers to the potential applied to the second group. The total polarization shows a series of almost ideal steps whose number and position depend on the parity of the numbers of dimers in the groups. This non-perturbative effect is a distinctive feature of mesoscopic systems comprising a finite number of dimers and disappears in the thermodynamic limit which is commonly used in the description of the generalized Dicke models.	翻訳日:2023-04-04 16:07:04 公開日:2023-04-03
# 強化学習入門 A Tutorial Introduction to Reinforcement Learning ( http://arxiv.org/abs/2304.00803v1 ) ライセンス: Link先を確認	Mathukumalli Vidyasagar	(参考訳) 本稿では,Stochastic Approximation(SA)を統一テーマとして,強化学習(RL)に関する簡単な調査を行う。論文の範囲はMarkov Reward Processes、Markov Decision Processes、Stochastic Approximation Algorithm、時間差分学習や$Q$-learningといった広く使われているアルゴリズムを含む。 In this paper, we present a brief survey of Reinforcement Learning (RL), with particular emphasis on Stochastic Approximation (SA) as a unifying theme. The scope of the paper includes Markov Reward Processes, Markov Decision Processes, Stochastic Approximation algorithms, and widely used algorithms such as Temporal Difference Learning and $Q$-learning.	翻訳日:2023-04-04 16:06:45 公開日:2023-04-03
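As a rough illustration of the algorithms this survey names, the tabular $Q$-learning update can be read as a stochastic-approximation step toward a temporal-difference target. The sketch below is not taken from the paper: the table layout, the function name `q_learning_update`, and the step size and discount values are all assumptions chosen for illustration.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated value.

    q is a list of lists: q[s][a] is the current estimate for state s, action a.
    alpha (step size) and gamma (discount) are illustrative defaults.
    """
    # TD target: observed reward plus discounted value of the greedy next action.
    td_target = reward + gamma * max(q[next_state])
    # Stochastic-approximation step: move the estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    # Hypothetical table with 2 states and 2 actions, initialised to zero.
    q_table = [[0.0, 0.0], [0.0, 0.0]]
    print(q_learning_update(q_table, state=0, action=1, reward=1.0, next_state=1, alpha=0.5))
```

Replacing `max(q[next_state])` with the value of the action actually taken next would give an on-policy temporal-difference variant instead.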
# Soft-Diceによるノイズ画像セグメンテーション Noisy Image Segmentation With Soft-Dice ( http://arxiv.org/abs/2304.00801v1 ) ライセンス: Link先を確認	Marcus Nordström, Henrik Hult, Atsuto Maki, Fredrik Löfman	(参考訳) 本稿では,対象ラベルにノイズが存在する状況において,医用画像セグメンテーションにおいて最も一般的な損失関数の一つであるソフトダイス損失について検討する。特に最適解の集合が特徴づけられ、これらの解の体積バイアスの鋭い境界が提供される。さらに, 最適なソフトダイスに収束するソフトセグメンテーションの列は, しきい値処理を用いてハードセグメンテーションに変換した場合, 最適なDiceにも収束することを示した。これは、ソフトダイスがDice指標を最大化するための代理としてしばしば使用されるため、重要な結果である。最後に、理論結果を確認する実験を行う。 This paper presents a study on the soft-Dice loss, one of the most popular loss functions in medical image segmentation, for situations where noise is present in target labels. In particular, the set of optimal solutions is characterized and sharp bounds on the volume bias of these solutions are provided. It is further shown that a sequence of soft segmentations converging to optimal soft-Dice also converges to optimal Dice when converted to hard segmentations using thresholding. This is an important result because soft-Dice is often used as a proxy for maximizing the Dice metric. Finally, experiments confirming the theoretical results are provided.	翻訳日:2023-04-04 16:06:37 公開日:2023-04-03
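To make the soft-to-hard conversion mentioned in this abstract concrete, here is a minimal sketch of one common form of soft-Dice and of thresholding. The exact definition used in the paper is not given in the abstract, so the formula, the 0.5 threshold, and the empty-mask convention below are assumptions.

```python
def soft_dice(pred, target):
    """Soft-Dice between a soft segmentation in [0, 1] and a label map of equal length.

    Uses the form 2 * sum(p * t) / (sum(p) + sum(t)); other variants exist.
    """
    overlap = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty segmentations count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def threshold(pred, level=0.5):
    """Convert a soft segmentation to a hard one by thresholding at `level`."""
    return [1.0 if p >= level else 0.0 for p in pred]


if __name__ == "__main__":
    soft = [0.9, 0.6, 0.2, 0.1]   # hypothetical per-pixel foreground probabilities
    labels = [1, 1, 0, 0]         # hypothetical target labels
    print(soft_dice(soft, labels))             # soft-Dice of the soft prediction
    print(soft_dice(threshold(soft), labels))  # ordinary Dice after thresholding
```

On binary inputs the same function reduces to the ordinary Dice score, which is why it can serve as a differentiable stand-in during training.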
# マイクロ波量子ダイオード Microwave quantum diode ( http://arxiv.org/abs/2304.00799v1 ) ライセンス: Link先を確認	Rishabh Upadhyay, Dmitry S. Golubev, Yu-Cheng Chang, George Thomas, Andrew Guthrie, Joonas T. Peltonen, and Jukka P. Pekola	(参考訳) 量子回路の脆弱な性質は、スケーラブルな量子アプリケーションにとって大きなボトルネックである。低温で動作する量子回路は、増幅バックアクションや外部ノイズに対して非常に脆弱である。この目的のために循環器やアイソレータなどの非逆マイクロ波デバイスが使用される。これらのデバイスは、量子回路のスケーラビリティを制限している。超伝導フラックス量子ビットの非線形性を利用した小型マイクロ波ダイオードアーキテクチャを提案する。 qubit縮退点において, 逆方向に伝達される電力レベルに有意な差があることを実験的に示す。観測結果は提案された理論モデルと一致している。入力電力は-99dBmで、また、クビット共振器近傍で交差領域を回避し、50MHzの広帯域帯域では6.81 GHzから6.86 GHz、250MHzでは6.67 GHzから6.91 GHzの伝送補正比が90%を超えることを報告した。提示されたアーキテクチャはコンパクトで、複数の読み出しチャネルに対して容易にスケーラブルであり、量子情報、マイクロ波読み出し、光メカニクスの多様な機会を開く可能性がある。 The fragile nature of quantum circuits is a major bottleneck to scalable quantum applications. Operating at cryogenic temperatures, quantum circuits are highly vulnerable to amplifier backaction and external noise. Non-reciprocal microwave devices such as circulators and isolators are used for this purpose. These devices have a considerable footprint in cryostats, limiting the scalability of quantum circuits. We present a compact microwave diode architecture, which exploits the non-linearity of a superconducting flux qubit. At the qubit degeneracy point we experimentally demonstrate a significant difference between the power levels transmitted in opposite directions. The observations align with the proposed theoretical model. At -99 dBm input power, and near the qubit-resonator avoided crossing region, we report the transmission rectification ratio exceeding 90% for a 50 MHz wide frequency range from 6.81 GHz to 6.86 GHz, and over 60% for the 250 MHz range from 6.67 GHz to 6.91 GHz. The presented architecture is compact, and easily scalable towards multiple readout channels, potentially opening up diverse opportunities in quantum information, microwave read-out and optomechanics.	翻訳日:2023-04-04 16:06:28 公開日:2023-04-03
# FinnWoodlands データセット FinnWoodlands Dataset ( http://arxiv.org/abs/2304.00793v1 ) ライセンス: Link先を確認	Juan Lagos, Urho Lempiö and Esa Rahtu	(参考訳) 大規模で多様なデータセットが利用可能になったことは、自動運転や屋内アプリケーションにおいて大きなブレークスルーをもたらしたが、林業アプリケーションはまだ遅れており、新しい森林データセットは、森林のようなシナリオのためのデータ駆動手法の開発において大きな進歩をもたらすだろう。本稿では, RGBステレオ画像, 点群, スパース深度マップ, およびセマンティック, インスタンス, パノプティックセグメンテーションのための手動アノテーションによるグラウンドトゥルースからなる森林データセット「FinnWoodlands」を紹介する。 FinnWoodlands は、4226のオブジェクトを手動で注釈付けし、そのうち2562のオブジェクト (60.6%) は、"Spruce Tree"、"Birch Tree"、"Pine Tree"の3つの異なるインスタンスカテゴリに分類される木の幹に対応している。木の幹の他に、インスタンスとして"Obstacles"オブジェクトや、"Lake"、"Ground"、"Track"といったセマンティックなクラスも注釈付けしました。私たちのデータセットは、環境の全体的表現が関連する森林アプリケーションで使用できます。インスタンスセグメンテーション、パノプティックセグメンテーション、深度補完の3つのモデルを用いた初期ベンチマークを行い、そのような非構造化シナリオがもたらす課題を説明する。 While the availability of large and diverse datasets has contributed to significant breakthroughs in autonomous driving and indoor applications, forestry applications are still lagging behind and new forest datasets would most certainly contribute to achieving significant progress in the development of data-driven methods for forest-like scenarios. This paper introduces a forest dataset called FinnWoodlands, which consists of RGB stereo images, point clouds, and sparse depth maps, as well as ground truth manual annotations for semantic, instance, and panoptic segmentation. FinnWoodlands comprises a total of 4226 objects manually annotated, out of which 2562 objects (60.6%) correspond to tree trunks classified into three different instance categories, namely "Spruce Tree", "Birch Tree", and "Pine Tree". Besides tree trunks, we also annotated "Obstacles" objects as instances as well as the semantic stuff classes "Lake", "Ground", and "Track". Our dataset can be used in forestry applications where a holistic representation of the environment is relevant. We provide an initial benchmark using three models for instance segmentation, panoptic segmentation, and depth completion, and illustrate the challenges that such unstructured scenarios introduce.	翻訳日:2023-04-04 16:06:10 公開日:2023-04-03
# GreekBART:最初の事前訓練されたギリシャのシークエンス・ツー・シークエンスモデル GreekBART: The First Pretrained Greek Sequence-to-Sequence Model ( http://arxiv.org/abs/2304.00869v1 ) ライセンス: Link先を確認	Iakovos Evdaimon, Hadi Abdine, Christos Xypolopoulos, Stamatis Outsios, Michalis Vazirgiannis, Giorgos Stamou	(参考訳) 転校学習の時代は、コンピュータビジョンと自然言語処理の分野に革命をもたらし、様々なタスクにまたがる優れた事前学習モデルをもたらした。具体的には、自然言語処理タスクはトランスフォーマーベースの言語モデルによって支配されている。自然言語推論および自然言語生成タスクでは、BERTモデルとその変種は、GPTモデルとその後継と同様に、模範的な性能を示した。しかし、これらのモデルのほとんどは事前訓練され、主に英語や多言語コーパスで評価される。本稿では,bartベースアーキテクチャに基づいた最初のseq2seqモデルであるギリシャバルトを紹介し,大規模ギリシアコーパスで事前学習する。我々は,BART-random, Greek-BERT, XLM-Rを様々な識別課題で評価し,比較した。さらに,新たに導入されたギリシャ語用要約データセットである greeksum の 2 つの nlg タスクにおける性能について検討した。モデル、コード、新しい要約データセットが公開される予定だ。 The era of transfer learning has revolutionized the fields of Computer Vision and Natural Language Processing, bringing powerful pretrained models with exceptional performance across a variety of tasks. Specifically, Natural Language Processing tasks have been dominated by transformer-based language models. In Natural Language Inference and Natural Language Generation tasks, the BERT model and its variants, as well as the GPT model and its successors, demonstrated exemplary performance. However, the majority of these models are pretrained and assessed primarily for the English language or on a multilingual corpus. In this paper, we introduce GreekBART, the first Seq2Seq model based on BART-base architecture and pretrained on a large-scale Greek corpus. We evaluate and compare GreekBART against BART-random, Greek-BERT, and XLM-R on a variety of discriminative tasks. In addition, we examine its performance on two NLG tasks from GreekSUM, a newly introduced summarization dataset for the Greek language. The model, the code, and the new summarization dataset will be publicly available.	翻訳日:2023-04-04 16:00:50 公開日:2023-04-03
# より高いチャーン数を持つランダウレベルとそのアナログの特異性 Uniqueness of Landau levels and their analogs with higher Chern numbers ( http://arxiv.org/abs/2304.00866v1 ) ライセンス: Link先を確認	Bruno Mera, Tomoki Ozawa	(参考訳) 最も低いランダウレベルの波動関数は、一様磁場の下で2次元の荷電粒子のハミルトニアンの固有状態である。それらは実数空間と運動量空間の両方で正則であることが知られ、また運動量空間において一様で変換不変な幾何学的性質を示す。本稿ではストーン・ヴォン・ノイマンの定理を用いて、最低ランダウレベルの波動関数が、これらの条件を満たす単位チャーン数を持つ唯一の可能な状態であることを示す。また,高いチャーン数を持つ直接アナログの特異性を証明し,それらの表現を提供する。 Lowest Landau level wavefunctions are eigenstates of the Hamiltonian of a charged particle in two dimensions under a uniform magnetic field. They are known to be holomorphic both in real and momentum spaces, and also exhibit uniform, translationally invariant, geometrical properties in momentum space. In this paper, using the Stone-von Neumann theorem, we show that lowest Landau level wavefunctions are indeed the only possible states with unit Chern number satisfying these conditions. We also prove the uniqueness of their direct analogs with higher Chern numbers and provide their expressions.	翻訳日:2023-04-04 16:00:32 公開日:2023-04-03
# コヒーレント制御と非コヒーレント制御による2量子ビットシステムの最適状態操作 Optimal State Manipulation for a Two-Qubit System Driven by Coherent and Incoherent Controls ( http://arxiv.org/abs/2304.00863v1 ) ライセンス: Link先を確認	Oleg Morzhin, Alexander Pechen	(参考訳) 2量子ビット量子系の最適制御は、2量子ビットゲート生成からスピン鎖に沿ってコヒーレンス行列を転送する受信機の最適化に至るまでの応用により、高い関心を集めている。状態の準備と操作は、そのようなシステムを研究する上で重要な課題である。通常、整形されたレーザーパルスのようなコヒーレント制御が、2量子ビットシステムを操作するために用いられる。しかし、環境も非コヒーレントな制御資源として利用できる。本稿では、ゴリニ-コサコフスキー-スダルシャン-リンドブラッドマスター方程式により力学が支配される2量子ビット系の最適状態操作について考察する。ここで、コヒーレント制御はハミルトニアンに、非コヒーレント制御は(ラムシフトを介して)ハミルトニアンと散逸の超演算子の両方に入る。コヒーレント制御との相互作用の物理的に異なる2つのクラスを利用し、最終密度行列と目標密度行列の間のヒルベルト・シュミットオーバーラップを、与えられた値へのステアリングの最適化を含めて最適化する。我々は、ゼロのコヒーレント制御および非コヒーレント制御がポントリャーギンの最大原理を満たすときの条件と、さらにそれらが目的汎関数の定常点を形成するときの条件を見出す。さらに、この定常点がオーバーラップのグローバルに最小の値を与える場合を見出す。オーバーラップの上界と下界を用い,関数的な制御を扱う1段階および2段階の勾配射影法を開発した。 Optimal control of two-qubit quantum systems attracts high interest due to applications ranging from two-qubit gate generation to optimization of a receiver for transferring coherence matrices along spin chains. State preparation and manipulation are among the important tasks to study for such systems. Typically coherent control, e.g. a shaped laser pulse, is used to manipulate two-qubit systems. However, the environment can also be used as an incoherent control resource. In this article, we consider optimal state manipulation for a two-qubit system whose dynamics is governed by the Gorini-Kossakowski-Sudarshan-Lindblad master equation, where coherent control enters into the Hamiltonian and incoherent control into both the Hamiltonian (via Lamb shift) and the superoperator of dissipation. We exploit two physically different classes of interaction with coherent control and optimize the Hilbert-Schmidt overlap between final and target density matrices, including optimization of its steering to a given value. We find the conditions when zero coherent and incoherent controls satisfy the Pontryagin maximum principle, and in addition, when they form a stationary point of the objective functional. Moreover, we find a case when this stationary point provides the globally minimal value of the overlap. Using upper and lower bounds for the overlap, we develop one- and two-step gradient projection methods operating with functional controls.	翻訳日:2023-04-04 16:00:21 公開日:2023-04-03
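The objective named in this abstract, the Hilbert-Schmidt overlap between two density matrices, is the trace of their product. The sketch below only evaluates that quantity for two hand-written two-qubit states; it does not model the master equation or the controls, and the function name and example states are assumptions for illustration.

```python
def hs_overlap(rho, sigma):
    """Hilbert-Schmidt overlap Tr(rho sigma) of two square matrices given as nested lists."""
    n = len(rho)
    # Tr(rho sigma) = sum over i, j of rho[i][j] * sigma[j][i].
    trace = sum(rho[i][j] * sigma[j][i] for i in range(n) for j in range(n))
    # For Hermitian inputs the trace is real; keep only the real part.
    return complex(trace).real


if __name__ == "__main__":
    dim = 4  # two qubits
    # Pure state |00><00| and the maximally mixed state I/4 (illustrative choices).
    rho_00 = [[1.0 if i == j == 0 else 0.0 for j in range(dim)] for i in range(dim)]
    rho_mixed = [[0.25 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    print(hs_overlap(rho_00, rho_00))     # overlap of a pure state with itself
    print(hs_overlap(rho_00, rho_mixed))  # overlap with the maximally mixed state
```

For density matrices this overlap lies between 0 and 1, which is consistent with the abstract's use of upper and lower bounds on it.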
# 音声感情認識システムの設計と評価:IEMOCAPを用いた実環境チェックケーススタディ Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP ( http://arxiv.org/abs/2304.00860v1 ) ライセンス: Link先を確認	Nikolaos Antoniou and Athanasios Katsamanis and Theodoros Giannakopoulos and Shrikanth Narayanan	(参考訳) 音声感情認識(SER)の直接的かつ公平な比較を可能にするためのガイドラインと標準テストセットの整備が急務である。 Interactive Emotional Dyadic Motion Capture (IEMOCAP) データベースのようなリソースは、研究者がSERのモデルを開発し、テストするために広く採用されている参照コーパスとして現れてきたが、公開された研究からは、その利用方法に幅広い仮定と多様性があり、再現性と一般化を難しくしていることがわかる。 IEMOCAPをユースケースとして用いたSERの最近の進歩に対する批判的なレビューに基づいて、我々の研究は2つのコントリビューションを目指している。第1に,そこでなされた仮定や使用された評価指標を含む最近の文献の分析に基づき,SER評価ガイドラインのセットを提供する。第2に,オープンソース実装を用いた最近の出版物を用いて,SERの再現性評価に重点を置く。 There is an imminent need for guidelines and standard test sets to allow direct and fair comparisons of speech emotion recognition (SER). While resources, such as the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, have emerged as widely-adopted reference corpora for researchers to develop and test models for SER, published work reveals a wide range of assumptions and variety in its use that challenge reproducibility and generalization. Based on a critical review of the latest advances in SER using IEMOCAP as the use case, our work aims at two contributions: First, using an analysis of the recent literature, including assumptions made and metrics used therein, we provide a set of SER evaluation guidelines. Second, using recent publications with open-sourced implementations, we focus on reproducibility assessment in SER.	翻訳日:2023-04-04 15:59:56 公開日:2023-04-03
# 自己教師型骨格に基づく行動認識のためのFocalized Contrastive View-invariant Learning Focalized Contrastive View-invariant Learning for Self-supervised Skeleton-based Action Recognition ( http://arxiv.org/abs/2304.00858v1 ) ライセンス: Link先を確認	Qianhui Men, Edmond S. L. Ho, Hubert P. H. Shum, Howard Leung	(参考訳) ビュー不変表現の学習は,骨格に基づく行動認識における特徴識別能力の向上の鍵となる。既存のアプローチでは、暗黙のビュー依存表現による視点の影響を効果的に排除することはできない。本研究では,視点が粗い表現空間における視点固有情報を著しく抑圧する,focalized contrastive view-invariant learning(focovil)と呼ばれる自己教師付きフレームワークを提案する。多視点サンプルペア間の効果的なコントラスト損失で相互情報を最大化することにより、FoCoViLはアクションを共通のビュー不変性に関連付け、異種情報を同時に分離する。さらに,ペアワイズ類似性に基づく適応焦点化法を提案し,学習空間におけるよりクリアなクラスタ境界に対するコントラスト学習を強化する。教師付き分類器に大きく依存する既存の自己教師付き表現学習作業とは異なり、FoCoViLは教師なし分類器と教師なし分類器の両方で優れた認識性能を持つ。広範な実験により、コントラストベースの焦点化がより識別的な潜在表現を生成することも示されている。 Learning view-invariant representation is a key to improving feature discrimination power for skeleton-based action recognition. Existing approaches cannot effectively remove the impact of viewpoint due to the implicit view-dependent representations. In this work, we propose a self-supervised framework called Focalized Contrastive View-invariant Learning (FoCoViL), which significantly suppresses the view-specific information on the representation space where the viewpoints are coarsely aligned. By maximizing mutual information with an effective contrastive loss between multi-view sample pairs, FoCoViL associates actions with common view-invariant properties and simultaneously separates the dissimilar ones. We further propose an adaptive focalization method based on pairwise similarity to enhance contrastive learning for a clearer cluster boundary in the learned space. Different from many existing self-supervised representation learning work that rely heavily on supervised classifiers, FoCoViL performs well on both unsupervised and supervised classifiers with superior recognition performance. 
Extensive experiments also show that the proposed contrastive-based focalization generates a more discriminative latent representation.	翻訳日:2023-04-04 15:59:32 公開日:2023-04-03
# ハイパースペクトル画像ノイズ除去のためのスペクトル強調矩形Transformer Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising ( http://arxiv.org/abs/2304.00844v1 ) ライセンス: Link先を確認	Miaoyu Li, Ji Liu, Ying Fu, Yulun Zhang and Dejing Dou	(参考訳) ノイズ除去は、ハイパースペクトル画像(HSI)アプリケーションにとって重要なステップである。深層学習の強大な力を目撃する一方で、既存のHSIノイズ除去手法は、非局所的自己相似性を捉える上での限界に苦しむ。Transformerは長距離依存を捕捉する可能性を示しているが、HSIの空間的およびスペクトル的相関をモデル化するために特別に設計されたTransformerを用いた試みはほとんど行われていない。本稿では,スペクトル強調矩形Transformerを提案し,HSIの非局所空間的類似性と大域的スペクトル低ランク性を探索させることで,これらの問題に対処する。前者については,空間領域における非局所的類似性を捉えるために,矩形セルフアテンションを水平および垂直に活用する。後者のために,空間スペクトル立方体の大域的低ランク特性を抽出してノイズを抑制しつつ,重なり合わない空間矩形間の相互作用を可能にするスペクトル強調モジュールを設計する。合成ノイズHSIと実ノイズHSIの両方で広範な実験を行い,本手法の有効性を客観的指標と主観的視覚品質の両面から示した。コードはhttps://github.com/MyuLi/SERTで入手できる。 Denoising is a crucial step for hyperspectral image (HSI) applications. Though witnessing the great power of deep learning, existing HSI denoising methods suffer from limitations in capturing the non-local self-similarity. Transformers have shown potential in capturing long-range dependencies, but few attempts have been made with a specifically designed Transformer to model the spatial and spectral correlation in HSIs. In this paper, we address these issues by proposing a spectral enhanced rectangle Transformer, driving it to explore the non-local spatial similarity and global spectral low-rank property of HSIs. For the former, we exploit the rectangle self-attention horizontally and vertically to capture the non-local similarity in the spatial domain. For the latter, we design a spectral enhancement module that is capable of extracting the global underlying low-rank property of spatial-spectral cubes to suppress noise, while enabling the interactions among non-overlapping spatial rectangles. Extensive experiments have been conducted on both synthetic noisy HSIs and real noisy HSIs, showing the effectiveness of our proposed method in terms of both objective metrics and subjective visual quality. The code is available at https://github.com/MyuLi/SERT.	翻訳日:2023-04-04 15:59:02 公開日:2023-04-03
# MetaHead: リアルなデジタルヘッドを作るためのエンジン MetaHead: An Engine to Create Realistic Digital Head ( http://arxiv.org/abs/2304.00838v1 ) ライセンス: Link先を確認	Dingyun Zhang, Chenglai Zhong, Yudong Guo, Yang Hong, Juyong Zhang	(参考訳) トレーニングデータの収集とラベル付けは、学習ベースの手法にとって重要なステップである。顔分析タスクでは、顔データを生成するためにいくつかの生成モデルを使用することができるが、生成の多様性、再現精度、立体整合性、高忠実度視覚的品質、編集容易性のサブセットしか達成できない。近年、グラフィックベースの生成手法が研究されているが、計算コストの高い低リアリズムヘッドしかレンダリングできない。本稿では,制御可能な頭部放射場(metahead-f)と,表示に一貫性のある3d制御可能なデジタルヘッドと,所定のカスタマイズ可能な特徴ラベルに準拠したデジタルヘッドを生成する汎用的トップダウン画像生成フレームワーク labelheadとからなる,統一的でフル機能の制御可能なデジタルヘッドエンジンであるmetaheadを提案する。制御可能なディジタルヘッドエンジンは、最先端の視覚的品質と再現精度を実現する。さらに、生成されたラベル付きデータは、実際のトレーニングデータを支援し、トレーニング効果の観点からグラフィックベースの手法によって生成されたラベル付きデータを著しく上回ることができる。 Collecting and labeling training data is one important step for learning-based methods because the process is time-consuming and biased. For face analysis tasks, although some generative models can be used to generate face data, they can only achieve a subset of generation diversity, reconstruction accuracy, 3D consistency, high-fidelity visual quality, and easy editability. One recent related work is the graphics-based generative method, but it can only render low realism head with high computation cost. In this paper, we propose MetaHead, a unified and full-featured controllable digital head engine, which consists of a controllable head radiance field(MetaHead-F) to super-realistically generate or reconstruct view-consistent 3D controllable digital heads and a generic top-down image generation framework LabelHead to generate digital heads consistent with the given customizable feature labels. Experiments validate that our controllable digital head engine achieves the state-of-the-art generation visual quality and reconstruction accuracy. Moreover, the generated labeled data can assist real training data and significantly surpass the labeled data generated by graphics-based methods in terms of training effect.	翻訳日:2023-04-04 15:58:31 公開日:2023-04-03
# 配置順序に不変な暗黙的ニューラル表現 Disorder-invariant Implicit Neural Representation ( http://arxiv.org/abs/2304.00837v1 ) ライセンス: Link先を確認	Hao Zhu, Shaowen Xie, Zhen Liu, Fengyi Liu, Qi Zhang, You Zhou, Yi Lin, Zhan Ma, Xun Cao	(参考訳) 暗黙的ニューラル表現(INR)は、信号の属性を対応する座標の関数として特徴づけ、逆問題を解決するための有力な手段として現れている。しかし、INRの表現力は、ネットワークトレーニングにおけるスペクトルバイアスによって制限される。本稿では,入力信号の座標を再配置することにより,そのような周波数関連問題を大幅に解決できることを示し,従来のINRバックボーンにハッシュテーブルを付加した配置順序に不変な暗黙的ニューラル表現(DINER)を提案する。同じ属性のヒストグラムと異なる配置順序を共有する離散的な信号が与えられると、ハッシュテーブルは座標を後段のINRネットワークを用いてより良くモデル化できる同じ分布に投影し、スペクトルバイアスを大幅に軽減することができる。さらに、DINERの表現力は、ハッシュテーブルの幅によって決定される。異なる幅は属性空間の異なる幾何学的要素に対応する。例えば、幅をそれぞれ1、2、3に設定した場合、1次元曲線、2次元曲面、3次元の曲がった体積に対応する。幾何学的要素がより広い領域を覆うほど、表現力は強くなる。実験では、異なるINRバックボーン(MLP vs. SIREN)と様々なタスク(画像/ビデオ表現、位相検索、屈折率回復、ニューラル放射場最適化)に対するDINERの一般化だけでなく、品質と速度の両方において最先端のアルゴリズムよりも優れていることを示す。 Project page: https://ezio77.github.io/DINER-website/ Implicit neural representation (INR) characterizes the attributes of a signal as a function of corresponding coordinates and has emerged as a powerful tool for solving inverse problems. However, the expressive power of INR is limited by the spectral bias in the network training. In this paper, we find that such a frequency-related problem can be largely resolved by re-arranging the coordinates of the input signal, for which we propose the disorder-invariant implicit neural representation (DINER) by augmenting a hash-table to a traditional INR backbone. Given discrete signals sharing the same histogram of attributes and different arrangement orders, the hash-table could project the coordinates into the same distribution for which the mapped signal can be better modeled using the subsequent INR network, leading to significantly alleviated spectral bias. Furthermore, the expressive power of the DINER is determined by the width of the hash-table. Different width corresponds to different geometrical elements in the attribute space, e.g., 1D curve, 2D curved-plane and 3D curved-volume when the width is set as 1, 2 and 3, respectively. More covered areas of the geometrical elements result in stronger expressive power. Experiments not only reveal the generalization of the DINER for different INR backbones (MLP vs. SIREN) and various tasks (image/video representation, phase retrieval, refractive index recovery, and neural radiance field optimization) but also show the superiority over the state-of-the-art algorithms both in quality and speed. Project page: https://ezio77.github.io/DINER-website/	翻訳日:2023-04-04 15:58:11 公開日:2023-04-03
# AUDIT:潜時拡散モデルによる指示の追従による音声編集 AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models ( http://arxiv.org/abs/2304.00830v1 ) ライセンス: Link先を確認	Yuancheng Wang, Zeqian Ju, Xu Tan, Lei He, Zhizheng Wu, Jiang Bian, Sheng Zhao	(参考訳) オーディオ編集は、背景の音響効果の追加、楽器の交換、損傷したオーディオの修復など、様々な目的に適用できる。近年,出力音声のテキスト記述を条件とした拡散雑音処理により,ゼロショット音声編集を実現する手法が提案されている。しかし、これらの方法にはまだいくつか問題がある。 1) 編集作業の訓練を受けておらず,良好な編集効果を確保できない。 2) 編集を必要としないオーディオセグメントを誤って変更することができる。 3) 出力音声の完全な記述が必要であり、実用シナリオでは必ずしも利用可能あるいは必要ではない。本研究では,遅延拡散モデルに基づく命令誘導音声編集モデルであるAUDITを提案する。具体的には、AUDITには3つの主要な設計特徴がある。 1)異なるオーディオ編集タスクのためのトリプルトトレーニングデータ(インストラクション、入力オーディオ、出力オーディオ)を構築し、命令および入力(編集対象)オーディオを条件として、出力(編集済み)オーディオを生成する拡散モデルを訓練する。 2) 入力音声と出力音声の違いを比較することにより,編集が必要なセグメントのみを自動で変更することを学ぶことができる。 3) テキスト入力として完全なターゲットオーディオ記述ではなく,編集命令のみを必要とする。 AUDITは、いくつかのオーディオ編集タスク(例えば、追加、ドロップ、置換、塗り替え、超解像)の客観的および主観的なメトリクスで最先端の結果を達成する。デモサンプルはhttps://audit-demo.github.io/で入手できる。 Audio editing is applicable for various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods achieved zero-shot audio editing by using a diffusion and denoising process conditioned on the text description of the output audio. However, these methods still have some problems: 1) they have not been trained on editing tasks and cannot ensure good editing effects; 2) they can erroneously modify audio segments that do not require editing; 3) they need a complete description of the output audio, which is not always available or necessary in practical scenarios. In this work, we propose AUDIT, an instruction-guided audio editing model based on latent diffusion models. 
Specifically, AUDIT has three main design features: 1) we construct triplet training data (instruction, input audio, output audio) for different audio editing tasks and train a diffusion model using instruction and input (to be edited) audio as conditions and generating output (edited) audio; 2) it can automatically learn to only modify segments that need to be edited by comparing the difference between the input and output audio; 3) it only needs edit instructions instead of full target audio descriptions as text input. AUDIT achieves state-of-the-art results in both objective and subjective metrics for several audio editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution). Demo samples are available at https://audit-demo.github.io/.	翻訳日:2023-04-04 15:57:41 公開日:2023-04-03
# 多品位情報融合によるソーシャルメディア上のマルチモーダルフェイクニュース検出 Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion ( http://arxiv.org/abs/2304.00827v1 ) ライセンス: Link先を確認	Yangming Zhou, Yuzhou Yang, Qichao Ying, Zhenxing Qian and Xinpeng Zhang	(参考訳) ソーシャルメディア上でのマルチメディアコンテンツの共有が容易になったことで、フェイクニュースが急速に拡散し、社会の安定と安全を脅かしている。そのため、偽ニュース検出は社会科学の分野で幅広い研究の関心を集めている。現在の手法は主にテキストと視覚的特徴の統合に集中しているが、細粒度と粗粒度の両方で効果的にマルチモーダル情報を活用できない。さらに、モダリティ間の相関の欠如や、各モダリティによってなされた決定の矛盾により、曖昧な問題に苦しむ。これらの課題を克服するため,偽ニュース検出のためのMMFN(Multi-fine Multi-modal Fusion Network)を提案する。ニュースの真正性を評価する多面的プロセスに着想を得て,テキストと画像からトークンレベルの特徴を符号化するために,トランスフォーマティブをベースとする2つの事前学習モデルを用いた。マルチモーダルモジュールは、CLIPエンコーダでエンコードされた粗い機能を考慮して、きめ細かい機能をフューズする。あいまいさ問題に対処するため、類似度に基づく重み付けによる一様分岐を設計し、マルチモーダル特徴の利用を適応的に調整する。実験の結果,提案手法は3つの有意なデータセット上で,最先端の手法よりも優れていた。 The easy sharing of multimedia content on social media has caused a rapid dissemination of fake news, which threatens society's stability and security. Therefore, fake news detection has garnered extensive research interest in the field of social forensics. Current methods primarily concentrate on the integration of textual and visual features but fail to effectively exploit multi-modal information at both fine-grained and coarse-grained levels. Furthermore, they suffer from an ambiguity problem due to a lack of correlation between modalities or a contradiction between the decisions made by each modality. To overcome these challenges, we present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection. Inspired by the multi-grained process of human assessment of news authenticity, we respectively employ two Transformer-based pre-trained models to encode token-level features from text and images. The multi-modal module fuses fine-grained features, taking into account coarse-grained features encoded by the CLIP encoder. To address the ambiguity problem, we design uni-modal branches with similarity-based weighting to adaptively adjust the use of multi-modal features. 
Experimental results demonstrate that the proposed framework outperforms state-of-the-art methods on three prevalent datasets.	翻訳日:2023-04-04 15:57:17 公開日:2023-04-03
# lahm : multi-domain and multilingual hate speech identificationのための大規模注釈付きデータセット LAHM : Large Annotated Dataset for Multi-Domain and Multilingual Hate Speech Identification ( http://arxiv.org/abs/2304.00913v1 ) ライセンス: Link先を確認	Ankit Yadav, Shubham Chandel, Sushant Chatufale and Anil Bandhakavi	(参考訳) ヘイトスピーチ分析に関する現在の研究は、典型的には単言語および単一分類タスクに向けられている。本稿では、英語、ヒンディー語、アラビア語、フランス語、ドイツ語、スペイン語の多言語用ヘイトスピーチ分析データセットについて、ヘイトスピーチにおける虐待、人種差別、性差別、宗教的なヘイト、過激主義といった複数のドメインについて述べる。本論文は,この6つの言語において,これら5つの広い領域において,様々なタイプのヘイトスピーチを識別する問題を最初に解決した。本稿では、データセットの作成方法を説明し、異なるドメインに対して高レベルかつ低レベルなアノテーションを作成し、現在の最先端のマルチ言語およびマルチタスク学習アプローチをテストする方法について説明する。様々なモノリンガル、クロスリンガル、マシン翻訳の分類設定でデータセットを評価し、このタスクのために集約してマージしたオープンソースの英語データセットと比較します。次に,このアプローチを大規模ヘイトスピーチデータセットの作成に活用し,ヘイトスピーチ検出と分類全般を改善するためにアノテーションを活用する方法について論じる。 Current research on hate speech analysis is typically oriented towards monolingual and single classification tasks. In this paper, we present a new multilingual hate speech analysis dataset for English, Hindi, Arabic, French, German and Spanish languages for multiple domains across hate speech - Abuse, Racism, Sexism, Religious Hate and Extremism. To the best of our knowledge, this paper is the first to address the problem of identifying various types of hate speech in these five wide domains in these six languages. In this work, we describe how we created the dataset, created annotations at high level and low level for different domains and how we use it to test the current state-of-the-art multilingual and multitask learning approaches. We evaluate our dataset in various monolingual, cross-lingual and machine translation classification settings and compare it against open source English datasets that we aggregated and merged for this task. Then we discuss how this approach can be used to create large scale hate-speech datasets and how to leverage our annotations in order to improve hate speech detection and classification in general.	翻訳日:2023-04-04 15:50:36 公開日:2023-04-03
# Laplace-fPINNs: 劣拡散の順問題と逆問題を解くためのLaplaceベース分数階物理インフォームドニューラルネットワーク Laplace-fPINNs: Laplace-based fractional physics-informed neural networks for solving forward and inverse problems of subdiffusion ( http://arxiv.org/abs/2304.00909v1 ) ライセンス: Link先を確認	Xiong-Bin Yan and Zhi-Qin John Xu and Zheng Ma	(参考訳) 物理インフォームドニューラルネットワーク(PINN)の使用は、分数拡散方程式の前方および逆問題の解法において有望であることを示している。しかし、分数微分には自動微分が適用できないため、PINNを用いた分数拡散方程式の解法はさらなる課題に対処する必要がある。この問題に対処するため,本論文では,分数拡散方程式の前方および逆問題を効果的に解くことができる,ラプラス型分数物理インフォームドニューラルネットワーク (Laplace-fPINNs) と呼ばれるPINNの拡張を提案する。このアプローチは大量の補助点の導入を回避し、損失関数を単純化する。いくつかの例を用いてLaplace-fPINNsアプローチの有効性を検証する。数値実験の結果,Laplace-fPINNs法は高次元分数拡散方程式の前方および逆問題の両方を効果的に解くことができることがわかった。 The use of physics-informed neural networks (PINNs) has shown promise in solving forward and inverse problems of fractional diffusion equations. However, because automatic differentiation is not applicable to fractional derivatives, solving fractional diffusion equations using PINNs requires addressing additional challenges. To address this issue, this paper proposes an extension to PINNs called Laplace-based fractional physics-informed neural networks (Laplace-fPINNs), which can effectively solve the forward and inverse problems of fractional diffusion equations. This approach avoids introducing a large number of auxiliary points and simplifies the loss function. We validate the effectiveness of the Laplace-fPINNs approach using several examples. Our numerical results demonstrate that the Laplace-fPINNs method can effectively solve both the forward and inverse problems of high-dimensional fractional diffusion equations.	翻訳日:2023-04-04 15:50:17 公開日:2023-04-03
# ScandEval: スカンジナビアの自然言語処理ベンチマーク ScandEval: A Benchmark for Scandinavian Natural Language Processing ( http://arxiv.org/abs/2304.00906v1 ) ライセンス: Link先を確認	Dan Saattrup Nielsen	(参考訳) 本稿では,スカンジナビア言語の4つの異なるタスクに対して事前学習されたモデルをベンチマークできる,スカンジナビアのベンチマークプラットフォームであるscandevalを紹介する。言語受容性と質問応答性という2つのタスクで使用されるデータセットは新しいものである。我々は,Hugging Face Hubにアップロードされたモデルを,再現可能な結果でベンチマークすることができるPythonパッケージとコマンドラインインターフェースであるScandevalを開発し,リリースする。このパッケージを使って100以上のスカンジナビア語または多言語モデルのベンチマークを行い、それらの結果をインタラクティブなオンラインリーダーボードに提示し、結果の分析を提供する。この分析は、スカンディナヴィア語族(デンマーク語、スウェーデン語、ノルウェー語)とインスキュラ・スカンディナヴィア語族(デンマーク語、スウェーデン語、ノルウェー語)の間でかなりの言語間移動が存在することを示している。ベンチマークの結果は、ノルウェー、スウェーデン、デンマークにおける言語技術への投資が、XLM-RoBERTaやmDeBERTaV3のような多言語モデルよりも優れた言語モデルを生み出したことを示している。パッケージとリーダーボードの両方のソースコードをリリースします。 This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained model on four different tasks in the Scandinavian languages. The datasets used in two of the tasks, linguistic acceptability and question answering, are new. We develop and release a Python package and command-line interface, scandeval, which can benchmark any model that has been uploaded to the Hugging Face Hub, with reproducible results. Using this package, we benchmark more than 100 Scandinavian or multilingual models and present the results of these in an interactive online leaderboard, as well as provide an analysis of the results. The analysis shows that there is substantial cross-lingual transfer among the Mainland Scandinavian languages (Danish, Swedish and Norwegian), with limited cross-lingual transfer between the group of Mainland Scandinavian languages and the group of Insular Scandinavian languages (Icelandic and Faroese). The benchmarking results also show that the investment in language technology in Norway, Sweden and Denmark has led to language models that outperform massively multilingual models such as XLM-RoBERTa and mDeBERTaV3. We release the source code for both the package and leaderboard.	
翻訳日:2023-04-04 15:50:00 公開日:2023-04-03
# 学校における適応型学習プラットフォームの導入:教師のエンゲージメントに影響を与える要因を明らかにする Adoption of Adaptive Learning Platforms in Schools: Unveiling Factors Influencing Teachers Engagement ( http://arxiv.org/abs/2304.00903v1 ) ライセンス: Link先を確認	Mutlu Cukurova, Xin Miao, Richard Brooker	(参考訳) AIベースの適応学習プラットフォームの影響に関する証拠は存在するが、学校における大規模な導入は、せいぜい遅い。さらに、学校で採用されるAIツールは、常に研究コミュニティで検討され研究されている製品であるとは限らない。そのため、採用に影響を与える要因を特定し、これらの要因が適応型学習プラットフォームへの教師の関与をどの程度予測できるかを調べることへの関心が高まっている。そこで我々は,教師が学校における適応型学習プラットフォームを採用する上で,より包括的な要因を測定するための信頼性の高い尺度を開発した。さらに,国レベルの大規模な母集団から抽出した学校教師(n=792)を対象にこの尺度を実施した結果を示し,このデータを用いて,学校における適応学習プラットフォームへの教師の現実的な関与を予測した。以上の結果から,教師の知識,信頼度,製品品質はすべて重要な要因であるものの,教師が学校におけるAIプラットフォームと関わる上での唯一の要因とは限らず,最も重要な要因でさえない可能性がある。追加の作業負荷を生じさせないこと、教師の所有意識と信頼を高めること、支援のメカニズムを用意すること、倫理的問題が最小化されていることを保証することは、学校でAIを採用する上でも不可欠であり、プラットフォームへの教師の関与をより良く予測する可能性がある。本論文は, 予測モデルにおける変動の次元を増やし, 実践における実装のばらつきを減らすことにより, 適応学習プラットフォームの現実的普及と有効性を高める要因の価値について考察して締めくくる。 Although evidence exists about the impact of AI-based adaptive learning platforms, their scaled adoption in schools is slow at best. In addition, AI tools adopted in schools may not always be the considered and studied research products of the research community. Therefore, there have been increasing concerns about identifying factors influencing adoption, and studying the extent to which these factors can be used to predict teachers' engagement with adaptive learning platforms. To address this, we developed a reliable instrument to measure more holistic factors influencing teachers' adoption of adaptive learning platforms in schools. In addition, we present the results of its implementation with school teachers (n=792) sampled from a large country-level population and use this data to predict teachers' real-world engagement with the adaptive learning platform in schools. Our results show that although teachers' knowledge, confidence and product quality are all important factors, they are not necessarily the only factors, and may not even be the most important ones, influencing teachers' engagement with AI platforms in schools. 
Not generating any additional workload, increasing teacher ownership and trust, generating support mechanisms for help, and assuring that ethical issues are minimised are also essential for the adoption of AI in schools and may predict teachers' engagement with the platform better. We conclude the paper with a discussion on the value of factors identified to increase the real-world adoption and effectiveness of adaptive learning platforms by increasing the dimensions of variability in prediction models and decreasing the implementation variability in practice.	翻訳日:2023-04-04 15:49:35 公開日:2023-04-03
# パラメトリックマルチロス最適化による可変畳み込み Tunable Convolutions with Parametric Multi-Loss Optimization ( http://arxiv.org/abs/2304.00898v1 ) ライセンス: Link先を確認	Matteo Maggioni, Thomas Tanay, Francesca Babiloni, Steven McDonagh, Aleš Leonardis	(参考訳) ニューラルネットワークの振舞いは、トレーニング中に使用される特定の損失とデータによって、取り返しのつかない形で決定される。しかしながら、ユーザの好みやデータの動的特性といった外部要因に基づいて、推論時にモデルをチューニングすることが望ましい場合が多い。これは、不良設定 (ill-posed) な画像から画像への変換タスクにおける知覚と歪みのトレードオフのバランスをとるために特に重要である。本研究では,複数の異なるカーネルを含むパラメトリック可変畳み込み層を,同じ数の目的を含むパラメトリックマルチロスを用いて最適化することを提案する。私たちの重要な洞察は、パラメータの共有セットを使用して、目的とカーネルの両方を動的に補間することです。トレーニング中、これらのパラメータはランダムにサンプリングされ、目的のすべての可能な組み合わせを明示的に最適化し、その結果、それらの効果を対応するカーネルへと分離する。推論の間、これらのパラメータはモデルのインタラクティブな入力となり、モデルの振る舞いを信頼できる一貫した制御を可能にします。広範な実験結果から,提案する可変畳み込みは,既存のニューラルネットワークにおける従来の畳み込みの代替として,計算コストをほとんど増やすことなく効果的に動作し,画像のデノイジング,デブラリング,スーパーレゾリューション,スタイル転送など,幅広いアプリケーションにおいて最先端の制御戦略を上回ることが分かった。 The behavior of neural networks is irremediably determined by the specific loss and data used during training. However, it is often desirable to tune the model at inference time based on external factors such as preferences of the user or dynamic characteristics of the data. This is especially important to balance the perception-distortion trade-off of ill-posed image-to-image translation tasks. In this work, we propose to optimize a parametric tunable convolutional layer, which includes a number of different kernels, using a parametric multi-loss, which includes an equal number of objectives. Our key insight is to use a shared set of parameters to dynamically interpolate both the objectives and the kernels. During training, these parameters are sampled at random to explicitly optimize all possible combinations of objectives and consequently disentangle their effect into the corresponding kernels. During inference, these parameters become interactive inputs of the model, hence enabling reliable and consistent control over the model behavior. 
Extensive experimental results demonstrate that our tunable convolutions effectively work as a drop-in replacement for traditional convolutions in existing neural networks at virtually no extra computational cost, outperforming state-of-the-art control strategies in a wide range of applications; including image denoising, deblurring, super-resolution, and style transfer.	翻訳日:2023-04-04 15:49:08 公開日:2023-04-03
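A minimal sketch of the shared-parameter interpolation described in the abstract above, in plain Python. The function names, the toy 3x3 kernels, the placeholder loss values, and the normalisation of the sampled parameters so that they sum to one are all illustrative assumptions. The abstract only states that one shared set of parameters interpolates both the objectives and the kernels and is sampled at random during training.

```python
import random

def sample_params(n, rng):
    """Draw n random interpolation weights (normalising them is an assumption)."""
    raw = [rng.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]

def interpolate_kernels(kernels, params):
    """Blend same-shaped 2D kernels element-wise with the shared weights."""
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [[sum(p * k[r][c] for p, k in zip(params, kernels))
             for c in range(cols)]
            for r in range(rows)]

def interpolate_losses(loss_values, params):
    """Blend per-objective loss values with the same shared weights."""
    return sum(p * v for p, v in zip(params, loss_values))

# Toy kernels (hypothetical): identity and a 3x3 box blur.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
box_blur = [[1.0 / 9.0] * 3 for _ in range(3)]

rng = random.Random(0)
params = sample_params(2, rng)                  # sampled at random during training
kernel = interpolate_kernels([identity, box_blur], params)
loss = interpolate_losses([0.8, 0.2], params)   # placeholder loss values
```

At inference time the same `params` would be supplied by the user instead of being sampled, so moving a weight towards one kernel's objective moves the blended kernel in the same direction.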
# 重要な指標は精度だけではない: ディープラーニングモデルのエネルギー消費量の推定 Accuracy is not the only Metric that matters: Estimating the Energy Consumption of Deep Learning Models ( http://arxiv.org/abs/2304.00897v1 ) ライセンス: Link先を確認	Johannes Getzner, Bertrand Charpentier, Stephan Günnemann	(参考訳) 現代の機械学習モデルは、膨大な量のエネルギーを消費し始めており、大きな炭素フットプリントを生み出している(Strubell et al., 2019)。この問題に対処するため,我々は,実際の運用やトレーニングを行わずに,事前にモデルのエネルギーニーズを見積もることのできる,エネルギー推定パイプラインを開発した。そのために我々は,高品質なエネルギーデータを収集し,層ごとの推定エネルギーを積算することによりDLモデルのエネルギー消費を予測できる最初のベースラインモデルを構築した。 Modern machine learning models have started to consume incredible amounts of energy, thus incurring large carbon footprints (Strubell et al., 2019). To address this issue, we have created an energy estimation pipeline, which allows practitioners to estimate the energy needs of their models in advance, without actually running or training them. We accomplished this by collecting high-quality energy data and building a first baseline model, capable of predicting the energy consumption of DL models by accumulating their estimated layer-wise energies.	翻訳日:2023-04-04 15:48:46 公開日:2023-04-03
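The layer-wise accumulation idea can be sketched in a few lines of Python. Everything concrete here is hypothetical: the layer descriptions, the per-layer estimator functions, and the picojoule-per-MAC coefficients are invented placeholders, not values or models from the paper. Only the step of summing per-layer estimates into a model-level figure follows the abstract.

```python
def estimate_model_energy(layers, estimators):
    """Sum per-layer energy estimates to obtain a model-level estimate."""
    return sum(estimators[layer["type"]](layer) for layer in layers)

# Hypothetical per-layer estimators: energy in picojoules, linear in the
# number of multiply-accumulate operations (coefficients are invented).
ESTIMATORS = {
    "conv": lambda layer: 2 * layer["macs"],
    "linear": lambda layer: 1 * layer["macs"],
}

# Hypothetical model description; no training or inference is run.
model_layers = [
    {"type": "conv", "macs": 1000},
    {"type": "linear", "macs": 500},
]

total_pj = estimate_model_energy(model_layers, ESTIMATORS)
```

Keeping one estimator per layer type means a new architecture can be scored from its layer list alone, which is the property the abstract highlights: an estimate in advance, without running or training the model.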
# エッジでのディープラーニングアプリケーションにおける階層推論のためのオンラインアルゴリズム Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge ( http://arxiv.org/abs/2304.00891v1 ) ライセンス: Link先を確認	Vishnu Narayanan Moothedath, Jaya Prakash Champati, James Gross	(参考訳) 本稿では,リソース制約のあるエッジデバイス(ED)に,汎用分類アプリケーション用の小型MLモデル(S-ML)と,大規模MLモデル(L-ML)をホストするエッジサーバ(ES)について検討する。 S-MLの推論精度はL-MLよりも低いため、すべてのデータサンプルをESにオフロードすると高い推測精度が得られるが、EDにS-MLを埋め込むことの目的を損なうとともに、遅延低減、帯域幅の節約、ローカル推論のエネルギー効率を損なう。 S-ML推論が正しい場合にのみ受け入れられる階層推論(hierarchical Inference, HI)の考え方を検討する。そうでなければ、データサンプルはL-ML推論のためにオフロードされる。しかし、HIの理想的な実装は、S-ML推論の正しさがEDに知られていないため、実現不可能である。そこで我々は,S-ML推論の正確性を予測するオンラインメタ学習フレームワークを提案する。その結果、オンライン学習の問題は、エキスパートアドバイザによる予測(Expert Advice:PEA)問題であることがわかった。我々は、edが推論を受け入れると、s-mlの正しさに関するフィードバックを受信する全フィードバックシナリオと、edが分類の根拠となる真理を受信しない非局所フィードバックシナリオを検討し、hil-f と hil-n アルゴリズムを提案し、データサンプル数に準ずる後悔の限界を証明する。我々は,画像分類用アルゴリズムであるImagenette, Imagewoof, MNIST, CIFAR-10の4つのデータセットを用いて,提案アルゴリズムの性能評価と評価を行った。 We consider a resource-constrained Edge Device (ED) embedded with a small-size ML model (S-ML) for a generic classification application, and an Edge Server (ES) that hosts a large-size ML model (L-ML). Since the inference accuracy of S-ML is lower than that of the L-ML, offloading all the data samples to the ES results in high inference accuracy, but it defeats the purpose of embedding S-ML on the ED and deprives the benefits of reduced latency, bandwidth savings, and energy efficiency of doing local inference. To get the best out of both worlds, i.e., the benefits of doing inference on the ED and the benefits of doing inference on ES, we explore the idea of Hierarchical Inference (HI), wherein S-ML inference is only accepted when it is correct, otherwise the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible as the correctness of the S-ML inference is not known to the ED. We thus propose an online meta-learning framework to predict the correctness of the S-ML inference. 
The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with continuous expert space. We consider the full feedback scenario, where the ED receives feedback on the correctness of the S-ML once it accepts the inference, and the no-local feedback scenario, where the ED does not receive the ground truth for the classification, and propose the HIL-F and HIL-N algorithms and prove a regret bound that is sublinear with the number of data samples. We evaluate and benchmark the performance of the proposed algorithms for image classification applications using four datasets, namely, Imagenette, Imagewoof, MNIST, and CIFAR-10.	翻訳日:2023-04-04 15:48:35 公開日:2023-04-03
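The accept-or-offload decision can be illustrated with a discretized exponential-weights learner over confidence thresholds. This is a simplified stand-in, not the paper's HIL-F or HIL-N algorithm. The class name, the grid of 11 thresholds, the learning rate, the offload cost of 0.3, and the 0/1 cost for accepting a correct or incorrect local inference are all assumptions made for illustration. The sketch also assumes the correctness of the local inference is revealed after every sample.

```python
import math
import random

class ThresholdExperts:
    """Exponential weights over a grid of confidence thresholds (illustrative)."""

    def __init__(self, num_experts=11, eta=0.5, offload_cost=0.3, rng=None):
        self.thresholds = [i / (num_experts - 1) for i in range(num_experts)]
        self.weights = [1.0] * num_experts
        self.eta = eta                      # learning rate (assumed value)
        self.offload_cost = offload_cost    # assumed fixed cost of offloading
        self.rng = rng or random.Random(0)

    def decide(self, confidence):
        """Sample a threshold by weight; accept locally if confidence >= it."""
        theta = self.rng.choices(self.thresholds, weights=self.weights)[0]
        return "accept" if confidence >= theta else "offload"

    def update(self, confidence, local_correct):
        """Charge every threshold the cost it would have incurred on this sample."""
        for i, theta in enumerate(self.thresholds):
            if confidence >= theta:
                loss = 0.0 if local_correct else 1.0   # accepted local result
            else:
                loss = self.offload_cost               # sent to the server
            self.weights[i] *= math.exp(-self.eta * loss)

learner = ThresholdExperts()
decision = learner.decide(confidence=0.7)
learner.update(confidence=0.7, local_correct=True)
```

Thresholds that accept wrong local answers or offload too eagerly lose weight over time, so the sampled decisions drift towards thresholds that separate reliable from unreliable local inferences.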
# 対話からアクションへ:アクションレベル生成によるタスク指向対話システムの構築 Dialog-to-Actions: Building Task-Oriented Dialogue System via Action-Level Generation ( http://arxiv.org/abs/2304.00884v1 ) ライセンス: Link先を確認	Yuncheng Hua, Xiangyu Xi, Zheng Jiang, Guanwei Zhang, Chaobo Sun, Guanglu Wan, Wei Ye	(参考訳) タスク指向対話システムでは、エンドツーエンド生成に基づくアプローチが研究され、適用されている。しかし、産業シナリオでは、既存の手法は制御可能性(ドメイン一貫性のない応答、繰り返し問題など)と効率(例えば、長い計算時間など)のボトルネックに直面します。本稿では,アクションレベル生成によるタスク指向対話システムを提案する。具体的には,まず,大規模対話から対話行動を構築し,対話行動の列として各自然言語(NL)応答を表現する。さらに、対話履歴を入力として対話行動のシーケンスを出力するシーケンスツーシーケンスモデルをトレーニングする。生成された対話行動は、言語応答に変換される。実験の結果, 提案する軽量な手法は競争力のある性能を達成し, 制御性と効率性の面で優位性を持つことが示された。 End-to-end generation-based approaches have been investigated and applied in task-oriented dialogue systems. However, in industrial scenarios, existing methods face the bottlenecks of controllability (e.g., domain-inconsistent responses, repetition problem, etc.) and efficiency (e.g., long computation time, etc.). In this paper, we propose a task-oriented dialogue system via action-level generation. Specifically, we first construct dialogue actions from large-scale dialogues and represent each natural language (NL) response as a sequence of dialogue actions. Further, we train a Sequence-to-Sequence model which takes the dialogue history as input and outputs a sequence of dialogue actions. The generated dialogue actions are transformed into verbal responses. Experimental results show that our lightweight method achieves competitive performance, and offers advantages in controllability and efficiency.	翻訳日:2023-04-04 15:48:01 公開日:2023-04-03
# smproblog:確率的議論のためのproblogの安定モデルセマンティクス smProbLog: Stable Model Semantics in ProbLog for Probabilistic Argumentation ( http://arxiv.org/abs/2304.00879v1 ) ライセンス: Link先を確認	Pietro Totis, Angelika Kimmig, Luc De Raedt	(参考訳) 議論問題は、それらの関係構造から一連の引数の受け入れ可能性を決定することに関係している。利用可能な情報が不確実な場合、確率論的議論フレームワークは、それを説明するモデリングツールを提供する。この論文の最初の貢献は、確率的議論フレームワークを確率的論理プログラムとして新しい解釈である。確率論理プログラム(probabilistic logic program)は、いくつかの事実に確率を付記した論理プログラムである。本稿では,確率論的論理プログラミング(PLP)のセマンティクスにおいて,確率論的議論フレームワークを表すプログラムが共通の前提を満たしていないことを示す。この論文の第二の貢献は、確率的事実の選択が論理原子の真理割り当てを一意に決定しないプログラムのための新しいPLP意味論である。本論文の3番目の貢献は,この意味論をサポートするplpシステムの実装であるsmproblogの実装である。 smProbLogは確率論理型プログラミング言語ProbLogをベースにした新しいPLPフレームワークである。 smproblogはplpの典型的な推論や学習タスクをサポートしており、私たちの最初の貢献とともに確率的議論のための新しい推論ツールを提供しています。本手法は,提案アルゴリズムの計算コストを解析し,議論問題のデータセットに適用する実験を用いて評価する。 Argumentation problems are concerned with determining the acceptability of a set of arguments from their relational structure. When the available information is uncertain, probabilistic argumentation frameworks provide modelling tools to account for it. The first contribution of this paper is a novel interpretation of probabilistic argumentation frameworks as probabilistic logic programs. Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. We show that the programs representing probabilistic argumentation frameworks do not satisfy a common assumption in probabilistic logic programming (PLP) semantics, which is, that probabilistic facts fully capture the uncertainty in the domain under investigation. The second contribution of this paper is then a novel PLP semantics for programs where a choice of probabilistic facts does not uniquely determine the truth assignment of the logical atoms. The third contribution of this paper is the implementation of a PLP system supporting this semantics: smProbLog. smProbLog is a novel PLP framework based on the probabilistic logic programming language ProbLog. 
smProbLog supports many inference and learning tasks typical of PLP, which, together with our first contribution, provide novel reasoning tools for probabilistic argumentation. We evaluate our approach with experiments analyzing the computational cost of the proposed algorithms and their application to a dataset of argumentation problems.	翻訳日:2023-04-04 15:47:47 公開日:2023-04-03
# 動的行動空間強化学習におけるアクションピックアップ Action Pick-up in Dynamic Action Space Reinforcement Learning ( http://arxiv.org/abs/2304.00873v1 ) ライセンス: Link先を確認	Jiaqi Ye, Xiaodong Li, Pangjing Wu, Feng Wang	(参考訳) ほとんどの強化学習アルゴリズムはマルコフ決定過程(MDP)が定常であるという重要な仮定に基づいている。しかし、動的アクション空間を持つ非定常MDPは、実世界のシナリオにおいて遍在している。動的行動空間強化学習の課題は, これまでにも数多く研究されてきたが, 学習効率を向上させるために, 新たな, 目に見えない行動から, どのように価値ある行動を選択するかは未解決のままである。この問題に対処するために,我々は,新たなアクション群からパフォーマンスを最も高める可能性のある有用なアクションを自律的に選択するインテリジェントアクションピックアップ(AP)アルゴリズムを提案する。本稿では,まず,事前の最適政策が有用な知識と経験を提供することで,行動ピックアップにおいて重要な役割を果たすことを理論的に分析し,発見する。次に,事前の最適ポリシーに基づいて,頻度ベースグローバル法と状態クラスタリングベースローカル法という2つの異なるAP法を設計する。最後に,動作空間が時間とともに変化する2つのシミュレーション環境におけるAPの評価を行った。実験の結果,提案したAPは学習効率においてベースラインよりも優れていることがわかった。 Most reinforcement learning algorithms are based on a key assumption that Markov decision processes (MDPs) are stationary. However, non-stationary MDPs with dynamic action space are omnipresent in real-world scenarios. Although problems of dynamic action space reinforcement learning have been studied in many previous works, how to choose valuable actions from new and unseen actions to improve learning efficiency remains unaddressed. To tackle this problem, we propose an intelligent Action Pick-up (AP) algorithm to autonomously choose valuable actions that are most likely to boost performance from a set of new actions. In this paper, we first theoretically analyze and find that a prior optimal policy plays an important role in action pick-up by providing useful knowledge and experience. Then, we design two different AP methods: frequency-based global method and state clustering-based local method, based on the prior optimal policy. Finally, we evaluate the AP on two simulated but challenging environments where action spaces vary over time. Experimental results demonstrate that our proposed AP has advantages over baselines in learning efficiency.	翻訳日:2023-04-04 15:47:26 公開日:2023-04-03
# 離散潜在変数を持つ表現の学習スパーシティ Learning Sparsity of Representations with Discrete Latent Variables ( http://arxiv.org/abs/2304.00935v1 ) ライセンス: Link先を確認	Zhao Xu, Daniel Onoro Rubio, Giuseppe Serra, Mathias Niepert	(参考訳) ディープラーニングの強みと確率モデルとをエレガントな方法で組み合わせる能力によって、深い潜在生成モデルに注目が集まっている。モデルで学んだデータ表現は、しばしば連続的で密度が高い。しかし、多くのアプリケーションでは、教師なし環境でデータのスパースな高次元埋め込みを学習したり、教師なし環境で数千の候補タグからマルチラベルを学習したりといったスパース表現が期待されている。いくつかのシナリオでは、スパーシティの程度にさらに制限がある可能性がある: 表現の0でない特徴の数は、予め定義されたしきい値 $l_0$ よりも大きくはならない。本稿では,スパース性の程度を明示的にモデル化し,定量化されたスパース性制約によりデータのスパース構造を学習するためのスパース深部潜在生成モデルsdlgmを提案する。表現の空間性は固定されていないが、事前に定義された制限の下で観察そのものに適合する。特に、各観測値 $i$ を補助確率変数 $l_i$ に導入し、その表現のスパーシティをモデル化する。スパース表現は、2つのGumbel-Softmax分布を介して2段階のサンプリングプロセスで生成される。推論と学習のために,mc勾配推定法に基づく不定形変分法を開発した。結果として生じるスパース表現はバックプロパゲーションで微分可能である。教師なしおよび教師なしの学習問題に対する複数のデータセットに対する実験評価は,提案手法の利点を示す。 Deep latent generative models have attracted increasing attention due to the capacity of combining the strengths of deep learning and probabilistic models in an elegant way. The data representations learned with the models are often continuous and dense. However in many applications, sparse representations are expected, such as learning sparse high dimensional embedding of data in an unsupervised setting, and learning multi-labels from thousands of candidate tags in a supervised setting. In some scenarios, there could be further restriction on degree of sparsity: the number of non-zero features of a representation cannot be larger than a pre-defined threshold $L_0$. In this paper we propose a sparse deep latent generative model SDLGM to explicitly model degree of sparsity and thus enable to learn the sparse structure of the data with the quantified sparsity constraint. The resulting sparsity of a representation is not fixed, but fits to the observation itself under the pre-defined restriction. In particular, we introduce to each observation $i$ an auxiliary random variable $L_i$, which models the sparsity of its representation. 
The sparse representations are then generated with a two-step sampling process via two Gumbel-Softmax distributions. For inference and learning, we develop an amortized variational method based on MC gradient estimator. The resulting sparse representations are differentiable with backpropagation. The experimental evaluation on multiple datasets for unsupervised and supervised learning problems shows the benefits of the proposed method.	翻訳日:2023-04-04 15:41:55 公開日:2023-04-03
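For readers unfamiliar with the building block named above, a single relaxed Gumbel-Softmax draw can be written in plain Python as follows. This is a generic sketch of the standard technique only. How the paper composes two such distributions into its two-step sampling process is not specified in the abstract and is not reproduced here. The function name and the clipping bounds on the uniform noise are illustrative choices.

```python
import math
import random

def gumbel_softmax_sample(logits, temperature, rng):
    """Return one relaxed one-hot sample: softmax((logits + Gumbel noise) / T)."""
    # Gumbel(0, 1) noise via inverse transform; u is kept strictly inside (0, 1).
    gumbels = [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
               for _ in logits]
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    top = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
soft = gumbel_softmax_sample([0.0, 1.0, 2.0], temperature=1.0, rng=rng)
sharp = gumbel_softmax_sample([0.0, 0.0, 50.0], temperature=0.1, rng=rng)
```

Lower temperatures push the sample towards a one-hot vector while keeping it a smooth function of the logits, which is what makes such discrete choices usable with backpropagation.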
# 継続的に学習された表現における知識蓄積と特徴忘却の問題 Knowledge Accumulation in Continually Learned Representations and the Issue of Feature Forgetting ( http://arxiv.org/abs/2304.00933v1 ) ライセンス: Link先を確認	Timm Hess, Eli Verwimp, Gido M. van de Ven, Tinne Tuytelaars	(参考訳) ニューラルネットワークはデフォルトで、すべてのトレーニングデータを一度に学習する。このようなモデルを新しいデータのシーケンシャルなチャンクでトレーニングする場合、古いデータの扱い方を破滅的に忘れる傾向があります。本研究では,連続学習者が表現を学習し,忘れる方法について検討する。我々は,知識蓄積(時間とともに表現が向上する現象)と特徴忘却(タスク固有の表現の喪失)という2つの現象を観察する。両現象をよりよく理解するために,タスク排他比較と呼ばれる新しい分析手法を導入する。モデルがあるタスクを見ていて、タスク固有の特徴をすべて忘れてはいない場合、そのタスクに対する表現は、類似のタスクでは訓練されたがそのタスク自体では訓練されていないモデルの表現よりも優れているはずである。画像分類実験の結果,タスク固有の特徴の多くは,これまで提案されてきたものと対照的に,すぐに忘れられることがわかった。さらに,リプレイや表現学習からのアイデアといった連続学習手法が,継続的に学習される表現に与える影響を実証する。表現品質は連続学習性能と密接な相関関係にあると結論づけた。 By default, neural networks learn on all training data at once. When such a model is trained on sequential chunks of new data, it tends to catastrophically forget how to handle old data. In this work, we investigate how continual learners learn and forget representations. We observe two phenomena: knowledge accumulation, i.e. the improvement of a representation over time, and feature forgetting, i.e. the loss of task-specific representations. To better understand both phenomena, we introduce a new analysis technique called task exclusion comparison. If a model has seen a task and it has not forgotten all the task-specific features, then its representation for that task should be better than that of a model that was trained on similar tasks, but not that exact one. Our image classification experiments show that most task-specific features are quickly forgotten, in contrast to what has been suggested in the past. Further, we demonstrate how some continual learning methods, like replay, and ideas from representation learning affect a continually learned representation. We conclude by observing that representation quality is tightly correlated with continual learning performance.	翻訳日:2023-04-04 15:41:34 公開日:2023-04-03
# HypLiLoc: ハイパーボリック核融合によるLiDARの効率的な回帰を目指して HypLiLoc: Towards Effective LiDAR Pose Regression with Hyperbolic Fusion ( http://arxiv.org/abs/2304.00932v1 ) ライセンス: Link先を確認	Sijie Wang, Qiyu Kang, Rui She, Wei Wang, Kai Zhao, Yang Song, Wee Peng Tay	(参考訳) LiDARの再ローカライゼーションは、ロボット工学、自律運転、コンピュータビジョンなど、多くの分野で重要な役割を果たしている。データベースからのLiDARベースの検索は、通常、高い計算ストレージコストを発生させ、データベースがスパースすぎる場合、世界中の不正確なポーズ推定につながる可能性がある。一方、ポーズ回帰手法では、画像や雲を入力として捉え、エンドツーエンドでグローバルポーズを直接レグレッションする。データベースマッチングは行わず、検索技術よりも計算効率が高い。我々は、LiDARポーズ回帰の新しいモデルであるHypLiLocを提案する。 2つの分岐したバックボーンを用いてそれぞれ3次元特徴と2次元投影特徴を抽出する。より効率的な特徴表現を得るために,ユークリッド空間と双曲空間のマルチモーダル特徴融合を考える。実験結果から,HypLiLocは屋外および屋内の両方のデータセットで最先端の性能を達成することが示された。また,マルチモーダル特徴抽出とマルチスペース埋め込みの有効性を示すフレームワーク設計に関する広範なアブレーション研究を行う。私たちのコードは、https://github.com/sijieaaa/HypLiLocでリリースされています。 LiDAR relocalization plays a crucial role in many fields, including robotics, autonomous driving, and computer vision. LiDAR-based retrieval from a database typically incurs high computation storage costs and can lead to globally inaccurate pose estimations if the database is too sparse. On the other hand, pose regression methods take images or point clouds as inputs and directly regress global poses in an end-to-end manner. They do not perform database matching and are more computationally efficient than retrieval techniques. We propose HypLiLoc, a new model for LiDAR pose regression. We use two branched backbones to extract 3D features and 2D projection features, respectively. We consider multi-modal feature fusion in both Euclidean and hyperbolic spaces to obtain more effective feature representations. Experimental results indicate that HypLiLoc achieves state-of-the-art performance in both outdoor and indoor datasets. We also conduct extensive ablation studies on the framework design, which demonstrate the effectiveness of multi-modal feature extraction and multi-space embedding. Our code is released at: https://github.com/sijieaaa/HypLiLoc	翻訳日:2023-04-04 15:41:16 公開日:2023-04-03
# オンボード映像からのオンラインレーングラフ抽出 Online Lane Graph Extraction from Onboard Video ( http://arxiv.org/abs/2304.00930v1 ) ライセンス: Link先を確認	Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool	(参考訳) 自動運転には、周囲の道路網の構造化された理解が必要である。そのような理解の最も一般的で有用な表現の1つは、BEVレーングラフの形でなされる。本研究では,車載カメラからの映像ストリームを用いて,周囲のレーングラフのオンライン抽出を行う。単一の画像ではなくビデオを使うことは、入力が異なる時間ステップからの情報を組み合わせることのメリットと課題の両方をもたらす。我々は3つの異なるアプローチで出現した課題を調査した。第1のアプローチは、単一フレームレーングラフの推定値を統一レーングラフにマージ可能な、後処理ステップである。第2のアプローチでは、トランスに時間的埋め込みを組み込むことで、ネットワークが最適な時間的集約戦略を発見することができる。最後に、第3の手法と提案手法は、明示的なBEV投影とフレームワイド特徴のアライメントによる初期時間的アグリゲーションである。提案手法の単一モデルでは、1つを含む任意の画像を処理して正確なレーングラフを生成することができる。ヌースセンおよびアルゴバースデータセットを用いた実験では,提案手法の優越性を強調しながら,すべてのアプローチの有効性を示す。コードは公開されます。 Autonomous driving requires a structured understanding of the surrounding road network to navigate. One of the most common and useful representation of such an understanding is done in the form of BEV lane graphs. In this work, we use the video stream from an onboard camera for online extraction of the surrounding's lane graph. Using video, instead of a single image, as input poses both benefits and challenges in terms of combining the information from different timesteps. We study the emerged challenges using three different approaches. The first approach is a post-processing step that is capable of merging single frame lane graph estimates into a unified lane graph. The second approach uses the spatialtemporal embeddings in the transformer to enable the network to discover the best temporal aggregation strategy. Finally, the third, and the proposed method, is an early temporal aggregation through explicit BEV projection and alignment of framewise features. A single model of this proposed simple, yet effective, method can process any number of images, including one, to produce accurate lane graphs. The experiments on the Nuscenes and Argoverse datasets show the validity of all the approaches while highlighting the superiority of the proposed method. 
The code will be made public.	翻訳日:2023-04-04 15:41:01 公開日:2023-04-03
# オンライン第三者追跡による二酸化炭素排出量の定量化 Quantifying Carbon Emissions due to Online Third-Party Tracking ( http://arxiv.org/abs/2304.00927v1 ) ライセンス: Link先を確認	Michalis Pachilakis, Savino Dambra, Iskander Sanchez-Rola, Leyla Bilge	(参考訳) 過去10年間で、地球温暖化はいくつかの話題を巻き起こし、世界中の注目を集めた。炭素フットプリントは温室効果ガスの排出量を増加させ、惑星の温度上昇をもたらす主要な要因である。公共の注目は、輸送、食品消費、家庭用活動による二酸化炭素排出量の削減に向けられているが、オンライン活動によるCO2eq排出量の寄与は無視されている。現在の情報化時代には、オンラインでのブラウジングに多くの時間を費やしています。この活動は電気を消費し、その電気がCO2eqを生み出す。ウェブサイトのブラウジングは温室効果ガスの発生に寄与するが、インターネットが環境に与える影響は、Web追跡の実践によってさらに悪化している。実際、ほとんどのウェブページは、主に広告、データ分析、ユーザビリティの改善に使用されるトラッキング用コンテンツを大量に読み込んでいる。この余分なコンテンツは大きなデータ伝送を意味し、その結果、電力消費が増加し、温室効果ガスの排出が増加する。本研究では,Webのトラッキングによるオーバーヘッドに着目し,そのネットワークと炭素のフットプリントを解析する。 10万人のユーザのブラウジングテレメトリと270万のWebサイトをクロールする実験結果を利用することで、Web追跡はデータ送信を21%以上増加させることがわかり、これは毎年大気中の温室効果ガスの約11 Mtの追加放出を示唆する。このような寄与は決して無視できるものではなく、肉の生産、輸送、さらには暗号通貨採掘といった現代の生活の多くの活動に匹敵するものである。また、異なる国、ウェブサイトカテゴリー、追跡機関のフットプリントを考慮すると、少数のアクターが他よりもはるかに大きく寄与しているという、大きな不平等が存在することも明らかにした。 In the past decade, global warming made several headlines and turned the attention of the whole world to it. Carbon footprint is the main factor that drives greenhouse emissions up and results in the temperature increase of the planet with dire consequences. While the attention of the public is turned to reducing carbon emissions by transportation, food consumption and household activities, we ignore the contribution of CO2eq emissions produced by online activities. In the current information era, we spend a large part of our days browsing online. This activity consumes electricity, which in turn produces CO2eq. While website browsing contributes to the production of greenhouse gas emissions, the impact of the Internet on the environment is further exacerbated by the web-tracking practice. Indeed, most webpages are heavily loaded by tracking content used mostly for advertising, data analytics and usability improvements. 
This extra content implies big data transmissions which results in higher electricity consumption and thus higher greenhouse gas emissions. In this work, we focus on the overhead caused by web tracking and analyse both its network and carbon footprint. By leveraging the browsing telemetry of 100k users and the results of a crawling experiment of 2.7M websites, we find that web tracking increases data transmissions upwards of 21%, which in turn implies the additional emission of around 11 Mt of greenhouse gases in the atmosphere every year. We find such contribution to be far from negligible, and comparable to many activities of modern life, such as meat production, transportation, and even cryptocurrency mining. Our study also highlights that there exist significant inequalities when considering the footprint of different countries, website categories, and tracking organizations, with a few actors contributing to a much greater extent than the remaining ones.	翻訳日:2023-04-04 15:40:46 公開日:2023-04-03
# Abstraqt: 抽象安定化器シミュレーションによる量子回路の解析 Abstraqt: Analysis of Quantum Circuits via Abstract Stabilizer Simulation ( http://arxiv.org/abs/2304.00921v1 ) ライセンス: Link先を確認	Benjamin Bichsel, Maximilian Baader, Anouk Paradis, Martin Vechev	(参考訳) 安定化器シミュレーションはクリフォードゲートのみからなる量子回路の重要なクラスを効率的にシミュレートすることができる。しかし、このシミュレーションの非クリフォードゲートを含む任意の量子回路への既存の拡張はすべて指数関数的ランタイムに苦しむ。本研究では、精度の低下と引き換えに、任意の量子回路上での効率的な安定化器シミュレーションを可能にする新しい手法を提示することで、この問題に対処する。私たちのキーとなるアイデアは、量子状態を指数的な数の項の和として表す表現を、(少なくとも)出現するすべての被加数をカバーする単一の抽象的な被加数に圧縮することです。これにより,クリフォードゲート,非クリフォードゲート,(内部)計測などの回路操作の効果を過剰近似することにより,抽象的な被加数を効率的に操作できる抽象安定化器シミュレータを導入することができる。我々はAbstraqtと呼ばれるツールに抽象シミュレータを実装し、既存の手法では扱えない回路特性を立証できることを実験的に実証した。 Stabilizer simulation can efficiently simulate an important class of quantum circuits consisting exclusively of Clifford gates. However, all existing extensions of this simulation to arbitrary quantum circuits including non-Clifford gates suffer from an exponential runtime. In this work, we address this challenge by presenting a novel approach for efficient stabilizer simulation on arbitrary quantum circuits, at the cost of lost precision. Our key idea is to compress an exponential sum representation of the quantum state into a single abstract summand covering (at least) all occurring summands. This allows us to introduce an abstract stabilizer simulator that efficiently manipulates abstract summands by over-abstracting the effect of circuit operations including Clifford gates, non-Clifford gates, and (internal) measurements. We implemented our abstract simulator in a tool called Abstraqt and experimentally demonstrate that Abstraqt can establish circuit properties intractable for existing techniques.	翻訳日:2023-04-04 15:40:17 公開日:2023-04-03
# ノード分類における不確実性伝播 Uncertainty Propagation in Node Classification ( http://arxiv.org/abs/2304.00918v1 ) ライセンス: Link先を確認	Zhao Xu, Carolin Lawrence, Ammar Shaker, Raman Siarheyeu	(参考訳) ニューラルネットワークの予測の不確かさの定量化が最近注目を集めている。本研究では,ノード分類作業におけるグラフニューラルネットワーク(GNN)の不確実性の測定に焦点をあてる。既存のGNNはノード間のメッセージパッシングをモデル化している。メッセージはしばしば決定論的です。メッセージに不確実性はあるか? このような不確実性を、メッセージとともにグラフ上でどのように伝播させるのか? これらの問題に対処するために,gnnをベイズモデリングフレームワークに組み込むベイズ不確実性伝播(bup)法を提案し,予測確率とメッセージの不確かさのベイズ信頼度を有するノード分類の予測不確実性をモデル化する。本手法はガウスモデルにインスパイアされた新しい不確実性伝播機構を提案する。さらに,GNNが学習手順における予測不確実性を明瞭に統合できるようにするノード分類における不確実性指向損失を提案する。その結果、予測の不確実性が大きいトレーニング例が罰せられる。予測信頼性とアウト・オブ・ディストリビューション(OOD)予測に関して,BUPを実証する。学習された不確実性も深く分析される。 ood症例における不確かさとグラフトポロジーの関係,および予測不確実性との関係を広範な実験により検討した。人気のあるベンチマークデータセットを用いた実験結果は,提案手法の優れた性能を示す。 Quantifying predictive uncertainty of neural networks has recently attracted increasing attention. In this work, we focus on measuring uncertainty of graph neural networks (GNNs) for the task of node classification. Most existing GNNs model message passing among nodes. The messages are often deterministic. Questions naturally arise: Does there exist uncertainty in the messages? How could we propagate such uncertainty over a graph together with messages? To address these issues, we propose a Bayesian uncertainty propagation (BUP) method, which embeds GNNs in a Bayesian modeling framework, and models predictive uncertainty of node classification with Bayesian confidence of predictive probability and uncertainty of messages. Our method proposes a novel uncertainty propagation mechanism inspired by Gaussian models. Moreover, we present an uncertainty oriented loss for node classification that allows the GNNs to clearly integrate predictive uncertainty in learning procedure. Consequently, the training examples with large predictive uncertainty will be penalized. We demonstrate the BUP with respect to prediction reliability and out-of-distribution (OOD) predictions. The learned uncertainty is also analyzed in depth. 
The relations between uncertainty and graph topology, as well as predictive uncertainty in the OOD cases are investigated with extensive experiments. The empirical results with popular benchmark datasets demonstrate the superior performance of the proposed method.	翻訳日:2023-04-04 15:40:03 公開日:2023-04-03
# 拡散橋の混合輸送, schr\"odinger bridge問題と生成モデル Diffusion Bridge Mixture Transports, Schr\"odinger Bridge Problems and Generative Modeling ( http://arxiv.org/abs/2304.00917v1 ) ライセンス: Link先を確認	Stefano Peluchetti	(参考訳) 動的schr\"odinger bridge問題(英語版)は、2つの目標確率測度間の移動を定義する確率過程を求め、クルバック・リーバーの発散の観点から最接近の基準を最適に満たしている。本稿では,動的schr\"odinger bridge問題を解くために,新しいサンプリングベース反復アルゴリズムである反復拡散橋混合輸送 (idbm) を提案する。 IDBM手順は、各ステップにおける目標測度間の有効な結合を実現するという魅力的な性質を示す。我々はIDBM手順に関する最初の理論的研究を行い、その収束特性を確立した。理論的な結果は、様々な応用におけるIDBM手順の競合性能を実証する多数の数値実験によって補完される。生成モデリングの最近の進歩は、拡散過程の時間反転を用いて、単純な分布をデータ分布に大まかに輸送する生成過程を定義する。代替案として, idbm手順の第1イテレーションを, このトランスポートを実現する近似フリー手法として用いることを提案する。このアプローチは生成過程のダイナミクスを選択する際の柔軟性を高め、より長い離散化間隔でより高速なトレーニングと優れたサンプル品質を示す。実装面では、必要な修正は最小限の侵入的であり、生成サンプリングに必要な変更はなく、トレーニング損失計算に限定される。 The dynamic Schr\"odinger bridge problem seeks a stochastic process that defines a transport between two target probability measures, while optimally satisfying the criteria of being closest, in terms of Kullback-Leibler divergence, to a reference process. We propose a novel sampling-based iterative algorithm, the iterated diffusion bridge mixture transport (IDBM), aimed at solving the dynamic Schr\"odinger bridge problem. The IDBM procedure exhibits the attractive property of realizing a valid coupling between the target measures at each step. We perform an initial theoretical investigation of the IDBM procedure, establishing its convergence properties. The theoretical findings are complemented by numerous numerical experiments illustrating the competitive performance of the IDBM procedure across various applications. Recent advancements in generative modeling employ the time-reversal of a diffusion process to define a generative process that approximately transports a simple distribution to the data distribution. As an alternative, we propose using the first iteration of the IDBM procedure as an approximation-free method for realizing this transport. 
This approach offers greater flexibility in selecting the generative process dynamics and exhibits faster training and superior sample quality over longer discretization intervals. In terms of implementation, the necessary modifications are minimally intrusive, being limited to the training loss computation, with no changes necessary for generative sampling.	翻訳日:2023-04-04 15:39:44 公開日:2023-04-03
# DreamAvatar: 拡散モデルによる3次元人体アバター生成 DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models ( http://arxiv.org/abs/2304.00916v1 ) ライセンス: Link先を確認	Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong	(参考訳) 筆者はdreamavatarという,高品質な3dアバターを制御可能なポーズで生成するためのテキスト・アンド・シェイプガイドフレームワークを提案する。近年,テキストガイドによる3次元共通物体生成の手法が提案されているが,人体の形状・ポーズ・外観が複雑化しているため,高品質なアバターの生成が課題となっている。この課題に対処するためにDreamAvatarを提案する。これは3Dポイントの密度と色の特徴を予測するためのトレーニング可能なNeRFと、2Dセルフスーパービジョンを提供するための事前訓練されたテキスト-画像拡散モデルである。具体的には、SMPLモデルを利用して、生成のための粗いポーズと形状ガイダンスを提供する。我々は、標準空間と観測空間からなる双対空間設計を導入する。これは、学習可能な変形場によってNeRFを介して関連付けられ、最適化されたテクスチャと幾何を標準空間から目標とするアバターへ転送することができる。さらに,より詳細な形状とテクスチャを持ったより鮮明な生成を可能にするために,正規性正規化を利用する。広範な評価を通じて,DreamAvatarは既存の手法を著しく上回り,テキスト・アンド・シェイプ3次元世代のための新しい最先端技術を確立した。 We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. While encouraging results have been produced by recent methods on text-guided 3D common object generation, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. We propose DreamAvatar to tackle this challenge, which utilizes a trainable NeRF for predicting density and color features for 3D points and a pre-trained text-to-image diffusion model for providing 2D self-supervision. Specifically, we leverage SMPL models to provide rough pose and shape guidance for the generation. We introduce a dual space design that comprises a canonical space and an observation space, which are related by a learnable deformation field through the NeRF, allowing for the transfer of well-optimized texture and geometry from the canonical space to the target posed avatar. Additionally, we exploit a normal-consistency regularization to allow for more vivid generation with detailed geometry and texture. 
Through extensive evaluations, we demonstrate that DreamAvatar significantly outperforms existing methods, establishing a new state-of-the-art for text-and-shape guided 3D human generation.	翻訳日:2023-04-04 15:39:22 公開日:2023-04-03
# $e^{-i\pi n/2}\nabla_x^{n}\Psi =(E-\Delta(x))\Psi$ における束縛状態の量子化条件 Quantization Condition of the Bound States in $e^{-i\pi n/2}\nabla_x^{n}\Psi =(E-\Delta(x))\Psi$ ( http://arxiv.org/abs/2304.00914v1 ) ライセンス: Link先を確認	Xiong Fan	(参考訳) 方程式 $e^{-i\pi n/2}\nabla_x^{n}\Psi =[E-\Delta (x)]\Psi$ のポテンシャル井戸内の束縛状態に対する一般的な近似量子化規則 $\int_{L_{E}}^{R_{E}}k_0\,dx=(N+\frac{1}{2})\pi$ を証明する。ここで $k_0=(E-\Delta )^{1/n}$、$N\in\mathbb{N}_{0}$、$n$ は偶数の自然数であり、$L_{E}$ と $R_{E}$ は古典的禁止領域と許容領域の境界点である。唯一の仮定は、指数関数的に増大する成分がすべて無視できることであり、これは狭くない井戸に対して妥当である。 Schr\"{o}dinger 方程式や Bogoliubov-de Gennes 方程式を含む応用について論じる。 We will prove a general approximate quantization rule $\int_{L_{E}}^{R_{E}}k_0\,dx=(N+\frac{1}{2})\pi$ for the bound states in the potential well of the equations $e^{-i\pi n/2}\nabla_x^{n}\Psi =[E-\Delta (x)]\Psi$, where $k_0=(E-\Delta )^{1/n}$ with $N\in\mathbb{N}_{0}$, $n$ is an even natural number, and $L_{E}$ and $R_{E}$ are the boundary points between the classically forbidden regions and the allowed region. The only hypothesis is that all exponentially growing components are negligible, which is appropriate for wells that are not narrow. Applications including the Schr\"{o}dinger equation and Bogoliubov-de Gennes equation will be discussed.	翻訳日:2023-04-04 15:38:56 公開日:2023-04-03
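As a hedged sanity check of the quantization rule quoted above, the sketch below takes the simplest case n = 2, where the equation reduces to $-\Psi'' + \Delta(x)\Psi = E\Psi$, and assumes a harmonic well $\Delta(x)=x^2$ (this choice of well, and the helper names `action_integral`, `harmonic` and `quantized_energy`, are illustrative assumptions, not taken from the paper). For this well the integral equals $\pi E/2$, so the rule predicts $E = 2N+1$, which matches the known exact spectrum of this operator.

```python
import math


def action_integral(energy, potential, left, right, steps=4000):
    """Midpoint-rule estimate of the integral of k_0 = sqrt(E - Delta(x)), i.e. n = 2."""
    h = (right - left) / steps
    total = 0.0
    for i in range(steps):
        x = left + (i + 0.5) * h
        # Clamp at zero so rounding near the turning points cannot go negative.
        total += math.sqrt(max(energy - potential(x), 0.0))
    return total * h


def harmonic(x):
    """Assumed example well: Delta(x) = x^2."""
    return x * x


def quantized_energy(level, lo=1e-6, hi=100.0, tol=1e-8):
    """Solve integral(k_0 dx) = (N + 1/2) * pi for E by bisection."""
    target = (level + 0.5) * math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        turning = math.sqrt(mid)  # turning points are at x = -sqrt(E) and +sqrt(E)
        if action_integral(mid, harmonic, -turning, turning) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


energies = [quantized_energy(n) for n in range(3)]
```

The bisection relies on the integral growing monotonically with E, which holds for this single-well example; other wells would need their own turning-point search.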
# ランダム関数型ニューラルネットワークの特性と応用の可能性 Properties and Potential Applications of Random Functional-Linked Types of Neural Networks ( http://arxiv.org/abs/2304.00957v1 ) ライセンス: Link先を確認	Guang-Yong Chen, Yong-Hang Yu, Min Gan, C. L. Philip Chen, Wenzhong Guo	(参考訳) ランダム関数リンク型ニューラルネットワーク(RFLNN)、例えば、極端学習機械(ELM)と広範学習システム(BLS)は、時間を要するトレーニングプロセスに苦しむことを回避し、深層構造における学習の代替手段を提供する。 RFLNNは様々な分類と回帰タスクで優れた性能を達成しているが、これらのネットワークの性質と説明は以前の研究では無視されている。本稿では、RFLNNの特性について、周波数領域の観点から考察し、これらのネットワークにおける周波数原理の存在、すなわち、低周波成分を優先的に素早く捉え、その後トレーニングの過程で高周波成分に適合することを見出した。これらの発見は、RFLNNの理解と応用拡大に有用である。周波数原理によって導かれ、より優れた性能でBLSネットワークを生成する方法を提案し、ヤコビ反復法とBLSネットワークに現れる異なる周波数原理の観点から、ポアソン方程式を解くための効率的なアルゴリズムを設計する。 Random functional-linked types of neural networks (RFLNNs), e.g., the extreme learning machine (ELM) and broad learning system (BLS), which avoid suffering from a time-consuming training process, offer an alternative way of learning in deep structure. The RFLNNs have achieved excellent performance in various classification and regression tasks; however, the properties and explanations of these networks are ignored in previous research. This paper gives some insights into the properties of RFLNNs from the viewpoint of the frequency domain, and discovers the presence of the frequency principle in these networks, that is, they preferentially capture low frequencies quickly and then fit the high-frequency components during the training process. These findings are valuable for understanding the RFLNNs and expanding their applications. Guided by the frequency principle, we propose a method to generate a BLS network with better performance, and design an efficient algorithm for solving Poisson's equation in view of the different frequency principle presenting in the Jacobi iterative method and BLS network.	翻訳日:2023-04-04 15:32:06 公開日:2023-04-03
# AirLoc: オブジェクトベースの屋内再ローカライゼーション AirLoc: Object-based Indoor Relocalization ( http://arxiv.org/abs/2304.00954v1 ) ライセンス: Link先を確認	Aryan, Bowen Li, Sebastian Scherer, Yun-Jou Lin, Chen Wang	(参考訳) 屋内再ローカライズは、自律探索のようなロボットのタスクと、ショッピングモールでの携帯電話によるナビゲーションのような民間用途の両方に不可欠である。従来の手法では、キーポイントの特徴や局所的なテクスチャなどの幾何学的情報を用いて屋内再局在を行うが、視覚的に類似したシーンを持つ環境では容易に失敗するか、多くのデータベースイメージを必要とする。人間がユニークなランドマークを認識して場所を覚えているという事実にインスパイアされた私たちは、幾何学的要素よりも有益である物体に頼る。そこで本研究では,AirLocと呼ばれるシンプルなオブジェクトベース屋内再配置手法を提案する。オブジェクト再識別とオブジェクト関係の記憶という重要な課題を克服するために,オブジェクトの外観の埋め込みとオブジェクト間の幾何学的関係を抽出する。幾何学的特徴と外観特徴を統合して累積的なシーン特徴を生成する。その結果、ロバストで正確でポータブルな屋内再局在システムとなり、室内レベルの再局在における最先端の手法を9.5%、精度7%で上回る結果となった。徹底的な評価に加えて, 重度の咬合, 知覚的エイリアス, 視点シフト, 変形などの課題において, 気流のロバスト性を示す実世界テストも実施する。 Indoor relocalization is vital for both robotic tasks like autonomous exploration and civil applications such as navigation with a cell phone in a shopping mall. Some previous approaches adopt geometrical information such as key-point features or local textures to carry out indoor relocalization, but they either easily fail in an environment with visually similar scenes or require many database images. Inspired by the fact that humans often remember places by recognizing unique landmarks, we resort to objects, which are more informative than geometry elements. In this work, we propose a simple yet effective object-based indoor relocalization approach, dubbed AirLoc. To overcome the critical challenges of object reidentification and remembering object relationships, we extract object-wise appearance embedding and inter-object geometric relationships. The geometry and appearance features are integrated to generate cumulative scene features. This results in a robust, accurate, and portable indoor relocalization system, which outperforms the state-of-the-art methods in room-level relocalization by 9.5% of PR-AUC and 7% of accuracy. 
In addition to exhaustive evaluation, we also carry out real-world tests, where AirLoc shows robustness in challenges like severe occlusion, perceptual aliasing, viewpoint shift, and deformation.	翻訳日:2023-04-04 15:31:47 公開日:2023-04-03
# バイナリニューラルネットワークにおけるデータフローの最適化 Optimizing data-flow in Binary Neural Networks ( http://arxiv.org/abs/2304.00952v1 ) ライセンス: Link先を確認	L. Vorabbi, D. Maltoni, S. Santi	(参考訳) バイナリニューラルネットワーク(BNN)は、高価な浮動小数点演算をビット演算に置き換えることで、ニューラルネットワークの推論時間を著しく加速することができる。しかし、既存のソリューションの多くはBNN層のデータフローを完全に最適化していないため、1ビットから16/32ビットへの中間変換は効率を損なうことが多い。我々は,BNNパイプラインにおけるデータフローと並列性を向上する新たなトレーニング手法を提案し,具体的には,データ幅を32ビットから8ビットに削減するクリッピングブロックを提案する。さらに、通常32ビットで保持されるバイナリ層の内部アキュムレータのサイズを小さくし、精度を損なうことなくデータのオーバーフローを防止する。さらに、レイテンシを低減し、デプロイを簡単にするBatch Normalizationレイヤの最適化も提供しています。最後に、ARM命令セットに対するバイナリ直接変換の最適化実装を提案する。実験の結果,少なくとも1つの完全精度モデルに対して精度を低下させることなく,推論速度を一貫した改善(最先端の2つのBNNフレームワークと比較して最大1.91と2.73倍)した。 Binary Neural Networks (BNNs) can significantly accelerate the inference time of a neural network by replacing its expensive floating-point arithmetic with bitwise operations. Most existing solutions, however, do not fully optimize data flow through the BNN layers, and intermediate conversions from 1 to 16/32 bits often further hinder efficiency. We propose a novel training scheme that can increase data flow and parallelism in the BNN pipeline; specifically, we introduce a clipping block that decreases the data-width from 32 bits to 8. Furthermore, we reduce the internal accumulator size of a binary layer, usually kept using 32-bit to prevent data overflow without losing accuracy. Additionally, we provide an optimization of the Batch Normalization layer that both reduces latency and simplifies deployment. Finally, we present an optimized implementation of the Binary Direct Convolution for ARM instruction sets. Our experiments show a consistent improvement of the inference speed (up to 1.91 and 2.73x compared to two state-of-the-art BNNs frameworks) with no drop in accuracy for at least one full-precision model.	翻訳日:2023-04-04 15:31:25 公開日:2023-04-03
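The abstract above mentions replacing floating-point arithmetic with bitwise operations and clipping intermediate values to 8 bits. The following is a minimal sketch of that general idea only: it shows the standard XOR-and-popcount form of a dot product between two ±1 vectors, followed by a saturating clip to the signed 8-bit range. The function names and the choice of the range [-128, 127] are assumptions for illustration; the paper's actual clipping block, accumulator sizing and ARM kernels are not reproduced here.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an integer, one bit each (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, length):
    """Dot product of two +/-1 vectors computed from their packed bits.

    Positions where the bits differ contribute -1 and matching positions
    contribute +1, so the result is length - 2 * (number of mismatches).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return length - 2 * mismatches


def clip_to_int8(value):
    """Saturate an accumulator value to the signed 8-bit range (assumed range)."""
    return max(-128, min(127, value))


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
packed_result = binary_dot(pack_bits(a), pack_bits(b), len(a))
naive_result = sum(x * y for x, y in zip(a, b))
```

Here `packed_result` and `naive_result` agree, which is the property that lets a binary layer swap multiply-accumulate for bit operations.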
# 人工樹状体計算 : 神経形回路における樹状体の場合 Artificial Dendritic Computation: The case for dendrites in neuromorphic circuits ( http://arxiv.org/abs/2304.00951v1 ) ライセンス: Link先を確認	Daniel John Mannion, Anthony John Kenyon	(参考訳) バイオインスパイアされたコンピューティングは、ニューロンとシナプスに焦点を当て、大きな成功を収めている。しかし、これらをつなぐデンドライトも重要な役割を担っている。本稿では,デンドリティック計算を複製する動機について検討し,その構築における今後の試みを導く枠組みを提案する。このフレームワークはデンドライトの重要な性質を特定し,音像定位処理におけるデンドライト計算の例を示す。我々は,BiLSTMニューラルネットワークの性能に及ぼすデンドライトの影響を評価し,デンドライト前処理がしきい値性能に必要なネットワークサイズを減らすことを発見した。 Bio-inspired computing has focused on neurons and synapses with great success. However, the connections between these, the dendrites, also play an important role. In this paper, we investigate the motivation for replicating dendritic computation and present a framework to guide future attempts in their construction. The framework identifies key properties of the dendrites and presents an example of dendritic computation in the task of sound localisation. We evaluate the impact of dendrites on a BiLSTM neural network's performance, finding that dendrite pre-processing reduces the size of network required for a threshold performance.	翻訳日:2023-04-04 15:31:09 公開日:2023-04-03
# 複数の産業エンティティの半自動コンピュータビジョンによる追跡 -フレームワークとデータセット作成アプローチ- Semi-Automated Computer Vision based Tracking of Multiple Industrial Entities -- A Framework and Dataset Creation Approach ( http://arxiv.org/abs/2304.00950v1 ) ライセンス: Link先を確認	J\'er\^ome Rutinowski, Hazem Youssef, Sven Franke, Irfan Fachrudin Priyanta, Frederik Polachowski, Moritz Roidl, Christopher Reining	(参考訳) この貢献は、産業エンティティ(例えばパレット、クレート、バレル)を6台のrgbカメラのネットワーク上で連続的に追跡するためのフレームワークであるtomie framework (tracking of multiple industrial entities)を提示している。このフレームワークは、複数のセンサー、データパイプライン、データアノテーション手順を使用しており、このコントリビューションで詳細に説明されている。産業部門のための完全自動化トラッキングシステムのビジョンを念頭に置いて、研究者は産業環境で効率的に高品質なデータをキャプチャできる。このフレームワークを使用すると、画像データセットであるTOMIEデータセットが作成され、同時にフレームワークの妥当性を評価するために使用される。このデータセットには、112,860フレームのアノテーションファイルと640,936のエンティティインスタンスが含まれている。このデータセットは、同等のデータセットを4倍にスケールし、ウェアハウスセクターの産業アプリケーションから引き出されたシナリオで構成されている。このデータセットにはByteTrack、Bot-Sort、SiamMOTという3つのトラッキングアルゴリズムが適用される。 This contribution presents the TOMIE framework (Tracking Of Multiple Industrial Entities), a framework for the continuous tracking of industrial entities (e.g., pallets, crates, barrels) over a network of, in this example, six RGB cameras. This framework, makes use of multiple sensors, data pipelines and data annotation procedures, and is described in detail in this contribution. With the vision of a fully automated tracking system for industrial entities in mind, it enables researchers to efficiently capture high quality data in an industrial setting. Using this framework, an image dataset, the TOMIE dataset, is created, which at the same time is used to gauge the framework's validity. This dataset contains annotation files for 112,860 frames and 640,936 entity instances that are captured from a set of six cameras that perceive a large indoor space. This dataset out-scales comparable datasets by a factor of four and is made up of scenarios, drawn from industrial applications from the sector of warehousing. 
Three tracking algorithms, namely ByteTrack, Bot-Sort, and SiamMOT, are applied to this dataset, serving as a proof of concept and providing tracking results that are comparable to the state of the art.	翻訳日:2023-04-04 15:31:00 公開日:2023-04-03
# VTAE:マニフォールド学習を用いた変分変換器オートエンコーダ VTAE: Variational Transformer Autoencoder with Manifolds Learning ( http://arxiv.org/abs/2304.00948v1 ) ライセンス: Link先を確認	Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Dacheng Tao, Xuelong Li	(参考訳) 深層生成モデルは、複数の潜伏変数を通して非線形データ分布を学習する成功例を示し、これらのモデルは潜伏サンプルをデータ空間にマッピングするために非線形関数(ジェネレータ)を使用する。一方、ジェネレータの非線形性は、潜在空間がデータ空間の不満足な投影を示し、表現学習が不十分であることを意味する。しかし、この弱射影はリーマン計量によって対処することができ、リーマン多様体上のデータサンプル間の測地計算と正確な補間が、深い生成モデルの性能を大幅に改善できることを示す。本稿では、リーマン多様体上の測地線を最小化し、表現学習を改善するために、変分空間変換オートエンコーダ(VTAE)を提案する。特に,空間変換器を符号化した変分オートエンコーダを慎重に設計し,潜在変数モデルをリーマン多様体上のデータに明示的に拡張し,大域的文脈モデリングを実現する。さらに, 2つの異なる対象の潜在表現間を横断しながら, 滑らかで妥当な補間を行うため, 性能の劣る線形補間を用いる既存モデルとは異なる測地補間ネットワークを提案する。ベンチマーク実験により,画像補間や再構成を含む様々なコンピュータビジョンタスクに対して,提案モデルにより予測精度と汎用性を向上できることが示された。 Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables and these models use a nonlinear function (generator) to map latent samples into the data space. On the other hand, the nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. This weak projection, however, can be addressed by a Riemannian metric, and we show that geodesics computation and accurate interpolations between data samples on the Riemannian manifold can substantially improve the performance of deep generative models. In this paper, a Variational spatial-Transformer AutoEncoder (VTAE) is proposed to minimize geodesics on a Riemannian manifold and improve representation learning. In particular, we carefully design the variational autoencoder with an encoded spatial-Transformer to explicitly expand the latent variable model to data on a Riemannian manifold, and obtain global context modelling. 
Moreover, to have smooth and plausible interpolations while traversing between two different objects' latent representations, we propose a geodesic interpolation network different from the existing models that use linear interpolation with inferior performance. Experiments on benchmarks show that our proposed model can improve predictive accuracy and versatility over a range of computer vision tasks, including image interpolations, and reconstructions.	翻訳日:2023-04-04 15:30:38 公開日:2023-04-03
# RePAST:相対ポーズ注意シーン表現トランスフォーマ RePAST: Relative Pose Attention Scene Representation Transformer ( http://arxiv.org/abs/2304.00947v1 ) ライセンス: Link先を確認	Aleksandr Safin, Daniel Durckworth, Mehdi S. M. Sajjadi	(参考訳) SRT(Scene Representation Transformer)はインタラクティブなレートで新しいビューを描画する手法である。 SRTは任意に選択された参照カメラに対してカメラポーズを使用するため、入力ビューの順序に不変ではない。その結果、SRTは参照フレームを定期的に変更する必要がある大規模シーンには直接適用できない。本研究では,入力に基準フレームを固定する代わりに,ペアごとの相対カメラポーズ情報をトランスフォーマの注意機構に直接注入する相対ポーズ注意SRT(RePAST)を提案する。これは定義上、任意のグローバル参照フレームの選択に不変でありながら、元のメソッドの完全な能力を保っているモデルにつながる。経験的な結果は、モデルにこの不変性を加えても品質が低下しないことを示している。これは、完全に潜在表現に基づくトランスフォーマーベースのレンダリング方法を大規模シーンに適用するためのステップであると考えています。 The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.	翻訳日:2023-04-04 15:30:14 公開日:2023-04-03
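The invariance claim in the abstract above rests on a simple algebraic fact: a pairwise relative pose $T_i^{-1}T_j$ does not change when every camera pose is re-expressed in a different global frame $G$, because $(GT_i)^{-1}(GT_j)=T_i^{-1}T_j$. The sketch below checks this numerically. It is only an illustration under stated assumptions: poses are simplified to 2D rigid transforms, all function names are hypothetical, and nothing here shows how RePAST actually feeds relative poses into attention.

```python
import math


def pose(x, y, theta):
    """3x3 homogeneous matrix for a 2D rigid pose (a simplification of 3D camera poses)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse(t):
    """Inverse of a rigid transform: transpose the rotation, rotate the translation back."""
    r = [[t[0][0], t[1][0]], [t[0][1], t[1][1]]]
    tx = -(r[0][0] * t[0][2] + r[0][1] * t[1][2])
    ty = -(r[1][0] * t[0][2] + r[1][1] * t[1][2])
    return [[r[0][0], r[0][1], tx], [r[1][0], r[1][1], ty], [0.0, 0.0, 1.0]]


def relative_pose(t_i, t_j):
    """Pose of camera j expressed in the frame of camera i."""
    return matmul(inverse(t_i), t_j)


cam_a = pose(1.0, 2.0, 0.3)
cam_b = pose(-0.5, 4.0, 1.1)
new_frame = pose(10.0, -3.0, 2.0)  # arbitrary change of global reference frame

before = relative_pose(cam_a, cam_b)
after = relative_pose(matmul(new_frame, cam_a), matmul(new_frame, cam_b))
max_diff = max(abs(before[i][j] - after[i][j]) for i in range(3) for j in range(3))
```

`max_diff` stays at floating-point noise level, whereas the absolute poses themselves change substantially under `new_frame`.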
# MoLo:Few-shot行動認識のためのモーション強化ロングショートコントラスト学習 MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition ( http://arxiv.org/abs/2304.00946v1 ) ライセンス: Link先を確認	Xiang Wang, Shiwei Zhang, Zhiwu Qing, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang	(参考訳) 学習した視覚特徴のフレームレベルでのマッチングを行うことで、有望な性能を実現するための最先端のアクション認識手法しかし、一般的には2つの制限がある。一長期的時間的知覚を強制する指導の欠如により、局所的フレーム間の一致手続が不正確になる傾向があること。二明示的な動作学習は、通常無視され、部分的な情報を失うこと。これらの問題に対処するために、長短コントラスト目標と運動オートデコーダを含む2つの重要なコンポーネントを含む運動強化長短コントラスト学習法(MoLo)を開発した。特に、ロングショートのコントラストの目的は、同じクラスに属するビデオのグローバルトークンとの合意を最大化することで、ロングフォームな時間認識を伴うローカルフレームの特徴を付与することである。 motion autodecoderは、異なる特徴からピクセルの動きを再構築する軽量なアーキテクチャで、ネットワークにモーションダイナミクスを明示的に組み込む。これにより、MoLoは、広範囲の時間的コンテキストとモーションキューを同時に学習し、包括的な数ショットマッチングを行うことができる。提案手法の有効性を示すために,MoLoを5つの標準ベンチマークで評価し,MoLoが最近の先進的手法よりも良好に優れていることを示す。ソースコードはhttps://github.com/alibaba-mmai-research/moloで入手できる。 Current state-of-the-art approaches for few-shot action recognition achieve promising performance by conducting frame-level matching on learned visual features. However, they generally suffer from two limitations: i) the matching procedure between local frames tends to be inaccurate due to the lack of guidance to force long-range temporal perception; ii) explicit motion learning is usually ignored, leading to partial information loss. To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder. Specifically, the long-short contrastive objective is to endow local frame features with long-form temporal awareness by maximizing their agreement with the global token of videos belonging to the same class. The motion autodecoder is a lightweight architecture to reconstruct pixel motions from the differential features, which explicitly embeds the network with motion dynamics. 
By this means, MoLo can simultaneously learn long-range temporal context and motion cues for comprehensive few-shot matching. To demonstrate the effectiveness, we evaluate MoLo on five standard benchmarks, and the results show that MoLo favorably outperforms recent advanced methods. The source code is available at https://github.com/alibaba-mmai-research/MoLo.	翻訳日:2023-04-04 15:29:59 公開日:2023-04-03
# VCR修復の教訓: カリフォルニア州消費者プライバシ法(CCPA)によるAndroidアプリ開発者のコンプライアンス Lessons in VCR Repair: Compliance of Android App Developers with the California Consumer Privacy Act (CCPA) ( http://arxiv.org/abs/2304.00944v1 ) ライセンス: Link先を確認	Nikita Samarin, Shayna Kothari, Zaina Siyed, Oscar Bjorkman, Reena Yuan, Primal Wijesekera, Noura Alomar, Jordan Fischer, Chris Hoofnagle and Serge Egelman	(参考訳) カリフォルニア州消費者プライバシ法(CCPA)は、カリフォルニア州住民に幅広いプライバシー保護と権利を付与している。当社は,androidアプリ開発者がccpaの規定に準拠している程度を調査し,消費者がビジネス目的や商業目的のために収集,利用,共有した個人情報を開示することにより,消費者に正確なプライバシー通知を提供し,"検証可能な消費者要求"(vcrs)に対応するように要求する。私たちは、CCPAに従わなければならない109のアプリケーションの実際のネットワークトラフィックと、アプリがプライバシポリシで収集したデータと、アプリ開発者に提出した"知る権利"要求に対する応答に含まれるデータを比較しました。当社の要求に即応した69人のアプリ開発者のうち、特定の個人情報(分類情報のみではなく)を提供したのは1人だけだった。しかし、識別子(55のアプリ、80%)、位置情報データ(21のアプリ、30%)、知覚データ(18のアプリ、26%)など、開示されていない情報のかなりの割合が収集された。我々は、アプリ開発者が"知る権利"要求やその他の関連する規則に従うのに役立つCCPAの改善について議論する。 The California Consumer Privacy Act (CCPA) provides California residents with a range of enhanced privacy protections and rights. Our research investigated the extent to which Android app developers comply with the provisions of the CCPA that require them to provide consumers with accurate privacy notices and respond to "verifiable consumer requests" (VCRs) by disclosing personal information that they have collected, used, or shared about consumers for a business or commercial purpose. We compared the actual network traffic of 109 apps that we believe must comply with the CCPA to the data that apps state they collect in their privacy policies and the data contained in responses to "right to know" requests that we submitted to the app's developers. Of the 69 app developers who substantively replied to our requests, all but one provided specific pieces of personal data (as opposed to only categorical information). 
However, a significant percentage of apps collected information that was not disclosed, including identifiers (55 apps, 80%), geolocation data (21 apps, 30%), and sensory data (18 apps, 26%) among other categories. We discuss improvements to the CCPA that could help app developers comply with "right to know" requests and other related regulations.	翻訳日:2023-04-04 15:29:36 公開日:2023-04-03
# 半局所結合型ポテンシャルエネルギー面を用いた機械反応研究のためのマルチスケールプロトコル Multi-scale Protocol for Mechanistic Reaction Studies Using Semi-local Fitted Potential Energy Surfaces ( http://arxiv.org/abs/2304.00942v1 ) ライセンス: Link先を確認	Tomislav Piskor, Peter Pinski, Thilo Mast, Vladimir V. Rybkin	(参考訳) 本研究では,化学反応機構の日常的理論的研究のためのマルチスケールプロトコルを提案する。安価な電子構造法により駆動されるNudged-Elastic Band (NEB) 法を用いて, 本システムの初期反応経路をサンプリングした。経路上の一組の点に対するより正確な電子構造理論で再計算された力は、半局所反応性ポテンシャルエネルギー表面(PES)を生成するための機械学習技術(この場合、対称勾配領域機械学習またはsGDML)を装着し、反応体、生成物、遷移状態(TS)領域を受け入れる。このアプローチは単分子(エンジインのベルグマン環化)と双分子(S$_\text{N}$2置換)反応にうまく適用されている。特に, 正確な参照法(casscfとccsd)を用いた50～150のエネルギー-力評価では, 静止点ジオメトリ, 固有反応-配位, バリアに対して定性的合意を与える半局所的pesを構築することが可能である。さらに, 振動周波数と反応速度係数の定性的な一致を見出した。この手法の性能の重要な側面は、計算の労力を省くだけでなく、反応経路に沿って有意義な情報を抽出することを可能にするマルチスケールな性質である。 TSの性質や計算経済によらず、このプロトコルは容易に自動化され、機械的反応の研究に日常的に利用できる。 In this work, we propose a multi-scale protocol for routine theoretical studies of chemical reaction mechanisms. The initial reaction paths of our investigated systems are sampled using the Nudged-Elastic Band (NEB) method driven by a cheap electronic structure method. Forces recalculated at the more accurate electronic structure theory for a set of points on the path are fitted with a machine-learning technique (in our case symmetric gradient domain machine learning or sGDML) to produce a semi-local reactive Potential Energy Surface (PES), embracing reactants, products and transition state (TS) regions. This approach has been successfully applied to a unimolecular (Bergman cyclization of enediyne) and a bimolecular (S$_\text{N}$2 substitution) reaction. In particular, we demonstrate that with only 50 to 150 energy-force evaluations with the accurate reference methods (here CASSCF and CCSD) it is possible to construct a semi-local PES giving qualitative agreement for stationary-point geometries, intrinsic reaction-coordinates and barriers. Furthermore, we find a qualitative agreement in vibrational frequencies and reaction rate coefficients. 
The key aspect of the method's performance is its multi-scale nature, which not only saves computational effort but also allows extracting meaningful information along the reaction path, characterized by zero gradients in all but one direction. Agnostic to the nature of the TS and computationally economic, the protocol can be readily automated and routinely used for mechanistic reaction studies.	翻訳日:2023-04-04 15:29:13 公開日:2023-04-03
# 都市景観における共同2次元3次元マルチタスク学習:3次元検出,セグメンテーション,深さ推定 Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation ( http://arxiv.org/abs/2304.00971v1 ) ライセンス: Link先を確認	Hanrong Ye	(参考訳) 本報告は、Cityscapes-3Dに基づく新しい2D-3Dマルチタスク学習ベンチマークの実装を詳述したTaskPrompterの補足文書として機能する。 TaskPrompterが学習を統一する革新的なマルチタスクプロンプトフレームワークを発表 (i)タスクジェネリック表現 (ii)タスク固有の表現、及び (iii)これらの学習目的を異なるネットワークモジュールに分離する従来のアプローチとは対照的に,クロスタスクインタラクション。この統一されたアプローチは、巧妙な経験的構造設計の必要性を低減させるだけでなく、モデル全体の能力が3つの目的を同時に最適化することに集中するため、マルチタスクネットワークの表現学習能力を大幅に向上させる。 taskprompterはcityscapes-3dデータセットに基づく新しいマルチタスクベンチマークを導入している。これは、モノクロ3d車両検出、セマンティックセグメンテーション、モノクロ深度推定の予測を同時生成するマルチタスクモデルを必要とする。これらのタスクは、特に自律運転システムの開発において、視覚シーンの2D-3Dの共同理解を達成するために不可欠である。この難解なベンチマークでは,マルチタスクモデルは,単一タスクのステート・オブ・ザ・アート法と比較して強い性能を示し,挑戦的な3次元検出と深さ推定タスクにおいて新たな最先端結果を確立する。 This report serves as a supplementary document for TaskPrompter, detailing its implementation on a new joint 2D-3D multi-task learning benchmark based on Cityscapes-3D. TaskPrompter presents an innovative multi-task prompting framework that unifies the learning of (i) task-generic representations, (ii) task-specific representations, and (iii) cross-task interactions, as opposed to previous approaches that separate these learning objectives into different network modules. This unified approach not only reduces the need for meticulous empirical structure design but also significantly enhances the multi-task network's representation learning capability, as the entire model capacity is devoted to optimizing the three objectives simultaneously. TaskPrompter introduces a new multi-task benchmark based on Cityscapes-3D dataset, which requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation. These tasks are essential for achieving a joint 2D-3D understanding of visual scenes, particularly in the development of autonomous driving systems. 
On this challenging benchmark, our multi-task model demonstrates strong performance compared to single-task state-of-the-art methods and establishes new state-of-the-art results on the challenging 3D detection and depth estimation tasks.	翻訳日:2023-04-04 15:23:09 公開日:2023-04-03
# QSARのための等角予測手法の開発と評価 Development and Evaluation of Conformal Prediction Methods for QSAR ( http://arxiv.org/abs/2304.00970v1 ) ライセンス: Link先を確認	Yuting Xu, Andy Liaw, Robert P. Sheridan, Vladimir Svetnik	(参考訳) qsar回帰モデル(quantical structure-activity relationship)は、分子記述子を用いて化合物の生物活性を予測する手法である。 QSARモデルからの予測は、例えば分子構造を最適化し、化合物をさらなる実験的試験に優先し、毒性を推定するのに役立つ。活性の正確な推定に加えて、予測に関連する不確実性(例えば、特定の確率で真の分子活性を含む予測間隔(PI)を70%、90%、95%の確率で計算することなどが好ましい。課題は、予測性能の優れた機械学習(ML)アルゴリズムの多くは、予測の不確実性を推定するためにいくつかのアドオンメソッドを必要とすることである。これらのアルゴリズムの開発は統計およびMLコミュニティによる活発な研究領域であるが、QSARモデリングの実装は限定的である。共形予測(cp)は有望なアプローチである。予測アルゴリズムと無関係であり、データ分布の弱い仮定の下で有効な予測間隔を生成することができる。我々は,Deep Neural NetworksやGradient Boosting Machinesなど,最も高度なMLモデルに適した計算効率の高いCPアルゴリズムを提案する。提案する共形予測器の有効性と効率は,QSARデータセットの多種多様な収集とシミュレーション研究で実証された。 The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting biological activities of compounds using their molecular descriptors. Predictions from QSAR models can help, for example, to optimize molecular structure; prioritize compounds for further experimental testing; and estimate their toxicity. In addition to the accurate estimation of the activity, it is highly desirable to obtain some estimate of the uncertainty associated with the prediction, e.g., calculate a prediction interval (PI) containing the true molecular activity with a pre-specified probability, say 70%, 90% or 95%. The challenge is that most machine learning (ML) algorithms that achieve superior predictive performance require some add-on methods for estimating uncertainty of their prediction. The development of these algorithms is an active area of research by statistical and ML communities but their implementation for QSAR modeling remains limited. Conformal prediction (CP) is a promising approach. It is agnostic to the prediction algorithm and can produce valid prediction intervals under some weak assumptions on the data distribution. 
We proposed computationally efficient CP algorithms tailored to the most advanced ML models, including Deep Neural Networks and Gradient Boosting Machines. The validity and efficiency of proposed conformal predictors are demonstrated on a diverse collection of QSAR datasets as well as simulation studies.	翻訳日:2023-04-04 15:22:44 公開日:2023-04-03
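To make the prediction-interval idea in the abstract above concrete, here is a minimal sketch of split (inductive) conformal prediction for regression, the textbook variant with absolute-residual scores. It is not the paper's algorithm: the toy linear model, the synthetic data and all function names are assumptions chosen so the example runs with the standard library alone. The only requirement on the model is that it produces point predictions, which is what makes the approach agnostic to the underlying learner.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # index of the order statistic to use
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def prediction_interval(predict, x, q):
    """Symmetric interval of half-width q around the point prediction."""
    y_hat = predict(x)
    return y_hat - q, y_hat + q


def toy_model(x):
    """Hypothetical stand-in for an already fitted regressor."""
    return 2.0 * x


rng = random.Random(0)


def sample(n):
    """Synthetic (x, y) pairs: y = 2x plus Gaussian noise (assumed data)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.3)) for x in xs]


# Calibration: absolute residuals on held-out data give the nonconformity scores.
calibration = sample(500)
scores = [abs(y - toy_model(x)) for x, y in calibration]
q90 = conformal_quantile(scores, alpha=0.10)

# Evaluation: fraction of fresh points whose true value falls inside the interval.
test_set = sample(2000)
covered = 0
for x, y in test_set:
    low, high = prediction_interval(toy_model, x, q90)
    if low <= y <= high:
        covered += 1
coverage = covered / len(test_set)
```

Under exchangeability of calibration and test points, the empirical `coverage` should land near the nominal 90%; the interval width here is constant, and adaptive widths would need a normalized score.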
# いつももっといいの? 推薦者システムにおける説明の知覚に及ぼす個人的特徴と詳細度の影響 Is More Always Better? The Effects of Personal Characteristics and Level of Detail on the Perception of Explanations in a Recommender System ( http://arxiv.org/abs/2304.00969v1 ) ライセンス: Link先を確認	Mohamed Amine Chatti and Mouadh Guesmi and Laura Vorgerd and Thao Ngo and Shoeb Joarder and Qurat Ul Ain and Arham Muslim	(参考訳) 説明の認識はエンドユーザによって大きく異なる可能性があるが、説明可能なレコメンデーションシステム(rs)は伝統的に1サイズモデルに従っており、個々のユーザのコンテキスト、すなわち目標や個人の特性を考慮せずに、各ユーザに対して同じ詳細な説明レベルを提供する。この研究ギャップを埋めるため,本稿では,ユーザエージェンシーに,どの説明を見たいかを決めることによって,パーソナライズされたアプローチから,パーソナライズ可能なレコメンデーションへの転換を目指す。我々は,様々なタイプのエンドユーザの要求を満たすために,3段階の詳細な情報(基本,中間,高度)を,オンデマンドでパーソナライズしたレコメンデーションの説明を提供する透明なレコメンデーションと関心モデリングアプリケーション(RIMA)を開発した。対象内調査(n=31)を行い,利用者の個人的特性と詳細な説明レベルとの関係と,これら2つの変数が異なる説明目標に対する説明可能なrsの知覚に及ぼす影響について検討した。その結果,細部レベルの異なる説明可能なrsの認識は,説明目標とユーザタイプによって異なる程度に影響を受けることがわかった。そこで本稿では,ユーザのコンテキストに合わせて記述インタフェースの体系設計を支援するための理論的および設計ガイドラインを提案する。 Despite the acknowledgment that the perception of explanations may vary considerably between end-users, explainable recommender systems (RS) have traditionally followed a one-size-fits-all model, whereby the same explanation level of detail is provided to each user, without taking into consideration individual user's context, i.e., goals and personal characteristics. To fill this research gap, we aim in this paper at a shift from a one-size-fits-all to a personalized approach to explainable recommendation by giving users agency in deciding which explanation they would like to see. We developed a transparent Recommendation and Interest Modeling Application (RIMA) that provides on-demand personalized explanations of the recommendations, with three levels of detail (basic, intermediate, advanced) to meet the demands of different types of end-users. 
We conducted a within-subject study (N=31) to investigate the relationship between users' personal characteristics and the explanation level of detail, and the effects of these two variables on the perception of the explainable RS with regard to different explanation goals. Our results show that the perception of an explainable RS with different levels of detail is affected to different degrees by the explanation goal and user type. Consequently, we suggest some theoretical and design guidelines to support the systematic design of explanatory interfaces in RS tailored to the user's context.	翻訳日:2023-04-04 15:22:25 公開日:2023-04-03
# 歴史的物体予測による多視点3次元物体検出器の時間的訓練 Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction ( http://arxiv.org/abs/2304.00967v1 ) ライセンス: Link先を確認	Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu	(参考訳) 本稿では,時間的情報をより効果的に活用するための,多視点3D検出のための新しいパラダイムである履歴オブジェクト予測(HoP)を提案する。現在のタイムスタンプtを考えると、隣接するフレームからタイムスタンプt-kの擬似Bird's-Eye View(BEV)機能を生成し、この機能を使用してタイムスタンプt-kにおけるオブジェクト集合を予測する。我々のアプローチは、歴史的タイムスタンプで発生する物体の空間的位置と時間的動きを検知するために検出器を強制することが、より正確なBEV特徴学習につながるという観察によって動機づけられている。まず,短期および長期の時間デコーダを精巧に設計し,対応するカメラ画像の関与なしにタイムスタンプt-kの擬似BEV機能を生成する。第二に、生成された擬似BEV機能を用いて対象目標を予測するために、追加のオブジェクトデコーダを柔軟に取り付ける。トレーニング中にのみHoPを実行するので、提案手法は推論時に余分なオーバーヘッドを導入しない。プラグアンドプレイのアプローチとして、HoPはBEVFormerやBEVDetシリーズを含む最先端のBEV検出フレームワークに簡単に組み込める。さらに、補助的なHoPアプローチは、一般的な時間的モデリング手法と相補的であり、大幅な性能向上をもたらす。nuScenesデータセット上で提案したHoPの有効性を評価するために,大規模な実験を行った。BEVFormerやBEVDet4D-Depthなど代表的手法を選択して評価する。驚いたことに、HoPはViT-Lを用いてnuScenesテストで68.5%のNDSと62.4%のmAPを達成し、リーダーボード上の全ての3Dオブジェクト検出器を上回っている。コードは https://github.com/Sense-X/HoP で公開される予定である。 In this paper, we propose a new paradigm, named Historical Object Prediction (HoP), for multi-view 3D detection to leverage temporal information more effectively. The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and utilize this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that forcing the detector to capture both the spatial location and temporal motion of objects occurring at historical timestamps can lead to more accurate BEV feature learning. First, we carefully design short-term and long-term temporal decoders, which can generate the pseudo BEV feature for timestamp t-k without the involvement of its corresponding camera images. Second, an additional object decoder is flexibly attached to predict the object targets using the generated pseudo BEV feature. 
Note that we perform HoP only during training, so the proposed method introduces no extra overhead during inference. As a plug-and-play approach, HoP can be easily incorporated into state-of-the-art BEV detection frameworks, including BEVFormer and the BEVDet series. Furthermore, the auxiliary HoP approach is complementary to prevalent temporal modeling methods, leading to significant performance gains. Extensive experiments are conducted to evaluate the effectiveness of the proposed HoP on the nuScenes dataset. We choose representative methods, including BEVFormer and BEVDet4D-Depth, to evaluate our method. Surprisingly, HoP achieves 68.5% NDS and 62.4% mAP with ViT-L on the nuScenes test set, outperforming all the 3D object detectors on the leaderboard. Code will be available at https://github.com/Sense-X/HoP.	翻訳日:2023-04-04 15:21:59 公開日:2023-04-03
# StyleGANとCLIPの潜在空間における方向を適応的に探索するロバストテキスト駆動画像編集法 Robust Text-driven Image Editing Method that Adaptively Explores Directions in Latent Spaces of StyleGAN and CLIP ( http://arxiv.org/abs/2304.00964v1 ) ライセンス: Link先を確認	Tsuyoshi Baba, Kosuke Nishida, Kyosuke Nishida	(参考訳) 自動画像編集には多くの応用があるため大きな需要があり、ユーザが想像するように柔軟で直感的な編集を実現するためには自然言語命令の使用が不可欠である。テキスト駆動画像編集における先駆的な作業であるStyleCLIPは、CLIP空間の編集方向を見つけ、その方向をStyleGAN空間にマッピングすることで画像を編集する。同時に、原画像以外の適切な入力と、画像編集のためのテキスト命令を調整することは困難である。本研究では,SVMを用いたStyleGANとCLIP空間における編集方向を適応的に構築する手法を提案する。本モデルは,SVMをトレーニングして正負の画像を分類したCLIP空間において,編集方向を正規ベクトルとして表現する。画像は、画像とテキスト命令のCLIP類似性に従って、StyleGANの事前トレーニングに使用された大規模な画像コーパスから検索される。提案方式はStyleCLIPベースラインと同様に動作し,計算時間を増やすことなく簡単な入力が可能であることを確認した。 Automatic image editing has great demands because of its numerous applications, and the use of natural language instructions is essential to achieving flexible and intuitive editing as the user imagines. A pioneering work in text-driven image editing, StyleCLIP, finds an edit direction in the CLIP space and then edits the image by mapping the direction to the StyleGAN space. At the same time, it is difficult to tune appropriate inputs other than the original image and text instructions for image editing. In this study, we propose a method to construct the edit direction adaptively in the StyleGAN and CLIP spaces with SVM. Our model represents the edit direction as a normal vector in the CLIP space obtained by training a SVM to classify positive and negative images. The images are retrieved from a large-scale image corpus, originally used for pre-training StyleGAN, according to the CLIP similarity between the images and the text instruction. We confirmed that our model performed as well as the StyleCLIP baseline, whereas it allows simple inputs without increasing the computational time.	翻訳日:2023-04-04 15:21:22 公開日:2023-04-03
# キャビティ光学におけるダークモード工学によるメカニカルクイズリングの制御可能生成 Controllable generation of mechanical quadrature squeezing via dark-mode engineering in cavity optomechanics ( http://arxiv.org/abs/2304.00963v1 ) ライセンス: Link先を確認	Jian Huang, Deng-Gao Lai, and Jie-Qiao Liao	(参考訳) 量子スクイージングは、量子精度測定や連続可変量子情報処理のような現代の量子技術において重要な資源である。メカニカルモードの圧縮状態の生成は、キャビティ光学において重要な課題である。近年のマルチモード光学への関心に触発され、マルチメカニカル共振器における二次スキューズ生成の興味深い話題となっている。しかし、多重縮退型メカニカルモード光学系では、ダークモード効果はメカニカルモードの量子効果を強く抑制する。本稿では, 合成ゲージ場法でダークモード効果を破り, メカニカルモード光学系におけるメカニカルスクイーズの発生について検討する。また, 機械モードが有限温度で作用すると, ダークモード効果により機械的なスクイーズが弱くなり, 消滅するのに対し, ダークモード効果が破られると強い機械的なスクイーズが発生することがわかった。特に、メカニカルスクイージングの熱-フォノン占有耐性は、ダークモード効果を壊さずに、それよりも約3桁大きい。また、この手法を一般化してダークモードを破り、マルチメカニカルモードの光学系で機械的スクイーズを生成する。本研究は, 一般の物理機構を記述し, ノイズ耐性量子リソース生成への道を開く。 Quantum squeezing is an important resource in modern quantum technologies, such as quantum precision measurement and continuous-variable quantum information processing. The generation of squeezed states of mechanical modes is a significant task in cavity optomechanics. Motivated by recent interest in multimode optomechanics, it becomes an interesting topic to create quadrature squeezing in multiple mechanical resonators. However, in the multiple-degenerate-mechanical-mode optomechanical systems, the dark-mode effect strongly suppresses the quantum effects in mechanical modes. Here we study the generation of mechanical squeezing in a two-mechanical-mode optomechanical system by breaking the dark-mode effect with the synthetic-gauge-field method. We find that when the mechanical modes work at a finite temperature, the mechanical squeezing is weak or even disappeared due to the dark-mode effect, while the strong mechanical squeezing can be generated once the dark-mode effect is broken. In particular, the thermal-phonon-occupation tolerance of the mechanical squeezing is approximately three orders of magnitude larger than that without breaking the dark-mode effect. 
We also generalize this method to break the dark modes and to create the mechanical squeezing in a multiple-mechanical-mode optomechanical system. Our results describe a general physical mechanism and pave the way towards the generation of noise-resistant quantum resources.	翻訳日:2023-04-04 15:21:02 公開日:2023-04-03
# RegionPLC: オープンワールド3Dシーン理解のための局所的ポイント言語コントラスト学習 RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding ( http://arxiv.org/abs/2304.00962v1 ) ライセンス: Link先を確認	Jihan Yang, Runyu Ding, Zhe Wang, Xiaojuan Qi	(参考訳) 既存の3Dシーン理解タスクは、クローズドセットのベンチマークで高いパフォーマンスを達成したが、現実のアプリケーションでは新しいカテゴリを処理できなかった。そこで本研究では,クローズドセットのデータセットで学習されたモデルにオープンボキャブラリー認識機能を付与する,オープンワールド3Dシーン理解のための地域的ポイント言語コントラスト学習フレームワークであるRegionPLCを提案する。本研究では,2次元基礎モデルから地域レベルの視覚言語知識をキャプションを通して引き出すための密集した視覚プロンプトを提案する。次に,シーン理解のためのキャプションからポイント独立学習を可能にするために,ポイント識別型コントラスト学習目標を設計する。ScanNet, ScanNet200, nuScenesデータセットについて広範な実験を行った。我々のRegionPLCは,従来の3次元オープンワールドシーン理解手法を,セマンティックセグメンテーションとインスタンスセグメンテーションでそれぞれ平均11.6%,6.6%上回っている。また、人間のアノテーションがない場合でも、低いトレーニング・推論コストでオープンワールドにおける有望な結果を示す。コードはリリースされる。 Existing 3D scene understanding tasks have achieved high performance on closed-set benchmarks but fail to handle novel categories in real-world applications. To this end, we propose a Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, which equips models trained on closed-set datasets with open-vocabulary recognition capabilities. We propose dense visual prompts to elicit region-level visual-language knowledge from 2D foundation models via captioning, which further allows us to build dense regional point-language associations. Then, we design a point-discriminative contrastive learning objective to enable point-independent learning from captions for dense scene understanding. We conduct extensive experiments on the ScanNet, ScanNet200, and nuScenes datasets. Our RegionPLC significantly outperforms previous base-annotated 3D open-world scene understanding approaches by an average of 11.6% and 6.6% for semantic and instance segmentation, respectively. It also shows promising open-world results in the absence of any human annotation, with low training and inference costs. Code will be released.	翻訳日:2023-04-04 15:20:39 公開日:2023-04-03
# セルフオーダーポイント雲 Self-Ordering Point Clouds ( http://arxiv.org/abs/2304.00961v1 ) ライセンス: Link先を確認	Pengwan Yang, Yuki M. Asano, Cees G. M. Snoek	(参考訳) 本稿では,3次元点群内の点の代表的な部分集合を点順順序で見つけるタスクについて述べる。この困難なビジョン問題に取り組んだ研究はごくわずかであり、いずれも取得が難しい点ラベルや点群ラベルに頼っている。これらの研究とは異なり,我々は自己教師あり学習による3Dポイントクラウドのポイントワイズ順序付けタスクを導入し,これをセルフオーダリング(self-ordering)と呼ぶ。さらに、自己教師型でポイントワイズ順序付けを学習する最初のエンドツーエンドのトレーニング可能なネットワークにも貢献する。新たな微分可能な点採点ソート戦略を採用し、階層的なコントラストスキームを構築して自己スーパービジョン信号を得る。この手法について広範なアブレーションを行い,未知カテゴリの点群のゼロショット順序付けを含む複数のデータセットとタスクにおいて,教師付き順序付け手法と比較してもスケーラビリティと優れた性能を示す。 In this paper, we address the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering. Only a few works have tried to address this challenging vision problem, all with the help of hard-to-obtain point and cloud labels. Different from these works, we introduce the task of point-wise ordering in 3D point clouds through self-supervision, which we call self-ordering. We further contribute the first end-to-end trainable network that learns a point-wise ordering in a self-supervised fashion. It utilizes a novel differentiable point scoring-sorting strategy and constructs a hierarchical contrastive scheme to obtain self-supervision signals. We extensively ablate the method and show its scalability and superior performance even compared to supervised ordering methods on multiple datasets and tasks, including zero-shot ordering of point clouds from unseen categories.	翻訳日:2023-04-04 15:20:21 公開日:2023-04-03
# DrBERT : フランス語の医学・臨床領域におけるロバスト事前訓練モデル DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains ( http://arxiv.org/abs/2304.00958v1 ) ライセンス: Link先を確認	Yanis Labrak and Adrien Bazoge and Richard Dufour and Mickael Rouvier and Emmanuel Morin and Béatrice Daille and Pierre-Antoine Gourraud	(参考訳) 近年,学習済み言語モデル (PLM) は,幅広い自然言語処理(NLP)タスクにおいて最高の性能を達成している。最初のモデルは一般的なドメインデータに基づいてトレーニングされたが、特定のドメインをより効果的に扱うために特別なモデルが登場した。本稿では,医学領域におけるフランス語のPLMに関する独自の研究を提案する。私たちは初めて、webからの公開データと医療機関のプライベートデータの両方で訓練されたPLMのパフォーマンスを比較しました。また, 生物医学的課題の組において, 異なる学習戦略を評価する。特に,外国語の既存バイオメディカルPLMを,対象データでさらに事前学習することで活用できることを示す。最後に、DrBERTと呼ばれるフランス語のバイオメディカル分野のためのPLMと、これらのモデルがトレーニングされているフリーライセンス下の医療データの最大コーパスをリリースする。 In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general-domain data, specialized ones have emerged to treat specific domains more effectively. In this paper, we propose an original study of PLMs for the French language in the medical domain. We compare, for the first time, the performance of PLMs trained on both public data from the web and private data from healthcare establishments. We also evaluate different learning strategies on a set of biomedical tasks. In particular, we show that we can take advantage of already existing biomedical PLMs in a foreign language by further pre-training them on our targeted data. Finally, we release the first specialized PLMs for the biomedical field in French, called DrBERT, as well as the largest corpus of medical data under free license on which these models are trained.	翻訳日:2023-04-04 15:20:06 公開日:2023-04-03
# セキュアなiotベースのデバイス監視サービスのためのフェデレーションカルマンフィルタ Federated Kalman Filter for Secure IoT-based Device Monitoring Services ( http://arxiv.org/abs/2304.00991v1 ) ライセンス: Link先を確認	Marc Jayson Baucas and Petros Spachos	(参考訳) デバイス監視サービスは、最近の技術の進化と、継続的に増加するモノのインターネット(IoT)デバイスによって人気が高まっている。人気のサービスには、デバイス位置情報を使用するサービスがある。しかし、これらのサービスはデータ収集と送信の性質上、プライバシーの問題にぶつかる。本研究では,フェデレートカルマンフィルタ(FKF)とフェデレーションラーニングアプローチとプライバシ保護のためのプライベートブロックチェーン技術を組み合わせたプラットフォームを導入する。標準カルマンフィルタ (kf) による受信信号強度指標 (rssi) に基づく位置推定手法に対する提案設計の精度について検討した。実験結果から、デバイス監視におけるRSSIに基づくローカライゼーションのためのデータ推定の改善の可能性が示された。 Device monitoring services have increased in popularity with the evolution of recent technology and the continuously increased number of Internet of Things (IoT) devices. Among the popular services are the ones that use device location information. However, these services run into privacy issues due to the nature of data collection and transmission. In this work, we introduce a platform incorporating Federated Kalman Filter (FKF) with a federated learning approach and private blockchain technology for privacy preservation. We analyze the accuracy of the proposed design against a standard Kalman Filter (KF) implementation of localization based on the Received Signal Strength Indicator (RSSI). The experimental results reveal significant potential for improved data estimation for RSSI-based localization in device monitoring.	翻訳日:2023-04-04 15:12:53 公開日:2023-04-03
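The abstract above compares against a standard Kalman Filter applied to RSSI readings. As a rough illustration only, the following is a minimal sketch of a scalar Kalman filter smoothing a noisy, roughly constant RSSI level. It is not the paper's Federated Kalman Filter; the function name `kalman_smooth`, the noise variances, and the simulated readings are all assumptions made for this example.

```python
import random
import statistics


def kalman_smooth(readings, process_var=1e-3, measurement_var=4.0):
    """Scalar Kalman filter for a (nearly) constant RSSI level in dBm.

    process_var and measurement_var are illustrative guesses, not values
    taken from the paper.
    """
    estimate = readings[0]          # initial state: trust the first reading
    error_var = measurement_var     # initial uncertainty of that estimate
    smoothed = [estimate]
    for z in readings[1:]:
        # Predict: the state is modelled as constant, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


if __name__ == "__main__":
    # Simulated readings around a hypothetical true level of -60 dBm.
    rng = random.Random(0)
    raw = [-60.0 + rng.gauss(0.0, 2.0) for _ in range(200)]
    filtered = kalman_smooth(raw)
    print("raw stdev:     ", round(statistics.pstdev(raw[50:]), 3))
    print("filtered stdev:", round(statistics.pstdev(filtered[50:]), 3))
```

A smaller `process_var` makes the filter trust its running estimate more and react more slowly to genuine changes in signal strength, so in practice both variances would need tuning to the device and environment.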
# 反復的洗練と統計的結果検証によるループ内深層学習モデルの効率的な訓練 Efficient human-in-loop deep learning model training with iterative refinement and statistical result validation ( http://arxiv.org/abs/2304.00990v1 ) ライセンス: Link先を確認	Manuel Zahn, Douglas P. Perrin	(参考訳) 画像の注釈とラベル付けは、深層学習を医療データに適用する際の最大の課題である。現在のプロセスは時間とコストがかかるため、テクノロジを広く採用する上での制限要因となっている。さらに、最良のモデルを選択するためには、測定されたパフォーマンス改善が重要であることを検証することが重要です。本稿では,超音波イメージング機械学習パイプラインのためのデータクリーニングの必要な部分であるセグメンテーションを作成する方法を示す。本研究では,自動生成したトレーニングデータと高速人間の視覚チェックを活用し,時間/感情とコストを低く保ちながらモデルの精度を向上させる4段階の手法を提案する。また,統計解析の活用のために,複数回実施実験を行った。粗悪な品質の地上真実データと迅速な視覚検査は、より高価な人為的な地上真実データを用いて改良された初期ベースモデルを効率的に訓練する。本手法は、静的PHIを含む背景データを除去し、心臓超音波セグメンテーションタスクで実演する。実験を複数回行い、生徒のt-testをパフォーマンス分布で使用することで、意義が示される。 92%の単純なしきい値アルゴリズムの初期セグメンテーション精度を98%に改善した。複雑なアルゴリズムでトレーニングされたモデルの性能は、より貧弱な実行アルゴリズムと少量の高品質なデータとの事前トレーニングによって一致または打ち負かすことができる。ディープラーニングモデルに対する統計学的意義分析の導入は、測定された性能改善の検証に役立つ。この方法は、高品質なトレーニングデータを取得するコストと労力を最小にしつつ、精度の高いモデルを達成するためのコスト効率と迅速なアプローチを提供する。 Annotation and labeling of images are some of the biggest challenges in applying deep learning to medical data. Current processes are time and cost-intensive and, therefore, a limiting factor for the wide adoption of the technology. Additionally validating that measured performance improvements are significant is important to select the best model. In this paper, we demonstrate a method for creating segmentations, a necessary part of a data cleaning for ultrasound imaging machine learning pipelines. We propose a four-step method to leverage automatically generated training data and fast human visual checks to improve model accuracy while keeping the time/effort and cost low. We also showcase running experiments multiple times to allow the usage of statistical analysis. Poor quality automated ground truth data and quick visual inspections efficiently train an initial base model, which is refined using a small set of more expensive human-generated ground truth data. 
The method is demonstrated on a cardiac ultrasound segmentation task, removing background data, including static PHI. Significance is shown by running the experiments multiple times and applying Student's t-test to the performance distributions. The initial segmentation accuracy of 92%, obtained with a simple thresholding algorithm, was improved to 98%. The performance of models trained on complicated algorithms can be matched or beaten by pre-training with the poorer-performing algorithms and a small quantity of high-quality data. The introduction of statistical significance analysis for deep learning models helps to validate the measured performance improvements. The method offers a cost-effective and fast approach to achieving high-accuracy models while minimizing the cost and effort of acquiring high-quality training data.	翻訳日:2023-04-04 15:12:42 公開日:2023-04-03
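The abstract above establishes significance by repeating each experiment and applying Student's t-test to the resulting performance distributions. A minimal sketch of the pooled-variance two-sample t statistic is given below, using only the Python standard library. The function name `student_t` and the accuracy values are hypothetical and are not results from the paper.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom.

    Assumes both samples have at least two values and are not all identical
    (otherwise the standard error is zero).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    dof = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, dof


if __name__ == "__main__":
    # Hypothetical accuracies from five repeated training runs per model.
    baseline = [0.918, 0.921, 0.915, 0.923, 0.919]
    refined = [0.978, 0.981, 0.976, 0.983, 0.979]
    t_stat, dof = student_t(refined, baseline)
    print(f"t = {t_stat:.2f}, dof = {dof}")
```

The statistic is then compared against a critical value of the t distribution for the given degrees of freedom; for 8 degrees of freedom the two-sided 5% threshold is about 2.306. Whether the equal-variance assumption holds for a given pair of models is a separate question that this sketch does not address.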
# ホーキング効果はシュワルツシルト時空における量子テレポーテーションの忠実度を常に低下させるか? Does Hawking effect always degrade fidelity of quantum teleportation in Schwarzschild spacetime? ( http://arxiv.org/abs/2304.00984v1 ) ライセンス: Link先を確認	Xiao-Wei Fan, Hao-Yu Wu, Rui-Di Wang, Xiao-Li Huang, Hao-Sheng Zeng, Shu-Min Wu	(参考訳) 以前の研究では、ホーキング効果がシュワルツシルトブラックホールにおける量子相関と量子テレポーテーションの忠実度を常に破壊することが示されている。本稿では,シュワルツシルト時空におけるユーザ間のディラック場の量子テレポーテーションの忠実度について検討する。ホーキング温度の上昇に伴い、量子テレポーテーションの忠実度は、初期状態の選択に応じて、単調に増加する場合も、単調に減少する場合も、非単調に増加する場合もあり、つまりホーキング効果が量子テレポーテーションの純忠実度を生じさせうる。この顕著な結果は、ブラックホールのホーキング効果が量子テレポーテーションの忠実度を損なうことしかできないという広範な信念を覆す。また、量子ステアリングはシュワルツシルト時空における量子テレポーテーションの忠実度を完全には保証できない。この新しい予期せぬ源は、ホーキング効果の実験的な証拠に新しいアイデアをもたらすかもしれない。 Previous studies have shown that the Hawking effect always destroys quantum correlations and the fidelity of quantum teleportation in the Schwarzschild black hole. Here, we investigate the fidelity of quantum teleportation of Dirac fields between users in Schwarzschild spacetime. We find that, as the Hawking temperature increases, the fidelity of quantum teleportation can monotonically increase, monotonically decrease, or non-monotonically increase, depending on the choice of the initial state, which means that the Hawking effect can create net fidelity of quantum teleportation. This striking result overturns the widespread belief that the Hawking effect of the black hole can only destroy the fidelity of quantum teleportation. We also find that quantum steering cannot fully guarantee the fidelity of quantum teleportation in Schwarzschild spacetime. This new unexpected source may provide a new idea for the experimental evidence of the Hawking effect.	翻訳日:2023-04-04 15:11:46 公開日:2023-04-03
# 二層グラフェン量子ドットにおける長寿命バレー状態 Long-lived valley states in bilayer graphene quantum dots ( http://arxiv.org/abs/2304.00980v1 ) ライセンス: Link先を確認	Rebekka Garreis and Chuyao Tong and Jocelyn Terle and Max Josef Ruckriegel and Jonas Daniel Gerber and Lisa Maria Gächter and Kenji Watanabe and Takashi Taniguchi and Thomas Ihn and Klaus Ensslin and Wei Wister Huang	(参考訳) 二層グラフェン(BLG)は、2次元系における電気制御可能な量子ビットのための有望なホスト材料として出現している。特に興味深いのは、量子情報をスピン量子ビットとしてだけでなく、六角形ブラベイ格子の対称性から生じる2重の軌道縮退、いわゆるバレー自由度でエンコードする能力である。既知のスピン混合と軌道混合のメカニズムは、バレーで機能する可能性は低い。さらに、BLGにおけるバレー状態のトポロジカルな性質は、コヒーレント量子ビット操作のためのユニークな経路を約束する。ゲート定義のBLG量子ドットデバイスは、近年、高品質なスピンとバレー量子状態にアクセスするための汎用的なビルディングブロックとして確立されている。しかし、これらの量子ビットのコヒーレンス特性、ひいては実用的な量子ビットとしての適合性を最終的に制限するバレー状態の緩和時間の測定は、これまで実現されていなかった。ここでは、ゲート定義二重量子ドットを含む高品質なBLGデバイスで得られたスピンおよびバレー状態の特性緩和時間を初めて測定する。バレー状態は $99\,\%$ をはるかに上回る忠実度で区別でき、また、バレー三重項と一重項の間の緩和時間は非常に長く、$B=250\,\text{mT}$ で $500\,\text{ms}$ を超え、スピン状態の緩和時間よりも1桁以上長い。バレー状態の孤立化に対する我々のアプローチは、コヒーレントなバレー量子ビット振動の測定への道を開き、最近実証されたバレー $g$ 因子の電気的チューニング性と組み合わせることで、BLGの長寿命バレー量子ビットの電気的制御のための実用的なプラットフォームを提供することができる。 Bilayer graphene (BLG) is emerging as a promising host material for electrically controllable qubits in a two-dimensional system. Of particular interest is the ability to encode quantum information not only as spin qubits but also in the so-called valley degree of freedom, a two-fold orbital degeneracy that arises from the symmetry of the hexagonal Bravais lattice. Known spin-mixing and orbital-mixing mechanisms are unlikely to be at work for valleys. Moreover, the topological nature of valley states in BLG promises unique routes for coherent qubit manipulation. Gate-defined BLG quantum-dot devices have been recently established as a versatile building block for accessing high-quality spin and valley quantum states. However, measurements of the relaxation time of valley states -- which ultimately limits these qubits' coherence properties and therefore their suitability as practical qubits -- have so far remained elusive. 
Here we report the first measurement of the characteristic relaxation times of spin and valley states, obtained in a high-quality BLG device containing gate-defined double quantum dots. We show that valley states can be distinguished with a fidelity of well above $99\,\%$ and report a remarkably long relaxation time between valley triplets and singlets, exceeding $500\,\text{ms}$ at $B=250\,\text{mT}$, more than one order of magnitude longer than the relaxation times we measure for spin states. Our approach to isolating valley states paves the way to measuring coherent valley-qubit oscillations and, in combination with the recently demonstrated electrical tunability of the valley $g$-factor, could provide a practical platform for the electrical control of long-lived valley qubits in BLG.	翻訳日:2023-04-04 15:11:27 公開日:2023-04-03
# ワイルドデータベースにおける潜時フィンガープリント A Latent Fingerprint in the Wild Database ( http://arxiv.org/abs/2304.00979v1 ) ライセンス: Link先を確認	Xinwei Liu, Kiran Raja, Renfang Wang, Hong Qiu, Hucheng Wu, Dechao Sun, Qiguang Zheng, Nian Liu, Xiaoxia Wang, Gehang Huang, Raghavendra Ramachandra, Christoph Busch	(参考訳) 潜在指紋は、犯罪現場、デジタル法医学、法執行機関において、最も重要かつ広く利用されている証拠の一つである。最近の研究で報告された進歩の数にもかかわらず、独立ベンチマークやアルゴリズムを改善するための大規模評価データベースの欠如といった重大なオープン問題が不十分に解決されていることに注意する。利用可能なデータベースの大部分は、セミパブリックな性質、野生環境での買収の欠如、後処理パイプラインである。さらに、アルゴリズムの堅牢性を評価するために、実際の犯罪シーンと同様の現実的なキャプチャシナリオを表現していない。さらに、既存の潜在指紋認識用データベースは、多数のユニークなサブジェクト/指紋インスタンスを持っておらず、また、潜在指紋に対するクロス比較を行うための根拠となる真実/参照指紋画像を提供していない。本稿では,(1)光学および(2)静電容量センサからの参照指紋,(3)スマートフォンの指紋,(4)壁面からの潜在指紋,(5)ipad表面,(6)アルミニウムホイル表面からの参照指紋,という5つの異なる取得シナリオを含む,新たな野生の潜在指紋データベースを提案する。新しいデータベースは、上記のすべての設定でキャプチャされた1,318のユニークな指紋インスタンスで構成されている。この研究では、光学式および容量型センサーの2,636個の指紋、スマートフォンの1,318個の指紋、および132人の被験者の9,224個の潜伏指紋が提供された。データセットは、さまざまな年齢グループ、性別と背景の等しい表現を考慮して構築される。さらに,潜伏指紋認識研究における今後の方向性の課題を明らかにするために,様々なサブセット評価を幅広く分析する。 Latent fingerprints are among the most important and widely used evidence in crime scenes, digital forensics and law enforcement worldwide. Despite the number of advancements reported in recent works, we note that significant open issues such as independent benchmarking and lack of large-scale evaluation databases for improving the algorithms are inadequately addressed. The available databases are mostly of semi-public nature, lack of acquisition in the wild environment, and post-processing pipelines. Moreover, they do not represent a realistic capture scenario similar to real crime scenes, to benchmark the robustness of the algorithms. Further, existing databases for latent fingerprint recognition do not have a large number of unique subjects/fingerprint instances or do not provide ground truth/reference fingerprint images to conduct a cross-comparison against the latent. 
In this paper, we introduce a new wild large-scale latent fingerprint database that includes five different acquisition scenarios: reference fingerprints from (1) optical and (2) capacitive sensors, (3) smartphone fingerprints, and latent fingerprints captured from (4) a wall surface, (5) an iPad surface, and (6) an aluminium foil surface. The new database consists of 1,318 unique fingerprint instances captured in all the above-mentioned settings. A total of 2,636 reference fingerprints from optical and capacitive sensors, 1,318 fingerphotos from smartphones, and 9,224 latent fingerprints from each of the 132 subjects were provided in this work. The dataset is constructed considering various age groups and equal representation of genders and backgrounds. In addition, we provide an extensive set of analyses of various subset evaluations to highlight open challenges for future directions in latent fingerprint recognition research.	翻訳日:2023-04-04 15:10:54 公開日:2023-04-03
# 完全配向量子センサを用いた超伝導渦の広視野定量磁気イメージング Wide-field quantitative magnetic imaging of superconducting vortices using perfectly aligned quantum sensors ( http://arxiv.org/abs/2304.01024v1 ) ライセンス: Link先を確認	Shunsuke Nishimura, Taku Kobayashi, Daichi Sasaki, Takeyuki Tsuji, Takayuki Iwasaki, Mutsuko Hatano, Kento Sasaki, and Kensuke Kobayashi	(参考訳) 超伝導渦の可視化に様々な技術が応用され、電磁応答の手がかりとなっている。ここでは, 完全に整列したダイヤモンド量子センサを用いて, 超伝導薄膜中の渦の成層場を広範囲に定量的に可視化する。センサの不均一性の影響を軽減する解析により,yba$_2$cu$_3$o$_{7-\delta}$における単一渦の磁束を,精度$\pm10~\%$で可視化する。得られた渦形状は理論モデルと一致し, 浸透深さと温度依存性は従来の研究と一致し, 精度と広い適用性が証明された。この広視野イメージングは、原理的には極端条件下でも機能し、様々な超伝導体のキャラクタリゼーションを可能にする。 Various techniques have been applied to visualize superconducting vortices, providing clues to their electromagnetic response. Here, we present a wide-field, quantitative imaging of the stray field of the vortices in a superconducting thin film using perfectly aligned diamond quantum sensors. Our analysis, which mitigates the influence of the sensor inhomogeneities, visualizes the magnetic flux of single vortices in YBa$_2$Cu$_3$O$_{7-\delta}$ with an accuracy of $\pm10~\%$. The obtained vortex shape is consistent with the theoretical model, and penetration depth and its temperature dependence agree with previous studies, proving our technique's accuracy and broad applicability. This wide-field imaging, which in principle works even under extreme conditions, allows the characterization of various superconductors.	翻訳日:2023-04-04 15:04:30 公開日:2023-04-03
# ニューラルアーキテクチャ探索のための自己教師付き学習 Self-Supervised learning for Neural Architecture Search (NAS) ( http://arxiv.org/abs/2304.01023v1 ) ライセンス: Link先を確認	Samuel Ducros	(参考訳) このインターンシップの目的は、ラベルなしデータ、すなわちAIが正しい結果の予測を自動的に学習できるようにするデータを使用する革新的な方法を提案することである。この段階に到達するための手順は次のとおりである。(1) 最新技術を調査し,それに対する自らの位置づけを明確にする,(2) 開発方針のアイデアを考案する,(3) それらのアイデアを実装する,(4) 最後にそれらをテストして最新技術に対する位置づけを確認し,再びこの手順を繰り返す。インターンシップの間、この手順は何度か繰り返され、その過程で探索された方針が本稿で示される。 The objective of this internship is to propose an innovative method that uses unlabelled data, i.e., data that will allow the AI to automatically learn to predict the correct outcome. To reach this stage, the steps to be followed can be defined as follows: (1) consult the state of the art and position ourselves against it, (2) come up with ideas for development paths, (3) implement these ideas, and (4) finally test them to position ourselves against the state of the art, and then start the sequence again. During my internship, this sequence was carried out several times, which gives the tracks explored during the internship.	翻訳日:2023-04-04 15:04:15 公開日:2023-04-03
# 言語間情報検索のためのシンプルで効果的なニューラルランク付けとリランクベースライン Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval ( http://arxiv.org/abs/2304.01019v1 ) ライセンス: Link先を確認	Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang	(参考訳) 多言語言語モデルの出現により、言語間情報検索(CLIR)への関心が復活した。しかし、急速な進歩は、手法の混乱を招き、再現性は芸術の状況に遅れを取っている。第一に,単言語検索を足場として,多段階アーキテクチャを用いた言語横断検索の異なるアプローチを組織するための概念的枠組みを提供する。第二に、ペルシア、ロシア、中国のTREC 2022 NeuCLIRトラックから収集したテストコレクションに対して、Anserini IRツールキットとPyserini IRツールキットに単純かつ効果的に再現可能なベースラインを実装した。私たちの取り組みは、TREC評価に最も効果的な実行を提出した2つのチームのコラボレーションに基づいています。これらの貢献は将来の進歩の確固たる基盤を提供する。 The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another. However, the rapid pace of progress has led to a confusing panoply of methods and reproducibility has lagged behind the state of the art. In this context, our work makes two important contributions: First, we provide a conceptual framework for organizing different approaches to cross-lingual retrieval using multi-stage architectures for mono-lingual retrieval as a scaffold. Second, we implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for test collections from the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese. Our efforts are built on a collaboration of the two teams that submitted the most effective runs to the TREC evaluation. These contributions provide a firm foundation for future advances.	翻訳日:2023-04-04 15:04:01 公開日:2023-04-03
# ディープフェイクテキストの検出における個人およびチームに基づくヒューマンファクターの理解 Understanding Individual and Team-based Human Factors in Detecting Deepfake Texts ( http://arxiv.org/abs/2304.01002v1 ) ライセンス: Link先を確認	Adaku Uchendu, Jooyoung Lee, Hua Shen, Thai Le, Ting-Hao 'Kenneth' Huang, Dongwon Lee	(参考訳) 近年、AIにおける自然言語生成(NLG)技術(T5、GPT-3、ChatGPT)は大幅に改善され、人間のような長いコヒーレントテキストを大規模に生成できるようになり、いわゆるディープフェイクテキストを生み出している。この進歩は、その利益にもかかわらず、セキュリティとプライバシの問題(例えば、盗作、アイデンティティの難読化、偽情報攻撃)を引き起こす可能性がある。そのため、人文テキストとディープフェイクテキストを区別するために、効果的で実用的でスケーラブルなソリューションを開発することが重要になっている。この課題に向けて、本研究では、人間がディープフェイクテキストを識別する方法に、スキルレベルやコラボレーションなどの要因がどう影響するかを調査し、(1)協調チームが個人よりもディープフェイクテキストをよりよく検出できるか、という3つの研究課題を研究する。 2) 専門家は非専門家よりもディープフェイクテキストを検出できるのか? (3)人間の検出性能を最大化する要因は何か。我々は,(1) amazon mechanical turk (amt) 上の非専門家の人間または非同期のチーム,(2)専門家の人間または同期のチーム,という2つのプラットフォーム上でこれらの質問を実装した。 By analyzing the detection performance and the factors that affected performance, some of our key findings are: (1) expert humans detect deepfake texts significantly better than non-expert humans, (2) synchronous teams on the Upwork detect deepfake texts significantly better than individuals, while asynchronous teams on the AMT detect deepfake texts weakly better than individuals, and (3) among various error categories, examining coherence and consistency in texts is useful in detecting deepfake texts. 結論として,我々の研究は,ディープフェイクテキストの協調的人間検出を改善するための,今後のツールやフレームワークの設計に影響を及ぼす可能性がある。 In recent years, Natural Language Generation (NLG) techniques in AI (e.g., T5, GPT-3, ChatGPT) have shown a massive improvement and are now capable of generating human-like long coherent texts at scale, yielding so-called deepfake texts. This advancement, despite their benefits, can also cause security and privacy issues (e.g., plagiarism, identity obfuscation, disinformation attack). As such, it has become critically important to develop effective, practical, and scalable solutions to differentiate deepfake texts from human-written texts. 
Toward this challenge, in this work, we investigate how factors such as skill levels and collaborations impact how humans identify deepfake texts, studying three research questions: (1) do collaborative teams detect deepfake texts better than individuals? (2) do expert humans detect deepfake texts better than non-expert humans? (3) what are the factors that maximize the detection performance of humans? We study these questions on two platforms: (1) non-expert humans or asynchronous teams on Amazon Mechanical Turk (AMT) and (2) expert humans or synchronous teams on Upwork. By analyzing the detection performance and the factors that affected performance, some of our key findings are: (1) expert humans detect deepfake texts significantly better than non-expert humans, (2) synchronous teams on Upwork detect deepfake texts significantly better than individuals, while asynchronous teams on AMT detect deepfake texts weakly better than individuals, and (3) among various error categories, examining coherence and consistency in texts is useful in detecting deepfake texts. In conclusion, our work could inform the design of future tools and frameworks to improve collaborative human detection of deepfake texts.	翻訳日:2023-04-04 15:01:48 公開日:2023-04-03
# VoxelFormer:多視点3Dオブジェクト検出のためのデュアルビューアテンションに基づく鳥の視点特徴生成 VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection ( http://arxiv.org/abs/2304.01054v1 ) ライセンス: Link先を確認	Zhuoling Li, Chuanrui Zhang, Wei-Chiu Ma, Yipin Zhou, Linyan Huang, Haoqian Wang, SerNam Lim, Hengshuang Zhao	(参考訳) 近年,変圧器を用いた検出器は2次元視覚知覚タスクにおいて顕著な性能を示した。しかし、多視点3Dオブジェクト検出におけるそれらの性能は、畳み込みニューラルネットワークに基づく検出器の最先端(SOTA)よりも劣っている。本研究では,バードアイビュー(BEV)機能生成の観点から,この問題を考察する。具体的には,変換器をベースとしたSOTA,BEVFormerが採用するBEV特徴生成手法について検討し,その2つの限界を同定する。 (i)bevからのみ注意重みを発生させるため、監視のためのライダーポイントの使用を妨げ、 (II)デフォルマブルサンプリングによりカメラビュー機能をBEVに集約し、少数の機能のみを選択し、すべての情報を利用することができない。これらの制約を克服するため、BEVとカメラの両方から注目重みを生成する新しいBEV特徴生成手法、デュアルビューアテンションを提案する。この方法は、すべてのカメラ機能をBEV機能にエンコードする。デュアルビューとBEVFormerアーキテクチャを組み合わせることで、VoxelFormerという新しい検出器を構築する。 nuScenesベンチマークで大規模な実験を行い、デュアルビューアテンションとVoxelForerの優位性を検証する。トレーニング中に3エンコーダと1つの歴史的なフレームを採用するだけで、VoxelFormerは依然としてBEVFormerよりも大幅に優れています。同じ環境でのトレーニングでは、VoxelFormerはBEVFormerを4.9% NDSポイント上回ることができる。コードはhttps://github.com/lizhuoling/voxelformer-public.gitで入手できる。 In recent years, transformer-based detectors have demonstrated remarkable performance in 2D visual perception tasks. However, their performance in multi-view 3D object detection remains inferior to the state-of-the-art (SOTA) of convolutional neural network based detectors. In this work, we investigate this issue from the perspective of bird's-eye-view (BEV) feature generation. Specifically, we examine the BEV feature generation method employed by the transformer-based SOTA, BEVFormer, and identify its two limitations: (i) it only generates attention weights from BEV, which precludes the use of lidar points for supervision, and (ii) it aggregates camera view features to the BEV through deformable sampling, which only selects a small subset of features and fails to exploit all information. 
To overcome these limitations, we propose a novel BEV feature generation method, dual-view attention, which generates attention weights from both the BEV and camera view. This method encodes all camera features into the BEV feature. By combining dual-view attention with the BEVFormer architecture, we build a new detector named VoxelFormer. Extensive experiments are conducted on the nuScenes benchmark to verify the superiority of dual-view attention and VoxelFormer. We observe that even when adopting only 3 encoders and 1 historical frame during training, VoxelFormer still outperforms BEVFormer significantly. When trained in the same setting, VoxelFormer can surpass BEVFormer by 4.9% NDS points. Code is available at: https://github.com/Lizhuoling/VoxelFormer-public.git.	翻訳日:2023-04-04 14:54:19 公開日:2023-04-03
# ViT-DAE: 組織像解析のためのトランスフォーマー駆動拡散オートエンコーダ ViT-DAE: Transformer-driven Diffusion Autoencoder for Histopathology Image Analysis ( http://arxiv.org/abs/2304.01053v1 ) ライセンス: Link先を確認	Xuan Xu, Saarthak Kapse, Rajarsi Gupta, Prateek Prasanna	(参考訳) 生成aiは、元のデータソースによく似たデータを合成する能力によって、近年かなりの注目を集めている。 generative adversarial networks (gans) は病理組織学的画像解析に革新的なアプローチを提供してきたが、モード崩壊や判別器の過剰フィットといった限界に苦しめられている。近年,ノイズ拡散モデルがコンピュータビジョンにおいて有望な結果を示している。これらのモデルはトレーニング中に優れた安定性を示し、分散カバレッジが向上し、高品質な多様な画像を生成する。さらに、ノイズや摂動に対する高い弾力性を示しており、画像は一般的に人工物を含み、染色のかなりのバリエーションを示すデジタル病理学での使用に適している。本稿では,視覚変換器(ViT)と拡散オートエンコーダを統合し,高品質な病理画像合成を行う新しいアプローチであるViT-DAEを提案する。 vitが計算病理学の拡散オートエンコーダに導入されたのはこれが初めてであり、このモデルが組織病理画像の複雑で複雑な詳細をよりよく捉えることができる。公開されている3つのデータセットに対するViT-DAEの有効性を示す。提案手法は, 実写画像生成におけるGAN法とバニラDAE法より優れている。 Generative AI has received substantial attention in recent years due to its ability to synthesize data that closely resembles the original data source. While Generative Adversarial Networks (GANs) have provided innovative approaches for histopathological image analysis, they suffer from limitations such as mode collapse and overfitting in discriminator. Recently, Denoising Diffusion models have demonstrated promising results in computer vision. These models exhibit superior stability during training, better distribution coverage, and produce high-quality diverse images. Additionally, they display a high degree of resilience to noise and perturbations, making them well-suited for use in digital pathology, where images commonly contain artifacts and exhibit significant variations in staining. In this paper, we present a novel approach, namely ViT-DAE, which integrates vision transformers (ViT) and diffusion autoencoders for high-quality histopathology image synthesis. This marks the first time that ViT has been introduced to diffusion autoencoders in computational pathology, allowing the model to better capture the complex and intricate details of histopathology images. 
We demonstrate the effectiveness of ViT-DAE on three publicly available datasets. Our approach outperforms recent GAN-based and vanilla DAE methods in generating realistic images.	翻訳日:2023-04-04 14:53:57 公開日:2023-04-03
# 光ボックス電位における準1次元ボースガスの最適制御 Optimal control of quasi-1D Bose gases in optical box potentials ( http://arxiv.org/abs/2304.01051v1 ) ライセンス: Link先を確認	Andreas Deutschmann-Olek, Katharina Schrom, Nikolaus Würkner, Jörg Schmiedmayer, Sebastian Erne, Andreas Kugi	(参考訳) 本稿では, 最適制御手法を用いて, 高伸長ポテンシャルに閉じ込められた擬似-1次元ボースガスの操作について検討する。気体の有効な平均場ダイナミクスは、1次元の非多項式Schrödinger方程式によって説明できる。我々は Winckel と Borzi (2008) による Gross-Pitaevskii 方程式の間接最適制御法を拡張し、状態およびエネルギーコスト汎関数に必要な最適条件を得る。このアプローチは、最適条件を数値的に解くことにより、準1Dボースガスを(光学的)ボックスポテンシャルで最適に圧縮し、いわゆる近距離断熱性を見つけるために適用される。提案手法の挙動を解析し,還元基底関数を用いた簡単な直接最適化手法と比較した。シミュレーションの結果,提案手法の有効性が示された。 In this paper, we investigate the manipulation of quasi-1D Bose gases that are trapped in a highly elongated potential by optimal control methods. The effective mean-field dynamics of the gas can be described by a one-dimensional non-polynomial Schrödinger equation. We extend the indirect optimal control method for the Gross-Pitaevskii equation by Winckel and Borzi (2008) to obtain necessary optimality conditions for state and energy cost functionals. This approach is then applied to optimally compress a quasi-1D Bose gas in an (optical) box potential, i.e., to find a so-called short-cut to adiabaticity, by solving the optimality conditions numerically. The behavior of the proposed method is finally analyzed and compared to simple direct optimization strategies using reduced basis functions. Simulation results demonstrate the feasibility of the proposed approach.	翻訳日:2023-04-04 14:53:36 公開日:2023-04-03
# Polytuplet Loss: 可読性学習と論理的推論モデルへの逆アプローチ Polytuplet Loss: A Reverse Approach to Training Reading Comprehension and Logical Reasoning Models ( http://arxiv.org/abs/2304.01046v1 ) ライセンス: Link先を確認	Jeffrey Lu, Ivan Rodriguez	(参考訳) 授業中、生徒は理解力と論理的推論力でテストされる。学生はこうした試験を修了するための様々な戦略を開発しており、その一部は一般に他よりも優れていると考えられている。そのような戦略の1つは、絶対的精度よりも相対的精度を強調することであり、理論的には問題の解答に必要な情報を完全に知ることなく正しい解を生成できる。本稿では,この戦略を転校学習モデルの学習に応用し,読解と論理推論の問題を解く効果について検討する。モデルは、難読性理解と論理的推論ベンチマークであるreclorデータセットで評価された。これまでの研究は論理推論のスキルを対象としていたが,一般的なトレーニング方法とモデルアーキテクチャに注目した。本稿では,三重項損失関数の拡張であるポリタップレット損失関数を提案する。その結果,ポリタプレット損失モデルの方が既存のベースラインモデルより優れていることがわかった。ポリタプレット損失は他のコントラスト損失関数の代替として有望なものであるが、その利点を定量化するためにさらなる研究が必要である。 Throughout schooling, students are tested on reading comprehension and logical reasoning. Students have developed various strategies for completing such exams, some of which are generally thought to outperform others. One such strategy involves emphasizing relative accuracy over absolute accuracy and can theoretically produce the correct answer without full knowledge of the information required to solve the question. This paper examines the effectiveness of applying such a strategy to train transfer learning models to solve reading comprehension and logical reasoning questions. The models were evaluated on the ReClor dataset, a challenging reading comprehension and logical reasoning benchmark. While previous studies targeted logical reasoning skills, we focus on a general training method and model architecture. We propose the polytuplet loss function, an extension of the triplet loss function, to ensure prioritization of learning the relative correctness of answer choices over learning the true accuracy of each choice. Our results indicate that models employing polytuplet loss outperform existing baseline models. Although polytuplet loss is a promising alternative to other contrastive loss functions, further research is required to quantify the benefits it may present.	
翻訳日:2023-04-04 14:53:19 公開日:2023-04-03
# DivClust: ディープクラスタリングにおける多様性の制御 DivClust: Controlling Diversity in Deep Clustering ( http://arxiv.org/abs/2304.01042v1 ) ライセンス: Link先を確認	Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, Ioannis Patras	(参考訳) クラスタリングは機械学習の分野で主要な研究トピックであり、最近Deep Learningが大きな成功を収めた。しかしながら、既存のディープクラスタリング手法では対処されないクラスタリングの側面は、所定のデータセットに対して、効率的に複数の多様なパーティションを生成することである。これは特に重要であり、コンセンサスクラスタリングには多様なベースクラスタリングが必要であり、単一のクラスタリングに依存するよりも、より良く、より堅牢な結果を生み出すことが判明している。このギャップに対処するために、既存のディープクラスタリングフレームワークに組み込むことが可能な多様性制御損失であるdivclustを提案する。複数のデータセットと深いクラスタリングフレームワークで実験を行い、それを示しています。 a) 計算コストが極めて小さいフレームワークやデータセットの多様性を効果的に制御する手法。 b) DivClustが学んだクラスタリングの集合には、単一クラスタリングベースラインを著しく上回るソリューションが含まれており、 c) 既成のコンセンサスクラスタリングアルゴリズムを用いて、DivClustは、単一クラスタリングベースラインを一貫して上回り、ベースとなるディープクラスタリングフレームワークの性能を効果的に向上するコンセンサスクラスタリングソリューションを生成する。 Clustering has been a major research topic in the field of machine learning, one to which Deep Learning has recently been applied with significant success. However, an aspect of clustering that is not addressed by existing deep clustering methods, is that of efficiently producing multiple, diverse partitionings for a given dataset. This is particularly important, as a diverse set of base clusterings are necessary for consensus clustering, which has been found to produce better and more robust results than relying on a single clustering. To address this gap, we propose DivClust, a diversity controlling loss that can be incorporated into existing deep clustering frameworks to produce multiple clusterings with the desired degree of diversity. 
We conduct experiments with multiple datasets and deep clustering frameworks and show that: a) our method effectively controls diversity across frameworks and datasets with very small additional computational cost, b) the sets of clusterings learned by DivClust include solutions that significantly outperform single-clustering baselines, and c) using an off-the-shelf consensus clustering algorithm, DivClust produces consensus clustering solutions that consistently outperform single-clustering baselines, effectively improving the performance of the base deep clustering framework.	翻訳日:2023-04-04 14:53:02 公開日:2023-04-03
# 非エルミート超可積分系 Non-Hermitian superintegrable systems ( http://arxiv.org/abs/2304.01039v1 ) ライセンス: Link先を確認	Francisco Correa, Luis Inzunza, Ian Marquette	(参考訳) マースデン-ワインスタイン還元法の非エルミート一般化は、n 次元球面 $s^n$ 上の量子 $\mathcal{pt}$-symmetric superintegrable モデルの族を構築するために導入された。このメカニズムは、それぞれ$u(2)$と$u(3)$ Lie代数に関連する1次元および2次元の例で説明され、実スペクトルを持つ新しい量子モデルと自発な$\mathcal{PT}$-対称破壊を与える。ある極限において、モデルは既知の非エルミート系と以前に研究された実超可積分系の複素拡張に還元される。 A non-Hermitian generalisation of the Marsden--Weinstein reduction method is introduced to construct families of quantum $\mathcal{PT}$-symmetric superintegrable models over an $n$-dimensional sphere $S^n$. The mechanism is illustrated with one- and two-dimensional examples, related to $u(2)$ and $u(3)$ Lie algebras respectively, providing new quantum models with real spectra and spontaneous $\mathcal{PT}$-symmetric breaking. In certain limits, the models reduce to known non-Hermitian systems and complex extensions of previously studied real superintegrable systems.	翻訳日:2023-04-04 14:52:41 公開日:2023-04-03
# 臨界1+1Dアベリアン・ヒッグス模型のスペクトル特性 Spectral properties of critical 1+1D Abelian-Higgs model ( http://arxiv.org/abs/2304.01030v1 ) ライセンス: Link先を確認	Titas Chanda, Marcello Dalmonte, Maciej Lewenstein, Jakub Zakrzewski, Luca Tagliacozzo	(参考訳) 1+1d におけるゲージ対称性の存在は、動的ゲージボソンの存在を意味するものではないため冗長であることが知られている。その結果、連続体において、光子と相互作用するボソニック物質の理論は、高次元ヒッグスとクーロン相が非摂動効果によって連結されるため、単一の位相を持つ。しかし, [Phys. Rev. Lett. 128, 090601 (2022)] で発表された最近の研究により, 格子上で系を離散化した場合の予期せぬ相転移が明らかになった。この遷移は中心電荷が$c=3/2$である共形場理論によって記述される。本稿では、この$c=3/2$理論の2つの成分、すなわち自由マヨラナフェルミオンおよびボゾン成分を平衡および外平衡スペクトル分析によって特徴づけることを目的とする。 The presence of gauge symmetry in 1+1D is known to be redundant, since it does not imply the existence of dynamical gauge bosons. As a consequence, in the continuum, the Abelian-Higgs model, the theory of bosonic matter interacting with photons, just possesses a single phase, as the higher dimensional Higgs and Coulomb phases are connected via non-perturbative effects. However, recent research published in [Phys. Rev. Lett. 128, 090601 (2022)] has revealed an unexpected phase transition when the system is discretized on the lattice. This transition is described by a conformal field theory with a central charge of $c=3/2$. In this paper, we aim to characterize the two components of this $c=3/2$ theory -- namely the free Majorana fermionic and bosonic parts -- through equilibrium and out-of-equilibrium spectral analyses.	翻訳日:2023-04-04 14:52:31 公開日:2023-04-03
# 知識蒸留による作物分別領域の一般化 Domain Generalization for Crop Segmentation with Knowledge Distillation ( http://arxiv.org/abs/2304.01029v1 ) ライセンス: Link先を確認	Simone Angarano, Mauro Martini, Alessandro Navone, Marcello Chiaberge	(参考訳) 近年、精密農業は、フィールドマネジメントに関わるすべての活動をサポートするために、自動化プロセスに近づいた農業を徐々に方向付けている。サービスロボティクスは、監視、噴霧、収穫といった人間の介入なしにフィールドをナビゲートできる自律エージェントを配置することで、この進化において主要な役割を果たす。これらの正確な行動を実行するには、移動ロボットは周囲を理解し、野生のターゲットを特定するリアルタイム認識システムが必要である。新しい作物や環境条件への一般化は、ラベル付きサンプルがほとんど利用できないため、実用化には不可欠である。本稿では,作物の分節化の問題を調査し,知識蒸留によるドメインの一般化を促進する新しい手法を提案する。提案フレームワークでは,ソースドメイン上で個別に訓練されたモデルの集合から,未知のターゲットドメインに適応可能な学生モデルへ知識を伝達する。そこで本研究では,5万種以上の植物を対象とし,異なる地形様式,気象条件,光シナリオをカバーする作物区分のための多領域合成データセットを提案する。我々は最先端手法よりも優れた性能を示す。このアプローチは作物の分節化におけるドメインの一般化に有望な解決策を提供し、精密な農業応用を促進する可能性を秘めている。 In recent years, precision agriculture has gradually oriented farming closer to automation processes to support all the activities related to field management. Service robotics plays a predominant role in this evolution by deploying autonomous agents that can navigate fields while performing tasks without human intervention, such as monitoring, spraying, and harvesting. To execute these precise actions, mobile robots need a real-time perception system that understands their surroundings and identifies their targets in the wild. Generalizing to new crops and environmental conditions is critical for practical applications, as labeled samples are rarely available. In this paper, we investigate the problem of crop segmentation and propose a novel approach to enhance domain generalization using knowledge distillation. In the proposed framework, we transfer knowledge from an ensemble of models individually trained on source domains to a student model that can adapt to unseen target domains. 
To evaluate the proposed method, we present a synthetic multi-domain dataset for crop segmentation containing plants of variegate shapes and covering different terrain styles, weather conditions, and light scenarios for more than 50,000 samples. We demonstrate significant improvements in performance over state-of-the-art methods. Our approach provides a promising solution for domain generalization in crop segmentation and has the potential to enhance precision agriculture applications.	翻訳日:2023-04-04 14:52:15 公開日:2023-04-03
# ゲージ場理論におけるベル-CHSH不等式のBRST不変式 BRST invariant formulation of the Bell-CHSH inequality in gauge field theories ( http://arxiv.org/abs/2304.01028v1 ) ライセンス: Link先を確認	David Dudal, Philipe De Fabritiis, Marcelo S. Guimaraes, Giovani Peruzzo, Silvio P. Sorella	(参考訳) ゲージ場理論におけるベル-CHSHの不等式について述べる。フォック空間におけるBRST電荷コホモロジーの九五大島解析を用いて、ベル-CHSH不等式は明らかにBRST不変の方法で定式化される。自由四次元マックスウェル理論とアベリアン・ヒッグス模型の例は精査されている。不等式はBRST不変の圧縮状態を用いて探索され、Tsirelson境界に近い大きなベル-CHSH不等式違反を可能にする。量子力学における2つの1/2$スピン粒子の絡み合った状態と比較した。 A study of the Bell-CHSH inequality in gauge field theories is presented. By using the Kugo-Ojima analysis of the BRST charge cohomology in Fock space, the Bell-CHSH inequality is formulated in a manifestly BRST invariant way. The examples of the free four-dimensional Maxwell theory and the Abelian Higgs model are scrutinized. The inequality is probed by using BRST invariant squeezed states, allowing for large Bell-CHSH inequality violations, close to Tsirelson's bound. An illustrative comparison with the entangled state of two $1/2$ spin particles in Quantum Mechanics is provided.	翻訳日:2023-04-04 14:51:53 公開日:2023-04-03
# 人工ニューラルネットワークと時系列数:非線形INGARCHモデルの一類 Artificial neural networks and time series of counts: A class of nonlinear INGARCH models ( http://arxiv.org/abs/2304.01025v1 ) ライセンス: Link先を確認	Malte Jahn	(参考訳) 条件付きヘテロスケダスティック性(INGARCH)を持つ一般化整数値自己回帰モデルを用いて、時系列のカウントを頻繁に解析する。これらのモデルは応答関数を用いて過去の観測ベクトルと過去の条件予測を現在の観測の条件予測にマッピングする。本稿では,INGARCHモデルと人工ニューラルネットワーク(ANN)の応答関数を組み合わせることで,非線形INGARCHモデルのクラスを得る方法について述べる。 ANNフレームワークは、対応するニューラルモデルの退化バージョンとして、既存のINGARCHモデルの解釈を可能にする。最大確率推定、限界効果、信頼区間の詳細が与えられる。有界数と非有界数の時系列の実証分析により、ニューラルINGARCHモデルは、情報損失の観点から、合理的に退化した競合モデルより優れていることが示された。 Time series of counts are frequently analyzed using generalized integer-valued autoregressive models with conditional heteroskedasticity (INGARCH). These models employ response functions to map a vector of past observations and past conditional expectations to the conditional expectation of the present observation. In this paper, it is shown how INGARCH models can be combined with artificial neural network (ANN) response functions to obtain a class of nonlinear INGARCH models. The ANN framework allows for the interpretation of many existing INGARCH models as a degenerate version of a corresponding neural model. Details on maximum likelihood estimation, marginal effects and confidence intervals are given. The empirical analysis of time series of bounded and unbounded counts reveals that the neural INGARCH models are able to outperform reasonable degenerate competitor models in terms of the information loss.	翻訳日:2023-04-04 14:51:42 公開日:2023-04-03
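The response-function view in the INGARCH abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's model: it uses the standard Poisson INGARCH(1,1) recursion $\lambda_t = \omega + \alpha y_{t-1} + \beta \lambda_{t-1}$ as the linear case, and `neural_response` is a hypothetical one-hidden-layer network with made-up weights and a softplus output that keeps the conditional mean positive. The function names and all parameter values are assumptions.

```python
import numpy as np

def simulate_ingarch(response, n, lam0=1.0, seed=0):
    """Simulate counts y_t ~ Poisson(lam_t) with lam_t = response(y_{t-1}, lam_{t-1})."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=np.int64)
    lam = np.zeros(n)
    lam[0] = lam0
    y[0] = rng.poisson(lam[0])
    for t in range(1, n):
        lam[t] = response(y[t - 1], lam[t - 1])
        y[t] = rng.poisson(lam[t])
    return y, lam

def linear_response(y_prev, lam_prev, omega=1.0, alpha=0.3, beta=0.2):
    """Linear INGARCH(1,1) response; stationary mean is omega / (1 - alpha - beta)."""
    return omega + alpha * y_prev + beta * lam_prev

def neural_response(y_prev, lam_prev):
    """Hypothetical ANN response: 2 tanh hidden units, softplus output (weights are made up)."""
    W = np.array([[0.10, 0.05], [-0.05, 0.10]])
    b = np.array([0.0, 0.1])
    v = np.array([1.0, 0.5])
    c = 0.5
    h = np.tanh(W @ np.array([y_prev, lam_prev]) + b)
    return np.log1p(np.exp(v @ h + c))  # softplus keeps lam_t > 0

y_lin, lam_lin = simulate_ingarch(linear_response, 20000)
y_nn, lam_nn = simulate_ingarch(neural_response, 1000)
print(y_lin.mean(), y_nn.mean())
```

With the assumed linear parameters the sample mean of `y_lin` should sit near $\omega/(1-\alpha-\beta) = 2$; swapping in a different `response` is the only change needed to move between the linear and the neural variant.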
# 光:統合型マルチタスク学習ネットワークによる衛星画像からの個別建物抽出と高さ推定 LIGHT: Joint Individual Building Extraction and Height Estimation from Satellite Images through a Unified Multitask Learning Network ( http://arxiv.org/abs/2304.01090v1 ) ライセンス: Link先を確認	Yongqiang Mao, Xian Sun, Xingliang Huang, Kaiqiang Chen	(参考訳) ビルの抽出と高さ推定はリモートセンシング画像解釈における2つの重要な基本課題であり、都市計画、現実世界の3D構築、その他の分野で広く利用されている。現存する研究のほとんどは、この2つの課題を独立した研究とみなしている。したがって、高さ情報は、建物の抽出精度を向上させるために完全には利用できない。本研究では,建物の高さマップ,境界ボックス,セグメンテーションマスクマップを同時に出力する統合マルチタスク学習ネットワーク(LIGHT)を用いて,IndividuaL buIlding extract と heiGHt Estimation を初めて組み合わせた。具体的には、LIGHTはインスタンスセグメンテーションブランチと高さ推定ブランチで構成される。特に,マルチスケール機能ブランチを効果的に統一し,ブランチ間の機能を緩和するために,ブランチ間の機能インタラクションを効率的に行うGCTI (Gated Cross Task Interaction) モジュールを提案する。 DFC2023データセットの実験では、LIGHTは優れた性能を達成でき、ResNet101をバックボーンとしたGCTIモジュールは、それぞれ2.8%のAP50と6.5%のデルタ1でマルチタスク学習の性能を大幅に向上させることができる。 Building extraction and height estimation are two important basic tasks in remote sensing image interpretation, which are widely used in urban planning, real-world 3D construction, and other fields. Most of the existing research regards the two tasks as independent studies. Therefore the height information cannot be fully used to improve the accuracy of building extraction and vice versa. In this work, we combine the individuaL buIlding extraction and heiGHt estimation through a unified multiTask learning network (LIGHT) for the first time, which simultaneously outputs a height map, bounding boxes, and a segmentation mask map of buildings. Specifically, LIGHT consists of an instance segmentation branch and a height estimation branch. In particular, so as to effectively unify multi-scale feature branches and alleviate feature spans between branches, we propose a Gated Cross Task Interaction (GCTI) module that can efficiently perform feature interaction between branches. 
Experiments on the DFC2023 dataset show that our LIGHT can achieve superior performance, and our GCTI module with ResNet101 as the backbone can significantly improve the performance of multitask learning by 2.8% AP50 and 6.5% delta1, respectively.	翻訳日:2023-04-04 14:45:51 公開日:2023-04-03
# RPTQ:大規模言語モデルのためのリオーダーベースポストトレーニング量子化 RPTQ: Reorder-based Post-training Quantization for Large Language Models ( http://arxiv.org/abs/2304.01089v1 ) ライセンス: Link先を確認	Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu	(参考訳) 大規模言語モデル(LLM)は様々なタスクにおいて優れた性能を示しているが、そのデプロイは、その巨大なモデルサイズのために困難をもたらす。本稿では,LLMの量子化における主な課題は,外れ値の問題だけでなく,チャネル間のアクティベーション範囲の違いによるものであることを確認し,LLMのアクティベーションの量子化の問題に対処する,新しいリオーダーベースの量子化手法であるRPTQを提案する。 RPTQはアクティベーション中のチャネルを並べ替え、クラスタ内でそれらを量子化することで、チャネルの範囲差の影響を低減する。さらに,明示的な順序変更を回避し,ストレージと計算オーバーヘッドを削減する。このアプローチを実装することで,LLMモデルを3ビットアクティベーションに初めてプッシュすることで,大きなブレークスルーを達成した。 Large-scale language models (LLMs) have demonstrated outstanding performance on various tasks, but their deployment poses challenges due to their enormous model size. In this paper, we identify that the main challenge in quantizing LLMs stems from the different activation ranges between the channels, rather than just the issue of outliers. We propose a novel reorder-based quantization approach, RPTQ, that addresses the issue of quantizing the activations of LLMs. RPTQ rearranges the channels in the activations and then quantizes them in clusters, thereby reducing the impact of the range differences between channels. In addition, we reduce the storage and computation overhead by avoiding explicit reordering. By implementing this approach, we achieved a significant breakthrough by pushing LLMs to 3-bit activations for the first time.	翻訳日:2023-04-04 14:45:13 公開日:2023-04-03
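The idea of grouping channels with similar activation ranges before quantizing, as described in the RPTQ abstract above, can be illustrated with a toy comparison. This is a rough sketch and not the authors' implementation: the grouping rule here (sort channels by range width and split into equal-sized groups) is a stand-in chosen for brevity, the synthetic activations and all function names are assumptions, and the paper's trick of avoiding explicit reordering is not shown.

```python
import numpy as np

def quantize(x, lo, hi, bits):
    """Uniform asymmetric quantize-dequantize of x onto the interval [lo, hi]."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def per_tensor(acts, bits):
    """Baseline: one quantization range shared by every channel."""
    return quantize(acts, acts.min(), acts.max(), bits)

def reorder_and_cluster(acts, bits, n_clusters):
    """acts: (tokens, channels). Group channels of similar range, quantize each group separately."""
    ch_min, ch_max = acts.min(axis=0), acts.max(axis=0)
    order = np.argsort(ch_max - ch_min)  # assumed grouping rule: sort by range width
    out = np.empty_like(acts)
    for group in np.array_split(order, n_clusters):
        lo, hi = ch_min[group].min(), ch_max[group].max()
        out[:, group] = quantize(acts[:, group], lo, hi, bits)
    return out

# Synthetic activations whose per-channel scale spans more than two orders of magnitude.
rng = np.random.default_rng(0)
channel_scales = np.logspace(-1, 1.5, 64)
acts = rng.normal(size=(256, 64)) * channel_scales

mse_tensor = np.mean((acts - per_tensor(acts, 3)) ** 2)
mse_cluster = np.mean((acts - reorder_and_cluster(acts, 3, 8)) ** 2)
print(mse_tensor, mse_cluster)
```

On data like this, the shared range is dictated by the widest channels, so narrow channels collapse onto one or two levels; giving each group its own range is what reduces the error at 3 bits.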
# 自己構築型ニューラルネットワーク Self-building Neural Networks ( http://arxiv.org/abs/2304.01086v1 ) ライセンス: Link先を確認	Andrea Ferigo, Giovanni Iacca	(参考訳) 生命の前半では、シナプト形成と呼ばれる過程を通じて学習しながら脳が発達する。ニューロンは互いに成長し相互作用し、シナプスを形成する。しかし、最終的には脳はシナプスを吐き出す。従来の研究は学習とプルーニングを独立に重視していたが、本研究では、ヘビアン学習とプルーニングの組み合わせにより、シナプト生成過程をシミュレートすることを目的として、生物学的に妥当なモデルを提案する。このようにして、タスクの解き方を学習しながら、エージェントはその経験を特定のネットワーク構造に変換する。すなわち、ネットワーク構造はタスクの実行中に自身を構築する。このアプローチを自己構築ニューラルネットワーク(SBNN)と呼ぶ。提案したSBNNと従来のニューラルネットワーク(NN)をOpenAIの3つの古典的な制御タスクと比較する。その結果,我々のモデルは従来のNNよりも性能がよいことがわかった。また,本モデルでは, NNよりも, 刈り込み速度を増大させながら, 性能劣化が小さいことが観察された。最後に,検証テストを実施し,学習段階では認識できないタスクでモデルをテストする。このケースでは、sbnnが従来のnnよりも新しいタスクに適応できることが示されている。 During the first part of life, the brain develops while it learns through a process called synaptogenesis. The neurons, growing and interacting with each other, create synapses. However, eventually the brain prunes those synapses. While previous work focused on learning and pruning independently, in this work we propose a biologically plausible model that, thanks to a combination of Hebbian learning and pruning, aims to simulate the synaptogenesis process. In this way, while learning how to solve the task, the agent translates its experience into a particular network structure. Namely, the network structure builds itself during the execution of the task. We call this approach Self-building Neural Network (SBNN). We compare our proposed SBNN with traditional neural networks (NNs) over three classical control tasks from OpenAI. The results show that our model performs generally better than traditional NNs. Moreover, we observe that the performance decay while increasing the pruning rate is smaller in our model than with NNs. Finally, we perform a validation test, testing the models over tasks unseen during the learning phase. In this case, the results show that SBNNs can adapt to new tasks better than the traditional NNs, especially when over $80\%$ of the weights are pruned.	翻訳日:2023-04-04 14:44:49 公開日:2023-04-03
# ソースデータのない非教師なし肺結節の検出 Unsupervised Cross-domain Pulmonary Nodule Detection without Source Data ( http://arxiv.org/abs/2304.01085v1 ) ライセンス: Link先を確認	Rui Xu, Yong Luo, Bo Du	(参考訳) クロスドメイン肺結節検出は、ソースとターゲットドメイン間のデータ分布の大きなシフトにより、性能劣化に悩まされる。また、医療データアノテーションのコストが高いことから、対象画像がラベル付けされていないと仮定されることが多い。既存のアプローチは、この教師なしドメイン適応設定に大きく進歩した。しかし、プライバシー上の懸念から、ソース医療データがアクセスできないことが多いため、医療アプリケーションでこの設定が有効なことは滅多にない。そこで本研究では,肺結節検出(SUP)のためのソースフリーな非教師なしクロスドメイン手法を提案する。まず、インスタンスレベルのコントラスト学習を利用して、ソースモデルをターゲットドメインに適応させる。そして、適応モデルを教師と学生のインタラクション方法で訓練し、さらに精度を向上させるために重み付きエントロピー損失を組み込む。トレーニング済みのソースモデルを3つの一般的な肺結節データセットに適用することにより,本手法の有効性を実証した。 Cross domain pulmonary nodule detection suffers from performance degradation due to large shift of data distributions between the source and target domain. Besides, considering the high cost of medical data annotation, it is often assumed that the target images are unlabeled. Existing approaches have made much progress for this unsupervised domain adaptation setting. However, this setting is still rarely plausible in the medical application since the source medical data are often not accessible due to the privacy concerns. This motivates us to propose a Source-free Unsupervised cross-domain method for Pulmonary nodule detection (SUP). It first adapts the source model to the target domain by utilizing instance-level contrastive learning. Then the adapted model is trained in a teacher-student interaction manner, and a weighted entropy loss is incorporated to further improve the accuracy. Extensive experiments by adapting a pre-trained source model to three popular pulmonary nodule datasets demonstrate the effectiveness of our method.	翻訳日:2023-04-04 14:44:29 公開日:2023-04-03
# 大きな言語モデルの推論ロジックは、シンボリックな概念に分解できるだろうか? Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts? ( http://arxiv.org/abs/2304.01083v1 ) ライセンス: Link先を確認	Wen Shen, Lei Cheng, Yuxiao Yang, Mingjie Li, Quanshi Zhang	(参考訳) 本稿では,大言語モデル(llms)の推論論理を記号的概念の集合として説明する。最近の多くの研究で、伝統的なDNNは、通常スパースシンボルの概念を符号化している。しかし、llm は従来の dnn よりも多くのパラメータを持つため、llm がスパースシンボリック概念を符号化するかどうかはまだ未解決の問題である。そこで本研究では,対話タスクのためのLLMの推論スコアを,少数の記号的概念に分解することを提案する。入力文の任意のマスキング状態に対して,これらの疎い概念を用いて LLM のすべての推測スコアを適切に推定できることを検証する。また、LLMで符号化された概念の転送可能性を評価し、シンボリックな概念が類似の入力文間で高い転送性を示すことを検証する。より重要なことに、これらの象徴的な概念は、LLMの予測エラーの原因となる正確な理由を説明するために使用できる。 In this paper, we explain the inference logic of large language models (LLMs) as a set of symbolic concepts. Many recent studies have discovered that traditional DNNs usually encode sparse symbolic concepts. However, because an LLM has much more parameters than traditional DNNs, whether the LLM also encodes sparse symbolic concepts is still an open problem. Therefore, in this paper, we propose to disentangle the inference score of LLMs for dialogue tasks into a small number of symbolic concepts. We verify that we can use those sparse concepts to well estimate all inference scores of the LLM on all arbitrarily masking states of the input sentence. We also evaluate the transferability of concepts encoded by an LLM and verify that symbolic concepts usually exhibit high transferability across similar input sentences. More crucially, those symbolic concepts can be used to explain the exact reasons accountable for the LLM's prediction errors.	翻訳日:2023-04-04 14:44:16 公開日:2023-04-03
# fmgnn:融合多様体グラフニューラルネットワーク FMGNN: Fused Manifold Graph Neural Network ( http://arxiv.org/abs/2304.01081v1 ) ライセンス: Link先を確認	Cheng Deng, Fan Xu, Jiaxing Ding, Luoyi Fu, Weinan Zhang, Xinbing Wang	(参考訳) グラフ表現学習は様々なグラフタスクにおいて広く研究され、効果を示している。既存のほとんどの作品はユークリッド空間にグラフデータを埋め込んでいるが、近年の作品は埋め込みモデルを双曲空間や球面空間に拡張し、階層構造や環構造のような複雑な構造を持つグラフの性能を向上させる。異なる多様体からの埋め込みを融合することは、異なるグラフ構造上の埋め込み能力をさらに活用することができる。しかし、既存の埋め込み融合法は、異なる多様体上の同じ頂点の埋め込みの相互作用や調整を考慮せずに、出力埋め込みの連結や総和に焦点を当てており、最終的な融合結果に歪みや印象をもたらす可能性がある。さらに、異なる座標系から同じ頂点の埋め込みを融合させることも困難である。これらの課題に直面して,これらの多様体間の相互作用とアライメントを伴う異なるリーマン多様体にグラフを埋め込み,頂点と選択されたランドマークの間の異なる多様体上の距離を通して頂点埋め込みを融合する,新しいgnnアーキテクチャであるfmgnnを提案する。実験により,FMGNNはノード分類とリンク予測タスクのベンチマークにおいて,強いベースラインよりも優れた性能が得られることが示された。 Graph representation learning has been widely studied and demonstrated effectiveness in various graph tasks. Most existing works embed graph data in the Euclidean space, while recent works extend the embedding models to hyperbolic or spherical spaces to achieve better performance on graphs with complex structures, such as hierarchical or ring structures. Fusing the embedding from different manifolds can further take advantage of the embedding capabilities over different graph structures. However, existing embedding fusion methods mostly focus on concatenating or summing up the output embeddings, without considering interacting and aligning the embeddings of the same vertices on different manifolds, which can lead to distortion and impression in the final fusion results. Besides, it is also challenging to fuse the embeddings of the same vertices from different coordinate systems. In face of these challenges, we propose the Fused Manifold Graph Neural Network (FMGNN), a novel GNN architecture that embeds graphs into different Riemannian manifolds with interaction and alignment among these manifolds during training and fuses the vertex embeddings through the distances on different manifolds between vertices and selected landmarks, geometric coresets. 
Our experiments demonstrate that FMGNN yields superior performance over strong baselines on the benchmarks of node classification and link prediction tasks.	翻訳日:2023-04-04 14:44:00 公開日:2023-04-03
# 線形相補性プログラミングを用いた時系列の等角予測領域 Conformal Prediction Regions for Time Series using Linear Complementarity Programming ( http://arxiv.org/abs/2304.01075v1 ) ライセンス: Link先を確認	Matthew Cleaveland, Insup Lee, George J. Pappas, Lars Lindemann	(参考訳) コンフォーマル予測は、高い確率で有効な機械学習モデルの予測領域を生成する統計ツールである。しかし、時系列データに共形予測を適用すると、保守的な予測領域が生じる。実際、信頼度$1-\delta$で$T$時間ステップにわたる予測領域を得るには、従来の研究では個々の予測領域がそれぞれ信頼度$1-\delta/T$で有効であることが要求される。学習可能な時系列予測器を使用する場合,この保守性を低減する最適化手法を提案する。複数の時間ステップで予測誤差を個別に考慮する代わりに、複数の時間ステップにわたるパラメータ化された予測誤差を考慮する。追加データセット上でパラメータを最適化することにより、保守的でない予測領域を見つける。この問題を混合整数線形相補性プログラム (MILCP) としてキャストし, 線形相補性プログラム (LCP) に緩和することを示した。さらに、緩和されたLPは元のMILCPと同じ最適コストであることを示す。最後に,歩行者軌道予測器を用いたケーススタディにおいて,本手法の有効性を示す。 Conformal prediction is a statistical tool for producing prediction regions of machine learning models that are valid with high probability. However, applying conformal prediction to time series data leads to conservative prediction regions. In fact, to obtain prediction regions over $T$ time steps with confidence $1-\delta$, previous works require that each individual prediction region is valid with confidence $1-\delta/T$. We propose an optimization-based method for reducing this conservatism to enable long horizon planning and verification when using learning-enabled time series predictors. Instead of considering prediction errors individually at each time step, we consider a parameterized prediction error over multiple time steps. By optimizing the parameters over an additional dataset, we find prediction regions that are not conservative. We show that this problem can be cast as a mixed integer linear complementarity program (MILCP), which we then relax into a linear complementarity program (LCP). Additionally, we prove that the relaxed LP has the same optimal cost as the original MILCP. Finally, we demonstrate the efficacy of our method on a case study using pedestrian trajectory predictors.	翻訳日:2023-04-04 14:43:35 公開日:2023-04-03
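The conservatism mentioned in the conformal prediction abstract above comes from the union-bound baseline, where each of the $T$ time steps gets its own region at level $1-\delta/T$. A minimal sketch of that baseline only (not the paper's LCP-based method) is given below; the synthetic calibration errors and the function names are assumptions, and the quantile rule is the standard split-conformal one.

```python
import numpy as np

def conformal_radius(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this confidence level
    return np.sort(scores)[k - 1]

def union_bound_radii(cal_errors, delta):
    """cal_errors: (n, T) prediction errors on calibration trajectories.
    Returns one radius per time step, each computed at level 1 - delta/T."""
    n, T = cal_errors.shape
    return np.array([conformal_radius(cal_errors[:, t], delta / T) for t in range(T)])

# Synthetic calibration errors for T = 10 prediction steps (assumed data).
rng = np.random.default_rng(0)
cal_errors = np.abs(rng.normal(size=(2000, 10)))

delta = 0.1
radii_union = union_bound_radii(cal_errors, delta)
# For comparison: radii that are valid only per step at level 1 - delta.
radii_single = np.array([conformal_radius(cal_errors[:, t], delta) for t in range(10)])
print(radii_union)
print(radii_single)
```

The gap between `radii_union` and `radii_single` grows with the horizon $T$, which is the conservatism the abstract sets out to reduce.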
# 節の絡み合い, 理論を探る例 Entanglement of Sections, Examples Looking for a Theory ( http://arxiv.org/abs/2304.01072v1 ) ライセンス: Link先を確認	M. H. Freedman and M. B. Hastings	(参考訳) 量子情報は状態の絡み合いに関するものである。この出発点にパラメータを追加し、単一の状態がバンドルの非バナッシングセクションとなるようにします。例を通してセクションの絡み合いのパターンを考察する。 Quantum information is about the entanglement of states. To this starting point we add parameters whereby a single state becomes a non-vanishing section of a bundle. We consider through examples the possible entanglement patterns of sections.	翻訳日:2023-04-04 14:43:17 公開日:2023-04-03
# HyperThumbnail: レート歪み最適化によるリアルタイム6Kイメージ再スケーリング HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization ( http://arxiv.org/abs/2304.01064v1 ) ライセンス: Link先を確認	Chenyang Qi, Xin Yang, Ka Leong Cheng, Ying-Cong Chen, Qifeng Chen	(参考訳) 現代の画像再構成は、HR画像再構成のための埋め込み情報を含む低解像度(LR)サムネイル画像に高解像度(HR)画像を埋め込むことを目的としている。従来の超解像とは異なり、LRサムネイルに埋め込まれた情報から、元の画像に忠実な高忠実なHR画像復元を可能にする。しかし、最先端画像再スケーリング手法では、lr画像ファイルサイズを最適化せず、超高解像度(例えば6k)画像再構成のリアルタイム性能を低下させる。これら2つの課題に対処するために、リアルタイム6Kレート歪み認識画像再スケーリングのための新しいフレームワーク(HyperThumbnail)を提案する。提案する量子化予測モジュールにより,まずHR画像のJPEG LRサムネイルへの埋め込みを行い,HR再構成の品質を最大化しながら,埋め込みLR JPEGサムネイルのファイルサイズを最小化する。そして、効率的な周波数認識復号器は、LR1から高忠実度HR画像をリアルタイムに再構成する。広範な実験により,従来の画像再スケーリングベースラインよりも性能が優れており,リアルタイムに6k画像再構成が可能となった。 Contemporary image rescaling aims at embedding a high-resolution (HR) image into a low-resolution (LR) thumbnail image that contains embedded information for HR image reconstruction. Unlike traditional image super-resolution, this enables high-fidelity HR image restoration faithful to the original one, given the embedded information in the LR thumbnail. However, state-of-the-art image rescaling methods do not optimize the LR image file size for efficient sharing and fall short of real-time performance for ultra-high-resolution (e.g., 6K) image reconstruction. To address these two challenges, we propose a novel framework (HyperThumbnail) for real-time 6K rate-distortion-aware image rescaling. Our framework first embeds an HR image into a JPEG LR thumbnail by an encoder with our proposed quantization prediction module, which minimizes the file size of the embedding LR JPEG thumbnail while maximizing HR reconstruction quality. Then, an efficient frequency-aware decoder reconstructs a high-fidelity HR image from the LR one in real time. Extensive experiments demonstrate that our framework outperforms previous image rescaling baselines in rate-distortion performance and can perform 6K image reconstruction in real time.	
翻訳日:2023-04-04 14:43:13 公開日:2023-04-03
# 多層平均場ネットワークによる深度分離 Depth Separation with Multilayer Mean-Field Networks ( http://arxiv.org/abs/2304.01063v1 ) ライセンス: Link先を確認	Yunwei Ren, Mo Zhou, Rong Ge	(参考訳) 深層ネットワークが浅層ネットワークよりも強力な理由である深層分離は、ディープラーニング理論において大きな問題となっている。以前の結果はしばしば表現力にフォーカスする。例えば、arXiv:1904.06984は3層ネットワークを使って簡単に近似できる関数を構築したが、2層ネットワークでは近似できない。本稿では,arxiv:1904.06984によって構築された関数を,多項式数のニューロンを効率的に持つ超パラメータネットワークを用いて学習することができる。その結果、平均場限界を多層ネットワークに拡張する新しい方法と、無限幅平均場ネットワークの離散化によって生じる誤差を要因とする損失の分解に依拠する。 Depth separation -- why a deeper network is more powerful than a shallower one -- has been a major problem in deep learning theory. Previous results often focus on representation power. For example, arXiv:1904.06984 constructed a function that is easy to approximate using a 3-layer network but not approximable by any 2-layer network. In this paper, we show that this separation is in fact algorithmic: one can learn the function constructed by arXiv:1904.06984 using an overparameterized network with polynomially many neurons efficiently. Our result relies on a new way of extending the mean-field limit to multilayer networks, and a decomposition of loss that factors out the error introduced by the discretization of infinite-width mean-field networks.	翻訳日:2023-04-04 14:42:52 公開日:2023-04-03
# テキスト教師付き意味セグメンテーションによる空間的一貫性のあるグループ化 Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation ( http://arxiv.org/abs/2304.01114v1 ) ライセンス: Link先を確認	Yabo Zhang, Zihao Wang, Jun Hao Liew, Jingjia Huang, Manyu Zhu, Jiashi Feng, Wangmeng Zuo	(参考訳) 本研究では,画像-文対の学習を通してのみ意味的セグメンテーションを行う。アノテーションが不足しているため、既存のテキスト管理手法では、ピクセル非感性フィードバックによってイメージをセマンティック領域にグループ化することしか学べない。その結果、グループ化された結果は粗く、しばしば小さなスプリアス領域を含んでおり、セグメンテーションの上限性能を制限している。一方,自己教師モデルによるグループ化の結果は,より意味的に一貫性があり,既存の手法のボトルネックを解消している。そこで本研究では,テキスト教師付きセマンティックセマンティックセグメンテーションを用いた自己教師付き空間一貫性グループを提案する。部分的グループ化の結果を考えると、2つのコア設計による画像レベルから領域レベル認識へのテキスト教師付きモデルの適用をさらに進める。まず,一方向の名詞と地域間の対比損失による微粒なアライメントを奨励し,不一致な名詞と地域間のペアを減らす。第2に、すべてのグループ領域の同時認識を可能にするために、コンテキスト対応マスキング戦略を採用する。空間的に一貫性のあるグループ化と領域適応認識を併用して,パスカルVOCおよびパスカルコンテキストのベンチマークにおいて59.2% mIoUと32.4% mIoUを達成し,最先端の手法をはるかに上回っている。 In this work, we investigate performing semantic segmentation solely through the training on image-sentence pairs. Due to the lack of dense annotations, existing text-supervised methods can only learn to group an image into semantic regions via pixel-insensitive feedback. As a result, their grouped results are coarse and often contain small spurious regions, limiting the upper-bound performance of segmentation. On the other hand, we observe that grouped results from self-supervised models are more semantically consistent and break the bottleneck of existing methods. Motivated by this, we introduce associate self-supervised spatially-consistent grouping with text-supervised semantic segmentation. Considering the part-like grouped results, we further adapt a text-supervised model from image-level to region-level recognition with two core designs. First, we encourage fine-grained alignment with a one-way noun-to-region contrastive loss, which reduces the mismatched noun-region pairs. Second, we adopt a contextually aware masking strategy to enable simultaneous recognition of all grouped regions. 
Coupled with spatially-consistent grouping and region-adapted recognition, our method achieves 59.2% mIoU and 32.4% mIoU on Pascal VOC and Pascal Context benchmarks, significantly surpassing the state-of-the-art methods.	翻訳日:2023-04-04 14:36:18 公開日:2023-04-03
# MCMCにおけるニューラル制御変量の理論的保証 Theoretical guarantees for neural control variates in MCMC ( http://arxiv.org/abs/2304.01111v1 ) ライセンス: Link先を確認	Denis Belomestny, Artur Goldman, Alexey Naumov, Sergey Samsonov	(参考訳) 本稿では,加法的な制御変量と,漸近分散の適切な推定値の最小化に基づく,マルコフ連鎖のための分散低減手法を提案する。特に,制御変量がディープニューラルネットワークとして表現される場合に注目する。基礎となるマルコフ連鎖に対する様々なエルゴード性の仮定の下で,漸近分散の最適収束率を導出する。提案手法は,分散低減アルゴリズムの確率的誤差に関する最近の結果と関数近似理論に依拠する。 In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance. We focus on the particular case when control variates are represented as deep neural networks. We derive the optimal convergence rate of the asymptotic variance under various ergodicity assumptions on the underlying Markov chain. The proposed approach relies upon recent results on the stochastic errors of variance reduction algorithms and function approximation theory.	翻訳日:2023-04-04 14:35:54 公開日:2023-04-03
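The control-variate idea in the abstract above can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the AR(1) chain, the test function `f(x) = x + x^2`, the single linear control variate `g(x) = x`, and the choice of coefficient by minimizing the empirical (not asymptotic) variance. The paper itself uses neural-network control variates and an estimate of the asymptotic variance, which this sketch does not reproduce.

```python
import math
import random


def ar1_chain(n, rho=0.9, seed=0):
    """Simulate a stationary AR(1) Markov chain whose invariant law is N(0, 1)."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    chain = []
    for _ in range(n):
        chain.append(x)
        x = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return chain


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def control_variate_estimate(f_vals, g_vals):
    """Estimate E[f] using f - c*g, where g has known mean zero under the target.

    The coefficient c = Cov(f, g) / Var(g) minimizes the empirical variance
    of the adjusted samples (a simplification of the asymptotic-variance
    criterion described in the abstract).
    """
    f_bar, g_bar = mean(f_vals), mean(g_vals)
    cov = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals)) / len(f_vals)
    coeff = cov / variance(g_vals)
    adjusted = [f - coeff * g for f, g in zip(f_vals, g_vals)]
    return mean(adjusted), adjusted


chain = ar1_chain(5000)
f_vals = [x + x * x for x in chain]  # target expectation under N(0, 1) is 1
g_vals = chain                       # control variate with stationary mean 0
estimate, adjusted = control_variate_estimate(f_vals, g_vals)
print(estimate, variance(f_vals), variance(adjusted))
```

The adjusted samples have no larger empirical variance than the raw ones by construction; how much the asymptotic variance of the chain average improves depends on the chain and is exactly what the paper analyzes.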
# AutoLabel: オープンセットビデオドメイン適応のためのCLIPベースのフレームワーク AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation ( http://arxiv.org/abs/2304.01110v1 ) ライセンス: Link先を確認	Giacomo Zara, Subhankar Roy, Paolo Rota, Elisa Ricci	(参考訳) open-set unsupervised video domain adaptation (ouvda) は、ラベル付きソースドメインから、ターゲットに存在するがソースに存在しない"ターゲット-プライベート"カテゴリを含むラベル付きターゲットドメインへのアクション認識モデルを適用するタスクを扱う。本研究は、事前学習された言語と視覚モデル(CLIP)の使用を提案することにより、特定のオープンセット分類器や重み付けされた対人学習を訓練する以前の作業から逸脱する。 CLIPは、リッチな表現とゼロショット認識機能のために、OUVDAに適している。しかし、CLIPのゼロショットプロトコルでターゲットプライベートなインスタンスを拒否するには、ターゲットプライベートなラベル名に関するオラクルの知識が必要である。本稿では,ラベル名の知識の欠如を回避するために,オブジェクト中心の合成候補クラス名を自動的に発見・生成するAutoLabelを提案する。その単純さにもかかわらず、AutoLabelを装備したCLIPは、ターゲットプライベートなインスタンスを十分に拒否できるため、2つのドメインの共有クラス間のアライメントがより容易になる。コードは利用可能です。 Open-set Unsupervised Video Domain Adaptation (OUVDA) deals with the task of adapting an action recognition model from a labelled source domain to an unlabelled target domain that contains "target-private" categories, which are present in the target but absent in the source. In this work we deviate from the prior work of training a specialized open-set classifier or weighted adversarial learning by proposing to use pre-trained Language and Vision Models (CLIP). The CLIP is well suited for OUVDA due to its rich representation and the zero-shot recognition capabilities. However, rejecting target-private instances with the CLIP's zero-shot protocol requires oracle knowledge about the target-private label names. To circumvent the impossibility of the knowledge of label names, we propose AutoLabel that automatically discovers and generates object-centric compositional candidate target-private class names. Despite its simplicity, we show that CLIP when equipped with AutoLabel can satisfactorily reject the target-private instances, thereby facilitating better alignment between the shared classes of the two domains. The code is available.	翻訳日:2023-04-04 14:35:48 公開日:2023-04-03
# 偶然の世代 Coincidental Generation ( http://arxiv.org/abs/2304.01108v1 ) ライセンス: Link先を確認	Jordan W. Suchow and Necdet Gürkan	(参考訳) Generative AI models are emerging as a versatile tool across diverse industries, with applications in synthetic data generation, computational art, personalization of products and services, and immersive entertainment. Here, we introduce a new privacy concern in the adoption and use of generative AI models: that of coincidental generation. Coincidental generation occurs when a model's output inadvertently bears a likeness to a real-world entity. Consider, for example, synthetic portrait generators, which are today deployed in commercial applications such as virtual modeling agencies and synthetic stock photography. We argue that the low intrinsic dimensionality of human face perception implies that every synthetically generated face will coincidentally resemble an actual person, all but guaranteeing a privacy violation in the form of a misappropriation of likeness.	翻訳日:2023-04-04 14:35:26 公開日:2023-04-03
# Crossword: マスキングによるデータ圧縮への意味的アプローチ Crossword: A Semantic Approach to Data Compression via Masking ( http://arxiv.org/abs/2304.01106v1 ) ライセンス: Link先を確認	Mingxiao Li, Rui Jin, Liyao Xiang, Kaiming Shen, Shuguang Cui	(参考訳) データ圧縮の伝統的な手法は、典型的には記号レベルの統計に基づいており、情報源は独立同分布の確率変数の長い系列や確率過程としてモデル化され、可逆圧縮ではエントロピー、非可逆圧縮では相互情報量が基本的な限界として確立される。しかし、現実世界の情報源(テキスト、音楽、音声を含む)は、人間の知覚と密接な関係があるため統計的に明確に定義できないことが多く、モデル駆動のアプローチはかなり準最適になりうる。本研究は英語テキストに焦点を当て,その意味的側面を利用して圧縮効率をさらに高める。主なアイデアはクロスワードパズルに由来するもので、いくつかの鍵となる文字が与えられていれば、隠された単語を正確に復元できるという観察に基づく。提案するマスキングに基づく戦略はこのゲームに類似している。簡単に言えば、エンコーダは意味的損失に応じて各単語の意味的重要性を評価して重要度の低い単語をマスクし、デコーダはTransformerを用いて意味的文脈からマスクされた単語を復元する。実験により,提案手法はHuffman符号やUTF-8符号のような従来の手法に比べてはるかに高い圧縮効率を達成すると同時に,対象テキストの意味をかなりの程度保持できることを示した。 The traditional methods for data compression are typically based on the symbol-level statistics, with the information source modeled as a long sequence of i.i.d. random variables or a stochastic process, thus establishing the fundamental limit as entropy for lossless compression and as mutual information for lossy compression. However, the source (including text, music, and speech) in the real world is often statistically ill-defined because of its close connection to human perception, and thus the model-driven approach can be quite suboptimal. This study places careful emphasis on English text and exploits its semantic aspect to enhance the compression efficiency further. The main idea stems from the puzzle crossword, observing that the hidden words can still be precisely reconstructed so long as some key letters are provided. The proposed masking-based strategy resembles the above game. In a nutshell, the encoder evaluates the semantic importance of each word according to the semantic loss and then masks the minor ones, while the decoder aims to recover the masked words from the semantic context by means of the Transformer. 
Our experiments show that the proposed semantic approach can achieve much higher compression efficiency than the traditional methods such as Huffman code and UTF-8 code, while preserving the meaning in the target text to a great extent.	翻訳日:2023-04-04 14:35:15 公開日:2023-04-03
# RunBugRun - プログラムの自動修復のための実行可能なデータセット RunBugRun -- An Executable Dataset for Automated Program Repair ( http://arxiv.org/abs/2304.01102v1 ) ライセンス: Link先を確認	Julian Aron Prenner and Romain Robbes	(参考訳) 近年、APR(Automated Program repair)において、特にディープニューラルネットワークへのデータ駆動技術への移行が注目されている。これは数十万、あるいは数百万の実行不能なコードフラグメントのトレーニングを伴います。我々は、ニューラルプログラム修復(NPR)でしばしば無視されるコードの側面、すなわちその実行にもっと注意を向けたいと思います。コード実行にはいくつかの大きな利点がある。候補修正をテストベースで評価することができ、修復を支援する貴重な情報を提供することができる。本研究では,8つの異なるプログラミング言語で書かれたプログラム競合サイトに提出された,45万個の小さなバグ/修正プログラムペアの完全な実行データセットを示す。データセットとともに、プログラムをコンパイル、安全に実行、テストするためのインフラストラクチャと、きめ細かいバグタイプのラベルを提供します。参照点を与えるため,提案手法は2つのベースラインに対して,1つは生成と検証に基づく評価結果であり,もう1つは深層学習に関する評価結果である。このデータセットでは、完全な静的コード表現を超えて、ニューラルプログラムの修復を強化し、実行ベースの機能の使用を促進し、いくつかの異なる言語を含めることで、現在のAPRデータセットとベンチマークの状況において、Javaの優位性と相反する、いくつかの目標を達成したいと考えています。 Recently, we can notice a transition to data-driven techniques in Automated Program Repair (APR), in particular towards deep neural networks. This entails training on hundreds of thousands or even millions of non-executable code fragments. We would like to bring more attention to an aspect of code often neglected in Neural Program Repair (NPR), namely its execution. Code execution has several significant advantages. It allows for test-based evaluation of candidate fixes and can provide valuable information to aid repair. In this work we present a fully executable dataset of 450,000 small buggy/fixed program pairs originally submitted to programming competition websites written in eight different programming languages. Along with the dataset we provide infrastructure to compile, safely execute and test programs as well as fine-grained bug-type labels. To give a point of reference, we provide basic evaluation results for two baselines, one based on a generate-and-validate approach and one on deep learning. 
With this dataset we follow several goals: we want to lift Neural Program Repair beyond fully static code representations, foster the use of execution-based features and, by including several different languages, counterbalance the predominance of Java in the current landscape of APR datasets and benchmarks.	翻訳日:2023-04-04 14:34:50 公開日:2023-04-03
# Dsfer-Net:近代ホップフィールドネットワークを用いたバイテンポラル変化検出のための深層スーパービジョンと特徴検索ネットワーク Dsfer-Net: A Deep Supervision and Feature Retrieval Network for Bitemporal Change Detection Using Modern Hopfield Networks ( http://arxiv.org/abs/2304.01101v1 ) ライセンス: Link先を確認	Shizhen Chang, Michael Kopp, Pedram Ghamisi	(参考訳) 高解像度リモートセンシング画像への重要な応用として,地表面の変化の監視と解析が目的である。高解像度リモートセンシングデータの量の増加とテクスチャの特徴の複雑さにより、多くの定量的深層学習手法が提案されている。これらの手法は,深部特徴抽出と空間時空間情報の組み合わせによって従来の変化検出手法を上回っているが,検出性能向上における深い特徴の作用についての合理的な説明はいまだに欠けている。本研究では,現代のホップフィールドネットワーク層がセマンティック理解においてかなりの性能を発揮することを示す。本稿では,バイテンポラル変化検出のためのDeep Supervision and feature Retrieval Network (Dsfer-Net)を提案する。具体的には,完全畳み込み型シャムネットワークを用いて,バイテンポラル画像の高度に代表される深い特徴を抽出する。バイテンポラル画像の逐次的地理情報に基づいて特徴検索モジュールを設計し,特徴特徴を抽出し,識別情報を深く教師された方法で活用する。また,教師付き特徴検索モジュールは,提案するネットワークの深い層における意味的理解について説明可能な証明を与える。最後に、このエンドツーエンドネットワークは、異なるレイヤから取得した特徴と特徴ペアを集約することで、新しいフレームワークを実現する。 3つの公開データセット(LEVIR-CD、WHU-CD、CDD)で実施された実験は、提案したDsfer-Netが他の最先端手法よりも優れていることを確認した。コードはオンラインで入手できる(https://github.com/ShizhenChang/Dsfer-Net)。 Change detection, as an important application for high-resolution remote sensing images, aims to monitor and analyze changes in the land surface over time. With the rapid growth in the quantity of high-resolution remote sensing data and the complexity of texture features, a number of quantitative deep learning-based methods have been proposed. Although these methods outperform traditional change detection methods by extracting deep features and combining spatial-temporal information, reasonable explanations about how deep features work on improving the detection performance are still lacking. In our investigations, we find that modern Hopfield network layers achieve considerable performance in semantic understandings. In this paper, we propose a Deep Supervision and FEature Retrieval network (Dsfer-Net) for bitemporal change detection. 
Specifically, the highly representative deep features of bitemporal images are jointly extracted through a fully convolutional Siamese network. Based on the sequential geo-information of the bitemporal images, we then design a feature retrieval module to retrieve the difference feature and leverage discriminative information in a deeply supervised manner. We also note that the deeply supervised feature retrieval module gives explainable proofs about the semantic understandings of the proposed network in its deep layers. Finally, this end-to-end network achieves a novel framework by aggregating the retrieved features and feature pairs from different layers. Experiments conducted on three public datasets (LEVIR-CD, WHU-CD, and CDD) confirm the superiority of the proposed Dsfer-Net over other state-of-the-art methods. Code will be available online (https://github.com/ShizhenChang/Dsfer-Net).	翻訳日:2023-04-04 14:34:29 公開日:2023-04-03
# DoctorGLM: 中国語の医師モデルの微調整は至難の業ではない DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task ( http://arxiv.org/abs/2304.01097v1 ) ライセンス: Link先を確認	Honglin Xiong, Sheng Wang, Yitao Zhu, Zihao Zhao, Yuxiao Liu, Qian Wang, Dinggang Shen	(参考訳) ChatGPTやGPT-4を含む大規模言語モデル(LLM)の最近の進歩は、人間の指示に対する理解と応答において顕著である。にもかかわらず、これらのモデルは一般に英語でより良く機能し、医学領域で明示的に訓練されていないため、診断、医薬品の推奨、その他の医療アドバイスにおいて精度が最適でない。加えて、対話モデルの訓練と展開は、病院にとってまだ不可能であると考えられており、LLMの普及を妨げている。これらの課題に対処するため,我々はChatGPTの助けを借りて中国語の医療対話データベースを収集し,容易に展開できるLLMを訓練するための手法をいくつか採用した。注目すべきことに、ChatGLM-6Bを1台のA100 80Gで13時間で微調整でき、これは医療目的のLLMを非常に安価に持てることを意味する。DoctorGLMは現在、初期段階のエンジニアリングの試みであり、様々な誤りを含んでいる。医療に焦点を当てた機能を改善するためのフィードバックや提案を募るため、これを広くコミュニティと共有する: https://github.com/xionghonglin/DoctorGLM The recent progress of large language models (LLMs), including ChatGPT and GPT-4, in comprehending and responding to human instructions has been remarkable. Nevertheless, these models typically perform better in English and have not been explicitly trained for the medical domain, resulting in suboptimal precision in diagnoses, drug recommendations, and other medical advice. Additionally, training and deploying a dialogue model is still believed to be impossible for hospitals, hindering the promotion of LLMs. To tackle these challenges, we have collected databases of medical dialogues in Chinese with ChatGPT's help and adopted several techniques to train an easy-deploy LLM. Remarkably, we were able to fine-tune the ChatGLM-6B on a single A100 80G in 13 hours, which means having a healthcare-purpose LLM can be very affordable. DoctorGLM is currently an early-stage engineering attempt and contains various mistakes. We are sharing it with the broader community to invite feedback and suggestions to improve its healthcare-focused capabilities: https://github.com/xionghonglin/DoctorGLM.	翻訳日:2023-04-04 14:34:02 公開日:2023-04-03
# 忍における人間行動の緩和に向けたニューラルネットワークの進化III : 忍者師範の復帰 Evolving Artificial Neural Networks To Imitate Human Behaviour In Shinobi III : Return of the Ninja Master ( http://arxiv.org/abs/2304.01096v1 ) ライセンス: Link先を確認	Maximilien Le Clei	(参考訳) 私たちの社会はますます計算ツールが好きだ。この現象は、新しい人工知能パラダイムの出現に続いて、過去10年間で大きく増加している。具体的には、Deep Neural NetworksとStochastic Gradient Descentという2つのアルゴリズム技術の結合によって、計算能力が指数関数的に増加し、多くの現代技術において主要な資産となり続けている。しかし、進歩が進むにつれ、他の方法がこのような様々なハードウェアの進歩の恩恵を享受できるかどうか疑問視する向きもある。この研究をさらに進めるために、我々はこの論文を進化的アルゴリズムとその動的ニューラルネットワークへの応用に当てはめている。強力な計算資源を活用しながら新たな手法を考案することで、様々なベンチマークで強力なパフォーマンスを持つエージェントを開発できるだけでなく、ゲームshinobi iiiの人間と非常によく似たエージェントを開発できることがわかった。 Our society is increasingly fond of computational tools. This phenomenon has greatly increased over the past decade following, among other factors, the emergence of a new Artificial Intelligence paradigm. Specifically, the coupling of two algorithmic techniques, Deep Neural Networks and Stochastic Gradient Descent, thrusted by an exponentially increasing computing capacity, has and is continuing to become a major asset in many modern technologies. However, as progress takes its course, some still wonder whether other methods could similarly or even more greatly benefit from these various hardware advances. In order to further this study, we delve in this thesis into Evolutionary Algorithms and their application to Dynamic Neural Networks, two techniques which despite enjoying many advantageous properties have yet to find their niche in contemporary Artificial Intelligence. We find that by elaborating new methods while exploiting strong computational resources, it becomes possible to develop strongly performing agents on a variety of benchmarks but also some other agents behaving very similarly to human subjects on the video game Shinobi III : Return of The Ninja Master, typical complex tasks previously out of reach for non-gradient-based optimization.	翻訳日:2023-04-04 14:33:45 公開日:2023-04-03
# キャプションの変更:リモートセンシングによる変更キャプションのための注意ネットワーク Changes to Captions: An Attentive Network for Remote Sensing Change Captioning ( http://arxiv.org/abs/2304.01091v1 ) ライセンス: Link先を確認	Shizhen Chang and Pedram Ghamisi	(参考訳) 近年,自然言語処理(NLP)技術を用いたリモートセンシング画像の直接学習と解析に注目が集まっている。多時期リモートセンシング画像における変化を正確に記述する能力は,地理空間の理解や土地計画においてますます重要になっている。自然画像の変化キャプションタスクとは異なり、リモートセンシング変化キャプションは、照明、季節効果、複雑な土地被覆など、さまざまな影響要因に関わらず、最も重要な変化を捉えることを目的としている。本研究では,リモートセンシング画像の変化を正確に記述することの重要性を強調し,自然画像・合成画像とリモートセンシング画像における変化キャプションタスクの比較を行う。正確なキャプション生成の課題に対処するため,二時期リモートセンシング画像に対して,Chg2Capと呼ばれる注意的な変化対キャプションネットワークを提案する。ネットワークは3つの主要コンポーネントから構成される。 1) 画像ペアごとに高レベル表現を収集するシャム(Siamese)CNNに基づく特徴抽出器、 2) 変化関連特徴を特定するための階層的自己注意ブロックと、画像埋め込みを生成するための残差ブロックを含む注意的復号器、 3) 画像埋め込みと単語埋め込みの関係を記述へとデコードするTransformerベースのキャプション生成器。提案するChg2Capネットワークを2つの代表的なリモートセンシングデータセットで評価し,総合的な実験分析を行った。コードと事前訓練されたモデルは https://github.com/ShizhenChang/Chg2Cap でオンライン公開される予定である。 In recent years, advanced research has focused on the direct learning and analysis of remote sensing images using natural language processing (NLP) techniques. The ability to accurately describe changes occurring in multi-temporal remote sensing images is becoming increasingly important for geospatial understanding and land planning. Unlike natural image change captioning tasks, remote sensing change captioning aims to capture the most significant changes, irrespective of various influential factors such as illumination, seasonal effects, and complex land covers. In this study, we highlight the significance of accurately describing changes in remote sensing images and present a comparison of the change captioning task for natural and synthetic images and remote sensing images. To address the challenge of generating accurate captions, we propose an attentive changes-to-captions network, called Chg2Cap for short, for bi-temporal remote sensing images. 
The network comprises three main components: 1) a Siamese CNN-based feature extractor to collect high-level representations for each image pair; 2) an attentive decoder that includes a hierarchical self-attention block to locate change-related features and a residual block to generate the image embedding; and 3) a transformer-based caption generator to decode the relationship between the image embedding and the word embedding into a description. The proposed Chg2Cap network is evaluated on two representative remote sensing datasets, and a comprehensive experimental analysis is provided. The code and pre-trained models will be available online at https://github.com/ShizhenChang/Chg2Cap.	翻訳日:2023-04-04 14:33:25 公開日:2023-04-03
# 非可積分非負超マーチンガールに対する拡張ヴィルの不等式 The extended Ville's inequality for nonintegrable nonnegative supermartingales ( http://arxiv.org/abs/2304.01163v1 ) ライセンス: Link先を確認	Hongjian Wang and Aaditya Ramdas	(参考訳) ロビンズの最初の研究に続いて、積分性も有限性も必要とせず、非負超行列の延長理論を厳格に提示する。特に、ロビンズによって予見された重要な極大不等式が導出され、これは拡張ヴィルの不等式と呼ばれ、古典ヴィルの不等式(可積分な非負のスーパーマーチンガール)を強化し、また我々の非可積分な設定にも適用される。我々は混合法の拡張を導出し、拡張された非負超行列の$\sigma$-finite混合に適用する。非パラメトリックな信頼シーケンスの導出における不適切な混合(プライアー)や(拡張された)e-プロセスの使用など、シーケンシャルな統計に対する我々の理論のいくつかの意味を示す。 Following initial work by Robbins, we rigorously present an extended theory of nonnegative supermartingales, requiring neither integrability nor finiteness. In particular, we derive a key maximal inequality foreshadowed by Robbins, which we call the extended Ville's inequality, that strengthens the classical Ville's inequality (for integrable nonnegative supermartingales), and also applies to our nonintegrable setting. We derive an extension of the method of mixtures, which applies to $\sigma$-finite mixtures of our extended nonnegative supermartingales. We present some implications of our theory for sequential statistics, such as the use of improper mixtures (priors) in deriving nonparametric confidence sequences and (extended) e-processes.	翻訳日:2023-04-04 14:26:57 公開日:2023-04-03
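For reference, the classical inequality that the abstract above says is strengthened can be written as follows. This is the standard textbook statement; the symbols $M_t$ and $a$ are our notation, not taken from the abstract, and the extended version for nonintegrable processes is not reproduced here.

```latex
% Classical Ville's inequality: (M_t)_{t \ge 0} is a nonnegative
% supermartingale with \mathbb{E}[M_0] < \infty.
\[
  \mathbb{P}\Bigl( \sup_{t \ge 0} M_t \ge a \Bigr)
  \;\le\; \frac{\mathbb{E}[M_0]}{a},
  \qquad a > 0 .
\]
```

The bound is only informative when $\mathbb{E}[M_0]$ is finite, which is the integrability requirement the paper sets out to remove.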
# 確率的ミラー降下は敵の遅延攻撃に弱いか? 交通割り当てのレジリエンスに関する研究 Is Stochastic Mirror Descent Vulnerable to Adversarial Delay Attacks? A Traffic Assignment Resilience Study ( http://arxiv.org/abs/2304.01161v1 ) ライセンス: Link先を確認	Yunian Pan, Tao Li, and Quanyan Zhu	(参考訳) Intelligent Navigation Systems (INS) は、データ収集プロセス中に、INSとトランスポートネットワークの間の通信チャネルをインターセプトする情報攻撃ベクトルの増加に晒される。INSの弾力性を測定するために、ウォードロップ非平衡解 (Wardrop Non-Equilibrium Solution, WANES) という概念を用いる。集中度引数を用いることで、任意の有界なフィードバック遅延攻撃は、遅延ミラー降下(DMD)オンライン学習フレームワーク内のトラフィックフローの軌跡に沿って、$\tilde{\mathcal{O}}(\sqrt{d^3 T^{-1}})$ のオーダーまで体系的なパフォーマンスを低下させるだけであることが判明した。この性能低下は軽度の仮定だけで起こり得る。以上の結果から,学習ベースのINSインフラストラクチャは,情報構造に一定期間の混乱が生じても,ウォードロップ非平衡を実現できることが示唆された。これらの発見は、輸送エコシステムの異なる層にまたがる妨害攻撃に対する防御メカニズムを設計するための貴重な洞察を提供する。 Intelligent Navigation Systems (INS) are exposed to an increasing number of informational attack vectors, which often intercept through the communication channels between the INS and the transportation network during the data collecting process. To measure the resilience of INS, we use the concept of a Wardrop Non-Equilibrium Solution (WANES), which is characterized by the probabilistic outcome of learning within a bounded number of interactions. By using concentration arguments, we have discovered that any bounded feedback delaying attack only degrades the systematic performance up to order $\tilde{\mathcal{O}}(\sqrt{d^3 T^{-1}})$ along the traffic flow trajectory within the Delayed Mirror Descent (DMD) online-learning framework. This degradation in performance can occur with only mild assumptions imposed. Our result implies that learning-based INS infrastructures can achieve Wardrop Non-equilibrium even when experiencing a certain period of disruption in the information structure. These findings provide valuable insights for designing defense mechanisms against possible jamming attacks across different layers of the transportation ecosystem.	翻訳日:2023-04-04 14:26:39 公開日:2023-04-03
# DribbleBot: 野生での動的レッグ操作 DribbleBot: Dynamic Legged Manipulation in the Wild ( http://arxiv.org/abs/2304.01159v1 ) ライセンス: Link先を確認	Yandong Ji, Gabriel B. Margolis, Pulkit Agrawal	(参考訳) ドリブルボット(DribbleBot、Dexterous Ball Manipulation with a Legged Robot)は、人間と同じ現実の条件下でサッカーボールをドリブルできるロボットシステムである。我々は,強化学習を用いたシミュレーションにおけるトレーニング政策のパラダイムを採用し,それらを現実世界に移す。異なる地形における変動球運動の計算における批判的課題を克服し,オンボード・コンピューティングの制約下でボディマウントカメラを用いて球を知覚する。以上の結果から,現在の四足歩行プラットフォームは,同時移動と感覚観察から直接操作を含む動的全身制御問題の研究に適していることを示す。 DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged robotic system that can dribble a soccer ball under the same real-world conditions as humans (i.e., in-the-wild). We adopt the paradigm of training policies in simulation using reinforcement learning and transferring them into the real world. We overcome critical challenges of accounting for variable ball motion dynamics on different terrains and perceiving the ball using body-mounted cameras under the constraints of onboard computing. Our results provide evidence that current quadruped platforms are well-suited for studying dynamic whole-body control problems involving simultaneous locomotion and manipulation directly from sensory observations.	翻訳日:2023-04-04 14:26:19 公開日:2023-04-03
# 測定の超循環系 Hypercyclic systems of measurements ( http://arxiv.org/abs/2304.01155v1 ) ライセンス: Link先を確認	Victor H. Cervantes and Ehtibar N. Dzhafarov	(参考訳) 循環系は量子力学の基礎、特に文脈分析において主要な役割を担ってきた。現在までには、文脈性とその性質の異なる尺度を含む、外乱のない循環系に関する本質的に完全な理論が存在する。本稿では, 循環システムのクラスを一般化し, 構造的特性のいくつかを保存した新しい種類の測定システムを紹介する。これらの超循環系の理論的および実験的解析は、文脈性の理論の発展に有用であることが示唆される。 Cyclic systems have played a dominant role in the foundations of quantum mechanics, especially in contextuality analysis. By now we have an essentially complete theory of the cyclic systems, both without and with disturbance, including different measures of contextuality and their properties. In this concept paper we introduce a new class of systems of measurements, one that generalizes the class of cyclic systems while preserving some of their structural characteristics. We suggest that theoretical and experimental analysis of these hypercyclic systems may prove to be beneficial in developing theories of contextuality.	翻訳日:2023-04-04 14:26:07 公開日:2023-04-03
# 空間ネットワークのための代数的および幾何学的モデル Algebraic and Geometric Models for Space Networking ( http://arxiv.org/abs/2304.01150v1 ) ライセンス: Link先を確認	William Bernardoni, Robert Cardona, Jacob Cleveland, Justin Curry, Robert Green, Brian Heller, Alan Hylton, Tung Lam, Robert Kassouf-Short	(参考訳) 本稿では,ネットワーク空間通信における代数的および幾何学的視点を紹介する。我々の主な貢献は、実数直線 P(R) の部分集合の値を持つ行列の項で定義される時間変化グラフ(TVG)の新たな定義である。我々は、P(R) の半環特性を利用して、行列乗算と切り離されたクリーネ星を用いたテレビGにおけるマルチホップ通信をモデル化する。これにより、無作為に選択されたSTARLINK衛星の大規模なサンプルに対して、ライフタイムカーブと呼ばれるTVGの通信能力に関する新たな統計が生み出される。トポロジカルデータ解析(TDA)にインスパイアされた新しい指標を用いて,STARLINKの大規模サブサンプルが時間的に強く連結されている場合の判定を行う。地球と火星の間のネットワークシナリオをより良くモデル化するために,伝播遅延をモデル化できる様々なセミリングと,保存・フォワードなどの遅延耐性ネットワーク(DTN)に共通するプロトコルを導入する。最後に,異なる宇宙ネットワークの実現に向けたzigzagの持続性の適用可能性を示し,k-nearest neighbors (knn) 分類による時変トポロジーのみを用いた地球・月衛星の識別の有効性を示す。 In this paper we introduce some new algebraic and geometric perspectives on networked space communications. Our main contribution is a novel definition of a time-varying graph (TVG), defined in terms of a matrix with values in subsets of the real line P(R). We leverage semi-ring properties of P(R) to model multi-hop communication in a TVG using matrix multiplication and a truncated Kleene star. This leads to novel statistics on the communication capacity of TVGs called lifetime curves, which we generate for large samples of randomly chosen STARLINK satellites, whose connectivity is modeled over day-long simulations. Determining when a large subsample of STARLINK is temporally strongly connected is further analyzed using novel metrics introduced here that are inspired by topological data analysis (TDA). To better model networking scenarios between the Earth and Mars, we introduce various semi-rings capable of modeling propagation delay as well as protocols common to Delay Tolerant Networking (DTN), such as store-and-forward. 
Finally, we illustrate the applicability of zigzag persistence for featurizing different space networks and demonstrate the efficacy of K-Nearest Neighbors (KNN) classification for distinguishing Earth-Mars and Earth-Moon satellite systems using time-varying topology alone.	翻訳日:2023-04-04 14:25:59 公開日:2023-04-03
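The matrix view of a time-varying graph described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: time is discretized into integer steps, each matrix entry is the set of steps at which a link is up, "addition" is set union and "multiplication" is set intersection. The names `ALL_TIMES`, `contacts` and `truncated_star` are hypothetical. The sketch only captures paths whose hops are all available at the same instant; it does not model propagation delay or store-and-forward, which the paper handles with other semi-rings.

```python
from itertools import product

ALL_TIMES = frozenset(range(10))  # discretized observation window (assumption)
EMPTY = frozenset()


def mat_add(a, b):
    """Entrywise union: a link or path exists if it exists in either matrix."""
    n = len(a)
    return [[a[i][j] | b[i][j] for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Semiring product: union over k of (times i->k is up) & (times k->j is up)."""
    n = len(a)
    out = [[EMPTY] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        times = EMPTY
        for k in range(n):
            times |= a[i][k] & b[k][j]
        out[i][j] = times
    return out


def identity(n):
    """Multiplicative identity: every node reaches itself at all times."""
    return [[ALL_TIMES if i == j else EMPTY for j in range(n)] for i in range(n)]


def truncated_star(a, max_hops):
    """Truncated Kleene star: I + A + A^2 + ... + A^max_hops."""
    n = len(a)
    reach = identity(n)
    power = identity(n)
    for _ in range(max_hops):
        power = mat_mul(power, a)
        reach = mat_add(reach, power)
    return reach


# Three nodes: 0 -> 1 is up during steps 0..4, 1 -> 2 during steps 3..7.
contacts = [
    [EMPTY, frozenset(range(0, 5)), EMPTY],
    [EMPTY, EMPTY, frozenset(range(3, 8))],
    [EMPTY, EMPTY, EMPTY],
]
reach = truncated_star(contacts, max_hops=2)
print(sorted(reach[0][2]))  # steps at which node 0 can reach node 2 within two hops
```

In this toy example node 0 reaches node 2 only during the overlap of the two contact windows, and the reverse direction stays empty because the contact matrix is not symmetric.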
# 頭を使う: 長距離ビデオ認識を改良 Use Your Head: Improving Long-Tail Video Recognition ( http://arxiv.org/abs/2304.01143v1 ) ライセンス: Link先を確認	Toby Perrett, Saptarshi Sinha, Tilo Burghardt, Majid Mirmehdi, Dima Damen	(参考訳) 本稿では,ロングテールビデオ認識について検討する。自然に収集されたビデオデータセットや既存のロングテール画像ベンチマークとは異なり、現在のビデオベンチマークは複数のロングテールプロパティで不足している。一番重要なのは、尻尾にショットのクラスがほとんどないことです。そこで本研究では,ssv2とvideoltの2つのデータセットからサブセットをサンプリングすることで,ロングテール認識を評価する新しいビデオベンチマークを提案する。そこで本研究では,ヘッドクラスからサンプルを重み付けした組み合わせとして再構成することで,少数クラスからのインスタンスへの過度適合を低減できるLong-Tail Mixed Reconstructionを提案する。 lmrはラベル混合を用いてロバストな決定境界を学習する。 EPIC-KITCHENS と提案した SSv2-LT と VideoLT-LT で最先端の平均クラス精度を実現する。ベンチマークとコード: tobyperrett.github.io/lmr This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sampling subsets from two datasets: SSv2 and VideoLT. We then propose a method, Long-Tail Mixed Reconstruction, which reduces overfitting to instances from few-shot classes by reconstructing them as weighted combinations of samples from head classes. LMR then employs label mixing to learn robust decision boundaries. It achieves state-of-the-art average class accuracy on EPIC-KITCHENS and the proposed SSv2-LT and VideoLT-LT. Benchmarks and code at: tobyperrett.github.io/lmr	翻訳日:2023-04-04 14:25:34 公開日:2023-04-03
# 2つの非エルゴード可逆セルオートマトン、1つは古典的、もう1つは量子的 On two non-ergodic reversible cellular automata, one classical, the other quantum ( http://arxiv.org/abs/2304.01130v1 ) ライセンス: Link先を確認	Tomaz Prosen	(参考訳) 本稿では, 1+1次元のセルオートマトンと, さらなる研究や応用を保証できる, 単純で興味深い性質を持つ2種類の運動粒子モデルを提案し, 議論する。最初のモデルは2種類の準粒子を記述した決定論的で可逆的なオートマトンであり、安定な質量を持たない物質粒子は速度$\pm 1$で動き、不安定で立ち上がり(速度ゼロ)の磁場粒子である。モデルの3つの保存電荷に対する2つの異なる連続性方程式について議論する。最初の2つの電荷と対応する電流は、3つの(3)格子サイトをサポートし、保存されたエネルギー-運動量テンソルの格子類似体を表すが、9つの(9)サイトをサポートする追加の保存電荷と電流を見つける。 2つ目のモデルは、最近導入され研究された荷電ハードポイント格子気体の量子(または確率的)変形を表しており、異なる二元電荷(\pm 1$)と二元速度(\pm 1$)の粒子は弾性衝突散乱によって非自明に混合することができる。このモデルのユニタリ進化則は、ヤン・バクスター方程式を完全に満たさないが、局所保存作用素の無限集合、いわゆるグライダー作用素を産む興味深い関連する同一性を満たす。 We propose and discuss two variants of kinetic particle models - cellular automata in 1+1 dimensions, which have some appeal due to their simplicity and intriguing properties which could warrant further research and applications. The first model is a deterministic and reversible automaton describing two species of quasiparticles: stable massless matter particles moving with velocity $\pm 1$ and unstable, standing (zero velocity) field particles. We discuss two distinct continuity equations for three conserved charges of the model. While the first two charges and the corresponding currents have support three (3) lattice sites and represent a lattice analogue of conserved energy-momentum tensor, we find an additional conserved charge and current with support of nine (9) sites, implying non-ergodic behaviour and potentially signalling integrability of the model with a highly nested R-matrix structure. The second model represents a quantum (or stochastic) deformation of a recently introduced and studied charged hardpoint lattice gas, where particles of different binary charge ($\pm 1$) and binary velocity ($\pm 1$) can nontrivially mix upon elastic collisional scattering. 
We show that while the unitary evolution rule of this model does not satisfy the full Yang-Baxter equation, it still satisfies an intriguing related identity which gives birth to an infinite set of local conserved operators, the so-called glider operators.	翻訳日:2023-04-04 14:25:21 公開日:2023-04-03
# 定量的表現と高次元分布距離を用いた合成パラメータ効果検出 Synthesis parameter effect detection using quantitative representations and high dimensional distribution distances ( http://arxiv.org/abs/2304.01120v1 ) ライセンス: Link先を確認	Alex Hagen, Shane Jackson	(参考訳) 合成過程のパラメータが材料の微細構造に及ぼす影響の検出は、材料科学の重要で、しかし、理解に足らない目標である。我々は,Pu(III)オキサレートから酸化プルトニウムを合成する設計実験を解析するために,コプラ理論,高次元分布距離,置換統計に基づく効果の検出法を開発した。結果の酸化プルトニウムの微細構造に及ぼすストライクオーダーとシュウ酸フィードの影響を,文献とよく一致させた。また, 酸性濃度, ストライクオーダー, 降水温度の2つのペア間の過剰な二変量効果も検出した。 Detection of effects of the parameters of the synthetic process on the microstructure of materials is an important, yet elusive goal of materials science. We develop a method for detecting effects based on copula theory, high dimensional distribution distances, and permutational statistics to analyze a designed experiment synthesizing plutonium oxide from Pu(III) Oxalate. We detect effects of strike order and oxalic acid feed on the microstructure of the resulting plutonium oxide, which match the literature well. We also detect excess bivariate effects between the pairs of acid concentration, strike order and precipitation temperature.	翻訳日:2023-04-04 14:24:51 公開日:2023-04-03
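The combination of a distribution distance with permutation statistics mentioned in the abstract above can be sketched as a two-sample permutation test. The assumptions are ours: the energy distance stands in for whichever high-dimensional distance the authors use, the data are synthetic Gaussian samples, and the copula-based representation step is omitted entirely.

```python
import math
import random


def mean_pairwise_distance(a, b):
    """Average Euclidean distance between every point of a and every point of b."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """Energy distance (V-statistic form) between two samples of points."""
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """P-value for 'x and y share a distribution' by shuffling group labels."""
    rng = random.Random(seed)
    observed = energy_distance(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_distance(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Synthetic stand-ins for microstructure features under two synthesis settings.
rng = random.Random(1)
baseline = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(20)]
shifted = [tuple(rng.gauss(3.0, 1.0) for _ in range(3)) for _ in range(20)]
p_value = permutation_pvalue(baseline, shifted)
print(p_value)
```

A small p-value indicates that the synthesis setting separating the two groups has a detectable effect on the feature distribution; with real data the features would come from the quantitative microstructure representation rather than from a random generator.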
# データサイエンスのための解釈可能なシンボリック回帰:2022年競争の分析 Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition ( http://arxiv.org/abs/2304.01117v1 ) ライセンス: Link先を確認	F. O. de Franca, M. Virgolin, M. Kommenda, M. S. Majumder, M. Cranmer, G. Espada, L. Ingelse, A. Fonseca, M. Landajuela, B. Petersen, R. Glatt, N. Mundhenk, C. S. Lee, J. D. Hochhalter, D. L. Randall, P. Kamienny, H. Zhang, G. Dick, A. Simon, B. Burlacu, Jaan Kasak, Meera Machado, Casper Wilstrup, W. G. La Cava	(参考訳) 現象を正確に記述した解析式に対する記号回帰探索このアプローチの主な魅力は、ユーザにとって洞察力のある解釈可能なモデルを返すことだ。歴史的に、記号回帰のアルゴリズムの大半は進化的アルゴリズムに基づいている。しかし、最近、列挙アルゴリズム、混合線形整数プログラミング、ニューラルネットワーク、ベイズ最適化のようなアプローチを利用する新しい提案が急増している。これらの新しいアプローチが現実世界のデータでしばしば直面する共通の課題に対してどのように振る舞うかを評価するために、私たちは2022年の遺伝的および進化的計算会議でコンペティションを開催しました。実世界のトラックでは,ドメインエキスパートを用いて,候補モデルの信頼性を判断し,現実的に解釈可能性を評価する。このコンペで得られた結果の詳細な分析を行い,シンボル回帰アルゴリズムの課題について議論し,今後の競争改善の可能性を明らかにする。 Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, the majority of algorithms for symbolic regression have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead utilize approaches such as enumeration algorithms, mixed linear integer programming, neural networks, and Bayesian optimization. In order to assess how well these new approaches behave on a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets which were blind to entrants. 
For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms and highlight possible improvements for future competitions.	翻訳日:2023-04-04 14:24:41 公開日:2023-04-03
# ReMoDiffuse:Retrieval-Augmented Motion Diffusion Model ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model ( http://arxiv.org/abs/2304.01116v1 ) ライセンス: Link先を確認	Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu	(参考訳) 3Dモーション生成はクリエイティブ産業にとって不可欠だ。最近の進歩は、テキスト駆動モーション生成のためのドメイン知識を持つ生成モデルに依存している。しかし、より多様な動きでの演奏は満足できないままである。本研究では,検索機構を統合した拡散モデルに基づく動き生成フレームワーク remodiffuse を提案する。 ReMoDiffuseは3つの重要な設計でテキスト駆動モーション生成の一般化性と多様性を高める 1) ハイブリッド検索は, 意味的およびキネマティックな類似性の観点から, データベースから適切な参照を求める。 2)Semantic-Modulated Transformerは検索知識を選択的に吸収し,検索したサンプルと対象の動作シーケンスの差に適応する。 3) 条件混合は, 推論中に検索データベースをより活用し, 分類器フリーガイダンスの尺度感度を克服する。広範な実験により、remodiffuseは、特により多様なモーション生成のために、テキスト・モーションの一貫性と動作品質の両方をバランスさせることにより、最先端の手法よりも優れていることが示されている。 3D human motion generation is crucial for creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, the performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. ReMoDiffuse enhances the generalizability and diversity of text-driven motion generation with three key designs: 1) Hybrid Retrieval finds appropriate references from the database in terms of both semantic and kinematic similarities. 2) Semantic-Modulated Transformer selectively absorbs retrieval knowledge, adapting to the difference between retrieved samples and the target motion sequence. 3) Condition Mixture better utilizes the retrieval database during inference, overcoming the scale sensitivity in classifier-free guidance. Extensive experiments demonstrate that ReMoDiffuse outperforms state-of-the-art methods by balancing both text-motion consistency and motion quality, especially for more diverse motion generation.	翻訳日:2023-04-04 14:24:24 公開日:2023-04-03
# 量子インスツルメンテーション制御キットシステムを用いた絡み合った光子対源デモンストレータ Entangled Photon Pair Source Demonstrator using the Quantum Instrumentation Control Kit System ( http://arxiv.org/abs/2304.01190v1 ) ライセンス: Link先を確認	Si Xie, Leandro Stefanazzi, Christina Wang, Cristian Pena, Raju Valivarthi, Lautaro Narvaez, Gustavo Cancelo, Keshav Kapoor, Boris Korzh, Matthew Shaw, Panagiotis Spentzouris, Maria Spiropulu	(参考訳) 本稿では,RFSoCFPGA技術を用いた量子計測制御キット(QICK)システムによる光子対の絡み合った光源の駆動と光子信号の検出について報告する。 QICKシステムでは、一致事故率150を超え、エンタングルメントの可視性は95%を超え、従来の波形生成器を用いた性能測定値と一致している。また,QICKのディジタル化関数を用いた同時検出読み出しを行い,内部の同期時間3.2 psを実現した。本稿では,量子ネットワークの動作において,商用波形生成器とタイムタガーをrfsoc-fpga技術で置き換える実現可能性を明確に示し,コストを1桁以上削減することを示す。 We report the first demonstration of using the Quantum Instrumentation and Control Kit (QICK) system on RFSoCFPGA technology to drive an entangled photon pair source and to detect the photon signals. With the QICK system, we achieve high levels of performance metrics including coincidence-to-accidental ratio exceeding 150, and entanglement visibility exceeding 95%, consistent with performance metrics achieved using conventional waveform generators. We also demonstrate simultaneous detector readout using the digitization functional of QICK, achieving internal system synchronization time resolution of 3.2 ps. The work reported in this paper represents an explicit demonstration of the feasibility for replacing commercial waveform generators and time taggers with RFSoC-FPGA technology in the operation of a quantum network, representing a cost reduction of more than an order of magnitude.	翻訳日:2023-04-04 14:17:16 公開日:2023-04-03
# Poseをフォローする: Pose-Guided Text-to-Video Generation by Pose-free Videos Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos ( http://arxiv.org/abs/2304.01186v1 ) ライセンス: Link先を確認	Yue Ma, Yingqing He, Xiaodong Cun, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen	(参考訳) テキスト編集可能でポーズ制御可能なキャラクタビデオの生成は、さまざまなデジタル人間を作成する上で不必要に要求される。それでも、このタスクは、ペア化されたビデオの字幕と、ビデオの生成前のモデルを含む包括的なデータセットが存在しないことで制限されている。本研究では,手軽に得られるデータセット(画像ポーズペアとポーズフリービデオ)と事前学習されたテキスト・ツー・イメージモデル(t2i)を活用し,ポーズ制御可能なキャラクタビデオを得ることのできる,新たな2段階学習方式を提案する。具体的には、第1段階では、キーポイントと画像のペアのみが制御可能なテキストと画像の生成にのみ使用される。我々はポーズ情報をエンコードするゼロイニシャライズ畳み込みエンコーダを学習する。第2段階では,学習可能な時間的自己着脱ブロックと再構成されたクロスフレーム自己着脱ブロックを付加することにより,ポーズフリービデオデータセットを介して,上記ネットワークの動作を微調整する。本手法は,新たな設計により,事前学習したt2iモデルの編集と概念合成能力を維持しつつ,連続的なポーズ制御可能なキャラクタビデオの生成に成功している。コードとモデルは公開される予定だ。 Generating text-editable and pose-controllable character videos is in strong demand for creating various digital humans. Nevertheless, this task has been restricted by the absence of a comprehensive dataset featuring paired video-pose captions and of generative prior models for videos. In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image-pose pairs and pose-free videos) and the pre-trained text-to-image (T2I) model to obtain pose-controllable character videos. Specifically, in the first stage, only the keypoint-image pairs are used, for controllable text-to-image generation. We learn a zero-initialized convolutional encoder to encode the pose information. In the second stage, we finetune the motion of the above network via a pose-free video dataset by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. Powered by our new designs, our method successfully generates continuously pose-controllable character videos while keeping the editing and concept composition ability of the pre-trained T2I model. The code and models will be made publicly available.	翻訳日:2023-04-04 14:17:00 公開日:2023-04-03
# weaktr: 弱教師付き意味セグメンテーションのためのプレーンビジョントランスフォーマの検討 WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation ( http://arxiv.org/abs/2304.01184v1 ) ライセンス: Link先を確認	Lianghui Zhu, Yingyue Li, Jieming Fang, Yan Liu, Hao Xin, Wenyu Liu, Xinggang Wang	(参考訳) 本稿では,Wakly-supervised Semantic Segmentation (WSSS) のためのプレーンビジョン変換器 (ViT) の特性について検討する。クラスアクティベーションマップ(CAM)は、分類ネットワークを理解してWSSSを起動する上で非常に重要である。我々は、ViTの異なるアテンションヘッドが異なる画像領域に焦点を当てていることを観察する。そこで, より完全な対象を持つ傾向のある高品質CAM結果に対して, 自己注意マップを適応的に融合させながら, 注目ヘッドの重要性をエンドツーエンドで推定する手法を提案する。さらに,CAMの結果をオンラインリトレーニングしてWSSSタスクを完了するためのViTベースの勾配クリッピングデコーダを提案する。我々はこの平易なTransformerベースのWeakly教師付き学習フレームワークをWeakTrと名付けた。標準的なベンチマークでは、PASCAL VOC 2012のvalセットでは78.4% mIoU、COCO 2014のvalセットでは50.3% mIoUである。コードはhttps://github.com/hustvl/WeakTr.comで入手できる。 This paper explores the properties of the plain Vision Transformer (ViT) for Weakly-supervised Semantic Segmentation (WSSS). The class activation map (CAM) is of critical importance for understanding a classification network and launching WSSS. We observe that different attention heads of ViT focus on different image areas. Thus a novel weight-based method is proposed to end-to-end estimate the importance of attention heads, while the self-attention maps are adaptively fused for high-quality CAM results that tend to have more complete objects. Besides, we propose a ViT-based gradient clipping decoder for online retraining with the CAM results to complete the WSSS task. We name this plain Transformer-based Weakly-supervised learning framework WeakTr. It achieves the state-of-the-art WSSS performance on standard benchmarks, i.e., 78.4% mIoU on the val set of PASCAL VOC 2012 and 50.3% mIoU on the val set of COCO 2014. Code is available at https://github.com/hustvl/WeakTr.	翻訳日:2023-04-04 14:16:38 公開日:2023-04-03
# schr\"odinger方程式の非線形拡大の完全可解モデル Exactly solvable models of nonlinear extensions of the Schr\"odinger equation ( http://arxiv.org/abs/2304.01183v1 ) ライセンス: Link先を確認	Tom Dodge and Peter Schweitzer	(参考訳) schr\"odinger方程式の完全可解な非線形拡張を構成する方法を提案する。この方法は、正確に解ける通常のシュリンガー方程式と、正確に解ける非線形理論の間の一定の条件下で確立できる対応を探索する。本手法の具体例をいくつか紹介する。我々はよく知られたソリトン解を再定義し、様々な空間次元において解くことができる新しい非線形理論を見つける。この手法は、より非線形な理論を構築し、相対論的ソリトン理論に一般化することができ、多くの応用が期待できる。 A method is presented to construct exactly solvable nonlinear extensions of the Schr\"odinger equation. The method explores a correspondence which can be established under certain conditions between exactly solvable ordinary Schr\"odinger equations and exactly solvable nonlinear theories. We provide several examples illustrating the method. We rederive well-known soliton solutions and find new exactly solvable nonlinear theories in various space dimensions which, to the best of our knowledge, have not yet been discussed in literature. Our method can be used to construct further nonlinear theories and generalized to relativistic soliton theories, and may have many applications.	翻訳日:2023-04-04 14:16:20 公開日:2023-04-03
# 双極子対称性破壊からの非フェルミ液体 Non-Fermi Liquids from Dipolar Symmetry Breaking ( http://arxiv.org/abs/2304.01181v1 ) ライセンス: Link先を確認	Amogh Anakru, Zhen Bi	(参考訳) フラクトロニック位相の出現と量子力学の新しい普遍性クラスは、凝縮系における双極子対称性の重要性を強調している。本研究では,種々の空間次元のフェルミオンモデルにおける双極子対称性の対称性破断相の性質について検討する。このような系では、フェルミオンは双極子凝縮によってエネルギー分散を得る。変換対称性と双極子対称性の間の非自明な可換性のため、二極子縮合の金石モードは分散フェルミオンに強く結合し、自然に低エネルギーで非フェルミ液体を生じさせる。双極子対称性の破れ相のIR記述は、創発的U(1)ゲージ場と結合するフェルミ曲面のよく知られた理論に類似している。また,双極子対称性がわずかに破れた場合の交叉挙動と異方性双極子保存の場合についても論じる。 The emergence of fractonic topological phases and novel universality classes for quantum dynamics highlights the importance of dipolar symmetry in condensed matter systems. In this work, we study the properties of symmetry-breaking phases of the dipolar symmetries in fermionic models in various spatial dimensions. In such systems, fermions obtain energy dispersion through dipole condensation. Due to the nontrivial commutation between the translation symmetry and dipolar symmetry, the Goldstone modes of the dipolar condensate are strongly coupled to the dispersive fermions and naturally give rise to non-Fermi liquids at low energies. The IR description of the dipolar symmetry-breaking phase is analogous to the well-known theory of a Fermi surface coupled to an emergent U(1) gauge field. We also discuss the crossover behavior when the dipolar symmetry is slightly broken and the cases with anisotropic dipolar conservation.	翻訳日:2023-04-04 14:16:10 公開日:2023-04-03
# BERTを用いたパーラーにおけるヘイトスピーチターゲット検出 Hate Speech Targets Detection in Parler using BERT ( http://arxiv.org/abs/2304.01179v1 ) ライセンス: Link先を確認	Nadav Schneider, Shimon Shouei, Saleem Ghantous, Elad Feldman	(参考訳) オンラインソーシャルネットワークは、私たちの日常生活の基本的な構成要素となっている。残念ながら、これらのプラットフォームはヘイトスピーチの舞台でもある。人気ソーシャルネットワークはヘイトスピーチに対する規則を定めている。その結果、ParlerやGabのようなソーシャルネットワークは、無料の音声プラットフォームを提唱し、主張している。これらのプラットフォームは、様々なターゲットに対するヘイトスピーチの地区となっている。本稿では、ヘイトスピーチとそのターゲットを検知し、パラーヘイトターゲットの分布を作成するためのパイプラインを提案する。パイプラインは2つのモデルで構成されており、1つはヘイトスピーチ検出用、もう1つはターゲット分類用である。この作業で使用されるソースコードと他の関連するソースは、https://github.com/NadavSc/HateRecognition.gitで公開されている。 Online social networks have become a fundamental component of our everyday life. Unfortunately, these platforms are also a stage for hate speech. Popular social networks have regularized rules against hate speech. Consequently, social networks like Parler and Gab advocating and claiming to be free speech platforms have evolved. These platforms have become a district for hate speech against diverse targets. We present in our paper a pipeline for detecting hate speech and its targets and use it for creating Parler hate targets' distribution. The pipeline consists of two models; one for hate speech detection and the second for target classification, both based on BERT with Back-Translation and data pre-processing for improved results. The source code used in this work, as well as other relevant sources, are available at: https://github.com/NadavSc/HateRecognition.git	翻訳日:2023-04-04 14:15:55 公開日:2023-04-03
# エンタングルメントスペクトル平坦性による非安定化性の定量化 Quantifying non-stabilizerness through entanglement spectrum flatness ( http://arxiv.org/abs/2304.01175v1 ) ライセンス: Link先を確認	Emanuele Tirrito, Poetri Sonya Tarabunga, Gugliemo Lami, Titas Chanda, Lorenzo Leone, Salvatore F.E. Oliviero, Marcello Dalmonte, Mario Collura, and Alioscia Hamma	(参考訳) 非安定化性(non-stabilizerness)は、量子コンピューティングにおいて有利なリソースであり、非クリフォード演算へのアクセスにある。非安定性がどのように量子化され、他の量子資源とどのように関連しているかを包括的に理解することは、量子複雑性の起源の研究と特徴付けに不可欠である。本研究では、純量子状態に対する非安定度と絡み合いスペクトルの平坦度との直接接続を確立する。この接続を利用して、ノイズがあっても非安定化剤の効率よく探索できることを示す。以上の結果から,非安定化性と絡み合い応答の直接関係を明らかにし,コールドアトムおよび固体プラットフォームにおける非安定化性を調べるための明快な実験プロトコルを定義した。 Non-stabilizerness - also colloquially referred to as magic - is a resource for advantage in quantum computing and lies in the access to non-Clifford operations. Developing a comprehensive understanding of how non-stabilizerness can be quantified and how it relates to other quantum resources is crucial for studying and characterizing the origin of quantum complexity. In this work, we establish a direct connection between non-stabilizerness and entanglement spectrum flatness for a pure quantum state. We show that this connection can be exploited to efficiently probe non-stabilizerness even in the presence of noise. Our results reveal a direct connection between non-stabilizerness and entanglement response, and define a clear experimental protocol to probe non-stabilizerness in cold atom and solid-state platforms.	翻訳日:2023-04-04 14:15:42 公開日:2023-04-03
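The abstract above does not define "entanglement spectrum flatness". As a minimal sketch, assuming the quantity is Tr(rho^3) - Tr(rho^2)^2 evaluated on the eigenvalues of a reduced density matrix (an assumption, not taken from the abstract), the following shows that it vanishes exactly when all nonzero eigenvalues are equal. The function name `flatness` and the example spectra are hypothetical, and the sketch does not reproduce the paper's link to non-stabilizerness.

```python
def flatness(spectrum):
    """Return Tr(rho^3) - Tr(rho^2)^2 for the eigenvalues of a density matrix.

    Assumed definition (not stated in the abstract): the quantity is zero
    when all nonzero eigenvalues are equal, i.e. the spectrum is flat.
    """
    if any(p < 0 for p in spectrum) or abs(sum(spectrum) - 1.0) > 1e-9:
        raise ValueError("spectrum must be non-negative and sum to 1")
    purity = sum(p ** 2 for p in spectrum)       # Tr(rho^2)
    third_moment = sum(p ** 3 for p in spectrum)  # Tr(rho^3)
    return third_moment - purity ** 2


# Hypothetical example spectra: one flat, one skewed.
flat_spectrum = [0.25, 0.25, 0.25, 0.25]
skewed_spectrum = [0.7, 0.1, 0.1, 0.1]

print(flatness(flat_spectrum))    # vanishes for a flat spectrum
print(flatness(skewed_spectrum))  # strictly positive for a non-flat spectrum
```

The quantity is the variance of the eigenvalues under the distribution they define themselves, which is why it is non-negative and zero only for a flat spectrum.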
# 3次元認識画像生成のための生成多面ニューラルラミアンス Generative Multiplane Neural Radiance for 3D-Aware Image Generation ( http://arxiv.org/abs/2304.01172v1 ) ライセンス: Link先を確認	Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan	(参考訳) 本稿では,複数のターゲットビューに対して連続した3次元高解像度画像を効率よく生成する手法を提案する。 GMNRと呼ばれる提案された多面体ニューラルラジアンスモデルは、ビュー依存情報を学習するための新しいビュー依存表現({\alpha}-VdR)モジュールから構成される。 {\alpha}-VdR モジュールは {\alpha} で誘導されたピクセルサンプリング技術により実現され、ビュー方向と位置係数を学習することで、ビュー依存表現を効率的に計算する。さらに、複数のビューにまたがって光度類似性を強制するビュー一貫性損失を提案する。 GMNRモデルは、トレーニング時間と推論時間の両方で計算効率を保ちながら、複数のカメラのポーズに一貫性のある3D対応高解像度画像を生成することができる。 3つのデータセットに関する実験により、提案するモジュールの有効性が示され、既存のアプローチと比較して、生成品質と推論時間の両方において良好な結果が得られた。我々のGMNRモデルは、単一のV100上で17.6FPSの1024×1024ピクセルの3D認識画像を生成する。コード:https://github.com/VIROBO-15/GMNR We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views. The proposed multiplane neural radiance model, named GMNR, consists of a novel {\alpha}-guided view-dependent representation ({\alpha}-VdR) module for learning view-dependent information. The {\alpha}-VdR module, facilitated by an {\alpha}-guided pixel sampling technique, computes the view-dependent representation efficiently by learning viewing direction and position coefficients. Moreover, we propose a view-consistency loss to enforce photometric similarity across multiple views. The GMNR model can generate 3D-aware high-resolution images that are view-consistent across multiple camera poses, while maintaining the computational efficiency in terms of both training and inference time. Experiments on three datasets demonstrate the effectiveness of the proposed modules, leading to favorable results in terms of both generation quality and inference time, compared to existing approaches. Our GMNR model generates 3D-aware images of 1024 × 1024 pixels with 17.6 FPS on a single V100. Code: https://github.com/VIROBO-15/GMNR	翻訳日:2023-04-04 14:15:30 公開日:2023-04-03
# 自然画像マッティングにおけるコンテキストアグリゲーションの再考 Rethinking Context Aggregation in Natural Image Matting ( http://arxiv.org/abs/2304.01171v1 ) ライセンス: Link先を確認	Qinglin Liu, Shengping Zhang, Quanling Meng, Ru Li, Bineng Zhong, Liqiang Nie	(参考訳) 自然な画像マッチングでは、背景と背景を区別することが困難である場合、文脈情報はアルファマットの推定において重要な役割を果たす。ディープラーニングベースのメソッドの出力は、特に設計されたコンテキストアグリゲーションモジュールを利用してエンコーダ機能を洗練する。しかし、これらのモジュールの有効性は十分に調査されていない。本稿では,コンテキストアグリゲーションモジュールが期待したほど効果的ではないことを示すために,広範な実験を行う。また,大きなイメージパッチで学習すると,より大きな受容領域を持つ基本エンコーダ・デコーダネットワークは,コンテキストを効果的に集約し,より優れた性能を実現することができることを実証する。本報告では,エンコーダに外観強調軸方向学習ブロックを組み込んで,ハイブリッドトランスフォーマデコーダを採用することで,受容領域を拡大する簡易かつ効果的なマットリングネットワークaematterを提案する。 4つのデータセットに対する実験結果から、我々のAEMatterは最先端のマッティング手法(例えばAdobe Composition-1Kデータセットでは、SADとMSEのそれぞれで、それぞれ 25% と 40% の削減)を大幅に上回っていることが示されています。コードとモデルは https://github.com/qlyoo/aematter で利用可能である。 For natural image matting, context information plays a crucial role in estimating alpha mattes, especially when it is challenging to distinguish foreground from its background. Existing deep learning-based methods exploit specifically designed context aggregation modules to refine encoder features. However, the effectiveness of these modules has not been thoroughly explored. In this paper, we conduct extensive experiments to reveal that the context aggregation modules are actually not as effective as expected. We also demonstrate that when learned on large image patches, basic encoder-decoder networks with a larger receptive field can effectively aggregate context to achieve better performance. Based on the above findings, we propose a simple yet effective matting network, named AEMatter, which enlarges the receptive field by incorporating an appearance-enhanced axis-wise learning block into the encoder and adopting a hybrid-transformer decoder. 
Experimental results on four datasets demonstrate that our AEMatter significantly outperforms state-of-the-art matting methods (e.g., on the Adobe Composition-1K dataset, 25% and 40% reduction in terms of SAD and MSE, respectively, compared against MatteFormer). The code and model are available at https://github.com/QLYoo/AEMatter.	翻訳日:2023-04-04 14:15:12 公開日:2023-04-03
# DeepAccident: V2X自動運転の動作と事故予測ベンチマーク DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving ( http://arxiv.org/abs/2304.01168v1 ) ライセンス: Link先を確認	Tianqi Wang, Sukmin Kim, Wenxuan Ji, Enze Xie, Chongjian Ge, Junsong Chen, Zhenguo Li, Ping Luo	(参考訳) 安全は自動運転の優先事項である。それでも、現在公表されているデータセットは、自律運転の直接的かつ説明可能な安全性評価をサポートしていない。本研究では,実世界の運転時に頻繁に発生する多様な事故シナリオを含む現実的なシミュレータを用いて生成された大規模データセットであるdeepaccidentを提案する。提案するdeepaccidentデータセットは57kの注釈付きフレームと285kの注釈付きサンプルを含み、40kの注釈付きサンプルを持つ大規模nuscenesデータセットの約7倍である。さらに,提案したデータセットに基づいて,新たなタスク,エンドツーエンド動作と事故予測を提案し,異なる自律運転アルゴリズムの事故予測能力を直接評価することができる。さらに,各シナリオに対して,データ記録のための4台の車両と1台のインフラを設定し,事故シナリオの多様な視点を提供し,V2X(車間通信)による知覚と予測タスクの実現を可能にした。最後に,V2XFormerと呼ばれるベースラインV2Xモデルを提案する。 Safety is the primary priority of autonomous driving. Nevertheless, no published dataset currently supports the direct and explainable safety evaluation for autonomous driving. In this work, we propose DeepAccident, a large-scale dataset generated via a realistic simulator containing diverse accident scenarios that frequently occur in real-world driving. The proposed DeepAccident dataset contains 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset with 40k annotated samples. In addition, we propose a new task, end-to-end motion and accident prediction, based on the proposed dataset, which can be used to directly evaluate the accident prediction ability for different autonomous driving algorithms. Furthermore, for each scenario, we set four vehicles along with one infrastructure to record data, thus providing diverse viewpoints for accident scenarios and enabling V2X (vehicle-to-everything) research on perception and prediction tasks. Finally, we present a baseline V2X model named V2XFormer that demonstrates superior performance for motion and accident prediction and 3D object detection compared to the single-vehicle model.	翻訳日:2023-04-04 14:14:45 公開日:2023-04-03
# 準メトリック学習による最適ゴールリーチ強化学習 Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning ( http://arxiv.org/abs/2304.01203v1 ) ライセンス: Link先を確認	Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang	(参考訳) 目標到達強化学習(rl)では、最適値関数は準メトリック構造と呼ばれる特定の幾何学を持つ。本稿では,準メトリックモデルを用いて最適値関数を学習する新しい rl 手法である quasimetric reinforcement learning (qrl) を提案する。従来のアプローチとは違い、QRLの目標は特に準計量のために設計されており、強力な理論的回復保証を提供する。実験的に、離散化されたマウンテンカー環境を徹底的に分析し、QRLの特性と代替品に対する優位性を識別する。オフラインおよびオンラインの目標達成ベンチマークでは、QRLは、状態ベースと画像ベースの両方で、サンプル効率とパフォーマンスが改善されている。 In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.	翻訳日:2023-04-04 14:08:34 公開日:2023-04-03
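The abstract above says that optimal goal-reaching value functions have quasimetric structure. As a minimal illustration, not the QRL method itself, the sketch below computes minimum costs between states of a small directed graph. It checks that they satisfy the triangle inequality and zero self-distance without being symmetric, which is what distinguishes a quasimetric from a metric. The toy graph, its costs, and the helper names `shortest_costs` and `is_quasimetric` are hypothetical.

```python
import itertools

INF = float("inf")


def shortest_costs(n, edges):
    """All-pairs minimum path cost on a directed graph (Floyd-Warshall)."""
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), cost in edges.items():
        d[u][v] = min(d[u][v], cost)
    # product(...) varies its first element slowest, so k is the outer loop.
    for k, i, j in itertools.product(range(n), repeat=3):
        if d[i][k] + d[k][j] < d[i][j]:
            d[i][j] = d[i][k] + d[k][j]
    return d


def is_quasimetric(d):
    """Check non-negativity, zero self-distance and the triangle inequality."""
    n = len(d)
    if any(d[i][i] != 0 for i in range(n)):
        return False
    if any(d[i][j] < 0 for i in range(n) for j in range(n)):
        return False
    return all(
        d[i][j] <= d[i][k] + d[k][j]
        for i, j, k in itertools.product(range(n), repeat=3)
    )


# Hypothetical 3-state "hill": moving 0 -> 1 -> 2 is cheap, going back is costly.
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 1): 3.0, (1, 0): 3.0}
d = shortest_costs(3, edges)

print(d[0][2], d[2][0])   # cost to reach 2 from 0, and 0 from 2
print(is_quasimetric(d))  # triangle inequality holds despite the asymmetry
```

If the reward is the negative step cost, the optimal goal-conditioned value is the negative of this cost-to-go. This is the sense in which such a value function inherits the asymmetric but triangle-consistent geometry.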
# 視覚ロコモーション制御のためのニューラルボリュームメモリ Neural Volumetric Memory for Visual Locomotion Control ( http://arxiv.org/abs/2304.01201v1 ) ライセンス: Link先を確認	Ruihan Yang, Ge Yang, Xiaolong Wang	(参考訳) 脚のあるロボットは、舗装道路を超えて自律性の範囲を広げる可能性がある。本研究では,1つの前方深度カメラを用いて,挑戦的地形における移動の難しさについて考察する。問題の部分的な観測性のため、ロボットは現在の地形を推定するために過去の観測に頼らなければならない。この問題を解決するために,シーンの3次元形状を明示的にモデル化するコンピュータビジョンのパラダイムに従い,3次元世界のse(3)等分散を明示的に考慮した幾何学的メモリアーキテクチャであるneural volumetric memory (nvm)を提案する。 NVMは、複数のカメラビューの特徴量を、まずロボットのエゴ中心のフレームに戻すことで集約する。我々は,物理ロボットで学習した視覚運動ポリシーをテストし,学習中に幾何学的事前化を明示的に導入する手法が,na\"ive法よりも優れた性能をもたらすことを示す。また,神経容積記憶に記憶されている表現が,シーンを再構築するための十分な幾何学的情報を取得することを示した。ビデオ付きプロジェクトページはhttps://rchalyang.github.io/NVM です。 Legged robots have the potential to expand the reach of autonomy beyond paved roads. In this work, we consider the difficult problem of locomotion on challenging terrains using a single forward-facing depth camera. Due to the partial observability of the problem, the robot has to rely on past observations to infer the terrain currently beneath it. To solve this problem, we follow the paradigm in computer vision that explicitly models the 3D geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric memory architecture that explicitly accounts for the SE(3) equivariance of the 3D world. NVM aggregates feature volumes from multiple camera views by first bringing them back to the ego-centric frame of the robot. We test the learned visual-locomotion policy on a physical robot and show that our approach, which explicitly introduces geometric priors during training, offers superior performance than more na\"ive methods. We also include ablation studies and show that the representations stored in the neural volumetric memory capture sufficient geometric information to reconstruct the scene. Our project page with videos is https://rchalyang.github.io/NVM .	翻訳日:2023-04-04 14:08:23 公開日:2023-04-03
# オープンワールドにおけるビデオインスタンスセグメンテーション Video Instance Segmentation in an Open-World ( http://arxiv.org/abs/2304.01200v1 ) ライセンス: Link先を確認	Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan	(参考訳) 既存のビデオ・インスタンス・セグメンテーション(VIS)のアプローチは一般的にクローズド・ワールドの仮定に従う。オープンワールドの定式化は、次のような密世界の静的学習の仮定を緩和する。 (a)まず、既知のカテゴリの集合を区別し、未知のオブジェクトを「未知」とラベルし、次に b) 未知のクラスと対応するセマンティックラベルが利用可能になったときのクラスを漸進的に学習する。 OW-VISFormerという名前のオープンワールドVISアプローチを提案し、新しい機能強化機構と時空間オブジェクトネス(STO)モジュールを提案する。軽量補助ネットワークに基づく特徴強調機構は,背景からの正確な画素レベルの(未知の)オブジェクト記述と,カテゴリ固有の既知のセマンティッククラスを識別することを目的としている。 STOモジュールは、対照的な損失によって前景のアクティベーションを強化することで、インスタンスレベルの擬似ラベルを生成する。さらに、OW-VISの特性を測定するための広範な実験プロトコルも導入する。我々のOW-VISFormerはOW-VIS設定において、ソリッドベースラインに対して良好に動作します。さらに,最新のSeqFormerに組み込むことで,標準のフル教師付きVIS設定へのコントリビューションを評価し, Youtube-VIS 2019 val において 1.6 % AP の絶対ゲインを実現した。セット最後に,open-world detection (owod) 設定に対する我々の貢献の汎用性を示す。 OW-VISスプリットと共にコード、モデルは \url{https://github.com/OmkarThawakar/OWVISFormer} で入手できる。 Existing video instance segmentation (VIS) approaches generally follow a closed-world assumption, where only seen category instances are identified and spatio-temporally segmented at inference. Open-world formulation relaxes the close-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories as well as labels an unknown object as `unknown' and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available. We propose the first open-world VIS approach, named OW-VISFormer, that introduces a novel feature enrichment mechanism and a spatio-temporal objectness (STO) module. The feature enrichment mechanism based on a light-weight auxiliary network aims at accurate pixel-level (unknown) object delineation from the background as well as distinguishing category-specific known semantic classes. 
The STO module strives to generate instance-level pseudo-labels by enhancing the foreground activations through a contrastive loss. Moreover, we also introduce an extensive experimental protocol to measure the characteristics of OW-VIS. Our OW-VISFormer performs favorably against a solid baseline in the OW-VIS setting. Further, we evaluate our contributions in the standard fully-supervised VIS setting by integrating them into the recent SeqFormer, achieving an absolute gain of 1.6% AP on the YouTube-VIS 2019 val set. Lastly, we show the generalizability of our contributions for the open-world detection (OWOD) setting, outperforming the best existing OWOD method in the literature. Code, models, and OW-VIS splits are available at https://github.com/OmkarThawakar/OWVISFormer.	翻訳日:2023-04-04 14:08:04 公開日:2023-04-03
# 人間の行動認識における3次元ポーズとトラッキングの利点について On the Benefits of 3D Pose and Tracking for Human Action Recognition ( http://arxiv.org/abs/2304.01199v1 ) ライセンス: Link先を確認	Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik	(参考訳) 本研究では,行動認識のためのトラッキングと3Dポーズの利点について検討する。これを達成するために、空間の定点ではなく、人間の運動の軌道上の行動を分析するラグランジュ的視点を採る。この立場を取ることで、人々のトラックレットを使って行動を予測することができます。この精神の中では、まず3Dのポーズを用いて行動を推測し、対人インタラクションを研究することの利点を示す。次に,トラックレット上での3次元ポーズと文脈的外観を用いてラグランジュ的行動認識モデルを提案する。そこで本手法は,AVA v2.2データセットのポーズのみの設定と標準ベンチマーク設定の両方で,最先端のパフォーマンスを実現する。ポーズキューのみを用いてアクションを推論すると、ポーズモデルは対応する最先端モデルに対して+10.0mAP、融合モデルは最高の最先端モデルに対して+2.8mAPとなる。コードと結果は以下の通りである。 https://brjathu.github.io/lart In this work we study the benefits of using tracking and 3D poses for action recognition. To achieve this, we take the Lagrangian view on analysing actions over a trajectory of human motion rather than at a fixed point in space. Taking this stand allows us to use the tracklets of people to predict their actions. In this spirit, first we show the benefits of using 3D pose to infer actions, and study person-person interactions. Subsequently, we propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets. To this end, our method achieves state-of-the-art performance on the AVA v2.2 dataset on both pose only settings and on standard benchmark settings. When reasoning about the action using only pose cues, our pose model achieves +10.0 mAP gain over the corresponding state-of-the-art while our fused model has a gain of +2.8 mAP over the best state-of-the-art model. Code and results are available at: https://brjathu.github.io/LART	翻訳日:2023-04-04 14:07:36 公開日:2023-04-03
# デカップリングワンパスネットワークを用いたゼロショットセマンティクスセグメンテーション Zero-Shot Semantic Segmentation with Decoupled One-Pass Network ( http://arxiv.org/abs/2304.01198v1 ) ライセンス: Link先を確認	Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma	(参考訳) 近年,ゼロショット意味セグメンテーション問題に注目が集まっており,提案マスク生成のためのストリームと事前学習されたビジュアル言語モデルを用いたセグメンテーション分類という,2つのストリームネットワークに基づく手法が最適である。しかし、既存の2ストリーム手法では、非常に非効率な視覚言語モデルに大量の(最大100まで)画像作物を渡す必要がある。この問題に対処するために、入力画像ごとに視覚言語モデルに1回だけパスする必要のあるネットワークを提案する。具体的には,まず,事前学習した視覚エンコーダ内のパッチ埋め込み間の有害干渉を制限するために,パッチ切断と呼ぶ新しいネットワーク適応手法を提案する。そこで我々は,ネットワークがより差別的な特徴に着目するように,分類アンカー学習を提案する。実験の結果,提案手法は最先端の手法を4倍から7倍の速さで上回り,優れた性能を発揮することが示された。コードをhttps://github.com/CongHan0808/DeOP.gitでリリースします。 Recently, the zero-shot semantic segmentation problem has attracted increasing attention, and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pre-trained visual-language model. However, existing two-stream methods require passing a great number of (up to a hundred) image crops into the visual-language model, which is highly inefficient. To address the problem, we propose a network that only needs a single pass through the visual-language model for each input image. Specifically, we first propose a novel network adaptation approach, termed patch severance, to restrict the harmful interference between the patch embeddings in the pre-trained visual encoder. We then propose classification anchor learning to encourage the network to spatially focus on more discriminative features for classification. Extensive experiments demonstrate that the proposed method achieves outstanding performance, surpassing state-of-the-art methods while being 4 to 7 times faster at inference. We release our code at https://github.com/CongHan0808/DeOP.git.	翻訳日:2023-04-04 14:07:23 公開日:2023-04-03
# あらゆるデスクにテレプレゼンスをもたらす Bringing Telepresence to Every Desk ( http://arxiv.org/abs/2304.01197v1 ) ライセンス: Link先を確認	Shengze Wang, Ziheng Wang, Ryan Schmelzle, Liujie Zheng, YoungJoong Kwon, Soumyadip Sengupta, Henry Fuchs	(参考訳) 本稿では,すべてのデスクトップにテレプレゼンスを導入する。商用システムとは異なり、パーソナル3dビデオ会議システムは、平均的な消費者にとって経済的かつ計算可能でありながら、高品質なビデオをレンダリングしなければならない。そこで本研究では,4種類のrgbdカメラを必要とせず,ユーザと環境の高品質な自由視点映像を合成するキャプチャ・レンダリングシステムを提案する。実験の結果,オブジェクトテンプレートや重度前処理を使わずに高品質な自由視点映像をレンダリングできることがわかった。リアルタイムではないものの、システムは高速であり、ビデオ単位の最適化を必要としない。さらに,複雑な手のジェスチャーや衣服に対してロバストなシステムであり,新たなユーザに一般化することができる。この作業は、さらなる最適化のための強力な基盤を提供し、近い将来、すべてのデスクにテレプレゼンスをもたらすのに役立ちます。コードとデータセットは当社のwebサイトhttps://mcmvmc.github.io/personaltelepresence/で利用可能になります。 In this paper, we work to bring telepresence to every desktop. Unlike commercial systems, personal 3D video conferencing systems must render high-quality videos while remaining financially and computationally viable for the average consumer. To this end, we introduce a capturing and rendering system that only requires 4 consumer-grade RGBD cameras and synthesizes high-quality free-viewpoint videos of users as well as their environments. Experimental results show that our system renders high-quality free-viewpoint videos without using object templates or heavy pre-processing. While not real-time, our system is fast and does not require per-video optimizations. Moreover, our system is robust to complex hand gestures and clothing, and it can generalize to new users. This work provides a strong basis for further optimization, and it will help bring telepresence to every desk in the near future. The code and dataset will be made available on our website https://mcmvmc.github.io/PersonalTelepresence/.	翻訳日:2023-04-04 14:07:05 公開日:2023-04-03
# Baize: セルフチャットデータに基づくパラメータ効率チューニングを備えたオープンソースのチャットモデル Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data ( http://arxiv.org/abs/2304.01196v1 ) ライセンス: Link先を確認	Canwen Xu and Daya Guo and Nan Duan and Julian McAuley	(参考訳) ChatGPTのようなチャットモデルは印象的な機能を示しており、多くのドメインで急速に採用されている。しかし、これらのモデルは制限付きAPIを通じてのみアクセス可能であり、この分野における新たな研究と進歩の障壁となる。そこで本研究では,chatgptを利用して対話を行うことで,高品質なマルチターンチャットコーパスを自動生成するパイプラインを提案する。その後,オープンソースの大規模言語モデルであるLLaMAを強化するためにパラメータ効率のチューニングを用いる。得られたモデルBaizeは、潜在的なリスクを最小限に抑えるガードレールとのマルチターン対話において、優れたパフォーマンスを示す。 Chat models, such as ChatGPT, have shown impressive capabilities and have been rapidly adopted across numerous domains. However, these models are only accessible through a restricted API, creating barriers for new research and progress in the field. We propose a pipeline that can automatically generate a high-quality multi-turn chat corpus by leveraging ChatGPT to engage in a conversation with itself. Subsequently, we employ parameter-efficient tuning to enhance LLaMA, an open-source large language model. The resulting model, named Baize, demonstrates good performance in multi-turn dialogues with guardrails that minimize potential risks.	翻訳日:2023-04-04 14:06:49 公開日:2023-04-03
# すべての特徴が重要なわけではない: 適応的な事前リファインメントによるFew-shot CLIPの強化 Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement ( http://arxiv.org/abs/2304.01195v1 ) ライセンス: Link先を確認	Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao	(参考訳) Contrastive Language-Image Pre-Training (CLIP) の人気は、様々な下流視覚タスクへの応用を促している。下流タスクの能力を向上させるために、few-shot学習が広く採用されている。しかし、既存の方法は限られた性能を示すか、学習可能なパラメータが過剰になるという問題を抱えている。本稿では,CLIP の事前学習知識に対する Adaptive Prior rEfinement 手法である APE を提案し、高い計算効率で優れた精度を達成する。事前リファインメントモジュールを用いて下流データにおけるクラス間格差を分析し,ドメイン固有の知識をCLIPで抽出したキャッシュモデルから分離する。それに加えて、トレーニング不要のAPEとトレーニングが必要なAPE-Tの2つのモデル変種を導入する。テスト画像,事前キャッシュモデル,テキスト表現の三者間の親和性を探索し,軽量なカテゴリ残差(category-residual)モジュールのみを学習可能にする。 11個のベンチマークの平均精度では、APEとAPE-Tはいずれも最先端に達し、学習可能なパラメータが30分の1でありながら、16ショットの設定で2番目に良い手法をそれぞれ+1.59%、+1.99%上回っている。 The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks. To improve its capacity on downstream tasks, few-shot learning has become a widely-adopted technique. However, existing methods either exhibit limited performance or suffer from excessive learnable parameters. In this paper, we propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency. Via a prior refinement module, we analyze the inter-class disparity in the downstream data and decouple the domain-specific knowledge from the CLIP-extracted cache model. On top of that, we introduce two model variants, a training-free APE and a training-required APE-T. We explore the trilateral affinities between the test image, prior cache model, and textual representations, and only enable a lightweight category-residual module to be trained. For the average accuracy over 11 benchmarks, both APE and APE-T attain state-of-the-art and respectively outperform the second-best by +1.59% and +1.99% under 16 shots with 30x fewer learnable parameters.	翻訳日:2023-04-04 14:06:41 公開日:2023-04-03
# Burstormer:バーストイメージ復元と強化トランスフォーマー Burstormer: Burst Image Restoration and Enhancement Transformer ( http://arxiv.org/abs/2304.01194v1 ) ライセンス: Link先を確認	Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang	(参考訳) シャッタープレスでは、現代のハンドヘルドカメラが高速に複数の画像をキャプチャし、それらをマージして単一の画像を生成する。しかし、バースト内の個々のフレームは避けられない動きのために不整列であり、複数の劣化を含む。課題は、連続した画像を適切に調整し、その補完的な情報をマージして高品質な出力を達成することである。本稿では,バースト画像の復元と拡張のための新しいトランスフォーマーアーキテクチャであるBurstormerを提案する。既存の手法と比較して,マルチスケールの局所的特徴と非局所的特徴を活用し,アライメントと機能融合の改善を図る。私たちのキーとなるアイデアは、バーストワイドコンテキストをモデル化しながら、情報集約とプログレッシブフュージョンのためのバースト近傍でのフレーム間通信を可能にすることです。しかし、入力バーストフレームは、情報を融合する前に適切に整列する必要がある。そこで本論文では,バースト特徴を参照フレームにアライメントするための拡張変形可能なアライメントモジュールを提案する。既存の手法と異なり,提案するアライメントモジュールはバースト特徴の整列だけでなく,複雑な動きの処理を容易にする参照ベース機能拡張機構を通じて,特徴情報を交換し,参照フレームとの集中的なコミュニケーションを維持する。マルチレベルアライメントおよびエンリッチメントの後、循環バーストサンプリングモジュールを用いてバースト内のフレーム間通信を再強調する。最後に、提案したバースト機能融合モジュールを用いてフレーム間情報を集約し、さらにプログレッシブアップサンプリングを行う。私たちのBurstormerは、バースト超解像、バーストデノイング、バースト低照度向上の最先端手法よりも優れています。私たちのコードと事前訓練済みモデルはhttps://github.com/akshaydudhane16/Burstormerで利用可能です。 On a shutter press, modern handheld cameras capture multiple images in rapid succession and merge them to generate a single image. However, individual frames in a burst are misaligned due to inevitable motions and contain multiple degradations. The challenge is to properly align the successive image shots and merge their complementary information to achieve high-quality outputs. Towards this direction, we propose Burstormer: a novel transformer-based architecture for burst image restoration and enhancement. In comparison to existing works, our approach exploits multi-scale local and non-local features to achieve improved alignment and feature fusion. Our key idea is to enable inter-frame communication in the burst neighborhoods for information aggregation and progressive fusion while modeling the burst-wide context. However, the input burst frames need to be properly aligned before fusing their information. Therefore, we propose an enhanced deformable alignment module for aligning burst features with respect to the reference frame. Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions. After multi-level alignment and enrichment, we re-emphasize inter-frame communication within the burst using a cyclic burst sampling module. Finally, the inter-frame information is aggregated using the proposed burst feature fusion module followed by progressive upsampling. Our Burstormer outperforms state-of-the-art methods on burst super-resolution, burst denoising and burst low-light enhancement. Our codes and pretrained models are available at https://github.com/akshaydudhane16/Burstormer	翻訳日:2023-04-04 14:06:21 公開日:2023-04-03
# 画像で特定されたオブジェクトへのナビゲート Navigating to Objects Specified by Images ( http://arxiv.org/abs/2304.01192v1 ) ライセンス: Link先を確認	Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot	(参考訳) イメージは、具体化エージェントがナビゲートすべき特定のオブジェクトインスタンスを指定するための便利な方法である。この課題を解決するには、未知の環境の視覚的推論と探索が必要である。本稿では,この課題をシミュレーションと実世界の両方で行うシステムを提案する。モジュール方式は探索,目標インスタンスの再同定,目標位置特定,局所ナビゲーションといったサブタスクを解決する。特徴マッチングを用いてゴールインスタンスを再同定し、一致した特徴をマップに投影することでゴールインスタンスをローカライズする。各サブタスクは、ゼロの微調整を必要とするオフザシェルフコンポーネントを使用して解決される。 HM3D InstanceImageNavベンチマークでは、このシステムはベースラインのエンドツーエンドのRLポリシー7xと最先端のImageNavモデル2.3x(56%対25%の成功)を上回っている。我々は,このシステムを移動ロボットプラットフォームにデプロイし,実世界の効果的なパフォーマンスを実証し,家庭とオフィス環境全体で88%の成功率を達成した。 Images are a convenient way to specify which particular object instance an embodied agent should navigate to. Solving this task requires semantic visual reasoning and exploration of unknown environments. We present a system that can perform this task in both simulation and the real world. Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation. We re-identify the goal instance in egocentric vision using feature-matching and localize the goal instance by projecting matched features to a map. Each sub-task is solved using off-the-shelf components requiring zero fine-tuning. On the HM3D InstanceImageNav benchmark, this system outperforms a baseline end-to-end RL policy 7x and a state-of-the-art ImageNav model 2.3x (56% vs 25% success). We deploy this system to a mobile robot platform and demonstrate effective real-world performance, achieving an 88% success rate across a home and an office environment.	翻訳日:2023-04-04 14:05:57 公開日:2023-04-03
# SynthVSR:Synthetic Supervisionによる視覚音声認識のスケールアップ SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision ( http://arxiv.org/abs/2303.17200v2 ) ライセンス: Link先を確認	Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen	(参考訳) 最近報告された、視覚音声認識(VSR)における最先端の結果は、しばしば大量のビデオデータに依存するが、公開されている転写されたビデオデータセットのサイズは限られている。本稿では,VSRに合成視覚データを活用する可能性について,初めて考察する。SynthVSRと呼ぶ本手法は,合成唇運動を用いてVSRシステムの性能を大幅に向上させる。 SynthVSRの背後にある重要なアイデアは、入力音声に条件付き唇の動きを生成する音声駆動の唇アニメーションモデルを活用することである。音声駆動のリップアニメーションモデルはラベルなしの音声ビジュアルデータセットでトレーニングされ、ラベル付きビデオが利用可能であれば、事前訓練されたVSRモデルに向けてさらに最適化することができる。多くの転写された音響データと顔画像が利用可能であるので、半教師付きVSRトレーニングのためのリップアニメーションモデルを用いて大規模な合成データを生成することができる。提案手法を,最大公用VSRベンチマークであるLip Reading Sentences 3 (LRS3)で評価した。 SynthVSR の WER は 43.3% に達し、実際のラベル付きデータは 30 時間しかなく、何千時間ものビデオを使った既成のアプローチよりも優れている。 LRS3の438時間のラベル付きデータをすべて使用すると、WERはさらに27.9%に低下し、これは最先端の自己教師ありAV-HuBERT法と同等である。さらに、大規模な擬似ラベル音声視覚データと組み合わせると、SynthVSRは公開されているデータのみを使用して16.9%という新しい最先端のVSR WERを達成し、29倍の非公開の機械書き起こしビデオデータ(90,000時間)でトレーニングされた最近の最先端のアプローチを上回っている。最後に,提案手法における各成分の効果を理解するため,広範なアブレーション研究を行った。 Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on increasingly large amounts of video data, while the publicly available transcribed video datasets are limited in size. In this paper, for the first time, we study the potential of leveraging synthetic visual data for VSR. Our method, termed SynthVSR, substantially improves the performance of VSR systems with synthetic lip movements. The key idea behind SynthVSR is to leverage a speech-driven lip animation model that generates lip movements conditioned on the input speech. The speech-driven lip animation model is trained on an unlabeled audio-visual dataset and could be further optimized towards a pre-trained VSR model when labeled videos are available. As plenty of transcribed acoustic data and face images are available, we are able to generate large-scale synthetic data using the proposed lip animation model for semi-supervised VSR training. We evaluate the performance of our approach on the largest public VSR benchmark - Lip Reading Sentences 3 (LRS3). SynthVSR achieves a WER of 43.3% with only 30 hours of real labeled data, outperforming off-the-shelf approaches using thousands of hours of video. The WER is further reduced to 27.9% when using all 438 hours of labeled data from LRS3, which is on par with the state-of-the-art self-supervised AV-HuBERT method. Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours). Finally, we perform extensive ablation studies to understand the effect of each component in our proposed method.	翻訳日:2023-04-04 11:47:45 公開日:2023-04-03
# 光路変調を用いた表面音波の定量的光学画像化法 Quantitative optical imaging method for surface acoustic wave using optical path modulation ( http://arxiv.org/abs/2212.07369v4 ) ライセンス: Link先を確認	Ryusuke Hisatomi, Kotaro Taga, Ryo Sasaki, Yoichi Shiota, Takahiro Moriyama, Teruo Ono	(参考訳) レイリー型表面音響波(SAW)は、その表面局在化、高電気制御性、低伝搬損失により、古典的および量子情報キャリアとして様々な分野で用いられている。 SAWと他の物理系、例えば磁化、電子電荷、電子スピンとの結合とハイブリダイゼーションは、最近のフォノニクスやスピントロニクスの焦点である。表面波振幅の精密測定は結合強度を議論するためにしばしば必要となる。しかし、そのような測定技術はごくわずかであり、概してかなり複雑な分析を必要とする。そこで我々は,SAWを定量的に特徴付ける簡単な測定手法を開発し,実証する。この技術は、光路変調により、コヒーレント駆動SAWによる表面の揺動を光学的に検出する。また、ショットノイズ制限状態で測定システムが動作した場合、光路変調信号から光スポットの表面傾斜及び変位を導出することができる。我々の実証技術は,SAW関連研究にとって重要なツールとなる。 Rayleigh-type surface acoustic wave (SAW) is used in various fields as classical and quantum information carriers because of its surface localization, high electrical controllability, and low propagation loss. Coupling and hybridization between the SAW and other physical systems such as magnetization, electron charge, and electron spin are the recent focuses in phononics and spintronics. Precise measurement of surface wave amplitude is often necessary to discuss the coupling strengths. However, there are only a few such measurement techniques and they generally require a rather complex analysis. Here we develop and demonstrate a straightforward measurement technique that can quantitatively characterize the SAW. The technique optically detects the surface waving due to the coherently driven SAW by the optical path modulation. Furthermore, when the measurement system operates in the shot-noise-limited regime, the surface slope and displacement at the optical spot can be deduced from the optical path modulation signal. Our demonstrated technique will be an important tool for SAW-related research.	翻訳日:2023-04-04 11:46:18 公開日:2023-04-03
# 不確実性のあるユニバーサルドメイン適応 Provably Uncertainty-Guided Universal Domain Adaptation ( http://arxiv.org/abs/2209.09616v7 ) ライセンス: Link先を確認	Yifan Wang, Lin Zhang, Ran Song, Paul L. Rosin, Yibin Li, and Wei Zhang	(参考訳) ユニバーサルドメイン適応(UniDA)は、ラベルセットに関する仮定なしに、ラベル付きソースドメインからラベルなしターゲットドメインに知識を転送することを目的としており、そのためにはターゲットドメインにおいて未知のサンプルを既知のサンプルと区別する必要がある。 UniDAの主な課題は、同一でないラベルセットが2つのドメイン間のミスアライメントを引き起こすことである。さらに、ドメインの不一致とソースドメインにおける教師付き目的は、モデル全体を共通のクラスに偏らせやすく、未知のサンプルに対して過信な予測を生成する。上記の課題に対処するため、我々は新しい不確実性誘導型UniDAフレームワークを提案する。まず、潜在空間におけるターゲットサンプルの分布を十分に活用して、ターゲットサンプルが未知のクラスに属する確率の経験的推定を導入する。次に,この推定に基づいて,$\delta$-filter を用いた線形部分空間における新しい近傍探索スキームを提案し,ターゲットサンプルの不確かさスコアを推定し,未知のサンプルを発見する。これは、ターゲットサンプルとソースドメイン内のその近傍との関係を十分に活用し、ドメインのミスアライメントの影響を避ける。さらに,発見された未知サンプルの信頼度に基づく不確実性誘導マージン損失により,既知のサンプルと未知のサンプルの両方に対する予測の信頼度をバランスさせ、未知クラスに対する既知クラスのクラス内分散の差を低減する。最後に,3つの公開データセットを用いた実験により,本手法が既存の最先端手法を大幅に上回ることを示した。 Universal domain adaptation (UniDA) aims to transfer the knowledge from a labeled source domain to an unlabeled target domain without any assumptions of the label sets, which requires distinguishing the unknown samples from the known ones in the target domain. A main challenge of UniDA is that the nonidentical label sets cause the misalignment between the two domains. Moreover, the domain discrepancy and the supervised objectives in the source domain easily lead the whole model to be biased towards the common classes and produce overconfident predictions for unknown samples. To address the above challenging problems, we propose a new uncertainty-guided UniDA framework. Firstly, we introduce an empirical estimation of the probability of a target sample belonging to the unknown class which fully exploits the distribution of the target samples in the latent space. Then, based on the estimation, we propose a novel neighbors searching scheme in a linear subspace with a $\delta$-filter to estimate the uncertainty score of a target sample and discover unknown samples. It fully utilizes the relationship between a target sample and its neighbors in the source domain to avoid the influence of domain misalignment. Secondly, this paper balances the confidences of predictions for both known and unknown samples through an uncertainty-guided margin loss based on the confidences of discovered unknown samples, which can reduce the gap between the intra-class variances of known classes with respect to the unknown class. Finally, experiments on three public datasets demonstrate that our method significantly outperforms existing state-of-the-art methods.	翻訳日:2023-04-04 11:45:32 公開日:2023-04-03
# 不完全情報展開型ゲームの準最適学習 Near-Optimal Learning of Extensive-Form Games with Imperfect Information ( http://arxiv.org/abs/2202.01752v3 ) ライセンス: Link先を確認	Yu Bai, Chi Jin, Song Mei, Tiancheng Yu	(参考訳) 本稿では,バンディットフィードバックから不完全情報展開型ゲームを学習するための,最適に近いアルゴリズムを設計するという未解決問題を解決する。 $X,Y$ を情報集合の数、$A,B$ を2人のプレイヤーの行動の数とする2人ゼロサムゲームにおいて、$\varepsilon$-近似ナッシュ均衡を見つけるのに $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ エピソードのプレイしか必要としない初のアルゴリズム群を示す。これは、既知の最良のサンプル複雑性 $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ を $\widetilde{\mathcal{O}}(\max\{X, Y\})$ 倍改善し、情報理論的下界と対数因子を除いて一致する。我々はこのサンプル複雑性を2つの新しいアルゴリズム: Balanced Online Mirror Descent と Balanced Counterfactual Regret Minimization によって達成する。どちらのアルゴリズムも、\emph{balanced exploration policies} を古典的手法に統合する新しいアプローチに依存している。また,この結果をマルチプレイヤー一般和ゲームにおける粗相関均衡(Coarse Correlated Equilibria)の学習にも拡張した。 This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating \emph{balanced exploration policies} into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.	翻訳日:2023-04-04 11:45:03 公開日:2023-04-03
# aiチャットボットは、エンジニアリングの基本(fe)とエンジニアリングの原則と実践(pe)構造試験に合格できるか? Can AI Chatbots Pass the Fundamentals of Engineering (FE) and Principles and Practice of Engineering (PE) Structural Exams? ( http://arxiv.org/abs/2303.18149v2 ) ライセンス: Link先を確認	M.Z. Naser, Brandon Ross, Jennier Ogle, Venkatesh Kodur, Rami Hawileh, Jamal Abdalla, Huu-Tai Thai	(参考訳) エンジニアリングコミュニティは最近、openai chatgpt-4とgoogle bardのリリースでチャットボット技術の出現を目撃した。これらのチャットボットは、医療や法律の試験を含む様々な標準試験に合格することが報告されているが、このフォーラムの論文は、これらのチャットボットがエンジニアリングの基本(fe)とエンジニアリングの原則と実践(pe)試験にも合格できるかどうかを考察している。 FE試験やPE試験で一般的に見られるように、様々な土木工学や環境工学の質問やシナリオがチャットボットのパフォーマンスを評価するために使用される。チャットボットの応答は,その関連性,正確性,明確性に基づいて分析し,NCEES(National Council of Examiners for Engineering and Surveying)の勧告と比較した。調査の結果,ChatGPT-4 と Bard はそれぞれ FE 試験で 70.9% と 39.2%,PE 試験で 46.2% と 41% を獲得した。現在のChatGPT-4はFE試験に合格する可能性があることは明らかである。将来の版は両方の試験に合格する可能性が高いが、この研究はチャットボットをアシスタントや指導エンジニアとして使う可能性を強調している。 The engineering community has recently witnessed the emergence of chatbot technology with the release of OpenAI ChatGPT-4 and Google Bard. While these chatbots have been reported to perform well and even pass various standardized tests, including medical and law exams, this forum paper explores whether these chatbots can also pass the Fundamentals of Engineering (FE) and Principles and Practice of Engineering (PE) exams. A diverse range of civil and environmental engineering questions and scenarios are used to evaluate the chatbots' performance, as commonly present in the FE and PE exams. The chatbots' responses were analyzed based on their relevance, accuracy, and clarity and then compared against the recommendations of the National Council of Examiners for Engineering and Surveying (NCEES). Our report shows that ChatGPT-4 and Bard, respectively scored 70.9% and 39.2% in the FE exam and 46.2% and 41% in the PE exam. It is evident that the current version of ChatGPT-4 could potentially pass the FE exam. 
While future editions are much more likely to pass both exams, this study also highlights the potential of using chatbots as teaching assistants and guiding engineers.	翻訳日:2023-04-04 11:38:16 公開日:2023-04-03
# 深層ニューラルネットワーク学習のための2レベルKFAC法の解析と比較 Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks ( http://arxiv.org/abs/2303.18083v2 ) ライセンス: Link先を確認	Abdoulaye Koroko, Ani Anciaux-Sedrakian, Ibtihel Ben Gharbia, Valérie Garès, Mounir Haddou, Quang Huy Tran	(参考訳) 2次の方法として、Natural Gradient Descent (NGD)はニューラルネットワークのトレーニングを高速化する能力を持っている。しかし、フィッシャー情報行列(FIM: Fisher Information Matrix)の計算と逆行列計算には膨大な計算コストとメモリコストがかかるため、NGDをディープニューラルネットワーク(DNN)にスケーラブルにするには効率的な近似が必要である。多くの近似が試みられている。その中で最も洗練されたKFACは、FIMをブロック対角行列として近似し、各ブロックはニューラルネットワークの層に対応する。これにより、KFACは異なる層間の相互作用を無視する。本研究では,2レベル法を用いて層間の低周波相互作用の一部を復元する意義について検討する。領域分解から着想を得て、異なる粗い空間を用いたKFACの2レベル補正を複数提案し、評価した。その結果, この方法で層間相互作用を組み込んでも, KFACの性能はあまり向上しないことがわかった。このことは、ブロック対角法が十分に堅牢で正確かつ計算時間の面で経済的であるため、FIMの非対角ブロックを破棄しても安全であることを示唆している。 As a second-order method, the Natural Gradient Descent (NGD) has the ability to accelerate training of neural networks. However, due to the prohibitive computational and memory costs of computing and inverting the Fisher Information Matrix (FIM), efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been attempted. The most sophisticated of these is KFAC, which approximates the FIM as a block-diagonal matrix, where each block corresponds to a layer of the neural network. By doing so, KFAC ignores the interactions between different layers. In this work, we investigate the benefit of restoring some low-frequency interactions between the layers by means of two-level methods. Inspired by domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The obtained results show that incorporating the layer interactions in this fashion does not really improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate and economical in computation time.	翻訳日:2023-04-04 11:37:53 公開日:2023-04-03
# グローバルローカルコンテキスト特徴を用いたゼロショット参照画像分割 Zero-shot Referring Image Segmentation with Global-Local Context Features ( http://arxiv.org/abs/2303.17811v2 ) ライセンス: Link先を確認	Seonghoon Yu, Paul Hongsuck Seo, Jeany Son	(参考訳) 参照画像セグメンテーション(RIS)は、入力画像の領域に接地された参照表現が与えられたときに、そのセグメンテーションマスクを見つけることを目的とする。しかし、このタスクのためのラベル付きデータセットの収集はコストと労力がかかることで悪名高い。この問題を克服するために,CLIPから事前学習したクロスモーダル知識を利用した,シンプルで効果的なゼロショット参照画像セグメンテーション手法を提案する。入力テキストに接地したセグメンテーションマスクを得るために,入力画像のグローバルおよびローカルな文脈情報をキャプチャするマスク誘導型ビジュアルエンコーダを提案する。本手法は,市販マスクの提案手法から得られたインスタンスマスクを利用することで,細かなインスタンスレベルのグラウンディングをセグメント化することができる。また、グローバル機能は入力式全体の複雑な文レベルの意味をキャプチャし、ローカル機能は依存構文解析器によって抽出されたターゲット名詞句に焦点を当てるグローバルローカルテキストエンコーダも導入する。実験では,提案手法は,このタスクの複数のゼロショットベースライン、さらには弱教師付き参照表現セグメンテーション手法をも大幅に上回る。私たちのコードはhttps://github.com/Seonghoon-Yu/Zero-shot-RISで利用可能です。 Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP. In order to obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures global and local contextual information of an input image. By utilizing instance masks obtained from off-the-shelf mask proposal techniques, our method is able to segment fine-detailed instance-level groundings. We also introduce a global-local text encoder where the global feature captures complex sentence-level semantics of the entire input expression while the local feature focuses on the target noun phrase extracted by a dependency parser. In our experiments, the proposed method outperforms several zero-shot baselines of the task and even the weakly supervised referring expression segmentation method with substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.	翻訳日:2023-04-04 11:37:35 公開日:2023-04-03
# 軽量ビジョントランスにおける局所認識の再考 Rethinking Local Perception in Lightweight Vision Transformer ( http://arxiv.org/abs/2303.17803v2 ) ライセンス: Link先を確認	Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He	(参考訳) 視覚変換器(ViT)は様々な視覚タスクに有効であることが示されている。しかし、それらをモバイルフレンドリーなサイズにリサイズすると、パフォーマンスが大幅に低下する。そのため、軽量な視覚トランスフォーマーの開発は重要な研究分野となっている。本稿では,コンテキスト対応の局所拡張を利用した軽量視覚トランスフォーマであるcloformerを紹介する。 cloformerは、バニラ畳み込み演算子でよく使われるグローバルな共有重みと注意を向けるトークン固有のコンテキスト認識重みの関係を探求し、高頻度の局所情報をキャプチャする効果的で簡単なモジュールを提案する。 CloFormerでは、注意スタイルの畳み込み演算子であるAttnConvを紹介します。提案するattnconvは、共有重みを使ってローカル情報を集約し、注意深く設計されたコンテキストアウェア重みを配置し、ローカル機能を強化する。 CloFormerのFLOPを減らすためにプールを使用するAttnConvとバニラアテンションを組み合わせることで、モデルは高周波と低周波の情報を認識することができる。画像分類,物体検出,意味セグメンテーションなどの広範な実験を行い,cloformerの優位性を実証した。 Vision Transformers (ViTs) have been shown to be effective in various vision tasks. However, resizing them to a mobile-friendly size leads to significant performance degradation. Therefore, developing lightweight vision transformers has become a crucial area of research. This paper introduces CloFormer, a lightweight vision transformer that leverages context-aware local enhancement. CloFormer explores the relationship between globally shared weights often used in vanilla convolutional operators and token-specific context-aware weights appearing in attention, then proposes an effective and straightforward module to capture high-frequency local information. In CloFormer, we introduce AttnConv, a convolution operator in attention's style. The proposed AttnConv uses shared weights to aggregate local information and deploys carefully designed context-aware weights to enhance local features. The combination of the AttnConv and vanilla attention which uses pooling to reduce FLOPs in CloFormer enables the model to perceive high-frequency and low-frequency information. Extensive experiments were conducted in image classification, object detection, and semantic segmentation, demonstrating the superiority of CloFormer.	翻訳日:2023-04-04 11:37:14 公開日:2023-04-03
# 半弱教師付き物体運動予測 Semi-Weakly Supervised Object Kinematic Motion Prediction ( http://arxiv.org/abs/2303.17774v2 ) ライセンス: Link先を確認	Gengxin Liu, Qian Sun, Haibin Huang, Chongyang Ma, Yulan Guo, Li Yi, Hui Huang, Ruizhen Hu	(参考訳) 3Dオブジェクトが与えられた場合、運動予測は移動部と対応する運動パラメータを識別することを目的としている。 3Dオブジェクトのトポロジ的構造と幾何学的詳細の両方に大きなバリエーションがあるため、これは依然として困難な課題であり、大規模ラベル付きデータの欠如はディープラーニングに基づくアプローチの性能を制限している。本稿では,物体運動予測問題の課題を半弱教師付き方式で解決する。私たちの重要な観察は2つある。まず、完全に注釈付けされたモーションラベルを持つ3Dデータセットは限られているが、大規模にオブジェクト部分のセマンティックセマンティックセグメンテーションのためのデータセットやメソッドが存在する。第2に、セマンティクス部分のセグメンテーションと移動部分のセグメンテーションは必ずしも一貫してはいないが、基盤となる3d構造から移動部分を検出することが可能である。この目的に向けて,階層的部分レベルのセグメンテーションと移動部パラメータのマップを学習するグラフニューラルネットワークを提案する。このネットワークは、まず完全なラベル付きモビリティ情報を持つPartNet-Mobilityデータセットでトレーニングし、さらに粒度の細かい階層的な部分レベルのセグメンテーションでPartNetデータセットに適用することができる。ネットワーク予測は、擬似ラベル付き移動情報を持つ大規模な3次元オブジェクトを生成し、既存のセグメンテーションによる弱い教師付き学習にも利用できる。実験の結果, 従来の3次元部分走査における運動予測のための拡張データでは, 顕著な性能向上が見られた。 Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters. Due to the large variations in both topological structure and geometric details of 3D objects, this remains a challenging task and the lack of large scale labeled data also constrain the performance of deep learning based approaches. In this paper, we tackle the task of object kinematic motion prediction problem in a semi-weakly supervised manner. Our key observations are two-fold. First, although 3D dataset with fully annotated motion labels is limited, there are existing datasets and methods for object part semantic segmentation at large scale. Second, semantic part segmentation and mobile part segmentation is not always consistent but it is possible to detect the mobile parts from the underlying 3D structure. Towards this end, we propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile parts parameters, which are further refined based on geometric alignment. 
This network can be first trained on PartNet-Mobility dataset with fully labeled mobility information and then applied on PartNet dataset with fine-grained and hierarchical part-level segmentation. The network predictions yield a large scale of 3D objects with pseudo labeled mobility information and can further be used for weakly-supervised learning with pre-existing segmentation. Our experiments show there are significant performance boosts with the augmented data for previous method designed for kinematic motion prediction on 3D partial scans.	翻訳日:2023-04-04 11:36:55 公開日:2023-04-03
# 強化学習を用いた英語中規模GPTモデルをスペイン語の小さな閉領域にアライメントする Aligning a medium-size GPT model in English to a small closed domain in Spanish using reinforcement learning ( http://arxiv.org/abs/2303.17649v2 ) ライセンス: Link先を確認	Oscar R. Navarrete-Parra, Victor Uc-Cetina, Jorge Reyes-Magana	(参考訳) 本稿では,もともとオープンドメインのために英語で訓練された中規模gptモデルを,スペイン語の小さなクローズドドメインに整合させる手法を提案する。モデルを微調整したアプリケーションは、質問応答タスクである。これを実現するためには、別のニューラルネットワーク(報酬モデルと呼んでいます)をトレーニングし、実装する必要があります。このコンポーネントは、システムのデコードと応答の生成を改善するのに役立った。 BLEUやパープレキシティなどの数値指標をモデル評価に使用し、デコード手法と他の手法との比較にも人的判断を用いた。その結果,提案手法が好適であり,報奨モデルを用いて応答の生成を調整することが可能であることが判明した。 In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish. The application for which the model is finely tuned is the question answering task. To achieve this we also needed to train and implement another neural network (which we called the reward model) that could score and determine whether an answer is appropriate for a given question. This component served to improve the decoding and generation of the answers of the system. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was also used to compare the decoding technique with others. Finally, the results favored the proposed method, and it was determined that it is feasible to use a reward model to align the generation of responses.	翻訳日:2023-04-04 11:36:31 公開日:2023-04-03
# アダプティブリファインメントとカントロビッチ計量によるデータ駆動抽象化 [拡張版] Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version] ( http://arxiv.org/abs/2303.17618v2 ) ライセンス: Link先を確認	Adrien Banse, Licio Romao, Alessandro Abate, Raphaël M. Jungers	(参考訳) 本稿では,動的システムのスマートでスケーラブルな抽象化のための適応的改良手順を提案する。我々の手法は将来の出力の観測に依存する状態空間の分割に依存している。しかし、この知識は適応的で非対称な方法で動的に構築される。最適構造を学ぶために,マルコフ鎖間のカントロヴィチに触発された計量を定義し,損失関数として用いる。本手法はデータ駆動型フレームワークに適しているが、それに限定されない。また、上記のマルコフ連鎖間の計量の性質について研究し、より広い目的のために応用できると考えている。この計量を近似するアルゴリズムを提案し,古典的な線形計画法を用いる場合よりも計算量がはるかに優れていることを示す。 We introduce an adaptive refinement procedure for smart and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we use it as a loss function. Our technique lends itself to data-driven frameworks, but is not restricted to them. We also study properties of the above-mentioned metric between Markov chains, which we believe could be applied more widely. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than using classical linear programming techniques.	翻訳日:2023-04-04 11:36:18 公開日:2023-04-03

# 生理・医療系ニューラルネットワークにおけるモデル説明可能性

Model Explainability in Physiological and Healthcare-based Neural Networks ( http://arxiv.org/abs/2304.14495v1 )

ライセンス: Link先を確認

Rohit Sharma, Abhinav Gupta, Arnav Gupta, Bo Li

(参考訳) SpO2の推定とモニタリングは肺機能の評価と慢性肺疾患の治療に不可欠である。新型コロナウイルス(COVID-19)のパンデミックは、特に臨床的に悪化している無症状の患者において、SpO2の変化を早期に発見することの重要性を浮き彫りにした。しかし,従来のSpO2測定法は接触式センシングに頼っており,交差汚染や,四肢灌流障害のある患者における合併症のリスクがある。加えて、パルスオキシメータは、社会的に疎外された地域社会や発展途上国では利用できない場合がある。これらの制限に対処し、より快適で控えめなSpO2モニタリング方法を提供するため、最近の研究では、ビデオを用いたSpO2測定について研究されている。しかし,カメラ、特にスマートフォンのカメラを用いた非接触でのSpO2測定は,生理学的信号が弱いことと,スマートフォンカメラセンサの光学選択性が低いことにより困難である。システムには3つの主要なステップがある。 1) スマートフォンで撮影したビデオから手のひらと手の甲を含む関心領域(ROI)を抽出する。 2) ROIを空間平均化してR,G,Bの時系列を生成する。 3) 時系列を光生理学に着想を得たCNNに入力し, SpO2を推定する。提案手法は,消費者向けスマートフォンで撮影したビデオを用いて,より効率的かつ正確なSpO2モニタリングを行う方法であり,遠隔医療や健康診断に特に有用である。

The estimation and monitoring of SpO2 are crucial for assessing lung function and treating chronic pulmonary diseases. The COVID-19 pandemic has highlighted the importance of early detection of changes in SpO2, particularly in asymptomatic patients with clinical deterioration. However, conventional SpO2 measurement methods rely on contact-based sensing, presenting the risk of cross-contamination and complications in patients with impaired limb perfusion. Additionally, pulse oximeters may not be available in marginalized communities and undeveloped countries. To address these limitations and provide a more comfortable and unobtrusive way to monitor SpO2, recent studies have investigated SpO2 measurement using videos. However, measuring SpO2 using cameras in a contactless way, particularly from smartphones, is challenging due to weaker physiological signals and lower optical selectivity of smartphone camera sensors. The system includes three main steps: 1) extraction of the region of interest (ROI), which includes the palm and back of the hand, from the smartphone-captured videos; 2) spatial averaging of the ROI to produce R, G, and B time series; and 3) feeding the time series into an optophysiology-inspired CNN for SpO2 estimation. Our proposed method can provide a more efficient and accurate way to monitor SpO2 using videos captured from consumer-grade smartphones, which can be especially useful in telehealth and health screening settings.

翻訳日:2023-05-07 16:21:59 公開日:2023-04-03

# Transformer-based interpretable multi-modal data fusion による皮膚病変分類

Transformer-based interpretable multi-modal data fusion for skin lesion classification ( http://arxiv.org/abs/2304.14505v1 )

ライセンス: Link先を確認

Theodor Cheslerean-Boghiu, Melia-Evelina Fleischmann, Theresa Willem, Tobias Lasser

(参考訳) 近年、多くのディープラーニング(dl)研究が、他の要因に関わらず定量的指標の改善に重点を置いている。皮膚科における皮膚病変分類のようなヒト中心のアプリケーションでは、dl駆動の臨床決定支援システムは、意思決定プロセスの透明性が限られているため、まだ初期段階にある。さらに、訓練されたDLアルゴリズムの動作を説明するための手順の欠如は、臨床医の信頼をほとんど得られない。皮膚病変の診断には、皮膚科医は疾患の視覚的評価と患者の麻酔から収集されたデータの両方に依存している。マルチモーダルデータを扱うデータ駆動アルゴリズムは、畳み込みアーキテクチャに必要な特徴レベルと決定レベルの融合手順の分離によって制限される。この問題に対処するため,トランスフォーマーアーキテクチャのアテンション機構を介し,単一段階のマルチモーダルデータ融合を実現し,皮膚疾患の診断に役立てる。本手法は,画像リッチおよび患者データリッチ環境において,最先端のシングルモーダルかつマルチモーダルなDLアーキテクチャを上回る。さらに、アーキテクチャの選択により、イメージドメインとメタデータドメインの両方で、追加の修正を必要とせずに、分類タスクのネイティブ解釈サポートが可能になる。

A lot of deep learning (DL) research these days is mainly focused on improving quantitative metrics regardless of other factors. In human-centered applications, like skin lesion classification in dermatology, DL-driven clinical decision support systems are still in their infancy due to the limited transparency of their decision-making process. Moreover, the lack of procedures that can explain the behavior of trained DL algorithms leads to almost no trust from clinical physicians. To diagnose skin lesions, dermatologists rely on both visual assessment of the disease and the data gathered from the anamnesis of the patient. Data-driven algorithms dealing with multi-modal data are limited by the separation of feature-level and decision-level fusion procedures required by convolutional architectures. To address this issue, we enable single-stage multi-modal data fusion via the attention mechanism of transformer-based architectures to aid in the diagnosis of skin diseases. Our method beats other state-of-the-art single- and multi-modal DL architectures in both image-rich and patient-data-rich environments. Additionally, the choice of the architecture enables native interpretability support for the classification task in both the image and metadata domains with no additional modifications necessary.

翻訳日:2023-05-07 16:13:11 公開日:2023-04-03

# コンパクト支持型OEP型バランス型デュアルマルチフレームレットの構造評価

A structural characterization of Compactly Supported OEP-based balanced dual multiframelets ( http://arxiv.org/abs/2305.01641v1 )

ライセンス: Link先を確認

Ran Lu

(参考訳) スカラーフレームレットと比較して、ジェネレータに対する比較的小さなサポート、高消滅モーメントなど、マルチフレームレットには一定の利点がある。マルチフレームのバランス特性は非常に望ましいものであり、それに対応する離散的マルチフレーム変換の下でベクトル値データがどのように効率的に処理できるかを反映している。バランスの取れたマルチフレームを研究対象とする文献の多くは、関数設定の観点からいるが、マルチフレームフィルタバンクの観点からのアプローチはほとんどない。本稿では,斜め拡張原理(OEP)の観点から,バランスの取れたデュアル・マルチフレームの構造的特徴について考察する。 OEPはフレームレットとフィルタバンクを自然に接続するので、フレームレットの特性を分析するのに非常に便利なツールです。 OEPにより、我々は、コンパクトに支持されたバランスの取れたデュアル・マルチフレームを、バランスの取れたモーメント補正フィルタの概念によって特徴付ける。本稿は、バランスの取れたデュアルフレームレットが持つ重要な構造について、最も一般的な設定で示し、バランスのとれたマルチフレームレットとその基盤となる離散マルチフレーム変換を理解するための、より完全な図形を提供する。

Compared to scalar framelets, multiframelets have certain advantages, such as relatively smaller supports of the generators and high vanishing moments. The balancing property of multiframelets is highly desirable, as it reflects how efficiently vector-valued data can be processed under the corresponding discrete multiframelet transform. Most of the literature on balanced multiframelets takes the point of view of the function setting, and very few approaches work from the perspective of multiframelet filter banks. In this paper, we study structural characterizations of balanced dual multiframelets from the point of view of the Oblique Extension Principle (OEP). The OEP naturally connects framelets with filter banks, which makes it a very handy tool for analyzing the properties of framelets. With the OEP, we characterize compactly supported balanced dual multiframelets through the concept of balanced moment correction filters, which is the key notion introduced in our investigation. The results of this paper demonstrate what essential structures a balanced dual multiframelet has in the most general setting, and give a more complete picture of balanced multiframelets and their underlying discrete multiframelet transforms.

翻訳日:2023-05-07 15:53:52 公開日:2023-04-03

# RLサイバー操作エージェントのためのマルチエージェントサイバーバトルシム

A Multiagent CyberBattleSim for RL Cyber Operation Agents ( http://arxiv.org/abs/2304.11052v1 )

ライセンス: Link先を確認

Thomas Kunz, Christian Fisher, James La Novara-Gsell, Christopher Nguyen, Li Li

(参考訳) サイバー物理的資産の強化は重要かつ労働集約的である。近年、機械学習(ml)と強化学習(rl)は、重要な人間の洞察/知性を必要とするタスクを自動化できることを特に示しています。自律的なrlエージェントの開発には、さまざまな選択肢、特に攻撃者や防御者を陥れるようなトレーニングシナリオの配置方法を迅速に評価できる、適切なトレーニング環境が必要です。 CyberBattleSimは、レッドエージェント、すなわち攻撃者のトレーニングをサポートするトレーニング環境である。ブルーエージェント、すなわちディフェンダーを訓練する能力を追加しました。本論文は,ブルーエージェントを単独またはレッドエージェントと共同で訓練した際に得られた結果について,我々の変化と報告について述べる。その結果,ブルーエージェントの訓練は攻撃に対する防御力を高めることが判明した。特に、青色剤と赤色剤を併用する訓練は、洗練された赤色剤を阻害するブルー剤の能力を高める。

Hardening cyber physical assets is both crucial and labor-intensive. Recently, Machine Learning (ML) in general and Reinforcement Learning (RL) more specifically have shown great promise in automating tasks that would otherwise require significant human insight/intelligence. The development of autonomous RL agents requires a suitable training environment that allows us to quickly evaluate various alternatives, in particular how to arrange training scenarios that pit attackers and defenders against each other. CyberBattleSim is a training environment that supports the training of red agents, i.e., attackers. We added the capability to train blue agents, i.e., defenders. The paper describes our changes and reports on the results we obtained when training blue agents, either in isolation or jointly with red agents. Our results show that training a blue agent does lead to stronger defenses against attacks. In particular, training a blue agent jointly with a red agent increases the blue agent's capability to thwart sophisticated red agents.

翻訳日:2023-04-30 08:04:57 公開日:2023-04-03

# 制御行動模倣のための生成的adversarial neuroevolution

Generative Adversarial Neuroevolution for Control Behaviour Imitation ( http://arxiv.org/abs/2304.12432v1 )

ライセンス: Link先を確認

Maximilien Le Clei, Pierre Bellec

(参考訳) 最近の模倣学習への関心は高まり、複雑なタスクでエージェントを訓練するために、巨大な人間のビデオゲームとロボット操作データセットが使われている。近年、深層神経進化は様々な強化学習問題における勾配に基づく技術の性能と一致することが示されているが、深層神経進化技術の模倣学習への応用はいまだに未解明である。本研究では,一般的なシミュレーション環境における行動模倣に深部神経進化が有効かどうかを検討する。我々は,OpenAI Gymの8つの状態ベース制御タスク上で,最先端のエージェントを模倣するために,標準的な深部リカレントネットワークを進化させ,その能力を評価する。あらゆる課題において、訓練済みのエージェントが獲得したスコアよりも高いスコアを達成できる最後のエリートアクターが、スコアの軌跡に忠実に追従しているのが分かる。以上の結果から,神経進化は行動エージェントの正確なエミュレーションを実現するための深層学習技術に重要な付加物となる可能性が示唆された。私たちのアプローチの汎用性とシンプルさは、ますます複雑な設定で複雑な振る舞いを模倣する道を開くと信じています。我々はgithub.com/MaximilienLC/ganeでソースコードとモデルチェックポイントと結果を提供しています。

There is a recent surge in interest for imitation learning, with large human video-game and robotic manipulation datasets being used to train agents on very complex tasks. While deep neuroevolution has recently been shown to match the performance of gradient-based techniques on various reinforcement learning problems, the application of deep neuroevolution techniques to imitation learning remains relatively unexplored. In this work, we propose to explore whether deep neuroevolution can be used for behaviour imitation on popular simulation environments. We introduce a simple co-evolutionary adversarial generation framework, and evaluate its capabilities by evolving standard deep recurrent networks to imitate state-of-the-art pre-trained agents on 8 OpenAI Gym state-based control tasks. Across all tasks, we find the final elite actor agents capable of achieving scores as high as those obtained by the pre-trained agents, all the while closely following their score trajectories. Our results suggest that neuroevolution could be a valuable addition to deep learning techniques to produce accurate emulation of behavioural agents. We believe that the generality and simplicity of our approach opens avenues for imitating increasingly complex behaviours in increasingly complex settings, e.g. human behaviour in real-world settings. We provide our source code, model checkpoints and results at github.com/MaximilienLC/gane.

翻訳日:2023-04-30 07:29:03 公開日:2023-04-03

# 制御課題における繰り返しアーキテクチャの神経進化

Neuroevolution of Recurrent Architectures on Control Tasks ( http://arxiv.org/abs/2304.12431v1 )

ライセンス: Link先を確認

Maximilien Le Clei, Pierre Bellec

(参考訳) 現代の人工知能の研究は通常、勾配に基づく最適化技術を用いて固定サイズのディープニューラルネットワークのパラメータを訓練する。単純な進化アルゴリズムは、強化学習の設定など、勾配に基づく技術のパフォーマンスにマッチする時に、ディープニューラルネットワークパラメータを最適化する能力も示されている。ネットワークパラメータの最適化に加えて、多くの進化的計算技術もネットワークアーキテクチャを段階的に構築することができる。しかし、基本的な進化規則からネットワークアーキテクチャを構築することは、現代の強化学習ベンチマークにスケールすることがまだ示されていない。そこで本研究では, 再帰型ニューラルネットワークのアーキテクチャを, 少数の突然変異規則に従って動的に進化させる手法を提案する。我々は並列な進化的アルゴリズムを実装し、19のOpenAI Gym状態に基づく強化学習制御タスクで実験を行う。ほとんどの場合、動的エージェントは、パラメータの桁数を桁違いに減らしながら、勾配に基づくエージェントのパフォーマンスを一致または超過する。我々は、ネットワークのコンパクトさと自律設計が重要である実生活のアプリケーションへの道を開く努力を信じている。私たちはgithub.com/MaximilienLC/nraでソースコードと最終モデルチェックポイントと完全な結果を提供しています。

Modern artificial intelligence works typically train the parameters of fixed-sized deep neural networks using gradient-based optimization techniques. Simple evolutionary algorithms have recently been shown to also be capable of optimizing deep neural network parameters, at times matching the performance of gradient-based techniques, e.g. in reinforcement learning settings. In addition to optimizing network parameters, many evolutionary computation techniques are also capable of progressively constructing network architectures. However, constructing network architectures from elementary evolution rules has not yet been shown to scale to modern reinforcement learning benchmarks. In this paper we therefore propose a new approach in which the architectures of recurrent neural networks dynamically evolve according to a small set of mutation rules. We implement a massively parallel evolutionary algorithm and run experiments on all 19 OpenAI Gym state-based reinforcement learning control tasks. We find that in most cases, dynamic agents match or exceed the performance of gradient-based agents while utilizing orders of magnitude fewer parameters. We believe our work opens avenues for real-life applications where network compactness and autonomous design are of critical importance. We provide our source code, final model checkpoints and full results at github.com/MaximilienLC/nra.

翻訳日:2023-04-30 07:28:40 公開日:2023-04-03

# 忠実性ベンチマーク:視覚言語タスクにおける正確な自然言語説明に向けて

Benchmarking Faithfulness: Towards Accurate Natural Language Explanations in Vision-Language Tasks ( http://arxiv.org/abs/2304.08174v1 )

ライセンス: Link先を確認

Jakob Ambsdorf

(参考訳) ディープニューラルモデルが日々の生活に浸透するにつれ、彼らの意思決定について透明で理解可能な説明が必要になる。しかし,これまで開発されたほとんどの説明手法は,日常ユーザにとって直感的に理解できない。対照的に、自然言語の説明(NLE)は、モデルの意思決定を容易に理解可能な方法でコミュニケーション可能にすることを約束する。現在のモデルは説得力のある説明を生み出すことに成功したが、NLEが実際にモデルの推論過程(忠実性と呼ばれる性質)をいかにうまく表現しているかは、明らかな疑問である。忠実度を測定するためのメトリクスの開発は、より忠実なモデルを設計するために重要であるが、現在のメトリクスはNLEに適用できないか、複数のモダリティで異なるモデルアーキテクチャを比較するように設計されていない。忠実度尺度の先行研究と詳細な理論的根拠に基づいて、帰属相似性、NLE相似性、NLE-包括性という3つの忠実度指標を提案する。本手法の有効性は,評価された説明忠実度の変化を期待する実演e-UGモデルに体系的に修正を加えることで,視覚言語NLE生成のためのe-ViLベンチマークのVQA-Xおよびe-SNLI-VEデータセットを用いて評価する。 e-snli-veデータセットでは,e-ugの説明生成モジュールへの冗長入力の削除が,帰属相似性によって測定された言語的モダリティに対するモデルの忠実性を高めることを示した。さらに,NLE-Sufficiency と -Comprehensiveness は必ずしも属性-相似性と相関しないことを示した。

With deep neural models increasingly permeating our daily lives comes a need for transparent and comprehensible explanations of their decision-making. However, most explanation methods that have been developed so far are not intuitively understandable for lay users. In contrast, natural language explanations (NLEs) promise to enable the communication of a model's decision-making in an easily intelligible way. While current models successfully generate convincing explanations, it is an open question how well the NLEs actually represent the reasoning process of the models - a property called faithfulness. Although the development of metrics to measure faithfulness is crucial to designing more faithful models, current metrics are either not applicable to NLEs or are not designed to compare different model architectures across multiple modalities. Building on prior research on faithfulness measures and based on a detailed rationale, we address this issue by proposing three faithfulness metrics: Attribution-Similarity, NLE-Sufficiency, and NLE-Comprehensiveness. The efficacy of the metrics is evaluated on the VQA-X and e-SNLI-VE datasets of the e-ViL benchmark for vision-language NLE generation by systematically applying modifications to the performant e-UG model for which we expect changes in the measured explanation faithfulness. We show on the e-SNLI-VE dataset that the removal of redundant inputs to the explanation-generation module of e-UG successively increases the model's faithfulness on the linguistic modality as measured by Attribution-Similarity. Further, our analysis demonstrates that NLE-Sufficiency and -Comprehensiveness are not necessarily correlated to Attribution-Similarity, and we discuss how the two metrics can be utilized to gain further insights into the explanation generation process.

翻訳日:2023-04-23 04:25:22 公開日:2023-04-03

# OutCenTR:高次元データセットにおける脆弱性の悪用を予測するための新しい半教師付きフレームワーク

OutCenTR: A novel semi-supervised framework for predicting exploits of vulnerabilities in high-dimensional datasets ( http://arxiv.org/abs/2304.10511v1 )

ライセンス: Link先を確認

Hadi Eskandari, Michael Bewong, Sabih ur Rehman

(参考訳) 毎日、ますます増加する脆弱性が報告されている。しかし、これらの脆弱性はすべて同じではない。悪用される脆弱性の可能性を正しく見積もることは、システム管理者にとって重要なタスクです。これは、システム管理者が適切な脆弱性の優先順位付けとパッチを行うのに役立つ。我々の研究は、National Vulnerability Databaseのような高度に不均衡な高次元データセットで利用される可能性のある脆弱性を予測するために、外れ値検出技術を利用している。本稿では,ベースライン外乱検出モデルを強化する次元削減手法であるOutCenTRを提案する。さらに,4つのベンチマークと12の合成データセットを用いて,OutCenTRの有効性と効率を実証的に示す。実験の結果,PCA や GRP といった最先端の次元減少技術と比較して,F1 スコアの平均は5倍向上した。

An ever-growing number of vulnerabilities are reported every day. Yet these vulnerabilities are not all the same; some are more targeted than others. Correctly estimating the likelihood of a vulnerability being exploited is a critical task for system administrators. This aids the system administrators in prioritizing and patching the right vulnerabilities. Our work makes use of outlier detection techniques to predict vulnerabilities that are likely to be exploited in highly imbalanced and high-dimensional datasets such as the National Vulnerability Database. We propose a dimensionality reduction technique, OutCenTR, that enhances the baseline outlier detection models. We further demonstrate the effectiveness and efficiency of OutCenTR empirically with 4 benchmark and 12 synthetic datasets. The results of our experiments show on average a 5-fold improvement in F1 score in comparison with state-of-the-art dimensionality reduction techniques such as PCA and GRP.

翻訳日:2023-04-23 03:59:12 公開日:2023-04-03

# グローバル量子クロックに結合した非相互作用系の重力ポテンシャルと時間拡張の発生

Emergence of Gravitational Potential and Time Dilation from Non-interacting Systems Coupled to a Global Quantum Clock ( http://arxiv.org/abs/2304.01263v1 )

ライセンス: Link先を確認

Ashmeet Singh and Oliver Friedrich

(参考訳) 時間座標をグローバルな量子自由度としてモデル化した時間座標と、局所的な量子「時計」として機能する内部自由度によってモデル化された物理系の適切な時間という2つのバージョンを考慮し、量子力学のリレーショナル時間定式化における重力バックリアクションを研究する。我々は,地球規模のホイーラー・デウィット型制約における座標時間と質量エネルギーの相互作用が重力時間拡張につながることを示した。巨大な対象が存在する場合、これはシュワルツシルト計量における時間拡張に一致する。さらに、2つの粒子が時間座標に独立に結合すると、これらの粒子間のニュートンの重力相互作用が低エネルギー限界で現れることを示す。また、高エネルギーダイバージェンスの再正規化の特徴も観察する。

We study gravitational back-reaction within relational time formulations of quantum mechanics by considering two versions of time: a time coordinate, modelled as a global quantum degree of freedom, and the proper time of a given physical system, modelled via an internal degree of freedom serving as a local quantum "clock". We show that interactions between coordinate time and mass-energy in a global Wheeler-DeWitt-like constraint lead to gravitational time dilation. In the presence of a massive object this agrees with time dilation in a Schwarzschild metric at leading order in $G$. Furthermore, if two particles couple independently to the time coordinate we show that Newtonian gravitational interaction between those particles emerges in the low energy limit. We also observe features of renormalization of high energy divergences.
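For reference, the classical result that the abstract compares against can be written out. This is the standard general-relativistic expression for a clock at rest at radial coordinate $r$ outside a mass $M$, not a formula taken from the paper. The symbols $\tau$ (proper time), $t$ (coordinate time), $c$ (speed of light), and $\Phi$ (Newtonian potential) are introduced here for illustration.

```latex
\frac{d\tau}{dt}
  = \sqrt{1 - \frac{2GM}{r c^{2}}}
  = 1 - \frac{GM}{r c^{2}} + \mathcal{O}(G^{2})
  = 1 + \frac{\Phi(r)}{c^{2}} + \mathcal{O}(G^{2}),
\qquad
\Phi(r) = -\frac{GM}{r}.
```

The term linear in $G$ is what "leading order in $G$" refers to: the clock rate is reduced by the Newtonian potential divided by $c^{2}$.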

翻訳日:2023-04-16 22:41:17 公開日:2023-04-03

# 人間行動分析のための実産業作業と伝統工芸品のモーションキャプチャベンチマーク

Motion Capture Benchmark of Real Industrial Tasks and Traditional Crafts for Human Movement Analysis ( http://arxiv.org/abs/2304.03771v1 )

ライセンス: Link先を確認

Brenda Elizabeth Olivas-Padilla, Alina Glushkova and Sotiris Manitsaris

(参考訳) 人間の動き分析は、ロボット工学、バイオメカニクス、データサイエンスにおける重要な研究分野である。トラッキング、姿勢推定、運動合成などを含む。長年にわたって多くの方法論が発展してきたが、これらの手法の体系的かつ定量的な評価は、人間の3次元運動の検証可能な真理データを用いても必要である。本稿では,慣性に基づくモーションキャプチャを用いて記録した7つのデータセットについて述べる。データセットには、現場で実環境で行われる工業従事者や熟練職人によるプロフェッショナルなジェスチャーが含まれている。データセットは人間の動作モデリング、分析、生成の研究に使用されることを意図して作成された。データ収集のためのプロトコルを詳細に記述し、収集したデータの予備分析をベンチマークとして提供する。 Gesture Operational Modelは、運動記述子に基づくハイブリッド確率的バイオメカニカルアプローチであり、専門家の動きのダイナミクスをモデル化し、運動軌跡の数学的表現を作成して、身体のデキスタリティを分析し定量化する。このモデルでは、人間のプロのポーズを正確に生成することができ、作業のパフォーマンスを通じて、身体の関節がどのように協力し変化するかを直感的に記述することができた。

Human movement analysis is a key area of research in robotics, biomechanics, and data science. It encompasses tracking, posture estimation, and movement synthesis. While numerous methodologies have evolved over time, a systematic and quantitative evaluation of these approaches using verifiable ground truth data of three-dimensional human movement is still required to define the current state of the art. This paper presents seven datasets recorded using inertial-based motion capture. The datasets contain professional gestures carried out by industrial operators and skilled craftsmen in real conditions in situ. The datasets were created with the intention of being used for research in human motion modeling, analysis, and generation. The protocols for data collection are described in detail, and a preliminary analysis of the collected data is provided as a benchmark. The Gesture Operational Model, a hybrid stochastic-biomechanical approach based on kinematic descriptors, is utilized to model the dynamics of the experts' movements and create mathematical representations of their motion trajectories in order to analyze and quantify their body dexterity. The models allowed the accurate generation of human professional poses and an intuitive description of how body joints cooperate and change over time during the performance of the task.

翻訳日:2023-04-16 22:35:51 公開日:2023-04-03

# 自己回帰拡散モデルによる制御可能な運動合成と再構成

Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models ( http://arxiv.org/abs/2304.04681v1 )

ライセンス: Link先を確認

Wenjie Yin, Ruibo Tu, Hang Yin, Danica Kragic, Hedvig Kjellström, Mårten Björkman

(参考訳) データ駆動および制御可能な人間のモーション合成と予測は、インタラクティブメディアとソーシャルロボティクスにおける様々な応用を含む活発な研究分野である。これらの分野には、過去の観察や不完全なポーズを扱う様々な動きを生み出すための課題が残っている。本稿では、他のモードの制御コンテキストに条件付された動き列上の自己回帰的確率拡散モデルであるMoDiffを紹介する。本モデルでは、モーダルトランスフォーマーエンコーダとトランスフォーマーベースのデコーダを統合し、運動と制御の時間的相関を捉えるのに有効である。また,よりリッチなデータ表現とロバストな生成を実現するために,拡散転送プロセスに基づく新しいデータドロップアウト手法を導入する。記録データに近い高忠実度動きの頑健な合成と再構成のための拡散データドロップアウトの利点を示すため, 2つのベースラインに対する移動の制御可能な動作合成におけるMoDiffの優れた性能を示す。

Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in these fields for generating diverse motions given past observations and dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion with respect to two baselines and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.
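The abstract does not spell out how the data dropout is derived from the diffusion forward process, so the sketch below shows only the standard forward (noising) step used by denoising diffusion models, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The linear schedule, its default values, and all function names are assumptions chosen for illustration, not details from the paper.

```python
import math
import random
from typing import List, Sequence


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_1..beta_T on a linear ramp (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def forward_noise(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Sample x_t from q(x_t | x_0) for one pose vector x0 (t is 0-indexed)."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * value + noise * rng.gauss(0.0, 1.0) for value in x0]
```

As `t` grows, `alpha_bars[t]` shrinks, so the sample keeps less of the original pose and more Gaussian noise. A model trained on such corrupted inputs has to cope with imperfect poses, which is the kind of robustness the abstract mentions.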

翻訳日:2023-04-16 22:24:22 公開日:2023-04-03

# Astroformer: 分類に必要なのはデータだけではない

Astroformer: More Data Might Not be All You Need for Classification ( http://arxiv.org/abs/2304.05350v1 )

ライセンス: Link先を確認

Rishit Dagli

(参考訳) 自然言語処理やコンピュータビジョンなどの分野の最近の進歩は、膨大な量の未ラベルまたは部分的にラベル付けされたデータを用いて訓練された複雑で大規模なモデルに依存しており、これらの最先端の手法をリソース制約環境で訓練またはデプロイすることは困難である。銀河形態学は銀河の形成と進化の過程を理解するために重要である。銀河の形態を分類する効率的な方法は、現代の天文学調査から物理情報を抽出するために必要である。本稿では,少ない量のデータから学習する方法を提案する。我々はCoAtNetとMaxViTの成功から多くのインスピレーションを得たハイブリッドトランスフォーマー・畳み込みアーキテクチャを提案する。具体的には、トランスフォーマー-畳み込みハイブリッドを、ネットワークのための新しいスタック設計、相対的な自己アテンション層を作成する異なる方法、およびデータ拡張と正規化の慎重な選択と組み合わせる。我々のアプローチは、17736枚のラベル付き画像からなるGalaxy10 DECalsデータセットにおいて、画像から銀河の形態を予測するという科学的課題で新たな最先端を達成し、top-1精度は94.86%で、従来の最先端を4.62%上回る。さらに、このアプローチはCIFAR-100とTiny ImageNetの新たな最先端も設定する。また、大きなデータセットに使用するモデルやトレーニング手法は、低データ環境ではうまく動作しないことが多いことが分かった。私たちのコードとモデルは、カンファレンス前の後日に公開される予定である。

Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from smaller amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images, a science objective, on the Galaxy10 DECals dataset, which consists of 17736 labeled images, achieving $94.86\%$ top-$1$ accuracy and beating the current state-of-the-art for this task by $4.62\%$. Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets often do not work very well in the low-data regime. Our code and models will be released at a later date before the conference.

翻訳日:2023-04-16 22:16:06 公開日:2023-04-03

# ナレッジ蒸留グラフニューラルネットワークによるてんかん発作のパーソナライズ

Knowledge-Distilled Graph Neural Networks for Personalized Epileptic Seizure Detection ( http://arxiv.org/abs/2304.06038v1 )

ライセンス: Link先を確認

Qinyue Zheng, Arun Venkitaraman, Simona Petravic, and Pascal Frossard

(参考訳) 発作モニタリングのためのウェアラブルデバイスはてんかん患者の生活の質を著しく向上させる可能性がある。しかし、電脳波(EEG)の完全な電極セットに依存する既存のソリューションは、毎日の使用には不都合である可能性がある。そこで,本研究では,全電極から学習した高精細な検知器(教師と呼ぶ)から知識を伝達し,新しい検出器(学生と呼ぶ)を学習するための新しい知識蒸留手法を提案する。どちらも軽量な実装を提供し、脳波記録に必要な電極数を著しく削減している。本稿では,教師と生徒の発作検知器がグラフニューラルネットワーク(gnn)である場合について考察する。私たちは2つのケースを考えます (a)事前選択されたチャンネルを用いて全患者について一人の学生が学ぶ場合 b) Gumbelsoftmaxアプローチを用いて個別のチャンネル選択を行い,各患者に対して個別の学習を行う場合。テンプル大学病院脳波データコーパス(TUSZ)を用いた実験では,脳波の少ない患者において,知識蒸留とパーソナライゼーションの両方が発作検出の性能向上に重要な役割を果たすことが示された。我々は,2つのチャネルを数えることで,競争性のある発作検出性能が得られることを確認した。これは、記録が少なくても、発作を個別に監視するウェアラブルデバイスの、より現実的なシナリオにおける私たちのアプローチの可能性を示している。

Wearable devices for seizure monitoring and detection could significantly improve the quality of life of epileptic patients. However, existing solutions that mostly rely on a full electrode set of electroencephalogram (EEG) measurements could be inconvenient for everyday use. In this paper, we propose a novel knowledge distillation approach to transfer the knowledge from a sophisticated seizure detector (called the teacher) trained on data from the full set of electrodes to learn new detectors (called the student). These student detectors both provide lightweight implementations and significantly reduce the number of electrodes needed for recording the EEG. We consider the case where the teacher and the student seizure detectors are graph neural networks (GNN), since these architectures actively use the connectivity information. We consider two cases: (a) when a single student is learnt for all the patients using preselected channels; and (b) when personalized students are learnt for every individual patient, with personalized channel selection using a Gumbel-softmax approach. Our experiments on the publicly available Temple University Hospital EEG Seizure Data Corpus (TUSZ) show that both knowledge distillation and personalization play significant roles in improving the performance of seizure detection, particularly for patients with scarce EEG data. We observe that using as few as two channels, we are able to obtain competitive seizure detection performance. This, in turn, shows the potential of our approach in the more realistic scenario of wearable devices for personalized monitoring of seizures, even with few recordings.
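Two ingredients named in the abstract, Gumbel-softmax channel selection and teacher-to-student distillation, can be illustrated with a small standard-library sketch. Everything here is an assumption for illustration: the function names, the temperature values, and the use of a temperature-softened KL divergence as the distillation loss (a common choice; the abstract does not state which loss the authors use). The GNN detectors themselves are not modelled.

```python
import math
import random
from typing import List, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits: Sequence[float], temperature: float,
                   rng: random.Random) -> List[float]:
    """Relaxed one-hot sample over channels: softmax((logits + Gumbel noise) / T)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)                 # keep log(u) finite
        noisy.append(logit - math.log(-math.log(u)))  # add Gumbel(0, 1) noise
    return softmax(noisy, temperature)


def distillation_kl(teacher_logits: Sequence[float],
                    student_logits: Sequence[float],
                    temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened outputs; assumes logits of moderate size."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0.0)
```

Lower temperatures push the Gumbel-softmax weights toward a one-hot choice of channel. In a differentiable framework, the same relaxation would let per-patient channel logits be learned by gradient descent.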

翻訳日:2023-04-16 22:07:02 公開日:2023-04-03

# 深層q学習による量的取引

Quantitative Trading using Deep Q Learning ( http://arxiv.org/abs/2304.06037v1 )

ライセンス: Link先を確認

Soumyadip Sarkar

(参考訳) 強化学習(Reinforcement Learning、RL)は、ロボット工学、ゲームプレイ、自律システムなど、さまざまな用途で使用されている機械学習の分野である。近年、RLを量的トレーディングに適用することへの関心が高まっており、金融市場で利益のあるトレーディングを行うことが目標となっている。本稿では,量的取引におけるRLの利用について検討し,RLに基づく取引アルゴリズムのケーススタディを示す。その結果,RLは量的トレーディングの強力なツールであり,従来のトレーディングアルゴリズムより優れている可能性が示唆された。量的取引における強化学習の利用は、より高度で効果的な取引システムの開発につながる可能性がある研究の有望な領域である。将来の研究は、代替強化学習アルゴリズムの使用を探求し、追加のデータソースを取り入れ、異なるアセットクラスでシステムをテストする。本研究は, 定量的取引における強化学習の可能性を示し, この分野における継続的な研究・開発の重要性を強調した。より洗練された効果的な取引システムを開発することで、金融市場の効率を改善し、投資家のリターンを高めることができる。

Reinforcement learning (RL) is a branch of machine learning that has been used in a variety of applications such as robotics, game playing, and autonomous systems. In recent years, there has been growing interest in applying RL to quantitative trading, where the goal is to make profitable trades in financial markets. This paper explores the use of RL in quantitative trading and presents a case study of an RL-based trading algorithm. The results show that RL can be a powerful tool for quantitative trading, and that it has the potential to outperform traditional trading algorithms. The use of reinforcement learning in quantitative trading represents a promising area of research that can potentially lead to the development of more sophisticated and effective trading systems. Future work could explore the use of alternative reinforcement learning algorithms, incorporate additional data sources, and test the system on different asset classes. Overall, our research demonstrates the potential of using reinforcement learning in quantitative trading and highlights the importance of continued research and development in this area. By developing more sophisticated and effective trading systems, we can potentially improve the efficiency of financial markets and generate greater returns for investors.
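The abstract gives no algorithmic detail beyond the title's "Deep Q Learning", so the following is only a sketch of the temporal-difference update that Q-learning is built on, written in tabular form. A deep variant would replace the table with a neural network trained toward the same target. The action set, state labels, and hyperparameter defaults are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

ACTIONS = ("hold", "buy", "sell")  # hypothetical trading actions

QTable = DefaultDict[Tuple[Hashable, str], float]


def q_update(q: QTable, state: Hashable, action: str, reward: float,
             next_state: Hashable, alpha: float = 0.1, gamma: float = 0.99,
             done: bool = False) -> float:
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q: QTable, state: Hashable) -> str:
    """Pick the action with the highest current value estimate."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

Here `reward` would typically be a profit-and-loss signal for the step. How the paper defines states and rewards is not stated in the abstract.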

翻訳日:2023-04-16 22:06:35 公開日:2023-04-03

# 大規模画像テキスト(LIT)モデルを用いたCTマルチタスク学習

CT Multi-Task Learning with a Large Image-Text (LIT) Model ( http://arxiv.org/abs/2304.02649v1 )

ライセンス: Link先を確認

Chuang Niu and Ge Wang

(参考訳) 大規模言語モデル(LLM)は、複数の言語タスクをパワーアップするだけでなく、異なる空間にまたがる汎用インターフェースとしても機能する。これまでのところ、コンピュータビジョン分野におけるllmの成功を、高次元およびマルチモーダルな医療画像を含む医療画像分野に効果的に翻訳する方法は、まだ実証されていない。本稿では,LLMとLIMを併用した肺がん診断のためのマルチタスクCT大画像テキスト(LIT)モデルの構築の可能性について報告する。具体的には、LLMとLIMをエンコーダとして、マルチソース情報とタスク固有の患者固有の先行情報を相乗化して、最適な診断性能を実現するタスク固有のテキストプロンプトに基づいてマルチモーダル情報を知覚する。 LITモデルとそれに関連する技術の重要な要素を3次元肺CT解析に重点を置いて評価した。肺の分節, 肺結節の検出, 肺がんの分類など, LIT モデルが複数の医療業務をうまく遂行していることを示す。多様な応用における優れた医用画像と最適な患者結果のための大規模画像言語モデルの開発が進行中である。

Large language models (LLMs) not only empower multiple language tasks but also serve as a general interface across different spaces. It has not yet been demonstrated how to effectively translate the successes of LLMs in the computer vision field to the medical imaging field, which involves high-dimensional and multi-modal medical images. In this paper, we report a feasibility study of building a multi-task CT large image-text (LIT) model for lung cancer diagnosis by combining an LLM and a large image model (LIM). Specifically, the LLM and LIM are used as encoders to perceive multi-modal information under task-specific text prompts, which synergizes multi-source information and task-specific and patient-specific priors for optimized diagnostic performance. The key components of our LIT model and associated techniques are evaluated with an emphasis on 3D lung CT analysis. Our initial results show that the LIT model performs multiple medical tasks well, including lung segmentation, lung nodule detection, and lung cancer classification. Active efforts are in progress to develop large image-language models for superior medical imaging in diverse applications and optimal patient outcomes.

翻訳日:2023-04-07 16:40:16 公開日:2023-04-03

# 2017年から2023年までの大規模言語モデル研究の文献的レビュー

A Bibliometric Review of Large Language Models Research from 2017 to 2023 ( http://arxiv.org/abs/2304.02020v1 )

ライセンス: Link先を確認

Lizhou Fan, Lingyao Li, Zihui Ma, Sanggyu Lee, Huizi Yu, Libby Hemphill

(参考訳) LLM(Large Language Model)は、自然言語処理(NLP)タスクにまたがる卓越した性能を示す言語モデルの一種であり、人間に似た言語を生成する能力と、科学技術に革命をもたらす可能性から、非常に追求された研究領域となっている。本研究では,学術文献の書誌的・談話的分析を LLM 上で実施する。 5000以上の出版物を合成し、研究者、実践者、政策立案者が現在のLLM研究の展望をナビゲートするためのロードマップとして機能する。研究パラダイムとコラボレーションのパターンを特定し,2017年から2023年にかけての研究動向を示す。まず,LLM 研究の基本となるコアアルゴリズム開発と NLP タスクの解析から始める。次に,医学,工学,社会科学,人文科学などの分野におけるllmの応用について検討する。また,llms研究のダイナミックで高速な進化についても概説する。概して、本論文はllms研究とその応用の現状、影響、可能性について貴重な知見を提供する。

Large language models (LLMs) are a class of language models that have demonstrated outstanding performance across a range of natural language processing (NLP) tasks and have become a highly sought-after research area, because of their ability to generate human-like language and their potential to revolutionize science and technology. In this study, we conduct bibliometric and discourse analyses of scholarly literature on LLMs. Synthesizing over 5,000 publications, this paper serves as a roadmap for researchers, practitioners, and policymakers to navigate the current landscape of LLMs research. We present the research trends from 2017 to early 2023, identifying patterns in research paradigms and collaborations. We start with analyzing the core algorithm developments and NLP tasks that are fundamental in LLMs research. We then investigate the applications of LLMs in various fields and domains including medicine, engineering, social science, and humanities. Our review also reveals the dynamic, fast-paced evolution of LLMs research. Overall, this paper offers valuable insights into the current state, impact, and potential of LLMs research and its applications.

翻訳日:2023-04-06 14:33:21 公開日:2023-04-03

# 双方向LSTMによる偽職投稿の検出

Detecting Fake Job Postings Using Bidirectional LSTM ( http://arxiv.org/abs/2304.02019v1 )

ライセンス: Link先を確認

Aravind Sasidharan Pillai

(参考訳) 偽の求人広告がオンライン求人市場に広まり、求職者と雇用者にとって大きな課題となっている。この問題に対処する必要性が高まっているにもかかわらず、不正な求人広告の検出にディープラーニング技術を活用する研究は限られている。本研究では,双方向長短期記憶(Bidirectional Long Short-Term Memory, Bi-LSTM)モデルを用いて,偽の求人広告を識別することによってギャップを埋めることを目的とする。提案手法は数値的特徴とテキスト的特徴の両方を考慮し,データ内のパターンや関係を効果的に把握する。提案モデルは優れた性能を示し,0.91のROC AUCスコアと98.71%の精度を達成しており,オンライン求人市場における実用的応用の可能性を示している。この研究の成果は、偽の求人の拡散に対処し、ジョブ検索プロセスの全体的な完全性を改善する、堅牢で自動化されたツールの開発に寄与する。さらに,本手法に関する課題,今後の研究方向,倫理的考察について論じ,オンラインジョブ詐欺と戦うための実践的ソリューションのさらなる探求と開発をめざす。

Fake job postings have become prevalent in the online job market, posing significant challenges to job seekers and employers. Despite the growing need to address this problem, there is limited research that leverages deep learning techniques for the detection of fraudulent job advertisements. This study aims to fill the gap by employing a Bidirectional Long Short-Term Memory (Bi-LSTM) model to identify fake job advertisements. Our approach considers both numeric and text features, effectively capturing the underlying patterns and relationships within the data. The proposed model demonstrates a superior performance, achieving a 0.91 ROC AUC score and a 98.71% accuracy rate, indicating its potential for practical applications in the online job market. The findings of this research contribute to the development of robust, automated tools that can help combat the proliferation of fake job postings and improve the overall integrity of the job search process. Moreover, we discuss challenges, future research directions, and ethical considerations related to our approach, aiming to inspire further exploration and development of practical solutions to combat online job fraud.
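The abstract reports both an ROC AUC score and an accuracy rate. A small sketch can show what each number measures and why they can differ when one class is rare. The functions are generic metric definitions, not the authors' evaluation code, and the toy labels and scores in the usage note are made up for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)
```

For example, with nine genuine postings and one fake, a classifier that labels everything genuine reaches `accuracy([0] * 9 + [1], [0] * 10) == 0.9` while detecting no fraud at all. That is one reason a ranking metric such as ROC AUC is often reported alongside accuracy.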

翻訳日:2023-04-06 14:33:03 公開日:2023-04-03

# Beyond Fixed Grid: 変形可能なグリッドによる幾何学的画像表現の学習

Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid ( http://arxiv.org/abs/2008.09269v2 )

ライセンス: Link先を確認

Jun Gao, Zian Wang, Jinchen Xuan, Sanja Fidler

(参考訳) 現代のコンピュータビジョンでは、画像は通常、あるストライドを持つ固定の一様格子として表現され、深層畳み込みニューラルネットワークによって処理される。我々は、グリッドを変形して高周波の画像コンテンツとよりよく整合させる方が、より効果的な戦略であると主張する。学習可能なニューラルネットワークモジュールであるDeformable Grid(DefGrid)を導入する。これは2次元三角格子の頂点の位置オフセットを予測し、変形後の格子のエッジが画像境界と整合するようにする。DefGridをさまざまな処理レベルでモジュールとして挿入することで、多様なユースケースを示す。まず、DefGridをエンド・ツー・エンドで学習可能な幾何学的ダウンサンプリング層として利用し、画像を深層CNNに入力する際に特徴の解像度を下げる標準的なプーリング法を置き換える。意味セグメンテーションタスクにおいて,一様グリッド上でCNNを使用する場合と比較して,同じグリッド解像度で有意に改善された結果を示す。また,オブジェクトマスクアノテーションのタスクにおいてDefGridを出力層に利用し,予測した多角形格子上でオブジェクト境界を推論することで,既存のピクセル単位および曲線ベースのアプローチよりも正確な結果が得られることを示す。最後に,DefGridを教師なし画像分割のためのスタンドアロンモジュールとして示し,既存のアプローチよりも優れた性能を示す。プロジェクトウェブサイト: http://www.cs.toronto.edu/~jungao/def-grid

In modern computer vision, images are typically represented as a fixed uniform grid with some stride and processed via a deep convolutional neural network. We argue that deforming the grid to better align with the high-frequency image content is a more effective strategy. We introduce Deformable Grid (DefGrid), a learnable neural network module that predicts location offsets of vertices of a 2-dimensional triangular grid, such that the edges of the deformed grid align with image boundaries. We showcase DefGrid in a variety of use cases by inserting it as a module at various levels of processing. We utilize DefGrid as an end-to-end learnable geometric downsampling layer that replaces standard pooling methods for reducing feature resolution when feeding images into a deep CNN. We show significantly improved results at the same grid resolution compared to using CNNs on uniform grids for the task of semantic segmentation. We also utilize DefGrid at the output layers for the task of object mask annotation, and show that reasoning about object boundaries on our predicted polygonal grid leads to more accurate results over existing pixel-wise and curve-based approaches. We finally showcase DefGrid as a standalone module for unsupervised image partitioning, showing superior performance over existing approaches. Project website: http://www.cs.toronto.edu/~jungao/def-grid

翻訳日:2023-04-05 20:06:13 公開日:2023-04-03

# ハドロン衝突体における一次頂点再構成のための量子アニールを用いたトラッククラスタリング

Track clustering with a quantum annealer for primary vertex reconstruction at hadron colliders ( http://arxiv.org/abs/1903.08879v4 )

ライセンス: Link先を確認

Souvik Das, Andrew J. Wildridge, Andreas Jung

(参考訳) ビーム軸に沿った荷電粒子軌道のクラスタリングは、ハドロン衝突型加速器実験において、一次頂点とも呼ばれるハドロン相互作用の位置を再構成する最初のステップである。我々は2036物理量子ビットのD-Wave量子アニーラーを用いて、一次頂点と軌道の位置が大型ハドロン衝突型加速器(LHC)のCompact Muon Solenoid(CMS)実験で測定されるものに類似した人工事象に対し、限られた規模でトラッククラスタリングを行う。このアルゴリズムは古典量子ハイブリッドではなく、完全に量子アニーリングに依存しており、様々な事象トポロジーでテストされている。D-WaveのChimeraアーキテクチャへの問題の決定論的なグラフ埋め込み,論理量子ビット内の結合強度を最適化する方法,およびアニーリング時間を最適化する方法を示す。さらに,物理アニーラーと同じアニール当たりの処理時間に制約した商用CPU上のシミュレーテッドアニーリングとの比較を行った。平均665物理量子ビットを要する56論理量子ビットの問題まで、シミュレーテッドアニーリングに対する量子優位性が見られた。我々の埋め込みと最適化の手法、およびベンチマークの枠組みは、量子アニーラー上の他のクラスタリング問題にも一般に適用できる。このアルゴリズムは、LHCの一次頂点数に到達するためのより洗練されたアルゴリズムの構成要素として使える可能性がある。

Clustering of charged particle tracks along the beam axis is the first step in reconstructing the positions of hadronic interactions, also known as primary vertices, at hadron collider experiments. We use a 2036 physical qubit D-Wave quantum annealer to perform track clustering in a limited capacity on artificial events where the positions of primary vertices and tracks resemble those measured by the Compact Muon Solenoid experiment at the Large Hadron Collider. The algorithm, which is not a classical-quantum hybrid but relies entirely on quantum annealing, is tested on a variety of event topologies. We demonstrate a deterministic graph-embedding of the problem on the D-Wave Chimera architecture, a method for optimizing the coupling strengths within logical qubits, and a method for optimizing annealing time. Further, we benchmark it against simulated annealing on a commercial CPU constrained to the same processor time per anneal as the physical annealer. We note a quantum advantage against simulated annealing up to a 56 logical qubit problem that involves 665 physical qubits on average. Our embedding and optimization methods, and the benchmarking paradigm, can be applied generally to other clustering problems on quantum annealers. This algorithm may be used as a building-block for more sophisticated algorithms to reach the number of primary vertices at the LHC.

翻訳日:2023-04-05 20:05:35 公開日:2023-04-03

# 通信効率の良い連帯線形および深い一般化正準相関解析

Communication-Efficient Federated Linear and Deep Generalized Canonical Correlation Analysis ( http://arxiv.org/abs/2109.12400v2 )

ライセンス: Link先を確認

Sagar Shrestha and Xiao Fu

(参考訳) 古典的および深層の一般化正準相関解析(GCCA)アルゴリズムは、それぞれ線形変換とニューラルネットワークを用いて、複数の「ビュー」(例:音声と画像)からデータエンティティの低次元共通表現を求める。ビューが異なる計算エージェント(例えば組織やエッジデバイス)で取得・保存され、プライバシや通信コストの観点からデータ共有が望まれない場合、連合学習に基づくGCCAには十分な動機がある。連合学習では、ビューをエージェントにローカルに保持し、そこから導出された限られた情報のみを中央サーバと交換することが許される。しかし,既存のGCCAアルゴリズムをこのような連合学習環境に適用すると,通信オーバーヘッドが著しく大きくなる可能性がある。本研究は, 最大分散(MAX-VAR)定式化の下で, 線形および深層GCCAの双方に対する通信効率の良い連合学習フレームワークを提案する。オーバーヘッド問題は、計算エージェントと中央コントローラの間で交換される情報を(量子化によって)積極的に圧縮することで解決される。実験的検討により,量子化しないバージョンと比較して,提案手法は精度や収束速度をほとんど損なうことなく,通信オーバーヘッドを大幅に削減できることを示した。厳密な収束解析も提示するが、これは非自明な取り組みである。汎用的な連合最適化の結果はGCCAの特殊な問題構造をカバーしていない。本結果は,強い量子化や確率近似の下でも,線形および深層GCCAの提案アルゴリズムが劣線形速度で臨界点に収束することを示す。さらに、線形MAX-VARの場合、量子化アルゴリズムは、合理的な条件下で幾何速度で大域的最適解に近づく。提案手法の有効性を示すために合成データおよび実データでの実験を用いる。

Classic and deep generalized canonical correlation analysis (GCCA) algorithms seek low-dimensional common representations of data entities from multiple "views" (e.g., audio and image) using linear transformations and neural networks, respectively. When the views are acquired and stored at different computing agents (e.g., organizations and edge devices) and data sharing is undesired due to privacy or communication cost considerations, federated learning-based GCCA is well-motivated. In federated learning, the views are kept locally at the agents and only derived, limited information exchange with a central server is allowed. However, applying existing GCCA algorithms to such federated learning settings may incur prohibitively high communication overhead. This work puts forth a communication-efficient federated learning framework for both linear and deep GCCA under the maximum variance (MAX-VAR) formulation. The overhead issue is addressed by aggressively compressing (via quantization) the information exchanged between the computing agents and a central controller. Our empirical study shows that, compared to the unquantized version, the proposed algorithm enjoys a substantial reduction of communication overhead with virtually no loss in accuracy and convergence speed. Rigorous convergence analyses are also presented, which is a nontrivial effort. Generic federated optimization results do not cover the special problem structure of GCCA. Our result shows that the proposed algorithms for both linear and deep GCCA converge to critical points at a sublinear rate, even under heavy quantization and stochastic approximations. In addition, in the linear MAX-VAR case, the quantized algorithm approaches a global optimum at a geometric rate under reasonable conditions. Synthetic and real-data experiments are used to showcase the effectiveness of the proposed approach.
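
The abstract says the exchanged information is compressed by quantization but does not specify the quantizer. As a minimal sketch, the block below uses an unbiased stochastic uniform quantizer, a common choice in communication-efficient learning. The function name `stochastic_quantize` and its parameters are assumptions for illustration, not the paper's scheme.

```python
import math
import random


def stochastic_quantize(values, num_levels, rng):
    """Map each value onto one of `num_levels` evenly spaced points in
    [min(values), max(values)], rounding up or down at random so that the
    expected quantized value equals the original value (unbiased)."""
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant vector: nothing to quantize
    step = (hi - lo) / (num_levels - 1)
    quantized = []
    for v in values:
        position = (v - lo) / step           # position on the grid, in steps
        lower = math.floor(position)
        frac = position - lower              # distance to the lower grid point
        idx = lower + (1 if rng.random() < frac else 0)
        idx = min(idx, num_levels - 1)       # guard against float round-off
        quantized.append(lo + idx * step)
    return quantized


rng = random.Random(0)
message = [0.0, 0.3, 1.0]
# Each entry now needs only log2(num_levels) bits plus the shared (lo, hi) pair.
print(stochastic_quantize(message, num_levels=3, rng=rng))
```

Because the rounding is unbiased, averaging many quantized messages recovers the original values in expectation. This is the kind of property a convergence analysis under quantization typically relies on.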

翻訳日:2023-04-05 19:33:03 公開日:2023-04-03

# 少数ショットオープンセット認識のための再構築指導型メタラーニング

Reconstruction guided Meta-learning for Few Shot Open Set Recognition ( http://arxiv.org/abs/2108.00340v3 )

ライセンス: Link先を確認

Sayak Nag, Dripta S. Raychaudhuri, Sujoy Paul, Amit K. Roy-Chowdhury

(参考訳) 多くのアプリケーションでは、非常に限られたデータから分類器を学習せざるを得ない(少数ショット分類)。未知のカテゴリのサンプルも識別する必要がある場合(オープンセット分類)、タスクはさらに困難になる。少数のサンプルしかないクラスのよい抽象化を学ぶことは、特にオープンセットの設定では非常に難しい。結果として、少数ショット設定におけるオープンセット認識はほとんど注目されてこなかった。しかし、各クラスのラベル付きサンプル数が限られている環境モニタリングのような多くのアプリケーションでは、これは重要なタスクである。既存の少数ショットオープンセット認識(FSOSR)法はしきい値処理に依存しており、オープンクラスのサンプルに一様な確率を仮定するものもある。しかし、このアプローチは特に細粒度の分類ではしばしば不正確であり、しきい値の選択に非常に敏感になる。これらの問題に対処するため、我々はReconstructing Exemplar-based Few-shot Open-set ClaSsifier (ReFOCS)を提案する。ReFOCSは、新しいexemplar再構成に基づくメタ学習戦略を用いてサンプルの開放性を自己認識するよう学習することで、注意深く調整されたしきい値を不要にし、FSOSRを合理化する。exemplarはクラスの代表として機能し、トレーニングデータセットで提供されるか、特徴領域で推定することができる。さまざまなデータセットでのテストにより、ReFOCSが複数の最先端手法より優れていることを示す。

In many applications, we are constrained to learn classifiers from very limited data (few-shot classification). The task becomes even more challenging if it is also required to identify samples from unknown categories (open-set classification). Learning a good abstraction for a class with very few samples is extremely difficult, especially under open-set settings. As a result, open-set recognition has received minimal attention in the few-shot setting. However, it is a critical task in many applications like environmental monitoring, where the number of labeled examples for each class is limited. Existing few-shot open-set recognition (FSOSR) methods rely on thresholding schemes, with some considering uniform probability for open-class samples. However, this approach is often inaccurate, especially for fine-grained categorization, and makes them highly sensitive to the choice of a threshold. To address these concerns, we propose Reconstructing Exemplar-based Few-shot Open-set ClaSsifier (ReFOCS). By using a novel exemplar reconstruction-based meta-learning strategy, ReFOCS streamlines FSOSR, eliminating the need for a carefully tuned threshold by learning to be self-aware of the openness of a sample. The exemplars act as class representatives and can be either provided in the training dataset or estimated in the feature domain. By testing on a wide variety of datasets, we show ReFOCS to outperform multiple state-of-the-art methods.

翻訳日:2023-04-05 19:32:36 公開日:2023-04-03

# Bayesian Controller Fusion:ロボットの深部強化学習における制御の活用

Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics ( http://arxiv.org/abs/2107.09822v3 )

ライセンス: Link先を確認

Krishan Rana, Vibhavari Dasagi, Jesse Haviland, Ben Talbot, Michael Milford and Niko Sünderhauf

(参考訳) 本稿では,従来の手作りコントローラの強みとモデルフリー深層強化学習(RL)を組み合わせたハイブリッド制御戦略であるBayesian Controller Fusion(BCF)を紹介する。BCFはロボティクス領域で力を発揮する。この領域では、多くのタスクに対して信頼性はあるが最適ではない制御事前知識(control prior)が存在する一方、スクラッチからのRLは依然として安全でなくデータ効率も低い。各システムからの不確実性を考慮した分布出力を融合することにより、BCFは両者の間で制御を調停し、それぞれの強みを利用する。我々は,広大で長期的な環境でのナビゲーションと,可操作性の最大化を伴う複雑なリーチングタスクという2つの実世界のロボティクスタスクでBCFを検証する。これら2つの領域には、リスク回避的にタスクを解決できる単純な手作りコントローラが存在するが、解析的モデリングの限界、コントローラの誤校正、タスクの変動のために、必ずしも最適解を示すわけではない。訓練の初期段階では探索が事前知識によって自然に導かれるため、BCFは学習を加速し、方策がより多くの経験を積むにつれて制御事前知識の性能を大幅に上回る。さらに重要なことに、制御事前知識がリスク回避的であるため、BCFは安全な探索と展開を保証し、方策にとって未知の状態では制御事前知識が行動分布を自然に支配する。加えて、ゼロショットのsim-to-real設定へのBCFの適用可能性と、実世界の分布外状態を扱う能力を示す。BCFは、深層RLと従来のロボット制御の相補的な強みを組み合わせ、どちらか単独で達成できる以上の成果を得るための有望なアプローチである。コードと補足ビデオはhttps://krishanrana.github.io/bcfで公開されている。

We present Bayesian Controller Fusion (BCF): a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks involving navigation in a vast and long-horizon environment, and a complex reaching task that involves manipulability maximisation. For both these domains, simple handcrafted controllers exist that can solve the task at hand in a risk-averse manner but do not necessarily exhibit the optimal solution given limitations in analytical modelling, controller miscalibration and task variation. As exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, while substantially improving beyond the performance of the control prior, as the policy gains more experience. More importantly, given the risk-aversity of the control prior, BCF ensures safe exploration and deployment, where the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach towards combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are made publicly available at https://krishanrana.github.io/bcf.
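
The abstract describes fusing uncertainty-aware distributional outputs from the two controllers. One simple way to picture this, assuming each controller outputs a one-dimensional Gaussian over actions and that fusion is a normalized product of the two densities, is the precision-weighted combination below. Both assumptions, and the function name `fuse_gaussians`, are illustrative and not taken from the abstract.

```python
def fuse_gaussians(mu_prior, var_prior, mu_policy, var_policy):
    """Normalized product of two 1-D Gaussians.

    The fused precision is the sum of the two precisions, and the fused mean
    is the precision-weighted average of the two means, so the more certain
    (lower-variance) controller dominates the fused action distribution."""
    if var_prior <= 0 or var_policy <= 0:
        raise ValueError("variances must be positive")
    precision = 1.0 / var_prior + 1.0 / var_policy
    var_fused = 1.0 / precision
    mu_fused = var_fused * (mu_prior / var_prior + mu_policy / var_policy)
    return mu_fused, var_fused


# Hypothetical numbers: in a state the policy has not seen, its output is
# highly uncertain, so the fused action stays close to the control prior.
mu, var = fuse_gaussians(mu_prior=0.0, var_prior=0.25, mu_policy=1.0, var_policy=4.0)
print(mu, var)
```

Under these assumptions, the behavior described in the abstract falls out of the arithmetic: wherever the policy's variance is large, the control prior dominates the action distribution.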

翻訳日:2023-04-05 19:32:13 公開日:2023-04-03

# 因果埋め込みによる物理系予測のための観測可能性の普遍集合

Universal set of Observables for Forecasting Physical Systems through Causal Embedding ( http://arxiv.org/abs/2105.10759v3 )

ライセンス: Link先を確認

G Manjunath, A de Clercq and MJ Steynberg

(参考訳) 我々は、基礎となる力学系の左無限軌道全体、あるいはそのような左無限軌道からの観測が、いつ、どのようにして異なる空間内の一対の要素によって一意に表現できるかを示し、この現象を因果埋め込み(causal embedding)と呼ぶ。そのようなペアの集合は駆動力学系から導かれ、ある関数の学習に用いられる。この関数は駆動系と合わせて、(i) 基礎となるシステムに位相共役なシステムを決定し、(ii) 共役が計算可能かつ普遍的、すなわち基礎となるシステムに依存しないため、基礎となるシステムのダイナミクスの予測を可能にし、(iii) 関数の学習に誤差があったとしても、因果的に埋め込まれた対象の像を含むアトラクタを保証する。これらを達成することで、学習可能な関数の存在が保証されないために長期的一貫性が低くなりがちな既存のリザバーコンピューティング方式を上回り、Takensの遅延埋め込みにおける安定性の課題を克服する新たな予測方式を提示する。既知の手法が失敗していた基礎システムの正確なモデリングを例示する。

We demonstrate when and how an entire left-infinite orbit of an underlying dynamical system, or observations from such left-infinite orbits, can be uniquely represented by a pair of elements in a different space, a phenomenon which we call causal embedding. The collection of such pairs is derived from a driven dynamical system and is used to learn a function which, together with the driven system, would: (i) determine a system that is topologically conjugate to the underlying system; (ii) enable forecasting the underlying system's dynamics, since the conjugacy is computable and universal, i.e., it does not depend on the underlying system; (iii) guarantee an attractor containing the image of the causally embedded object even if there is an error made in learning the function. By accomplishing these, we herald a new forecasting scheme that beats the existing reservoir computing schemes, which often lead to poor long-term consistency as there is no guarantee of the existence of a learnable function, and overcomes the challenges of stability in Takens delay embedding. We illustrate accurate modeling of underlying systems where previously known techniques have failed.

翻訳日:2023-04-05 19:31:08 公開日:2023-04-03

# ネガティビティはより早く広まる - 政治的コミュニケーションにおける感情の役割に関する大規模多言語twitter分析

Negativity Spreads Faster: A Large-Scale Multilingual Twitter Analysis on the Role of Sentiment in Political Communication ( http://arxiv.org/abs/2202.00396v3 )

ライセンス: Link先を確認

Dimosthenis Antypas, Alun Preece, Jose Camacho-Collados

(参考訳) ソーシャルメディアは、現代社会、特に西洋社会では、Twitterのようなプラットフォームが政治家をフォローできるため、市民が政治的議論により関与するようになると、非常に影響力を増している。同様に、政治家はTwitterを使って意見を表明し、現在の話題について議論し、有権者の行動に影響を与えるための政治議題を推進している。本稿では、欧州3カ国の政治家のツイートを分析し、そのツイートのバイラル性について検討する。これまでの研究では、ネガティブな感情を伝えるツイートがより頻繁にリツイートされることが示されている。最先端の事前学習された言語モデルを利用することで、ギリシャ、スペイン、イギリスの国会議員が収集した数十万のツイートについて感情分析を行った。私たちは、影響力のあるツイートとあまり人気のないツイートの違いを体系的に探索し分析することでこれを達成しました。我々の分析は、特に近年において、政治家の否定的なツイートが広く広まり、政党と政治家と一般大衆の間で興味深い違いが浮かび上がっていることを示している。

Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world, where platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion. In the same vein, politicians use Twitter to express their opinions, debate among others on current topics and promote their political agendas aiming to influence voter behaviour. In this paper, we attempt to analyse tweets of politicians from three European countries and explore the virality of their tweets. Previous studies have shown that tweets conveying negative sentiment are likely to be retweeted more frequently. By utilising state-of-the-art pre-trained language models, we performed sentiment analysis on hundreds of thousands of tweets collected from members of parliament in Greece, Spain and the United Kingdom, including devolved administrations. We achieved this by systematically exploring and analysing the differences between influential and less popular tweets. Our analysis indicates that politicians' negatively charged tweets spread more widely, especially in more recent times, and highlights interesting differences between political parties as well as between politicians and the general population.

翻訳日:2023-04-05 19:23:46 公開日:2023-04-03

# 3量子ビットを用いた量子フーリエ変換の実装

Implementing quantum Fourier transform using three qubits ( http://arxiv.org/abs/2110.15067v2 )

ライセンス: Link先を確認

Mouhcine Yachi, Radouan Hab-arrih, Ahmed Jellal

(参考訳) 3つの量子ビットを記述するハミルトニアンの巡回対称性を用いて、量子フーリエ変換を実現する。この対称性により、ハミルトニアンに含まれる物理パラメータの大きさに依存せずに固有ベクトルの集合を構成することができ、その結果、エンタングルメントは維持される。実現にはトラップされたイオンを用い、ゲートの実装は各スピン積状態からフーリエモードへの断熱遷移を必要とする。忠実度を数値計算し,その結果は重要な値を示した。最後に、カウンター駆動場を用いたゲートの高速化について議論する。

Using the circulant symmetry of a Hamiltonian describing three qubits, we realize the quantum Fourier transform. This symmetry allows us to construct a set of eigenvectors independently of the magnitude of the physical parameters involved in the Hamiltonian and, as a result, the entanglement is maintained. The realization relies on trapped ions, and the gate implementation requires an adiabatic transition from each spin product state to Fourier modes. The fidelity was numerically calculated and the results show important values. Finally, we discuss the acceleration of the gate by using a counter-driving field.
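
The claim that the eigenvectors do not depend on the magnitude of the physical parameters follows from a general property of circulant matrices: their eigenvectors are always the discrete Fourier modes, whatever the entries are. The sketch below checks this numerically. A generic 3×3 circulant matrix stands in for the Hamiltonian, and the parameter values are arbitrary; the paper's actual Hamiltonian is not reproduced here.

```python
import cmath


def circulant(first_row):
    """Circulant matrix whose row i is `first_row` cyclically shifted right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def fourier_mode(n, k):
    """k-th normalized discrete Fourier mode of length n."""
    return [cmath.exp(2j * cmath.pi * j * k / n) / n ** 0.5 for j in range(n)]


def matvec(matrix, vector):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def is_eigenvector(matrix, vector, tol=1e-9):
    """True if matrix @ vector is a scalar multiple of vector."""
    image = matvec(matrix, vector)
    eigenvalue = image[0] / vector[0]  # Fourier modes have a nonzero first entry
    return all(abs(w - eigenvalue * v) < tol for w, v in zip(image, vector))


# Two arbitrary parameter sets: the Fourier modes are eigenvectors of both.
for params in ([1.0, 0.3, 0.3], [5.0, -2.0, 0.7]):
    h = circulant(params)
    print([is_eigenvector(h, fourier_mode(3, k)) for k in range(3)])
```

Only the eigenvalues change with the parameters; the eigenvectors stay fixed.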

翻訳日:2023-04-05 19:21:24 公開日:2023-04-03

# リハーサルフリー連続学習について

A Closer Look at Rehearsal-Free Continual Learning ( http://arxiv.org/abs/2203.17269v2 )

ライセンス: Link先を確認

James Seale Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira

(参考訳) 連続学習(continual learning)とは、機械学習モデルが連続的に変化するトレーニングデータから新たな概念を学習すると同時に、トレーニングデータから長期間消える可能性のある既見クラスに関する知識の劣化(破滅的忘却問題として知られる現象)を回避する設定である。拡張し続ける単一タスクの連続学習(いわゆるクラス増分連続学習)に対する現在のアプローチは、この知識の劣化を避けるために、過去に見たデータの広範なリハーサルを必要とする。残念ながら、リハーサルはメモリコストを伴い、データプライバシーにも違反する可能性がある。代わりに,知識蒸留とパラメータ正則化を新しい方法で組み合わせ,リハーサルなしで高い連続学習性能を達成することを検討する。具体的には、予測蒸留、特徴蒸留、L2パラメータ正則化、EWCパラメータ正則化といった一般的な連続学習手法を深く掘り下げる。まず、パラメータ正則化手法は拡張し続ける単一タスクのリハーサルなし連続学習に失敗するという一般的な仮定を反証する。次に、リハーサルなし連続学習において事前学習モデルの知識を活用する方法を検討し、単純なL2パラメータ正則化がEWCパラメータ正則化および特徴蒸留より優れていることを見出す。最後に、最近普及しているImageNet-Rベンチマークを調べ、ViTトランスフォーマのセルフアテンションブロックに実装したL2パラメータ正則化が、最近普及している連続学習向けプロンプト手法よりも優れていることを示す。

Continual learning is a setting where machine learning models learn novel concepts from continuously shifting training data, while simultaneously avoiding degradation of knowledge on previously seen classes which may disappear from the training data for extended periods of time (a phenomenon known as the catastrophic forgetting problem). Current approaches for continual learning of a single expanding task (aka class-incremental continual learning) require extensive rehearsal of previously seen data to avoid this degradation of knowledge. Unfortunately, rehearsal comes at a cost to memory, and it may also violate data-privacy. Instead, we explore combining knowledge distillation and parameter regularization in new ways to achieve strong continual learning performance without rehearsal. Specifically, we take a deep dive into common continual learning techniques: prediction distillation, feature distillation, L2 parameter regularization, and EWC parameter regularization. We first disprove the common assumption that parameter regularization techniques fail for rehearsal-free continual learning of a single, expanding task. Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation. Finally, we explore the recently popular ImageNet-R benchmark, and show that L2 parameter regularization implemented in self-attention blocks of a ViT transformer outperforms recent popular prompting for continual learning methods.
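
L2 parameter regularization, as commonly formulated for continual learning, adds a quadratic penalty that pulls the current parameters toward the values they had after the previous task. The sketch below shows that loss and its gradient on plain Python lists. The function names, the `strength` parameter, and the 1/2 scaling are assumptions for illustration, not the paper's implementation.

```python
def l2_regularized_loss(task_loss, params, old_params, strength):
    """Task loss plus (strength / 2) * ||params - old_params||^2."""
    penalty = sum((p - q) ** 2 for p, q in zip(params, old_params))
    return task_loss + 0.5 * strength * penalty


def l2_regularized_grad(task_grad, params, old_params, strength):
    """Gradient of the regularized loss: task gradient plus a pull toward old_params."""
    return [g + strength * (p - q) for g, p, q in zip(task_grad, params, old_params)]


# Hypothetical values: only the first parameter has drifted from its old value,
# so only its gradient receives the extra pull back.
print(l2_regularized_loss(1.0, [1.0, 2.0], [0.0, 2.0], strength=2.0))
print(l2_regularized_grad([0.1, 0.2], [1.0, 2.0], [0.0, 2.0], strength=2.0))
```

EWC has the same form but weights each squared difference by an estimate of that parameter's Fisher information, whereas plain L2 treats all parameters equally.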

翻訳日:2023-04-05 19:12:32 公開日:2023-04-03

# ネットワーク分極, フィルターバブル, エコーチェンバー : 指標と低減方法についての注釈付きレビュー

Network polarization, filter bubbles, and echo chambers: An annotated review of measures and reduction methods ( http://arxiv.org/abs/2207.13799v4 )

ライセンス: Link先を確認

Ruben Interian, Ruslan G. Marzo, Isela Mendoza, Celso C. Ribeiro

(参考訳) 分極は、コミュニティや社会のメンバーをつなぐ基盤となるネットワークが、グループ間の接続が弱く内部で高度に連結したグループによって特徴づけられるようになるときに生じる。分極の増大、エコーチェンバーの強化、ソーシャルネットワークにおける情報フィルタによる孤立化は、コンピュータ科学、経済学、社会科学、政治科学など様々な分野の研究者の注目をますます集めている。本稿では,ネットワーク分極の指標と分極を扱うためのモデルについて注釈付きレビューを行う。グラフやネットワークにおける分極を測定するアプローチとして、ホモフィリー、モジュラリティ、ランダムウォーク、バランス理論に基づくものなどが特定された。分極を減らすための戦略には、エッジやノードの編集(挿入や削除、エッジ重みの変更を含む)を提案する方法、ソーシャルネットワーク設計の変更、あるいはこれらのネットワークに組み込まれた推薦システムの変更が含まれる。

Polarization arises when the underlying network connecting the members of a community or society becomes characterized by highly connected groups with weak inter-group connectivity. The increasing polarization, the strengthening of echo chambers, and the isolation caused by information filters in social networks are increasingly attracting the attention of researchers from different areas of knowledge such as computer science, economics, and the social and political sciences. This work presents an annotated review of network polarization measures and of models used to handle polarization. Several approaches for measuring polarization in graphs and networks were identified, including those based on homophily, modularity, random walks, and balance theory. The strategies used for reducing polarization include methods that propose edge or node edits (including insertions or deletions, as well as edge weight modifications), changes in social network design, or changes in the recommendation systems embedded in these networks.
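
Of the measurement families listed above, modularity is perhaps the easiest to illustrate. The sketch below computes the standard Newman modularity of a given partition of an undirected, unweighted graph. The example graph and partition are made up, and the review may discuss variants that differ from this basic form.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of (fraction of edges inside the community)
        minus (fraction of edge endpoints in the community) squared."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    degree_sum = {}
    for node, d in degree.items():
        c = community[node]
        degree_sum[c] = degree_sum.get(c, 0) + d
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return inside - expected


# Two triangles joined by a single bridge edge: a strongly "polarized" toy graph.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, groups))  # 6/7 - 1/2, roughly 0.357
```

Higher values mean edges are concentrated inside the groups more than chance would predict. Putting every node in one group gives a modularity of zero.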

翻訳日:2023-04-05 19:05:04 公開日:2023-04-03

# Yankee Swap: マトロイドランク評価のための高速で簡単な公平割り当てメカニズム

Yankee Swap: a Fast and Simple Fair Allocation Mechanism for Matroid Rank Valuations ( http://arxiv.org/abs/2206.08495v5 )

ライセンス: Link先を確認

Vignesh Viswanathan and Yair Zick

(参考訳) エージェントがマトロイドランク評価を持つ場合の不可分財の公平な割り当てについて検討する。我々の主な貢献は、俗にヤンキースワップと呼ばれる手順に基づく単純なアルゴリズムであり、証明可能に公平かつ効率的なロレンツ支配割り当てを計算する。このような割り当てを計算する多項式時間アルゴリズムは存在するが、提案手法は2つの点でそれらを改善する。 (a)我々のアプローチは容易に理解でき、複雑なマトロイド最適化アルゴリズムをサブルーチンとして使用しない。 (b)我々のアプローチはスケーラブルであり、ロレンツ支配割り当てを計算する既知のすべてのアルゴリズムよりも証明可能に高速である。これら2つの特性は、実際の公平な割り当ての場面でアルゴリズムが採用されるための鍵であり、我々の貢献はこの目標に一歩近づくものである。

We study fair allocation of indivisible goods when agents have matroid rank valuations. Our main contribution is a simple algorithm based on the colloquial Yankee Swap procedure that computes provably fair and efficient Lorenz dominating allocations. While there exist polynomial time algorithms to compute such allocations, our proposed method improves on them in two ways. (a) Our approach is easy to understand and does not use complex matroid optimization algorithms as subroutines. (b) Our approach is scalable; it is provably faster than all known algorithms to compute Lorenz dominating allocations. These two properties are key to the adoption of algorithms in any real fair allocation setting; our contribution brings us one step closer to this goal.
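
The abstract's fairness target is a Lorenz dominating allocation. As a minimal sketch of that notion, and not of the Yankee Swap algorithm itself, the check below compares two utility vectors. One vector weakly Lorenz-dominates another when, after sorting both in ascending order, each of its prefix sums is at least as large. The function name and the example vectors are illustrative.

```python
from itertools import accumulate


def lorenz_dominates(u, v):
    """True if utility vector `u` weakly Lorenz-dominates `v`.

    Both vectors are sorted ascending; `u` dominates when the total utility of
    its k worst-off agents is at least that of `v` for every k."""
    if len(u) != len(v):
        raise ValueError("utility vectors must have the same length")
    prefix_u = accumulate(sorted(u))
    prefix_v = accumulate(sorted(v))
    return all(a >= b for a, b in zip(prefix_u, prefix_v))


# Same total utility, but the first allocation treats the worst-off agent better.
print(lorenz_dominates([2, 2, 2], [1, 2, 3]))  # True
print(lorenz_dominates([1, 2, 3], [2, 2, 2]))  # False
```

A Lorenz dominating allocation is one whose utility vector dominates that of every other allocation in this sense.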

翻訳日:2023-04-05 19:02:45 公開日:2023-04-03

# 多言語ファインチューニングとバックトランスレーションによる多言語双方向教師なし翻訳

Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation ( http://arxiv.org/abs/2209.02821v4 )

ライセンス: Link先を確認

Bryan Li, Mohammad Sadegh Rasooli, Ajay Patel, Chris Callison-Burch

(参考訳) 本研究では,未知の言語を英語へ,また英語から翻訳する単一のNMTモデルを訓練するための2段階のアプローチを提案する。第1段階では、エンコーダ・デコーダモデルを事前訓練済みのXLM-RおよびRoBERTaの重みで初期化し、40言語から英語への並列データで多言語微調整を行う。このモデルは、未知の言語のゼロショット翻訳に一般化できることがわかった。第2段階では、この一般化能力を利用して単言語データセットから合成並列データを生成し、その後、バックトランスレーションのラウンドを重ねて双方向に訓練する。EcXTra(English-centric Crosslingual (X) Transfer)と呼ぶ我々のアプローチは概念的に単純であり、全体を通して標準のクロスエントロピー目的関数のみを使用する。またデータ駆動型であり、補助並列データと単言語データを順に活用する。7つの低リソース言語について教師なしNMTの結果を評価し,バックトランスレーション訓練の各ラウンドが双方向の性能をさらに改善することを確認した。最終的な単一のEcXTra訓練モデルは、すべての翻訳方向で競争力のある翻訳性能を達成し、特に英語からカザフ語への翻訳で新たな最先端(22.9 > 10.4 BLEU)を確立した。私たちのコードはhttps://github.com/manestay/EcXTraで利用可能です。

We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English. For the first stage, we initialize an encoder-decoder model to pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 40 languages to English. We find this model can generalize to zero-shot translations on unseen languages. For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then bidirectionally train with successive rounds of back-translation. Our approach, which we call EcXTra (English-centric Crosslingual (X) Transfer), is conceptually simple, only using a standard cross-entropy objective throughout. It is also data-driven, sequentially leveraging auxiliary parallel data and monolingual data. We evaluate unsupervised NMT results for 7 low-resource languages, and find that each round of back-translation training further refines bidirectional performance. Our final single EcXTra-trained model achieves competitive translation performance in all translation directions, notably establishing a new state-of-the-art for English-to-Kazakh (22.9 > 10.4 BLEU). Our code is available at https://github.com/manestay/EcXTra .

翻訳日:2023-04-05 18:55:34 公開日:2023-04-03

# サーバ学習によるフェデレーションラーニング - 非IIDデータのパフォーマンス向上

Federated Learning with Server Learning: Enhancing Performance for Non-IID Data ( http://arxiv.org/abs/2210.02614v3 )

ライセンス: Link先を確認

Van Sy Mai, Richard J. La, Tao Zhang

(参考訳) フェデレートラーニング(FL)は、調整役のサーバとともに、クライアントに格納されたローカルデータを用いて分散学習を行う手段として登場した。最近の研究では、クライアントのトレーニングデータが独立同分布(IID)でない場合、FLは性能の低下と収束の遅さに悩まされることが示されている。ここでは、サーバが小さなデータセットから補助学習を行えるようにすることで、この性能劣化を軽減する新たな補完的アプローチを検討する。解析と実験により,サーバのデータセットが小さく,その分布が全クライアントのデータを集約したものと異なる場合でも,この新しいアプローチがモデル精度と収束時間の両方を大幅に改善できることを示した。

Federated Learning (FL) has emerged as a means of distributed learning using local data stored at clients with a coordinating server. Recent studies showed that FL can suffer from poor performance and slower convergence when training data at clients are not independent and identically distributed. Here we consider a new complementary approach to mitigating this performance degradation by allowing the server to perform auxiliary learning from a small dataset. Our analysis and experiments show that this new approach can achieve significant improvements in both model accuracy and convergence time even when the server dataset is small and its distribution differs from that of the aggregated data from all clients.

翻訳日:2023-04-05 18:45:02 公開日:2023-04-03

# PU GNN:不均衡PUラベル付きグラフ注意ネットワークによるP2E MMORPGのチャージバック詐欺検出

PU GNN: Chargeback Fraud Detection in P2E MMORPGs via Graph Attention Networks with Imbalanced PU Labels ( http://arxiv.org/abs/2211.08604v5 )

ライセンス: Link先を確認

Jiho Choi, Junghoon Park, Woocheol Kim, Jin-Hyeok Park, Yumin Suh, Minchang Sung

(参考訳) 近年、大規模多人数同時参加型オンラインロールプレイングゲーム(MMORPG)にプレイ・トゥ・アーン(P2E)システムが登場したことで、ゲーム内商品はこれまで以上に現実世界の価値と交換可能になった。P2E MMORPGの商品は、ブロックチェーンネットワークを介してBitcoin、Ethereum、Klaytnなどの暗号通貨と直接交換することができる。従来のゲーム内商品とは異なり、一旦ブロックチェーンに書き込まれると、P2E商品は支払い詐欺、キャンセル、返金などのチャージバック詐欺があってもゲーム運営チームが復元することはできない。そこで本研究では,PU損失を伴うグラフアテンションネットワークを活用して,プレイヤーのゲーム内行動とP2Eトークンの取引パターンの両方を捉える,新たなチャージバック詐欺予測手法PU GNNを提案する。修正版GraphSMOTEの導入により、提案モデルはチャージバック詐欺データセットにおけるラベルの不均衡な分布に対処する。実世界の3つのP2E MMORPGデータセットを用いた実験により,PU GNNが従来提案されていた手法よりも優れた性能を達成することを示した。

The recent advent of play-to-earn (P2E) systems in massively multiplayer online role-playing games (MMORPGs) has made in-game goods interchangeable with real-world values more than ever before. The goods in the P2E MMORPGs can be directly exchanged with cryptocurrencies such as Bitcoin, Ethereum, or Klaytn via blockchain networks. Unlike traditional in-game goods, once they have been written to the blockchain, P2E goods cannot be restored by the game operation teams even in cases of chargeback fraud such as payment fraud, cancellation, or refund. To tackle the problem, we propose a novel chargeback fraud prediction method, PU GNN, which leverages graph attention networks with PU loss to capture both the players' in-game behavior and P2E token transaction patterns. With the adoption of modified GraphSMOTE, the proposed model handles the imbalanced distribution of labels in chargeback fraud datasets. The experiments conducted on three real-world P2E MMORPG datasets demonstrate that PU GNN achieves superior performance over previously suggested methods.

翻訳日:2023-04-05 18:36:21 公開日:2023-04-03

# 進化アルゴリズムによる多目的最適化(MOVEA)を用いたヒト脳の高精細経頭蓋電気刺激

Multi-objective optimization via evolutionary algorithm (MOVEA) for high-definition transcranial electrical stimulation of the human brain ( http://arxiv.org/abs/2211.05658v2 )

ライセンス: Link先を確認

Mo Wang, Kexin Lou, Zeming Liu, Pengfei Wei, Quanying Liu

(参考訳) 経頭蓋電気刺激(TES)戦略の設計には、目標領域の強度、焦点性、刺激深度、回避ゾーンなど、しばしば互いに排他的である複数の目的を考慮する必要がある。異なる戦略を最適化し、これらの目的間のトレードオフを比較するための計算フレームワークは現在不足している。本稿では,事前に方向を定めないTES戦略の設計における非凸最適化問題に対処するため、進化アルゴリズムによる多目的最適化(MOVEA)と呼ぶ汎用フレームワークを提案する。MOVEAはパレート最適化を通じて複数の目標の同時最適化を可能にし、手動の重み調整なしに1回の実行でパレートフロントを生成し、より多くの目標へ容易に拡張できる。このパレートフロントは、強度や焦点性といった相反する目的間のトレードオフ関係を尊重しつつ、様々な要求を満たす最適解からなる。MOVEAは汎用性が高く、高精細(HD)システムおよび2ペアシステムに基づく経頭蓋交流電気刺激(tACS)と経頭蓋時間干渉刺激(tTIS)の両方に適している。我々は,異なる深さの標的に対して、強度、焦点性、操作性の観点からtACSとtTISの包括的な比較を行った。MOVEAは特定の目的と制約に基づくTESの最適化を容易にし、脳領域と認知機能の因果関係の理解や疾患の治療におけるtTISおよびtACSベースのニューロモジュレーションを前進させる。MOVEAのコードはhttps://github.com/ncclabsustech/MOVEAで公開されている。

Designing a transcranial electrical stimulation (TES) strategy requires considering multiple objectives, such as intensity in the target area, focality, stimulation depth, and avoidance zone, which are often mutually exclusive. A computational framework for optimizing different strategies and comparing trade-offs between these objectives is currently lacking. In this paper, we propose a general framework called multi-objective optimization via evolutionary algorithms (MOVEA) to address the non-convex optimization problem in designing TES strategies without predefined direction. MOVEA enables simultaneous optimization of multiple targets through Pareto optimization, generating a Pareto front after a single run without manual weight adjustment and allowing easy expansion to more targets. This Pareto front consists of optimal solutions that meet various requirements while respecting trade-off relationships between conflicting objectives such as intensity and focality. MOVEA is versatile and suitable for both transcranial alternating current stimulation (tACS) and transcranial temporal interference stimulation (tTIS) based on high definition (HD) and two-pair systems. We performed a comprehensive comparison between tACS and tTIS in terms of intensity, focality, and steerability for targets at different depths. MOVEA facilitates the optimization of TES based on specific objectives and constraints, advancing tTIS and tACS-based neuromodulation in understanding the causal relationship between brain regions and cognitive functions and in treating diseases. The code for MOVEA is available at https://github.com/ncclabsustech/MOVEA.
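
A Pareto front is the set of candidate solutions that no other candidate beats on every objective at once. The sketch below shows only the non-dominated filter, not the evolutionary search that MOVEA uses to generate candidates. It assumes both objectives are to be maximized, and the (intensity, focality) pairs are made-up numbers.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (intensity, focality) scores for five candidate strategies.
candidates = [(1.0, 0.2), (0.8, 0.5), (0.5, 0.9), (0.4, 0.4), (0.8, 0.3)]
print(pareto_front(candidates))  # [(1.0, 0.2), (0.8, 0.5), (0.5, 0.9)]
```

Each surviving point is a different trade-off between the two objectives, which is why a front can be produced without fixing objective weights in advance.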

翻訳日:2023-04-05 18:35:54 公開日:2023-04-03

# 不完全情報に基づく知識グラフの品質評価

Knowledge Graph Quality Evaluation under Incomplete Information ( http://arxiv.org/abs/2212.00994v2 )

ライセンス: Link先を確認

Xiaodong Li, Chenxin Zou, Yi Cai, Yuelong Zhu

(参考訳) 知識グラフ(KG)は多くのタスクにおいて基本的な役割を果たすため、ますます注目を集めている。したがって、KGの品質評価は重要かつ不可欠である。この分野の既存手法は、異なる次元から新しい品質指標を提案するか、KG構築段階での性能を測定することでKGを評価する。しかし、これらの手法には2つの大きな問題がある。第一に、KGの生データに強く依存しており、品質評価中にKGの内部情報が露出してしまう。第二に、能力レベルではなくデータレベルの品質をより重視しているが、下流アプリケーションにとっては能力レベルの方が重要である。これらの問題に対処するため,不完全情報下での知識グラフ品質評価フレームワーク(QEII)を提案する。品質評価タスクは、2つのKG間の敵対的Q&Aゲームに変換される。そして、ゲームの勝者がより良い品質を持つとみなされる。評価プロセス中に生データは一切露出せず、情報保護が保証される。4組のKGでの実験結果から,QEIIはベースラインと比較して,不完全情報下で能力レベルにおける合理的な品質評価を実現することを示した。

Knowledge graphs (KGs) have attracted more and more attentions because of their fundamental roles in many tasks. Quality evaluation for KGs is thus crucial and indispensable. Existing methods in this field evaluate KGs by either proposing new quality metrics from different dimensions or measuring performances at KG construction stages. However, there are two major issues with those methods. First, they highly rely on raw data in KGs, which makes KGs' internal information exposed during quality evaluation. Second, they consider more about the quality at data level instead of ability level, where the latter one is more important for downstream applications. To address these issues, we propose a knowledge graph quality evaluation framework under incomplete information (QEII). The quality evaluation task is transformed into an adversarial Q&A game between two KGs. Winner of the game is thus considered to have better qualities. During the evaluation process, no raw data is exposed, which ensures information protection. Experimental results on four pairs of KGs demonstrate that, compared with baselines, the QEII implements a reasonable quality evaluation at ability level under incomplete information.

Translated: 2023-04-05 18:27:53 Published: 2023-04-03

# A Strong Baseline for Generalized Few-Shot Semantic Segmentation

http://arxiv.org/abs/2211.14126v2

License: see the linked page

Sina Hajimiri, Malik Boudiaf, Ismail Ben Ayed, Jose Dolz

This paper introduces a generalized few-shot segmentation framework with a straightforward training process and an easy-to-optimize inference phase. In particular, we propose a simple yet effective model based on the well-known InfoMax principle, where the Mutual Information (MI) between the learned feature representations and their corresponding predictions is maximized. In addition, the terms derived from our MI-based formulation are coupled with a knowledge distillation term to retain the knowledge on base classes. With a simple training process, our inference model can be applied on top of any segmentation network trained on base classes. The proposed inference yields substantial improvements on the popular few-shot segmentation benchmarks, PASCAL-$5^i$ and COCO-$20^i$. Particularly, for novel classes, the improvement gains range from 7% to 26% (PASCAL-$5^i$) and from 3% to 12% (COCO-$20^i$) in the 1-shot and 5-shot scenarios, respectively. Furthermore, we propose a more challenging setting, where performance gaps are further exacerbated. Our code is publicly available at https://github.com/sinahmr/DIaM.
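The InfoMax idea mentioned above can be illustrated with a common empirical estimate of mutual information between inputs and predictions: the entropy of the average prediction minus the average entropy of the individual predictions. This is only a sketch of that generic estimate, not the paper's full objective (which, per the abstract, also includes a knowledge distillation term); the function names and toy inputs are assumptions.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution; zero entries are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def mutual_information(probs: List[List[float]]) -> float:
    """Estimate I(X; Y) = H(mean prediction) - mean H(prediction).

    probs[i][c] is the predicted probability of class c for sample (e.g. pixel) i.
    """
    n = len(probs)
    k = len(probs[0])
    marginal = [sum(row[c] for row in probs) / n for c in range(k)]
    conditional = sum(entropy(row) for row in probs) / n
    return entropy(marginal) - conditional


# Confident and class-balanced predictions give the maximum, log(2) for two classes.
confident = mutual_information([[1.0, 0.0], [0.0, 1.0]])
# Uniform predictions carry no information about the input.
uninformative = mutual_information([[0.5, 0.5], [0.5, 0.5]])
print(confident, uninformative)
```

Maximizing this quantity rewards predictions that are individually confident while keeping the overall class usage balanced.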

Translated: 2023-04-05 18:26:51 Published: 2023-04-03

# Heisenberg versus the Covariant String

http://arxiv.org/abs/2212.07256v3

License: see the linked page

Norbert Dragon and Florian Oppermann

A Poincaré multiplet of mass eigenstates $\bigl(P^2 - m^2\bigr)\Psi = 0$ cannot be a subspace of a space with a $D$-vector position operator $X=(X_0,\dots, X_{D-1})$: the Heisenberg algebra $[P^m, X_n] = i \delta^m{}_n$ implies by a simple argument that each Poincaré multiplet of definite mass vanishes. The same conclusion follows from the Stone-von Neumann theorem. In a quantum theory the constraint of an absolutely continuous spectrum to a lower dimensional submanifold yields zero even if Dirac's treatment of the corresponding classical constraint defines a symplectic submanifold with a consistent corresponding quantum model. Its Hilbert space is not a subspace of the unconstrained theory. Hence the operator relations of the unconstrained model need not carry over to the constrained model. Our argument excludes quantized worldline models of relativistic particles and the physical states of the covariant quantum string. We correct misconceptions about the generators of Lorentz transformations acting on particles.
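The abstract does not spell out the "simple argument". One formal way such an argument can run is sketched below; it may differ from the authors' own derivation. It assumes that $P^2 - m^2$ and $P_n$ are Hermitian, ignores all domain questions for unbounded operators, and writes $\mathcal{H}_m$ (a notation introduced here) for the multiplet of states with $(P^2 - m^2)\Psi = 0$.

```latex
% Formal sketch only; \mathcal{H}_m is notation introduced for this illustration.
\begin{align}
[P^2, X_n] &= P^m [P_m, X_n] + [P^m, X_n] P_m = 2 i P_n , \\
0 &= \langle \Phi | [P^2 - m^2, X_n] \Psi \rangle
   = 2 i \langle \Phi | P_n \Psi \rangle
   \quad \text{for all } \Phi, \Psi \in \mathcal{H}_m , \\
\Phi = P_n \Psi \in \mathcal{H}_m
  &\;\Rightarrow\; \| P_n \Psi \|^2 = 0
   \;\Rightarrow\; P_n \Psi = 0 , \\
i \langle \Psi | \Psi \rangle
  &= \langle \Psi | [P^n, X_n] \Psi \rangle = 0
   \quad (\text{no sum over } n)
   \;\Rightarrow\; \Psi = 0 .
\end{align}
```

The second line uses that $P^2 - m^2$ annihilates both $\Phi$ and $\Psi$; the third uses that $P_n$ commutes with $P^2$ and therefore maps $\mathcal{H}_m$ to itself.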

Translated: 2023-04-05 18:17:41 Published: 2023-04-03

# Learning-Rate-Free Learning by D-Adaptation

http://arxiv.org/abs/2301.07733v3

License: see the linked page

Aaron Defazio and Konstantin Mishchenko

D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step. Our approach is the first hyper-parameter-free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. An open-source implementation is available at https://github.com/facebookresearch/dadaptation.

Translated: 2023-04-05 18:10:20 Published: 2023-04-03

# Hierarchical Explanations for Video Action Recognition

http://arxiv.org/abs/2301.00436v3

License: see the linked page

Sadaf Gulshad, Teng Long, Nanne van Noord

To interpret deep neural networks, one main approach is to dissect the visual input and find the prototypical parts responsible for the classification. However, existing methods often ignore the hierarchical relationship between these prototypes, and thus cannot explain semantic concepts at both a higher level (e.g., water sports) and a lower level (e.g., swimming). In this paper, inspired by the human cognitive system, we leverage hierarchical information to deal with uncertainty: when we observe water and human activity but no definitive action, the scene can be recognized as the water sports parent class. Only after observing a person swimming can we definitively refine it to the swimming action. To this end, we propose the HIerarchical Prototype Explainer (HIPE) to build hierarchical relations between prototypes and classes. HIPE enables a reasoning process for video action classification by dissecting the input video frames on multiple levels of the class hierarchy; our method is also applicable to other video tasks. The faithfulness of our method is verified by reducing the accuracy-explainability trade-off on ActivityNet and UCF-101 while providing multi-level explanations.

Translated: 2023-04-05 18:08:29 Published: 2023-04-03

# The Rough with the Smooth of the Light Cone String

http://arxiv.org/abs/2212.14822v2

License: see the linked page

Norbert Dragon and Florian Oppermann

The generators of unitary representations of the Poincaré group generate an algebra which maps smooth wavefunctions to smooth wavefunctions. This mathematical result is highly welcome to physicists, who previously just assumed their algebraic treatment of unbounded operators to be justified. The smoothness, however, has the side effect that rough operators, which map smooth wavefunctions to functions which are not smooth, are inconsistent with Poincaré symmetry: their product with the generators cannot be defined. Rough and smooth operators are not members of a common algebra. Transverse Heisenberg pairs $X^i$ and $P^j$, $i,j\in \{1,\dots, D-2\}$, which commute with $P^+=(P^0 + P_z)/\sqrt{2}$, where $P_z = P^{D-1}$, as they occur in the light cone string, act roughly on massless multiplets. The domain of their algebra is not mapped to itself by rotations, let alone Lorentz transformations. This is true in all dimensions and makes the algebraic calculation of the critical dimension, $D=26$, of the bosonic string meaningless: in no dimension $D > 2$ does the light cone string admit a unitary representation of the Lorentz group. Massless multiplets are inconsistent with a spatial position operator $\vec X$, which generates translations of the spatial momentum.

Translated: 2023-04-05 18:08:08 Published: 2023-04-03

# Causal Razors

http://arxiv.org/abs/2302.10331v2

License: see the linked page

Wai-yin Lam

When performing causal discovery, assumptions have to be made on how the true causal mechanism corresponds to the underlying joint probability distribution. These assumptions are labeled as causal razors in this work. We review numerous causal razors that have appeared in the literature, and offer a comprehensive logical comparison of them. In particular, we scrutinize an unpopular causal razor, namely parameter minimality, in multinomial causal models and its logical relations with other well-studied causal razors. Our logical result poses a dilemma in selecting a reasonable scoring criterion for score-based causal search algorithms.

Translated: 2023-04-05 18:00:40 Published: 2023-04-03

# Text-driven Visual Synthesis with Latent Diffusion Prior

http://arxiv.org/abs/2302.08510v2

License: see the linked page

Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar and Jia-Bin Huang

There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models, enabling versatile downstream applications such as 3D object synthesis from text, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use these models' full capabilities. To improve this, our core ideas are 1) a feature matching loss between features from different layers of the decoder to provide detailed guidance and 2) a KL divergence loss to regularize the predicted latent features and stabilize the training. We demonstrate the efficacy of our approach on three different applications: text-to-3D, StyleGAN adaptation, and layered image editing. Extensive results show that our method compares favorably against baselines.

Translated: 2023-04-05 18:00:28 Published: 2023-04-03

# A Categorical Archive of ChatGPT Failures

http://arxiv.org/abs/2302.03494v8

License: see the linked page

Ali Borji

Large language models have been demonstrated to be valuable in different fields. ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation by comprehending context and generating appropriate responses. It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries, with fluent and comprehensive answers surpassing prior public chatbots in both security and usefulness. However, a comprehensive analysis of ChatGPT's failures is lacking, which is the focus of this study. Eleven categories of failures, including reasoning, factual errors, math, coding, and bias, are presented and discussed. The risks, limitations, and societal implications of ChatGPT are also highlighted. The goal of this study is to assist researchers and developers in enhancing future language models and chatbots.

Translated: 2023-04-05 17:59:27 Published: 2023-04-03

# Amortised Invariance Learning for Contrastive Self-Supervision

http://arxiv.org/abs/2302.12712v2

License: see the linked page

Ruchika Chavhan, Henry Gouk, Jan Stuehmer, Calum Heggan, Mehrdad Yaghoobi, Timothy Hospedales

Contrastive self-supervised learning methods famously produce high-quality transferable representations by learning invariances to different data augmentations. Invariances established during pre-training can be interpreted as strong inductive biases. However, these may or may not be helpful, depending on whether they match the invariance requirements of downstream tasks. This has led to several attempts to learn task-specific invariances during pre-training; however, these methods are highly compute-intensive and tedious to train. We introduce the notion of amortised invariance learning for contrastive self-supervision. In the pre-training stage, we parameterize the feature extractor by differentiable invariance hyper-parameters that control the invariances encoded by the representation. Then, for any downstream task, both the linear readout and the task-specific invariance requirements can be efficiently and effectively learned by gradient descent. We evaluate the notion of amortised invariances for contrastive learning over two modalities, vision and audio: on two widely used contrastive learning methods in vision, SimCLR and MoCo-v2, with popular architectures such as ResNets and Vision Transformers, and on SimCLR with ResNet-18 for audio. We show that our amortised features provide a reliable way to learn diverse downstream tasks with different invariance requirements, while using a single feature and avoiding task-specific pre-training. This provides an exciting perspective that opens up new horizons in the field of general-purpose representation learning.

Translated: 2023-04-05 17:49:39 Published: 2023-04-03

# Towards secure judgments aggregation in AHP

http://arxiv.org/abs/2303.15099v2

License: see the linked page

Konrad Kułakowski and Jacek Szybowski and Jiri Mazurek and Sebastian Ernst

In decision-making methods, it is common to assume that the experts are honest and professional. However, this is not the case when one or more experts in a group decision-making framework, such as the group analytic hierarchy process (GAHP), try to manipulate results in their favor. The aim of this paper is to introduce two heuristics in the GAHP setting that make it possible to detect manipulators and minimize their effect on the group consensus by diminishing their weights. The first heuristic is based on the assumption that manipulators will provide judgments which can be considered outliers with respect to those of the rest of the experts in the group. The second heuristic assumes that dishonest judgments are less consistent than the average consistency of the group. Both approaches are illustrated with numerical examples and simulations.
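The second heuristic relies on measuring how consistent an expert's pairwise comparisons are. A standard measure in AHP is Saaty's consistency index, CI = (λ_max − n)/(n − 1), where λ_max is the principal eigenvalue of the n × n comparison matrix. The sketch below computes it with plain power iteration; the matrices are hypothetical, and how the paper turns such a score into reduced expert weights is not shown here.

```python
from typing import List

Matrix = List[List[float]]


def principal_eigenvalue(a: Matrix, iters: int = 200) -> float:
    """Approximate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(a)
    v = [1.0 / n] * n  # start from a vector whose entries sum to 1
    lam = float(n)
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)  # because sum(v) == 1, sum(A v) estimates lambda_max
        v = [x / lam for x in w]
    return lam


def consistency_index(a: Matrix) -> float:
    """Saaty's consistency index: 0 for a perfectly consistent matrix, larger otherwise."""
    n = len(a)
    return (principal_eigenvalue(a) - n) / (n - 1)


# Hypothetical judgments: the first matrix satisfies a_ik = a_ij * a_jk everywhere.
consistent = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
# The second contradicts itself: a_12 * a_23 = 4, but a_13 = 1/4.
inconsistent = [[1.0, 2.0, 0.25], [0.5, 1.0, 2.0], [4.0, 0.5, 1.0]]

ci_consistent = consistency_index(consistent)
ci_inconsistent = consistency_index(inconsistent)
print(ci_consistent, ci_inconsistent)
```

Under the heuristic's assumption, an expert whose index lies well above the group average would be a candidate for a reduced weight.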

Translated: 2023-04-05 17:33:04 Published: 2023-04-03

# What Types of Questions Require Conversation to Answer? A Case Study of AskReddit Questions

http://arxiv.org/abs/2303.17710v2

License: see the linked page

Shih-Hong Huang, Chieh-Yang Huang, Ya-Fang Lin, Ting-Hao 'Kenneth' Huang

The proliferation of automated conversational systems, such as chatbots, spoken-dialogue systems, and smart speakers, has significantly impacted modern digital life. However, these systems are primarily designed to provide answers to well-defined questions rather than to support users in exploring complex, ill-defined questions. In this paper, we aim to push the boundaries of conversational systems by examining the types of nebulous, open-ended questions that can best be answered through conversation. We first sampled 500 questions from one million open-ended requests posted on AskReddit, and then recruited online crowd workers to answer eight inquiries about these questions. We also performed open coding to categorize the questions into 27 different domains. We found that the issues people believe require conversation to resolve satisfactorily are highly social and personal. Our work provides insights into how future research could be geared to align with users' needs.

Translated: 2023-04-05 17:23:15 Published: 2023-04-03

# A CQM-based approach to solving a combinatorial problem with applications in drug design

http://arxiv.org/abs/2303.15419v2

License: see the linked page

B. Maurice Benson, Victoria M. Ingman, Abhay Agarwal, Shahar Keinan

The use of D-Wave's Leap Hybrid solver is demonstrated here in solving a Knapsack optimization problem: finding meal combinations from a fixed menu that fit a diner's constraints. This is done by first formulating the optimization problem as a Constrained Quadratic Model (CQM) and then submitting it to a quantum annealer. We highlight here the steps needed, as well as the implemented code, and provide solutions from a Chicken and Waffle restaurant menu. Additionally, we discuss how this model may be generalized to find optimal drug molecules within a large search space with many complex, and often contradictory, structural and property constraints.
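To show what the knapsack-style menu problem looks like, here is a small classical brute-force reference. It does not use D-Wave, a CQM, or a quantum annealer; the menu items, prices, calorie values, and the choice of objective (maximize calories within a budget) are all hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical menu: item -> (price in dollars, calories)
menu: Dict[str, Tuple[int, int]] = {
    "chicken": (8, 500),
    "waffle": (5, 400),
    "fries": (3, 300),
    "salad": (4, 150),
}


def best_meal(items: Dict[str, Tuple[int, int]], budget: int) -> Tuple[Tuple[str, ...], int]:
    """Enumerate every subset of items and keep the highest-calorie one within budget."""
    best: Tuple[str, ...] = ()
    best_calories = 0
    names = sorted(items)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            price = sum(items[name][0] for name in combo)
            calories = sum(items[name][1] for name in combo)
            if price <= budget and calories > best_calories:
                best, best_calories = combo, calories
    return best, best_calories


choice, calories = best_meal(menu, budget=12)
print(choice, calories)
```

Exhaustive enumeration grows as 2^n in the number of items, which is why larger search spaces, such as the drug-design setting the abstract mentions, call for a dedicated optimizer.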

Translated: 2023-04-05 17:20:38 Published: 2023-04-03

# Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach

http://arxiv.org/abs/2304.01240v1

License: see the linked page

Jaya Chaturvedi, Sumithra Velupillai, Robert Stewart, Angus Roberts

Pain is a common reason for accessing healthcare resources and is a growing area of research, especially in its overlap with mental health. Mental health electronic health records are a good data source to study this overlap. However, much information on pain is held in the free text of these records, where mentions of pain present a unique natural language processing problem due to their ambiguous nature. This project uses data from an anonymised mental health electronic health records database. The data are used to train a machine learning based classification algorithm to classify sentences as discussing patient pain or not. This will facilitate the extraction of relevant pain information from large databases, and the use of such outputs for further studies on pain and mental health. A total of 1,985 documents were manually triple-annotated to create gold standard training data, which was used to train three commonly used classification algorithms. The best performing model achieved an F1-score of 0.98 (95% CI 0.98-0.99).
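For readers unfamiliar with the reported metric, the F1-score is the harmonic mean of precision and recall for the positive class. The sketch below computes it for a binary "mentions pain / does not" labelling; the labels are made-up toy values, not data from the study.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1) of a binary classification."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy gold labels and predictions (1 = sentence discusses patient pain).
gold = [1, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 1]
score = f1_score(gold, predicted)
print(score)
```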

Translated: 2023-04-05 17:05:17 Published: 2023-04-03

# Online Distillation with Continual Learning for Cyclic Domain Shifts

http://arxiv.org/abs/2304.01239v1

License: see the linked page

Joachim Houyon, Anthony Cioppa, Yasir Ghunaim, Motasem Alfarra, Anaïs Halin, Maxim Henry, Bernard Ghanem, Marc Van Droogenbroeck

In recent years, online distillation has emerged as a powerful technique for adapting real-time deep neural networks on the fly using a slow, but accurate teacher model. However, a major challenge in online distillation is catastrophic forgetting when the domain shifts, which occurs when the student model is updated with data from the new domain and forgets previously learned knowledge. In this paper, we propose a solution to this issue by leveraging the power of continual learning methods to reduce the impact of domain shifts. Specifically, we integrate several state-of-the-art continual learning methods in the context of online distillation and demonstrate their effectiveness in reducing catastrophic forgetting. Furthermore, we provide a detailed analysis of our proposed solution in the case of cyclic domain shifts. Our experimental results demonstrate the efficacy of our approach in improving the robustness and accuracy of online distillation, with potential applications in domains such as video surveillance or autonomous driving. Overall, our work represents an important step forward in the field of online distillation and continual learning, with the potential to significantly impact real-world applications.

Translated: 2023-04-05 17:05:00 Published: 2023-04-03

# Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection

http://arxiv.org/abs/2304.01238v1

License: see the linked page

Maxime Labonne and Sean Moran

This paper investigates the effectiveness of large language models (LLMs) in email spam detection by comparing prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. Additionally, we examine well-established machine learning techniques for spam detection, such as Naïve Bayes and LightGBM, as baseline methods. We assess the performance of these models across four public datasets, utilizing different numbers of training samples (full training set and few-shot settings). Our findings reveal that, in the majority of cases, LLMs surpass the performance of the popular baseline techniques, particularly in few-shot scenarios. This adaptability renders LLMs uniquely suited to spam detection tasks, where labeled samples are limited in number and models require frequent updates. Additionally, we introduce Spam-T5, a Flan-T5 model that has been specifically adapted and fine-tuned for the purpose of detecting email spam. Our results demonstrate that Spam-T5 surpasses baseline models and other LLMs in the majority of scenarios, particularly when there are a limited number of training samples available. Our code is publicly available at https://github.com/jpmorganchase/emailspamdetection.

Translated: 2023-04-05 17:04:43 Published: 2023-04-03

# A Guide for Practical Use of ADMG Causal Data Augmentation

http://arxiv.org/abs/2304.01237v1

License: see the linked page

Poinsot Audrey, Leite Alessandro

Data augmentation is essential when applying Machine Learning in small-data regimes. It generates new samples following the observed data distribution while increasing their diversity and variability to help researchers and practitioners improve their models' robustness and, thus, deploy them in the real world. Nevertheless, its usage in tabular data still needs to be improved, as prior knowledge about the underlying data mechanism is seldom considered, limiting the fidelity and diversity of the generated data. Causal data augmentation strategies have been pointed out as a solution to handle these challenges by relying on conditional independence encoded in a causal graph. In this context, this paper experimentally analyzes the ADMG causal augmentation method under different settings to support researchers and practitioners in understanding under which conditions prior knowledge helps generate new data points and, consequently, enhances the robustness of their models. The results highlight that the studied method (a) is independent of the underlying model mechanism, (b) requires a minimal number of observations, which may be challenging to obtain in a small-data regime, to improve an ML model's accuracy, (c) propagates outliers to the augmented set, degrading the performance of the model, and (d) is sensitive to the value of its hyperparameter.

Translated: 2023-04-05 17:04:21 Published: 2023-04-03

# Astronomical image time series classification using CONVolutional attENTION (ConvEntion)

http://arxiv.org/abs/2304.01236v1

License: see the linked page

Anass Bairouk, Marc Chaumont, Dominique Fouchez, Jerome Paquet, Frédéric Comby, Julian Bautista

Aims. The treatment of astronomical image time series has won increasing attention in recent years. Indeed, numerous surveys following up on transient objects are in progress or under construction, such as the Vera Rubin Observatory Legacy Survey for Space and Time (LSST), which is poised to produce huge amounts of these time series. The associated scientific topics are extensive, ranging from the study of objects in our galaxy to the observation of the most distant supernovae for measuring the expansion of the universe. With such a large amount of data available, the need for robust automatic tools to detect and classify celestial objects is growing steadily. Methods. This study is based on the assumption that astronomical images contain more information than light curves. In this paper, we propose a novel approach based on deep learning for classifying different types of space objects directly using images. We named our approach ConvEntion, which stands for CONVolutional attENTION. It is based on convolutions and transformers, which are new approaches for the treatment of astronomical image time series. Our solution integrates spatio-temporal features and can be applied to various types of image datasets with any number of bands. Results. In this work, we solved various problems the datasets tend to suffer from and we present new results for classifications using astronomical image time series with an increase in accuracy of 13%, compared to state-of-the-art approaches that use image time series, and a 12% increase, compared to approaches that use light curves.

Translated: 2023-04-05 17:04:01 Published: 2023-04-03

# Fair Evaluation of Graph Markov Neural Networks

http://arxiv.org/abs/2304.01235v1

License: see the linked page

Pirmin Lemberger and Antoine Saillenfest

Graph Markov Neural Networks (GMNN) have recently been proposed to improve regular graph neural networks (GNN) by including label dependencies in the semi-supervised node classification task. GMNNs do this in a theoretically principled way and use three kinds of information to predict labels. Just like ordinary GNNs, they use the node features and the graph structure, but they also leverage information from the labels of neighboring nodes to improve the accuracy of their predictions. In this paper, we introduce a new dataset named WikiVitals, which contains a graph of 48k mutually referred Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to the prediction accuracy of GMNN for this dataset: the content of the articles, their connections with each other, and the correlations among their labels. For this purpose, we adapt a method which was recently proposed for performing fair comparisons of GNN performance using an appropriate randomization over partitions and a clear separation of model selection and model assessment.

翻訳日:2023-04-05 17:03:39 公開日:2023-04-03

# ポテンシャル場源面(PFSS)磁気グラムへの畳み込みニューラルネットワークの適用による太陽風速の予測

Prediction of solar wind speed by applying convolutional neural network to potential field source surface (PFSS) magnetograms ( http://arxiv.org/abs/2304.01234v1 )

ライセンス: Link先を確認

Rong Lin, Zhekai Luo, Jiansen He, Lun Xie, Chuanpeng Hou, Shuwei Chen

(参考訳) 正確な太陽風速モデルは、宇宙天気予報、破滅的なイベント警告、太陽風に関するその他の問題、磁気圏相互作用に重要である。本研究では,太陽-地球系のラグランジュ1(L1)点における太陽風速の予測を目的とした,畳み込み型ニューラルネットワーク (CNN) と電位場源面 (PFSS) に基づくモデルを構築し,太陽風源面の$R_{\rm SS}=2.5R_\odot$を考慮した。このモデルの入力は4つのポテンシャル磁場源表面(pfss)磁図からなり、r_{\rm ss}$ はターゲットエポックの4日前の7, 6, 5, 4日である。還元磁図はモデルの効率を高めるために使われる。我々は、GONG(Global Oscillation Network Group)光球磁気グラムと電位場外挿モデルを用いて、ソース表面でPFSS磁気グラムを生成する。このモデルは、データの時間分解能を1時間に抑えた8倍の検証訓練スキームにおいて、平均相関係数0.52と根平均二乗誤差80.8km/sの連続テストデータセットの予測を提供する。モデルはまた、太陽風の高速流れを予測する可能性があり、一般的な脅威スコア 0.39 で定量化することができる。

An accurate solar wind speed model is important for space weather predictions, catastrophic event warnings, and other issues concerning solar wind-magnetosphere interaction. In this work, we construct a model based on a convolutional neural network (CNN) and Potential Field Source Surface (PFSS) magnetograms, considering a solar wind source surface of $R_{\rm SS}=2.5R_\odot$, aiming to predict the solar wind speed at the Lagrange 1 (L1) point of the Sun-Earth system. The input of our model consists of four PFSS magnetograms at $R_{\rm SS}$, taken 7, 6, 5, and 4 days before the target epoch. Reduced magnetograms are used to improve the model's efficiency. We use the Global Oscillation Network Group (GONG) photospheric magnetograms and the potential field extrapolation model to generate PFSS magnetograms at the source surface. The model provides predictions of the continuous test dataset with an averaged correlation coefficient (CC) of 0.52 and a root mean square error (RMSE) of 80.8 km/s in an eight-fold validation training scheme with the time resolution of the data as small as one hour. The model also has the potential to forecast high speed streams of the solar wind, which can be quantified with a general threat score of 0.39.
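
For readers unfamiliar with the three scores quoted above, a minimal sketch of how they are commonly computed is given below. The function names are made up for this example, and representing high speed stream events as sets of identifiers is an assumption; the abstract does not say how the authors define or match events.

```python
import math


def pearson_cc(pred, obs):
    """Pearson correlation coefficient between two equally long series.

    Undefined (division by zero) if either series is constant.
    """
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def rmse(pred, obs):
    """Root mean square error, in the same unit as the inputs (e.g. km/s)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def threat_score(pred_events, obs_events):
    """Threat score: hits / (hits + false alarms + misses).

    Events are given as sets of identifiers (an assumption of this sketch).
    """
    hits = len(pred_events & obs_events)
    false_alarms = len(pred_events - obs_events)
    misses = len(obs_events - pred_events)
    total = hits + false_alarms + misses
    return hits / total if total else 0.0
```

The threat score ignores correct negatives, which is why it suits rare events such as high speed streams better than plain accuracy.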

翻訳日:2023-04-05 17:03:22 公開日:2023-04-03

# 救急部門におけるアウトカム予測のためのマルチモーダル知覚言語モデル

Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department ( http://arxiv.org/abs/2304.01233v1 )

ライセンス: Link先を確認

Sabri Boughorbel, Fethi Jarray, Abdulaziz Al Homaid, Rashid Niaz, Khalid Alyafei

(参考訳) 言語モデリングは、高い精度と高いセマンティックコヒーレンスで魅力的なテキストを生成するという驚くべき進歩を示している。興味深い研究の方向性は、コンテキスト情報を用いた特定のアプリケーションのためのこれらの強力なモデルを強化することである。本稿では,医療アプリケーションのためのマルチモーダル言語モデリングについて検討する。主訴のテキスト情報とトリアージで記録されたバイタルサインに基づいて, 病院救急部門における結果予測と患者トリアージに関心がある。我々は、いくつかのアプリケーションで有望な結果を示すモダリティに依存しないトランスフォーマーベースのモデルであるPerceiverを適応する。バイタル符号のモダリティは表形式で表されるため,置換不変性を保証するために知覚器位置符号化を改良した。 120Kの訪問でMIMIC-IV EDデータセットを用いた診断コード予測のためのマルチモーダル言語モデルの評価を行った。実験分析では,テキストやバイタルサインのみに基づいて学習したモデルと比較して,ミュータリモダリティが予測性能を向上させることを示した。マルチモダリティがパフォーマンス向上に繋がる疾患カテゴリーを特定し,これらのカテゴリにおいて,重要な兆候が予測力を増したことを示す。クロスアテンション層を解析することにより、マルチモーダリティがモデル予測にどのように貢献するかを示す。この研究は、医療アプリケーションのためのマルチモーダル言語モデルの開発に関する興味深い洞察を与える。

Language modeling has shown impressive progress in generating compelling text with good accuracy and high semantic coherence. An interesting research direction is to augment these powerful models for specific applications using contextual information. In this work, we explore multi-modal language modeling for healthcare applications. We are interested in outcome prediction and patient triage in hospital emergency departments based on text information in chief complaints and vital signs recorded at triage. We adapt Perceiver, a modality-agnostic transformer-based model that has shown promising results in several applications. Since the vital-sign modality is represented in tabular format, we modified the Perceiver position encoding to ensure permutation invariance. We evaluated the multi-modal language model for the task of diagnosis code prediction using the MIMIC-IV ED dataset on 120K visits. In the experimental analysis, we show that multi-modality improves the prediction performance compared with models trained solely on text or vital signs. We identified disease categories for which multi-modality leads to performance improvement and show that for these categories, vital signs have added predictive power. By analyzing the cross-attention layer, we show how multi-modality contributes to model predictions. This work gives interesting insights on the development of multi-modal language models for healthcare applications.

翻訳日:2023-04-05 17:02:53 公開日:2023-04-03

# フロッケ符号における量子セルオートマトンと異常の測定

Measurement Quantum Cellular Automata and Anomalies in Floquet Codes ( http://arxiv.org/abs/2304.01277v1 )

ライセンス: Link先を確認

David Aasen, Jeongwan Haah, Zhi Li, Roger S. K. Mong

(参考訳) パウリ測定回路における量子情報の進化について検討する。本稿では,最近導入されたFloquetトポロジカルコードに関連する1次元および2次元システムについて述べる。測定回路の文脈で局所可逆性を定義し, 同様の足場上の有限深度計測回路を有限深度ユニタリ回路に扱えるようにした。ユニタリの場合とは対照的に、有限深さ局所可逆測定列は1次元の変換を実装できる。 2次元の局所可逆測定列は、境界に沿って論理情報の流れを誘導することもある。本稿では,これらの概念を統一し,論理演算子のフローを特徴づける指標を1次元で定義する「測定量子セルオートマトン」を提案する。 Floquet位相符号の$\mathbb{Z}_2$バルク不変量は、自明な境界を持つことの障害を示す。我々は、Hastings-Haah ハニカム符号がそのような障害のあるクラスに属することを証明し、任意の境界は非局所力学、周期倍、あるいは量子情報の境界フローを持つ必要があることを意味する。

We investigate the evolution of quantum information under Pauli measurement circuits. We focus on the case of one- and two-dimensional systems, which are relevant to the recently introduced Floquet topological codes. We define local reversibility in the context of measurement circuits, which allows us to treat finite-depth measurement circuits on a similar footing to finite-depth unitary circuits. In contrast to the unitary case, a finite-depth locally reversible measurement sequence can implement a translation in one dimension. A locally reversible measurement sequence in two dimensions may also induce a flow of logical information along the boundary. We introduce "measurement quantum cellular automata", which unify these ideas, and define an index in one dimension to characterize the flow of logical operators. We find a $\mathbb{Z}_2$ bulk invariant for Floquet topological codes which indicates an obstruction to having a trivial boundary. We prove that the Hastings-Haah honeycomb code belongs to a class with such an obstruction, which means that any boundary must either have non-local dynamics, be period doubled, or admit a boundary flow of quantum information.

翻訳日:2023-04-05 16:56:34 公開日:2023-04-03

# ノイズ量子電池からの作業抽出過程--非局所的資源の役割

Work extraction processes from noisy quantum batteries: the role of non local resources ( http://arxiv.org/abs/2304.01270v1 )

ライセンス: Link先を確認

Salvatore Tirone, Raffaele Salvia, Stefano Chessa and Vittorio Giovannetti

(参考訳) 量子バッテリモデルからの作業抽出における環境騒音の悪影響を緩和するために,非局所操作で得られる有益効果と非局所状態との非対称性を示す。具体的には、ノイズ動作後の非局所回復操作を用いることで、一般に、分離可能な(非絡み合った)入力状態であっても、バッテリから回復できる作業量を増やすことができることを示す。逆に、局所回復操作で絡み合った入力状態を採用すると、一般的にバッテリー性能は向上しない。

We demonstrate an asymmetry between the beneficial effects one can obtain using non-local operations and non-local states to mitigate the detrimental effects of environmental noise in the work extraction from quantum battery models. Specifically, we show that using non-local recovery operations after the noise action can in general increase the amount of work one can recover from the battery even with separable (i.e., non-entangled) input states. On the contrary, employing entangled input states with local recovery operations will not generally improve the battery performance.

翻訳日:2023-04-05 16:56:17 公開日:2023-04-03

# ランダム量子回路における情報流の動的相転移

Dynamical phase transitions of information flow in random quantum circuits ( http://arxiv.org/abs/2304.01256v1 )

ライセンス: Link先を確認

J.-Z. Zhuang, Y.-K. Wu, L.-M. Duan

(参考訳) 本研究では,ランダムクリフォード量子回路に支配される多体力学における情報の流れを解明し,この情報の流れにおけるリッチな相転移を探索する。情報フローダイナミクスの位相遷移点と臨界指数は、有限サイズスケーリングによってよく確立される。位相遷移が、情報の位置や最終プローブ領域とどのように異なるかを調べ、これらの遷移の中でユビキタスな振る舞いを見つけ、この量子多体モデルにおける情報伝播と揺らぎに関する興味深い特性を明らかにする。古典情報と量子情報のフローは、それぞれホレボとコヒーレント情報によって測定され、同様の動的相転移挙動を示す。我々の研究は、情報フローが大規模システムの様々な相転移を伴う豊富な挙動を持つことを示し、その研究は量子多体ダイナミクスの理解に新たな光を当てている。

We study how information flows in the many-body dynamics governed by random Clifford quantum circuits and discover a rich set of dynamical phase transitions in this information flow. The phase transition points and the critical exponents for the information flow dynamics are well established through finite-size scaling. We investigate how the phase transitions vary with the initial position where the information is located and the final probe region, and find ubiquitous behaviors in these transitions, revealing interesting properties about the information propagation and scrambling in this quantum many-body model. The flow of both classical and quantum information, measured respectively by Holevo and coherent information, shows similar dynamical phase transition behaviors. Our work shows that the information flow has rich behaviors with various phase transitions for large systems and its study sheds new light on the understanding of quantum many-body dynamics.

翻訳日:2023-04-05 16:56:08 公開日:2023-04-03

# 準周期駆動量子ビットによる巨大エネルギー振動

Giant energy oscillations mediated by a quasiperiodically driven qubit ( http://arxiv.org/abs/2304.01254v1 )

ライセンス: Link先を確認

Dominik Vuina, David M. Long, Phillip J.D. Crowley, Anushya Chandran

(参考訳) 2つの非共振周波数で駆動される量子ビットは、断熱限界における量子化された平均エネルギー電流を媒介することができる。非断熱過程は、駆動間で伝達されるネットエネルギーのエネルギー電流と対応する振動の反転をもたらすことを示す。振動は有界だが巨大で、量子ビットのエネルギー分割よりもはるかに大きい。ランダウ・ツェナー解析は、振動の時間スケールがドライブの周期で指数関数的に大きいと予測する。しかし, 数値解析により, この時間スケールは周期の単調関数ではなく, 断熱限界が近づくにつれてサブ構造が増加することが明らかとなった。この非単調性は、その後のランダウ・ツェナー遷移の干渉効果から生じる。巨大エネルギー振動は、窒素空洞中心での短期実験で観測可能である。

A qubit driven by two incommensurate frequencies can mediate a quantised average energy current in the adiabatic limit. We show that non-adiabatic processes result in reversals of the energy current and corresponding oscillations in the net energy transferred between the drives. The oscillations are bounded but giant -- much larger than the qubit energy splitting. A Landau-Zener analysis predicts that the timescale of the oscillations is exponentially large in the period of the drives. However, numerical analysis reveals that this timescale is not a monotonic function of the period, and has increasing sub-structure as the adiabatic limit is approached. We show that this non-monotonicity arises from interference effects between subsequent Landau-Zener transitions. Giant energy oscillations should be observable in near-term experiments with nitrogen-vacancy centers.

翻訳日:2023-04-05 16:55:53 公開日:2023-04-03

# 一元化画像の再生と拡張に先立つ生成拡散

Generative Diffusion Prior for Unified Image Restoration and Enhancement ( http://arxiv.org/abs/2304.01247v1 )

ライセンス: Link先を確認

Ben Fei, Zhaoyang Lyu, Liang Pan, Junzhe Zhang, Weidong Yang, Tianyue Luo, Bo Zhang, Bo Dai

(参考訳) 既存の画像復元法は主に自然画像の後方分布を利用する。しかし、彼らはしばしば既知の劣化を仮定し、複雑な実アプリケーションへの適応を制限するために教師付きトレーニングも要求する。本研究では,非教師付きサンプリング方式で後部分布を効果的にモデル化する生成拡散事前(GDP)を提案する。 GDPは、線形逆問題、非線形問題、盲目の問題を解決するために、プレトレインデノナイジング拡散生成モデル(DDPM)を利用する。具体的には、GDPは、一般的なガイダンス方法よりも実用的な条件付きガイダンスのプロトコルを体系的に探求する。さらに、GDPは、デノナイジング過程における劣化モデルのパラメータを最適化し、ブラインド画像復元を達成するのに長けている。さらに、階層的なガイダンスとパッチベースの手法を考案し、GDPが任意の解像度の画像を生成することを可能にする。実験では,低照度化やHDR画像回復などの非線形・盲点問題だけでなく,高分解能,色調,彩色などの線形問題に対する複数の画像データセットに対するGDPの汎用性を実証した。 GDPは、再建品質と知覚品質の様々なベンチマークにおいて、現在指導されていない主要な手法よりも優れています。さらに、GDPは、ImageNetトレーニングセットの分布から、様々なタスクから任意のサイズで、自然画像や合成画像に対してよく一般化する。

Existing image restoration methods mostly leverage the posterior distribution of natural images. However, they often assume known degradation and also require supervised training, which restricts their adaptation to complex real applications. In this work, we propose the Generative Diffusion Prior (GDP) to effectively model the posterior distributions in an unsupervised sampling manner. GDP utilizes a pre-trained denoising diffusion generative model (DDPM) for solving linear inverse, non-linear, or blind problems. Specifically, GDP systematically explores a protocol of conditional guidance, which is verified to be more practical than the commonly used guidance method. Furthermore, GDP is adept at optimizing the parameters of the degradation model during the denoising process, achieving blind image restoration. Besides, we devise hierarchical guidance and patch-based methods, enabling GDP to generate images of arbitrary resolutions. Experimentally, we demonstrate GDP's versatility on several image datasets for linear problems, such as super-resolution, deblurring, inpainting, and colorization, as well as non-linear and blind issues, such as low-light enhancement and HDR image recovery. GDP outperforms the current leading unsupervised methods on the diverse benchmarks in reconstruction quality and perceptual quality. Moreover, GDP also generalizes well to natural images or synthesized images with arbitrary sizes from various tasks outside the distribution of the ImageNet training set.

翻訳日:2023-04-05 16:55:42 公開日:2023-04-03

# 大規模言語モデルにおける安全性分析:ChatGPTを用いたSTPAの事例

Safety Analysis in the Era of Large Language Models: A Case Study of STPA using ChatGPT ( http://arxiv.org/abs/2304.01246v1 )

ライセンス: Link先を確認

Yi Qi, Xingyu Zhao, Xiaowei Huang

(参考訳) ChatGPTやBERTといった大規模言語モデル(LLM)は、多くの知識領域にわたる詳細な回答を備えた人間のような会話によって、新たなAI熱波を導いている。 LLMは多くのAIアプリケーションドメインに迅速に適用されていますが、私たちは次のような質問に興味を持っています。安全クリティカルなシステムの安全分析にLLMを活用できるか。この問いに答えるため、本稿では,ChatGPTを用いた自動緊急ブレーキ(AEB)システムにおけるシステム理論プロセス解析(STPA)の事例研究を行う。リスク分析において最も普及している技術の一つであるSTPAは,高い複雑性や主観性といった限界があることが知られており,本論文はChatGPTを用いて対処することを目的としている。具体的には、ChatGPTをSTPAに組み込む3つの方法について、ヒトの専門家との相互作用を考慮し検討した。比較の結果は (i)人間の専門家の介入なしにChatGPTを使用することは、LLMの信頼性及び精度の問題により不十分である。 (ii)ChatGPTと人間の専門家との相互作用が増えると、より良い結果が得られる可能性がある。 (iii)STPAにおけるChatGPTの使用は,既存の比較手法をベースラインに再利用することにより,ヒトの安全専門家を単独で上回りうる。安全分析にLLMを適用しようとする試みに加えて,今後の研究に向けた重要な課題(LLMの信頼性に関する懸念,標準化の必要性など)も挙げる。

Large Language Models (LLMs), such as ChatGPT and BERT, are leading a new AI heatwave due to their human-like conversations with detailed and articulate answers across many domains of knowledge. While LLMs are being quickly applied to many AI application domains, we are interested in the following question: Can safety analysis for safety-critical systems make use of LLMs? To answer, we conduct a case study of Systems Theoretic Process Analysis (STPA) on Automatic Emergency Brake (AEB) systems using ChatGPT. STPA, one of the most prevalent techniques for hazard analysis, is known to have limitations such as high complexity and subjectivity, which this paper explores addressing with ChatGPT. Specifically, three ways of incorporating ChatGPT into STPA are investigated by considering its interaction with human experts: one-off simplex interaction, recurring simplex interaction, and recurring duplex interaction. Comparative results reveal that: (i) using ChatGPT without human experts' intervention can be inadequate due to reliability and accuracy issues of LLMs; (ii) more interactions between ChatGPT and human experts may yield better results; and (iii) using ChatGPT in STPA with extra care can outperform human safety experts alone, as demonstrated by reusing an existing comparison method with baselines. In addition to making the first attempt to apply LLMs in safety analysis, this paper also identifies key challenges (e.g., trustworthiness concerns of LLMs, the need for standardisation) for future research in this direction.

翻訳日:2023-04-05 16:55:03 公開日:2023-04-03

# 自律型サイバーエージェントのための統一エミュレーションシミュレーション訓練環境

Unified Emulation-Simulation Training Environment for Autonomous Cyber Agents ( http://arxiv.org/abs/2304.01244v1 )

ライセンス: Link先を確認

Li Li, Jean-Pierre S. El Rami, Adrian Taylor, James Hailing Rao, and Thomas Kunz

(参考訳) 自律型サイバーエージェントは、エージェントが代表的環境で訓練される強化および深層強化学習(RL/DRL)を適用して開発することができる。トレーニング環境は、エージェントが探索しようとするネットワークサイバーオペレーション(CyOp)の高忠実度をシミュレートする必要がある。ネットワークのcyopsの複雑さを考えると、良いシミュレータは達成が難しい。本研究は,Cyber Gym for Intelligent Learning (CyGIL)において,高忠実度シミュレータを自動生成する手法を提案する。表現学習と連続学習を通じて、CyGIL-EがシミュレートされたCyGIL-Sを自動的に生成する統一されたCyOpトレーニング環境を提供する。シミュレータ生成はエージェントトレーニングプロセスと統合され、必要なエージェントトレーニング時間を更に短縮する。 CyGIL-Sで訓練されたエージェントは、エミュレートされた「リアル」ネットワークへの完全な転送性を示すCyGIL-Eに直接転送可能である。サイギルトレーニング性能を実証するために実験を行った。オフラインでRLを実行するCyGILソリューションは、現実のサイバーネットワークでRLエージェントを活用するためのsim-to-realに向けた有望な方向を示す。

Autonomous cyber agents may be developed by applying reinforcement and deep reinforcement learning (RL/DRL), where agents are trained in a representative environment. The training environment must simulate with high fidelity the network Cyber Operations (CyOp) that the agent aims to explore. Given the complexity of network CyOps, a good simulator is difficult to achieve. This work presents a systematic solution to automatically generate a high-fidelity simulator in the Cyber Gym for Intelligent Learning (CyGIL). Through representation learning and continuous learning, CyGIL provides a unified CyOp training environment where an emulated CyGIL-E automatically generates a simulated CyGIL-S. The simulator generation is integrated with the agent training process to further reduce the required agent training time. The agent trained in CyGIL-S is transferable directly to CyGIL-E, showing full transferability to the emulated "real" network. Experimental results are presented to demonstrate the CyGIL training performance. Enabling offline RL, the CyGIL solution presents a promising direction towards sim-to-real for leveraging RL agents in real-world cyber networks.

翻訳日:2023-04-05 16:54:29 公開日:2023-04-03

# CoReFusion: 誘導熱超解法のための対照的な正則核融合

CoReFusion: Contrastive Regularized Fusion for Guided Thermal Super-Resolution ( http://arxiv.org/abs/2304.01243v1 )

ライセンス: Link先を確認

Aditya Kasliwal, Pratinav Seth, Sriya Rallabandi and Sanchit Singhal

(参考訳) 熱画像は低照度環境でよく機能するため、通常の可視域撮像に比べて多くの利点がある。超解像アプローチは、低コスト・低解像熱センサによる測定を用いて正確な高解像熱画像の再現により、その有用性を広げることができる。画像間のスペクトル範囲ミスマッチのため、可視範囲画像を用いた熱画像の誘導超解像は困難である。しかし、可視範囲画像のキャプチャに失敗した場合、重要な領域でのアプリケーションの動作を防止できる。熱画像のガイド超解像のための新しいデータ融合フレームワークと正規化手法を提案する。提案するアーキテクチャは,高分解能rgb画像や低分解能熱画像の1つが欠落しているにも関わらず,性能を維持できるとともに,計算能力に優れ,軽量であり,欠落データの存在下では堅牢に設計されている。提案手法は,実世界のシナリオにおいてしばしば発生する欠落モダリティ問題に対する有望な解決法である。コードは https://github.com/Kasliwal17/CoReFusion で入手できる。

Thermal imaging has numerous advantages over regular visible-range imaging since it performs well in low-light circumstances. Super-resolution approaches can broaden its usefulness by replicating accurate high-resolution thermal pictures using measurements from low-cost, low-resolution thermal sensors. Because of the spectral range mismatch between the images, guided super-resolution of thermal images utilizing visible-range images is difficult. Moreover, a failure to capture visible-range images can prevent applications in critical areas from operating. We present a novel data fusion framework and regularization technique for guided super-resolution of thermal images. The proposed architecture is computationally inexpensive and lightweight, with the ability to maintain performance despite missing one of the modalities, i.e., the high-resolution RGB image or the lower-resolution thermal image, and is designed to be robust in the presence of missing data. The proposed method presents a promising solution to the frequently occurring problem of missing modalities in a real-world scenario. Code is available at https://github.com/Kasliwal17/CoReFusion.

翻訳日:2023-04-05 16:54:08 公開日:2023-04-03

# マルチチャネル不均一学習によるエビデンスグラフを用いた臨床エビデンス勧告の強化

Enhancing Clinical Evidence Recommendation with Multi-Channel Heterogeneous Learning on Evidence Graphs ( http://arxiv.org/abs/2304.01242v1 )

ライセンス: Link先を確認

Maolin Luo, and Xiang Zhang

(参考訳) 臨床証拠には、患者の関連や影響、介入(薬物や理学療法など)、問題、結果が含まれる。臨床エビデンスを推奨する目標は、医療従事者に意思決定プロセスを支援するための関連する情報を提供し、新たな証拠を生み出すことである。具体的なタスクは、臨床問題に基づくエビデンスを推奨することに集中します。しかし、特定の臨床的問題と関連する証拠との直接的なつながりは、しばしば疎結合の課題となる。また、適切な証拠を推薦するには、証拠間のトポロジカルな関係とそれらを記述するテキスト情報の両方を共同利用することが不可欠である。これらの課題に対処するために,エビデンス共参照グラフとエビデンステキストグラフという2つの知識グラフを定義し,各要素間のトポロジ的および言語的関係を表現する。また,エビデンス・レコメンデーションにおける共参照テキストの不均質性を扱うために,多チャンネル不均質学習モデルと融合注意機構を提案する。実験により,オープンデータ上での最先端手法に勝ることを示す。

Clinical evidence encompasses the associations and impacts between patients, interventions (such as drugs or physiotherapy), problems, and outcomes. The goal of recommending clinical evidence is to provide medical practitioners with relevant information to support their decision-making processes and to generate new evidence. Our specific task focuses on recommending evidence based on clinical problems. However, the direct connections between certain clinical problems and related evidence are often sparse, creating a challenge of link sparsity. Additionally, to recommend appropriate evidence, it is essential to jointly exploit both topological relationships among evidence and textual information describing them. To address these challenges, we define two knowledge graphs: an Evidence Co-reference Graph and an Evidence Text Graph, to represent the topological and linguistic relations among evidential elements, respectively. We also introduce a multi-channel heterogeneous learning model and a fusional attention mechanism to handle the co-reference-text heterogeneity in evidence recommendation. Our experiments demonstrate that our model outperforms state-of-the-art methods on open data.

翻訳日:2023-04-05 16:53:52 公開日:2023-04-03

# ドラヴィダ語におけるホモフォビア・トランスフォビアの検出:深層学習手法の探索

Detection of Homophobia & Transphobia in Dravidian Languages: Exploring Deep Learning Methods ( http://arxiv.org/abs/2304.01241v1 )

ライセンス: Link先を確認

Deepawali Sharma, Vedika Gupta, Vivek Kumar Singh

(参考訳) オンラインソーシャルメディアプラットフォームにおける乱用コンテンツの増加は、オンラインユーザーの社会生活に影響を与えている。攻撃的・憎悪的な言葉の使用は、いわゆるメディアを有害なものにしている。ホモフォビアとトランスフォビアはLGBT+コミュニティに対する攻撃的なコメントを構成している。これらのコメントを検知し、処理し、タイムリーにフラグを立てたり、ユーザーに対して警告を発したりすることが必須になる。しかし,このようなコンテンツの自動検出は,低リソース言語として認識されるドラヴィダ語では難しい課題である。そこで本論文は,マラヤラムとタミル・ランゲージのソーシャルメディアコメントをホモフォビック,トランスフォビック,非LGBT+コンテンツとして分類するために,異なるディープラーニングモジュールの適用性を検討する。一般的なディープラーニングモデルである畳み込みニューラルネットワーク(CNN)、GloVe埋め込みを用いたLong Short Term Memory(LSTM)、トランスフォーマーベース学習モデル(Multilingual BERTおよびIndicBERT)を分類問題に適用する。その結果、IndicBERTはマラヤラムとタミルでそれぞれ0.86と0.77の重み付き平均F1スコアで他の不完全なモデルよりも優れていた。そこで本研究では,選択したドラヴィダ語言語における与えられたタスクに対するIndicBERTのより高い性能を確認した。

The increase in abusive content on online social media platforms is impacting the social life of online users. Use of offensive and hate speech has been making social media toxic. Homophobia and transphobia constitute offensive comments against the LGBT+ community. It becomes imperative to detect and handle these comments, to timely flag or issue a warning to users indulging in such behaviour. However, automated detection of such content is a challenging task, more so in Dravidian languages which are identified as low resource languages. Motivated by this, the paper attempts to explore the applicability of different deep learning models for classification of social media comments in the Malayalam and Tamil languages as homophobic, transphobic and non-anti-LGBT+ content. The popularly used deep learning models, namely Convolutional Neural Network (CNN), Long Short Term Memory (LSTM) using GloVe embedding, and transformer-based learning models (Multilingual BERT and IndicBERT), are applied to the classification problem. Results obtained show that IndicBERT outperforms the other implemented models, with obtained weighted average F1-scores of 0.86 and 0.77 for Malayalam and Tamil, respectively. Therefore, the present work confirms higher performance of IndicBERT on the given task in selected Dravidian languages.
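
The weighted average F1-score reported above is usually understood as the per-class F1 averaged with weights proportional to each class's number of true examples. A small sketch under that assumption follows; the function name `weighted_f1` and the label strings in the usage lines are illustrative only and are not taken from the paper's code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of true labels.
        score += (n_true / total) * f1
    return score


# Illustrative usage with made-up labels for the three classes.
gold = ["homophobic", "transphobic", "non-anti-LGBT+", "non-anti-LGBT+"]
pred = ["homophobic", "non-anti-LGBT+", "non-anti-LGBT+", "non-anti-LGBT+"]
score = weighted_f1(gold, pred)
```

Weighting by support matters for this kind of task because the non-offensive class typically dominates, so a plain (macro) average and the weighted average can differ noticeably.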

翻訳日:2023-04-05 16:53:34 公開日:2023-04-03

# 非生成エネルギーモデル

Non-Generative Energy Based Models ( http://arxiv.org/abs/2304.01297v1 )

ライセンス: Link先を確認

Jacob Piland and Christopher Sweet and Priscila Saboia and Charles Vardeman II and Adam Czajka

(参考訳) エネルギーベースモデル(EBM)はコンピュータビジョンの中でますます人気が高まっている。 ebmsはディープニューラルネットワーク(dnn)のトレーニングに確率的アプローチをもたらし、キャリブレーション、分散検出、敵対的抵抗といった分野でのパフォーマンスを向上させることが示されている。しかし、これらの利点は入力データ確率を推定するコストを伴い、通常、Stochastic Gradient Langevin Dynamics (SGLD) のようなランゲヴィンベースの手法を用いて、計算コストを増大させ、パラメータ化を必要とし、効率のキャッシング方法を必要とし、安定性とスケーリングの問題に陥る。 EBMは動的手法を用いて、ネットワークの現在の状態によって定義された確率密度関数(PDF)からサンプルを抽出し、それらを最大ログ確率アプローチを用いてトレーニングデータと比較し、正しいPDFを学習する。本稿では,Grathwohlらによって同定された「Non-Generative EBM(NG-EBM)」を,トレーニングを指示する損失項として用いた非生成的EBM(Non-Generative EBM)を提案する。我々のNG-EBMトレーニング戦略は、従来の手法の計算複雑性やオーバーヘッドを伴わずに、校正、分布外検出、対向抵抗におけるEMMの利点の多くを維持していることを示す。特に、NG-EBMアプローチは、従来の訓練されたモデルと比較して、CIFAR10の2.5倍、CIFAR100の7.5倍の予測校正誤差を改善する。

Energy-based models (EBM) have become increasingly popular within computer vision. EBMs bring a probabilistic approach to training deep neural networks (DNN) and have been shown to enhance performance in areas such as calibration, out-of-distribution detection, and adversarial resistance. However, these advantages come at the cost of estimating input data probabilities, usually using a Langevin-based method such as Stochastic Gradient Langevin Dynamics (SGLD), which brings additional computational costs, requires parameterization and caching methods for efficiency, and can run into stability and scaling issues. EBMs use dynamical methods to draw samples from the probability density function (PDF) defined by the current state of the network and compare them to the training data using a maximum log likelihood approach to learn the correct PDF. We propose a non-generative training approach, Non-Generative EBM (NG-EBM), that utilizes the Approximate Mass, identified by Grathwohl et al., as a loss term to direct the training. We show that our NG-EBM training strategy retains many of the benefits of EBM in calibration, out-of-distribution detection, and adversarial resistance, but without the computational complexity and overhead of the traditional approaches. In particular, the NG-EBM approach improves the Expected Calibration Error by a factor of 2.5 for CIFAR10 and 7.5 for CIFAR100, when compared to traditionally trained models.
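
As background for the calibration figures above, the Expected Calibration Error is commonly computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin. The sketch below assumes 10 equal-width bins, which is a frequent default but is not stated in the abstract; the function name is likewise an assumption.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin size / N) * |accuracy - mean confidence|.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct:     booleans, True where the prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins [k/n_bins, (k+1)/n_bins); 1.0 goes to the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

A perfectly calibrated model has an ECE of zero, so the improvement factors quoted in the abstract refer to how much smaller this gap becomes.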

翻訳日:2023-04-05 16:47:23 公開日:2023-04-03

# purkinje画像とmlアルゴリズムを用いた動的調節計測

Dynamic Accommodation Measurement using Purkinje Images and ML Algorithms ( http://arxiv.org/abs/2304.01296v1 )

ライセンス: Link先を確認

Faik Ozan Ozhan, Arda Gulersoy, Ugur Aygun, Afsun Sahin, Hakan Urey

(参考訳) 本研究では,ARおよび眼科応用に適した4つのPurkinjeリフレクション(PR)に基づく動的視線および調節測定装置の試作を行った。 PR1&2とPR3&4は、それぞれ正確な視線測定と調節測定に使用される。眼模型はZEMAXで開発され,実験結果とよく一致した。モデルは、0.25d以上の精度で4つのディプターから1つのディプターへの調節を予測している。再現性テストを行い,被験者から正確な視線と調節推定値を得た。我々は物理的に正確なモデルと機械学習を用いて大規模な合成データセットを作成している。

We developed a prototype device for dynamic gaze and accommodation measurements based on 4 Purkinje reflections (PR) suitable for use in AR and ophthalmology applications. PR1&2 and PR3&4 are used for accurate gaze and accommodation measurements, respectively. Our eye model was developed in ZEMAX and matches the experiments well. Our model predicts the accommodation from 4 diopters to 1 diopter with better than 0.25D accuracy. We performed repeatability tests and obtained accurate gaze and accommodation estimations from subjects. We are generating a large synthetic data set using physically accurate models and machine learning.

翻訳日:2023-04-05 16:46:55 公開日:2023-04-03

# Prompt-Tuning を用いた会話課題の言語間移動学習の効率化

Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning ( http://arxiv.org/abs/2304.01295v1 )

ライセンス: Link先を確認

Lifu Tu, Jin Qu, Semih Yavuz, Shafiq Joty, Wenhao Liu, Caiming Xiong, Yingbo Zhou

(参考訳) 英語のような高リソース言語で訓練された言語モデルの言語間移動は、多くのNLPタスクで広く研究されているが、会話タスクに焦点が当てられているのは比較的限られている。これは、非英語の会話データを取得するコストが高いためであり、カバー範囲は限られている。本稿では、英語のみのスキーマガイド対話(SGD)データセット(Rastogi et al., 2020)を105言語に翻訳することで、並列かつ大規模多言語会話データセットであるXSGDを紹介する。 xsgdは言語毎に約330k発話を含む。そこで我々は,アライメントプロンプトを学習する効率的なプロンプトチューニング手法を開発した。また、NLIベースとバニラ分類器の2つの異なる分類器と、アライメントされたプロンプトによって可能となる言語間のテスト機能についても検討する。我々は,2つの会話タスク(スロットフィルングとインテント分類)における言語横断的一般化能力を評価する。提案手法は,NLIに基づく分類器のモデリング能力の強化と,アライメントプロンプトによる言語間移動の大幅な改善,特に数ショット設定において実現された。

Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks, but focus on conversational tasks has been rather limited. This is partly due to the high cost of obtaining non-English conversational data, which results in limited coverage. In this work, we introduce XSGD, a parallel and large-scale multilingual conversation dataset that we created by translating the English-only Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2020) into 105 other languages. XSGD contains approximately 330k utterances per language. To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts. We also investigate two different classifiers: NLI-based and vanilla classifiers, and test cross-lingual capability enabled by the aligned prompts. We evaluate our model's cross-lingual generalization capabilities on two conversation tasks: slot-filling and intent classification. Our results demonstrate the strong and efficient modeling ability of NLI-based classifiers and the large cross-lingual transfer improvements achieved by our aligned prompts, particularly in few-shot settings.

翻訳日:2023-04-05 16:46:48 公開日:2023-04-03

# ガウス過程による非線形PDEのスパースコレスキー分解

Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes ( http://arxiv.org/abs/2304.01294v1 )

ライセンス: Link先を確認

Yifan Chen, Houman Owhadi, Florian Schäfer

(参考訳) 一般非線形偏微分方程式(PDE)を解くためのガウス過程(GP)フレームワークの計算スケーラビリティについて検討する。この枠組みはPDEを非線形制約で2次最適化問題に変換する。その複雑性のボトルネックは、GPの共分散核とその偏微分のコロケーション点での点での評価から得られる高密度なカーネル行列による計算にある。ディラックスと微分測定の新しい順序付けの下で、コレスキー因子のほぼ疎度に基づいて、そのようなカーネル行列に対するスパースチョレスキー分解アルゴリズムを提案する。我々は,スパルシリティパターンを厳密に同定し,kullback-leiblerダイバージェンスにおいて最適であるgpの対応するvecchia近似の指数収束精度を定量化する。これにより、空間上の複雑性 $o(n\log^d(n/\epsilon))$ と時間内に $o(n\log^{2d}(n/\epsilon))$ を持つカーネル行列の逆コレスキー係数を計算できる。スパース因子により、勾配に基づく最適化手法はスケーラブルになる。さらに、しばしばより効率的なガウス・ニュートン法を用いることで、線形系を解くために、縮小されたカーネル行列のスパース係数と共役勾配アルゴリズムを適用することができる。非線形楕円型, バーガー型, モンジュアンプ型といった幅広い非線形pdesに対して, アルゴリズムの近似空間/時間複雑性を数値的に示す。要約すると、GPで一般的なPDEを解くための高速でスケーラブルで正確な方法を提供する。

We study the computational scalability of a Gaussian process (GP) framework for solving general nonlinear partial differential equations (PDEs). This framework transforms solving PDEs into solving a quadratic optimization problem with nonlinear constraints. Its complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel of the GP and its partial derivatives at collocation points. We present a sparse Cholesky factorization algorithm for such kernel matrices based on the near-sparsity of the Cholesky factor under a new ordering of Diracs and derivative measurements. We rigorously identify the sparsity pattern and quantify the exponentially convergent accuracy of the corresponding Vecchia approximation of the GP, which is optimal in the Kullback-Leibler divergence. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. With the sparse factors, gradient-based optimization methods become scalable. Furthermore, we can use the oftentimes more efficient Gauss-Newton method, for which we apply the conjugate gradient algorithm with the sparse factor of a reduced kernel matrix as a preconditioner to solve the linear system. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations. In summary, we provide a fast, scalable, and accurate method for solving general PDEs with GPs.

翻訳日:2023-04-05 16:46:28 公開日:2023-04-03

# ウェアラブルセンサを用いた社会不安者の社会的文脈におけるマルチモーダル生理的反応

Wearable Sensor-based Multimodal Physiological Responses of Socially Anxious Individuals across Social Contexts ( http://arxiv.org/abs/2304.01293v1 )

ライセンス: Link先を確認

Emma R. Toner, Mark Rucker, Zhiyuan Wang, Maria A. Larrazabal, Lihua Cai, Debajyoti Datta, Elizabeth Thompson, Haroon Lone, Mehdi Boukhechba, Bethany A. Teachman, and Laura E. Barnes

(参考訳) 受動的に装着されたセンサーから個人の社会的コンテキストを正しく識別することは、社会的不安障害の治療にジャスト・イン・タイム適応的介入(JITAI)を提供することを約束する。本研究では,異なる社会的文脈における生理的反応(他者との比較),社会的相(前・後・相互作用対相互作用),社会的相互作用のサイズ(ダイアディック対グループインタラクション),社会的脅威(暗黙対社会的評価)のレベルについて,受動的に収集したデータを用いて評価した。この研究の参加者(46ドル)は、社会的相互作用不安尺度(80ドル中34ドル)で評価された中程度から重度の社会不安症状を報告した。多変量ランダムフォレストモデルとフォローアップ・クラスター分析を用いて,社会・非社会の異なる状況における生理的反応パターンを検討した。以上の結果から,社会的文脈は,社会的フェーズ,グループサイズ,社会的脅威のレベルよりも確実に区別できるが,これらの区別可能な文脈の中にも,生理的反応パターンにかなりの変動があることが示唆された。実世界のコンテキスト検出とJITAIの展開について論じる。

Correctly identifying an individual's social context from passively worn sensors holds promise for delivering just-in-time adaptive interventions (JITAIs) to treat social anxiety disorder. In this study, we present results using passively collected data from a within-subject experiment that assessed physiological response across different social contexts (i.e., alone vs. with others), social phases (i.e., pre- and post-interaction vs. during an interaction), social interaction sizes (i.e., dyadic vs. group interactions), and levels of social threat (i.e., implicit vs. explicit social evaluation). Participants in the study ($N=46$) reported moderate to severe social anxiety symptoms as assessed by the Social Interaction Anxiety Scale ($\geq$34 out of 80). Univariate paired difference tests, multivariate random forest models, and follow-up cluster analyses were used to explore physiological response patterns across different social and non-social contexts. Our results suggest that social context is more reliably distinguishable than social phase, group size, or level of social threat, but that there is considerable variability in physiological response patterns even among these distinguishable contexts. Implications for real-world context detection and deployment of JITAIs are discussed.

翻訳日:2023-04-05 16:46:00 公開日:2023-04-03

# 知覚者による3次元における境界ボックスによる単眼3次元物体検出

Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver ( http://arxiv.org/abs/2304.01289v1 )

ライセンス: Link先を確認

Xianpeng Liu, Ce Zheng, Kelvin Cheng, Nan Xue, Guo-Jun Qi, Tianfu Wu

(参考訳) 単眼3次元物体検出の主な課題は3次元中心の正確な位置決めである。理想的な場合において,この課題を3次元空間の局所グリッド探索方式で修復できるという新たな強みに感化されて,2次元から3次元までの情報フローと3次元から2次元のコンテキストを重畳した3次元から2次元までの情報フローを組み合わせたステージワイズ手法を提案する。具体的には、まず、市販のバックボーン単分子3D検出器から最初の提案を得る。次に,初期提案から局所グリッドサンプリングにより3次元アンカー空間を生成する。最後に、3D-to-2D提案の検証段階で3Dバウンディングボックスをデノナイズする。本稿では,重なり合う提案を識別する識別的特徴を効果的に学習するために,Perceiver I/Oモデルを用いて3次元から2次元の幾何学的情報と2次元の外観情報を融合させる手法を提案する。提案のエンコードされた潜在表現により、検証ヘッドは自己アテンションモジュールで実装される。提案手法はMonoXiverと命名され, 背骨単分子3D検出器に容易に適用可能である。確立されたKITTIデータセットと挑戦的な大規模Waymoデータセットの実験結果から、MonoXiverは計算オーバーヘッドを制限して一貫して改善を達成している。

The main challenge of monocular 3D object detection is the accurate localization of the 3D center. Motivated by a new and strong observation that this challenge can be remedied by a 3D-space local-grid search scheme in an ideal case, we propose a stage-wise approach, which combines the information flow from 2D-to-3D (3D bounding box proposal generation with a single 2D image) and 3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. Specifically, we first obtain initial proposals from off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor space by local-grid sampling from the initial proposals. Finally, we perform 3D bounding box denoising at the 3D-to-2D proposal verification stage. To effectively learn discriminative features for denoising highly overlapped proposals, this paper presents a method of using the Perceiver I/O model to fuse the 3D-to-2D geometric information and the 2D appearance information. With the encoded latent representation of a proposal, the verification head is implemented with a self-attention module. Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector. Experimental results on the well-established KITTI dataset and the challenging large-scale Waymo dataset show that MonoXiver consistently achieves improvement with limited computation overhead.

翻訳日:2023-04-05 16:45:38 公開日:2023-04-03
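
The local-grid sampling step in the MonoXiver abstract above can be pictured with a minimal sketch. The function name, the grid spacing `step`, and the half-width `k` are hypothetical choices for illustration; the abstract does not state the actual grid layout, and the verification stage (Perceiver I/O plus self-attention) is not shown.

```python
from itertools import product


def local_grid_anchors(center, step=0.5, k=1):
    """Return candidate 3D centers on a (2k+1)^3 grid around `center`.

    `step` (grid spacing) and `k` (grid half-width) are hypothetical
    parameters chosen for illustration, not values from the paper.
    """
    cx, cy, cz = center
    offsets = [i * step for i in range(-k, k + 1)]
    return [(cx + dx, cy + dy, cz + dz)
            for dx, dy, dz in product(offsets, repeat=3)]


# One initial proposal from a backbone detector (made-up coordinates).
anchors = local_grid_anchors((1.0, 1.5, 20.0), step=0.5, k=1)
print(len(anchors))  # 27 candidates, including the initial proposal
```

Each candidate would then be scored by a verification model, and the best-scoring box kept as the denoised result.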

# x-time:camsによる表データ機械学習を高速化するインメモリエンジン

X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs ( http://arxiv.org/abs/2304.01285v1 )

ライセンス: Link先を確認

Giacomo Pedretti, John Moon, Pedro Bruel, Sergey Serebryakov, Ron M. Roth, Luca Buonanno, Tobias Ziegler, Cong Xu, Martin Foltin, Jim Ignowski, Catherine E. Graves

(参考訳) データ構造は、データ科学において最も一般的な形式である。ディープラーニングモデルは、画像や音声などの非構造化データから学習することが証明されているが、表データから学習する場合の単純なアプローチよりも正確ではない。対照的に、現代的なツリーベース機械学習(ML)モデルでは、構造化データから関連する情報を抽出する。データサイエンスにおける必須要件は、例えば、科学的な発見を加速するためにシミュレーションを伴うクローズドループでモデルが使用される場合のモデル推論レイテンシを低減することである。しかしながら、ハードウェアアクセラレーションコミュニティは、主にディープニューラルネットワークに焦点を当てており、他の機械学習形式を無視している。これまでの研究では、ランダムフォレストを効率的にマッピングするためにアナログコンテンツアドレスメモリ(CAM)コンポーネントが用いられてきた。本研究では,XGBoostやCatBoostといった最先端のツリーベースMLモデルの推論を可能にする,新たな精度向上型アナログCAMと,チップ上のプログラマブルネットワークを実装した,アナログデジタルアーキテクチャ全般に焦点をあてる。 16nm技術で1チップで評価した結果、最先端のGPUと比較して119倍のレイテンシが9740倍、ピーク電力は19Wであった。

Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM and a programmable network on chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19W peak power consumption.

翻訳日:2023-04-05 16:45:15 公開日:2023-04-03
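
To make the idea of mapping decision trees onto a CAM more concrete, here is a small software emulation. It assumes that each root-to-leaf path is stored as one row of per-feature intervals and that a query matches a row when every feature falls inside its interval. The toy tree, the row layout, and the function names are illustrative assumptions and are not taken from the X-TIME design; real hardware would compare all rows in parallel rather than in a loop.

```python
import math

# Each CAM row encodes one root-to-leaf path as a per-feature interval
# [low, high) plus the leaf value. This toy tree is hypothetical.
INF = math.inf
CAM_ROWS = [
    # feature 0        feature 1        leaf value
    ((-INF, 2.0), (-INF, INF), 0.1),   # x0 < 2
    ((2.0, INF), (-INF, 5.0), 0.7),    # x0 >= 2 and x1 < 5
    ((2.0, INF), (5.0, INF), -0.3),    # x0 >= 2 and x1 >= 5
]


def cam_lookup(rows, query):
    """Return leaf values of every row whose intervals all contain the query."""
    hits = []
    for *intervals, leaf in rows:
        if all(lo <= x < hi for (lo, hi), x in zip(intervals, query)):
            hits.append(leaf)
    return hits


def predict(rows, query):
    """Sum matched leaves, as a boosted ensemble would sum per-tree outputs."""
    return sum(cam_lookup(rows, query))
```

With rows from several trees stored side by side, exactly one row per tree matches a query, so summing the hits gives the ensemble output.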

# 信仰・知識・証拠

Belief, knowledge and evidence ( http://arxiv.org/abs/2304.01283v1 )

ライセンス: Link先を確認

Steffen Lewitzka and Vin\'icius Pinto

(参考訳) 本稿では,信念と知識というよく知られた古典的認識論的概念と,直感的原理 \textit{`evidences belief and knowledge'} が満たされる証拠の概念を組み合わせた論理体系を提案する。我々のアプローチは、最初の著者である『lecite{lewjlc2, lewigpl, lewapal}』の以前の著作に依拠しており、直観主義的真理(すなわち『textit{proof}』)の推論のためのS5$スタイルの原理と、その体系を『textit{intuitionistic}』の信念と知識の概念と組み合わせたものである。我々は、この組み合わせシステムを考慮し、構築的概念である \textit{proof} を、古典的な概念である \textit{evidence} に置き換える。この結果、モダルシステム$S5$と古典的なエピステミック原理を組み合わせた論理となり、$\square\varphi$はエピステミックな意味で '$\varphi$ is obvious' と読む。文献に見られる通常の可能な世界意味論とは対照的に、我々は、信念と知識がアクセシビリティの関係によってモデル化されるのではなく、直接命題の集合(世界の集合)としてモデル化される関係性に基づく意味論を提案する。

We present a logical system that combines the well-known classical epistemic concepts of belief and knowledge with a concept of evidence such that the intuitive principle \textit{`evidence yields belief and knowledge'} is satisfied. Our approach relies on previous works of the first author \cite{lewjlc2, lewigpl, lewapal} who introduced a modal system containing $S5$-style principles for reasoning about intuitionistic truth (i.e. \textit{proof}) and, inspired by \cite{artpro}, combined that system with concepts of \textit{intuitionistic} belief and knowledge. We consider that combined system and replace the constructive concept of \textit{proof} with a classical notion of \textit{evidence}. This results in a logic that combines modal system $S5$ with classical epistemic principles where $\square\varphi$ reads as `$\varphi$ is evident' in an epistemic sense. Inspired by \cite{lewapal}, and in contrast to the usual possible worlds semantics found in the literature, we propose here a relational, frame-based semantics where belief and knowledge are not modeled via accessibility relations but directly as sets of propositions (sets of sets of worlds).

翻訳日:2023-04-05 16:44:54 公開日:2023-04-03

# PEACH:半教師付き擬似パラレル文書生成による翻訳のための事前学習シーケンスとシーケンスの多言語モデル

PEACH: Pre-Training Sequence-to-Sequence Multilingual Models for Translation with Semi-Supervised Pseudo-Parallel Document Generation ( http://arxiv.org/abs/2304.01282v1 )

ライセンス: Link先を確認

Alireza Salemi, Amirhossein Abaskohi, Sara Tavakoli, Yadollah Yaghoobzadeh, Azadeh Shakery

(参考訳) 多言語プレトレーニングは、機械翻訳を含む多言語nlpタスクを著しく改善する。既存の手法の多くは、モノリンガルデータに基づくマスク付き言語モデリングとテキストデノベーションの目的に基づくものである。モノリンガルデータに対する多言語事前学習は、多くの言語ペアにおける並列データの可用性を無視する。また、利用可能な人間の生成した並列翻訳データを事前学習に組み込む研究もある。この種の並列データは間違いなく役に立つが、高リソースの言語ペアであっても制限されている。本稿では,多言語事前学習のための高品質な擬似並列データを生成する,新しい半教師付きSPDGを提案する。まず、単語の順序付け、追加、削除、置換のために単言語データに対して述語モデルを事前訓練し、予め学習した文書の品質を高める。そして、単語間翻訳のための辞書を用いて事前学習文書ごとに異なる擬似翻訳を生成し、事前学習された復調モデルを適用する。次に、擬似並列データを用いて、多言語列列列モデルのPEACHを事前学習する。 PEACHは, 教師付き, ゼロショット, 少数ショットのシナリオを含む様々な翻訳タスクにおいて, mT5 と mBART のトレーニングに使用されている既存手法よりも優れていることを示す。さらに、PEACHが類似言語間で知識を伝達する能力は、低リソース言語に特に有用である。 PEACHは,精度の高い擬似並列を生成するための高品質な辞書を用いて,低リソース言語に有用であることを示す。

Multilingual pre-training significantly improves many multilingual NLP tasks, including machine translation. Most existing methods are based on some variants of masked language modeling and text-denoising objectives on monolingual data. Multilingual pre-training on monolingual data ignores the availability of parallel data in many language pairs. Also, some other works integrate the available human-generated parallel translation data in their pre-training. This kind of parallel data is definitely helpful, but it is limited even in high-resource language pairs. This paper introduces a novel semi-supervised method, SPDG, that generates high-quality pseudo-parallel data for multilingual pre-training. First, a denoising model is pre-trained on monolingual data to reorder, add, remove, and substitute words, enhancing the pre-training documents' quality. Then, we generate different pseudo-translations for each pre-training document using dictionaries for word-by-word translation and applying the pre-trained denoising model. The resulting pseudo-parallel data is then used to pre-train our multilingual sequence-to-sequence model, PEACH. Our experiments show that PEACH outperforms existing approaches used in training mT5 and mBART on various translation tasks, including supervised, zero- and few-shot scenarios. Moreover, PEACH's ability to transfer knowledge between similar languages makes it particularly useful for low-resource languages. Our results demonstrate that with high-quality dictionaries for generating accurate pseudo-parallel data, PEACH can be valuable for low-resource languages.

翻訳日:2023-04-05 16:44:24 公開日:2023-04-03
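
The dictionary-based first half of the pseudo-translation step described in the PEACH abstract above can be sketched as follows. The toy English-to-Spanish dictionary and the function name are hypothetical. In the method as described, a pre-trained denoising model would then reorder, add, remove, and substitute words to clean up this rough output; that second stage is not shown.

```python
def word_by_word(sentence, dictionary):
    """Replace each token using `dictionary`; keep unknown tokens unchanged."""
    return " ".join(dictionary.get(tok, tok) for tok in sentence.split())


# Hypothetical toy English-to-Spanish dictionary, for illustration only.
EN_ES = {"the": "el", "cat": "gato", "sleeps": "duerme"}

pseudo = word_by_word("the cat sleeps soundly", EN_ES)
print(pseudo)  # el gato duerme soundly
```

The untranslated word in the output shows why dictionary quality matters: every gap in the dictionary leaves source-language text for the denoiser to repair.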

# 知識抽出による自己異種統合による長期視覚認識

Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation ( http://arxiv.org/abs/2304.01279v1 )

ライセンス: Link先を確認

Yan Jin, Mengke LI, Yang Lu, Yiu-ming Cheung, Hanzi Wang

(参考訳) 深層ニューラルネットワークは、ここ数十年で大きな進歩を遂げている。しかしながら、現実世界のデータはしばしば長い尾の分布を示すため、バニラディープモデルは多数派に大きく偏っている傾向にある。この問題に対処するため、最先端の手法は通常、ロングテール分布の異なる部分に焦点を当てるために専門家(moe)の混合を採用する。これらの手法のエキスパートはモデル深度が同じであり、異なるクラスが異なる深さのモデルに適合するように異なる好みを持つという事実を無視する。そこで本研究では,知識抽出を用いた自己異種統合法(SHIKE)を提案する。まず,異なる浅い部分と1つのネットワークの深い部分の間で特徴を融合するために,dkf(deep-wise knowledge fusion)を提案する。 dkfに基づき、我々はさらに、moeフレームワークのテールクラスに無視できない影響を持つ最も難しい負のクラスの影響を減らすために、動的知識伝達(dkt)を提案します。その結果、特に尾のクラスにおいて、長い尾のデータの分類精度を著しく向上させることができる。 SHIKEはCIFAR100-LT (IF100), ImageNet-LT, iNaturalist 2018, Places-LTで56.3%, 60.3%, 75.4%, 41.9%の最先端性能を達成した。

Deep neural networks have made huge progress in the last few decades. However, as the real-world data often exhibits a long-tailed distribution, vanilla deep models tend to be heavily biased toward the majority classes. To address this problem, state-of-the-art methods usually adopt a mixture of experts (MoE) to focus on different parts of the long-tailed distribution. Experts in these methods share the same model depth, which neglects the fact that different classes may have different preferences to be fit by models with different depths. To this end, we propose a novel MoE-based method called Self-Heterogeneous Integration with Knowledge Excavation (SHIKE). We first propose Depth-wise Knowledge Fusion (DKF) to fuse features between different shallow parts and the deep part in one network for each expert, which makes experts more diverse in terms of representation. Based on DKF, we further propose Dynamic Knowledge Transfer (DKT) to reduce the influence of the hardest negative class that has a non-negligible impact on the tail classes in our MoE framework. As a result, the classification accuracy of long-tailed data can be significantly improved, especially for the tail classes. SHIKE achieves the state-of-the-art performance of 56.3%, 60.3%, 75.4%, and 41.9% on CIFAR100-LT (IF100), ImageNet-LT, iNaturalist 2018, and Places-LT, respectively.

翻訳日:2023-04-05 16:44:00 公開日:2023-04-03

# $\delta$相互作用によるSchr\"{o}dinger演算子について

On Schr\"{o}dinger Operators Modified by $\delta$ Interactions ( http://arxiv.org/abs/2304.01326v1 )

ライセンス: Link先を確認

Kaya G\"uven Akba\c{s}, Fatih Erman, O. Teoman Turgut

(参考訳) デルタ相互作用によって修正されたシュル「{o}dinger 作用素 $H_0$ のスペクトル特性を研究し、新しいグリーン関数の極が元のグリーン関数の極に対してどのように再配置されるかを明確に示す。我々は、新しい境界状態エネルギーが古い状態と接し、デルタ相互作用が魅力的であれば基底状態エネルギーは常に低下することを証明する。また,若干のヒューリスティックな方法で小さな結合定数の仮定の下で境界状態エネルギーと波動関数を求める別の摂動的な方法も導出する。さらに,この結果が再正規化処理が必要な場合に拡張可能であることを示す。最後に, 粒子がコンパクトな2次元多様体内をデルタ相互作用の影響下で移動している場合について, 多中心の場合, 曲線上で支持されるデルタ相互作用, およびその場合について検討する。

We study the spectral properties of a Schr\"{o}dinger operator $H_0$ modified by delta interactions and show explicitly how the poles of the new Green's function are rearranged relative to the poles of the original Green's function of $H_0$. We prove that the new bound state energies are interlaced between the old ones and the ground state energy is always lowered if the delta interaction is attractive. We also derive an alternative perturbative way of finding the bound state energies and wave functions under the assumption of small coupling constant in a somewhat heuristic way. We further show that these results can be extended to the case where a renormalization process is required. Finally, we consider the possible extensions of our results to the multi-center case, to delta interaction supported on curves, and to the case where the particle is moving in a compact two-dimensional manifold under the influence of a delta interaction.

翻訳日:2023-04-05 16:37:51 公開日:2023-04-03
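
The interlacing statement in the abstract above can be checked numerically in a finite-dimensional toy analog. The sketch assumes a diagonal $H_0$ with distinct levels $E_n$ and an attractive rank-one perturbation $-\lambda |f\rangle\langle f|$, where the weights $w_n = f_n^2$ stand in for $|\psi_n(a)|^2$. Under that assumption the new eigenvalues are the roots of $\sum_n w_n/(E_n - E) = 1/\lambda$. The levels, weights, and coupling below are made up, and this is not the paper's own construction, which also covers cases that need renormalization.

```python
def secular(energy, levels, weights, coupling):
    """Phi(E) - 1/lambda, where Phi(E) = sum_n w_n / (E_n - E)."""
    return sum(w / (e - energy) for e, w in zip(levels, weights)) - 1.0 / coupling


def bisect_root(f, lo, hi, iters=200):
    """Bisection for an increasing function with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def perturbed_levels(levels, weights, coupling, eps=1e-9):
    """Eigenvalues of diag(levels) - coupling * |f><f| with f_n^2 = weights[n].

    Assumes sorted, distinct levels, positive weights, and coupling > 0
    (the attractive case).
    """
    f = lambda energy: secular(energy, levels, weights, coupling)
    # Below the old ground state, Phi rises from near 0 to +infinity.
    lower = levels[0] - coupling * sum(weights) - 1.0
    brackets = [(lower, levels[0] - eps)]
    # Between consecutive old levels, Phi rises from -infinity to +infinity.
    brackets += [(a + eps, b - eps) for a, b in zip(levels, levels[1:])]
    return [bisect_root(f, lo, hi) for lo, hi in brackets]


old = [1.0, 2.0, 3.0]
new = perturbed_levels(old, weights=[0.5, 0.3, 0.2], coupling=1.0)
```

Because the secular function increases monotonically between consecutive poles, each gap holds exactly one new level and one more lies below the old ground state. The sum of the new levels should also equal the trace, $\sum_n E_n - \lambda \sum_n w_n$, which gives an independent consistency check.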

# 任意の2次元閉じ込めを伴うシュリンガー問題にアプローチするディープラーニングニューラルネットワーク

Deep learning neural network for approaching Schr\"odinger problems with arbitrary two-dimensional confinement ( http://arxiv.org/abs/2304.01325v1 )

ライセンス: Link先を確認

Adrian Radu, Carlos A. Duque

(参考訳) 本稿では,ニューラルネットワークを用いた自動学習法に基づく2次元シュリンガー方程式へのアプローチを提案する。これは、解の知識から多くの任意のサンプル問題まで、任意の二次元ポテンシャルに閉じ込められた粒子の基底状態を決定することを目的としている。基底状態の波動関数とエネルギーを予測するために,二つの隠れ層を持つネットワークアーキテクチャを提案する。ニューラルネットワークが提供する推定値を検証するために、いくつかの精度インジケータが提案されている。トレーニングされたネットワークは、学習プロセスで使用されるものと異なる大量の閉じ込めポテンシャルに適用することでテストされた。対称ポテンシャルを持つ特定の事例を具体例として解決し,良好なネットワーク予測精度が得られた。

This article presents an approach to the two-dimensional Schr\"odinger equation based on automatic learning methods with neural networks. It is intended to determine the ground state of a particle confined in any two-dimensional potential, starting from the knowledge of the solutions to a large number of arbitrary sample problems. A network architecture with two hidden layers is proposed to predict the wave function and energy of the ground state. Several accuracy indicators have been proposed to validate the estimates provided by the neural network. The trained network was tested by applying it to a large set of confinement potentials different from those used in the learning process. Some particular cases with symmetrical potentials were solved as concrete examples, and a good network prediction accuracy was found.

翻訳日:2023-04-05 16:37:31 公開日:2023-04-03

# PALI:ペルソ・アラビア文字の言語識別ベンチマーク

PALI: A Language Identification Benchmark for Perso-Arabic Scripts ( http://arxiv.org/abs/2304.01322v1 )

ライセンス: Link先を確認

Sina Ahmadi and Milind Agarwal and Antonios Anastasopoulos

(参考訳) ペルソ・アラビア文字(Perso-Arabic script)は、世界中の様々な言語コミュニティで広く採用され使用されている文字群である。このようなスクリプトを使って様々な言語を識別することは、言語技術にとって重要であり、低リソースのセットアップでは困難である。そこで本稿では,ペルソ・アラビア文字を用いた言語検出の課題について,特に「非従来的」な文章を実践するバイリンガル・コミュニティで取り上げる。これを解決するために,教師付き手法を用いて文を言語に分類する。また,これらに基づいて,分類器によって混同されることが多い言語群を対象とする階層モデルを提案する。私たちの実験結果は、ソリューションの有効性を示しています。

The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various linguistic communities around the globe. Identifying various languages using such scripts is crucial to language technologies and challenging in low-resource setups. As such, this paper sheds light on the challenges of detecting languages using Perso-Arabic scripts, especially in bilingual communities where ``unconventional'' writing is practiced. To address this, we use a set of supervised techniques to classify sentences into their languages. Building on these, we also propose a hierarchical model that targets clusters of languages that are more often confused by the classifiers. Our experimental results indicate the effectiveness of our solutions.

翻訳日:2023-04-05 16:37:20 公開日:2023-04-03

# 低リソース言語技術のためのコーパス作成へのアプローチ--南クルド語とラキ語を事例として

Approaches to Corpus Creation for Low-Resource Language Technology: the Case of Southern Kurdish and Laki ( http://arxiv.org/abs/2304.01319v1 )

ライセンス: Link先を確認

Sina Ahmadi and Zahra Azin and Sara Belelli and Antonios Anastasopoulos

(参考訳) 言語技術において過度に表現され、絶滅危惧されている言語コミュニティが直面する大きな課題の1つは、言語データの欠如または欠如である。これはまた、クルド語とラキ語の南の品種で、道具の実質的な進歩とともに非常に限られた資源が利用可能である場合でもある。そこで本稿では、クルド南部でコンテンツを放送するローカルラジオ局であるローカルニュースサイトと、ラキのフィールドワークに依存するいくつかのアプローチを提案する。本稿では,このような未表現言語の課題,特に文書作成と標準化,そして南クルド語とラキ語のためのコーパスを作成するために,データソースの検索や手書きコンテンツの遡及といった課題について述べる。さらに,クルド語およびザザ・ゴラーニ語以外の変種に照らして,言語識別の課題について検討した。

One of the major challenges that under-represented and endangered language communities face in language technology is the lack or paucity of language data. This is also the case of the Southern varieties of the Kurdish and Laki languages for which very limited resources are available with insubstantial progress in tools. To tackle this, we provide a few approaches that rely on the content of local news websites, a local radio station that broadcasts content in Southern Kurdish and fieldwork for Laki. In this paper, we describe some of the challenges of such under-represented languages, particularly in writing and standardization, and also, in retrieving sources of data and retro-digitizing handwritten content to create a corpus for Southern Kurdish and Laki. In addition, we study the task of language identification in light of the other variants of Kurdish and Zaza-Gorani languages.

翻訳日:2023-04-05 16:37:07 公開日:2023-04-03

# matched machine learning: 学習メトリクスを用いた治療効果推論の一般化フレームワーク

Matched Machine Learning: A Generalized Framework for Treatment Effect Inference With Learned Metrics ( http://arxiv.org/abs/2304.01316v1 )

ライセンス: Link先を確認

Marco Morucci, Cynthia Rudin, Alexander Volfovsky

(参考訳) 本稿では,機械学習ブラックボックスの柔軟性と,観測因果推論における長年のツールであるマッチングの解釈可能性を組み合わせたフレームワークであるmatched machine learningを紹介する。因果推論の多くの高リスク応用において、解釈可能性が最も重要である。平均的および個別化された治療効果の非パラメトリック推定のための現在のツールは、人間の見積もりの監査を許さないブラックボックスである。我々のフレームワークは、機械学習を使用して、ユニットのマッチングと結果の推定に最適なメトリクスを学習し、解釈可能でありながら機械学習ブラックボックスのパフォーマンスを達成する。我々の一般的なフレームワークは、いくつかの出版作品を特別なケースとして包含している。提案するフレームワークの漸近的推論理論により,個々の治療効果と平均治療効果の両面から近似した信頼区間を構築できる。一致機械学習のインスタンスはブラックボックスの機械学習手法と同等に動作し、類似した問題に対する既存のマッチング手法よりも優れていることを示す。最後に、我々のアプリケーションでは、共変量データが非常に複雑である場合でも、どのようにマッチング機械学習を用いて因果推論を行うかを示す。

We introduce Matched Machine Learning, a framework that combines the flexibility of machine learning black boxes with the interpretability of matching, a longstanding tool in observational causal inference. Interpretability is paramount in many high-stakes applications of causal inference. Current tools for nonparametric estimation of both average and individualized treatment effects are black-boxes that do not allow for human auditing of estimates. Our framework uses machine learning to learn an optimal metric for matching units and estimating outcomes, thus achieving the performance of machine learning black-boxes, while being interpretable. Our general framework encompasses several published works as special cases. We provide asymptotic inference theory for our proposed framework, enabling users to construct approximate confidence intervals around estimates of both individualized and average treatment effects. We show empirically that instances of Matched Machine Learning perform on par with black-box machine learning methods and better than existing matching methods for similar problems. Finally, in our application we show how Matched Machine Learning can be used to perform causal inference even when covariate data are highly complex: we study an image dataset, and produce high quality matches and estimates of treatment effects.

翻訳日:2023-04-05 16:36:51 公開日:2023-04-03
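
To illustrate why matching is auditable, the sketch below estimates an average treatment effect on the treated by pairing each treated unit with its nearest control under a weighted distance. The weights stand in for a metric that would be learned by a machine learning model; here they are fixed by hand, and the data, function names, and one-to-one nearest-neighbor rule are all illustrative assumptions rather than the framework from the abstract above.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with one nonnegative weight per covariate."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def matched_att(treated, controls, weights):
    """Average treated-minus-matched-control outcome difference.

    `treated` and `controls` are lists of (covariates, outcome) pairs.
    Returns the estimate and the list of matches, which can be audited.
    """
    matches = []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: weighted_distance(x_t, c[0], weights))
        matches.append(((x_t, y_t), (x_c, y_c)))
    att = sum(t[1] - c[1] for t, c in matches) / len(matches)
    return att, matches


# Toy data; the weights stand in for a metric that would be learned.
treated = [((1.0, 0.0), 5.0), ((3.0, 1.0), 9.0)]
controls = [((1.1, 5.0), 3.0), ((2.9, -4.0), 6.0)]
att, matches = matched_att(treated, controls, weights=(1.0, 0.0))
```

Returning the matched pairs alongside the estimate is the point of the design: a reviewer can inspect which control was used for each treated unit, which a black-box outcome model does not allow.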

# 強化学習における実証設計

Empirical Design in Reinforcement Learning ( http://arxiv.org/abs/2304.01315v1 )

ライセンス: Link先を確認

Andrew Patterson, Samuel Neumann, Martha White, Adam White

(参考訳) 強化学習における実証設計は簡単な作業ではない。優れた実験を行うには、詳細や時には重要な計算資源に注意する必要がある。ドル当たりの計算資源は急速に増え続けているが、強化学習における典型的な実験の規模も大きい。今や、数百万のパラメータを持つエージェントが数十のタスクに対して、それぞれ30日間の経験と同等のパラメータでベンチマークするのも一般的だ。これらの実験の規模は、特にアルゴリズムを比較する際に、適切な統計証拠の必要性と相反することが多い。最近の研究は、一般的なアルゴリズムがハイパーパラメータの設定や実装の詳細にどのように敏感であるかを強調しており、一般的な経験的実践は弱い統計的証拠をもたらす(Machado et al., 2018; Henderson et al., 2018)。ここでは、これを一歩進める。この原稿は、行動を呼びかけることと、強化学習で良い実験を行うための包括的なリソースの両方を表しています。特に、共通性能測定の基礎となる統計的仮定、性能変動と安定性を適切に評価する方法、仮説テスト、複数のエージェントの比較のための特別な考察、ベースラインとイラストラティブな例構築、ハイパーパラメータと実験者バイアスの扱いについて述べる。全体を通して、文献に見られる一般的な誤りと、事例実験による統計的結果を強調します。この文書の目的は、我々の前例のない計算を使って強化学習に優れた科学を学べるか、また、経験的設計における潜在的な落とし穴への警告を与えることである。

Empirical design in reinforcement learning is no small task. Running good experiments requires attention to detail and at times significant computational resources. While compute resources available per dollar have continued to grow rapidly, so has the scale of typical experiments in reinforcement learning. It is now common to benchmark agents with millions of parameters against dozens of tasks, each using the equivalent of 30 days of experience. The scale of these experiments often conflicts with the need for proper statistical evidence, especially when comparing algorithms. Recent studies have highlighted how popular algorithms are sensitive to hyper-parameter settings and implementation details, and that common empirical practice leads to weak statistical evidence (Machado et al., 2018; Henderson et al., 2018). Here we take this one step further. This manuscript represents both a call to action, and a comprehensive resource for how to do good experiments in reinforcement learning. In particular, we cover: the statistical assumptions underlying common performance measures, how to properly characterize performance variation and stability, hypothesis testing, special considerations for comparing multiple agents, baseline and illustrative example construction, and how to deal with hyper-parameters and experimenter bias. Throughout we highlight common mistakes found in the literature and the statistical consequences of those in example experiments. The objective of this document is to provide answers on how we can use our unprecedented compute to do good science in reinforcement learning, as well as stay alert to potential pitfalls in our empirical design.

翻訳日:2023-04-05 16:36:35 公開日:2023-04-03
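
One common way to report performance variation across independent runs is a percentile bootstrap confidence interval on the mean return. The sketch below is a generic illustration of that technique, not a procedure quoted from the manuscript above; the returns, the number of resamples, and the 95% level are assumptions.

```python
import random


def bootstrap_mean_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical final returns of one agent over 10 independent seeds.
returns = [212.0, 180.5, 197.3, 250.1, 161.8, 203.4, 190.0, 221.7, 175.2, 208.9]
low, high = bootstrap_mean_ci(returns)
```

With only a handful of seeds the interval is typically wide, so overlapping intervals between two agents are a warning that the comparison lacks statistical support.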

# 実践における知識グラフのユーザ,課題,可視化の必要性

Characterizing the Users, Challenges, and Visualization Needs of Knowledge Graphs in Practice ( http://arxiv.org/abs/2304.01311v1 )

ライセンス: Link先を確認

Harry Li, Gabriel Appleby, Camelia Daniela Brumar, Remco Chang, Ashley Suh

(参考訳) 本研究は、企業と学術の両方で幅広いユースケースで働いている19人の知識グラフ実践者へのインタビューから得られた知見を提示する。本研究では,視覚的デザインによって緩和できるKGの作成,探索,分析において,KG実践者が経験した重要な課題を明らかにする。以上の結果から,kg実践者のうち,kg製作者,アナリスト,消費者の3人がそれぞれ独自の専門知識とニーズを持っていることが明らかとなった。我々は、KGビルダーがスキーマインクルーダーの恩恵を受けることを発見した。一方、KGアナリストは、中間クエリ結果を提供するカスタマイズ可能なクエリビルダーが必要である。 kg ユーザに対しては,ノードリンク図の有効性の欠如,および kg の採用と理解を促進するためのドメイン固有可視化の必要性が指摘されている。最後に、KGを効果的に実践するには、現在のツールや技術、コラボレーションワークフローに対処しない、技術的および社会的ソリューションの両方が必要です。インタビューの分析から,消化可能性と発見可能性のバランスをとる知識カード,時間的変化を追跡するタイムラインビュー,有機的発見をサポートするインターフェース,AIと機械学習予測のセマンティック説明など,KGのユーザビリティ向上のための可視化研究の方向性を抽出した。

This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each of whom has their own distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.

翻訳日:2023-04-05 16:36:09 公開日:2023-04-03

# 2軸非直線イメージングにおける過渡現象の役割

Role of Transients in Two-Bounce Non-Line-of-Sight Imaging ( http://arxiv.org/abs/2304.01308v1 )

ライセンス: Link先を確認

Siddharth Somasundaram, Akshat Dave, Connor Henley, Ashok Veeraraghavan, Ramesh Raskar

(参考訳) 非視線イメージング(NLOS)の目的は、多重散乱光を用いてカメラの視野から隠された物体を撮像することである。最近の研究は、レーザーを走査し、2つのリレー面を持つシーンにおける閉塞物体の鋳造影を測定することにより、2バウンス(2B)のNLOSイメージングの実現可能性を示している。本研究では,2B-NLOSにおける飛行時間(ToF)測定の役割を多重照明下で検討した。具体的には,tof情報による形状復元に必要な計測数と空間分解能の低減について検討する。本稿では,(1)時間分解能,(2)空間分解能,(3)SNRによる画像キャプチャ数,およびシステムパラメータの関数としての回復可能性に関するトレードオフについて述べる。これにより、2Bライダーの数学的制約の形式的定義が導かれる。我々の研究は将来のNLOSイメージングシステム、特にToFセンサーがますます普及するにつれて、分析的基盤を築き上げていると信じている。

The goal of non-line-of-sight (NLOS) imaging is to image objects occluded from the camera's field of view using multiply scattered light. Recent works have demonstrated the feasibility of two-bounce (2B) NLOS imaging by scanning a laser and measuring cast shadows of occluded objects in scenes with two relay surfaces. In this work, we study the role of time-of-flight (ToF) measurements, i.e., transients, in 2B-NLOS under multiplexed illumination. Specifically, we study how ToF information can reduce the number of measurements and spatial resolution needed for shape reconstruction. We present our findings with respect to tradeoffs in (1) temporal resolution, (2) spatial resolution, and (3) number of image captures by studying SNR and recoverability as functions of system parameters. This leads to a formal definition of the mathematical constraints for 2B lidar. We believe that our work lays an analytical groundwork for design of future NLOS imaging systems, especially as ToF sensors become increasingly ubiquitous.

翻訳日:2023-04-05 16:35:45 公開日:2023-04-03

# 並列テンパリングにおける混合時間境界の改善

Improved Bound for Mixing Time of Parallel Tempering ( http://arxiv.org/abs/2304.01303v1 )

ライセンス: Link先を確認

Holden Lee, Zeyu Shen

(参考訳) サンプリングアルゴリズムの分野では、直接サンプリングが不可能な場合にはMCMC(Markov Chain Monte Carlo)法が広く用いられている。しかし、ターゲット分布の多様性はしばしば収束と混合を遅くする。一般的な解決策は並列テンパリングである。実際には極めて効果的であるが、その性能に関する理論的保証は限られている。本稿では,各パラメータに多項式依存を持つスペクトルギャップ上での並列テンパリングのための新しい下限を,$(L + 1)$がレベル数であるような$\log L$を除いて提示する。これにより、モード数に指数関数的に依存する最善のバウンダリが改善される。さらに、スペクトルギャップ上の仮定上界は$\log L$に指数関数依存しており、ある意味では我々の境界は密であることを示す。

In the field of sampling algorithms, MCMC (Markov Chain Monte Carlo) methods are widely used when direct sampling is not possible. However, multimodality of target distributions often leads to slow convergence and mixing. One common solution is parallel tempering. Though highly effective in practice, theoretical guarantees on its performance are limited. In this paper, we present a new lower bound for parallel tempering on the spectral gap that has a polynomial dependence on all parameters except $\log L$, where $(L + 1)$ is the number of levels. This improves the best existing bound which depends exponentially on the number of modes. Moreover, we complement our result with a hypothetical upper bound on spectral gap that has an exponential dependence on $\log L$, which shows that, in some sense, our bound is tight.

翻訳日:2023-04-05 16:35:26 公開日:2023-04-03
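
For readers unfamiliar with parallel tempering, the following minimal sketch runs it on a one-dimensional bimodal target. Each level uses a random-walk Metropolis move, followed by one proposed swap between a random pair of adjacent levels. The target, the temperature ladder `betas`, the step size, and the function names are illustrative assumptions; the sketch says nothing about the spectral-gap bounds proved in the paper above.

```python
import math
import random


def energy(x, mu=4.0):
    """Negative log-density (up to a constant) of an equal mixture of N(-mu, 1) and N(mu, 1)."""
    a, b = 0.5 * (x - mu) ** 2, 0.5 * (x + mu) ** 2
    # Stable form of -log(exp(-a) + exp(-b)).
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))


def swap_accept_prob(beta_i, beta_j, e_i, e_j):
    """Metropolis probability of exchanging the states of two replicas."""
    return math.exp(min(0.0, (beta_i - beta_j) * (e_i - e_j)))


def parallel_tempering(betas, n_steps, step=1.0, seed=0):
    """Return the samples of the replica with inverse temperature betas[0]."""
    rng = random.Random(seed)
    xs = [0.0] * len(betas)
    samples = []
    for _ in range(n_steps):
        # Within-level move: random-walk Metropolis targeting exp(-beta * energy).
        for k, beta in enumerate(betas):
            prop = xs[k] + rng.gauss(0.0, step)
            if math.log(1.0 - rng.random()) < -beta * (energy(prop) - energy(xs[k])):
                xs[k] = prop
        # Between-level move: propose swapping one random adjacent pair.
        k = rng.randrange(len(betas) - 1)
        p = swap_accept_prob(betas[k], betas[k + 1], energy(xs[k]), energy(xs[k + 1]))
        if rng.random() < p:
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
        samples.append(xs[0])
    return samples


samples = parallel_tempering(betas=[1.0, 0.5, 0.25, 0.1], n_steps=5000)
```

The hotter levels (small `beta`) flatten the barrier between the two modes, and the swap moves let states that crossed it reach the level of interest. The ladder here has four levels, which corresponds to $L = 3$ in the abstract's notation.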

# 微分プライベート学習のためのカーネルアフィンハルマシン

Kernel Affine Hull Machines for Differentially Private Learning ( http://arxiv.org/abs/2304.01300v1 )

ライセンス: Link先を確認

Mohit Kumar, Bernhard A. Moser, Lukas Fischer

(参考訳) 本稿では,データ空間を,個々のデータ点に関するプライバシーに敏感な情報を隠蔽する幾何学体に分割し,元の学習課題の構造を保ちながら,学習を通してデータを表現する手段として,アフィンの点の殻を用いる方法について検討する。この目的のために,カーネルアフィンハルマシン (kernel affine hull machine, kahm) を導入する。 KAHMは、広範囲かつ深いオートエンコーダにおいて重要なビルディングブロックであり、分類アプリケーションのためのデータ表現学習を可能にする。プライバシ保存学習を確実にするために,変換プロセスを通じて差分プライベートなデータサンプルを平滑化することを含む,新しいデータ生成法を提案する。その結果生成されたデータにより、差分プライバシーが保証されるだけでなく、KAHMモデリングエラーが元のトレーニングデータサンプルよりも大きくないことも保証される。また, 試作データを用いて, 微分プライベート分類器で発生する精度損失問題にも対処する。このアプローチにより、メンバーシップ推論攻撃のリスクは大幅に低減されるが、精度の限界損失しか生じない。応用として,大域的分類器の評価が局所的に計算された距離測定のみを必要とすることを特徴とする,KAHMに基づく微分プライベートフェデレーション学習方式を導入する。本研究は,プライバシー保護学習と分類のための効果的なツールとして,KAHMの可能性を示すものである。

This paper explores the use of affine hulls of points as a means of representing data via learning in Reproducing Kernel Hilbert Spaces (RKHS), with the goal of partitioning the data space into geometric bodies that conceal privacy-sensitive information about individual data points, while preserving the structure of the original learning problem. To this end, we introduce the Kernel Affine Hull Machine (KAHM), which provides an effective way of computing a distance measure from the resulting bounded geometric body. KAHM is a critical building block in wide and deep autoencoders, which enable data representation learning for classification applications. To ensure privacy-preserving learning, we propose a novel method for generating fabricated data, which involves smoothing differentially private data samples through a transformation process. The resulting fabricated data guarantees not only differential privacy but also ensures that the KAHM modeling error is not larger than that of the original training data samples. We also address the accuracy-loss issue that arises with differentially private classifiers by using fabricated data. This approach results in a significant reduction in the risk of membership inference attacks while incurring only a marginal loss of accuracy. As an application, a KAHM-based differentially private federated learning scheme is introduced, in which evaluating the global classifier requires only locally computed distance measures. Overall, our findings demonstrate the potential of KAHM as an effective tool for privacy-preserving learning and classification.

翻訳日:2023-04-05 16:35:14 公開日:2023-04-03

# メモリ効率とロバストな単調演算子学習(MOL)を用いた並列MRIの高速化

Accelerated parallel MRI using memory efficient and robust monotone operator learning (MOL) ( http://arxiv.org/abs/2304.01351v1 )

ライセンス: Link先を確認

Aniket Pramanik, Mathews Jacob

(参考訳) 画像物理と学習正則化を併用したモデルベースディープラーニング手法が,並列MRI加速のための強力なツールとして登場してきた。本稿では, 並列MRIにおける単調演算子学習(MOL)フレームワークの有用性について検討する。 MOLアルゴリズムは、モノトン畳み込みニューラルネットワーク(CNN)と共役勾配アルゴリズムを用いて勾配降下ステップを交互に行い、データの一貫性を促進する。このアプローチの利点は、一意性、収束性、安定性を含む圧縮センシングアルゴリズムと同様の保証を含んでいる。提案手法を,静的および動的設定のための高速化並列mriの文脈で,異なる未ロールアルゴリズムと比較することにより検証する。

Model-based deep learning methods that combine imaging physics with learned regularization priors have been emerging as powerful tools for parallel MRI acceleration. The main focus of this paper is to determine the utility of the monotone operator learning (MOL) framework in the parallel MRI setting. The MOL algorithm alternates between a gradient descent step using a monotone convolutional neural network (CNN) and a conjugate gradient algorithm to encourage data consistency. The benefits of this approach include similar guarantees as compressive sensing algorithms including uniqueness, convergence, and stability, while being significantly more memory efficient than unrolled methods. We validate the proposed scheme by comparing it with different unrolled algorithms in the context of accelerated parallel MRI for static and dynamic settings.

翻訳日:2023-04-05 16:28:53 公開日:2023-04-03

# 化学タンパク質相互作用抽出のためのエンド・ツー・エンドモデル:トークン化とスパンベースのパイプライン戦略の改善

End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies ( http://arxiv.org/abs/2304.01344v1 )

ライセンス: Link先を確認

Xuguang Ai and Ramakanth Kavuluru

(参考訳) エンド・ツー・エンド関係抽出(E2ERE)は情報抽出において重要な課題である。 E2EREは通常、エンティティの識別(名前付きエンティティ認識(NER))と関連する関係の識別を含むが、ほとんどのREタスクは単にエンティティが前もって提供されると仮定し、結局は関係分類を行うにとどまる。 E2EREは、NERの誤りが雪だるま式にREの誤りを増やす可能性を考えると、RE単独よりも本質的に困難である。バイオメディカルE2EREの複雑なデータセットとしてChemProtデータセット(BioCreative VI, 2017)があり、科学文献における化学物質と遺伝子/タンパク質の関係を識別する。 ChemProtはBLUE、BLURB、BigBioを含む最近のバイオメディカル自然言語処理ベンチマークに含まれている。しかしながら、これらのベンチマークや他の個別の取り組みでは、少数の例外を除いて、通常はエンドツーエンドでは扱われていない。この取り組みでは、スパンベースのパイプラインアプローチを採用し、ChemProtデータセット上で新しい最先端のE2ERE性能を達成し、従来の最良の結果に対してF1スコアで$> 4\%$の改善を得た。以上の結果から,単純なきめ細かいトークン化手法が,特に複雑な名前付きエンティティの扱いに関して,スパンベースのアプローチがE2EREで優れた性能を発揮する助けとなることを示す。私たちのエラー解析では、ChemProt用のE2EREのいくつかの重要な障害モードも特定しています。

End-to-end relation extraction (E2ERE) is an important task in information extraction, more so for biomedicine as scientific literature continues to grow exponentially. E2ERE typically involves identifying entities (or named entity recognition (NER)) and associated relations, while most RE tasks simply assume that the entities are provided upfront and end up performing relation classification. E2ERE is inherently more difficult than RE alone given the potential snowball effect of errors from NER leading to more errors in RE. A complex dataset in biomedical E2ERE is the ChemProt dataset (BioCreative VI, 2017) that identifies relations between chemical compounds and genes/proteins in scientific literature. ChemProt is included in all recent biomedical natural language processing benchmarks including BLUE, BLURB, and BigBio. However, its treatment in these benchmarks and in other separate efforts is typically not end-to-end, with few exceptions. In this effort, we employ a span-based pipeline approach to produce a new state-of-the-art E2ERE performance on the ChemProt dataset, resulting in $> 4\%$ improvement in F1-score over the prior best effort. Our results indicate that a straightforward fine-grained tokenization scheme helps span-based approaches excel in E2ERE, especially with regards to handling complex named entities. Our error analysis also identifies a few key failure modes in E2ERE for ChemProt.

翻訳日:2023-04-05 16:28:07 公開日:2023-04-03

# ビデオ中の効率的なデータ収集のためのスケール不変トラジェクトリ簡易化法

A Scale-Invariant Trajectory Simplification Method for Efficient Data Collection in Videos ( http://arxiv.org/abs/2304.01340v1 )

ライセンス: Link先を確認

Yang Liu, Luiz Gustavo Hafemann

(参考訳) トレーニングデータは機械学習タスクにとって重要な要件であり、ラベル付きトレーニングデータは取得に高価であり、手動または半自動のデータ収集パイプラインを必要とすることが多い。トラッキング用途では、データ収集は各フレームの関心クラスの周りにバウンディングボックスを描画し、同じ"インスタンス"の検出をフレーム間で関連付ける作業を含む。半自動データ収集パイプラインでは、ベースラインの検出・トラッキングアルゴリズムを実行し、各フレームにバウンディングボックスを追加/削除/変更するための手作業による修正と、フレーム間のアソシエーションエラー(トラックスイッチ)の解決によって、これを実現することができる。本稿では,この半自動化シナリオにおいて,より効率的に正解(ground-truth)データを生成するためのデータ補正パイプラインを提案する。提案手法は追跡システムからの軌跡を単純化し,サンプリングされたキーフレーム内のオブジェクトをアノテータが検証・修正できるようにする。キーフレーム内のオブジェクトが修正されると、他のフレームのバウンディングボックスは補間によって取得される。本手法は,手動補正を必要とするフレーム数を大幅に削減する。 MOTデータセットでは、HOTAスコア89.61%を維持しながらフレーム数を30分の1に削減する。さらに、SoccerNetデータセットでは79.24%、DanceTrackデータセットでは85.79%のHOTAスコアを達成しつつ、フレーム数を10分の1に削減する。プロジェクトコードとデータは https://github.com/foreverYoungGitHub/trajectory-simplify-benchmark で公開されている。

Training data is a critical requirement for machine learning tasks, and labeled training data can be expensive to acquire, often requiring manual or semi-automated data collection pipelines. For tracking applications, the data collection involves drawing bounding boxes around the classes of interest on each frame, and associating detections of the same "instance" over frames. In a semi-automated data collection pipeline, this can be achieved by running a baseline detection and tracking algorithm, and relying on manual correction to add/remove/change bounding boxes on each frame, as well as resolving errors in the associations over frames (track switches). In this paper, we propose a data correction pipeline to generate ground-truth data more efficiently in this semi-automated scenario. Our method simplifies the trajectories from the tracking systems and lets the annotator verify and correct the objects in the sampled keyframes. Once the objects in the keyframes are corrected, the bounding boxes in the other frames are obtained by interpolation. Our method achieves a substantial reduction in the number of frames requiring manual correction. In the MOT dataset, it reduces the number of frames by 30x while maintaining a HOTA score of 89.61%. Moreover, it reduces the number of frames by 10x while achieving a HOTA score of 79.24% in the SoccerNet dataset, and 85.79% in the DanceTrack dataset. The project code and data are publicly released at https://github.com/foreverYoungGitHub/trajectory-simplify-benchmark.
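
The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes boxes are stored as `(x, y, w, h)` tuples keyed by frame index and that interpolation is linear; both are assumptions made for illustration, and the function name `interpolate_boxes` is hypothetical rather than taken from the released code.

```python
def interpolate_boxes(keyframes):
    """Fill in boxes between corrected keyframes by linear interpolation.

    `keyframes` maps frame index -> (x, y, w, h). This layout is an
    assumption for the sketch, not the format used by the paper's code.
    """
    frames = sorted(keyframes)
    if not frames:
        return {}
    boxes = {}
    # Walk over consecutive keyframe pairs and blend their boxes.
    for start, end in zip(frames, frames[1:]):
        b0, b1 = keyframes[start], keyframes[end]
        for f in range(start, end):
            t = (f - start) / (end - start)  # 0 at `start`, approaches 1 at `end`
            boxes[f] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    # The last keyframe is copied as-is.
    boxes[frames[-1]] = tuple(float(v) for v in keyframes[frames[-1]])
    return boxes


corrected = {0: (0, 0, 10, 10), 10: (10, 20, 10, 10)}
all_boxes = interpolate_boxes(corrected)
```

With two corrected keyframes ten frames apart, the annotator touches 2 frames and the remaining 9 boxes are derived, which is the kind of saving the abstract describes.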

翻訳日:2023-04-05 16:27:40 公開日:2023-04-03

# 熱的騒音によるニューラルネットワーク景観の地形図の作成

Charting the Topography of the Neural Network Landscape with Thermal-Like Noise ( http://arxiv.org/abs/2304.01335v1 )

ライセンス: Link先を確認

Theo Jules, Gal Brener, Tal Kachman, Noam Levi, Yohai Bar-Sinai

(参考訳) ニューラルネットワークのトレーニングは複雑で高次元で非凸でノイズの多い最適化問題であり、理論的理解は応用的視点と基本的な理由の両方から興味深い。主な課題は、最適化を導く景観の幾何学と地形を理解することである。本研究では,Langevin dynamics を用いた位相空間探索という標準的な統計力学手法を用いて,ランダムデータに基づく分類タスクを実行する過度パラメータ付き完全連結ネットワークについて,この景観を考察する。一定温度における熱力学に類似したゆらぎの統計を解析し、低損失領域の明確な幾何学的記述を推定する。揺らぎから容易に次元が得られるような低次元多様体であることが分かる。さらに、この次元は、分類決定境界付近に存在するデータポイントの数によって制御される。重要なことは、決定境界の指数的性質と低損失領域の平坦性により、最小付近での損失の2次近似が根本的に不適切であることである。これにより、より高温で曲率の高い領域にダイナミクスを生じさせ、任意の温度で二次的な統計を発生させる。解析的に解析可能で観測されたゆらぎ統計を再現した簡易損失モデルを用いて,この挙動を説明する。

The training of neural networks is a complex, high-dimensional, non-convex and noisy optimization problem whose theoretical understanding is interesting both from an applicative perspective and for fundamental reasons. A core challenge is to understand the geometry and topography of the landscape that guides the optimization. In this work, we employ standard Statistical Mechanics methods, namely, phase-space exploration using Langevin dynamics, to study this landscape for an over-parameterized fully connected network performing a classification task on random data. Analyzing the fluctuation statistics, in analogy to thermal dynamics at a constant temperature, we infer a clear geometric description of the low-loss region. We find that it is a low-dimensional manifold whose dimension can be readily obtained from the fluctuations. Furthermore, this dimension is controlled by the number of data points that reside near the classification decision boundary. Importantly, we find that a quadratic approximation of the loss near the minimum is fundamentally inadequate due to the exponential nature of the decision boundary and the flatness of the low-loss region. This causes the dynamics to sample regions with higher curvature at higher temperatures, while producing quadratic-like statistics at any given temperature. We explain this behavior by a simplified loss model which is analytically tractable and reproduces the observed fluctuation statistics.
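
As a point of reference for the fluctuation analysis described above, the following sketch runs discretized overdamped Langevin dynamics on a one-dimensional quadratic loss. This is only the textbook quadratic baseline (where the stationary variance is roughly the temperature divided by the curvature), not the network loss studied in the paper; the names `langevin_samples`, `k`, and `T` are illustrative assumptions.

```python
import math
import random


def langevin_samples(grad, theta0, step, temperature, n_steps, seed=0):
    """Discretized overdamped Langevin dynamics:
    theta <- theta - step * grad(theta) + sqrt(2 * step * T) * gaussian_noise
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step * temperature)
    theta = theta0
    samples = []
    for _ in range(n_steps):
        theta = theta - step * grad(theta) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


k = 2.0   # curvature of the toy loss L(theta) = k * theta**2 / 2
T = 0.5   # temperature
samples = langevin_samples(lambda th: k * th, theta0=0.0, step=0.01,
                           temperature=T, n_steps=200_000)
fluct_var = variance(samples[10_000:])  # drop the initial transient
# For a purely quadratic loss, fluct_var should be close to T / k.
```

The abstract's claim is that real network landscapes deviate from this picture: the loss is not well described by a fixed curvature, even though the statistics at any single temperature can look quadratic.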

翻訳日:2023-04-05 16:27:15 公開日:2023-04-03

# 深層学習による素数分割可能性について

On the Prime Number Divisibility by Deep Learning ( http://arxiv.org/abs/2304.01333v1 )

ライセンス: Link先を確認

Da Wu, Jingye Yang, Mian Umair Ahsan, Kai Wang

(参考訳) 与えられた整数を2、3または他の素数で割り切れるかどうかを決定するようなタスクは、人間にとって自明であるが、事前に特定されたアルゴリズムがなければ、コンピュータにとって簡単ではない。本稿では,複数のディープラーニングアーキテクチャと特徴工学的アプローチを検証し,小さな素数による大きな有限整数(最大$2^{32}$)の可除性を決定するシナリオを評価した。その結果、ネットワークフレームワークやネットワーク構造(CNN、RNN、Transformerなど)の複雑さに関わらず、素数による可除性を予測する能力は、ディープラーニングモデルに供給される特徴空間に決定的に依存することがわかった。また、Amazon、Google、Microsoftが提供する商用のAutomated Machine Learning (AutoML)パイプラインを評価し、適切に設計された特徴量を与えない限り、この問題に対処できないことを示した。さらに、フーリエ級数基底ベクトル上の通常の線形回帰を用いて、問題の閉形式解を提案し、その成功を示した。最後に,ChatGPTを用いたプロンプトベースの学習を評価し,小さな素数での成功と,より大きな素数での明らかな失敗を実証した。特徴工学は、AutoMLや大規模言語モデル(LLM)の時代においても、パフォーマンスの向上、解釈可能性の向上、機械学習/深層学習モデルの複雑さの低減のために引き続き重要な課題である、と結論付けている。

Certain tasks such as determining whether a given integer can be divided by 2, 3, or other prime numbers may be trivial for human beings, but can be less straightforward for computers in the absence of pre-specified algorithms. In this paper, we tested multiple deep learning architectures and feature engineering approaches, and evaluated the scenario of determining divisibility of large finite integers (up to $2^{32}$) by small prime numbers. It turns out that, regardless of the network frameworks or the complexity of the network structures (CNN, RNN, Transformer, etc.), the ability to predict the prime number divisibility critically depends on the feature space fed into the deep learning models. We also evaluated commercially available Automated Machine Learning (AutoML) pipelines from Amazon, Google and Microsoft, and demonstrated that they failed to address this issue unless appropriately engineered features were provided. We further proposed a closed form solution to the problem using the ordinary linear regression on Fourier series basis vectors, and showed its success. Finally, we evaluated prompt-based learning using ChatGPT and demonstrated its success on small primes and apparent failures on larger primes. We conclude that feature engineering remains an important task to improve the performance, increase the interpretability, and reduce the complexity of machine learning/deep learning models, even in the era of AutoML and large-language models (LLMs).
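
To see why Fourier features make divisibility linearly solvable, recall the identity $\frac{1}{p}\sum_{k=0}^{p-1}\cos(2\pi k n/p) = 1$ if $p$ divides $n$ and $0$ otherwise. The sketch below uses that identity with fixed weights $1/p$; it is an illustration of the idea, not the regression setup from the paper, and the function names are hypothetical.

```python
import math


def fourier_features(n, p):
    """Cosine features cos(2*pi*k*n/p) for k = 0..p-1."""
    return [math.cos(2.0 * math.pi * k * n / p) for k in range(p)]


def divisibility_score(n, p):
    """Linear model with all weights equal to 1/p.

    The score is (up to rounding) 1 when p divides n and 0 otherwise,
    because the p-th roots of unity sum to zero.
    """
    return sum(fourier_features(n, p)) / p


def is_divisible(n, p):
    return divisibility_score(n, p) > 0.5
```

A linear regression fitted on these features could recover weights of this form, which is consistent with the abstract's point that the feature space matters more than the model class. For integers near $2^{32}$, floating-point error in the cosine argument grows, so a practical version would need care with precision.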

翻訳日:2023-04-05 16:26:55 公開日:2023-04-03

# 辞書なしでカスタムイベントデータを作成する:bag-of-tricks

Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks ( http://arxiv.org/abs/2304.01331v1 )

ライセンス: Link先を確認

Andrew Halterman, Philip A. Schrodt, Andreas Beger, Benjamin E. Bagozzi, Grace I. Scarborough

(参考訳) イベントデータ、すなわちテキストから自動的に抽出される「誰が誰に何をしたか」の構造化された記録は、国際政治学者にとって重要なデータ源である。新しいイベントデータセットの開発は、特に手作り辞書に依存する自動システムを使用する場合にコストが高いため、ほとんどの研究者は、特定の研究課題に最適化されたカスタマイズされたイベントデータセットを開発するのではなく、ICEWSのような大規模な既存のデータセットに頼っている。本稿では,自然言語処理(NLP)の最近の進歩を活かし,研究者がカスタムイベントデータセットを迅速に作成できるようにする,効率的なカスタムイベントデータ生成のための「bag of tricks」について述べる。本稿では,能動学習によるイベントカテゴリ分類器の訓練,大規模言語モデル・標準的な機械学習分類器・NLPの事前学習済み「質問応答」モデルを用いたテキスト中のアクターと行為の受け手の識別,およびアクターの言及をWikipediaの記事に対応付けて分類する手法を紹介する。これらのテクニックがICEWSに代わることを意図した新たなPOLECATグローバルイベントデータセットをどのように生成したか,および,より小型でカスタムなイベントデータセットを学者が迅速に生成する方法の例について説明する。新しいテクニックを実装するためのサンプルコードとモデルを公開する。

Event data, or structured records of "who did what to whom" that are automatically extracted from text, is an important source of data for scholars of international politics. The high cost of developing new event datasets, especially using automated systems that rely on hand-built dictionaries, means that most researchers draw on large, pre-existing datasets such as ICEWS rather than developing tailor-made event datasets optimized for their specific research question. This paper describes a "bag of tricks" for efficient, custom event data production, drawing on recent advances in natural language processing (NLP) that allow researchers to rapidly produce customized event datasets. The paper introduces techniques for training an event category classifier with active learning, identifying actors and the recipients of actions in text using large language models and standard machine learning classifiers and pretrained "question-answering" models from NLP, and resolving mentions of actors to their Wikipedia article to categorize them. We describe how these techniques produced the new POLECAT global event dataset that is intended to replace ICEWS, along with examples of how scholars can quickly produce smaller, custom event datasets. We publish example code and models to implement our new techniques.

翻訳日:2023-04-05 16:26:31 公開日:2023-04-03

# 文書類似性アルゴリズムの比較

A Comparison of Document Similarity Algorithms ( http://arxiv.org/abs/2304.01330v1 )

ライセンス: Link先を確認

Nicholas Gahman and Vinayak Elangovan

(参考訳) 文書類似性は自然言語処理の重要な部分であり、最も一般的には盗作検出やテキスト要約に使われる。したがって、最も効果的な文書類似性アルゴリズムを見つけることは、自然言語処理の分野に大きな影響を与える可能性がある。本報告では,多数の文書類似性アルゴリズムについて検討し,どれが最も有用かを決定する。統計アルゴリズム、ニューラルネットワーク、コーパス/知識ベースのアルゴリズムの3つのタイプの文書類似性アルゴリズムに分類することで、最も効果的な文書類似性アルゴリズムに対処する。各カテゴリでもっとも効果的なアルゴリズムは、各アルゴリズムが利用できるあらゆる可能な領域をテストする一連のベンチマークデータセットと評価を使用して、我々の研究で比較されます。

Document similarity is an important part of Natural Language Processing and is most commonly used for plagiarism-detection and text summarization. Thus, finding the overall most effective document similarity algorithm could have a major positive impact on the field of Natural Language Processing. This report sets out to examine the numerous document similarity algorithms, and determine which ones are the most useful. It addresses the most effective document similarity algorithm by categorizing them into 3 types of document similarity algorithms: statistical algorithms, neural networks, and corpus/knowledge-based algorithms. The most effective algorithms in each category are also compared in our work using a series of benchmark datasets and evaluations that test every possible area that each algorithm could be used in.
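
As an example of the "statistical algorithms" category, the sketch below computes cosine similarity over raw term-frequency vectors. It is a generic baseline chosen for illustration (whitespace tokenization, no stemming or IDF weighting) and is not claimed to be one of the specific algorithms evaluated in the report.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between term-frequency vectors of two documents."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Neural and knowledge-based methods replace the term-frequency vectors with learned embeddings or thesaurus-derived representations, but often keep a similarity measure of this kind as the final step.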

翻訳日:2023-04-05 16:26:06 公開日:2023-04-03

# ニューラル遅延微分方程式を用いた遅延学習

Learning the Delay Using Neural Delay Differential Equations ( http://arxiv.org/abs/2304.01329v1 )

ライセンス: Link先を確認

Maria Oprea and Mark Walth and Robert Stephany and Gabriella Torres Nothaft and Arnaldo Rodriguez-Gonzalez and William Clark

(参考訳) 機械学習と力学系の交点は最近かなりの関心を集めている。ニューラル常微分方程式(NODE)は、これらの分野間の豊かな重なりを示すものである。本稿では,遅延微分方程式(DDE)に基づく連続時間ニューラルネットワーク手法を提案する。本モデルでは,データからモデルパラメータと遅延を直接学習するために随伴感度法を用いる。我々のアプローチはNODEに着想を得ており、遅延の値が事前に既知であると仮定していた従来のニューラルDDEモデルを拡張する。我々は,提案手法の感度解析を行い,ベンチマークシステムからDDEパラメータを学習する能力を示す。最後に、今後の方向性と応用の可能性について議論する。

The intersection of machine learning and dynamical systems has generated considerable interest recently. Neural Ordinary Differential Equations (NODEs) represent a rich overlap between these fields. In this paper, we develop a continuous time neural network approach based on Delay Differential Equations (DDEs). Our model uses the adjoint sensitivity method to learn the model parameters and delay directly from data. Our approach is inspired by that of NODEs and extends earlier neural DDE models, which have assumed that the value of the delay is known a priori. We perform a sensitivity analysis on our proposed approach and demonstrate its ability to learn DDE parameters from benchmark systems. We conclude our discussion with potential future directions and applications.
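
To make "learning the delay from data" concrete, here is a small sketch on the scalar DDE $x'(t) = -a\,x(t-\tau)$ with constant history. It integrates with forward Euler and recovers $\tau$ by a grid search over candidate delays. The grid search is a deliberately simple stand-in for the adjoint sensitivity method used in the paper, and the names `simulate_dde` and `fit_delay` are hypothetical.

```python
def simulate_dde(a, tau, history, dt, t_end):
    """Forward-Euler integration of x'(t) = -a * x(t - tau).

    x(t) is assumed to equal the constant `history` for all t <= 0.
    """
    lag = int(round(tau / dt))      # delay expressed in time steps
    n_steps = int(round(t_end / dt))
    xs = [history]
    for i in range(n_steps):
        delayed = xs[i - lag] if i >= lag else history
        xs.append(xs[i] + dt * (-a * delayed))
    return xs


def fit_delay(observed, a, history, dt, t_end, candidates):
    """Pick the candidate delay whose trajectory best matches the data
    (squared error). A gradient-based method would replace this search."""
    def loss(tau):
        sim = simulate_dde(a, tau, history, dt, t_end)
        return sum((s - o) ** 2 for s, o in zip(sim, observed))
    return min(candidates, key=loss)


observed = simulate_dde(a=1.0, tau=0.5, history=1.0, dt=0.01, t_end=3.0)
best_tau = fit_delay(observed, a=1.0, history=1.0, dt=0.01, t_end=3.0,
                     candidates=[0.1, 0.3, 0.5, 0.7])
```

Treating the delay as a learnable quantity, rather than fixing it in advance, is the extension over earlier neural DDE models that the abstract highlights.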

翻訳日:2023-04-05 16:25:55 公開日:2023-04-03

# チープフェイク検出のグランドチャレンジ

Grand Challenge On Detecting Cheapfakes ( http://arxiv.org/abs/2304.01328v1 )

ライセンス: Link先を確認

Duc-Tien Dang-Nguyen and Sohail Ahmed Khan and Cise Midoglu and Michael Riegler and Pål Halvorsen and Minh-Son Dao

(参考訳) Cheapfake(チープフェイク)は、マルチメディアコンテンツの非AI(「チープ」)な操作を指す最近作られた用語である。チープフェイクはディープフェイクよりも一般的であることが知られている。チープフェイクメディアは、画像/ビデオ操作のための編集ソフトウェアを使って、あるいはソフトウェアを使わずに、単にメディアを誤解を招く主張と共に共有して画像/ビデオのコンテキストを変更することで作成できる。このコンテキストの変更は、メディアのout-of-context(OOC)誤用と呼ばれる。 OOCメディアは、画像やビデオが改ざんされていないため、偽メディアよりもずっと検出が難しい。本チャレンジでは,OOC画像の検出,特にニュース記事中の矛盾する画像キャプションを伴う実写真の誤用に焦点をあてる。この課題の目的は、最近コンパイルされたCOSMOSデータセットに基づいて、与えられたサンプル(ニュース画像と関連するキャプション)がOOCであるかどうかを検出できるモデルを開発し、ベンチマークすることである。

Cheapfake is a recently coined term that encompasses non-AI ("cheap") manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse of media. OOC media is much harder to detect than fake media, since the images and videos are not tampered with. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can be used to detect whether given samples (news image and associated captions) are OOC, based on the recently compiled COSMOS dataset.

翻訳日:2023-04-05 16:25:46 公開日:2023-04-03

# lidarによる動的物体の3次元追跡と状態推定

Lidar based 3D Tracking and State Estimation of Dynamic Objects ( http://arxiv.org/abs/2304.01396v1 )

ライセンス: Link先を確認

Patil Shubham Suresh, Gautham Narayan Narasimhan

(参考訳) 対向車の状態推定: 初期の研究は、自走車の位置、速度、方向、角速度などの状態の決定に基づいている。提案手法は,運動計画や意思決定に不可欠な非自走車の状態推定に重点を置いている。ダイナミックシーンベースのローカライゼーション: 私たちのプロジェクトは、移動エゴ(自己)や非エゴ車両のような動的シーンで作業します。以前の手法は静的環境に重点を置いていた。

State estimation of oncoming vehicles: Earlier research has been based on determining states such as position, velocity, orientation, and angular velocity of the ego-vehicle. Our approach focuses on estimating the states of non-ego vehicles, which is crucial for motion planning and decision-making. Dynamic scene-based localization: Our project will work on dynamic scenes, such as those with moving ego (self) and non-ego vehicles. Previous methods focused on static environments.

翻訳日:2023-04-05 16:19:31 公開日:2023-04-03

# クラスタ化システム同定によるパーソナライズモデル学習

Learning Personalized Models with Clustered System Identification ( http://arxiv.org/abs/2304.01395v1 )

ライセンス: Link先を確認

Leonardo F. Toso, Han Wang, James Anderson

(参考訳) 線形系モデルを異なる系力学から複数の軌道を観測することから学習する問題に対処する。このフレームワークは、システムの類似性に応じて、複数のシステムが彼らのダイナミクスをクラスタに分割する、協調的なシナリオを含んでいる。したがって、同じクラスタ内のシステムは、他のクラスタによる観測の恩恵を受けることができる。この枠組みを考慮して,各システムがクラスタのアイデンティティを交互に推定し,そのダイナミクスを推定するアルゴリズムを提案する。そして、これを集約して各クラスタのモデルを更新する。軽度の仮定では,クラスタのアイデンティティを正確に推定し,クラスタ内のシステム数と逆スケールする近似的なサンプル複雑性を実現し,より効率的かつパーソナライズされたシステム識別プロセスを実現する。

We address the problem of learning linear system models from observing multiple trajectories from different system dynamics. This framework encompasses a collaborative scenario where several systems seeking to estimate their dynamics are partitioned into clusters according to their system similarity. Thus, the systems within the same cluster can benefit from the observations made by the others. Considering this framework, we present an algorithm where each system alternately estimates its cluster identity and performs an estimation of its dynamics. This is then aggregated to update the model of each cluster. We show that under mild assumptions, our algorithm correctly estimates the cluster identities and achieves an approximate sample complexity that scales inversely with the number of systems in the cluster, thus facilitating a more efficient and personalized system identification process.
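
The alternating scheme described above can be sketched for scalar systems $x_{t+1} = a\,x_t + w_t$. Each system first picks the cluster model that best predicts its own trajectory, then estimates its dynamics by least squares, and the per-cluster estimates are averaged. This is a simplified illustration under assumed settings (scalar state, Gaussian noise, plain averaging), not the algorithm or analysis from the paper; all function names are hypothetical.

```python
import random


def simulate(a, n_steps, rng, noise=0.1):
    """Trajectory of x_{t+1} = a * x_t + w_t with Gaussian noise w_t."""
    x, traj = 1.0, [1.0]
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, noise)
        traj.append(x)
    return traj


def least_squares(traj):
    """Least-squares estimate of a from one trajectory."""
    num = sum(x0 * x1 for x0, x1 in zip(traj, traj[1:]))
    den = sum(x0 * x0 for x0 in traj[:-1])
    return num / den


def residual(traj, a):
    """Squared one-step prediction error of model a on a trajectory."""
    return sum((x1 - a * x0) ** 2 for x0, x1 in zip(traj, traj[1:]))


def clustered_sysid(trajs, init_models, n_rounds=5):
    models = list(init_models)
    labels = [0] * len(trajs)
    for _ in range(n_rounds):
        # Step 1: each system estimates its cluster identity.
        labels = [min(range(len(models)), key=lambda c: residual(t, models[c]))
                  for t in trajs]
        # Step 2: local estimates are aggregated into each cluster model.
        for c in range(len(models)):
            members = [least_squares(t) for t, l in zip(trajs, labels) if l == c]
            if members:
                models[c] = sum(members) / len(members)
    return labels, models


rng = random.Random(0)
true_a = [0.9, 0.9, 0.9, -0.5, -0.5, -0.5]
trajs = [simulate(a, 500, rng) for a in true_a]
labels, models = clustered_sysid(trajs, init_models=[0.5, -0.1])
```

Averaging over the systems in a cluster reduces the variance of each cluster model, which is the intuition behind a sample complexity that improves with the number of systems per cluster.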

翻訳日:2023-04-05 16:19:21 公開日:2023-04-03

# グラフ上の反事実学習:調査

Counterfactual Learning on Graphs: A Survey ( http://arxiv.org/abs/2304.01391v1 )

ライセンス: Link先を確認

Zhimeng Guo, Teng Xiao, Charu Aggarwal, Hui Liu, Suhang Wang

(参考訳) グラフ構造化データは、ソーシャルネットワーク、分子グラフ、トランザクションネットワークなどの現実世界で広く利用されている。グラフニューラルネットワーク(GNN)は、グラフでの表現学習において大きな成功を収め、さまざまな下流タスクを促進してきた。しかし、GNNには、解釈可能性の欠如や、トレーニングデータのバイアスを容易に受け継ぐこと、因果関係をモデル化できないことといった欠点がいくつかある。近年,グラフ上の反実的学習は,これらの欠点を緩和する有望な結果を示している。グラフ上の反実的公正性、説明可能性、リンク予測などに対する様々なグラフ反実的学習手法が提案されている。この有望な方向性の展開を促進するため,本調査では,グラフ反事実学習に関する論文を分類し,総合的にレビューする。既存の手法を研究課題に基づいて4つのカテゴリに分けた。それぞれのカテゴリについて、背景と動機づけとなる例、既存の作品を要約する一般的なフレームワーク、そしてこれらの作品の詳細なレビューを提供する。我々は,グラフ構造化データ,反事実学習,実世界のアプリケーションとの交点における将来研究の方向性を指摘する。今後の研究のためのリソースの総合的なビューを提供するため、オープンソース実装、パブリックデータセット、そして一般的に使用される評価指標のコレクションをコンパイルする。この調査は、グラフの反事実学習カテゴリと現在のリソースの統一的な理解を構築するための「ワンストップショップ」として機能することを目的としている。また、論文やリソースのリポジトリも維持しており、リポジトリ https://github.com/TimeLovercc/Awesome-Graph-Causal-Learning の更新を続ける。

Graph-structured data are pervasive in the real world, for example in social networks, molecular graphs and transaction networks. Graph neural networks (GNNs) have achieved great success in representation learning on graphs, facilitating various downstream tasks. However, GNNs have several drawbacks: they lack interpretability, can easily inherit the bias of the training data, and cannot model causal relations. Recently, counterfactual learning on graphs has shown promising results in alleviating these drawbacks. Various graph counterfactual learning approaches have been proposed for counterfactual fairness, explainability, link prediction and other applications on graphs. To facilitate the development of this promising direction, in this survey, we categorize and comprehensively review papers on graph counterfactual learning. We divide existing methods into four categories based on research problems studied. For each category, we provide background and motivating examples, a general framework summarizing existing works and a detailed review of these works. We point out promising future research directions at the intersection of graph-structured data, counterfactual learning, and real-world applications. To offer a comprehensive view of resources for future studies, we compile a collection of open-source implementations, public datasets, and commonly-used evaluation metrics. This survey aims to serve as a "one-stop-shop" for building a unified understanding of graph counterfactual learning categories and current resources. We also maintain a repository for papers and resources and will keep updating the repository https://github.com/TimeLovercc/Awesome-Graph-Causal-Learning.

翻訳日:2023-04-05 16:19:08 公開日:2023-04-03

# カイラル有効場論演算子を持つ$A = 3$核の磁気モーメント

Magnetic moments of $A = 3$ nuclei with chiral effective field theory operators ( http://arxiv.org/abs/2304.01389v1 )

ライセンス: Link先を確認

Soham Pal (1), Shiplu Sarker (1), Patrick J. Fasano (2), Pieter Maris (1), James P. Vary (1), Mark A. Caprio (2) ((1) Iowa State University, (2) University of Notre-Dame)

(参考訳) カイラル有効場理論($\chi$EFT)は、第一原理から体系的に改善可能な方法で核子間相互作用を得るための枠組みを提供し、同時に一貫した電弱カレント演算子の導出も可能にする。本研究では,一貫して導出された相互作用とカレントを,$A=3$系であるトリトンとヘリウム3の磁気双極子モーメントの計算に適用する。ここでは半局所座標空間(SCS)正則化を用いて得られるLENPIC相互作用に着目する。 LENPIC $\chi$EFTベクトルカレントの運動量空間表現から出発し、N2LOまでのSCS正則化磁気双極子演算子を導出する。次に,$\chi$EFTのN2LOにおけるSCS LENPIC相互作用を用いてトリトンおよびヘリウム3系のノーコア殻模型計算を行い,一貫して導出された一核子および二核子電磁カレントを用いて磁気双極子モーメントを評価する。$\chi$EFTカレントを用いた先行研究から予想されたとおり、N2LOまでのカレント補正により、トリトンとヘリウム3の磁気双極子モーメントの実験値との一致は改善されるが、まだ完全ではない。

Chiral effective field theory ($\chi$EFT) provides a framework for obtaining internucleon interactions in a systematically improvable fashion from first principles, while also providing for the derivation of consistent electroweak current operators. In this work, we apply consistently derived interactions and currents towards calculating the magnetic dipole moments of the $A=3$ systems Triton and Helium-3. We focus here on LENPIC interactions obtained using semilocal coordinate-space (SCS) regularization. Starting from the momentum-space representation of the LENPIC $\chi$EFT vector current, we derive the SCS-regularized magnetic dipole operator up through N2LO. We then carry out no-core shell model calculations for Triton and Helium-3 systems, using the SCS LENPIC interaction at N2LO in $\chi$EFT, and evaluate the magnetic dipole moments obtained using the consistently derived one-nucleon and two-nucleon electromagnetic currents. As anticipated by prior results with $\chi$EFT currents, the current corrections through N2LO provide improved, but not yet complete, agreement with experiment for the Triton and Helium-3 magnetic dipole moments.

翻訳日:2023-04-05 16:18:44 公開日:2023-04-03

# PoseMatcher: 深部特徴マッチングによる1ショット6Dオブジェクトポス推定

PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching ( http://arxiv.org/abs/2304.01382v1 )

ライセンス: Link先を確認

Pedro Castro, Tae-Kyun Kim

(参考訳) 未知のオブジェクトのポーズを推定することは、挑戦的なワンショットポーズ推定タスクの目標である。これまでの手法は特徴マッチングに大きく依存し、大きな成功を収めてきた。しかし、これらの手法は、ポーズ推定のために特別に設計されていない事前訓練されたモデルに依存しているため、しばしば非効率で制限される。本稿では,これらの制約を克服する,高精度なモデルフリーのワンショットオブジェクトポーズ推定器PoseMatcherを提案する。クエリと正例・負例のテンプレートからなる3ビューシステムに基づいて、オブジェクトと画像のマッチングのための新しいトレーニングパイプラインを作成した。このシンプルで効果的なアプローチは、トレーニング中に完全なオブジェクト点群の近似を安価に構築することで、テスト時のシナリオをエミュレートする。 PoseMatcherが画像と点群という異なる入力モダリティに注意を向けられるようにするために,入力間の自己注意と相互注意を効率的に扱う新しい注意層であるIO-Layerを導入する。さらに,対象オブジェクトの冗長領域を反復的に除去し,精度を維持しつつ,ネットワークの複雑さやノイズをさらに低減するプルーニング戦略を提案する。最後に、一般的に用いられるポーズリファインメント戦略であるズームと2Dオフセットリファインメントを再設計し、それらをワンショットパラダイムに適応させた。 Linemod と YCB-V のデータセット上で,従来のすべてのワンショットポーズ推定手法を上回り,最近のインスタンスレベル手法に匹敵する結果を得る。ソースコードとモデルは https://github.com/PedroCastro/PoseMatcher で入手できる。

Estimating the pose of an unseen object is the goal of the challenging one-shot pose estimation task. Previous methods have heavily relied on feature matching with great success. However, these methods are often inefficient and limited by their reliance on pre-trained models that have not been designed specifically for pose estimation. In this paper we propose PoseMatcher, an accurate model-free one-shot object pose estimator that overcomes these limitations. We create a new training pipeline for object-to-image matching based on a three-view system: a query with positive and negative templates. This simple yet effective approach emulates test-time scenarios by cheaply constructing an approximation of the full object point cloud during training. To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer, a new attention layer that efficiently accommodates self and cross attention between the inputs. Moreover, we propose a pruning strategy where we iteratively remove redundant regions of the target object to further reduce the complexity and noise of the network while maintaining accuracy. Finally, we redesign commonly used pose refinement strategies, zoom and 2D offset refinements, and adapt them to the one-shot paradigm. We outperform all prior one-shot pose estimation methods on the Linemod and YCB-V datasets, as well as achieving results rivaling recent instance-level methods. The source code and models are available at https://github.com/PedroCastro/PoseMatcher.

翻訳日:2023-04-05 16:18:24 公開日:2023-04-03

# 機械学習を用いたパッシブ光ネットワークにおける故障分岐同定

Faulty Branch Identification in Passive Optical Networks using Machine Learning ( http://arxiv.org/abs/2304.01376v1 )

ライセンス: Link先を確認

Khouloud Abdelli, Carsten Tropschug, Helmut Griesser, and Stephan Pachnicke

(参考訳) パッシブ光ネットワーク(PON)は有望なブロードバンドアクセスネットワークソリューションとなっている。信頼できる伝送を確実にし、サービスレベルの合意を満たすためには、ネットワーク障害を迅速に識別しローカライズするために、PONシステムを常に監視する必要がある。通常、PONシステムにおけるサービス中断は、主にファイバカットと光ネットワークユニット(ONU)の送信機/受信機故障に起因する。 ONUが光回線終端装置(OLT)から異なる距離にある場合、記録された光時間領域反射測定(OTDR)トレースを分析して故障したONUまたは分岐を特定することができる。しかし、同程度の長さの2つ以上の分岐に由来する反射が重なり合うと、大域的な後方散乱信号から故障分岐を判別することが非常に難しくなり、故障分岐の分離は非常に困難になる。近年、機械学習(ML)に基づくアプローチは、PONシステムにおける光学的障害を管理する大きな可能性を示している。このようなテクニックは、同じPONシステムから得られたデータでトレーニングとテストを行うときによく機能する。しかし、(トレーニングデータの生成に用いた)PONシステムが変化した場合、例えば、より多くの分岐を追加したり、隣り合う2つの分岐の長さの差を変更したりした場合、パフォーマンスが著しく低下する可能性がある。ネットワーク変更毎にMLモデルを再トレーニングする必要があり、時間を要する可能性がある。本稿では,上記の問題を克服するために,ネットワークアーキテクチャとは独立に学習した汎用MLアプローチを提案し,長さの近い分岐がある場合に,OTDR信号からPONシステムの故障分岐を特定する。このようなアプローチは、ネットワークの変更毎に再トレーニングされることなく、任意のPONシステムに適用することができる。提案手法はPONシステムから得られた実験データを用いて検証する。

Passive optical networks (PONs) have become a promising broadband access network solution. To ensure reliable transmission and to meet service level agreements, PON systems have to be monitored constantly in order to quickly identify and localize network faults. Typically, a service disruption in a PON system is mainly due to fiber cuts and optical network unit (ONU) transmitter/receiver failures. When the ONUs are located at different distances from the optical line terminal (OLT), the faulty ONU or branch can be identified by analyzing the recorded optical time domain reflectometry (OTDR) traces. However, faulty branch isolation becomes very challenging when the reflections originating from two or more branches with similar length overlap, which makes it very hard to discriminate the faulty branches given the global backscattered signal. Recently, machine learning (ML) based approaches have shown great potential for managing optical faults in PON systems. Such techniques perform well when trained and tested with data derived from the same PON system. But their performance may severely degrade if the PON system (adopted for the generation of the training data) has changed, e.g. by adding more branches or varying the length difference between two neighboring branches. The ML models then have to be retrained for each network change, which can be time-consuming. In this paper, to overcome the aforementioned issues, we propose a generic ML approach trained independently of the network architecture for identifying the faulty branch in PON systems given OTDR signals for the cases of branches with close lengths. Such an approach can be applied to an arbitrary PON system without needing to be retrained for each change of the network. The proposed approach is validated using experimental data derived from a PON system.

翻訳日:2023-04-05 16:18:00 公開日:2023-04-03

# Pythia: トレーニングとスケーリングを対象とする大規模言語モデル分析スイート

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling ( http://arxiv.org/abs/2304.01373v1 )

ライセンス: Link先を確認

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal

(参考訳) 大規模言語モデル(LLM)は、トレーニングの過程でどのように発展し進化するのか? モデルがスケールするにつれて、これらのパターンはどのように変化するのか? これらの疑問に答えるために、我々は、まったく同じ順序で与えられた公開データでトレーニングされ、サイズが70Mから12Bパラメータにわたる16のLLMからなるスイートであるPythiaを紹介する。 16モデルそれぞれについて154のチェックポイントを公開し、あわせて正確なトレーニングデータローダをダウンロードして再構築するためのツールを提供し、さらなる研究を可能にする。我々はPythiaが様々な分野の研究を促進することを意図しており,記憶に関する新規な結果,few-shot性能に対する語の出現頻度の影響,性別バイアスの低減など,いくつかの事例研究を示す。この高度に制御されたセットアップを用いて、LLMとそのトレーニングダイナミクスに対する新たな洞察が得られることを実証する。トレーニングされたモデル、分析コード、トレーニングコード、トレーニングデータは https://github.com/EleutherAI/pythia にある。

How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.

翻訳日:2023-04-05 16:17:32 公開日:2023-04-03

# 閉曲線に対するガウスモデル

Gaussian model for closed curves ( http://arxiv.org/abs/2304.01367v1 )

ライセンス: Link先を確認

Krzysztof Byrski, Przemysław Spurek, Jacek Tabor

(参考訳) ガウス混合モデル(GMM)は、曲線データや強い非線形データにうまく適応しない。しかし、この問題を解くために、曲線座標系においてガウス的を用いることができる。さらに、そのような解は、関数の族によって定義される複雑な形状へのクラスタの適応を可能にする。しかしそれでも、クラスタを閉じた曲線(円、楕円など)としてモデル化することは困難である。本研究では,データ中の複雑なテンプレートを検出するために使用できる閉曲線の密度表現を提案する。この目的のために、閉曲線をモデル化するための新しい確率分布を定義する。そして、そのような分布の混合を構築し、一次元閉曲線の場合、効果的に訓練できることを示す。

Gaussian Mixture Models (GMM) do not adapt well to curved and strongly nonlinear data. However, we can use Gaussians in the curvilinear coordinate systems to solve this problem. Moreover, such a solution allows for the adaptation of clusters to the complicated shapes defined by the family of functions. But still, it is challenging to model clusters as closed curves (e.g., circles, ellipses, etc.). In this work, we propose a density representation of the closed curve, which can be used to detect the complicated templates in the data. For this purpose, we define a new probability distribution to model closed curves. Then we construct a mixture of such distributions and show that it can be effectively trained in the case of the one-dimensional closed curves.
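
The idea of using a Gaussian in curvilinear coordinates can be illustrated on the simplest closed curve, a circle: the density depends on how far a point is from the curve, measured along the radial coordinate. The sketch below is an unnormalized toy density plus a mixture-style soft assignment; it is an assumption-laden illustration, not the probability distribution defined in the paper, and the function names are hypothetical.

```python
import math


def circle_density(point, center, radius, sigma):
    """Unnormalized Gaussian in the radial coordinate around a circle.

    Equals 1 on the circle and decays with the distance from it.
    """
    r = math.hypot(point[0] - center[0], point[1] - center[1])
    return math.exp(-((r - radius) ** 2) / (2.0 * sigma ** 2))


def responsibilities(point, circles):
    """Soft assignment of a point to each (center, radius, sigma) circle."""
    weights = [circle_density(point, c, r, s) for c, r, s in circles]
    total = sum(weights)
    if total == 0.0:
        # Point is far from every curve: fall back to a uniform assignment.
        return [1.0 / len(circles)] * len(circles)
    return [w / total for w in weights]
```

A standard Gaussian component centered at the circle's center would put most of its mass inside the ring, where there is no data, which is why a curve-based density is a better fit for clusters of this shape.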

翻訳日:2023-04-05 16:17:16 公開日:2023-04-03

# 自律型サイバーエージェントのためのネットワークAIジムの実現

Enabling A Network AI Gym for Autonomous Cyber Agents ( http://arxiv.org/abs/2304.01366v1 )

ライセンス: Link先を確認

Li Li, Jean-Pierre S. El Rami, Adrian Taylor, James Hailing Rao, Thomas Kunz

(参考訳) 本研究の目的は、強化学習および深層強化学習(RL/DRL)を適用し、ネットワークサイバーオペレーション(CyOps)のための自律エージェントを実現することである。要求されるRLトレーニング環境は、実際のネットワークエミュレーションによって最もよく達成される高忠実度の必要性と、シミュレーションを使用して最もよく達成される多数のトレーニングエピソードを実行する必要性のバランスをとる必要があるため、特に困難である。エミュレートされたCyGIL-EがシミュレートされたCyGIL-Sを自動生成する統合学習環境であるCyber Gym for Intelligent Learning(CyGIL)を開発する。予備実験の結果から、CyGIL-SはCyGIL-Eで必要となる数日と比較して、数分でエージェントを訓練することができる。 CyGIL-Sで訓練されたエージェントはCyGIL-Eに直接転送可能であり、エミュレートされた「リアル」ネットワークで完全な意思決定能力を示す。オフラインRLを可能にするCyGILソリューションは、現実のサイバーネットワークでRLエージェントを活用するためのsim-to-realに向けた有望な方向を示す。

This work aims to enable autonomous agents for network cyber operations (CyOps) by applying reinforcement and deep reinforcement learning (RL/DRL). The required RL training environment is particularly challenging, as it must balance the need for high fidelity, best achieved through real network emulation, with the need for running large numbers of training episodes, best achieved using simulation. A unified training environment, namely the Cyber Gym for Intelligent Learning (CyGIL), is developed in which an emulated CyGIL-E automatically generates a simulated CyGIL-S. In preliminary experiments, CyGIL-S is capable of training agents in minutes, compared with the days required in CyGIL-E. The agents trained in CyGIL-S are transferable directly to CyGIL-E, showing full decision proficiency in the emulated "real" network. Enabling offline RL, the CyGIL solution presents a promising direction towards sim-to-real for leveraging RL agents in real-world cyber networks.

翻訳日:2023-04-05 16:17:07 公開日:2023-04-03

# 言語横断剽窃検出の簡便かつ効果的な方法

A Simple and Effective Method of Cross-Lingual Plagiarism Detection ( http://arxiv.org/abs/2304.01352v1 )

ライセンス: Link先を確認

Karen Avetisyan, Arthur Malajyan, Tsolak Ghukasyan

(参考訳) 本稿では,多数の言語に適用可能な単純な言語横断剽窃検出手法を提案する。提案手法は,候補検索タスクにオープンな多言語シソーラスを,詳細な解析に事前訓練された多言語BERTベースの言語モデルを利用する。この方法は,使用時に機械翻訳や語義曖昧性解消に依存しないため,低リソース言語を含む多数の言語に適している。提案手法の有効性は,いくつかの既存および新規のベンチマークで実証され,フランス語,ロシア語,アルメニア語で最先端の結果が得られた。

We present a simple cross-lingual plagiarism detection method applicable to a large number of languages. The presented approach leverages open multilingual thesauri for the candidate retrieval task and pre-trained multilingual BERT-based language models for detailed analysis. The method does not rely on machine translation or word sense disambiguation when in use, and is therefore suitable for a large number of languages, including under-resourced languages. The effectiveness of the proposed approach is demonstrated on several existing and new benchmarks, achieving state-of-the-art results for French, Russian, and Armenian.

翻訳日:2023-04-05 16:16:49 公開日:2023-04-03

# Maxwell's Demon for Emergent Page Curve and Split Property

Maxwell's Demon for Emergent Page Curve and Split Property ( http://arxiv.org/abs/2304.01414v1 )

ライセンス: Link先を確認

Yang An

(参考訳) 緊急重力の適切な状況を求めて,最近我々は,島の発達の非重力結合構造に類似した極端表面が変化する場合にのみエントロピー機構が発生することを明らかにした。本稿では,これらのユークリッド状態の進化を見出すためには,外力が$F_{ex}\propto T_{H}\delta A(\mu_a)$である必要がある。我々は、降下傾向を第2法則傾向に類似させ、ホーキング放射の潜在的傾向を探索する。この類似性はマクスウェルの悪魔の役割を想起させ、近接宇宙においてページ曲線が現れるための浴槽のメカニズムを解釈する。これは、ラジュによって提起された量子重力に関するスプリット問題を解決できる。

Seeking the proper setting for Emergent Gravity, we recently revealed that the entropic mechanism only happens when extremal surfaces are varied, which is similar to the non-gravitational-bath-coupled setup of the island development. In this paper, we consider perturbing a thin-shell state outside the horizon during equilibration, and find that the evolution of these Euclidean states requires an external force $F_{ex}\propto T_{H}\delta A(\mu_a)$, proportional to the area variation of the apparent horizon, which could transform into the actual event horizon as the extremal surface. We liken the falling tendency to the second-law tendency, and then explore the potential tendency violation of Hawking radiation. This analogy recalls the role of Maxwell's Demon in interpreting the mechanism of the bath for the Page curve to emerge in a closed universe. It could reconcile the Split Problem concerning quantum gravity raised by Raju.

翻訳日:2023-04-05 16:10:13 公開日:2023-04-03

# 量子等化に対するコヒーレントLQGアプローチ

A Coherent LQG approach to Quantum Equalization ( http://arxiv.org/abs/2304.01413v1 )

ライセンス: Link先を確認

Rebbecca TY Thien, Shanon L. Vuglar and Ian R. Petersen

(参考訳) 量子等化問題を解くために,準最適なコヒーレント量子LQGコントローラを設計する手法を提案する。本手法では,問題を制御問題として再定式化し,古典的なLQGコントローラを設計して量子システムとして実装する。例として,能動システムと受動システムの両方,すなわち,位置演算子と運動量演算子の両方で力学が記述されるシステムと,消滅演算子のみで力学が記述されるシステムに対して,アルゴリズムを適用した結果を示す。

We propose a method to design a suboptimal, coherent quantum LQG controller to solve a quantum equalization problem. Our method involves reformulating the problem as a control problem and then designing a classical LQG controller and implementing it as a quantum system. Illustrative examples are included which demonstrate the algorithm for both active and passive systems, i.e., systems where the dynamics are described in terms of both position and momentum operators and systems with dynamics in terms of annihilation operators only.

翻訳日:2023-04-05 16:09:54 公開日:2023-04-03

# StatCan Dialogue Dataset: 真の意図による会話によるデータテーブルの検索

The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents ( http://arxiv.org/abs/2304.01412v1 )

ライセンス: Link先を確認

Xing Han Lu, Siva Reddy, Harm de Vries

(参考訳) 我々は、カナダ統計局で働くエージェントと、公開データテーブルを探しているオンラインユーザとの間の19,379の会話ターンからなるStatCan Dialogue Datasetを紹介する。会話は真の意図に基づいており、英語またはフランス語で行われ、エージェントが5000以上の複雑なデータテーブルの1つを検索することにつながる。このデータセットに基づいて,(1)進行中の会話に基づく関連テーブルの自動検索,(2)各ターンにおける適切なエージェント応答の自動生成の2つのタスクを提案する。我々は,強力なベースラインを確立することで各タスクの難しさを調査する。時間的データ分割を用いた実験では、検証セットからテストセットに移行すると両方のタスクでパフォーマンスが大幅に低下することが観察され、すべてのモデルが将来の会話への一般化に苦労していることが明らかになった。さらに、応答生成モデルは、いつテーブルを返すかの判断に苦労している。これらのタスクが既存のモデルに重大な課題をもたらすことを考慮し、我々はコミュニティに対し、ライブチャットのユーザのために関連テーブルを見つけるナレッジワーカーを直接支援できる、本タスクのためのモデルの開発を奨励する。

We introduce the StatCan Dialogue Dataset consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of relevant tables based on an ongoing conversation, and (2) automatic generation of appropriate agent responses at each turn. We investigate the difficulty of each task by establishing strong baselines. Our experiments on a temporal data split reveal that all models struggle to generalize to future conversations, as we observe a significant drop in performance across both tasks when we move from the validation to the test set. In addition, we find that response generation models struggle to decide when to return a table. Considering that the tasks pose significant challenges to existing models, we encourage the community to develop models for our task, which can be directly used to help knowledge workers find relevant tables for live chat users.

翻訳日:2023-04-05 16:09:43 公開日:2023-04-03

# キャビティ媒介型集団運動量交換相互作用

Cavity-Mediated Collective Momentum-Exchange Interactions ( http://arxiv.org/abs/2304.01411v1 )

ライセンス: Link先を確認

Chengyi Luo, Haoqing Zhang, Vanessa P. W. Koh, John D. Wilson, Anjun Chu, Murray J. Holland, Ana Maria Rey, and James K. Thompson

(参考訳) 量子シミュレーションと量子センシングは、複雑な相互作用系の理解から未発見の物理学の探索まで、自然に対する新たな洞察をもたらすことが大いに期待されている。無限レンジの光子媒介相互作用によって相互作用するレーザー冷却原子の大規模なアンサンブルは、両方の試みにとって強力なプラットフォームである。ここでは、原子が共通のキャビティモードからの光子の集団的な放出と吸収を通じて運動量状態を交換する運動量交換相互作用を初めて実現する。この運動量交換相互作用は、物質波干渉計において全対全のイジング型相互作用をもたらし、これはエンタングルメント生成に有用である。多体エネルギーギャップも出現し、メスバウアー分光に似た形で、干渉計の物質波パケットを効果的に結合してドップラー位相緩和を抑制する。調整可能な運動量交換相互作用は、量子相互作用によって強化された物質波干渉法や、超伝導体や動的ゲージ場のシミュレーションを含むエキゾチックな挙動の実現に向けた新しい能力を提供する。

Quantum simulation and sensing hold great promise for providing new insights into nature, from understanding complex interacting systems to searching for undiscovered physics. Large ensembles of laser-cooled atoms interacting via infinite-range photon-mediated interactions are a powerful platform for both endeavours. Here, we realize for the first time momentum-exchange interactions in which atoms exchange their momentum states via collective emission and absorption of photons from a common cavity mode. The momentum-exchange interaction leads to an observed all-to-all Ising-like interaction in a matter-wave interferometer, which is useful for entanglement generation. A many-body energy gap also emerges, effectively binding interferometer matter-wave packets together to suppress Doppler dephasing, akin to Mössbauer spectroscopy. The tunable momentum-exchange interaction provides a new capability for quantum interaction-enhanced matter-wave interferometry and for realizing exotic behaviors including simulations of superconductors and dynamical gauge fields.

翻訳日:2023-04-05 16:09:23 公開日:2023-04-03

# 実現可能性保証付き2段直流最適潮流の効率的な学習型解法

An Efficient Learning-Based Solver for Two-Stage DC Optimal Power Flow with Feasibility Guarantees ( http://arxiv.org/abs/2304.01409v1 )

ライセンス: Link先を確認

Ling Zhang, Daniel Tabas and Baosen Zhang

(参考訳) 本稿では,負荷が不確実性に直面している場合の最適かつ信頼性の高いディスパッチのためのシナリオベース2段階直流最適電力流(OPF)問題を考察する。この問題は線形プログラムであるが、不確実性を正確に表わすのに必要な多数のシナリオのため、計算的に解決が難しいままである。計算問題を軽減するため、第2段階の決定をより効率的に処理できるように、多くの手法が提案されている。第二段階の決定を近似する適切なポリシーを見つける上での課題は、これらのソリューションが実現可能である必要があることである。そこで本稿では,この2段階問題をより効率的かつ最適な方法で解くための学習法を提案する。ゲージマップと呼ばれる手法が学習アーキテクチャ設計に組み込まれ、学習したソリューションがネットワーク制約に対して実現可能であることを保証する。すなわち、実現可能なソリューションのみを出力するフォワード関数をフィードするポリシーを設計できる。標準IEEEシステムにおけるシミュレーション結果から, 反復解法や広く用いられているアフィンポリシと比較して, 提案手法は良質な解を学習するだけでなく, 桁違いの計算を高速化することを示した。

In this paper, we consider the scenario-based two-stage stochastic DC optimal power flow (OPF) problem for optimal and reliable dispatch when the load is facing uncertainty. Although this problem is a linear program, it remains computationally challenging to solve due to the large number of scenarios needed to accurately represent the uncertainties. To mitigate the computational issues, many techniques have been proposed to approximate the second-stage decisions so they can be dealt with more efficiently. The challenge of finding good policies to approximate the second-stage decisions is that these solutions need to be feasible, which has been difficult to achieve with existing policies. To address these challenges, this paper proposes a learning method to solve the two-stage problem in a more efficient and optimal way. A technique called the gauge map is incorporated into the learning architecture design to guarantee the learned solutions' feasibility with respect to the network constraints. Namely, we can design policies that are feedforward functions that only output feasible solutions. Simulation results on standard IEEE systems show that, compared to iterative solvers and the widely used affine policy, our proposed method not only learns solutions of good quality but also accelerates the computation by orders of magnitude.
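
The feasibility guarantee mentioned above can be illustrated with a minimal Python sketch of a gauge map. It assumes the feasible set is a bounded polytope {x : Ax <= b} with b > 0 (so the origin lies strictly inside) and that the raw network output z lies in the unit infinity-norm ball, for example after a tanh layer. The matrix `A`, the vector `b`, and all function names are hypothetical; the abstract does not specify how the paper builds its architecture.

```python
def gauge(z, A, b):
    """Minkowski gauge of the polytope {x : A x <= b} (with b > 0) at point z."""
    return max(
        sum(a_ij * z_j for a_ij, z_j in zip(row, z)) / b_i
        for row, b_i in zip(A, b)
    )


def gauge_map(z, A, b):
    """Map z from the unit infinity-norm ball into the polytope {x : A x <= b}."""
    norm_inf = max(abs(z_j) for z_j in z)
    if norm_inf == 0.0:
        return list(z)  # the origin maps to itself
    # Rescale z so that its gauge equals its infinity norm (at most 1).
    scale = norm_inf / gauge(z, A, b)
    return [scale * z_j for z_j in z]


def is_feasible(x, A, b, tol=1e-9):
    """Check A x <= b row by row, up to a small numerical tolerance."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i + tol
        for row, b_i in zip(A, b)
    )


# Hypothetical 2-D constraint set: -1 <= x <= 2, -3 <= y <= 1, x + y <= 2.
A = [[1, 0], [-1, 0], [0, 1], [0, -1], [1, 1]]
b = [2, 1, 1, 3, 2]

x = gauge_map([0.5, -0.5], A, b)
print(x, is_feasible(x, A, b))
```

Because the rescaled point has gauge equal to the infinity norm of z, which is at most 1, every output lies inside the polytope by construction, with no projection or iterative repair step.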

翻訳日:2023-04-05 16:09:07 公開日:2023-04-03

# 目標情報の拡張による学習:フィードバックアライメントの代替理論

Learning with augmented target information: An alternative theory of Feedback Alignment ( http://arxiv.org/abs/2304.01406v1 )

ライセンス: Link先を確認

Huzi Cheng, Joshua W. Brown

(参考訳) エラーバックプロパゲーション(bp)は、ほぼ全ての現代のニューラルネットワークのトレーニングを長い間支配してきたが、対称ウェイト要件や同期更新など、いくつかの生物学的な可能性の問題に悩まされている。フィードバックアライメント(FA)はBPの代替として提案され、様々なタスクやネットワークアーキテクチャに有効であることが示されている。その単純さと有効性にもかかわらず、さまざまなアーキテクチャでFAがどのように機能するかという満足のいく説明はまだ欠けている。本稿では、FAが情報理論のレンズを通してどのように機能するかという新しいアーキテクチャに依存しない理論を提案する: BPが計算した勾配を同じパラメータで近似する代わりに、FAはトレーニング対象情報をニューラルネットワークに埋め込むことで効果的な表現を学習する。理想的な設定におけるFAダイナミクスの分析と、一連の実験を通してこれを示す。この理論の意義に基づき、我々は3種類のFAを設計し、複数のタスクで同等の性能を示す。これらの変種は、予測符号化や表現の漂流のような神経科学のいくつかの現象や理論も説明できる。

While error backpropagation (BP) has dominated the training of nearly all modern neural networks for a long time, it suffers from several biological plausibility issues such as the symmetric weight requirement and synchronous updates. Feedback Alignment (FA) was proposed as an alternative to BP to address those dilemmas and has been demonstrated to be effective on various tasks and network architectures. Despite its simplicity and effectiveness, a satisfying explanation of how FA works across different architectures is still lacking. Here we propose a novel, architecture-agnostic theory of how FA works through the lens of information theory: Instead of approximating gradients calculated by BP with the same parameter, FA learns effective representations by embedding target information into neural networks to be trained. We show this through the analysis of FA dynamics in idealized settings and then via a series of experiments. Based on the implications of this theory, we designed three variants of FA and show their comparable performance on several tasks. These variants also account for some phenomena and theories in neuroscience such as predictive coding and representational drift.

翻訳日:2023-04-05 16:08:47 公開日:2023-04-03

# ワークミーティングにおけるアバター--フォトリアリズムとアピールの関係

Avatars in Work Meetings: Correlation Between Photorealism and Appeal ( http://arxiv.org/abs/2304.01405v1 )

ライセンス: Link先を確認

Vrushank Phadnis, Kristin Moore and Mar Gonzalez Franco

(参考訳) 職場会議におけるアバターの受容性に及ぼすリアリズムの影響を検討した。 2509人の知識労働者を対象に、アニメーションGIFを用いて5レベルのフォトリアリズムを検証した。アバターのスタイルは、マネージャ、既知の同僚、未知の同僚によって使用された。すべてのシナリオにおいて、より高いリアリズムが好まれることがわかったが、完全に現実的なアバターは時々参加者に不利であると認識された。調査結果をセグメンテーションして,調査回答の年齢層と組織パターンを調査した。最後に,オープンエンド反応を評価し,アバターの選択に影響を及ぼす要因の質的評価を行う。その結果,光リアリズムは作業アバターの選択における重要な属性であることがわかった。しかし、アバターを使った仕事仲間との地域選好や関係も役割を担っている可能性がある。アバター選択に影響を与える他の要因の探索は、職場でのアバター使用の影響をさらに理解するために必要である。

We investigated the effects of realism on the acceptability of avatars for work meetings. Our survey of 2509 knowledge workers tested five levels of photorealism using animated GIFs. Avatar styles were rated for usage by a manager, a known colleague, and an unknown colleague. In all scenarios, we found that higher realism was favored; however, fully realistic avatars were sometimes perceived as uncanny by participants. We segmented our results to uncover demographic and firmographic patterns in the survey responses. Lastly, we caveat our findings by evaluating open-ended responses to provide a qualitative evaluation of factors influencing avatar choices for work meetings. In conclusion, our findings suggest that photorealism is a key attribute in selecting work avatars. However, regional preferences and relationship with work colleagues using the avatar may also play a role. Exploration of other factors influencing work avatar selection is needed to further understand the implications of avatar use in the workplace.

翻訳日:2023-04-05 16:08:29 公開日:2023-04-03

# アクティブトランスファー学習に基づくレベルセット推定による材料表面の適応的欠陥領域同定

Adaptive Defective Area Identification in Material Surface Using Active Transfer Learning-based Level Set Estimation ( http://arxiv.org/abs/2304.01404v1 )

ライセンス: Link先を確認

Shota Hozumi, Kentaro Kutsukake, Kota Matsui, Syunya Kusakawa, Toru Ujihara, Ichiro Takeuchi

(参考訳) 材料キャラクタリゼーションでは、材料表面上の欠陥領域の同定が基本である。従来のアプローチでは、表面上の所定のメッシュグリッドポイントにおける関連物理特性をポイント単位で測定し、その特性が所望のレベルに達しない領域を決定する。より効率的に欠陥領域を同定するために,測定資源を欠陥領域の境界の検出に優先的に使用する適応マッピング手法を提案する。我々はこの問題をレベルセット推定(LSE)のアクティブラーニング(AL)問題として解釈する。 ALベースのLSEの目標は、表面上で定義される物理特性関数のレベルセットをできるだけ少ない測定回数で決定することである。さらに, 同様の仕様の材料が繰り返し生産される状況に対処するため, 以前に生産された材料の情報を効果的に活用できるように, 転移学習手法を導入する。概念実証として,提案手法をシリコンウェハのレッドゾーン推定問題に適用し,従来の手法よりもかなり低い測定コストで欠陥領域を同定できることを実証した。

In material characterization, identifying defective areas on a material surface is fundamental. The conventional approach involves measuring the relevant physical properties point-by-point at the predetermined mesh grid points on the surface and determining the area at which the property does not reach the desired level. To identify defective areas more efficiently, we propose adaptive mapping methods in which measurement resources are used preferentially to detect the boundaries of defective areas. We interpret this problem as an active learning (AL) problem for level set estimation (LSE). The goal of AL-based LSE is to determine the level set of the physical property function defined on the surface with as few measurements as possible. Furthermore, to handle the situations in which materials with similar specifications are repeatedly produced, we introduce a transfer learning approach so that the information of previously produced materials can be effectively utilized. As a proof-of-concept, we applied the proposed methods to the red-zone estimation problem of silicon wafers and demonstrated that we could identify the defective areas with significantly lower measurement costs than those of conventional methods.

翻訳日:2023-04-05 16:08:17 公開日:2023-04-03

# U-Netmer: 医用画像セグメンテーションのためのU-NetとTransformerの融合

U-Netmer: U-Net meets Transformer for medical image segmentation ( http://arxiv.org/abs/2304.01401v1 )

ライセンス: Link先を確認

Sheng He, Rina Bao, P. Ellen Grant, Yangming Ou

(参考訳) U-NetベースのディープラーニングモデルとTransformerの組み合わせは、医用画像セグメンテーションの新しいトレンドである。 U-Netは詳細な局所意味情報やテクスチャ情報を抽出でき、Transformerは入力画像中の画素間の長距離依存関係を学習することができる。しかし、セグメンテーションのためにTransformerを直接適用すると、「token-flatten」問題(局所パッチを1Dトークンにフラット化するため、局所パッチ内の画素間の相互作用が失われる)と「scale-sensitivity」問題(入力画像を局所パッチに分割するために固定スケールを使用する)が生じる。そこで本研究では、U-NetとTransformerを直接結合するのではなく、この2つの問題を解決するために、U-NetとTransformerをグローバル・ローカルに組み合わせたU-Netmerを提案する。提案するU-Netmerは入力画像を局所パッチに分割する。局所パッチ間のグローバルコンテキスト情報はTransformerの自己アテンション機構によって学習され、U-Netは各局所パッチをトークンにフラット化せずにセグメント化することで「token-flatten」問題を解決する。U-Netmerは、同一の構造とパラメータのまま、異なるパッチサイズで入力画像をセグメント化できる。したがって、U-Netmerは異なるパッチサイズで訓練することで「scale-sensitivity」問題を解決できる。 7つの臓器(脳、心臓、乳房、肺、ポリープ、膵臓、前立腺)と4つの画像モダリティ(MRI、CT、超音波、内視鏡)にわたる7つの公開データセットで広範な実験を行い、提案するU-Netmerが医用画像セグメンテーションの精度向上に一般的に適用可能であることを示す。これらの実験結果から、U-Netmerはベースラインや他のモデルと比較して最先端の性能を提供することがわかった。また、異なるスケールでのU-Netmerの出力間の差異はセグメンテーション精度と線形に相関しており、正解データなしでテスト画像を難易度順にランク付けするための信頼度スコアとみなすことができる。

The combination of U-Net-based deep learning models and the Transformer is a new trend for medical image segmentation. U-Net can extract detailed local semantic and texture information, and the Transformer can learn the long-range dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has the "token-flatten" problem (it flattens the local patches into 1D tokens, which loses the interaction among pixels within local patches) and the "scale-sensitivity" problem (it uses a fixed scale to split the input image into local patches). Rather than directly combining U-Net and Transformer, we propose a new global-local combination of U-Net and Transformer, named U-Netmer, to solve the two problems. The proposed U-Netmer splits an input image into local patches. The global-context information among local patches is learnt by the self-attention mechanism in the Transformer, and U-Net segments each local patch instead of flattening it into tokens, which solves the "token-flatten" problem. U-Netmer can segment the input image with different patch sizes using the identical structure and the same parameters. Thus, U-Netmer can be trained with different patch sizes to solve the "scale-sensitivity" problem. We conduct extensive experiments on 7 public datasets covering 7 organs (brain, heart, breast, lung, polyp, pancreas and prostate) and 4 imaging modalities (MRI, CT, ultrasound, and endoscopy) to show that the proposed U-Netmer can be generally applied to improve the accuracy of medical image segmentation. These experimental results show that U-Netmer provides state-of-the-art performance compared to baselines and other models. In addition, the discrepancy among the outputs of U-Netmer with different scales is linearly correlated with the segmentation accuracy, and can therefore be used as a confidence score to rank test images by difficulty without ground truth.

翻訳日:2023-04-05 16:08:01 公開日:2023-04-03

# 皮膚科医の信頼向上へのフィードバックに基づく皮膚病変分類のための説明可能なCNNの微調整

Fine-tuning of explainable CNNs for skin lesion classification based on dermatologists' feedback towards increasing trust ( http://arxiv.org/abs/2304.01399v1 )

ライセンス: Link先を確認

Md Abdul Kadir, Fabrizio Nunnari, Daniel Sonntag

(参考訳) 本稿では,分類そのものと分類の視覚的説明という2つの出力を同時にフィードバックできるCNNファインチューニング手法を提案する。皮膚病変分類タスクにおけるこのフィードバック戦略の効果を示し、CNNが2種類のユーザフィードバックにどう反応するかを測定する。このアプローチを実現するために,学習ループにおけるモデル決定を説明するため,Grad-CAM技術を統合した新しいCNNアーキテクチャを提案する。シミュレーションされたユーザフィードバックを用いて,分類と説明の両方を微調整することで,分類精度を保ちながら視覚的説明が向上し,CNNベースの皮膚病変分類器の信頼性が向上する可能性が示唆された。

In this paper, we propose a CNN fine-tuning method which enables users to give simultaneous feedback on two outputs: the classification itself and the visual explanation for the classification. We present the effect of this feedback strategy in a skin lesion classification task and measure how CNNs react to the two types of user feedback. To implement this approach, we propose a novel CNN architecture that integrates the Grad-CAM technique for explaining the model's decision in the training loop. Using simulated user feedback, we found that fine-tuning our model on both classification and explanation improves visual explanation while preserving classification accuracy, thus potentially increasing the trust of users in using CNN-based skin lesion classifiers.

翻訳日:2023-04-05 16:07:16 公開日:2023-04-03

# 厳密なQ損失とそのガウス・ニュートン近似による非線形MPCからの模倣学習

Imitation Learning from Nonlinear MPC via the Exact Q-Loss and its Gauss-Newton Approximation ( http://arxiv.org/abs/2304.01782v1 )

ライセンス: Link先を確認

Andrea Ghezzi, Jasper Hoffman, Jonathan Frey, Joschka Boedecker, Moritz Diehl

(参考訳) 本稿では, 模倣学習による非線形モデル予測制御方針学習のための新しい損失関数を提案する。模倣学習の標準的なアプローチは、専門家に関する情報を無視し、専門家と学習したコントロールの間の距離に基づいた損失関数を採用する。そこで本研究では,提案する最適制御問題(ocp)の性能目標と制約満足度を直接埋め込んだq関数に基づく損失を提案する。しかし、ニューラルネットワークをQ-lossでトレーニングするには、新しいサンプルごとに関連するOCPを解決する必要がある。計算負荷を軽減するため,OCPのガウス・ニュートン近似に基づいて第2のQ損失を導出し,学習時間を短縮する。我々は,制約のある非線形システムの制御において,模倣学習の標準的アプローチである行動クローンに対する損失を検証する。最終結果は、Q関数に基づく損失は、同等あるいはより良い閉ループコストを達成する一方で、制約違反の量を大幅に減少させることを示した。

This work presents a novel loss function for learning nonlinear Model Predictive Control policies via Imitation Learning. Standard approaches to Imitation Learning neglect information about the expert and generally adopt a loss function based on the distance between expert and learned controls. In this work, we present a loss based on the Q-function directly embedding the performance objectives and constraint satisfaction of the associated Optimal Control Problem (OCP). However, training a Neural Network with the Q-loss requires solving the associated OCP for each new sample. To alleviate the computational burden, we derive a second Q-loss based on the Gauss-Newton approximation of the OCP resulting in a faster training time. We validate our losses against Behavioral Cloning, the standard approach to Imitation Learning, on the control of a nonlinear system with constraints. The final results show that the Q-function-based losses significantly reduce the amount of constraint violations while achieving comparable or better closed-loop costs.

翻訳日:2023-04-05 13:49:13 公開日:2023-04-03

# 音声認識におけるウェークワードスポッティングのためのデュアルアテンションニューラルトランスデューサ

Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition ( http://arxiv.org/abs/2304.01905v1 )

ライセンス: Link先を確認

Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann

(参考訳) 本稿では,ウェイクワード(WW)認識を向上させ,音声認識タスクにおける推論時のレイテンシを改善するアーキテクチャであるdual-attention neural biasingを提案する。このアーキテクチャは,WWスポッティングを利用して,入力オーディオフレームに対してアテンションネットワークのどのブランチを実行するかを選択することで,実行時の計算パスの動的な切り替えを可能にする。提案手法では,浮動小数点演算(FLOPs)で定義される実行時の計算コストを削減しつつ,WWスポッティング精度を効果的に向上する。社内の匿名化データセットを用いて,提案するデュアルアテンションネットワークが,パラメータ数をわずか$1\%$増やすだけで,WWオーディオフレームの計算コストを$90\%$削減できることを実証する。このアーキテクチャは,ベースラインと比較して,WW F1スコアを相対的に$16\%$,一般的なレアワードエラーレートを相対的に$3\%$改善する。

We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks. This architecture enables a dynamic switch for its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW spotting accuracy while saving runtime compute cost as defined by floating point operations (FLOPs). Using an in-house de-identified dataset, we demonstrate that the proposed dual-attention network can reduce the compute cost by $90\%$ for WW audio frames, with only $1\%$ increase in the number of parameters. This architecture improves WW F1 score by $16\%$ relative and improves generic rare word error rate by $3\%$ relative compared to the baselines.

翻訳日:2023-04-05 13:12:03 公開日:2023-04-03

# TransPimLib: メモリ内処理システムにおける効率的な超越関数ライブラリ

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems ( http://arxiv.org/abs/2304.01951v1 )

ライセンス: Link先を確認

Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu

(参考訳) プロセッシング・イン・メモリ(PIM)は、現代のコンピューティングシステムにおけるデータ移動のボトルネックを軽減することを約束する。しかし、現在の実世界のPIMシステムは、メモリの近くまたは内部に処理要素を構築するのが困難でコストがかかるため、ハードウェアが従来のプロセッサ(CPU、GPU)よりも制約が強いという固有の欠点がある。その結果、汎用PIMアーキテクチャは、かなり限られた命令セットしかサポートせず、超越関数やその他の計算が困難な演算(例えば平方根)などの複雑な操作を実行するのに苦労する。これらの操作は、機械学習アプリケーションにおける活性化関数など、現代の一部のワークロードにおいて特に重要である。汎用PIMシステムにおける超越関数(およびその他の計算が困難な関数)のサポートを提供するため、三角関数、双曲線関数、指数関数、対数、平方根などのためのCORDICベースおよびLUTベースの手法を提供するライブラリであるTransPimLibを提案する。 UPMEM PIMアーキテクチャ向けのTransPimLibの実装を開発し、マイクロベンチマークと3つのフルワークロード(Blackscholes, Sigmoid, Softmax)を用いて、TransPimLibの手法を性能と精度の観点から徹底的に評価する。すべてのコードとデータセットは https://github.com/CMU-SAFARI/transpimlib でオープンソースとして公開している。

Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than in conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present TransPimLib, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at https://github.com/CMU-SAFARI/transpimlib.
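
To show the kind of computation a CORDIC-based method performs, here is a minimal Python sketch of rotation-mode CORDIC for sine and cosine. It is not TransPimLib's code or API: the function name `cordic_sin_cos` and the iteration count are assumptions, and floating-point arithmetic is used for readability, whereas an implementation for constrained hardware would typically use fixed-point integers so that each multiplication by 2^-i becomes a bit shift.

```python
import math

ITERATIONS = 32  # assumed precision setting; more iterations give more accurate results

# Precomputed table of elementary rotation angles arctan(2^-i).
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
# starting from x = GAIN cancels the accumulated stretch.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) for theta in [-pi/2, pi/2]."""
    x, y, z = GAIN, 0.0, theta
    for i in range(ITERATIONS):
        direction = 1.0 if z >= 0 else -1.0  # rotate towards a zero residual angle
        shift = 2.0 ** -i
        x, y = x - direction * y * shift, y + direction * x * shift
        z -= direction * ANGLES[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(s, c)
```

The loop uses only additions, sign tests, a small lookup table, and scaling by powers of two, which is why CORDIC suits processors with limited instruction sets. Arguments outside [-pi/2, pi/2] would need a range-reduction step that this sketch omits.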

翻訳日:2023-04-05 13:02:54 公開日:2023-04-03

# 適応測定フィルタ:量子マルコフ連鎖の最適推定のための効率的な戦略

Adaptive measurement filter: efficient strategy for optimal estimation of quantum Markov chains ( http://arxiv.org/abs/2204.08964v5 )

ライセンス: Link先を確認

Alfred Godley and Madalin Guta

(参考訳) 連続時間計測は、量子工学と量子制御における多くのタスクに役立ち、環境を通じて監視される開量子システムの動的パラメータの推定を含む。しかし、そのような測定は出力状態で利用できる情報の最大量を抽出しないので、代替の最適測定戦略を見つけることが大きな課題である。本稿では、離散時間入力出力量子マルコフ連鎖の設定においてこの問題を解決する。本稿では,「計測フィルタ」演算子を更新し,出力単位の連続的な測定基準を決定する反復的な手順からなる一次元動的パラメータの最適推定アルゴリズムを提案する。このスキームの重要な要素は、システムとの相互作用後に出力を後処理する方法としてコヒーレント量子吸収器を使用することである。これは、結合系と吸収体定常状態が基準パラメータ値で純粋であるように適応的に設計される。このスキームは、最適連続時間適応測定のエキサイティングな展望を提供するが、現実的な実用的な実装を見つけるにはより多くの作業が必要である。

Continuous-time measurements are instrumental for a multitude of tasks in quantum engineering and quantum control, including the estimation of dynamical parameters of open quantum systems monitored through the environment. However, such measurements do not extract the maximum amount of information available in the output state, so finding alternative optimal measurement strategies is a major open problem. In this paper we solve this problem in the setting of discrete-time input-output quantum Markov chains. We present an efficient algorithm for optimal estimation of one-dimensional dynamical parameters which consists of an iterative procedure for updating a "measurement filter" operator and determining successive measurement bases for the output units. A key ingredient of the scheme is the use of a coherent quantum absorber as a way to post-process the output after the interaction with the system. This is designed adaptively such that the joint system and absorber stationary state is pure at a reference parameter value. The scheme offers an exciting prospect for optimal continuous-time adaptive measurements, but more work is needed to find realistic practical implementations.

翻訳日:2023-04-05 10:48:36 公開日:2023-04-03

# 最大公約数とプライベート集合交叉に対するセキュアなマルチパーティ量子計算

Secure multiparty quantum computations for greatest common divisor and private set intersection ( http://arxiv.org/abs/2303.17196v3 )

ライセンス: Link先を確認

Muhammad Imran

(参考訳) 本稿では,Liu,Yang,Liによる量子マルチパーティプライベート集合和(PSU)に基づいて,最大公約数(GCD)を計算するためのセキュアなマルチパーティ量子計算(MPQC)を提案する。最初のステップとして,標準的な(確率的)Shorの量子周期発見アルゴリズム(QPA)の代わりに,効率的で厳密な量子周期発見アルゴリズム(EQPA)をサブルーチンとして構築することで,LiuとLiによる最小公倍数(LCM)計算のためのMPQCプロトコルのセキュリティを改善する。標準QPAの代わりにEQPAを使用することで,繰り返しなしでプロトコルの正当性が保証される。 LCMプロトコルの改良により,LCMの計算に基づくプライベート集合和プロトコルも改善される。最後に,PSUプロトコルと同じ考え方を用いて,PSI問題をGCD計算問題に変換することにより,量子マルチパーティプライベート集合交叉(PSI)を構築する。性能解析により,半正直モデルにおける正当性と無条件安全性は,サブルーチンプロトコル(LCMおよびPSUプロトコル)の正当性と安全性から直接保証されることが示される。さらに,提案プロトコルの複雑さは,秘密入力のサイズと参加者数に関して多項式であることを示す。

We present a secure multiparty quantum computation (MPQC) for computing the greatest common divisor (GCD) based on the quantum multiparty private set union (PSU) by Liu, Yang, and Li. As the first step, we improve the security of the MPQC protocol for computing the least common multiple (LCM) by Liu and Li by constructing an efficient exact quantum period-finding algorithm (EQPA) as a subroutine instead of the standard (probabilistic) Shor's quantum period-finding algorithm (QPA). The use of EQPA instead of the standard QPA guarantees the correctness of the protocol without repetitions. The improvement of the LCM protocol also improves the private set union protocol, which is based on computing the LCM. Finally, using the same idea as the PSU protocol, we construct a quantum multiparty private set intersection (PSI) protocol by transforming the PSI problem into the problem of computing the GCD. Performance analysis shows that the correctness and the unconditional security in the semihonest model are guaranteed directly from the correctness and the security of the subroutine protocols (LCM and PSU protocols). Moreover, we show that the complexity of the proposed protocols is polynomial in the size of the secret inputs and the number of parties.
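
The reduction of set intersection to GCD, and of set union to LCM, can be illustrated classically. The Python sketch below assumes one natural encoding in which every element of a public universe is assigned a distinct prime and each party's set becomes the product of its primes; the abstract does not state which encoding the protocols use, and this plain computation provides no privacy at all. All names (`prime_of`, `encode`, `decode`) are hypothetical.

```python
import math
from functools import reduce


def first_primes(n):
    """Return the first n primes by trial division (fine for a tiny universe)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(subset, prime_of):
    """Encode a set as the product of the primes assigned to its elements."""
    return math.prod(prime_of[element] for element in subset)


def decode(value, prime_of):
    """Recover the set of elements whose prime divides the encoded value."""
    return {element for element, p in prime_of.items() if value % p == 0}


def lcm(a, b):
    return a * b // math.gcd(a, b)


universe = ["a", "b", "c", "d", "e"]
prime_of = dict(zip(universe, first_primes(len(universe))))

party_sets = [{"a", "b", "c"}, {"b", "c", "d"}, {"b", "c", "e"}]
codes = [encode(s, prime_of) for s in party_sets]

# A prime divides the GCD exactly when it divides every code (intersection),
# and divides the LCM exactly when it divides at least one code (union).
intersection = decode(reduce(math.gcd, codes), prime_of)
union = decode(reduce(lcm, codes), prime_of)
print(sorted(intersection), sorted(union))
```

In a secure protocol the parties would obtain the GCD or LCM without revealing their individual codes; this sketch only shows why the arithmetic yields the right sets.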

翻訳日:2023-04-05 10:38:04 公開日:2023-04-03

# MSC: StarCraft IIのマクロ管理のためのデータセット

MSC: A Dataset for Macro-Management in StarCraft II ( http://arxiv.org/abs/1710.03131v3 )

ライセンス: Link先を確認

Huikai Wu, Yanqi Zong, Junge Zhang, Kaiqi Huang

(参考訳) マクロ管理はstarcraftの重要な問題であり、長い間研究されてきた。さまざまなデータセットとさまざまなメソッドがここ数年で提案されている。しかしこれらのデータセットには、学術研究や産業研究の促進にいくつかの欠陥がある。 1) 標準的な事前処理、解析、機能抽出の手順も、いくつかのデータセットで事前に定義されたトレーニング、検証、テストセットもない。 2)いくつかのデータセットはマクロ管理の特定のタスクに対してのみ指定される。 3) 一部のデータセットは小さすぎるか、ディープニューラルネットワークのような現代の機械学習アルゴリズムに十分なラベル付きデータを持っていない。したがって、以前のほとんどのメソッドはさまざまな機能でトレーニングされ、同じまたは異なるデータセットの異なるテストセットで評価されるため、直接比較することが困難になる。 StarCraftにおけるマクロ管理の研究を促進するため、SC2LEプラットフォームに基づく新しいデータセットMSCをリリースする。 mscはよく設計された特徴ベクトル、事前定義されたハイレベルアクション、各マッチの最終結果で構成される。また,MSCをトレーニング,検証,テストセットに分割し,評価と比較の便宜を図る。データセットの他に,グローバル状態評価とビルド順序予測のためのベースラインモデルと最初のベースライン結果を提案し,マクロ管理における2つの重要なタスクである。また、StarCraft IIにおけるマクロ管理の研究のために、さまざまな下流タスクやデータセットの分析も記述されている。ホームページ:https://github.com/wuhuikai/MSC

Macro-management is an important problem in StarCraft, which has been studied for a long time. Various datasets together with assorted methods have been proposed in the last few years. But these datasets have some defects that hinder academic and industrial research: 1) Some datasets have neither standard preprocessing, parsing and feature extraction procedures nor predefined training, validation and test sets. 2) Some datasets are only specified for certain tasks in macro-management. 3) Some datasets are either too small or don't have enough labeled data for modern machine learning algorithms such as deep neural networks. So most previous methods are trained with various features and evaluated on different test sets from the same or different datasets, making them difficult to compare directly. To boost the research of macro-management in StarCraft, we release a new dataset MSC based on the platform SC2LE. MSC consists of well-designed feature vectors, pre-defined high-level actions and the final result of each match. We also split MSC into training, validation and test sets for the convenience of evaluation and comparison. Besides the dataset, we propose a baseline model and present initial baseline results for global state evaluation and build order prediction, which are two of the key tasks in macro-management. Various downstream tasks and analyses of the dataset are also described for the sake of research on macro-management in StarCraft II. Homepage: https://github.com/wuhuikai/MSC.

翻訳日:2023-04-05 02:41:44 公開日:2023-04-03

# 軌跡推論の数学的理論に向けて

Towards a mathematical theory of trajectory inference ( http://arxiv.org/abs/2102.09204v2 )

ライセンス: Link先を確認

Hugo Lavenant, Stephen Zhang, Young-Heon Kim, Geoffrey Schiebinger

(参考訳) 確率過程の軌跡を時間的辺縁のサンプルから推測するための理論的枠組みと数値的手法を考案する。この問題は、細胞状態の高次元計測を提供するが、経時的に細胞の軌道を追跡できない単細胞rna配列データの解析において生じる。確率過程のクラスにおいて,各時点における時間的辺縁の限られたサンプルから基底真理軌道を復元することが可能であることが証明され,実際に行うための効率的なアルゴリズムが提供される。開発したGlobal Waddington-OT (gWOT) は, エントロピー規則化された最適輸送を含む全時間点において, 円滑な凸最適化問題である。そこで本研究では,本課題を効率的に解決できることを示すとともに,いくつかの合成データと実データを用いて,良好な再構成を実現する。

We devise a theoretical framework and a numerical method to infer trajectories of a stochastic process from samples of its temporal marginals. This problem arises in the analysis of single cell RNA-sequencing data, which provide high dimensional measurements of cell states but cannot track the trajectories of the cells over time. We prove that for a class of stochastic processes it is possible to recover the ground truth trajectories from limited samples of the temporal marginals at each time-point, and provide an efficient algorithm to do so in practice. The method we develop, Global Waddington-OT (gWOT), boils down to a smooth convex optimization problem posed globally over all time-points involving entropy-regularized optimal transport. We demonstrate that this problem can be solved efficiently in practice and yields good reconstructions, as we show on several synthetic and real datasets.
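The entropy-regularized optimal transport mentioned in the abstract can be illustrated for a single pair of time points. The following is only a minimal sketch using standard Sinkhorn iterations; it is not the gWOT method itself, which poses one global problem over all time points. The function name `sinkhorn_plan`, the regularization value `eps`, and the toy one-dimensional "cell" samples are assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps=0.1, n_iters=500):
    """Entropy-regularized OT plan between histograms a and b (Sinkhorn iterations)."""
    K = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale rows to match marginal a
        v = b / (K.T @ u)                # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]

# Hypothetical toy data: samples of a 1-D process at two time points.
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 0.1, size=(5, 1))   # samples at time t0
x1 = rng.normal(1.0, 0.1, size=(6, 1))   # samples at time t1
cost = (x0 - x1.T) ** 2                  # squared-distance cost, shape (5, 6)
a = np.full(5, 1 / 5)                    # uniform weights on t0 samples
b = np.full(6, 1 / 6)                    # uniform weights on t1 samples
plan = sinkhorn_plan(a, b, cost)         # plan[i, j]: mass coupled from x0[i] to x1[j]
```

Each row of `plan`, once normalized, can be read as a soft assignment of a sample at the earlier time to samples at the later time.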

翻訳日:2023-04-05 02:38:25 公開日:2023-04-03

# 中心対称性を持つ形状不変ポテンシャルの統一スキーム

A Unified Scheme of Shape Invariant Potentials with Central Symmetry ( http://arxiv.org/abs/2001.02068v3 )

ライセンス: Link先を確認

Taha Koohrokhi and Abdolmajid Izadpanah and Mitra Gerayloo

(参考訳) 古典的あるいは量子力学的にせよ、ほとんどの物理系は球対称である。保存量として、粒子が中心力の場を移動するとき、遠心ポテンシャルの数値に角運動量が現れる。本研究は、解ける中心ポテンシャルを1つの超ポテンシャルに統一する統一因子の役割を角運動量が果たすフォーマリズムを導入する。特定の$\ell$と$r$の依存関係に基づいて、超ポテンシャルは3次元高調波発振器(3-DHO)、クーロン(Coulomb)、逆向きの3DHO電位などの形状不変ポテンシャルの集合を生成する。任意の$D$次元への Pöschl-Teller ポテンシャルの一般化も導出され、これを "central Pöschl-Teller" と呼び、その階層について論じる。さらに、超対称性が破られたり破られたりした条件を決定するための超ポテンシャルの性質についても論じる。驚くべきことに、統一されたスキームは、2つの荷電粒子(クーロン)と2つの核子(核)結合系を同じ枠組みで解明することができる。最終的に、この形式主義は重陽子に対する新しい効果的なポテンシャルを特定するために適用される。

Most physical systems, whether classical or quantum mechanical, exhibit spherical symmetry. As a conserved quantity, angular momentum appears in the numerator of the centrifugal potential when a particle moves in the field of a central force. The present work introduces a formalism in which angular momentum acts as a unifying factor that merges solvable central potentials into one superpotential. Based on particular $\ell$ and $r$ dependencies, the superpotential generates a set of shape invariant potentials, such as the 3-dimensional harmonic oscillator (3-DHO), Coulomb, and upside-down 3-DHO potentials. A generalization of the Pöschl-Teller potential to an arbitrary $D$ dimension is also derived, which we call "central Pöschl-Teller", and its hierarchy is discussed. Furthermore, we discuss properties of the superpotential to determine the conditions under which supersymmetry is broken or unbroken. Surprisingly, the unified scheme is also able to elucidate the two charged particles (Coulomb) as well as the two-nucleon (nuclear) bound systems in the same framework. Ultimately, this formalism is applied to specify a new effective potential for the deuteron.

翻訳日:2023-04-05 02:37:06 公開日:2023-04-03

# バッチ非同期確率近似の収束と強化学習への応用

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning ( http://arxiv.org/abs/2109.03445v4 )

ライセンス: Link先を確認

Rajeeva L. Karandikar and M. Vidyasagar

(参考訳) 確率近似(英: stochastic approximation、SA)アルゴリズムは、関数のノイズ測定のみが利用可能であるとき、ベクトル値の試行の零点または定点を見つけるために広く用いられる確率的手法である。これまでの文献では、‘synchronous’ の更新と、現在の推測のすべてのコンポーネントが毎回更新される `synchronous'' の更新とを区別し、1つのコンポーネントだけが更新される。本稿では,現在推定されている解のコンポーネントを瞬時に更新する,‘batch asynchronous stochastic approximation’(basa)と呼ばれる中間状態について検討する。 BASAにより、ユーザーはメモリ要件を時間的複雑さと引き換えることができる。このようなアルゴリズムが研究中の写像の不動点に収束することを示す一般的な方法を開発した。これらの収束証明は、既存の結果よりも弱い仮説を用いる。具体的には、既存の収束証明は、測定ノイズがゼロ平均i.i.d\列またはマルティンゲール差分列である必要がある。本稿では,非ゼロ条件平均の計測ノイズについて,偏りの測定を許可する。また、すべての収束結果は、確率的ステップサイズがよく知られたロビンズ・モンロ条件の確率的類似性を満たすと仮定している。この仮定を,マルコフ過程の既約性に関する純粋決定論的条件に置き換える。 Reinforcement Learning への具体的な応用として、時間差分アルゴリズム $TD(\lambda)$ for value iteration と、最適なアクション値関数を見つけるための $Q$-learning アルゴリズムを解析する。どちらの場合も、既存の文献よりも穏やかな条件下でこれらのアルゴリズムの収束を確立する。

The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a zero or a fixed point of a vector-valued function, when only noisy measurements of the function are available. In the literature to date, one makes a distinction between "synchronous" updating, whereby every component of the current guess is updated at each time, and "asynchronous" updating, whereby only one component is updated. In this paper, we study an intermediate situation that we call "batch asynchronous stochastic approximation" (BASA), in which, at each time instant, some but not all components of the current estimated solution are updated. BASA allows the user to trade off memory requirements against time complexity. We develop a general methodology for proving that such algorithms converge to the fixed point of the map under study. These convergence proofs make use of weaker hypotheses than existing results. Specifically, existing convergence proofs require that the measurement noise is a zero-mean i.i.d. sequence or a martingale difference sequence. In the present paper, we permit biased measurements, that is, measurement noises that have nonzero conditional mean. Also, all convergence results to date assume that the stochastic step sizes satisfy a probabilistic analog of the well-known Robbins-Monro conditions. We replace this assumption by a purely deterministic condition on the irreducibility of the underlying Markov processes. As specific applications to Reinforcement Learning, we analyze the temporal difference algorithm $TD(\lambda)$ for value iteration, and the $Q$-learning algorithm for finding the optimal action-value function. In both cases, we establish the convergence of these algorithms, under milder conditions than in the existing literature.
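The idea of updating "some but not all" components at each step can be sketched as follows. This is an illustrative toy, not the algorithm or the assumptions analyzed in the paper: the linear contraction map, the zero-mean Gaussian noise, the uniform random choice of the batch, and the per-component step size `1 / count` are all assumptions chosen to keep the example small.

```python
import numpy as np

def batch_async_sa(noisy_map, x0, n_steps=20000, batch_size=2, seed=0):
    """Seek a fixed point of a map from noisy evaluations, updating a random batch of components per step."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    counts = np.zeros(x.size)                      # how often each component was updated
    for _ in range(n_steps):
        idx = rng.choice(x.size, size=batch_size, replace=False)
        counts[idx] += 1
        step = 1.0 / counts[idx]                   # per-component decaying step sizes
        y = noisy_map(x, rng)                      # noisy measurement of the map at x
        x[idx] += step * (y[idx] - x[idx])         # move only the selected components
    return x

# Hypothetical contraction f(x) = A x + b (row sums of A are 0.4 < 1).
A = np.full((4, 4), 0.1)
b = np.arange(1.0, 5.0)

def noisy_map(x, rng):
    return A @ x + b + rng.normal(0.0, 0.1, size=x.size)

x_hat = batch_async_sa(noisy_map, np.zeros(4))
x_star = np.linalg.solve(np.eye(4) - A, b)         # exact fixed point, for comparison
```

Setting `batch_size=1` gives a fully asynchronous variant and `batch_size=4` the synchronous one, which is the trade-off between per-step work and number of steps that the abstract alludes to.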

翻訳日:2023-04-05 02:02:32 公開日:2023-04-03

# 心臓血管疾患に対するAIを用いた大動脈血管木切開術

AI-based Aortic Vessel Tree Segmentation for Cardiovascular Diseases Treatment: Status Quo ( http://arxiv.org/abs/2108.02998v2 )

ライセンス: Link先を確認

Yuan Jin, Antonio Pepe, Jianning Li, Christina Gsaxner, Fen-hua Zhao, Kelsey L. Pomykala, Jens Kleesiek, Alejandro F. Frangi, Jan Egger

(参考訳) 大動脈管木は大動脈とその分岐動脈から構成され、全身に血液を供給する上で重要な役割を果たす。動脈瘤や解離などの大動脈疾患は大動脈破裂を引き起こすことがあるが、開腹手術による治療は非常に危険である。したがって、患者は、画像による血管の定期的な検査を必要とする定常的な監視の下で、一般的に薬物治療を受ける。診断とモニタリングのための標準的な画像モダリティはCT(CT)であり、CT血管造影(CT angiography)と呼ばれる造影剤で完成すれば、大動脈とその分岐血管の詳細な画像を提供することができる。最適に、連続ctasからの大動脈血管ツリー全体形状をオーバーレイ比較する。これにより、大動脈の変化を検出するだけでなく、一次病理学や新規に開発された枝も検出できる。この再建には、手作業で行う場合、スライスをスライスする作業が必要であり、1本の大動脈管木で一日を要し、臨床での使用は不可能である。しかし、自動または半自動の容器木分割アルゴリズムは、このタスクを手動実行時間のごく一部で完了し、臨床医の臨床ルーチンと並行して実行することができる。本稿では,大動脈管ツリーの自動的および半自動的なセグメンテーションのための計算手法を体系的に検討する。このレビューは、これらの最先端のアプローチが臨床実践への応用にどの程度近いか、そしてこの研究分野がどれほど活発であるかについて、出版物、データセット、課題の数を考慮して詳細に議論することで締めくくくっている。

The aortic vessel tree is composed of the aorta and its branching arteries, and plays a key role in supplying the whole body with blood. Aortic diseases, like aneurysms or dissections, can lead to an aortic rupture, whose treatment with open surgery is highly risky. Therefore, patients commonly undergo drug treatment under constant monitoring, which requires regular inspections of the vessels through imaging. The standard imaging modality for diagnosis and monitoring is computed tomography (CT), which can provide a detailed picture of the aorta and its branching vessels if completed with a contrast agent, called CT angiography (CTA). Optimally, the whole aortic vessel tree geometry from consecutive CTAs is overlaid and compared. This allows not only detection of changes in the aorta, but also of its branches, caused by the primary pathology or newly developed. When performed manually, this reconstruction requires slice by slice contouring, which could easily take a whole day for a single aortic vessel tree, and is therefore not feasible in clinical practice. Automatic or semi-automatic vessel tree segmentation algorithms, however, can complete this task in a fraction of the manual execution time and run in parallel to the clinical routine of the clinicians. In this paper, we systematically review computing techniques for the automatic and semi-automatic segmentation of the aortic vessel tree. The review concludes with an in-depth discussion on how close these state-of-the-art approaches are to an application in clinical practice and how active this research field is, taking into account the number of publications, datasets and challenges.

翻訳日:2023-04-05 02:02:03 公開日:2023-04-03

# fl-market: 連合学習におけるプライベートモデル取引

FL-Market: Trading Private Models in Federated Learning ( http://arxiv.org/abs/2106.04384v4 )

ライセンス: Link先を確認

Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa, Huizhong Li, Qiang Yan

(参考訳) 十分な量のトレーニングデータを取得することの難しさは、機械学習(ML)ベースのデータ分析の大きなボトルネックである。近年、ML指向データ取得の経済的かつ適度なソリューションとして、MLモデルのコモディティ化が提案されている。しかし、既存のモデルマーケットプレイスでは、ブローカーがデータ所有者のプライベートトレーニングデータにアクセスできると仮定している。本稿では,MLタスクに対する信頼性の高いデータ取得を促進するために,モデル購入者だけでなく,信頼できないブローカーに対してプライバシを保護するローカルプライベートモデルマーケットプレースであるFL-Marketを提案する。 fl-marketは、データオーナがローカル勾配をアップロードしてmlモデルを協調的にトレーニングする、新たなプライバシ保存型mlパラダイムであるfederated learningを使用して、ブローカ側でトレーニングデータを集中的に収集する必要性からmlを分離する(モデル更新のためにグローバル勾配に集約される)。そして、fl-marketは、データ所有者がローカルなディファレンシャルプライバシによって勾配を局所的に摂動させることを可能にし、プライバシーリスクをさらに防ぐ。 FL-Marketを駆動するために,局所勾配の摂動レベルをインテリジェントに決定する深層学習型オークション機構と,摂動勾配を集約する最適集約機構を提案する。当社のオークションとアグリゲーション機構は,モデル購入者の実用性を最適化するグローバルグラデーションの精度を共同で最大化することができる。提案手法の有効性を検証する実験を行った。

The difficulty in acquiring a sufficient amount of training data is a major bottleneck for machine learning (ML) based data analytics. Recently, commoditizing ML models has been proposed as an economical and moderate solution to ML-oriented data acquisition. However, existing model marketplaces assume that the broker can access data owners' private training data, which may not be realistic in practice. In this paper, to promote trustworthy data acquisition for ML tasks, we propose FL-Market, a locally private model marketplace that protects privacy not only against model buyers but also against the untrusted broker. FL-Market decouples ML from the need to centrally gather training data on the broker's side using federated learning, an emerging privacy-preserving ML paradigm in which data owners collaboratively train an ML model by uploading local gradients (to be aggregated into a global gradient for model updating). Then, FL-Market enables data owners to locally perturb their gradients by local differential privacy and thus further prevents privacy risks. To drive FL-Market, we propose a deep learning-empowered auction mechanism for intelligently deciding the local gradients' perturbation levels and an optimal aggregation mechanism for aggregating the perturbed gradients. Our auction and aggregation mechanisms can jointly maximize the global gradient's accuracy, which optimizes model buyers' utility. Our experiments verify the effectiveness of the proposed mechanisms.
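The local perturbation step described above can be sketched with a standard Laplace mechanism. This is an assumption-laden illustration, not FL-Market's auction or its optimal aggregation mechanism: the L1 clipping bound `clip`, the function names, and the inverse-variance weighting in `aggregate` are hypothetical choices made for this example.

```python
import numpy as np

def perturb_gradient(grad, epsilon, clip=1.0, rng=None):
    """Clip a gradient to L1 norm <= clip, then add Laplace noise for epsilon-local DP."""
    if rng is None:
        rng = np.random.default_rng()
    grad = np.asarray(grad, dtype=float)
    l1 = np.abs(grad).sum()
    if l1 > clip:
        grad = grad * (clip / l1)                  # enforce the L1 bound
    scale = 2.0 * clip / epsilon                   # L1 sensitivity of a clipped gradient is 2 * clip
    return grad + rng.laplace(0.0, scale, size=grad.shape)

def aggregate(perturbed, epsilons):
    """Weighted average of perturbed gradients; weights are inversely proportional to noise variance."""
    eps = np.asarray(epsilons, dtype=float)
    weights = eps ** 2 / np.sum(eps ** 2)          # Laplace variance is proportional to 1 / eps^2
    return weights @ np.asarray(perturbed, dtype=float)

# Hypothetical usage: three data owners with different privacy budgets.
rng = np.random.default_rng(0)
local_grads = [np.array([0.3, -0.2]), np.array([0.25, -0.1]), np.array([0.35, -0.3])]
budgets = [0.5, 1.0, 2.0]
noisy = [perturb_gradient(g, e, rng=rng) for g, e in zip(local_grads, budgets)]
global_grad = aggregate(noisy, budgets)
```

A smaller `epsilon` means stronger privacy and more noise, which is why an owner's budget affects how much that owner's gradient should count in the aggregate.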

翻訳日:2023-04-05 02:00:40 公開日:2023-04-03

# DNNにおけるスパース概念の創発的定義と定量化

Defining and Quantifying the Emergence of Sparse Concepts in DNNs ( http://arxiv.org/abs/2111.06206v6 )

ライセンス: Link先を確認

Jie Ren, Mingjie Li, Qirui Chen, Huiqi Deng, Quanshi Zhang

(参考訳) 本稿では,DNNの学習における概念創出現象を説明することを目的とする。具体的には、DNNの推論スコアを、いくつかのインタラクティブな概念の影響に結びつけることができる。これらの概念は、DNNの説明である疎いシンボリック因果グラフの因果パターンとして理解することができる。このような因果グラフを用いてdnnを説明する忠実性は理論的に保証される。なぜなら、因果グラフは指数関数的な数の異なるマスク標本上のdnnの出力をうまく模倣できるからである。さらに、そのような因果グラフは、多くの説明精度を失うことなく、さらに単純化され、And-Orグラフ(AOG)として書き直される。

This paper aims to illustrate the concept-emerging phenomenon in a trained DNN. Specifically, we find that the inference score of a DNN can be disentangled into the effects of a few interactive concepts. These concepts can be understood as causal patterns in a sparse, symbolic causal graph, which explains the DNN. The faithfulness of using such a causal graph to explain the DNN is theoretically guaranteed, because we prove that the causal graph can well mimic the DNN's outputs on an exponential number of different masked samples. Besides, such a causal graph can be further simplified and re-written as an And-Or graph (AOG), without losing much explanation accuracy.

翻訳日:2023-04-05 01:53:22 公開日:2023-04-03

# 公平性に配慮した連合学習に向けて

Towards Fairness-Aware Federated Learning ( http://arxiv.org/abs/2111.01872v3 )

ライセンス: Link先を確認

Yuxin Shi, Han Yu, Cyril Leung

(参考訳) フェデレーション学習(fl)の最近の進歩は、パフォーマンスとデータのプライバシの保証を備えた大規模分散クライアントに、大規模な機械学習の機会をもたらした。しかし、現在のほとんどの作品は、flにおけるセントラルコントローラの関心に焦点をあて、flクライアントの利益を見落としている。これは、学習プロセスに積極的に参加することを妨げ、flエコシステムの持続性を損なうクライアントの不公平な扱いにつながる可能性がある。したがって、flにおける公平性を確保するという話題は、多くの研究の関心を集めている。近年、異なる視点からflの公平性を達成するために、多様な公正性認識fl(fafl)アプローチが提案されている。しかし、この学際分野に対する読者の洞察を得るための総合的な調査は行われていない。本稿ではそのような調査を行うことを目的とする。本研究は,本分野において既存文献で採用されている公正性の概念と基本的かつ単純化された仮定を考察し,クライアント選択,最適化,貢献評価,インセンティブ分布など,FLの主要なステップをカバーするFAFLアプローチの分類法を提案する。さらに,FAFLアプローチの性能を実験的に評価するための主要な指標について考察し,今後のFAFL研究の方向性を示唆する。

Recent advances in Federated Learning (FL) have brought large-scale collaborative machine learning opportunities for massively distributed clients with performance and data privacy guarantees. However, most current works focus on the interest of the central controller in FL, and overlook the interests of the FL clients. This may result in unfair treatment of clients that discourages them from actively participating in the learning process and damages the sustainability of the FL ecosystem. Therefore, the topic of ensuring fairness in FL is attracting a great deal of research interest. In recent years, diverse Fairness-Aware FL (FAFL) approaches have been proposed in an effort to achieve fairness in FL from different perspectives. However, there is no comprehensive survey that helps readers gain insight into this interdisciplinary field. This paper aims to provide such a survey. By examining the fundamental and simplifying assumptions, as well as the notions of fairness adopted by existing literature in this field, we propose a taxonomy of FAFL approaches covering major steps in FL, including client selection, optimization, contribution evaluation and incentive distribution. In addition, we discuss the main metrics for experimentally evaluating the performance of FAFL approaches, and suggest promising future research directions towards FAFL.

翻訳日:2023-04-05 01:52:24 公開日:2023-04-03

# 前向きSDE理論を用いたSchrödinger Bridgeの模擬訓練

Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory ( http://arxiv.org/abs/2110.11291v5 )

ライセンス: Link先を確認

Tianrong Chen, Guan-Horng Liu, Evangelos A. Theodorou

(参考訳) Schrödinger Bridge (SB) はエントロピー規則化された最適輸送問題であり、Score-based Generative Model (SGM) と比較して、その数学的柔軟性のために深部生成モデルに注目が集まっている。しかし、SBの最適化原理が、ログライクな目的の構築にしばしば依存する深層生成モデルの近代的な訓練と関係しているかどうかは不明であり、このことは、生成的応用の原則的な代替としてSBモデルの適合性に関する疑問を提起する。本稿では,SBの最適条件を一組のSDEに変換する確率的最適制御に現れる数学的方法論として,前向き確率微分方程式理論に基づくSBモデルの確率的トレーニングのための新しい計算フレームワークを提案する。重要なことに、これらのSDEはSBの潜在的目的を構築するために使用することができ、驚くべきことに、SGMの目的を特別なケースとして一般化することができる。これにより、現代の生成訓練技術の応用を損なうことなく、sbの最適性を継承する新しい最適化原理が導かれるとともに、mnist、celeba、cifar10上の現実的な画像を生成するのに匹敵する結果が得られることを示した。私たちのコードは https://github.com/ghliu/SB-FBSDE で利用可能です。

Schrödinger Bridge (SB) is an entropy-regularized optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Score-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing log-likelihood objectives. This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded on Forward-Backward Stochastic Differential Equations Theory, a mathematical methodology from stochastic optimal control that transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalize the ones for SGM as special cases. This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10. Our code is available at https://github.com/ghliu/SB-FBSDE.

翻訳日:2023-04-05 01:52:03 公開日:2023-04-03

# 大規模並列ベイズ最適化へのポートフォリオアプローチ

A portfolio approach to massively parallel Bayesian optimization ( http://arxiv.org/abs/2110.09334v2 )

ライセンス: Link先を確認

Mickael Binois (ACUMES, JAD), Nicholson Collier (ANL), Jonathan Ozik (ANL)

(参考訳) 最適化研究の実施時間を短縮する一つの方法は、一度に1回ではなく、並列に設計を評価することである。高価な評価ブラックボックスでは、ベイズ最適化のバッチバージョンが提案されている。彼らはブラックボックスのサロゲートモデルを構築し、インフィル基準によって複数のデザインを同時に選択する。それでも、大規模並列性を実現するコンピューティングリソースの可用性は高まっているが、数桁の並列設計を選択して評価を行う戦略は、より多くの設計を選択する複雑さのために制限される。ブラックボックスがうるさい場合にはさらに重要であり、より多くの評価と繰り返しの実験が必要である。ここでは,大規模なバッチ処理をネイティブに処理し,探索/探索のトレードオフとポートフォリオ割り当てに着目したスケーラブルな戦略を提案する。単目的および多目的最適化タスクにおいて,ノイズ関数に関する関連手法との比較を行った。これらの実験は、類似またはより良い性能を持つ既存手法よりも桁違いの速度向上を示す。

One way to reduce the time of conducting optimization studies is to evaluate designs in parallel rather than just one-at-a-time. For expensive-to-evaluate black-boxes, batch versions of Bayesian optimization have been proposed. They work by building a surrogate model of the black-box to simultaneously select multiple designs via an infill criterion. Still, despite the increased availability of computing resources that enable large-scale parallelism, the strategies that work for selecting a few tens of parallel designs for evaluations become limiting due to the complexity of selecting more designs. It is even more crucial when the black-box is noisy, necessitating more evaluations as well as repeating experiments. Here we propose a scalable strategy that can keep up with massive batching natively, focused on the exploration/exploitation trade-off and a portfolio allocation. We compare the approach with related methods on noisy functions, for mono and multi-objective optimization tasks. These experiments show orders of magnitude speed improvements over existing methods with similar or better performance.

翻訳日:2023-04-05 01:51:42 公開日:2023-04-03

# OpenFed: 包括的でVersatileなオープンソースフェデレーションラーニングフレームワーク

OpenFed: A Comprehensive and Versatile Open-Source Federated Learning Framework ( http://arxiv.org/abs/2109.07852v3 )

ライセンス: Link先を確認

Dengsheng Chen, Vince Tan, Zhilin Lu and Jie Hu

(参考訳) 近年の人工知能技術の発展により、商業的・工業的な場面で応用が成功している。しかし、これらの技術は大量のデータを集中的に集約し、データの機密性やデータ転送コストが禁じられるシナリオに適用性を高める必要がある。フェデレーション学習は、モデルトレーニングの分散化によってこれらの問題を緩和し、データ転送と集約の必要性をなくす。連合学習の採用を進めるためには、いくつかの重要なオープン問題に対処するために、さらなる研究と開発が必要である。本研究では,エンドツーエンドのフェデレート学習のためのオープンソースソフトウェアフレームワークであるOpenFedを提案する。 OpenFedは、既存の痛点を標的に除去することで、フェデレートラーニングの研究者と下流ユーザーの両方の参入障壁を減らす。研究者にとって、OpenFedは、広範なベンチマークスイートに対して、新しいメソッドを簡単に実装し、かなり評価できるフレームワークを提供する。 openfedは,ダウンストリームユーザに対して,さまざまなサブジェクトマッターコンテキスト内でフェデレーション学習をプラグインしてプレイ可能にすることで,フェデレーション学習における深い専門知識の必要性をなくす。

Recent developments in Artificial Intelligence techniques have enabled their successful application across a spectrum of commercial and industrial settings. However, these techniques require large volumes of data to be aggregated in a centralized manner, forestalling their applicability to scenarios wherein the data is sensitive or the cost of data transmission is prohibitive. Federated Learning alleviates these problems by decentralizing model training, thereby removing the need for data transfer and aggregation. To advance the adoption of Federated Learning, more research and development needs to be conducted to address some important open questions. In this work, we propose OpenFed, an open-source software framework for end-to-end Federated Learning. OpenFed reduces the barrier to entry for both researchers and downstream users of Federated Learning by the targeted removal of existing pain points. For researchers, OpenFed provides a framework wherein new methods can be easily implemented and fairly evaluated against an extensive suite of benchmarks. For downstream users, OpenFed allows Federated Learning to be used in a plug-and-play manner within different subject-matter contexts, removing the need for deep expertise in Federated Learning.

翻訳日:2023-04-05 01:51:07 公開日:2023-04-03

# 最適化における非同期イテレーション:新しいシーケンス結果とシャーパアルゴリズム保証

Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees ( http://arxiv.org/abs/2109.04522v2 )

ライセンス: Link先を確認

Hamid Reza Feyzmahdavian and Mikael Johansson

(参考訳) 本稿では並列および分散最適化アルゴリズムの解析に現れる非同期反復に対する新しい収束結果を提案する。結果は簡単に適用でき、非同期度が反復の収束率にどのように影響するかを明確に見積もることができる。その結果,既存の非同期最適化手法の収束証明の短縮,合理化,強化が可能となり,これまで完全に理論的理解を欠いていた一般的なアルゴリズムに対する収束保証を確立することができた。 Specifically, we use our results to derive better iteration complexity bounds for proximal incremental aggregated gradient methods, to obtain tighter guarantees depending on the average rather than maximum delay for the asynchronous stochastic gradient descent method, to provide less conservative analyses of the speedup conditions for asynchronous block-coordinate implementations of Krasnoselskii-Mann iterations, and to quantify the convergence rates for totally asynchronous iterations under various assumptions on communication delays and update rates.

We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates. Our results shorten, streamline and strengthen existing convergence proofs for several asynchronous optimization methods and allow us to establish convergence guarantees for popular algorithms that were thus far lacking a complete theoretical understanding. Specifically, we use our results to derive better iteration complexity bounds for proximal incremental aggregated gradient methods, to obtain tighter guarantees depending on the average rather than maximum delay for the asynchronous stochastic gradient descent method, to provide less conservative analyses of the speedup conditions for asynchronous block-coordinate implementations of Krasnoselskii-Mann iterations, and to quantify the convergence rates for totally asynchronous iterations under various assumptions on communication delays and update rates.

翻訳日:2023-04-05 01:50:29 公開日:2023-04-03

# 単語の「エゴネットワーク」における構造的不変性と意味的指紋

Structural invariants and semantic fingerprints in the "ego network" of words ( http://arxiv.org/abs/2203.00588v2 )

ライセンス: Link先を確認

Kilian Ollivier and Chiara Boldrini and Andrea Passarella and Marco Conti

(参考訳) 人類学的に確立された認知モデルは、社会的相互作用の「バンド幅」を制限する認知的制約のため、人間は通常の構造に従って社会的関係を組織することを示した。本研究では,言語生産など他の認知過程に類似した規則性が存在することを仮定する。この主張を調査するために、Twitterユーザ(正規ユーザとプロのライター)の不均一なグループのつぶやきを含むデータセットを分析した。確立された社会的認知の制約を明らかにするために用いられる方法論に類似した手法を利用することで、構造的および意味的両方のレベルで規則性を見出す。前者では、同心的な階層構造(言葉のエゴネットワーク、社会関係のエゴネットワークと類似)が、個人が使用する単語をどう整理するかをうまく捉えている。この構造内の層の大きさは、外向きに移動すると定期的に増加し(前回に比べて約2〜3倍)、2つの垂直な外部層は、ユーザの総層数に関係なく、使用語の約60%と30%を一貫して占める。意味分析のために、各egoネットワークの各リングは、そのリング内の単語に関連するトピックをキャプチャするセマンティックプロファイルによって記述される。環 #1 がモデルに特別な役割を果たすことが分かる。意味的に最も異なっており、環の中でも最も多様である。また、最内側のリングにおいて重要なトピックは、他のリングとエゴネットワーク全体において、それぞれに支配的な特徴を持つことも示している。この点において、環 #1 は単語の ego ネットワークの意味的指紋と見なすことができる。

Well-established cognitive models coming from anthropology have shown that, due to the cognitive constraints that limit our "bandwidth" for social interactions, humans organize their social relations according to a regular structure. In this work, we postulate that similar regularities can be found in other cognitive processes, such as those involving language production. In order to investigate this claim, we analyse a dataset containing tweets of a heterogeneous group of Twitter users (regular users and professional writers). Leveraging a methodology similar to the one used to uncover the well-established social cognitive constraints, we find regularities at both the structural and semantic level. At the former, we find that a concentric layered structure (which we call ego network of words, in analogy to the ego network of social relationships) very well captures how individuals organise the words they use. The size of the layers in this structure regularly grows (approximately 2-3 times with respect to the previous one) when moving outwards, and the two penultimate external layers consistently account for approximately 60% and 30% of the used words, irrespective of the user's total number of layers. For the semantic analysis, each ring of each ego network is described by a semantic profile, which captures the topics associated with the words in the ring. We find that ring #1 has a special role in the model. It is semantically the most dissimilar and the most diverse among the rings. We also show that the topics that are important in the innermost ring also have the characteristic of being predominant in each of the other rings, as well as in the entire ego network. In this respect, ring #1 can be seen as the semantic fingerprint of the ego network of words.

翻訳日:2023-04-05 01:44:12 公開日:2023-04-03

# ディープリニアネットワークの厳密解

Exact Solutions of a Deep Linear Network ( http://arxiv.org/abs/2202.04777v6 )

ライセンス: Link先を確認

Liu Ziyin, Botao Li, Xiangming Meng

(参考訳) この研究は、ニューラルネットワークの風景を理解するための基礎モデルである、重崩壊と確率ニューロンを持つディープ線形ネットワークの大域的ミニマの解析的表現を発見する。その結果、ゼロはディープニューラルネットワークアーキテクチャの特別なポイントであることがわかった。重みの減衰はモデルアーキテクチャと強く相互作用し、わずか1ドルの隠れ層しか持たないネットワークと質的に異なる1ドル以上の隠れ層を持つネットワークにおいて、ゼロで悪いミニマを生成できることを示します。その結果,一般的なディープラーニング初期化手法では,ニューラルネットワークの最適化が容易でないことがわかった。

This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that zero is a special point in deep neural network architecture. We show that weight decay strongly interacts with the model architecture and can create bad minima at zero in a network with more than $1$ hidden layer, qualitatively different from a network with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.
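The claim that weight decay makes zero behave differently once there is more than one hidden layer can be checked on a scalar toy network. This sketch is a simplification made for illustration: it has one weight per layer, a single target value of 1, and no stochastic neurons, so it is not the model analyzed in the paper. The function names and the value `gamma = 0.1` are assumptions.

```python
import numpy as np

def loss(w, gamma):
    """Scalar deep linear net fitting the target 1 with weight decay: (prod(w) - 1)^2 + gamma * sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return (np.prod(w) - 1.0) ** 2 + gamma * np.sum(w ** 2)

def hessian_at_zero(n_weights, gamma, h=1e-4):
    """Central finite-difference Hessian of the loss at w = 0."""
    H = np.zeros((n_weights, n_weights))
    E = np.eye(n_weights) * h
    for i in range(n_weights):
        for j in range(n_weights):
            H[i, j] = (loss(E[i] + E[j], gamma) - loss(E[i] - E[j], gamma)
                       - loss(-E[i] + E[j], gamma) + loss(-E[i] - E[j], gamma)) / (4 * h * h)
    return H

gamma = 0.1
# Two weights = one hidden layer; three weights = two hidden layers.
eig_one_hidden = np.linalg.eigvalsh(hessian_at_zero(2, gamma))
eig_two_hidden = np.linalg.eigvalsh(hessian_at_zero(3, gamma))
print(eig_one_hidden)   # one negative eigenvalue: zero is a saddle point
print(eig_two_hidden)   # all positive: zero is a local minimum
```

In this toy, the Hessian at zero for two weights is `[[2*gamma, -2], [-2, 2*gamma]]`, which has a negative eigenvalue whenever `gamma < 1`. With three weights the data term contributes nothing at second order, leaving `2 * gamma * I`, so zero is a local minimum even though `loss([0.9, 0.9, 0.9], gamma)` is lower than `loss([0, 0, 0], gamma)`.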

翻訳日:2023-04-05 01:43:46 公開日:2023-04-03

# コンパクト性スコア:教師なし特徴選択のための高速フィルタ法

Compactness Score: A Fast Filter Method for Unsupervised Feature Selection ( http://arxiv.org/abs/2201.13194v3 )

ライセンス: Link先を確認

Peican Zhu, Xin Hou, Keke Tang, Zhen Wang, Feiping Nie

(参考訳) 情報時代の繁栄とともに、大量のデータが日々生成される。これらのデータの大規模かつ高次元的な特性のため、実用的なアプリケーションにおいてより良い意思決定をすることがしばしば困難である。そのため,効率的なビッグデータ分析手法が必要である。特徴工学においては、特徴選択は、候補から優れた特徴を選択することが期待される重要な研究内容であると考えられる。次元の縮小、モデル効果の改善、モデル性能の向上など、機能選択によって異なる機能を実現することができる。多くの分類タスクにおいて、研究者は、同じクラスに属している場合、データが互いに近接しているように見えるので、局所的コンパクト性は特徴を評価する上で非常に重要であることを発見した。本稿では,CSUFS (Compactness Score) と呼ばれる高速な教師なし特徴選択手法を提案する。効率と精度を示すために、広範囲な実験を行い、いくつかのデータセットが選択される。その後,クラスタリングタスクに対処し,提案手法の有効性と優位性を明らかにする。ここで、パフォーマンスはいくつかのよく知られた評価指標で示され、効率は対応する実行時間によって反映される。シミュレーション結果から明らかになったように,提案アルゴリズムは既存のアルゴリズムよりも正確かつ効率的であると考えられる。

Along with the flourishing of the information age, massive amounts of data are generated day by day. Due to the large-scale and high-dimensional characteristics of these data, it is often difficult to achieve better decision-making in practical applications. Therefore, an efficient big data analytics method is urgently needed. In feature engineering, feature selection is an important research topic that aims to select "excellent" features from the candidates. Feature selection can serve different purposes, such as dimensionality reduction, model effect improvement, and model performance improvement. In many classification tasks, researchers have found that data points tend to be close to each other if they are from the same class; thus, local compactness is of great importance for the evaluation of a feature. In this manuscript, we propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features. To demonstrate its efficiency and accuracy, extensive experiments are performed on several data sets. The effectiveness and superiority of our method are then shown on clustering tasks. Here, the performance is indicated by several well-known evaluation metrics, while the efficiency is reflected by the corresponding running time. As revealed by the simulation results, our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.

翻訳日:2023-04-05 01:43:23 公開日:2023-04-03

# 集約関数を用いた輸入則を満たすmiso階層型推論エンジン

MISO hierarchical inference engine satisfying the law of importation with aggregation functions ( http://arxiv.org/abs/2112.12808v4 )

ライセンス: Link先を確認

Dechao Li and Qiannan Guo

(参考訳) ファジィ推論エンジンはファジィ系の最も重要な構成要素の一つであり、ファジィ論理推論法を用いて入力空間上のファジィ集合とファジィ規則基底から有意義な出力を得ることができる。本稿では,多入出力ファジィシステムにおけるファジィ推論エンジンの計算効率を高めるために,集約関数(LIA)による輸入法則を満たすファジィ含意に基づく3つのMISOファジィ階層推論エンジンを主に検討することを目的とする。まず、よく知られたファジィ含意の集合関数を、それらが満足する(LIA)ように見つけ出す。そして、所定の集約関数に対して、この集約関数に満足するファジィ含意(LIA)を特徴付ける。最後に,上記の理論的展開を応用したmisoファジィシステムにおいて,ファジィ階層推論エンジンを3つ構成する。

The fuzzy inference engine, as one of the most important components of fuzzy systems, can obtain meaningful outputs from fuzzy sets on the input space and a fuzzy rule base using fuzzy logic inference methods. In order to enhance the computational efficiency of the fuzzy inference engine in multi-input-single-output (MISO) fuzzy systems, this paper aims mainly to investigate three MISO fuzzy hierarchical inference engines based on fuzzy implications satisfying the law of importation with aggregation functions (LIA). We first find some aggregation functions for well-known fuzzy implications such that they satisfy (LIA). For a given aggregation function, the fuzzy implication which satisfies (LIA) with this aggregation function is then characterized. Finally, we construct three fuzzy hierarchical inference engines in MISO fuzzy systems by applying the aforementioned theoretical developments.
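The law of importation referred to above is usually written as I(A(x, y), z) = I(x, I(y, z)) for a fuzzy implication I and an aggregation function A. The following sketch checks this identity numerically on a grid. It is only an illustration: the helper `satisfies_lia`, the grid resolution, and the choice of the Reichenbach implication with the product and minimum as aggregation functions are assumptions for this example, not the constructions of the paper.

```python
import numpy as np

def reichenbach(x, y):
    """Reichenbach fuzzy implication I(x, y) = 1 - x + x * y."""
    return 1.0 - x + x * y

def satisfies_lia(implication, aggregation, n_grid=21, tol=1e-9):
    """Check I(A(x, y), z) == I(x, I(y, z)) on a regular grid over [0, 1]^3."""
    g = np.linspace(0.0, 1.0, n_grid)
    x, y, z = np.meshgrid(g, g, g, indexing="ij")
    lhs = implication(aggregation(x, y), z)
    rhs = implication(x, implication(y, z))
    return bool(np.max(np.abs(lhs - rhs)) <= tol)

print(satisfies_lia(reichenbach, np.multiply))  # True: both sides equal 1 - x*y + x*y*z
print(satisfies_lia(reichenbach, np.minimum))   # False: e.g. x = y = 0.5, z = 0 gives 0.5 vs 0.75
```

When the identity holds, a rule with the aggregated antecedent A(x, y) can be evaluated as two nested single-input implications, which is what makes a hierarchical arrangement of the inference engine possible.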

翻訳日:2023-04-05 01:42:40 公開日:2023-04-03

# banmo: カジュアルなビデオから3dニューラルモデルを作る

BANMo: Building Animatable 3D Neural Models from Many Casual Videos ( http://arxiv.org/abs/2112.12761v3 )

ライセンス: Link先を確認

Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo

(参考訳) 関節型3d形状再構成の作業は、しばしば特殊なセンサー(例えば、同期マルチカメラシステム)や、事前構築された3d変形可能なモデル(例えば、smalやsmpl)に依存する。このようなメソッドは、野生のさまざまなオブジェクトセットにスケールできない。本稿では,特殊なセンサや事前定義されたテンプレート形状を必要としないBANMoを提案する。 BANMoは、多くのモノクロカジュアルビデオから高忠実な3Dモデル(形状とアニマタブルなスキンウェイトを含む)を、異なるレンダリングフレームワークで構築する。多くのビデオを使用することで、カメラのビューやオブジェクトの調音をより広範にカバーできる一方で、背景や照明条件の異なるシーン間での対応を確立する上での重要な課題がもたらされる。我々は,(1)関節骨とブレンドスキンを用いた古典的変形可能な形状モデル,(2)勾配に基づく最適化に寄与する体積神経放射場(NeRF),(3)ピクセルと関節モデルとの対応を生成する正準埋め込みの3つの学派を融合させることを考察した。ニューラルブレンドスキンモデルを導入し, 可微分変形と可逆変形を可能にした。標準埋め込みと組み合わせることで、サイクル整合性で自己教師できるビデオ間の密接な対応を確立することができる。リアルと合成のデータセットでは、BANMoは人間や動物の以前の作品よりも忠実な3D再構成を示しており、新しい視点やポーズからリアルな画像をレンダリングすることができる。プロジェクトWebページ: banmo-www.github.io

Prior work for articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems), or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods are not able to scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articulated 3D models (including shape and animatable skinning weights) from many monocular casual videos in a differentiable rendering framework. While the use of many videos provides more coverage of camera views and object articulations, they introduce significant challenges in establishing correspondence across scenes with different backgrounds, illumination conditions, etc. Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model. We introduce neural blend skinning models that allow for differentiable and invertible articulated deformations. When combined with canonical embeddings, such models allow us to establish dense correspondences across videos that can be self-supervised with cycle consistency. On real and synthetic datasets, BANMo shows higher-fidelity 3D reconstructions than prior works for humans and animals, with the ability to render realistic images from novel viewpoints and poses. Project webpage: banmo-www.github.io.

翻訳日:2023-04-05 01:42:24 公開日:2023-04-03

# 因果モデルを用いた機械学習同定バイオマーカーの一般化:免疫受容体診断法の検討

Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics ( http://arxiv.org/abs/2204.09291v2 )

ライセンス: Link先を確認

Milena Pavlović, Ghadi S. Al Hajj, Chakravarthi Kanduri, Johan Pensar, Mollie Wood, Ludvig M. Sollid, Victor Greiff, Geir Kjetil Sandve

(参考訳) 機械学習は、高次元の分子データから診断および予後のバイオマーカーを発見するためにますます使われている。しかしながら、実験設計に関連するさまざまな要因が、一般化可能で臨床応用可能な診断を学習する能力に影響を与える可能性がある。ここでは,因果的視点がこれらの課題の同定を改善し,機械学習に基づく診断の堅牢性および一般化との関係を定式化すると主張する。議論を具体的にするため,最近確立された高次元バイオマーカーであるadaptive immune receptor repertoires (AIRRs) に注目する。シミュレーションにより、AIRRドメインの主要な生物学的および実験的要因が、学習されるバイオマーカーにどのように影響しうるかを示す。結論として, 因果モデリングは, 変数間の安定な関係を同定し, 集団間で変化する関係や変数の調整を導くことにより, 機械学習に基づくバイオマーカーのロバスト性を向上させると主張する。

Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here, we argue that a causal perspective improves the identification of these challenges and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To make for a concrete discussion, we focus on a specific, recently established high-dimensional biomarker - adaptive immune receptor repertoires (AIRRs). Through simulations, we illustrate how major biological and experimental factors of the AIRR domain may influence the learned biomarkers. In conclusion, we argue that causal modeling improves machine learning-based biomarker robustness by identifying stable relations between variables and by guiding the adjustment of the relations and variables that vary between populations.

翻訳日:2023-04-05 01:35:16 公開日:2023-04-03

# ベイジアンイメージングのための条件付きインジェクティブフロー

Conditional Injective Flows for Bayesian Imaging ( http://arxiv.org/abs/2204.07664v3 )

ライセンス: Link先を確認

AmirEhsan Khorashadizadeh, Konik Kothari, Leonardo Salsi, Ali Aghababaei Harandi, Maarten de Hoop, Ivan Dokmanić

(参考訳) 計算イメージングのためのほとんどのディープラーニングモデルは、単一の再構成画像を回帰する。しかし実際には、不良設定性、非線形性、モデルミスマッチ、ノイズが重なり、そのような点推定は誤解を招くか不十分なものになりがちである。ベイズアプローチは、画像と(ノイズを含む)計測を同時分布する確率ベクトルとしてモデル化し、未知量の事後分布を近似することを目的とする。条件付き正規化フローに基づく最近の変分推論手法は従来のMCMC法に代わる有望な手段であるが, 中〜高解像度画像に対する過大なメモリ・計算要求や, 難しい非線形問題での性能不足といった欠点がある。本研究では,イメージング問題に特化して設計された条件付きインジェクティブフローであるC-Trumpetsを提案し、これらの課題を大幅に軽減する。インジェクティビティはメモリフットプリントとトレーニング時間を削減する。さらに、低次元潜在空間と、固定体積変化層やスキップ接続revnet層といったアーキテクチャ上の工夫により、C-Trumpetsは、限定視野CTや非線形逆散乱を含む様々なイメージングおよび画像復元タスクにおいて、より少ない計算・メモリ予算で通常の条件付きフローモデルを上回る。 C-Trumpetsは、MMSEやMAPのような点推定の高速近似と、物理的に意味のある不確実性定量化を可能にする。

Most deep learning models for computational imaging regress a single reconstructed image. In practice, however, ill-posedness, nonlinearity, model mismatch, and noise often conspire to make such point estimates misleading or insufficient. The Bayesian approach models images and (noisy) measurements as jointly distributed random vectors and aims to approximate the posterior distribution of unknowns. Recent variational inference methods based on conditional normalizing flows are a promising alternative to traditional MCMC methods, but they come with drawbacks: excessive memory and compute demands for moderate to high resolution images and underwhelming performance on hard nonlinear problems. In this work, we propose C-Trumpets, conditional injective flows specifically designed for imaging problems, which greatly diminish these challenges. Injectivity reduces memory footprint and training time. With a low-dimensional latent space together with architectural innovations like fixed-volume-change layers and skip-connection revnet layers, C-Trumpets outperform regular conditional flow models on a variety of imaging and image restoration tasks, including limited-view CT and nonlinear inverse scattering, with a lower compute and memory budget. C-Trumpets enable fast approximation of point estimates like MMSE or MAP as well as physically meaningful uncertainty quantification.

翻訳日:2023-04-05 01:34:59 公開日:2023-04-03

# ゼロショット質問生成によるパッセージ検索の改善

Improving Passage Retrieval with Zero-Shot Question Generation ( http://arxiv.org/abs/2204.07496v4 )

ライセンス: Link先を確認

Devendra Singh Sachan and Mike Lewis and Mandar Joshi and Armen Aghajanyan and Wen-tau Yih and Joelle Pineau and Luke Zettlemoyer

(参考訳) オープン質問応答におけるパッセージ検索を改善するための,単純かつ効果的な再ランク付け手法を提案する。再ランカは、ゼロショット質問生成モデルを用いて検索されたパッセージを再スコアリングする。このモデルは、事前学習済み言語モデルを用いて、検索されたパッセージで条件付けた入力質問の確率を計算する。このアプローチは、任意の検索手法(例えば、ニューラルやキーワードベース)の上に適用でき、ドメイン固有・タスク固有の訓練を必要とせず(したがって、データ分布シフトに対してより良く一般化すると期待される)、クエリとパッセージの間に豊富なクロスアテンションを提供する(すなわち、質問中のすべてのトークンを説明しなければならない)。複数のオープンドメイン検索データセットで評価すると,上位20件のパッセージ検索精度において,本再ランカは強力な教師なし検索モデルを絶対値で6%-18%、強力な教師ありモデルを最大12%改善する。さらに,既存のモデルに新たな再ランカを追加するだけで,それ以外の変更なしに,完全なオープンドメイン質問応答において新たな最先端の結果を得た。

We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or keyword-based), does not require any domain- or task-specific training (and therefore is expected to generalize better to data distribution shifts), and provides rich cross-attention between query and passage (i.e. it must explain every token in the question). When evaluated on a number of open-domain retrieval datasets, our re-ranker improves strong unsupervised retrieval models by 6%-18% absolute and strong supervised models by up to 12% in terms of top-20 passage retrieval accuracy. We also obtain new state-of-the-art results on full open-domain question answering by simply adding the new re-ranker to existing models with no further changes.
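
The re-scoring rule described above, ranking each retrieved passage by the probability of the question given that passage, can be sketched as follows. This is only a toy illustration: the function names are hypothetical, and a smoothed unigram model built from the passage text stands in for the pre-trained language model, which in a real system would supply the token log-probabilities.

```python
import math
from collections import Counter


def question_log_likelihood(question, passage, vocab_size=10_000, alpha=0.1):
    """Toy stand-in for log p(question | passage).

    Assumption: a smoothed unigram model over the passage replaces the
    pre-trained language model; vocab_size and alpha are illustrative.
    """
    counts = Counter(passage.lower().split())
    total = sum(counts.values())
    score = 0.0
    for token in question.lower().split():
        # Additive smoothing keeps unseen question tokens at nonzero probability.
        prob = (counts[token] + alpha) / (total + alpha * vocab_size)
        score += math.log(prob)
    return score


def rerank(question, passages):
    """Sort retrieved passages by descending log p(question | passage)."""
    return sorted(
        passages,
        key=lambda p: question_log_likelihood(question, p),
        reverse=True,
    )


passages = [
    "Mount Everest is the highest mountain above sea level.",
    "The Eiffel Tower is located in Paris and was completed in 1889.",
]
ranked = rerank("where is the eiffel tower located", passages)
```

Because every question token contributes to the sum, a passage that fails to account for part of the question is penalized, which mirrors the "must explain every token" remark in the abstract.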

翻訳日:2023-04-05 01:34:33 公開日:2023-04-03

# 不織布の曇り評価

Assessing cloudiness in nonwovens ( http://arxiv.org/abs/2204.06275v2 )

ライセンス: Link先を確認

Michael Godehardt and Ali Moghiseh and Christine Oetjen and Joachim Ohser and Simon Ringger and Katja Schladitz and Ingo Windschiegel

(参考訳) フィルター媒体の均質性は, 単位面積あたりの重量(公称坪量)および局所重量の分布とともに, 材料選択と品質管理において重要である。曇り (cloudiness) または地合い (formation) は、フィルター媒体における均質性からの逸脱を記述するために用いられる概念である。我々は,相対的局所面積重量のパワースペクトルを,選択した周波数範囲で積分することにより曇り指数を求めることを提案する。パワースペクトルは広いスペクトル範囲のエネルギー密度を捉える。さらに、ある条件下では、不織布の構造は、面積重量、局所面積重量の分散、およびパワースペクトルによって完全に特徴づけられる。したがって、パワースペクトルは、曇りのみを反映するパラメータである。ここでは,実用上の応用から生じる問題を扱う。最も顕著なのはスペクトル帯域の選択である。それは特徴的な「雲の大きさ」に確かに依存するが、画像のサイズと横方向分解能によって制限される。本研究では, 相対的局所面積重量のパワースペクトルに基づく曇り指数が理論的に十分な根拠を持ち, 画像データから頑健に測定できることを示す。スペクトル帯域を選択することで、視覚的に知覚される曇り、あるいは製品特性に決定的であると判明した曇りを捉えることができる。そのため、技術標準の基礎として適している。

The homogeneity of filter media is important for material selection and quality control, along with the specific weight (nominal grammage) and the distribution of the local weight. Cloudiness or formation is a concept used to describe deviations from homogeneity in filter media. We suggest deriving the cloudiness index from the power spectrum of the relative local areal weight, integrated over a selected frequency range. The power spectrum captures the energy density in a broad spectral range. Moreover, under certain conditions, the structure of a nonwoven is fully characterized by the areal weight, the variance of the local areal weight, and the power spectrum. Consequently, the power spectrum is the parameter that exclusively reflects the cloudiness. Here, we address questions arising from practical application. The most prominent is the choice of the spectral band. It certainly depends on the characteristic "size of the clouds", but is limited by the size and lateral resolution of the images. We show that the cloudiness index based on the power spectrum of the relative local areal weight is theoretically well founded and can be robustly measured from image data. Choosing the spectral band allows capturing the cloudiness that is either visually perceived or found to be decisive for product properties. The index is thus well suited as the basis for a technical standard.
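
A minimal sketch of a band-integrated power-spectrum index may help make the idea concrete. Everything below is an assumption made for illustration rather than the authors' definition: the image is treated as a grid of local areal weights, the relative weight is the weight divided by its mean, the spectrum is a naive 2D DFT normalized so that its sum equals the variance of the relative weight, and the band limits `f_lo` and `f_hi` are radial frequencies in cycles per pixel.

```python
import cmath
import math


def power_spectrum(image):
    """Naive 2D DFT power spectrum of the relative local areal weight."""
    rows, cols = len(image), len(image[0])
    mean = sum(map(sum, image)) / (rows * cols)
    # Relative weight with the mean removed, so the zero-frequency term vanishes.
    rel = [[value / mean - 1.0 for value in row] for row in image]
    spectrum = {}
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = -2j * math.pi * (u * y / rows + v * x / cols)
                    acc += rel[y][x] * cmath.exp(phase)
            # Normalization chosen so the spectrum sums to the variance of rel.
            spectrum[(u, v)] = abs(acc) ** 2 / (rows * cols) ** 2
    return spectrum


def cloudiness_index(image, f_lo, f_hi):
    """Sum the power spectrum over the radial frequency band [f_lo, f_hi]."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for (u, v), power in power_spectrum(image).items():
        # Map DFT indices to signed frequencies in cycles per pixel.
        fy = (u if u <= rows // 2 else u - rows) / rows
        fx = (v if v <= cols // 2 else v - cols) / cols
        if f_lo <= math.hypot(fx, fy) <= f_hi:
            total += power
    return total


# Synthetic 8x8 "nonwoven" with stripes of period 4 pixels (frequency 0.25).
striped = [
    [1.0 + 0.5 * math.cos(2 * math.pi * x / 4) for x in range(8)]
    for _ in range(8)
]
in_band = cloudiness_index(striped, 0.2, 0.3)       # band contains the stripe frequency
out_of_band = cloudiness_index(striped, 0.05, 0.15)  # band misses it
```

In this synthetic case the in-band value recovers the variance of the stripe pattern while the out-of-band value is essentially zero, which illustrates why the choice of spectral band decides which "cloud sizes" the index responds to. The naive DFT is quartic in the image side length and is only meant for tiny examples.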

翻訳日:2023-04-05 01:33:54 公開日:2023-04-03

# 野生における効率的な舗装損傷検出・認識のための弱教師付きパッチラベル推論ネットワーク

Weakly Supervised Patch Label Inference Networks for Efficient Pavement Distress Detection and Recognition in the Wild ( http://arxiv.org/abs/2203.16782v2 )

ライセンス: Link先を確認

Sheng Huang and Wenhao Tang and Guixin Huang and Luwen Huangfu and Dan Yang

(参考訳) 画像に基づく舗装損傷の自動検出・認識は、舗装の維持管理に不可欠である。しかし,既存のディープラーニングベースの手法は,高い画像解像度や低い損傷面積比といった舗装画像固有の特徴をほとんど考慮しておらず,エンドツーエンドで訓練できない。本稿では,これらのタスクを様々な適用条件下で効率的に処理するための,シンプルで効果的な一連のエンドツーエンドディープラーニング手法Weakly Supervised Patch Label Inference Networks (WSPLIN) を提案する。 WSPLINは、完全教師ありの舗装画像分類問題を、弱教師ありの舗装パッチ分類問題に変換して解く。具体的には、WSPLINはまず異なるスケールの舗装画像を異なる収集戦略でパッチに分割し、次にパッチラベル推論ネットワーク(PLIN)を用いてこれらのパッチのラベルを推論することで、解像度とスケールの情報を十分に活用する。特に,損傷分布に関する事前知識に基づいてパッチラベルのスパース性制約を設計し,包括的決定ネットワーク(CDN)を利用してPLINの訓練を弱教師ありで導く。したがって、PLINが生成するパッチラベルは、損傷の大まかな位置や種類などの解釈可能な中間情報を提供する。大規模な瀝青舗装損傷データセットCQU-BPDDと、Crack500を拡張して新たに構築した舗装損傷検出データセットCrack500-PDDを用いて本手法を評価した。広範な実験結果から,本手法は性能と効率の両方においてベースラインより優れていることが示された。 WSPLINのソースコードは https://github.com/DearCaat/wsplin で公開されている。

Automatic image-based pavement distress detection and recognition are vital for pavement maintenance and management. However, existing deep learning-based methods largely omit the specific characteristics of pavement images, such as high image resolution and low distress area ratio, and are not end-to-end trainable. In this paper, we present a series of simple yet effective end-to-end deep learning approaches named Weakly Supervised Patch Label Inference Networks (WSPLIN) for efficiently addressing these tasks under various application settings. WSPLIN transforms the fully supervised pavement image classification problem into a weakly supervised pavement patch classification problem. Specifically, WSPLIN first divides the pavement image under different scales into patches with different collection strategies and then employs a Patch Label Inference Network (PLIN) to infer the labels of these patches to fully exploit the resolution and scale information. Notably, we design a patch label sparsity constraint based on the prior knowledge of distress distribution and leverage the Comprehensive Decision Network (CDN) to guide the training of PLIN in a weakly supervised way. Therefore, the patch labels produced by PLIN provide interpretable intermediate information, such as the rough location and the type of distress. We evaluate our method on a large-scale bituminous pavement distress dataset named CQU-BPDD and the augmented Crack500 (Crack500-PDD) dataset, which is a newly constructed pavement distress detection dataset augmented from Crack500. Extensive results demonstrate the superiority of our method over baselines in both performance and efficiency. The source code of WSPLIN is released at https://github.com/DearCaat/wsplin.

翻訳日:2023-04-05 01:33:33 公開日:2023-04-03

# 映画物語のあらすじ:ストーリー理解のためのビデオ言語データセット

Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding ( http://arxiv.org/abs/2203.05711v2 )

ライセンス: Link先を確認

Yidan Sun, Qin Chao, Yangfeng Ji and Boyang Li

(参考訳) 最近のAIの進歩にもかかわらず、ストーリー理解は未解決で研究の進んでいない問題である。我々は、人気映画やテレビシリーズの5,193本の要約ビデオを含むビデオ言語ストーリーデータセットSYMON(Synopses of Movie Narratives)を収集、前処理し、公開する。 SYMONは、人間のクリエイターが人間の視聴者のために作った自然なストーリーテリングビデオを収録している。典型的で自然なストーリーデータセットとして、SYMONはマルチモーダルなストーリーイベントの高いカバレッジ、豊富な精神状態の記述、視覚とテキストのモダリティ間の大きな意味的ギャップを特徴とする。我々は,映画要約ビデオにおけるビデオテキスト検索とゼロショットアライメントのベンチマークを構築し,ストーリー理解におけるドメイン内データの重要性を示す。 SYMONにより、マルチモーダルなストーリー理解の進展に向けた基礎を築きたいと考えている。

Despite recent advances in AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives (SYMON), containing 5,193 video summaries of popular movies and TV series. SYMON captures naturalistic story-telling videos made by human creators for human audiences. As a prototypical and naturalistic story dataset, SYMON features high coverage of multimodal story events, abundant mental-state descriptions, and large semantic gaps between the visual and the textual modalities. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos, which showcase the importance of in-domain data in story understanding. With SYMON, we hope to lay the groundwork for progress in multimodal story understanding.

翻訳日:2023-04-05 01:32:47 公開日:2023-04-03

# 分類における良性過剰適合:大きなモデルでラベルノイズに証明可能な形で対抗する

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models ( http://arxiv.org/abs/2206.00501v2 )

ライセンス: Link先を確認

Kaiyue Wen, Jiaye Teng, Jingzhao Zhang

(参考訳) 良性過剰適合の研究は、過剰パラメータ化されたディープラーニングモデルの成功に対する洞察を提供する。本研究では,実世界の分類タスクにおいて過剰適合が本当に良性であるかどうかを検討する。まず、ResNetモデルがCifar10では良性に過剰適合するが、ImageNetでは良性に過剰適合しないという観察から始める。 ImageNet実験で良性過剰適合が失敗する理由を理解するために,パラメータ数がデータポイント数よりそれほど大きくない、より制限的な設定で良性過剰適合を理論的に解析する。この軽度の過剰パラメータ化設定の下で、我々の分析は相変化を特定する。すなわち、従来の重度の過剰パラメータ化設定とは異なり、良性過剰適合はラベルノイズの存在下で失敗しうる。我々の分析は経験的観察を説明し、ResNetによる一連の制御実験によって検証される。我々の研究は、将来の方向性として、未適合(underfitting)領域における暗黙のバイアスを理解することの重要性を強調する。

Studies on benign overfitting provide insights for the success of overparameterized deep learning models. In this work, we examine whether overfitting is truly benign in real-world classification tasks. We start with the observation that a ResNet model overfits benignly on Cifar10 but not benignly on ImageNet. To understand why benign overfitting fails in the ImageNet experiment, we theoretically analyze benign overfitting under a more restrictive setup where the number of parameters is not significantly larger than the number of data points. Under this mild overparameterization setup, our analysis identifies a phase change: unlike in the previous heavy overparameterization settings, benign overfitting can now fail in the presence of label noise. Our analysis explains our empirical observations, and is validated by a set of control experiments with ResNets. Our work highlights the importance of understanding implicit bias in underfitting regimes as a future direction.

翻訳日:2023-04-05 01:25:14 公開日:2023-04-03

# 窒素空孔中心を用いた高磁場での高分解能NMR分光

High-Resolution NMR Spectroscopy at Large Fields with Nitrogen Vacancy Centers ( http://arxiv.org/abs/2205.04150v2 )

ライセンス: Link先を確認

C. Munuera-Javaloy, A. Tobalina, and J. Casanova

(参考訳) 窒素空孔(NV)中心のアンサンブルは、室温でミクロンサイズの試料からNMR信号を検出するセンサーとして使用される。このシナリオでは、高磁場領域が特に興味深い。核の熱偏極が大きくなり、低濃度の試料でも強いセンサー応答が得られる一方で、化学シフトやJカップリングがより観測しやすくなるためである。しかしながら、NVベースのセンサーを高周波の核信号と結合させることが難しいため、この領域はほとんど未開拓のままである。本研究では、関連するエネルギーシフトを誘導核スピン信号の振幅にマッピングし、その信号をその後センサーに転送する手法により、この問題を回避する。この段階には、センサーが関与しない試料核スピンの自由歳差運動の期間が挟み込まれる。したがって、本手法は、最終的には核スピン信号のコヒーレンスによって制限される高いスペクトル分解能をもたらす。

Ensembles of nitrogen-vacancy (NV) centers are used as sensors to detect NMR signals from micron-sized samples at room temperature. In this scenario, the regime of large magnetic fields is especially interesting as it leads to a large nuclear thermal polarisation (and thus to a strong sensor response even in low-concentration samples), while chemical shifts and J-couplings become more accessible. Nevertheless, this regime remains largely unexplored owing to the difficulty of coupling NV-based sensors with high-frequency nuclear signals. In this work, we circumvent this problem with a method that maps the relevant energy shifts onto the amplitude of an induced nuclear spin signal that is subsequently transferred to the sensor. This stage is interspersed with free-precession periods of the sample nuclear spins where the sensor does not participate. Thus, our method leads to high spectral resolutions ultimately limited by the coherence of the nuclear spin signal.

翻訳日:2023-04-05 01:23:57 公開日:2023-04-03

# 暗号通貨ポンプ・ダンプのシーケンスベースターゲットコイン予測

Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump ( http://arxiv.org/abs/2204.12929v2 )

ライセンス: Link先を確認

Sihao Hu, Zhen Zhang, Shengliang Lu, Bingsheng He, Zhao Li

(参考訳) 暗号通貨市場におけるポンプ・アンド・ダンプ・スキーム(P&D)の蔓延に伴い、そのような不正行為を事前に検出し、被害を受けやすい投資家に警告することが不可欠になっている。本稿では,予定されたポンプ時刻の前に,対象取引所に上場されている全てのコインのポンプ確率を予測することに焦点を当て,これを目標コイン予測タスクと呼ぶ。まず、2019年1月から2022年1月までにTelegramで組織された最新の709件のP&Dイベントを包括的に調査する。実証分析の結果,ポンプされたコインがチャネル内均質性とチャネル間異質性を示すなど,P&Dの興味深いパターンが明らかになった。ここでチャネルとは、P&Dイベントの調整によく使われるTelegramのグループの一形態を指す。この観察に着想を得て、チャネルのP&Dイベント履歴を位置注意機構によりシーケンス表現にエンコードして予測精度を高める、SNNと呼ぶ新しいシーケンスベースのニューラルネットワークを開発した。位置注意は、特にシーケンスが長い場合に、有用な情報を抽出しノイズを軽減するのに役立つ。広範な実験により提案手法の有効性と一般化性を検証する。さらに、コードとP&DデータセットをGitHub (https://github.com/Bayi-Hu/Pump-and-Dump-Detection-on-Cryptocurrency) で公開し、データセットを定期的に更新する。

With the proliferation of pump-and-dump schemes (P&Ds) in the cryptocurrency market, it becomes imperative to detect such fraudulent activities in advance to alert potentially susceptible investors. In this paper, we focus on predicting the pump probability of all coins listed in the target exchange before a scheduled pump time, which we refer to as the target coin prediction task. Firstly, we conduct a comprehensive study of the latest 709 P&D events organized in Telegram from Jan. 2019 to Jan. 2022. Our empirical analysis reveals some interesting patterns of P&Ds, such as that pumped coins exhibit intra-channel homogeneity and inter-channel heterogeneity. Here, a channel refers to a form of group in Telegram that is frequently used to coordinate P&D events. This observation inspires us to develop a novel sequence-based neural network, dubbed SNN, which encodes a channel's P&D event history into a sequence representation via the positional attention mechanism to enhance the prediction accuracy. Positional attention helps to extract useful information and alleviates noise, especially when the sequence length is long. Extensive experiments verify the effectiveness and generalizability of the proposed method. Additionally, we release the code and P&D dataset on GitHub: https://github.com/Bayi-Hu/Pump-and-Dump-Detection-on-Cryptocurrency, and regularly update the dataset.

翻訳日:2023-04-05 01:23:09 公開日:2023-04-03

# ノイズのあるスキルラベルから職種を学習する

Learning Job Titles Similarity from Noisy Skill Labels ( http://arxiv.org/abs/2207.00494v3 )

ライセンス: Link先を確認

Rabih Zbib, Lucas Alvarez Lacasa, Federico Retyk, Rus Poves, Juan Aizpuru, Hermenegildo Fabregat, Vaidotas Simkus, and Emilia García-Casademont

(参考訳) 職名間のセマンティックな類似度を測定することは、仕事の自動推薦に不可欠な機能である。このタスクは通常、同等の肩書きペアの形式でトレーニングデータを必要とする教師付き学習技術を使ってアプローチされる。そこで本稿では,ノイズのあるスキルラベルを用いた職名類似性モデルの学習のための教師なし表現学習手法を提案する。テキストのランク付けや仕事の正規化といったタスクに非常に効果的であることを示す。

Measuring semantic similarity between job titles is an essential functionality for automatic job recommendations. This task is usually approached using supervised learning techniques, which require training data in the form of equivalent job title pairs. In this paper, we instead propose an unsupervised representation learning method for training a job title similarity model using noisy skill labels. We show that it is highly effective for tasks such as text ranking and job normalization.

翻訳日:2023-04-05 01:16:13 公開日:2023-04-03

# WNet: 訓練可能な再構成層を有するスパースビューCTのためのデータ駆動型デュアルドメインデノイジングモデル

WNet: A data-driven dual-domain denoising model for sparse-view computed tomography with a trainable reconstruction layer ( http://arxiv.org/abs/2207.00400v2 )

ライセンス: Link先を確認

Theodor Cheslerean-Boghiu, Felix C. Hofmann, Manuel Schultheiß, Franz Pfeiffer, Daniela Pfeiffer, Tobias Lasser

(参考訳) ディープラーニングベースのソリューションは、さまざまなアプリケーションでうまく実装されている。中でも臨床ユースケースへの関心が高まっており、過去数年間に提案された最先端のデータ駆動アルゴリズムの主要な推進力となっている。取得時間を短く、放射線線量を低く保つために測定データ量が少ないスパースビュー断層再構成のようなアプリケーションでは、ストリークアーティファクトの低減が、フルスキャンデータの一部のみを用いて診断に耐える画像を得ることを主な目標としたデータ駆動型デノイジングアルゴリズムの開発を促している。本稿では,スパースビューアーティファクトのデノイジングのための訓練可能な再構成層を含むデータ駆動型デュアルドメインデノイジングモデルWNetを提案する。 2つのエンコーダ・デコーダネットワークがサイノグラム領域と再構成領域の両方で同時にデノイジングを行い、フィルタ補正逆投影(Filtered Backprojection)アルゴリズムを実装する第3の層がその2つの間に挟まれて再構成操作を担う。スパースビュー胸部CTスキャンにおけるネットワークの性能を検討し,従来の固定層に対して訓練可能な再構成層を持つことの利点を強調する。臨床的に関連する2つのデータセットでネットワークを訓練・テストし、得られた結果を3種類のスパースビューCTデノイジングおよび再構成アルゴリズムと比較する。

Deep learning-based solutions are being successfully implemented for a wide variety of applications. Most notably, clinical use cases have gained increased interest and have been the main driver behind some of the cutting-edge data-driven algorithms proposed in recent years. For applications like sparse-view tomographic reconstructions, where the amount of measurement data is small in order to keep acquisition time short and radiation dose low, reduction of the streaking artifacts has prompted the development of data-driven denoising algorithms with the main goal of obtaining diagnostically viable images with only a subset of the full-scan data. We propose WNet, a data-driven dual-domain denoising model which contains a trainable reconstruction layer for sparse-view artifact denoising. Two encoder-decoder networks perform denoising in both the sinogram and reconstruction domains simultaneously, while a third layer implementing the Filtered Backprojection algorithm is sandwiched between the first two and takes care of the reconstruction operation. We investigate the performance of the network on sparse-view chest CT scans, and we highlight the added benefit of having a trainable reconstruction layer over the more conventional fixed ones. We train and test our network on two clinically relevant datasets and we compare the obtained results with three different types of sparse-view CT denoising and reconstruction algorithms.

翻訳日:2023-04-05 01:16:06 公開日:2023-04-03

# AnoShift: 教師なし異常検出のための分布シフトベンチマーク

AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2206.15476v4 )

ライセンス: Link先を確認

Marius Dragoi, Elena Burceanu, Emanuela Haller, Andrei Manolache and Florin Brad

(参考訳) データの分布シフトの分析は、今日の機械学習(ML)において拡大しつつある研究の方向性であり、MLモデルの一般化特性を研究するのに適したシナリオの提供に焦点を当てた新たなベンチマークの登場につながっている。既存のベンチマークは教師あり学習に焦点を当てており、我々の知る限り、教師なし学習向けのものは存在しない。そこで本稿では,ネットワーク侵入検知のためのトラフィックデータセットであるKyoto-2006+上に構築した、時間とともにシフトするデータを用いた教師なし異常検出ベンチマークを導入する。この種のデータは、入力分布がシフトするという前提を満たしている。すなわち、長い期間(10年)をカバーし、時間とともに自然に生じる変化(ユーザの行動パターンの変化やソフトウェアのアップデートなど)を含んでいる。まず、基本的な特徴ごとの分析、t-SNE、および年ごとの全体的な分布距離を測定する最適輸送アプローチを用いて、データの非定常性を明らかにする。次に、データをIID、NEAR、FARのテスト分割に分けるプロトコルであるAnoShiftを提案する。古典的なアプローチからディープラーニングまで,さまざまなモデルで時間の経過に伴う性能低下を検証する。最後に,分布シフト問題を認識して適切に対処することで,独立同分布のデータを仮定する古典的な訓練と比較して性能を改善できることを示す(我々のアプローチでは平均で最大3%)。データセットとコードは https://github.com/bit-ml/AnoShift/ で入手できる。

Analyzing the distribution shift of data is a growing research direction in today's Machine Learning (ML), leading to new benchmarks that focus on providing a suitable scenario for studying the generalization properties of ML models. The existing benchmarks are focused on supervised learning, and to the best of our knowledge, there is none for unsupervised learning. Therefore, we introduce an unsupervised anomaly detection benchmark with data that shifts over time, built over Kyoto-2006+, a traffic dataset for network intrusion detection. This type of data meets the premise of shifting the input distribution: it covers a large time span (10 years), with naturally occurring changes over time (e.g., users modifying their behavior patterns, and software updates). We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. Next, we propose AnoShift, a protocol splitting the data into IID, NEAR, and FAR testing splits. We validate the performance degradation over time with diverse models, ranging from classical approaches to deep learning. Finally, we show that by acknowledging the distribution shift problem and properly addressing it, the performance can be improved compared to the classical training which assumes independent and identically distributed data (on average, by up to 3% for our approach). Dataset and code are available at https://github.com/bit-ml/AnoShift/.
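
A time-based split of the kind described above can be sketched in a few lines. The function names and the year boundaries are hypothetical placeholders chosen for illustration; the abstract does not state which years fall into each split, so the benchmark's own documentation should be consulted for the actual protocol.

```python
def assign_split(year, train_span=(2006, 2010), near_span=(2011, 2013)):
    """Label a record's year as IID, NEAR, or FAR relative to the training period.

    Assumption: both year spans are illustrative defaults, not the
    benchmark's documented boundaries.
    """
    if train_span[0] <= year <= train_span[1]:
        return "IID"
    if near_span[0] <= year <= near_span[1]:
        return "NEAR"
    if year > near_span[1]:
        return "FAR"
    raise ValueError(f"year {year} precedes the training period")


def split_by_year(records):
    """Group (year, features) records into the three testing splits."""
    splits = {"IID": [], "NEAR": [], "FAR": []}
    for year, features in records:
        splits[assign_split(year)].append(features)
    return splits


records = [(2007, "a"), (2012, "b"), (2015, "c"), (2010, "d")]
splits = split_by_year(records)
```

In practice the IID test portion would be a held-out sample from the training years, so that the three splits measure performance at increasing temporal distance from the training data.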

翻訳日:2023-04-05 01:15:38 公開日:2023-04-03

# Dense Passage Retrieverの訓練に必要なのは質問だけ

Questions Are All You Need to Train a Dense Passage Retriever ( http://arxiv.org/abs/2206.10658v4 )

ライセンス: Link先を確認

Devendra Singh Sachan and Mike Lewis and Dani Yogatama and Luke Zettlemoyer and Joelle Pineau and Manzil Zaheer

(参考訳) ラベル付きトレーニングデータを必要としない高密度検索モデルの訓練のための,新しいコーパスレベルの自己符号化手法であるARTを紹介する。高密度検索は、Open QAのようなオープンドメインタスクの中心的な課題であり、最先端の手法は通常、カスタムのハードネガティブマイニングと正例のデノイジングを伴う大規模な教師ありデータセットを必要とする。対照的にARTは、対になっていない入力と出力(例えば、質問と潜在的な回答文書)へのアクセスのみを必要とする。ARTは新しい文書検索自己符号化方式を用いる。すなわち,(1)入力質問を用いて証拠文書の集合を検索し,(2)その文書を用いて元の質問を再構成する確率を計算する。質問再構成に基づく検索の訓練は、文書エンコーダと質問エンコーダの双方の効果的な教師なし学習を可能にし、これらは追加のファインチューニングなしに完全なOpen QAシステムに組み込むことができる。広範な実験により、ARTは事前学習済み言語モデルからの汎用的な初期化のみで、複数のQA検索ベンチマークにおいて最先端の結果を達成し、ラベル付きデータやタスク固有の損失を不要にすることが示された。

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents). It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses.

翻訳日:2023-04-05 01:15:13 公開日:2023-04-03

# スパイクニューラルネットワークのためのシナプス閾値シナジスティック学習手法

A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks ( http://arxiv.org/abs/2206.06129v3 )

ライセンス: Link先を確認

Hongze Sun, Wuque Cai, Baoxin Yang, Yan Cui, Yang Xia, Dezhong Yao, Daqing Guo

(参考訳) スパイキングニューラルネットワーク(SNN)は、さまざまなインテリジェントなシナリオにおいて優れた能力を示している。既存のSNNの訓練方法の多くはシナプス可塑性の概念に基づいているが、実際の脳での学習はニューロンの内在的な非シナプス機構も利用している。生体ニューロンのスパイク閾値は、ミリ秒の時間スケールで豊かなダイナミクスを示す重要な内在的神経特性であり、神経情報処理を促進する基盤メカニズムとして提案されている。本研究では,SNNにおけるシナプス重みとスパイク閾値を同時に学習する新しい相乗的学習手法を開発する。シナプス・閾値相乗学習で訓練されたSNN(STL-SNN)は、縮退した2つの単一学習モデルで訓練されたSNNよりも、様々な静的およびニューロモルフィックなデータセットで大幅に優れた性能を達成する。訓練中、相乗的学習アプローチは神経閾値を最適化し、適切な発火率による安定した信号伝達をネットワークにもたらす。さらなる分析により、STL-SNNはノイズの多いデータに対して頑健であり、深いネットワーク構造において低いエネルギー消費を示すことが示された。加えて、一般化された共同決定フレームワークを導入することにより、STL-SNNの性能をさらに向上できる。以上の結果から, シナプス機構と内在的な非シナプス機構の間の生物学的に妥当な相乗効果は, 高効率なSNN学習法を開発するための有望なアプローチとなりうることが示唆される。

Spiking neural networks (SNNs) have demonstrated excellent capabilities in various intelligent scenarios. Most existing methods for training SNNs are based on the concept of synaptic plasticity; however, learning in the realistic brain also utilizes intrinsic non-synaptic mechanisms of neurons. The spike threshold of biological neurons is a critical intrinsic neuronal feature that exhibits rich dynamics on a millisecond timescale and has been proposed as an underlying mechanism that facilitates neural information processing. In this study, we develop a novel synergistic learning approach that involves simultaneously training synaptic weights and spike thresholds in SNNs. SNNs trained with synapse-threshold synergistic learning (STL-SNNs) achieve significantly better performance on various static and neuromorphic datasets than SNNs trained with two degenerate single-learning models. During training, the synergistic learning approach optimizes neural thresholds, providing the network with stable signal transmission via appropriate firing rates. Further analysis indicates that STL-SNNs are robust to noisy data and exhibit low energy consumption for deep network structures. Additionally, the performance of STL-SNNs can be further improved by introducing a generalized joint decision framework. Overall, our findings indicate that biologically plausible synergies between synaptic and intrinsic non-synaptic mechanisms may provide a promising approach for developing highly efficient SNN learning methods.

翻訳日:2023-04-05 01:14:13 公開日:2023-04-03

# Sagnacに基づく量子エンタングルメントのコヒーレンス解釈

Coherence interpretation of the Sagnac-based quantum entanglement ( http://arxiv.org/abs/2206.05358v4 )

ライセンス: Link先を確認

Byoung S. Ham

(参考訳) ベルの不等式違反は量子エンタングルメントの定量的測定ツールである。量子エンタングルメントは量子情報科学の中心であり、リモートで分離された局所検出器間の非局所相関は量子力学のユニークな性質を示す。過去数十年間、量子相関の基礎物理学と量子技術への潜在的な応用に関する集中的な研究が続けられてきた。そこで,コヒーレンス法による遅延型量子消去器の干渉計において,非局所相関に対する一致検出の役割について検討した。非局所量子特性を理解するために、干渉計の2つの出力光子間の一致測定から局所検出器間の結合パラメータ関係をコヒーレントに誘導する。このコヒーレンスアプローチに基づいて, 解の妥当性について, 量子エンタングルメント生成の逆直観的コヒーレンスバージョンを提案する。

Bell inequality violation is a quantitative measurement tool for quantum entanglement. Quantum entanglement is the heart of quantum information science, in which the resulting nonlocal correlation between remotely separated local detectors shows a unique property of quantum mechanics. Over the last few decades, intensive research has been conducted on the basic physics of quantum correlation as well as its potential applications to quantum technologies. Here, the role of coincidence detection is investigated for the nonlocal correlation in a simple interferometer of the delayed-choice quantum eraser using a coherence approach. To understand the nonlocal quantum feature, a joint-parameter relation between local detectors is coherently induced from coincidence measurements between two output photons of an interferometer. Based on this coherence approach, a counterintuitive coherence version of the quantum entanglement generation is proposed for the validity of the solution.

翻訳日:2023-04-05 01:13:47 公開日:2023-04-03

# MENLI: 自然言語推論によるロバストな評価指標

MENLI: Robust Evaluation Metrics from Natural Language Inference ( http://arxiv.org/abs/2208.07316v2 )

ライセンス: Link先を確認

Yanran Chen and Steffen Eger

(参考訳) 最近提案されたBERTベースのテキスト生成評価指標は、標準的なベンチマークではよく機能するが、情報の正確性に関わるものなどの敵対的攻撃に弱い。これは(部分的には)それらが意味的類似性のモデルであることに起因すると我々は主張する。対照的に、我々は自然言語推論(NLI)に基づく評価指標を開発し、これがより適切なモデリングであると考える。我々は、選好ベースの敵対的攻撃フレームワークを設計し、我々のNLIベースの指標が最近のBERTベースの指標よりも攻撃に対してはるかに頑健であることを示す。標準ベンチマークでは、我々のNLIベースの指標は既存の要約指標を上回るが、最先端(SOTA)の機械翻訳(MT)指標には及ばない。しかし、既存の指標と我々のNLI指標を組み合わせると、より高い敵対的頑健性(15%-30%)と、標準ベンチマークで測定されるより高品質な指標(+5%から30%)の両方が得られる。

Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information correctness. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling. We design a preference-based adversarial attack framework and show that our NLI-based metrics are much more robust to the attacks than the recent BERT-based metrics. On standard benchmarks, our NLI-based metrics outperform existing summarization metrics, but perform below SOTA MT metrics. However, when combining existing metrics with our NLI metrics, we obtain both higher adversarial robustness (15%-30%) and higher quality metrics as measured on standard benchmarks (+5% to 30%).
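
One simple way to combine a similarity-based metric with an NLI-based score is a weighted average of normalized scores. The sketch below assumes exactly that; the abstract does not specify the combination rule, so the min-max normalization, the weight `w_nli`, and the example scores are all illustrative assumptions.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def combine(metric_scores, nli_scores, w_nli=0.5):
    """Weighted average of a similarity metric and an NLI score (assumed rule)."""
    metric_norm = min_max(metric_scores)
    nli_norm = min_max(nli_scores)
    return [
        w_nli * n + (1.0 - w_nli) * m
        for m, n in zip(metric_norm, nli_norm)
    ]


# Hypothetical scores for three candidate outputs. The second candidate is
# lexically close to the reference but contradicts it (low NLI score).
similarity = [0.90, 0.88, 0.40]
entailment = [0.95, 0.05, 0.10]
combined = combine(similarity, entailment)
```

With these made-up numbers the contradictory candidate drops well below the faithful one after combination, even though the similarity metric alone rates them almost equally.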

翻訳日:2023-04-05 01:06:49 公開日:2023-04-03

# 可積分スピン鎖の量子シミュレーションにおける保存電荷

Conserved charges in the quantum simulation of integrable spin chains ( http://arxiv.org/abs/2208.00576v2 )

ライセンス: Link先を確認

Kazunobu Maruyoshi, Takuya Okuda, Juan William Pedersen, Ryo Suzuki, Masahito Yamazaki, Yutaka Yoshida

(参考訳) デジタル量子コンピュータ上で量子多体系の時間発展をシミュレートする際には、量子ノイズと、時間離散化によるトロッター誤差という課題に直面する。可積分スピン鎖におけるトロッター誤差は、離散時間発展が可積分性を保つならば制御できる。本研究では、実量子コンピュータと古典シミュレータ上で、スピン1/2ハイゼンベルクXXXスピン鎖の可積分トロッター化を実装する。量子ノイズがいくつかの保存電荷の時間発展にどのように影響するかを調べ、期待値の減衰を観測する。さらに、将来的に量子デバイスやアルゴリズムのベンチマークに利用できる可能性のある、時間発展の初期時刻での挙動も調べる。また,高次の保存電荷を効率的に生成する方法を提供する。

When simulating the time evolution of quantum many-body systems on a digital quantum computer, one faces the challenges of quantum noise and of the Trotter error due to time discretization. The Trotter error in integrable spin chains can be kept under control if the discrete time evolution preserves integrability. In this work we implement, on a real quantum computer and on classical simulators, the integrable Trotterization of the spin-1/2 Heisenberg XXX spin chain. We study how quantum noise affects the time evolution of several conserved charges, and observe the decay of the expectation values. In addition, we study the early-time behavior of the time evolution, which can potentially be used to benchmark quantum devices and algorithms in the future. We also provide an efficient method to generate the conserved charges at higher orders.

翻訳日:2023-04-05 01:06:03 公開日:2023-04-03

# コヒーレンス、重ね合わせ、Löwdin対称直交化

Coherence, superposition, and Löwdin symmetric orthogonalization ( http://arxiv.org/abs/2209.03746v3 )

ライセンス: Link先を確認

Gökhan Torun

(参考訳) コヒーレンスと重ね合わせの概念は概念的には同じであるが、資源理論の定式化には重要な違いがある。すなわち、基底状態はコヒーレンスの資源理論では直交するが、重ね合わせの資源理論では必ずしも直交するとは限らない。非直交性のため、重ね合わせ状態の操作と特徴づけにはかなりの努力が必要である。ここでは、Löwdin symmetric orthogonalization (LSO) 法が純粋重ね合わせ状態の特徴づけに有用な手段であることを示す。 LSOの主な性質は、元の非直交基底状態の構造と対称性が可能な限り保存されることである。特に、極大にコヒーレントな状態は、LSOの助けを借りて極大重ね合わせ状態になる:言い換えれば、それらは対称直交化の作用の下で同値である。この結果から,LSOが主なツールであるコヒーレンスと重ね合わせの接続が促進される。

The notions of coherence and superposition are conceptually the same; however, an important distinction exists between their resource-theoretic formulations. Namely, while basis states are orthogonal in the resource theory of coherence, they are not necessarily orthogonal in the resource theory of superposition. Owing to the nonorthogonality, the manipulation and characterization of superposition states require significant efforts. Here, we demonstrate that the Löwdin symmetric orthogonalization (LSO) method offers a useful means for characterizing pure superposition states. The principal property of LSO is that the structure and symmetry of the original nonorthogonal basis states are preserved to the greatest extent possible, which prompts us to study the role of LSO in identifying the hierarchical relations of resource states. Notably, we reveal that the maximally coherent states turn into the states with maximal superposition with the help of LSO: in other words, they are equivalent under the action of symmetric orthogonalization. Our results facilitate further connections between coherence and superposition, where LSO is the main tool.
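
The symmetric orthogonalization named in this abstract can be illustrated for the smallest case. The sketch below is not taken from the paper: it assumes two real, normalized, nonorthogonal vectors with overlap s, builds the inverse square root of the overlap matrix S = [[1, s], [s, 1]] in closed form, and uses it to produce an orthonormal pair. The function names `lowdin_2x2` and `orthogonalize` are illustrative only.

```python
import math


def lowdin_2x2(s):
    """Return S^(-1/2) for the overlap matrix S = [[1, s], [s, 1]], with |s| < 1."""
    if not -1.0 < s < 1.0:
        raise ValueError("overlap must satisfy |s| < 1")
    # S has eigenvalues 1 + s and 1 - s with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2),
    # so S^(-1/2) has the same eigenvectors with eigenvalues 1/sqrt(1 + s) and 1/sqrt(1 - s).
    plus = 1.0 / math.sqrt(1.0 + s)
    minus = 1.0 / math.sqrt(1.0 - s)
    a = 0.5 * (plus + minus)
    b = 0.5 * (plus - minus)
    return [[a, b], [b, a]]


def orthogonalize(c1, c2):
    """Symmetrically orthogonalize two real, normalized vectors c1 and c2."""
    s = sum(u * v for u, v in zip(c1, c2))  # overlap <c1|c2>
    x = lowdin_2x2(s)
    # New vectors are e_j = sum_i c_i * X[i][j], so their Gram matrix is X^T S X = identity.
    e1 = [x[0][0] * u + x[1][0] * v for u, v in zip(c1, c2)]
    e2 = [x[0][1] * u + x[1][1] * v for u, v in zip(c1, c2)]
    return e1, e2


if __name__ == "__main__":
    # Two unit vectors at 60 degrees to each other (overlap 0.5).
    e1, e2 = orthogonalize([1.0, 0.0], [0.5, math.sqrt(3.0) / 2.0])
    print(e1, e2)
```

Because the transformation matrix is symmetric with equal diagonal entries, each orthogonalized vector has the same overlap with its original counterpart, which is one way to see the "preserves symmetry" property in this toy case.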

翻訳日:2023-04-05 00:58:35 公開日:2023-04-03

# 確率的プロトコルによる漸近状態変換のエントロピー制限の克服

Overcoming entropic limitations on asymptotic state transformations through probabilistic protocols ( http://arxiv.org/abs/2209.03362v3 )

ライセンス: Link先を確認

Bartosz Regula, Ludovico Lami, Mark M. Wilde

(参考訳) 量子相対エントロピー(quantum relative entropy)は、一般の資源理論的設定における量子状態の漸近的変換可能性を決定する上で重要な役割を果たすことが知られている。量子相対エントロピーは漸近状態変換の速度を特徴づけるには不十分であり、ヒルベルト射影計量の正則化に基づく新しいエントロピー量(英語版)が成立する。このようなシナリオは、一般に与えられた状態のコピー数として取られる量子状態の変換に関連するコストが、プロトコルを実現するのに必要な量子メモリのサイズと同一視されるような設定によって動機付けられる。提案手法では,相対エントロピーによって課されるよりも厳密に高いレートを実現する変換プロトコルを構築することができる。資源蒸留の課題に焦点をあて,確率的蒸留プロトコルの漸近速度の強い逆境界を広く適用し,非エンタングリング操作による絡み込み蒸留など,関連する状況において厳密であることを示す。これは決定論的プロトコルにのみ適用される既知の制限を一般化し拡張する。提案手法は, 確率的ワンショット変換の最近の結果と, 射影的相対エントロピーに対する新しい漸近的平衡特性に基づく。

The quantum relative entropy is known to play a key role in determining the asymptotic convertibility of quantum states in general resource-theoretic settings, often constituting the unique monotone that is relevant in the asymptotic regime. We show that this is no longer the case when one allows stochastic protocols that may only succeed with some probability, in which case the quantum relative entropy is insufficient to characterize the rates of asymptotic state transformations, and a new entropic quantity based on a regularization of the Hilbert projective metric comes into play. Such a scenario is motivated by a setting where the cost associated with transformations of quantum states, typically taken to be the number of copies of a given state, is instead identified with the size of the quantum memory needed to realize the protocol. Our approach allows for constructing transformation protocols that achieve strictly higher rates than those imposed by the relative entropy. Focusing on the task of resource distillation, we give broadly applicable strong converse bounds on the asymptotic rates of probabilistic distillation protocols, and show them to be tight in relevant settings such as entanglement distillation with non-entangling operations. This generalizes and extends previously known limitations that only applied to deterministic protocols. Our methods are based on recent results for probabilistic one-shot transformations as well as a new asymptotic equipartition property for the projective relative entropy.

翻訳日:2023-04-05 00:58:16 公開日:2023-04-03

# 潜在力学から有意義な表現へ

From latent dynamics to meaningful representations ( http://arxiv.org/abs/2209.00905v2 )

ライセンス: Link先を確認

Dedi Wang, Yihang Wang, Luke Evans and Pratyush Tiwary

(参考訳) 表現学習は機械学習と人工知能の台頭の中心であるが、学習した表現を意味のあるものにすることが重要な問題である。このため、典型的なアプローチは、事前確率分布を通じて学習表現を正則化することである。しかし、そのような事前処理は通常使用できないかアドホックである。これに対応するために,動的制約付き表現学習フレームワークを提案する。事前定義された確率を用いる代わりに、動的システムにおける表現学習のより自然な制約である特定のダイナミクスに従うために潜在表現を制限します。我々の信念は、異なる系は異なる限界化された確率分布を持つことができるが、ニュートン方程式やシュロディンガー方程式のような同じ力学に従うという物理学の基本的な観察に由来する。我々は,現実の蛍光DNA映画データセットを含む様々なシステムに対する枠組みを検証する。本アルゴリズムは,非相関,等尺,有意な潜在表現を一意に識別できることを示す。

While representation learning has been central to the rise of machine learning and artificial intelligence, a key problem remains in making the learnt representations meaningful. For this, the typical approach is to regularize the learned representation through prior probability distributions. However, such priors are usually unavailable or ad hoc. To deal with this, we propose a dynamics-constrained representation learning framework. Instead of using predefined probabilities, we restrict the latent representation to follow specific dynamics, which is a more natural constraint for representation learning in dynamical systems. Our belief stems from a fundamental observation in physics that though different systems can have different marginalized probability distributions, they typically obey the same dynamics, such as Newton's and Schrödinger's equations. We validate our framework for different systems including a real-world fluorescent DNA movie dataset. We show that our algorithm can uniquely identify an uncorrelated, isometric and meaningful latent representation.

翻訳日:2023-04-05 00:57:52 公開日:2023-04-03

# フォトニック量子プロセッサのための量子体積

Quantum Volume for Photonic Quantum Processors ( http://arxiv.org/abs/2208.11724v2 )

ライセンス: Link先を確認

Yuxuan Zhang, Daoheng Niu, Alireza Shabani, Hassan Shapourian

(参考訳) 短期量子コンピューティングプロセッサのメトリクスを定義することは、量子ハードウェアの研究と開発に不可欠である。このような定量的特徴は、進捗の報告や異なる量子プラットフォームの比較に有用であるだけでなく、ボトルネックの特定や技術ロードマップの設計にも不可欠である。ランダム化ベンチマークや量子ボリュームのようなほとんどのメトリクスは、もともと回路ベースの量子コンピュータに導入され、フォトニックデバイスのような測定ベースの量子コンピューティング(MBQC)プロセッサにはすぐには適用されなかった。本稿では,MBQCプロセスの物理ノイズと不完全性を等価量子回路の論理誤差にマッピングする枠組みを提示することにより,このギャップを解消する。本稿では,光量子コンピューティングの短期的候補として符号化されたGottesman-Kitaev-Preskill(GKP)に基づく連続可変クラスタ状態について検討し,実効論理ゲート誤差チャネルを導出し,GKPのスクイーズと光子損失率の観点から量子量を算出する。

Defining metrics for near-term quantum computing processors has been an integral part of the quantum hardware research and development efforts. Such quantitative characteristics are not only useful for reporting the progress and comparing different quantum platforms, but also essential for identifying the bottlenecks and designing a technology roadmap. Most metrics such as randomized benchmarking and quantum volume were originally introduced for circuit-based quantum computers and were not immediately applicable to measurement-based quantum computing (MBQC) processors such as photonic devices. In this paper, we close this gap by presenting a framework to map physical noise and imperfections in MBQC processes to logical errors in equivalent quantum circuits, thereby enabling the well-known metrics to characterize MBQC. To showcase our framework, we study a continuous-variable cluster state based on the Gottesman-Kitaev-Preskill (GKP) encoding as a near-term candidate for photonic quantum computing, and derive the effective logical gate error channels and calculate the quantum volume in terms of the GKP squeezing and photon loss rate.

翻訳日:2023-04-05 00:56:09 公開日:2023-04-03

# 機密コンピューティングによる機械学習: 知識の体系化

Machine Learning with Confidential Computing: A Systematization of Knowledge ( http://arxiv.org/abs/2208.10134v2 )

ライセンス: Link先を確認

Fan Mo, Zahra Tarkhani, Hamed Haddadi

(参考訳) 機械学習(ML)におけるプライバシとセキュリティの課題は、MLの広範な開発と、最近の大規模な攻撃面のデモとともに、ますます深刻になっている。成熟したシステム指向のアプローチとして、Confidential Computingは、さまざまなMLシナリオにおけるプライバシとセキュリティの問題を軽減するために、学術と産業の両方で使用されている。本稿では,ML と Confidential Computing の連携について検討する。 i) 機密性の保証と ii) 完全性の保証を提供するConfidential Computing支援ML技術に関する先行研究を体系化し、それらの先進的な特徴と欠点について議論する。重要な課題はさらに特定され、既存の信頼実行環境(TEE)システムのMLユースケースに対する制限を専門的に分析する。最後に、クローズドループ保護のための基盤となるプライバシー定義、効率的なMLのパーティショニングされた実行、ML専用のTEEアシストデザイン、TEE対応ML、ML完全なパイプライン保証などについて論じる。知識の体系化にこれらの潜在的なソリューションを提供することで、計算やシステムコストを導入することなく、より強力なTEE対応MLをプライバシ保証のために実現するために、ブリッジを構築することを目指している。

Privacy and security challenges in Machine Learning (ML) have become increasingly severe, along with ML's pervasive development and the recent demonstration of large attack surfaces. As a mature system-oriented approach, Confidential Computing has been utilized in both academia and industry to mitigate privacy and security issues in various ML scenarios. In this paper, the conjunction between ML and Confidential Computing is investigated. We systematize the prior work on Confidential Computing-assisted ML techniques that provide i) confidentiality guarantees and ii) integrity assurances, and discuss their advanced features and drawbacks. Key challenges are further identified, and we provide dedicated analyses of the limitations in existing Trusted Execution Environment (TEE) systems for ML use cases. Finally, prospective works are discussed, including grounded privacy definitions for closed-loop protection, partitioned executions of efficient ML, dedicated TEE-assisted designs for ML, TEE-aware ML, and ML full pipeline guarantees. By providing these potential solutions in our systematization of knowledge, we aim to build a bridge toward much stronger TEE-enabled ML for privacy guarantees without introducing computation and system costs.

翻訳日:2023-04-05 00:55:48 公開日:2023-04-03

# クロスモーダルトランスフォーマーを用いたダンススタイルトランスファー

Dance Style Transfer with Cross-modal Transformer ( http://arxiv.org/abs/2208.09406v3 )

ライセンス: Link先を確認

Wenjie Yin, Hang Yin, Kim Baraka, Danica Kragic, and Mårten Björkman

(参考訳) そこで本研究では,あるダンススタイルにおける既存のモーションクリップを,ダンスのモーションコンテキストを保ちつつ,別のダンススタイルのモーションクリップに変換する,ダンススタイル転送システムであるcycledanceを提案する。提案手法は,既存のCycleGANアーキテクチャを拡張して音声シーケンスをモデル化し,マルチモーダルトランスフォーマーエンコーダを統合する。シーケンス長に基づくカリキュラム学習を採用し,トレーニングを安定化する。本手法は,移動フレーム間のリッチかつ長期的関係を捉え,移動伝達と合成作業において共通の課題である。さらに,ダンス動作の文脈において,移動強度とコンテンツ保存の指標を新たに導入する。 5年以上のダンス経験を持つ30人を対象に,広範囲にわたるアブレーション研究と人間による研究を行った。その結果, サイクルダンスは, 自然性, 伝達強度, コンテンツ保存において, ベースラインのサイクルガンを著しく上回って, ターゲットスタイルで現実的な動きを生じさせることがわかった。

We present CycleDance, a dance style transfer system to transform an existing motion clip in one dance style to a motion clip in another dance style while attempting to preserve motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for music context. We adopt sequence length-based curriculum learning to stabilize training. Our approach captures rich and long-term intra-relations between motion frames, which is a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study including 30 participants with 5 or more years of dance experience. The results demonstrate that CycleDance generates realistic movements with the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation.

翻訳日:2023-04-05 00:55:28 公開日:2023-04-03

# リーマン型PDE-G-CNNの解析

Analysis of (sub-)Riemannian PDE-G-CNNs ( http://arxiv.org/abs/2210.00935v4 )

ライセンス: Link先を確認

Gijs Bellaard, Daan L. J. Bon, Gautam Pai, Bart M. N. Smets, Remco Duits

(参考訳) グループ同変畳み込みニューラルネットワーク(G-CNN)は幾何学的深層学習に成功している。通常、G-CNNはCNNに対して、ネットワーク内でハードコードされたはずのトレーニング対称性にネットワーク容量を浪費しないという利点がある。最近導入されたPDEベースのG-CNN(PDE-G-CNN)フレームワークはG-CNNを一般化している。 PDE-G-CNNは、それらが同時に持つコアアドバンテージを持つ 1)ネットワークの複雑さを減らす。 2)分類性能の向上、及び 3)幾何学的解釈性を提供する。それらの実装は、主に核との線形および形態的畳み込みからなる。本稿では,前述した近似的形態素核が必ずしも正確な核を正確に近似するとは限らないことを示す。より具体的には、リーマン計量の空間異方性(英語版)に依存するので、準リーマン近似に頼らなければならない。異方性に関係なく動作する新しい近似カーネルを提供することでこの問題を解決する。近似核のより優れた誤差推定を持つ新しい定理を提供し、それらがすべて正確なものと同じ反射対称性を持っていることを証明する。 PDE-G-CNNフレームワークにおける複数の近似カーネルの有効性を2つのデータセットで検証し、新しい近似カーネルによる改善を観察する。我々は、PDE-G-CNNは、G-CNNとCNNの2つのデータセットに比較して、ネットワークの複雑さを著しく低減する。さらに、PDE-G-CNNはG-CNNよりも優れた幾何学的解釈可能性を持つ。

Group equivariant convolutional neural networks (G-CNNs) have been successfully applied in geometric deep learning. Typically, G-CNNs have the advantage over CNNs that they do not waste network capacity on training symmetries that should have been hard-coded in the network. The recently introduced framework of PDE-based G-CNNs (PDE-G-CNNs) generalises G-CNNs. PDE-G-CNNs have the core advantages that they simultaneously 1) reduce network complexity, 2) increase classification performance, and 3) provide geometric interpretability. Their implementations primarily consist of linear and morphological convolutions with kernels. In this paper we show that the previously suggested approximative morphological kernels do not always approximate the exact kernels accurately. More specifically, depending on the spatial anisotropy of the Riemannian metric, we argue that one must resort to sub-Riemannian approximations. We solve this problem by providing a new approximative kernel that works regardless of the anisotropy. We provide new theorems with better error estimates of the approximative kernels, and prove that they all carry the same reflectional symmetries as the exact ones. We test the effectiveness of multiple approximative kernels within the PDE-G-CNN framework on two datasets, and observe an improvement with the new approximative kernels. We report that the PDE-G-CNNs again allow for a considerable reduction of network complexity while having comparable or better performance than G-CNNs and CNNs on the two datasets. Moreover, PDE-G-CNNs have the advantage of better geometric interpretability over G-CNNs, as the morphological kernels are related to association fields from neurogeometry.

翻訳日:2023-04-05 00:48:28 公開日:2023-04-03

# 室内環境におけるポイントゴールナビゲーションのための教師なしビジュアルオドメトリーとアクション統合

Unsupervised Visual Odometry and Action Integration for PointGoal Navigation in Indoor Environment ( http://arxiv.org/abs/2210.00413v2 )

ライセンス: Link先を確認

Yijun Cao, Xianshi Zhang, Fuya Luo, Chuan Lin, and Yongjie Li

(参考訳) 屋内環境におけるポイントゴールナビゲーションは、個人ロボットが特定の地点に向かうための基本的なタスクである。最近の研究は、ノイズのない動作とGPSとコンパスセンサによる完璧な位置決めの仮定の下で、フォトリアリスティックシミュレート環境でほぼ完璧な成功率でこのポイントゴールナビゲーションタスクを解決した。しかし、実際の屋内環境で正確なGPS信号を得るのは難しい。 GPS信号無しでポイントゴールナビゲーション精度を向上させるために,視覚オドメトリ(VO)を用い,教師なしで訓練された新しいアクション統合モジュール(AIM)を提案する。教師なしVOは、2つの隣接するフレームの再投射誤差からエージェントの相対的なポーズを計算し、正確なGPS信号を経路積分に置き換える。 VOによって推定される擬似位置は、エージェントが位置に対する内部認識を更新し、ナビゲーションの成功率を向上させるためのアクション統合の訓練に使用される。トレーニングと推論プロセスは、RGB、深さ、衝突、および自己行動情報のみを使用する。実験の結果,提案システムは良好な結果が得られ,Gibsonデータセット上で部分的に教師付き学習アルゴリズムよりも優れていた。

PointGoal navigation in indoor environments is a fundamental task in which a personal robot navigates to a specified point. Recent studies have solved this PointGoal navigation task with a near-perfect success rate in photo-realistically simulated environments, under the assumptions of noiseless actuation and, most importantly, perfect localization with GPS and compass sensors. However, an accurate GPS signal is difficult to obtain in real indoor environments. To improve PointGoal navigation accuracy without a GPS signal, we use visual odometry (VO) and propose a novel action integration module (AIM) trained in an unsupervised manner. Specifically, unsupervised VO computes the relative pose of the agent from the re-projection error of two adjacent frames, and then replaces the accurate GPS signal with path integration. The pseudo position estimated by VO is used to train the action integration module, which helps the agent update its internal perception of location and improves the navigation success rate. The training and inference processes use only RGB, depth, collision, and self-action information. The experiments show that the proposed system achieves satisfactory results and outperforms partially supervised learning algorithms on the popular Gibson dataset.
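
Path integration, which the abstract uses in place of a GPS signal, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes planar motion, a pose stored as (x, y, heading), and relative motions (dx, dy, dtheta) expressed in the agent's own frame, as a visual odometry module might report them. The names `integrate_pose` and `goal_in_agent_frame` are hypothetical.

```python
import math


def integrate_pose(pose, delta):
    """Compose a global pose (x, y, theta) with a relative motion given in the agent frame."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    # Rotate the relative translation into the global frame, then accumulate it.
    x += math.cos(theta) * dx - math.sin(theta) * dy
    y += math.sin(theta) * dx + math.cos(theta) * dy
    # Accumulate the heading and wrap it to [-pi, pi).
    theta = (theta + dtheta + math.pi) % (2.0 * math.pi) - math.pi
    return (x, y, theta)


def goal_in_agent_frame(pose, goal):
    """Return (distance, bearing) of a fixed global goal as seen from the current pose."""
    x, y, theta = pose
    gx, gy = goal[0] - x, goal[1] - y
    # Rotate the offset by -theta to express it in the agent frame.
    fx = math.cos(theta) * gx + math.sin(theta) * gy
    fy = -math.sin(theta) * gx + math.cos(theta) * gy
    return math.hypot(fx, fy), math.atan2(fy, fx)


if __name__ == "__main__":
    pose = (0.0, 0.0, 0.0)
    goal = (1.0, 1.0)
    # Move forward 1 m, turn left 90 degrees, move forward 1 m.
    for step in [(1.0, 0.0, 0.0), (0.0, 0.0, math.pi / 2.0), (1.0, 0.0, 0.0)]:
        pose = integrate_pose(pose, step)
        print(pose, goal_in_agent_frame(pose, goal))
```

Errors in each relative estimate accumulate over the trajectory, which is one reason the quality of the odometry matters for the final success rate.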

翻訳日:2023-04-05 00:48:02 公開日:2023-04-03

# $10^{-14}$レベルの系統的不確実性を有するテラヘルツ振動分子時計

A terahertz vibrational molecular clock with systematic uncertainty at the $10^{-14}$ level ( http://arxiv.org/abs/2209.10864v4 )

ライセンス: Link先を確認

K. H. Leung, B. Iritani, E. Tiberi, I. Majewska, M. Borkowski, R. Moszynski, T. Zelevinsky

(参考訳) 光学格子中の中性量子吸収体は、精巧な分光分解能を持つ時計を実現するための主要なプラットフォームとして登場した。しかし、これらの時計の研究とその体系的な変化は、これまで原子に限られてきた。ここでは、この構造を二原子分子のアンサンブルに拡張し、純粋な分子振動に基づく正確な格子時計を実験的に実現する。非線形トラップ誘起光シフトのキャラクタリゼーションを含む主要な系統評価を行い,総系統的不確実性は$4.6\times10^{-14}$である。振動分割の絶対周波数は31 825 183 207 592.8(5.1) Hzと測定され、分子の解離エネルギーは記録精度で決定される。この結果は分子分光とTHz周波数標準の重要なマイルストーンであり、分子量子電気力学や新しい相互作用の探索を含む基礎物理学への応用により、他の中性分子種に一般化される可能性がある。

Neutral quantum absorbers in optical lattices have emerged as a leading platform for achieving clocks with exquisite spectroscopic resolution. However, the studies of these clocks and their systematic shifts have so far been limited to atoms. Here, we extend this architecture to an ensemble of diatomic molecules and experimentally realize an accurate lattice clock based on pure molecular vibration. We evaluate the leading systematics, including the characterization of nonlinear trap-induced light shifts, achieving a total systematic uncertainty of $4.6\times10^{-14}$. The absolute frequency of the vibrational splitting is measured to be 31 825 183 207 592.8(5.1) Hz, enabling the dissociation energy of our molecule to be determined with record accuracy. Our results represent an important milestone in molecular spectroscopy and THz-frequency standards, and may be generalized to other neutral molecular species with applications for fundamental physics, including tests of molecular quantum electrodynamics and the search for new interactions.

翻訳日:2023-04-05 00:47:16 公開日:2023-04-03

# Rényiダイバージェンス深層相互学習

Rényi Divergence Deep Mutual Learning ( http://arxiv.org/abs/2209.05732v4 )

ライセンス: Link先を確認

Weipeng Huang, Junjie Tao, Changbo Deng, Ming Fan, Wenqiang Wan, Qi Xiong, Guangyuan Piao

(参考訳) 本稿では、単純で効果的な計算パラダイムであるDeep Mutual Learning (DML)を再考する。我々は、KLダイバージェンスの代わりに、より柔軟で調整可能なRényiダイバージェンスを用いて、バニラDMLを改善することを提案する。この修正により、バニラDMLよりもパフォーマンスを継続的に改善できる。提案したパラダイムの収束特性を理論的に解析し,非凸最適化タスクの最悪の場合において,一定学習率の確率的勾配降下法が $\mathcal{O}(1)$-bias で収束することを示した。つまり、学習は近くの局所最適解に到達するが、境界の範囲内を探索し続けることで、過学習を軽減できる可能性がある。最後に,DMLとRényiダイバージェンスを組み合わせることで,一般化したモデルをさらに改善できることを示す。

This paper revisits Deep Mutual Learning (DML), a simple yet effective computing paradigm. We propose replacing the KL divergence with the Rényi divergence, which is more flexible and tunable, to improve vanilla DML. This modification consistently improves performance over vanilla DML with limited additional complexity. The convergence properties of the proposed paradigm are analyzed theoretically, and Stochastic Gradient Descent with a constant learning rate is shown to converge with $\mathcal{O}(1)$-bias in the worst-case scenario for nonconvex optimization tasks. That is, learning will reach nearby local optima but continue searching within a bounded scope, which may help mitigate overfitting. Finally, our extensive empirical results demonstrate the advantage of combining DML and Rényi divergence, which further improves generalized models.
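
To make the "tunable" aspect concrete, the sketch below computes the Rényi divergence of order alpha between two discrete distributions, using the standard definition D_alpha(P||Q) = log(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1), and falls back to the KL divergence at alpha = 1, which is its limit. It assumes q is strictly positive wherever p is; how the paper orders the arguments or weights this term in the mutual learning loss is not stated in the abstract, so none of that is modeled here.

```python
import math


def kl_divergence(p, q):
    """KL divergence between discrete distributions p and q (q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha > 0 between discrete distributions p and q."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-12:
        # The order-1 case is defined by continuity and equals the KL divergence.
        return kl_divergence(p, q)
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0.0)
    return math.log(total) / (alpha - 1.0)


if __name__ == "__main__":
    # Two hypothetical class-probability vectors, e.g. predictions from two peer networks.
    p = [0.7, 0.3]
    q = [0.5, 0.5]
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, renyi_divergence(p, q, alpha))
```

The divergence is non-decreasing in alpha, so the order acts as a knob on how strongly disagreement between the two distributions is penalized.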

翻訳日:2023-04-05 00:46:37 公開日:2023-04-03

# BioGPT: バイオメディカルテキスト生成とマイニングのための生成事前学習型トランス

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining ( http://arxiv.org/abs/2210.10341v3 )

ライセンス: Link先を確認

Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu

(参考訳) 事前学習された言語モデルは、一般的な自然言語領域での成功に触発されて、生物医学領域で注目を集めている。一般言語領域における事前訓練された言語モデルの2つの主要分野、すなわちBERT(とその変種)とGPT(およびその変種)のうち、最初のものはBioBERTやPubMedBERTといった生物医学領域で広く研究されている。彼らは様々な識別的な下流のバイオメディカルなタスクで大きな成功を収めてきたが、生成能力の欠如はアプリケーションの範囲を制限している。本稿では,大規模生物医学文献に基づくドメイン固有生成型トランスフォーマー言語モデルであるBioGPTを提案する。 BioGPTを6つの生物医学NLPタスクで評価し、我々のモデルがほとんどのタスクで過去のモデルより優れていることを示す。特に、BC5CDR、KD-DTI、DDIのエンドツーエンド関係抽出タスクでそれぞれ44.98%、38.42%、40.76%のF1スコア、PubMedQAで78.2%の精度を達成し、新しい記録を作成した。テキスト生成のケーススタディは、バイオメディカル文献におけるBioGPTの利点をさらに示し、バイオメディカル用語の流暢な記述を生成する。コードはhttps://github.com/microsoft/BioGPTで入手できる。

Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e., BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical NLP tasks and demonstrate that our model outperforms previous models on most tasks. In particular, we achieve F1 scores of 44.98%, 38.42%, and 40.76% on the BC5CDR, KD-DTI, and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, setting a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms. Code is available at https://github.com/microsoft/BioGPT.

翻訳日:2023-04-05 00:40:30 公開日:2023-04-03

# 野生におけるカテゴリーレベル6次元物体ポーズ推定のための自己教師あり幾何対応

Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild ( http://arxiv.org/abs/2210.07199v3 )

ライセンス: Link先を確認

Kaifeng Zhang, Yang Fu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang

(参考訳) 6Dオブジェクトポーズ推定はコンピュータビジョンとロボティクスに幅広く応用されているが、アノテーションの欠如のため解決には程遠い。カテゴリレベルの6Dポーズに移行することで、この問題はさらに難しくなる。現在のアプローチは、シミュレーションや人間からの収集からアノテーションを活用することで制限されている。本稿では,カテゴリーレベルの6次元ポーズ推定のために,大規模現実世界のオブジェクトビデオを直接学習する自己教師型学習手法を導入することで,この障壁を克服する。本フレームワークは,対象カテゴリの正準3次元形状を再構成し,入力画像と正準形状との密接な対応を表面埋め込みにより学習する。トレーニングのために,2次元3次元空間,異なるインスタンス,異なる時間ステップにまたがるサイクルを構成する新しい幾何学的サイクル整合性損失を提案する。学習した対応は、6次元ポーズ推定やキーポイント転送などの下流タスクに適用できる。驚いたことに、この手法は人間のアノテーションやシミュレータを使わずに、以前の監視または半監視された画像のメソッドよりも、ほぼあるいはそれ以上の性能を達成できる。私たちのプロジェクトページは https://kywind.github.io/self-pose です。

While 6D object pose estimation has wide applications across computer vision and robotics, it remains far from being solved due to the lack of annotations. The problem becomes even more challenging when moving to category-level 6D pose, which requires generalization to unseen instances. Current approaches are restricted by leveraging annotations from simulation or collected from humans. In this paper, we overcome this barrier by introducing a self-supervised learning approach trained directly on large-scale real-world object videos for category-level 6D pose estimation in the wild. Our framework reconstructs the canonical 3D shape of an object category and learns dense correspondences between input images and the canonical shape via surface embedding. For training, we propose novel geometrical cycle-consistency losses which construct cycles across 2D-3D spaces, across different instances and different time steps. The learned correspondence can be applied for 6D pose estimation and other downstream tasks such as keypoint transfer. Surprisingly, our method, without any human annotations or simulators, can achieve on-par or even better performance than previous supervised or semi-supervised methods on in-the-wild images. Our project page is: https://kywind.github.io/self-pose .

翻訳日:2023-04-05 00:39:25 公開日:2023-04-03

# 自己誘導拡散モデル

Self-Guided Diffusion Models ( http://arxiv.org/abs/2210.06462v2 )

ライセンス: Link先を確認

Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek

(参考訳) 拡散モデルは、特に生成過程を制御するためのガイダンスを使用する場合、画像生成品質の顕著な進歩を示した。しかし、指導にはトレーニングのために大量の画像注釈ペアが必要であり、その可用性、正確性、偏りに依存する。本稿では,自己誘導拡散モデルのためのフレームワークの設計に自己超越信号の柔軟性を活用することで,このようなアノテーションの必要性を解消する。特徴抽出関数と自己アノテーション関数を活用することで,全体像のレベルからオブジェクトボックス,さらにはセグメンテーションマスクまで,さまざまな画像粒度のガイダンス信号を提供する。シングルラベルおよびマルチラベル画像データセットを用いた実験により,自己ラベル誘導は,常にガイダンス無しの拡散モデルよりも優れており,特に不均衡データにおいて,接地ラベルに基づくガイダンスを超越する可能性も示された。自己教師付きボックスやマスクプロポーザルを備える場合、クラス、ボックス、セグメントラベルアノテーションを必要とせず、視覚的に多様で意味的に一貫性のある画像を生成する。自己誘導拡散はシンプルで柔軟性があり、大規模展開で利益を期待できる。

Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness and unbiasedness. In this paper, we eliminate the need for such annotation by instead leveraging the flexibility of self-supervision signals to design a framework for self-guided diffusion models. By leveraging a feature extraction function and a self-annotation function, our method provides guidance signals at various image granularities: from the level of holistic images to object boxes and even segmentation masks. Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance and may even surpass guidance based on ground-truth labels, especially on unbalanced data. When equipped with self-supervised box or mask proposals, our method further generates visually diverse yet semantically consistent images, without the need for any class, box, or segment label annotation. Self-guided diffusion is simple, flexible and expected to profit from deployment at scale.

翻訳日:2023-04-05 00:39:06 公開日:2023-04-03

# 連続最適化によるプログラム合成

Synthesizing Programs with Continuous Optimization ( http://arxiv.org/abs/2211.00828v2 )

ライセンス: Link先を確認

Shantanu Mandal, Todd A. Anderson, Javier Turek, Justin Gottschlich, Abdullah Muzahid

(参考訳) いくつかの仕様に基づく自動ソフトウェア生成はプログラム合成として知られている。既存の手法の多くは、離散パラメータを持つ探索問題としてプログラム合成を定式化する。本稿では,プログラム合成を連続最適化問題として新たに定式化し,Covariance Matrix Adaptation Evolution Strategyとして知られる最先端の進化的アプローチを用いて解決する。次に,連続定式化を実際のプログラムに変換するマッピングスキームを提案する。我々は、GENESYSと呼ばれるシステムと、近年のプログラム合成技術(離散領域と連続領域の両方)を比較し、GENESYSが既存のスキームよりも固定時間内により多くのプログラムを合成していることを示す。例えば、長さ10のプログラムでは、GENESYSは既存の計画よりも28%多くのプログラムを同時に合成する。

Automatic software generation based on some specification is known as program synthesis. Most existing approaches formulate program synthesis as a search problem with discrete parameters. In this paper, we present a novel formulation of program synthesis as a continuous optimization problem and use a state-of-the-art evolutionary approach, known as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), to solve the problem. We then propose a mapping scheme to convert the continuous formulation into actual programs. We compare our system, called GENESYS, with several recent program synthesis techniques (in both discrete and continuous domains) and show that GENESYS synthesizes more programs within a fixed time budget than those existing schemes. For example, for programs of length 10, GENESYS synthesizes 28% more programs than those existing schemes within the same time budget.
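
The abstract does not describe the mapping scheme itself, so the following is only a toy illustration of the general idea of decoding a real-valued vector into a discrete program, so that a continuous optimizer such as the Covariance Matrix Adaptation Evolution Strategy could search over vectors. Everything here is hypothetical: the four-instruction list DSL, the sigmoid bucketing in `decode`, and the example-based `fitness` are assumptions made for the sketch, not GENESYS's design.

```python
import math

# Hypothetical toy DSL: each instruction maps a list of ints to a list of ints.
INSTRUCTIONS = [
    ("reverse", lambda xs: xs[::-1]),
    ("sort", lambda xs: sorted(xs)),
    ("double", lambda xs: [2 * x for x in xs]),
    ("drop_first", lambda xs: xs[1:]),
]


def decode(vector):
    """Map each real coordinate to one instruction by squashing it and bucketing."""
    program = []
    for v in vector:
        v = max(-50.0, min(50.0, v))  # clamp to avoid overflow in exp
        unit = 1.0 / (1.0 + math.exp(-v))  # squash to the unit interval
        index = min(int(unit * len(INSTRUCTIONS)), len(INSTRUCTIONS) - 1)
        program.append(INSTRUCTIONS[index])
    return program


def run(program, xs):
    """Apply the instructions of a decoded program in order."""
    for _name, fn in program:
        xs = fn(xs)
    return xs


def fitness(vector, examples):
    """Fraction of input-output examples satisfied by the decoded program."""
    program = decode(vector)
    hits = sum(1 for inp, out in examples if run(program, inp) == out)
    return hits / len(examples)


if __name__ == "__main__":
    examples = [([1, 2, 3], [6, 4, 2]), ([5, 0], [0, 10])]
    candidate = [-5.0, 0.5]  # a point the optimizer might propose
    print([name for name, _ in decode(candidate)], fitness(candidate, examples))
```

In this toy version the fitness is piecewise constant in the vector, which is a known difficulty for such decodings; an evolution strategy only needs fitness values, not gradients, so it can still be applied.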

翻訳日:2023-04-05 00:30:17 公開日:2023-04-03

# 映像生成タスクのための連続表現空間INR-V

INR-V: A Continuous Representation Space for Video-based Generative Tasks ( http://arxiv.org/abs/2210.16579v2 )

ライセンス: Link先を確認

Bipasha Sen, Aditya Agarwal, Vinay P Namboodiri, C. V. Jawahar

(参考訳) ビデオの生成は複雑な作業であり、フレームごとに時間的にコヒーレントな画像を生成する。これにより、ビデオの表現性は、ネットワーク設計を必要とする個々のビデオフレーム上でのみの画像ベースの操作に制限される。本稿では,映像生成タスクの連続的な空間を学習する映像表現ネットワークINR-Vを提案する。 inr-vは、ビデオの各入力画素のrgb値を予測する多層パーセプトロンである暗黙的ニューラルネットワーク(inrs)を使用して、ビデオをパラメータ化する。 INRは、複数のビデオインスタンスの神経表現に基づいてトレーニングされたハイパーネットワークであるメタネットワークを使用して予測される。その後、メタネットワークをサンプル化し、様々な新しいビデオを生成することで、下流のビデオベースの生成タスクを実現できる。興味深いことに、条件付き正規化とプログレッシブウェイト初期化は、INR-Vを得る上で重要な役割を果たす。 INR-Vによって学習された表現空間は、既存の作品では不可能な多くの興味深い性質を示す画像空間よりも表現性が高い。例えば、inr-vは、既知のビデオインスタンス間(中間id、表情、ポーズなど)の中間ビデオをスムーズに補間することができる。また、ビデオの欠落部分を塗りつぶして、一時的にコヒーレントなフルビデオを復元することもできる。本研究では,INR-Vが学習した映像補間,新規映像生成,映像インバージョン,既存のベースラインに対する映像インペインティングなど,多様な生成タスクの空間を評価する。 INR-Vはこれらのいくつかの実証されたタスクのベースラインを著しく上回り、明らかに提案された表現空間の可能性を示している。

Generating videos is a complex task that is accomplished by generating a set of temporally coherent images frame-by-frame. This limits the expressivity of videos to image-based operations on the individual video frames and requires network designs that obtain temporally coherent trajectories in the underlying image space. We propose INR-V, a video representation network that learns a continuous space for video-based generative tasks. INR-V parameterizes videos using implicit neural representations (INRs), a multi-layered perceptron that predicts an RGB value for each input pixel location of the video. The INR is predicted using a meta-network which is a hypernetwork trained on neural representations of multiple video instances. Later, the meta-network can be sampled to generate diverse novel videos enabling many downstream video-based generative tasks. Interestingly, we find that conditional regularization and progressive weight initialization play a crucial role in obtaining INR-V. The representation space learned by INR-V is more expressive than an image space showcasing many interesting properties not possible with the existing works. For instance, INR-V can smoothly interpolate intermediate videos between known video instances (such as intermediate identities, expressions, and poses in face videos). It can also in-paint missing portions in videos to recover temporally coherent full videos. In this work, we evaluate the space learned by INR-V on diverse generative tasks such as video interpolation, novel video generation, video inversion, and video inpainting against the existing baselines. INR-V significantly outperforms the baselines on several of these demonstrated tasks, clearly showcasing the potential of the proposed representation space.

翻訳日:2023-04-05 00:30:05 公開日:2023-04-03

# グラフト視覚変換器

Grafting Vision Transformers ( http://arxiv.org/abs/2210.15943v2 )

ライセンス: Link先を確認

Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo

(参考訳) ビジョントランスフォーマー(ViT)は近年、多くのコンピュータビジョンタスクにおける最先端技術となっている。畳み込みネットワーク(CNN)とは対照的に、ViTはネットワークの浅い層、すなわち高解像度の機能でもグローバルな情報共有を可能にする。しかし、後にスウィントランス(swin transformer)のようなピラミッドアーキテクチャが成功し、パフォーマンスと複雑さのトレードオフが向上した。本稿では,ネットワーク全体のグローバル依存性とマルチスケール情報を考慮した簡易かつ効率的なアドオンコンポーネント(グラフト)を提案する。任意の深さで分岐する柔軟性があり、バックボーンのパラメータと計算のほとんどを共有する。 GrafTは、ハイブリッドトランスフォーマー型と純粋なトランスフォーマー型の両方、均質構造とピラミッド構造の両方、そして様々な自己注意法を含む、よく知られたモデルに対して一貫した利得を示す。特に、ハイレベルなセマンティクスを提供することで、モバイルサイズのモデルに大きく貢献する。 ImageNet-1kデータセットでは、DeiT-T、Swin-T、MobileViT-XXSに+3.9%、+1.4%、+1.9%の精度改善が提供されている。私たちのコードとモデルは利用可能になります。

Vision Transformers (ViTs) have recently become the state of the art across many computer vision tasks. In contrast to convolutional networks (CNNs), ViTs enable global information sharing even within shallow layers of a network, i.e., among high-resolution features. However, this perk was later overlooked with the success of pyramid architectures such as Swin Transformer, which show better performance-complexity trade-offs. In this paper, we present a simple and efficient add-on component (termed GrafT) that considers global dependencies and multi-scale information throughout the network, in both high- and low-resolution features alike. It has the flexibility of branching out at arbitrary depths and shares most of the parameters and computations of the backbone. GrafT shows consistent gains over various well-known models, including both hybrid and pure Transformer types, both homogeneous and pyramid structures, and various self-attention methods. In particular, it largely benefits mobile-size models by providing high-level semantics. On the ImageNet-1k dataset, GrafT delivers +3.9%, +1.4%, and +1.9% top-1 accuracy improvement to DeiT-T, Swin-T, and MobileViT-XXS, respectively. Our code and models will be made available.

Translated: 2023-04-05 00:29:36 Published: 2023-04-03

# Abductive Action Inference

Abductive Action Inference ( http://arxiv.org/abs/2210.13984v2 )

License: see the linked page

Clement Tan, Chai Kiat Yeo, Cheston Tan, Basura Fernando

(参考訳) 帰納的推論(abductive reasoning)は、与えられた不完全な観測集合の最も可能性の高い推論を行うことを目的としている。本研究では,「現在の状態に着くためには,どのような行動が人間によって実行されたのか?」という疑問に答える,帰納的行動推論(abductive action inference)という新しいタスクを提案する。状態が与えられた場合,行動セット予測,行動シーケンス予測,帰納的行動検証という3つの帰納的推論問題を調査する。我々は、Transformer、Graph Neural Network、CLIP、BLIP、エンドツーエンドトレーニングされたSlow-Fast、Resnet50-3Dモデルなど、いくつかのSOTAモデルをベンチマークする。今回提案するobject-relational bigedモデルは,アクションゲノムデータセットにおけるこの困難なタスクにおいて,他のすべての手法を上回っている。コードは利用可能になる。

Abductive reasoning aims to make the most likely inference for a given set of incomplete observations. In this work, we propose a new task called abductive action inference, in which, given a situation, the model answers the question "what actions were executed by the human in order to arrive at the current state?". Given a state, we investigate three abductive inference problems: action set prediction, action sequence prediction, and abductive action verification. We benchmark several SOTA models such as Transformers, Graph Neural Networks, CLIP, BLIP, end-to-end trained Slow-Fast, and Resnet50-3D models. Our newly proposed object-relational BiGED model outperforms all other methods on this challenging task on the Action Genome dataset. Code will be made available.

Translated: 2023-04-05 00:28:53 Published: 2023-04-03

# Boosting the Cycle Counting Power of Graph Neural Networks with I$^2$-GNNs

Boosting the Cycle Counting Power of Graph Neural Networks with I$^2$-GNNs ( http://arxiv.org/abs/2210.13978v2 )

License: see the linked page

Yinan Huang, Xingang Peng, Jianzhu Ma, Muhan Zhang

(参考訳) メッセージパッシングニューラルネットワーク(英: Message Passing Neural Networks、MPNN)は、グラフニューラルネットワーク(GNN)の一種。 MPNNの限られた表現力は、証明可能な強力なGNNアーキテクチャの研究を刺激する。しかし、あるモデルを知ることは、あるモデルが表現できる機能やできない機能についての洞察をほとんど与えない。これらのモデルが、生物学、化学、社会ネットワーク分析の応用に不可欠な、特定のグラフ部分構造を数えるといった特定の関数を近似できるかどうかはまだ不明である。そこで本研究では,各ノードのルート付きサブグラフを抽出し,ルートノードにユニークな識別子を割り当て,ルートノードの表現をそのルート付きサブグラフ内にエンコードする,GNNモデルの最近の人気クラスであるSubgraph MPNNのカウント能力について検討する。具体的には、サブグラフmpnnがノードレベルで4サイクル以上を数えることができないことを証明し、ノード表現が4原子以上の環系のような周囲の部分構造を正しくエンコードできないことを示唆する。この制限を克服するため、各サブグラフ内のルートノードとその隣人に異なる識別子を割り当てることで、サブグラフMPNNを拡張するためのI$^2$-GNNを提案する。 I$^2$-GNNsの識別力は、サブグラフMPNNよりも強く、3WLテストより部分的に強いことが示されている。さらに重要なことは、I$^2$-GNNは3, 4, 5, 6サイクル全てを数えることができ、有機化学におけるベンゼン環のような一般的なサブ構造をカバーし、線形複雑性を維持している。我々の知る限りでは、理論的な保証とともに6サイクルを数えられる最初の線形時間GNNモデルである。サイクルカウントタスクにおけるカウント能力を検証するとともに,分子予測ベンチマークにおける競合性能を示す。

Message Passing Neural Networks (MPNNs) are a widely used class of Graph Neural Networks (GNNs). The limited representational power of MPNNs inspires the study of provably powerful GNN architectures. However, knowing one model is more powerful than another gives little insight about what functions they can or cannot express. It is still unclear whether these models are able to approximate specific functions such as counting certain graph substructures, which is essential for applications in biology, chemistry and social network analysis. Motivated by this, we propose to study the counting power of Subgraph MPNNs, a recent and popular class of powerful GNN models that extract rooted subgraphs for each node, assign the root node a unique identifier and encode the root node's representation within its rooted subgraph. Specifically, we prove that Subgraph MPNNs fail to count more-than-4-cycles at node level, implying that node representations cannot correctly encode the surrounding substructures like ring systems with more than four atoms. To overcome this limitation, we propose I$^2$-GNNs to extend Subgraph MPNNs by assigning different identifiers for the root node and its neighbors in each subgraph. I$^2$-GNNs' discriminative power is shown to be strictly stronger than Subgraph MPNNs and partially stronger than the 3-WL test. More importantly, I$^2$-GNNs are proven capable of counting all 3, 4, 5 and 6-cycles, covering common substructures like benzene rings in organic chemistry, while still keeping linear complexity. To the best of our knowledge, it is the first linear-time GNN model that can count 6-cycles with theoretical guarantees. We validate its counting power in cycle counting tasks and demonstrate its competitive performance in molecular prediction benchmarks.
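To make "counting cycles at node level" concrete, here is a minimal sketch of how the ground-truth target for the simplest case (3-cycles) can be computed directly from an adjacency matrix. It is not the I$^2$-GNN model itself; the function name and the toy graph are illustrative assumptions.

```python
def count_triangles_per_node(adj):
    """Return, for each node i, the number of 3-cycles passing through i.

    adj is a symmetric 0/1 adjacency matrix (list of lists) with a zero diagonal.
    The number of closed walks i -> j -> k -> i equals (A^3)[i][i]; every
    triangle through i is walked in two directions, hence the division by 2.
    """
    n = len(adj)
    counts = []
    for i in range(n):
        closed_walks = 0
        for j in range(n):
            if not adj[i][j]:
                continue
            for k in range(n):
                if adj[j][k] and adj[k][i]:
                    closed_walks += 1
        counts.append(closed_walks // 2)
    return counts


# Toy graph (hypothetical): triangle 0-1-2 plus a pendant node 3 attached to node 2.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(count_triangles_per_node(adjacency))  # [1, 1, 1, 0]
```

Longer cycles need more careful bookkeeping, because closed walks of length 4 or more can revisit nodes; that is part of why counting up to 6-cycles is a non-trivial expressiveness target.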

Translated: 2023-04-05 00:28:40 Published: 2023-04-03

# Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs

Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs ( http://arxiv.org/abs/2210.13312v2 )

License: see the linked page

Maarten Sap, Ronan LeBras, Daniel Fried, Yejin Choi

(参考訳) 社会的インテリジェンスと心の理論(ToM)、すなわち、関係するすべての人々の異なる精神状態、意図、反応を推論する能力によって、人間は日々の社会的相互作用を効果的にナビゲートし理解することができる。 NLPシステムはますます複雑な社会状況において使用されるため、社会的ダイナミクスを理解する能力は重要である。本研究では,現代NLPシステムにおける社会的知能と心の理論のオープンな問題について,実証的・理論的観点から検討する。現在の最大の言語モデル(gpt-3, brown et al., 2020)の1つには,2つのタスク - socialiqa (sap et al., 2019) という,モデルが社会的インタラクションの参加者の意図や反応を理解する能力を測定するもの - と,モデルがメンタル状態や参加者の現実を推測できるかどうかを測定する tomi (le et al., 2019) がある。以上の結果から,socialiqa と tomi はそれぞれ 55% と 60% の well-below-human accuracies である。結論として,データやニューラルネットワーク,トレーニングパラダイムに起因する制限を調べることで,大規模言語モデルの欠点を文脈化するために,実用学からの理論を導出する。スケールしか必要としない一般的な物語に従えば、人中心のNLPアプローチがマインドの神経理論に対してより効果的である可能性が示唆される。更新版では、ニューラルToMのための新しい命令チューニングとRLFHモデルも分析した。その結果,ChatGPT や GPT-4 でさえ創発的心の理論を示さず,GPT-4 でさえ精神状態や現実に関する ToMi の質問に対して 60% の精度しか達成していないことがわかった。

Social intelligence and Theory of Mind (ToM), i.e., the ability to reason about the different mental states, intents, and reactions of all people involved, allow humans to effectively navigate and understand everyday social interactions. As NLP systems are used in increasingly complex social situations, their ability to grasp social dynamics becomes crucial. In this work, we examine the open question of social intelligence and Theory of Mind in modern NLP systems from an empirical and theory-based perspective. We show that one of today's largest language models (GPT-3; Brown et al., 2020) lacks this kind of social intelligence out of the box, using two tasks: SocialIQa (Sap et al., 2019), which measures models' ability to understand intents and reactions of participants of social interactions, and ToMi (Le et al., 2019), which measures whether models can infer mental states and realities of participants of situations. Our results show that models struggle substantially at these Theory of Mind tasks, with well-below-human accuracies of 55% and 60% on SocialIQa and ToMi, respectively. To conclude, we draw on theories from pragmatics to contextualize this shortcoming of large language models, by examining the limitations stemming from their data, neural architecture, and training paradigms. Challenging the prevalent narrative that only scale is needed, we posit that person-centric NLP approaches might be more effective towards neural Theory of Mind. In our updated version, we also analyze newer instruction-tuned and RLHF models for neural ToM. We find that even ChatGPT and GPT-4 do not display emergent Theory of Mind; strikingly, even GPT-4 achieves only 60% accuracy on the ToMi questions related to mental states and realities.

Translated: 2023-04-05 00:28:09 Published: 2023-04-03

# aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception

aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception ( http://arxiv.org/abs/2211.09445v2 )

License: see the linked page

Tamás Matuszka, Iván Barton, Ádám Butykai, Péter Hajas, Dávid Kiss, Domonkos Kovács, Sándor Kunsági-Máté, Péter Lengyel, Gábor Németh, Levente Pető, Dezső Ribli, Dávid Szeghy, Szabolcs Vajna, Bálint Varga

(参考訳) 自動運転はコンピュータビジョン研究コミュニティで人気のある研究分野である。自動運転車は安全性が極めて重要であるため、現実の展開には堅牢性を保証することが不可欠である。いくつかの公共のマルチモーダルデータセットはアクセス可能であるが、主に悪天候に適さない2つのセンサーモード(カメラ、LiDAR)で構成されている。さらに、長距離アノテーションが欠如しているため、自動運転車の高速道路アシスタント機能の基盤となるニューラルネットワークのトレーニングが困難になる。そこで本稿では,長距離認識による頑健な自律運転のためのマルチモーダルデータセットを提案する。データセットは176のシーンで構成され、同期して校正されたLiDAR、カメラ、レーダーセンサーが360度視野をカバーする。収集したデータは、昼間、夜間、雨季に高速道路、都市、郊外で撮影され、フレーム間に一貫した識別子を持つ3D境界ボックスで注釈付けされている。さらに,3次元物体検出のためのユニモーダルベースラインモデルとマルチモーダルベースラインモデルを訓練した。データは \url{https://github.com/aimotive/aimotive_dataset} で入手できる。

Autonomous driving is a popular research area within the computer vision research community. Since autonomous vehicles are highly safety-critical, ensuring robustness is essential for real-world deployment. While several public multimodal datasets are accessible, they mainly comprise two sensor modalities (camera, LiDAR), which are not well suited for adverse weather. In addition, they lack far-range annotations, making it harder to train the neural networks that underpin the highway assistant function of an autonomous vehicle. Therefore, we introduce a multimodal dataset for robust autonomous driving with long-range perception. The dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view. The collected data was captured in highway, urban, and suburban areas during daytime, night, and rain and is annotated with 3D bounding boxes with consistent identifiers across frames. Furthermore, we trained unimodal and multimodal baseline models for 3D object detection. Data are available at https://github.com/aimotive/aimotive_dataset.

Translated: 2023-04-05 00:21:25 Published: 2023-04-03

# Explainable Action Advising for Multi-Agent Reinforcement Learning

Explainable Action Advising for Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2211.07882v2 )

License: see the linked page

Yue Guo, Joseph Campbell, Simon Stepputtis, Ruiyu Li, Dana Hughes, Fei Fang, Katia Sycara

(参考訳) 行動アドバイスは教師-学生パラダイムに基づく強化学習のための知識伝達技術である。専門教師は、学生のサンプル効率と政策性能を改善するために、訓練中に生徒にアドバイスを提供する。このようなアドバイスは一般に状態-作用対の形で与えられる。しかし、学生が新たな国家を論じて適用することは困難である。本稿では,教師が行動アドバイスを提示する説明可能な行動助言と,行動が選択された理由を示す説明を紹介する。これにより、生徒は学習したものを自己反映することができ、アドバイスの一般化が可能になり、教師が最適でない環境でもサンプルの効率と学習性能が向上する。我々は,単一エージェントシナリオと複数エージェントシナリオの両方において,我々のフレームワークが有効であることを実証的に示す。

Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm. An expert teacher provides advice to a student during training in order to improve the student's sample efficiency and policy performance. Such advice is commonly given in the form of state-action pairs. However, advice in this form is difficult for the student to reason about and apply to novel states. We introduce Explainable Action Advising, in which the teacher provides action advice as well as associated explanations indicating why the action was chosen. This allows the student to self-reflect on what it has learned, enabling advice generalization and leading to improved sample efficiency and learning performance - even in environments where the teacher is sub-optimal. We empirically show that our framework is effective in both single-agent and multi-agent scenarios, yielding improved policy returns and convergence rates when compared to state-of-the-art methods.

Translated: 2023-04-05 00:20:30 Published: 2023-04-03

# Quantifying the Impact of Label Noise on Federated Learning

Quantifying the Impact of Label Noise on Federated Learning ( http://arxiv.org/abs/2211.07816v7 )

License: see the linked page

Shuqi Ke, Chao Huang, Xin Liu

(参考訳) Federated Learning(FL)は、クライアントがローカル(ヒューマン生成)データセットを使用してモデルを協調的にトレーニングする分散機械学習パラダイムである。既存の研究では、クライアント間のデータ不均一性に取り組むためのFLアルゴリズムの開発に焦点が当てられているが、FLにおけるデータ品質(ラベルノイズなど)の重要な問題は見過ごされている。本稿では,FLにおけるラベルノイズの影響を定量的に検討することにより,このギャップを埋めることを目的とする。クライアントのラベルノイズレベルにおいて線形な一般化誤差の上限を導出する。次に,様々なFLアルゴリズムを用いて,MNISTとCIFAR-10データセットの実験を行った。実験の結果,ノイズレベルが増加すると,大域モデル精度は線形に減少し,理論解析と一致することがわかった。さらに,ラベルノイズがflトレーニングの収束を遅くし,ノイズレベルが高い場合にはグローバルモデルが過剰に適合する傾向がみられた。

Federated Learning (FL) is a distributed machine learning paradigm where clients collaboratively train a model using their local (human-generated) datasets. While existing studies focus on FL algorithm development to tackle data heterogeneity across clients, the important issue of data quality (e.g., label noise) in FL is overlooked. This paper aims to fill this gap by providing a quantitative study on the impact of label noise on FL. We derive an upper bound for the generalization error that is linear in the clients' label noise level. Then we conduct experiments on MNIST and CIFAR-10 datasets using various FL algorithms. Our empirical results show that the global model accuracy linearly decreases as the noise level increases, which is consistent with our theoretical analysis. We further find that label noise slows down the convergence of FL training, and the global model tends to overfit when the noise level is high.
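The experimental setup described above (clients with noisy labels, a server aggregating their models) can be sketched in a few lines. The abstract does not specify the noise model or the aggregation rule, so the symmetric label flipping and the FedAvg-style weighted average below are assumptions chosen for illustration, and all names are hypothetical.

```python
import random


def flip_labels(labels, noise_level, num_classes, seed=0):
    """Symmetric label noise (assumed model): with probability noise_level,
    replace a label by a different class drawn uniformly at random."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_level:
            candidates = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(y)
    return noisy


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its
    local dataset size (the FedAvg aggregation rule)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


clean = [0, 1, 2, 3, 4] * 20
noisy = flip_labels(clean, noise_level=0.3, num_classes=10)
observed_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(f"observed noise rate: {observed_rate:.2f}")
print(fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3]))
```

Sweeping `noise_level` and recording the global model's test accuracy is one way to check the reported linear trend empirically.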

Translated: 2023-04-05 00:20:15 Published: 2023-04-03

# All optical quantum information processing via a single step Rydberg blockade gate

All optical quantum information processing via a single step Rydberg blockade gate ( http://arxiv.org/abs/2211.06998v3 )

License: see the linked page

Mohammadsadegh Khazali

(参考訳) 量子インターネットの実現における重要な要素の1つは決定論的2光子ゲートである。この$CZ$フォトニックゲートは、全光学量子情報処理のためのユニバーサルゲートのセットも完成する。本稿では、非リドバーグ電磁誘導透過(eit)を用いた原子アンサンブルに制御光子とターゲット光子の両方を格納し、グローバルレーザーを用いた高速単段リドバーグ励起を行い、cz$フォトニックゲートを実現する手法について述べる。提案方式は、ライドバーグ励起に用いられる2つのレーザーの相対強度変調によって動作する。従来の$\pi$-gap-$\pi$スキームを回避して、提案手法では環境ノイズからライドバーグ原子を連続的にレーザーで保護する。閉塞半径内の貯蔵光子の完全な空間的重なりは、光学的深さを最適化し、実験を単純化する。ここでのコヒーレント操作は、以前のRydberg EITスキームで散逸した領域で行われる。主な不完全性源,すなわちRydbergの自然放出と中間レベル,集団回転誤差,遷移線のドップラー拡大,保存・検索効率,原子熱運動誘起デコヒーレンスを考慮し,現実的な実験パラメータ 99.7 % の忠実性は達成可能であると結論づける。

One of the critical elements in the realization of the quantum internet is a deterministic two-photon gate. This $CZ$ photonic gate also completes a set of universal gates for all-optical quantum information processing. This article discusses an approach to realize a high-fidelity $CZ$ photonic gate by storing both control and target photons within an atomic ensemble using non-Rydberg electromagnetically induced transparency (EIT), followed by a fast, single-step Rydberg excitation with global lasers. The proposed scheme operates by relative intensity modulation of the two lasers used in the Rydberg excitation. Circumventing the conventional $\pi$-gap-$\pi$ schemes, the proposed operation features continuous laser protection of the Rydberg atoms from environmental noise. The complete spatial overlap of stored photons inside the blockade radius optimizes the optical depth and simplifies the experiment. The coherent operation here is performed in the region that was dissipative in previous Rydberg EIT schemes. Accounting for the main imperfection sources, i.e., the spontaneous emission of the Rydberg and intermediate levels, population rotation errors, Doppler broadening of the transition lines, storage/retrieval efficiency, and decoherence induced by atomic thermal motion, this article concludes that 99.7% fidelity is achievable with realistic experimental parameters.
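The statement that a $CZ$ gate completes a universal gate set can be illustrated with ideal matrices: conjugating the target qubit of $CZ$ with Hadamard gates yields CNOT. The sketch below is plain linear algebra on the textbook gate definitions and says nothing about the Rydberg implementation; the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker product; the first factor acts on the first (control) qubit."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


s = 2 ** -0.5
H = [[s, s], [s, -s]]                      # Hadamard gate
I2 = [[1.0, 0.0], [0.0, 1.0]]              # single-qubit identity
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Apply H to the target qubit before and after CZ: (I (x) H) CZ (I (x) H).
IH = kron(I2, H)
result = matmul(matmul(IH, CZ), IH)

max_error = max(abs(result[i][j] - CNOT[i][j]) for i in range(4) for j in range(4))
print(f"max deviation from CNOT: {max_error:.1e}")
```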

Translated: 2023-04-05 00:19:59 Published: 2023-04-03

# Human alignment of neural network representations

Human alignment of neural network representations ( http://arxiv.org/abs/2211.01201v4 )

License: see the linked page

Lukas Muttenthaler, Jonas Dippel, Lorenz Linhardt, Robert A. Vandermeulen, Simon Kornblith

(参考訳) 今日のコンピュータビジョンモデルは、多種多様なビジョンタスクで人間またはほぼ人間レベルのパフォーマンスを達成する。しかし、彼らのアーキテクチャ、データ、学習アルゴリズムは、人間のビジョンを生み出すものとは様々な点で異なる。本稿では,ニューラルネットワークが学習した表現と行動応答から推定される人間の心的表現のアライメントに影響を与える要因について検討する。モデルスケールとアーキテクチャは基本的に人間の行動応答に影響を及ぼさないが、トレーニングデータセットと客観的関数はどちらもはるかに大きな影響を与える。これらの結果は、2つの異なるタスクを用いて収集された3つの人間類似性判定データセット間で一致している。 1つのデータセットからの行動応答から学習したニューラルネットワーク表現の線形変換は、他の2つのデータセット上の人間の類似性判定とのアライメントを大幅に改善する。さらに, 食物や動物などの人間の概念はニューラルネットワークによってよく表現されているのに対し, ロイヤルやスポーツ関連の物体はそうではない。全体として、より大きく多様なデータセットでトレーニングされたモデルは、ImageNetだけでトレーニングされたモデルよりも人間との整合性が向上するが、我々の結果は、スケーリング単独では、人間が使用するモデルと一致する概念的な表現でニューラルネットワークをトレーニングするのに十分ではないことを示唆している。

Today's computer vision models achieve human or near-human level performance across a wide variety of vision tasks. However, their architectures, data, and learning algorithms differ in numerous ways from those that give rise to human vision. In this paper, we investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses. We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses, whereas the training dataset and objective function both have a much larger impact. These findings are consistent across three datasets of human similarity judgments collected using two different tasks. Linear transformations of neural network representations learned from behavioral responses from one dataset substantially improve alignment with human similarity judgments on the other two datasets. In addition, we find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not. Overall, although models trained on larger, more diverse datasets achieve better alignment with humans than models trained on ImageNet alone, our results indicate that scaling alone is unlikely to be sufficient to train neural networks with conceptual representations that match those used by humans.

Translated: 2023-04-05 00:18:26 Published: 2023-04-03

# Adaptive Dynamic Filtering Network for Image Denoising

Adaptive Dynamic Filtering Network for Image Denoising ( http://arxiv.org/abs/2211.12051v3 )

License: see the linked page

Hao Shen, Zhong-Qiu Zhao, Wandi Zhang

(参考訳) 画像デノーミングネットワークでは、機能スケーリングは受動的フィールドサイズを拡大し、計算コストを削減するために広く利用されている。しかし、この慣行は高周波情報の損失を招き、大規模な特性を考慮できない。近年、動的畳み込みは高周波情報(エッジ、コーナー、テクスチャなど)の処理において強力な能力を発揮しているが、従来の作品はフィルタ生成における十分な空間的コンテクスト情報を欠いている。これらの問題を緩和するため,我々は動的畳み込みを用いて高周波・マルチスケール特徴の学習を改善することを提案する。具体的には,動的畳み込みを改善するために空間的に拡張されたカーネル生成(sekg)モジュールを設計し,計算量が非常に少ない空間的コンテキスト情報の学習を可能にした。 SEKG モジュールをベースとして,動的畳み込みブロック (DCB) とマルチスケール動的畳み込みブロック (MDCB) を提案する。前者は動的畳み込みにより高周波情報を強化し、スキップ接続を介して低周波情報を保存する。後者は、共有適応動的カーネルと拡張畳み込みの概念を利用して、効率的なマルチスケール特徴抽出を実現する。提案するマルチディメンジョン機能統合(MFI)機構は,マルチスケール機能をさらに融合させ,正確かつコンテキストに富んだ特徴表現を提供する。最後に,adfnet と呼ばれる dcb と mdcb を用いた効率的な分別ネットワークを構築する。実世界および合成ガウスノイズデータセットにおける計算複雑性の低い性能を実現する。ソースコードはhttps://github.com/it-hao/ADFNetで入手できる。

In image denoising networks, feature scaling is widely used to enlarge the receptive field size and reduce computational costs. This practice, however, also leads to the loss of high-frequency information and fails to consider within-scale characteristics. Recently, dynamic convolution has exhibited powerful capabilities in processing high-frequency information (e.g., edges, corners, textures), but previous works lack sufficient spatial contextual information in filter generation. To alleviate these issues, we propose to employ dynamic convolution to improve the learning of high-frequency and multi-scale features. Specifically, we design a spatially enhanced kernel generation (SEKG) module to improve dynamic convolution, enabling the learning of spatial context information with a very low computational complexity. Based on the SEKG module, we propose a dynamic convolution block (DCB) and a multi-scale dynamic convolution block (MDCB). The former enhances the high-frequency information via dynamic convolution and preserves low-frequency information via skip connections. The latter utilizes shared adaptive dynamic kernels and the idea of dilated convolution to achieve efficient multi-scale feature extraction. The proposed multi-dimension feature integration (MFI) mechanism further fuses the multi-scale features, providing precise and contextually enriched feature representations. Finally, we build an efficient denoising network with the proposed DCB and MDCB, named ADFNet. It achieves better performance with low computational complexity on real-world and synthetic Gaussian noisy datasets. The source code is available at https://github.com/it-hao/ADFNet.

Translated: 2023-04-05 00:12:30 Published: 2023-04-03

# ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields

ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields ( http://arxiv.org/abs/2211.11704v2 )

License: see the linked page

Mohammad Mahdi Johari, Camilla Carta, François Fleuret

(参考訳) 同時局所化マッピング(SLAM)のための効率的な暗黙的ニューラル表現法である ESLAM を提案する。 ESLAMは、未知のカメラポーズでRGB-Dフレームを読み出し、シーン内の現在のカメラ位置を推定しながらシーン表現を漸進的に再構築する。ニューラルラジアンス場(NeRF)の最新の進歩をSLAMシステムに組み込んだ結果,高効率かつ高精度なビジュアルSLAM法が実現した。シーン表現は、連続空間の各点に対して、補間された特徴をTrncated Signed Distance Field (TSDF) と RGB の値にデコードする多重スケールの軸整列垂直特徴平面と浅いデコーダから構成される。 Replica、ScanNet、TUM RGB-Dの3つの標準データセットに対する広範な実験により、ESLAMは最先端の高密度視覚SLAM法の精度を50%以上向上する一方で、最大10倍高速で、事前トレーニングを必要としないことが示された。

We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM). ESLAM reads RGB-D frames with unknown camera poses in a sequential manner and incrementally reconstructs the scene representation while estimating the current camera position in the scene. We incorporate the latest advances in Neural Radiance Fields (NeRF) into a SLAM system, resulting in an efficient and accurate dense visual SLAM method. Our scene representation consists of multi-scale axis-aligned perpendicular feature planes and shallow decoders that, for each point in the continuous space, decode the interpolated features into Truncated Signed Distance Field (TSDF) and RGB values. Our extensive experiments on three standard datasets, Replica, ScanNet, and TUM RGB-D show that ESLAM improves the accuracy of 3D reconstruction and camera localization of state-of-the-art dense visual SLAM methods by more than 50%, while it runs up to 10 times faster and does not require any pre-training.
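The scene representation described above (axis-aligned feature planes queried by interpolation, decoded to a truncated signed distance) can be sketched with scalar features. This is only a toy illustration: summing the three plane features, normalizing the TSDF to [-1, 1], and every name below are assumptions, and the learned decoders are omitted.

```python
def bilinear(plane, u, v):
    """Bilinearly interpolate a 2D grid (list of rows) at continuous
    coordinates (u, v), measured in grid cells and clamped to the grid."""
    h, w = len(plane), len(plane[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = plane[v0][u0] * (1 - du) + plane[v0][u1] * du
    bottom = plane[v1][u0] * (1 - du) + plane[v1][u1] * du
    return top * (1 - dv) + bottom * dv


def query_planes(planes, x, y, z):
    """Project a 3D point onto the xy, xz and yz planes and combine the
    interpolated features (combination by summation is an assumption)."""
    return (bilinear(planes["xy"], x, y)
            + bilinear(planes["xz"], x, z)
            + bilinear(planes["yz"], y, z))


def truncate_sdf(sdf, truncation):
    """Scale a signed distance by the truncation band and clamp to [-1, 1]."""
    return max(-1.0, min(1.0, sdf / truncation))


grid = [[0.0, 1.0], [2.0, 3.0]]
planes = {"xy": grid, "xz": grid, "yz": grid}
print(query_planes(planes, 0.5, 0.5, 0.5))   # 4.5
print(truncate_sdf(0.05, 0.1), truncate_sdf(-5.0, 0.1))
```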

Translated: 2023-04-05 00:12:01 Published: 2023-04-03

# Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching

Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching ( http://arxiv.org/abs/2211.11589v2 )

License: see the linked page

Paul Roetzer and Zorah Lähner and Florian Bernard

(参考訳) 2次元輪郭と3次元メッシュの連続的および非厳密なマッチングを求める問題を考察する。このような問題は、両方の形状の間の積グラフの最も短い経路を見つけることによって大域的最適性に解決できるが、既存の解は縮退した解を避けるために非現実的な事前仮定に強く依存している(例えば、2次元輪郭の各点が一致する3次元形状の領域の知識)。そこで本稿では,2次元輪郭と3次元形状の共役積グラフに基づく新しい2d-3次元形状マッチング形式を提案する。そうすることで、シングルエッジで定義されたコストとは対照的に、初めて高次のコスト、すなわちエッジチェーンで定義されるコストを考えることができます。これによって柔軟性が大幅に向上し、先に局所的な剛性を取り込むことができます。これにより, 1次元特徴記述子のみを用いても, 効率よく退化解を回避し, より滑らかで現実的なマッチングが得られる。提案手法は, グローバルに最適かつ連続的な2D-3Dマッチングを行い, 従来の手法と同じ漸近的複雑性を持ち, 形状マッチングの最先端結果を生成し, 部分形状のマッチングも可能である。私たちのコードは公開されている(https://github.com/paul0noah/sm-2d3d)。

We consider the problem of finding a continuous and non-rigid matching between a 2D contour and a 3D mesh. While such problems can be solved to global optimality by finding a shortest path in the product graph between both shapes, existing solutions heavily rely on unrealistic prior assumptions to avoid degenerate solutions (e.g. knowledge of which region of the 3D shape each point of the 2D contour is matched to). To address this, we propose a novel 2D-3D shape matching formalism based on the conjugate product graph between the 2D contour and the 3D shape. Doing so allows us, for the first time, to consider higher-order costs, i.e. costs defined for edge chains, as opposed to costs defined for single edges. This offers substantially more flexibility, which we utilise to incorporate a local rigidity prior. By doing so, we effectively circumvent degenerate solutions and thereby obtain smoother and more realistic matchings, even when using only a one-dimensional feature descriptor. Overall, our method finds globally optimal and continuous 2D-3D matchings, has the same asymptotic complexity as previous solutions, produces state-of-the-art results for shape matching and is even capable of matching partial shapes. Our code is publicly available (https://github.com/paul0noah/sm-2D3D).
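The reason a conjugate graph permits costs on edge chains can be seen from the generic construction: every edge of the original graph becomes a node, and two such nodes are connected when the first edge ends where the second begins. An edge of the conjugate graph therefore corresponds to a pair of consecutive original edges. The sketch below shows this construction on a small directed graph; it is not the paper's product graph, and the names are illustrative.

```python
def conjugate_graph(edges):
    """Build the conjugate (line) graph of a directed graph.

    edges: list of (tail, head) pairs.
    Returns a list of ((u, v), (v, w)) pairs: one conjugate edge for every
    chain of two consecutive original edges.
    """
    by_tail = {}
    for u, v in edges:
        by_tail.setdefault(u, []).append((u, v))

    conjugate_edges = []
    for first in edges:
        for second in by_tail.get(first[1], []):
            conjugate_edges.append((first, second))
    return conjugate_edges


edges = [("a", "b"), ("b", "c"), ("b", "d")]
for first, second in conjugate_graph(edges):
    print(first, "->", second)
```

A shortest-path search over the conjugate graph can then charge a cost per conjugate edge, i.e. per pair of consecutive original edges, which is where a term such as a local rigidity prior could enter.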

Translated: 2023-04-05 00:11:39 Published: 2023-04-03

# RobustLoc: Robust Camera Pose Regression in Challenging Driving Environments

RobustLoc: Robust Camera Pose Regression in Challenging Driving Environments ( http://arxiv.org/abs/2211.11238v3 )

License: see the linked page

Sijie Wang, Qiyu Kang, Rui She, Wee Peng Tay, Andreas Hartmannsgruber, Diego Navarro Navarro

(参考訳) カメラのリローカライゼーションは自動運転に様々な応用がある。従来のカメラポーズ回帰モデルは、環境摂動がほとんどない理想的なシナリオのみを考える。季節, 天気, 照明, 不安定な物体の存在に変化をもたらす可能性のある運転環境に対処するため, ニューラル微分方程式からの摂動に対する頑健さを導出するRobostLocを提案する。本モデルでは,多視点画像から特徴地図を抽出する畳み込みニューラルネットワーク,インタラクティブに情報を拡散するロバストなニューラルネットワーク方程式拡散ブロックモジュール,多層トレーニングによる分岐ポーズデコーダを用いて車両のポーズ推定を行う。実験により、ロバストロックは現在の最先端カメラの回帰モデルを超え、様々な環境で堅牢な性能を達成することが示された。私たちのコードは、https://github.com/sijieaaa/RobustLocでリリースされています。

Camera relocalization has various applications in autonomous driving. Previous camera pose regression models consider only ideal scenarios where there is little environmental perturbation. To deal with challenging driving environments that may have changing seasons, weather, illumination, and the presence of unstable objects, we propose RobustLoc, which derives its robustness against perturbations from neural differential equations. Our model uses a convolutional neural network to extract feature maps from multi-view images, a robust neural differential equation diffusion block module to diffuse information interactively, and a branched pose decoder with multi-layer training to estimate the vehicle poses. Experiments demonstrate that RobustLoc surpasses current state-of-the-art camera pose regression models and achieves robust performance in various environments. Our code is released at: https://github.com/sijieaaa/RobustLoc

Translated: 2023-04-05 00:11:13 Published: 2023-04-03

# You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model

You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model ( http://arxiv.org/abs/2211.11152v2 )

License: see the linked page

Shengkun Tang, Yaqing Wang, Zhenglun Kong, Tianchi Zhang, Yao Li, Caiwen Ding, Yanzhi Wang, Yi Liang, Dongkuan Xu

(参考訳) 大規模なトランスフォーマーモデルは、統一アーキテクチャによるダウンストリームビジョン言語タスクに大幅な改善をもたらす。性能改善はモデルサイズが向上し、推論速度が遅くなり、厳格化のコストが増大する。ある種の予測は大規模モデルの完全な複雑さから恩恵を受けるが、全ての入力が実行するのに同じ量の計算を必要とするわけではない。この課題に対処するために、入力複雑性の観点から計算パワーを適応的に割り当て、推論効率を向上させる早期退避を提案する。既存のアーリーエグジット戦略は、通常、中間層に基づく出力信頼度を入力複雑性のプロキシとして採用し、次の層をスキップするという決定を導き出す。しかし、エンコーダの出力信頼度推定が困難であるため、エンコーダとデコーダの両方で広く使われている統一アーキテクチャでは、このような戦略は適用できない。エンコーダコンポーネントの早期終了を無視する計算能力を省くという点では最適ではない。この課題に対処するために,エンコーダとデコーダの層を動的にスキップし,複数回の早期退避時間,すなわちtextbf{MuE} の入力層ワイド類似性を同時に行う,統一視覚言語モデルのための新しい早期退避戦略を提案する。エンコーダのイメージとテキストのモダリティを分解することで、muleは柔軟性があり、モダリティの観点から異なるレイヤをスキップでき、性能低下を最小限に抑えながら推論効率を向上できる。 SNLI-VEとMS COCOデータセットを用いた実験では,提案手法により予測推論時間を最大50\%,40\%まで短縮でき,それぞれ99\%,96\%の性能を維持した。

Large-scale Transformer models bring significant improvements for various downstream vision language tasks with a unified architecture. The performance improvements come with increasing model size, resulting in slow inference speed and increased cost for serving. While certain predictions benefit from the full complexity of the large-scale model, not all inputs need the same amount of computation, potentially leading to wasted computation resources. To handle this challenge, early exiting is proposed to adaptively allocate computational power in terms of input complexity to improve inference efficiency. The existing early exiting strategies usually adopt output confidence based on intermediate layers as a proxy of input complexity to trigger the decision of skipping the following layers. However, such strategies cannot be applied to the encoder in the widely used unified architecture with both encoder and decoder, due to the difficulty of output confidence estimation in the encoder. Ignoring early exiting in the encoder component is suboptimal in terms of saving computation power. To handle this challenge, we propose a novel early exiting strategy for unified visual language models, named MuE, which allows dynamically skipping layers in the encoder and decoder simultaneously, based on input layer-wise similarities, with multiple early exits. By decomposing the image and text modalities in the encoder, MuE is flexible and can skip different layers in terms of modalities, advancing the inference efficiency while minimizing performance drop. Experiments on the SNLI-VE and MS COCO datasets show that the proposed approach MuE can reduce expected inference time by up to 50% and 40% while maintaining 99% and 96% performance, respectively.
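The idea of exiting based on layer-wise similarity, rather than output confidence, can be sketched as follows: if a layer barely changes its input, the remaining layers are skipped. The cosine-similarity criterion, the threshold value, and the toy layers are assumptions for illustration; the abstract does not give MuE's exact rule or its per-modality handling.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def run_with_early_exit(x, layers, threshold=0.99):
    """Apply layers in order and stop as soon as a layer's output is nearly
    parallel to its input. Returns (hidden_state, layers_executed)."""
    hidden = x
    for depth, layer in enumerate(layers, start=1):
        new_hidden = layer(hidden)
        similarity = cosine_similarity(hidden, new_hidden)
        hidden = new_hidden
        if similarity >= threshold:
            return hidden, depth
    return hidden, len(layers)


# Hypothetical toy "layers": the first changes direction, the second only rescales.
toy_layers = [
    lambda h: [h[0] + h[1], h[1] - h[0]],
    lambda h: [2.0 * a for a in h],
    lambda h: [-a for a in h],          # never reached in this example
]
state, executed = run_with_early_exit([1.0, 0.0], toy_layers)
print(state, executed)  # [2.0, -2.0] 2
```

Because the criterion needs only hidden states, it can be evaluated inside an encoder, where no output distribution (and hence no confidence score) is available.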

Translated: 2023-04-05 00:10:32 Published: 2023-04-03

# Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference ( http://arxiv.org/abs/2211.10526v2 )

License: see the linked page

Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin

(参考訳) 視覚変換器(ViT)は優れた性能を示しているが、畳み込みニューラルネットワーク(CNN)と比較して高い計算コストを必要とする。既存の効率的なViTは局所的な注意(Swinなど)や線形的な注意(Performerなど)を採用しており、これはViTがグローバルまたはローカルなコンテキストをキャプチャする能力を犠牲にする。この研究において、vitsは、推論中により効率的でありながら、グローバルコンテキストとローカルコンテキストの両方を学ぶことができるか? そこで本稿では,VT を線形角注意とマスク付きソフトマックス2次注意の両方を用いて訓練する Castling-ViT というフレームワークを提案する。当社のcastling-vitは角カーネルを利用して,クエリとキーの類似度をスペクトル角で測定します。 And we further simplify it with two techniques: (1) a novel linear-angular attention mechanism: we decompose the angular kernels into linear terms and high-order residuals, and only keep the linear terms; and (2) we adopt two parameterized modules to approximate high-order residuals: a depthwise convolution and an auxiliary masked softmax attention to help learn both global and local information, where the masks for softmax attention are regularized to gradually become zeros and thus incur no overhead during ViT inference. 3つのタスクに関する広範な実験とアブレーションの研究は、提案するキャスティング・ヴィットの有効性を一貫して検証している。例えば、画像ネットの分類において最大1.8%の精度と40%のmacs削減を達成し、同等のフロップでcoco検出時の1.2倍のマップを、バニラソフトマックスに基づくvitsと比較した。

Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost as compared to convolutional neural networks (CNNs); one reason is that ViTs' attention measures global similarities and thus has a quadratic complexity with the number of input tokens. Existing efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g., Performer), which sacrifice ViTs' capabilities of capturing either global or local context. In this work, we ask an important research question: Can ViTs learn both global and local context while being more efficient during inference? To this end, we propose a framework called Castling-ViT, which trains ViTs using both linear-angular attention and masked softmax-based quadratic attention, but then switches to having only linear-angular attention during ViT inference. Our Castling-ViT leverages angular kernels to measure the similarities between queries and keys via spectral angles. We further simplify it with two techniques: (1) a novel linear-angular attention mechanism: we decompose the angular kernels into linear terms and high-order residuals, and only keep the linear terms; and (2) we adopt two parameterized modules to approximate high-order residuals: a depthwise convolution and an auxiliary masked softmax attention to help learn both global and local information, where the masks for softmax attention are regularized to gradually become zeros and thus incur no overhead during ViT inference. Extensive experiments and ablation studies on three tasks consistently validate the effectiveness of the proposed Castling-ViT, e.g., achieving up to a 1.8% higher accuracy or 40% MACs reduction on ImageNet classification and 1.2 higher mAP on COCO detection under comparable FLOPs, as compared to ViTs with vanilla softmax-based attentions.
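The split of an angular kernel into a linear term and high-order residuals can be illustrated numerically. The sketch assumes the kernel has the form 1 - theta/pi for unit-norm queries and keys (the abstract does not state the formula). With x the cosine of the angle, the series arccos(x) = pi/2 - x - x^3/6 - ... gives a linear part 1/2 + x/pi, and the remainder is the high-order residual. Function names are illustrative.

```python
import math


def _unit(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def _cosine(q, k):
    """Cosine of the angle between q and k, clamped for numerical safety."""
    dot = sum(a * b for a, b in zip(_unit(q), _unit(k)))
    return max(-1.0, min(1.0, dot))


def angular_similarity(q, k):
    """Assumed angular kernel: 1 - theta / pi, with theta the angle between q and k."""
    return 1.0 - math.acos(_cosine(q, k)) / math.pi


def linear_term(q, k):
    """First-order part of the kernel: 1/2 + cos(theta) / pi."""
    return 0.5 + _cosine(q, k) / math.pi


query = [1.0, 0.0]
for key in ([0.0, 1.0], [0.1, math.sqrt(0.99)], [1.0, 0.0]):
    full = angular_similarity(query, key)
    linear = linear_term(query, key)
    print(f"kernel={full:.4f} linear={linear:.4f} residual={full - linear:.4f}")
```

The residual vanishes for orthogonal vectors and grows as the vectors align, which is the part the abstract says is approximated by auxiliary modules. Since the linear term is, up to a constant, an inner product, an attention built on it can be computed as Q(K^T V), avoiding the quadratic token-by-token matrix.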

Translated: 2023-04-05 00:08:40 Published: 2023-04-03

# NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views

NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views ( http://arxiv.org/abs/2211.16431v2 )

License: see the linked page

Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Yi Wang, Zhangyang Wang

(参考訳) 仮想現実と拡張現実(XR)は、3Dコンテンツの需要を増大させる。しかし、高品質な3Dコンテンツを作成するには、人間の専門家がしなければならない面倒な作業が必要です。本研究では,1枚の画像を1枚の3Dオブジェクトに持ち上げるという課題について検討し,360{\deg}ビューを持つ可視3Dオブジェクトを与えられた参照画像とよく一致する形で生成できることを初めて実証する。参照画像に条件を付けることで,画像から物体の新しい視点を合成する,永遠の好奇心を満たすことができる。私たちの技術は、3DアーティストやXRデザイナーのワークフローを緩和する有望な方向性に光を当てています。我々は,NeuralLift-360という,深度認識型ニューラル放射率表現(NeRF)を利用した新しいフレームワークを提案する。我々のNeuralLift-360は、ランキングの損失を発生させることで、荒々しい深さを推定できる。また,コヒーレントガイダンスを提供する前に,CLIP誘導サンプリング戦略を採用した。大規模な実験により、我々のNeuralLift-360は既存の最先端のベースラインを大幅に上回っていることが示された。プロジェクトページ: https://vita-group.github.io/neurallift-360/

Virtual reality and augmented reality (XR) bring increasing demand for 3D content. However, creating high-quality 3D content requires tedious work that a human expert must do. In this work, we study the challenging task of lifting a single image to a 3D object and, for the first time, demonstrate the ability to generate a plausible 3D object with 360° views that correspond well with the given reference image. By conditioning on the reference image, our model can fulfill the everlasting curiosity for synthesizing novel views of objects from images. Our technique sheds light on a promising direction of easing the workflows for 3D artists and XR designers. We propose a novel framework, dubbed NeuralLift-360, that utilizes a depth-aware neural radiance representation (NeRF) and learns to craft the scene guided by denoising diffusion models. By introducing a ranking loss, our NeuralLift-360 can be guided with rough depth estimation in the wild. We also adopt a CLIP-guided sampling strategy for the diffusion prior to provide coherent guidance. Extensive experiments demonstrate that our NeuralLift-360 significantly outperforms existing state-of-the-art baselines. Project page: https://vita-group.github.io/NeuralLift-360/

翻訳日:2023-04-05 00:02:45 公開日:2023-04-03

# 教師なし画像セマンティックセグメンテーションにおけるアライメントと均一性の再考

Rethinking Alignment and Uniformity in Unsupervised Image Semantic Segmentation ( http://arxiv.org/abs/2211.14513v2 )

ライセンス: Link先を確認

Daoan Zhang, Chenming Li, Haoquan Li, Wenjian Huang, Lingyun Huang, Jianguo Zhang

(参考訳) 教師なし画像セマンティクスセグメンテーション(uiss)は、外部の監督なしに低レベルの視覚特徴と意味レベルの表現をマッチングすることを目的としている。本稿では,UISSモデルにおける特徴アライメントと特徴均一性の観点から,重要な特性について述べる。また,UISSと画像表現学習の比較を行った。本分析に基づき, 既存のMI法は表現崩壊に悩まされていると論じる。そこで,本稿では,意味的注意(semantic attention network,san)と呼ばれるロバストなネットワークを提案し,新たなモジュールである意味的注意(semantic attention,seat)を提案し,ピクセル毎および意味的特徴を動的に生成する。複数のセマンティクスセグメンテーションベンチマークの実験結果は、教師なしセグメンテーションフレームワークがセマンティクス表現のキャッチを専門としていることを示している。

Unsupervised image semantic segmentation (UISS) aims to match low-level visual features with semantic-level representations without external supervision. In this paper, we address the critical properties of UISS models from the viewpoint of feature alignment and feature uniformity. We also make a comparison between UISS and image-wise representation learning. Based on this analysis, we argue that the existing MI-based methods in UISS suffer from representation collapse. Motivated by this, we propose a robust network called Semantic Attention Network (SAN), in which a new module, Semantic Attention (SEAT), generates pixel-wise and semantic features dynamically. Experimental results on multiple semantic segmentation benchmarks show that our unsupervised segmentation framework specializes in capturing semantic representations and outperforms all the unpretrained and even several pretrained methods.

翻訳日:2023-04-05 00:01:42 公開日:2023-04-03

# イベントカメラのためのデータ駆動型特徴追跡

Data-driven Feature Tracking for Event Cameras ( http://arxiv.org/abs/2211.12826v2 )

ライセンス: Link先を確認

Nico Messikommer, Carter Fang, Mathias Gehrig, Davide Scaramuzza

(参考訳) 高い時間分解能、モーションブラーに対する高い耐性、そして非常にスパースな出力のため、イベントカメラは困難なシナリオであっても低レイテンシかつ低帯域幅の特徴追跡に理想的であることが示されている。既存のイベントカメラ向け特徴追跡手法は手作りか第一原理から導出されたものだが、広範なパラメータチューニングを必要とし、ノイズに敏感であり、モデル化されていない効果のために異なるシナリオに一般化しない。これらの欠点に対処するために、グレースケールフレームで検出された特徴を低レイテンシのイベントを活用して追跡する、イベントカメラ向けの最初のデータ駆動型特徴トラッカーを導入する。特徴トラック間で情報を共有する新しいフレームアテンションモジュールにより,ロバストな性能を実現する。合成データから実データへゼロショットで直接転移することで、データ駆動型トラッカーは相対的特徴年齢(relative feature age)において既存のアプローチを最大120%上回り、同時に最も低いレイテンシを達成する。この性能差は、新しい自己教師あり戦略でトラッカーを実データに適応させることで、さらに130%まで拡大する。

Because of their high temporal resolution, increased resilience to motion blur, and very sparse output, event cameras have been shown to be ideal for low-latency and low-bandwidth feature tracking, even in challenging scenarios. Existing feature tracking methods for event cameras are either handcrafted or derived from first principles but require extensive parameter tuning, are sensitive to noise, and do not generalize to different scenarios due to unmodeled effects. To tackle these deficiencies, we introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in a grayscale frame. We achieve robust performance via a novel frame attention module, which shares information across feature tracks. By directly transferring zero-shot from synthetic to real data, our data-driven tracker outperforms existing approaches in relative feature age by up to 120% while also achieving the lowest latency. This performance gap is further increased to 130% by adapting our tracker to real data with a novel self-supervision strategy.

翻訳日:2023-04-04 23:59:39 公開日:2023-04-03

# ResFormer:マルチリゾリューショントレーニングによるViTのスケーリング

ResFormer: Scaling ViTs with Multi-Resolution Training ( http://arxiv.org/abs/2212.00776v2 )

ライセンス: Link先を確認

Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang

(参考訳) 視覚トランスフォーマー(vits)は圧倒的な成功を収めているが、それらは脆弱な解像度のスケーラビリティ、すなわち、トレーニング中に目に見えない入力解像度が提示されると、パフォーマンスが大幅に低下する。 resformerはマルチレゾリューショントレーニングという独創的なアイデアに基づいて構築されたフレームワークで、幅広い範囲(ほとんど見えない)のテスト解像度のパフォーマンス向上を目的としています。特に、resformerは異なる解像度の複製された画像を操作し、異なるスケールでインタラクティブな情報を扱うためにスケール一貫性の損失を強制する。さらに,様々な解像度,特に新しい解像度を効果的に交互にテストするために,入力サイズに応じてスムーズに変化するグローバルローカルな位置埋め込み戦略を提案する。 ImageNet上で画像分類のための広範な実験を行う。この結果は、resformerが幅広い解像度に向けたスケーリング能力を持っているという強力な定量的証拠を提供する。例えば、ResFormer-B-MRは、比較的低解像度と高解像度(96と640)で評価すると、Top-1の精度が75.86%と81.72%に達する(DeiT-Bより48%と7.49%良い)。また,resformerは柔軟であり,意味セグメンテーション,オブジェクト検出,ビデオアクション認識にも容易に拡張できることを示す。コードはhttps://github.com/ruitian12/resformerで入手できる。

Vision Transformers (ViTs) have achieved overwhelming success, yet they suffer from vulnerable resolution scalability, i.e., the performance drops drastically when presented with input resolutions that are unseen during training. We introduce ResFormer, a framework that is built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions. In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss to engage interactive information across different scales. More importantly, to alternate among varying resolutions effectively, especially novel ones in testing, we propose a global-local positional embedding strategy that changes smoothly conditioned on input sizes. We conduct extensive experiments for image classification on ImageNet. The results provide strong quantitative evidence that ResFormer has promising scaling abilities towards a wide range of resolutions. For instance, ResFormer-B-MR achieves a Top-1 accuracy of 75.86% and 81.72% when evaluated on relatively low and high resolutions respectively (i.e., 96 and 640), which are 48% and 7.49% better than DeiT-B. Moreover, we demonstrate that ResFormer is flexible and can be easily extended to semantic segmentation, object detection and video action recognition. Code is available at https://github.com/ruitian12/resformer.

翻訳日:2023-04-04 23:51:43 公開日:2023-04-03

# 正規化フローを飼いならす

Taming Normalizing Flows ( http://arxiv.org/abs/2211.16488v2 )

ライセンス: Link先を確認

Shimon Malnick, Shai Avidan, Ohad Fried

(参考訳) 本稿では,Normalizing Flow(正規化フロー)モデルを飼いならす(taming)アルゴリズム、すなわちモデルが特定の画像や画像カテゴリを生成する確率を変化させるアルゴリズムを提案する。Normalizing Flowは与えられた画像の正確な生成尤度を計算できるため、これに焦点を当てる。我々は、プライバシーとバイアスに関する興味深い考慮事項が多いサブドメインである、人間の顔を生成するモデルを用いてtamingを実証する。本手法は,プライバシの文脈(例えばモデルの出力から特定の人物を取り除く)や,与えられた目標分布に従って特定の画像カテゴリを出力させることによるデバイアスの文脈で利用できる。tamingは、モデルをスクラッチから再訓練することなく、高速な微調整プロセスで達成され、数分で目標に到達する。提案手法を定性的かつ定量的に評価し, 所望の変化を適用しつつ生成品質が損なわれないことを示す。

We propose an algorithm for taming Normalizing Flow models - changing the probability that the model will produce a specific image or image category. We focus on Normalizing Flows because they can calculate the exact generation likelihood for a given image. We demonstrate taming using models that generate human faces, a subdomain with many interesting privacy and bias considerations. Our method can be used in the context of privacy, e.g., removing a specific person from the output of a model, and also in the context of debiasing by forcing a model to output specific image categories according to a given target distribution. Taming is achieved with a fast fine-tuning process without retraining the model from scratch, achieving the goal in a matter of minutes. We evaluate our method qualitatively and quantitatively, showing that the generation quality remains intact, while the desired changes are applied.

翻訳日:2023-04-04 23:49:26 公開日:2023-04-03

# Biomarker Activation Mapに基づく解釈可能な糖尿病網膜症診断

Interpretable Diabetic Retinopathy Diagnosis based on Biomarker Activation Map ( http://arxiv.org/abs/2212.06299v2 )

ライセンス: Link先を確認

Pengxiao Zang, Tristan T. Hormel, Jie Wang, Yukun Guo, Steven T. Bailey, Christina J. Flaxel, David Huang, Thomas S. Hwang, and Yali Jia

(参考訳) 深層学習分類器は、光コヒーレンス断層撮影(OCT)とその血管造影(OCTA)に基づいて糖尿病網膜症(DR)を自動的に診断する最も正確な手段を提供する。これらのモデルの能力は、部分的には、目的のタスクを達成するのに必要な複雑さを与える隠れ層を含むことに起因する。しかし、隠れ層はアルゴリズムの出力を解釈しにくくする。本稿では, 臨床医が分類器の意思決定を検証・理解できるようにする, 敵対的生成学習に基づく新しいバイオマーカー活性化マップ(BAM)フレームワークを提案する。456個の黄斑スキャンを含むデータセットを、現在の臨床基準に基づいて紹介不要(non-referable)または要紹介(referable)のDRとして評価した。BAMの評価に用いたDR分類器は、まずこのデータセットで訓練された。BAM生成フレームワークは、この分類器に意味のある解釈性を与えるために、2つのU字型ジェネレータを組み合わせて設計された。メインジェネレータは、要紹介のスキャンを入力として受け取り、分類器によって紹介不要と分類される出力を生成するように訓練された。次に、BAMをメインジェネレータの出力と入力との差分画像として構成する。BAMが分類器の利用するバイオマーカーのみを強調するように、アシスタントジェネレータは逆の処理を行うように訓練され、紹介不要のスキャンから、分類器によって要紹介と分類されるスキャンを生成する。生成したBAMは、無灌流領域や網膜液を含む既知の病理学的特徴を強調した。これらのハイライトに基づく完全に解釈可能な分類器は、臨床医が自動DR診断をよりよく活用し、検証するのに役立つ可能性がある。

Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to interpret. Here we introduce a novel biomarker activation map (BAM) framework based on generative adversarial learning that allows clinicians to verify and understand classifiers' decision-making. A data set of 456 macular scans was graded as non-referable or referable DR based on current clinical standards. A DR classifier that was used to evaluate our BAM was first trained based on this data set. The BAM generation framework was designed by combining two U-shaped generators to provide meaningful interpretability to this classifier. The main generator was trained to take referable scans as input and produce an output that would be classified by the classifier as non-referable. The BAM is then constructed as the difference image between the output and input of the main generator. To ensure that the BAM only highlights classifier-utilized biomarkers, an assistant generator was trained to do the opposite, producing scans that would be classified as referable by the classifier from non-referable scans. The generated BAMs highlighted known pathologic features including nonperfusion area and retinal fluid. A fully interpretable classifier based on these highlights could help clinicians better utilize and verify automated DR diagnosis.

翻訳日:2023-04-04 23:44:17 公開日:2023-04-03

# 検出選択アルゴリズム: 物体検出の後処理を行う尤度ベース最適化手法

Detection Selection Algorithm: A Likelihood based Optimization Method to Perform Post Processing for Object Detection ( http://arxiv.org/abs/2212.05706v2 )

ライセンス: Link先を確認

Angzhi Fan, Benjamin Ticknor and Yali Amit

(参考訳) 物体検出では、非最大抑制(NMS)のような後処理法が広く用いられている。NMSは偽陽性の検出数を大幅に減らすことができるが、オブジェクトネススコアの低い検出がいくつか残る可能性がある。画像中のオブジェクトの正確な数とそのラベルを求めるため,NMSや関連手法の後に使用する検出選択アルゴリズム(DSA)と呼ばれる後処理手法を提案する。DSAは、検出されたバウンディングボックスのサブセットを貪欲に選択し、あわせてオブジェクトの遮蔽を考慮したうえで画像全体の解釈の尤度が最も高くなる完全なオブジェクト再構成を選ぶ。アルゴリズムは4つの要素からなる。まず、オブジェクト間の遮蔽関係を得るために、Faster R-CNNに遮蔽ブランチを追加する。第2に,我々がデコーダと呼ぶ訓練済み生成ネットワークの潜在変数の最適化に基づいて,可視部分から物体全体の外観を再構成できる単一再構成アルゴリズムを開発した。第3に, 遮蔽の順序を考慮して、仮定した解釈に含まれる全物体を同時に再構成する全体再構成アルゴリズムを提案する。最後に,リストに検出を漸進的に追加または削除し,対応する解釈の尤度を最大化する貪欲アルゴリズムを提案する。複数の3Dオブジェクトを含む合成画像での実験で示すように、NMSやSoft-NMSとDSAを組み合わせると、NMSやSoft-NMS単体よりも良い結果が得られる。

In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post-processing method called Detection Selection Algorithm (DSA), which is used after NMS or related methods. DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions that give the interpretation of the whole image with highest likelihood, taking into account object occlusions. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm which can reconstruct the whole appearance of an object given its visible part, based on the optimization of latent variables of a trained generative network which we call the decoder. Third, we propose a whole reconstruction algorithm which generates the joint reconstruction of all objects in a hypothesized interpretation, taking into account occlusion ordering. Finally, we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA with NMS or Soft-NMS can achieve better results than NMS or Soft-NMS themselves, as is illustrated in our experiments on synthetic images with multiple 3D objects.
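
For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of standard greedy NMS, not of DSA itself. The box format `(x1, y1, x2, y2)`, the function names, and the 0.5 IoU threshold are illustrative assumptions and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # hypothetical default, chosen for illustration
) -> List[int]:
    """Return indices of the kept boxes, ordered from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.3]
    print(greedy_nms(boxes, scores))
```

In the usage example, the second box is suppressed because it overlaps the first, while the isolated third box survives despite its low score. This is the kind of leftover low-score detection that the abstract says a later selection step is meant to address.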

翻訳日:2023-04-04 23:42:56 公開日:2023-04-03

# イベントカメラを用いた物体検出のためのリカレントビジョントランスフォーマー

Recurrent Vision Transformers for Object Detection with Event Cameras ( http://arxiv.org/abs/2212.05598v2 )

ライセンス: Link先を確認

Mathias Gehrig and Davide Scaramuzza

(参考訳) イベントカメラを用いた物体検出のための新しいバックボーンであるリカレントビジョントランスフォーマー(RVT)を提案する。イベントカメラは、ミリ秒未満のレイテンシ、高ダイナミックレンジ、モーションブラーに対する強いロバスト性を備えた視覚情報を提供する。これらのユニークな特性は、時間クリティカルなシナリオにおける低レイテンシの物体検出と追跡に大きな可能性をもたらす。イベントベースビジョンの先行研究は優れた検出性能を達成しているが、その代償として推論時間が長い(通常は40ミリ秒以上)。リカレントビジョンバックボーンのハイレベルな設計を見直すことにより、同等の性能を維持しつつ推論時間を6分の1に短縮する。これを実現するために,各ステージで3つの重要な概念を用いる多段階設計を探索する。第1に、条件付き位置埋め込みとみなせる畳み込みによる事前情報(convolutional prior)。第2に、空間的特徴の相互作用のための局所的および拡張(dilated)グローバル自己注意。第3に、時間情報を保持しながらレイテンシを最小限に抑えるための再帰的な時間的特徴集約。RVTはスクラッチから訓練でき、イベントベースの物体検出で最先端の性能に到達し、Gen1 automotiveデータセットで47.2%のmAPを達成する。同時に、RVTは高速な推論(T4 GPU上で12ミリ秒未満)と良好なパラメータ効率(先行技術の5分の1)を提供する。本研究は、イベントベースビジョンを超えた研究にも有益となり得る効果的な設計選択について、新たな洞察をもたらす。

We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras. Event cameras provide visual information with sub-millisecond latency at a high-dynamic range and with strong robustness against motion blur. These unique properties offer great potential for low-latency object detection and tracking in time-critical scenarios. Prior work in event-based vision has achieved outstanding detection performance but at the cost of substantial inference time, typically beyond 40 milliseconds. By revisiting the high-level design of recurrent vision backbones, we reduce inference time by a factor of 6 while retaining similar performance. To achieve this, we explore a multi-stage design that utilizes three key concepts in each stage: First, a convolutional prior that can be regarded as a conditional positional embedding. Second, local and dilated global self-attention for spatial feature interaction. Third, recurrent temporal feature aggregation to minimize latency while retaining temporal information. RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection - achieving an mAP of 47.2% on the Gen1 automotive dataset. At the same time, RVTs offer fast inference (<12 ms on a T4 GPU) and favorable parameter efficiency (5 times fewer than prior art). Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.

翻訳日:2023-04-04 23:42:31 公開日:2023-04-03

# REVEAL:マルチソースマルチモーダル知識メモリによる検索拡張ビジュアルランゲージ事前学習

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory ( http://arxiv.org/abs/2212.05221v2 )

ライセンス: Link先を確認

Ziniu Hu and Ahmet Iscen and Chen Sun and Zirui Wang and Kai-Wei Chang and Yizhou Sun and Cordelia Schmid and David A. Ross and Alireza Fathi

(参考訳) 本稿では,世界知識を大規模メモリに符号化し,そこから検索して知識集約型クエリに答えることを学習する,エンドツーエンドのRetrieval-Augmented Visual Language Model(REVEAL)を提案する。REVEALは、メモリ、エンコーダ、レトリバー、ジェネレータの4つの主要コンポーネントで構成されている。大規模メモリは、統一エンコーダを介して、マルチモーダルな世界知識の様々なソース(画像テキストペア、質問応答ペア、知識グラフのトリプレットなど)を符号化する。レトリバーはメモリ内の最も関連性の高い知識エントリを見つけ、ジェネレータは取得した知識と入力クエリを融合して出力を生成する。このアプローチの重要な新規性は、メモリ、エンコーダ、レトリバー、ジェネレータのすべてが、大量のデータでエンドツーエンドに事前訓練されることである。さらに,本手法は多様なマルチモーダル知識ソースを利用でき、それが大きな性能向上をもたらすことが示されている。REVEALが視覚的質問応答と画像キャプションにおいて最先端の結果を達成することを示す。

In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries. REVEAL consists of four key components: the memory, the encoder, the retriever and the generator. The large-scale memory encodes various sources of multimodal world knowledge (e.g. image-text pairs, question answering pairs, knowledge graph triplets, etc) via a unified encoder. The retriever finds the most relevant knowledge entries in the memory, and the generator fuses the retrieved knowledge with the input query to produce the output. A key novelty in our approach is that the memory, encoder, retriever and generator are all pre-trained end-to-end on a massive amount of data. Furthermore, our approach can use a diverse set of multimodal knowledge sources, which is shown to result in significant gains. We show that REVEAL achieves state-of-the-art results on visual question answering and image captioning.

翻訳日:2023-04-04 23:42:10 公開日:2023-04-03

# CEPHA29: 自動セファロメトリックランドマーク検出チャレンジ2023

CEPHA29: Automatic Cephalometric Landmark Detection Challenge 2023 ( http://arxiv.org/abs/2212.04808v2 )

ライセンス: Link先を確認

Muhammad Anwaar Khalid, Kanwal Zulfiqar, Ulfat Bashir, Areeba Shaheen, Rida Iqbal, Zarnab Rizwan, Ghina Rizwan, Muhammad Moazam Fraz

(参考訳) 定量的セファロ分析(頭部X線規格写真分析)は、現代の歯科矯正において最も広く用いられている臨床および研究ツールである。セファロメトリックランドマークの正確な位置決定は解剖学的異常の定量化と分類を可能にするが、これらのランドマークを手作業でマークする従来の方法は非常に面倒な作業である。自動セファロメトリックランドマーク検出システムを開発する試みは継続的に行われているが、歯科矯正への応用には不十分である。その根本的な理由は、公開されているデータセットの数も、それらのデータセットで訓練用に提供される画像の数も、AIモデルがうまく機能するには不十分なためである。形態計測解析のための堅牢なAIソリューションの開発を促進するため, IEEE International Symposium on Biomedical Imaging (ISBI 2023) と共同で, CEPHA29 Automatic Cephalometric Landmark Detection Challengeを開催する。この文脈で、1000枚のセファロX線画像からなる、知られている限り最大の公開データセットを提供する。我々は、このチャレンジが自動セファロメトリックランドマーク識別の研究と革新を前進させるだけでなく、この分野の新しい時代の始まりを告げるものとなることを期待している。

Quantitative cephalometric analysis is the most widely used clinical and research tool in modern orthodontics. Accurate localization of cephalometric landmarks enables the quantification and classification of anatomical abnormalities; however, the traditional manual way of marking these landmarks is a very tedious job. Endeavours have constantly been made to develop automated cephalometric landmark detection systems, but they are inadequate for orthodontic applications. The fundamental reason for this is that the number of publicly available datasets, as well as the number of images provided for training in these datasets, is insufficient for an AI model to perform well. To facilitate the development of robust AI solutions for morphometric analysis, we organise the CEPHA29 Automatic Cephalometric Landmark Detection Challenge in conjunction with IEEE International Symposium on Biomedical Imaging (ISBI 2023). In this context, we provide the largest known publicly available dataset, consisting of 1000 cephalometric X-ray images. We hope that our challenge will not only drive forward research and innovation in automatic cephalometric landmark identification but will also signal the beginning of a new era in the discipline.

翻訳日:2023-04-04 23:41:52 公開日:2023-04-03

# Genie: 量子化のデータを見せてください

Genie: Show Me the Data for Quantization ( http://arxiv.org/abs/2212.04780v2 )

ライセンス: Link先を確認

Yongkweon Jeon, Chungman Lee, Ho-young Kim

(参考訳) ゼロショット量子化は、コストやプライバシに関する問題など、さまざまな理由からデータにアクセスできない場合に、軽量なディープニューラルネットワークを開発する上で有望なアプローチである。FP32事前学習モデルにおけるバッチ正規化層の学習済みパラメータ($\mu$と$\sigma$)を利用することで、ゼロショット量子化スキームは合成データの生成に焦点を当てる。その後、事前学習されたモデル(教師)から量子化モデル(学生)へ知識を蒸留し、量子化モデルを合成データセットで最適化できるようにする。しかし、これまでのゼロショット量子化は、タスク固有の損失と再訓練に匹敵する長時間の最適化を必要とする量子化対応トレーニング手法の文脈で主に議論されてきた。そこで我々は,高品質な量子化ネットワークを数時間以内で生成する、ゼロショット量子化のための学習後量子化方式を提案する。さらに,量子化に適したデータを生成するGenieというフレームワークを提案する。Genieによって合成されたデータにより、実データセットなしで、少数ショット量子化に匹敵する堅牢な量子化モデルを作成できる。また,量子化モデルの性能を向上させる学習後量子化アルゴリズムを提案する。これらを組み合わせることで、ゼロショットと少数ショットの量子化のギャップを埋めつつ、既存のアプローチと比べて量子化性能を大幅に改善できる。言い換えれば、独自の最先端ゼロショット量子化アプローチを得ることができる。

Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible owing to various reasons, including cost and issues related to privacy. By exploiting the learned parameters ($\mu$ and $\sigma$) of batch normalization layers in an FP32-pre-trained model, zero-shot quantization schemes focus on generating synthetic data. Subsequently, they distill knowledge from the pre-trained model (teacher) to the quantized model (student) such that the quantized model can be optimized with the synthetic dataset. However, thus far, zero-shot quantization has primarily been discussed in the context of quantization-aware training methods, which require task-specific losses and long-term optimization as much as retraining. We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours. Furthermore, we propose a framework called Genie that generates data suited for quantization. With the data synthesized by Genie, we can produce robust quantized models without real datasets, which is comparable to few-shot quantization. We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining them, we can bridge the gap between zero-shot and few-shot quantization while significantly improving the quantization performance compared to that of existing approaches. In other words, we can obtain a unique state-of-the-art zero-shot quantization approach.
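
As a rough illustration of how batch normalization statistics can drive synthetic data generation, a commonly used form of the statistics-matching objective in the zero-shot quantization literature is sketched below. The notation is an assumption made for this sketch, and the abstract does not say whether Genie uses exactly this loss: $\tilde{x}$ is a synthetic batch, $\mu_l$ and $\sigma_l$ are the stored statistics of the $l$-th batch normalization layer, and $\tilde{\mu}_l(\tilde{x})$ and $\tilde{\sigma}_l(\tilde{x})$ are the statistics that $\tilde{x}$ induces at that layer.

```latex
% Assumed, generic BN-statistics matching loss; not taken from the abstract.
\mathcal{L}_{\mathrm{BNS}}(\tilde{x})
  = \sum_{l=1}^{L} \left(
      \lVert \tilde{\mu}_l(\tilde{x}) - \mu_l \rVert_2^2
      + \lVert \tilde{\sigma}_l(\tilde{x}) - \sigma_l \rVert_2^2
    \right)
```

Minimizing such a loss over $\tilde{x}$ pushes the synthetic batch to reproduce the activation statistics the pre-trained model saw during training, which is why no real data is needed.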

翻訳日:2023-04-04 23:41:32 公開日:2023-04-03

# ニューラル言語モデルにおけるモンタギュー意味論と修飾語の一貫性測定

Montague semantics and modifier consistency measurement in neural language models ( http://arxiv.org/abs/2212.04310v2 )

ライセンス: Link先を確認

Danilo S. Carvalho, Edoardo Manino, Julia Rozanova, Lucas Cordeiro, André Freitas

(参考訳) 近年,分布的言語表現モデルは実用面で大きな成功を収めている。同時に、解釈可能性の必要性から、その本質的な特性と能力に関する疑問が提起されている。重要なことに、分布モデルは自然言語の構成的現象を扱う際にしばしば一貫性を欠き、それがモデルの安全性と公平性に重大な影響を及ぼす。それにもかかわらず、構成性に関する現在の研究の多くは、類似性タスクの性能改善のみを目的としている。本研究は異なるアプローチを採り、現代の言語モデルにおける構成的振る舞いを測定する方法論を提案する。具体的には,形容詞・名詞句における形容詞修飾語の現象に注目する。モンタギュー意味論に着想を得た、構成的振る舞いに関する3つの新しいテストを導入する。実験の結果,現在のニューラル言語モデルは,期待される言語理論に限られた範囲でしか従わないことが示された。このことは、これらの言語モデルが我々の評価した意味的性質を捉えられないのか、あるいはモンタギュー派の伝統に基づく言語理論が分布モデルに期待される能力と合致しないのか、という疑問を提起する。

In recent years, distributional language representation models have demonstrated great practical success. At the same time, the need for interpretability has elicited questions on their intrinsic properties and capabilities. Crucially, distributional models are often inconsistent when dealing with compositional phenomena in natural language, which has significant implications for their safety and fairness. Despite this, most current research on compositionality is directed towards improving their performance on similarity tasks only. This work takes a different approach, and proposes a methodology for measuring compositional behavior in contemporary language models. Specifically, we focus on adjectival modifier phenomena in adjective-noun phrases. We introduce three novel tests of compositional behavior inspired by Montague semantics. Our experimental results indicate that current neural language models behave according to the expected linguistic theories to a limited extent only. This raises the question of whether these language models are not able to capture the semantic properties we evaluated, or whether linguistic theories from Montagovian tradition would not match the expected capabilities of distributional models.

翻訳日:2023-04-04 23:40:39 公開日:2023-04-03

# FunkNN: 関数的生成のためのニューラル補間

FunkNN: Neural Interpolation for Functional Generation ( http://arxiv.org/abs/2212.14042v2 )

ライセンス: Link先を確認

AmirEhsan Khorashadizadeh, Anadi Chaman, Valentin Debarnot, Ivan Dokmanić

(参考訳) スケールをまたいで一般化し、任意の座標で評価でき、厳密な微分の計算が可能で、概念的に単純な連続生成モデルを構築できるだろうか。既存のMLPベースのアーキテクチャは、畳み込みによる好ましい帰納バイアスを持つグリッドベースのジェネレータよりも質の低いサンプルを生成する。異なるスケールでの画像生成に焦点を当てたモデルはより良い結果を出すが、画像や微分の連続的な評価のために設計されていない複雑なアーキテクチャを採用している。我々は信号処理の観点に立ち、連続画像生成をサンプルからの補間として扱う。実際、正しくサンプリングされた離散画像は、低い空間周波数に関するすべての情報を含んでいる。問題は、上記の設計基準を満たしながら、データ駆動でスペクトルをどのように外挿するかである。我々の答えはFunkNNである。これは任意の座標で連続画像を再構成する方法を学習し、任意の画像データセットに適用できる新しい畳み込みネットワークである。離散生成モデルと組み合わせることで、連続的な不良設定逆問題において事前分布として機能する関数的ジェネレータとなる。FunkNNは高品質な連続画像を生成し,パッチベースの設計により強力な分布外性能を示す。さらに,厳密な空間微分を用いるいくつかの様式化された逆問題において、その性能を示す。

Can we build continuous generative models which generalize across scales, can be evaluated at any coordinate, admit calculation of exact derivatives, and are conceptually simple? Existing MLP-based architectures generate worse samples than the grid-based generators with favorable convolutional inductive biases. Models that focus on generating images at different scales do better, but employ complex architectures not designed for continuous evaluation of images and derivatives. We take a signal-processing perspective and treat continuous image generation as interpolation from samples. Indeed, correctly sampled discrete images contain all information about the low spatial frequencies. The question is then how to extrapolate the spectrum in a data-driven way while meeting the above design criteria. Our answer is FunkNN -- a new convolutional network which learns how to reconstruct continuous images at arbitrary coordinates and can be applied to any image dataset. Combined with a discrete generative model it becomes a functional generator which can act as a prior in continuous ill-posed inverse problems. We show that FunkNN generates high-quality continuous images and exhibits strong out-of-distribution performance thanks to its patch-based design. We further showcase its performance in several stylized inverse problems with exact spatial derivatives.

翻訳日:2023-04-04 23:33:02 公開日:2023-04-03

# タイムゲート光子検出による非ガウス状態生成

Non-Gaussian state generation with time-gated photon detection ( http://arxiv.org/abs/2212.13335v2 )

ライセンス: Link先を確認

Tatsuki Sonoyama, Kazuma Takahashi, Baramee Charoensombutamon, Sachiko Takasu, Kaori Hattori, Daiji Fukuda, Kosuke Fukui, Kan Takase, Warit Asavanant, Jun-ichi Yoshikawa, Mamoru Endo, Akira Furusawa

(参考訳) フォールトトレラントで普遍的な光量子計算に不可欠な非ガウス状態の光は、一般的に光子検出器を用いたヘラルド方式によって生成される。近年,光子検出器の大きなタイミングジッタが、生成される非ガウス状態の純度を低下させることが理論的に示された [T. Sonoyama, $\textit{et al}$., Phys. Rev. A $\textbf{105}$, 043714 (2022)]。本研究では, タイムゲート光子検出により, ウィグナー負性を持つ非ガウス状態を生成する。我々は,タイムゲーティングに高速光スイッチを用い、転移端センサ(transition edge sensor)に基づく光子数識別検出器のタイミングジッタを50 nsから10 nsへと実効的に改善する。その結果、タイムゲート光子検出法なしでは観測できない、ウィグナー負性$-0.011\pm 0.004$の非ガウス状態を生成する。これらの結果は,非ガウス状態生成に対するタイミングジッタの影響を初めて実験的に確認するとともに、高純度な非ガウス状態生成の有望な方法を提供する。

Non-Gaussian states of light, which are essential in fault-tolerant and universal optical quantum computation, are typically generated by a heralding scheme using photon detectors. Recently, it was theoretically shown that the large timing jitter of the photon detectors deteriorates the purity of the generated non-Gaussian states [T. Sonoyama, $\textit{et al}$., Phys. Rev. A $\textbf{105}$, 043714 (2022)]. In this study, we generate non-Gaussian states with Wigner negativity by time-gated photon detection. We use a fast optical switch for time gating to effectively improve the timing jitter of a photon-number-resolving detector based on a transition edge sensor from 50 ns to 10 ns. As a result, we generate non-Gaussian states with Wigner negativity of $-0.011\pm 0.004$, which cannot be observed without the time-gated photon detection method. These results confirm the effect of the timing jitter on non-Gaussian state generation experimentally for the first time and provide a promising method of high-purity non-Gaussian state generation.

翻訳日:2023-04-04 23:32:44 公開日:2023-04-03

# 物理インフォームドガウス過程回帰は線形PDEソルバを一般化する

Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers ( http://arxiv.org/abs/2212.12474v3 )

ライセンス: Link先を確認

Marvin Pförtner and Ingo Steinwart and Philipp Hennig and Jonathan Wenger

(参考訳) 線形偏微分方程式(英: Linear partial differential equation, PDEs)は、熱伝達、電磁気、波動伝播などの物理過程を記述する重要な力学モデルのクラスである。実際には、離散化に基づく特殊数値法を用いてPDEを解く。一般に、未知のモデルパラメータの見積もりと、可能であれば初期化の物理的測定を用いる。このような解法はしばしば下流の応用でより大きな科学的モデルに埋め込まれ、エラー定量化が重要な役割を果たす。しかし、パラメータや測定の不確かさを無視することで、古典的なPDEソルバはその固有近似誤差の一貫した推定を導出できない可能性がある。本研究では、線形PDEを物理インフォームドガウス過程(GP)回帰として解釈することで、この問題を原理的にアプローチする。我々のフレームワークは、任意の有界線型作用素による観測に対するガウス過程推論定理の鍵となる一般化に基づいている。この確率論的視点は、(1)固有の離散化誤差の定量化、(2)モデルパラメータの不確かさを解に伝播させ、(3)ノイズ測定の条件を与える。この定式化の強さを実証し、重み付け残差法、コロケーション、有限体積、擬スペクトル、および有限要素法やスペクトル法のような(一般化)ガレルキン法を含むPDEソルバの中心クラスを厳密に一般化することを証明する。したがって、このクラスは構造化誤差推定を直接装備することができる。要約すると, 数値解析とベイズ推定の境界を曖昧にすることで, モジュラービルディングブロックとしての機械モデルと確率モデルとのシームレスな統合が可能となる。

Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models with a downstream application and thus error quantification plays a key role. However, by ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of the Gaussian process inference theorem to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows one to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models by blurring the boundaries between numerical analysis and Bayesian inference.
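
To make "observations made via a linear operator" concrete, the standard finite-dimensional form of GP conditioning under linear observations can be sketched as follows. The notation is assumed for this sketch and is not taken from the abstract: the prior is $u \sim \mathcal{GP}(m, k)$, $\mathcal{L}$ is a linear operator mapping $u$ to $\mathbb{R}^n$, and the data are $y = \mathcal{L}[u] + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \Sigma)$. The regularity conditions under which this holds, which the paper's theorem addresses, are omitted here.

```latex
% L k(., x) in R^n applies the operator to the first argument of k;
% L k L^* in R^{n x n} applies it to both arguments.
\begin{aligned}
u \mid y &\sim \mathcal{GP}(m^{*}, k^{*}), \\
m^{*}(x) &= m(x) + \left(\mathcal{L}k(\cdot, x)\right)^{\top}
            \left(\mathcal{L}k\mathcal{L}^{*} + \Sigma\right)^{-1}
            \left(y - \mathcal{L}[m]\right), \\
k^{*}(x, x') &= k(x, x') - \left(\mathcal{L}k(\cdot, x)\right)^{\top}
            \left(\mathcal{L}k\mathcal{L}^{*} + \Sigma\right)^{-1}
            \mathcal{L}k(\cdot, x').
\end{aligned}
```

Taking $\mathcal{L}$ to be point evaluation recovers ordinary GP regression. Taking it to be a differential operator evaluated at collocation points gives a probabilistic analogue of a collocation solver, which is one of the method classes the abstract says the framework generalizes.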

翻訳日:2023-04-04 23:32:03 公開日:2023-04-03

# VolRecon: 一般化可能な多視点再構成のための符号付きレイ距離関数のボリュームレンダリング

VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction ( http://arxiv.org/abs/2212.08067v2 )

ライセンス: Link先を確認

Yufan Ren, Fangjinhua Wang, Tong Zhang, Marc Pollefeys and Sabine Süsstrunk

(参考訳) 新規ビュー合成におけるNeural Radiance Fields(NeRF)の成功を受けて、研究者はニューラル暗黙的シーン再構成を提案するようになった。しかし、既存のニューラル暗黙的再構成手法のほとんどはシーンごとにパラメータを最適化するため、新しいシーンへの一般化性に欠ける。本稿では,符号付きレイ距離関数(SRDF)を用いた新しい一般化可能な暗黙的再構成手法であるVolReconを紹介する。細部が精細でノイズの少ないシーンを再構成するために、VolReconはマルチビュー特徴から集約した投影特徴と、粗いグローバル特徴ボリュームから補間したボリューム特徴を組み合わせる。レイトランスフォーマーを用いてレイ上のサンプル点のSRDF値を計算し,色と深度をレンダリングする。DTUデータセットでは、VolReconはスパースビュー再構成においてSparseNeuSを約30%上回り、フルビュー再構成においてMVSNetと同等の精度を達成する。さらに,提案手法は大規模なETH3Dベンチマークにおいて良好な一般化性能を示す。

The success of Neural Radiance Fields (NeRF) in novel view synthesis has inspired researchers to propose neural implicit scene reconstruction. However, most existing neural implicit reconstruction methods optimize per-scene parameters and therefore lack generalizability to new scenes. We introduce VolRecon, a novel generalizable implicit reconstruction method with Signed Ray Distance Function (SRDF). To reconstruct the scene with fine details and little noise, VolRecon combines projection features aggregated from multi-view features, and volume features interpolated from a coarse global feature volume. Using a ray transformer, we compute SRDF values of sampled points on a ray and then render color and depth. On the DTU dataset, VolRecon outperforms SparseNeuS by about 30% in sparse view reconstruction and achieves accuracy comparable to MVSNet in full view reconstruction. Furthermore, our approach exhibits good generalization performance on the large-scale ETH3D benchmark.

翻訳日:2023-04-04 23:31:12 公開日:2023-04-03

# ANNNIモデルにおける量子クエンチの簡単な理論

A simple theory for quantum quenches in the ANNNI model ( http://arxiv.org/abs/2301.04070v2 )

ライセンス: Link先を確認

Jacob H. Robertson, Riccardo Senese and Fabian H. L. Essler

(参考訳) Haldarら(Phys. Rev. X 11, 031062)による最近の数値研究において、近接する量子臨界点のシグネチャが、ある種の量子クエンチの後の初期および中間の時間で観測できることが示された。この研究は主にaxial next-nearest neighbour Ising(ANNNI)モデルの場合に焦点を当てていた。ここでは単純な時間依存平均場理論を構築し,これらのクエンチを短時間において定量的に正確に記述する。この記述は、本稿で説明する理由により、(いくつかの留保付きで)長時間においても妥当な近似であり続ける。本手法は, 報告された数値結果、およびクエンチダイナミクスによる量子臨界点検出の基本的な限界を理解するための簡単な枠組みを提供する。さらに,様々な観測量に見られる特異な振動挙動の起源が、長寿命の束縛状態の形成にあることを説明する。

In a recent numerical study by Haldar et al. (Phys. Rev. X 11, 031062) it was shown that signatures of proximate quantum critical points can be observed at early and intermediate times after certain quantum quenches. Said work focused mainly on the case of the axial next-nearest neighbour Ising (ANNNI) model. Here we construct a simple time-dependent mean-field theory that allows us to obtain a quantitatively accurate description of these quenches at short times, which for reasons we explain remains a fair approximation at late times (with some caveats). Our approach provides a simple framework for understanding the reported numerical results as well as fundamental limitations on detecting quantum critical points through quench dynamics. We moreover explain the origin of the peculiar oscillatory behaviour seen in various observables as arising from the formation of a long-lived bound state.

Translated: 2023-04-04 23:24:09 Published: 2023-04-03

# Lightweight Salient Object Detection in Optical Remote-Sensing Images via Semantic Matching and Edge Alignment ( http://arxiv.org/abs/2301.02778v2 )

License: see the linked page

Gongyang Li, Zhi Liu, Xinpeng Zhang, Weisi Lin

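(Reference translation) In recent years, many methods for salient object detection in optical remote sensing images (ORSI-SOD) that rely on convolutional neural networks (CNNs) have been proposed. However, most of them ignore the large parameter counts and computational cost that CNNs bring, and only a few pay attention to portability and mobility. This paper proposes SeaNet, a new lightweight network for ORSI-SOD based on semantic matching and edge alignment. Specifically, SeaNet comprises a lightweight MobileNet-V2 for feature extraction, a dynamic semantic matching module (DSMM) for high-level features, an edge self-alignment module (ESAM) for low-level features, and a portable decoder for inference. First, the high-level features are compressed into semantic kernels. Next, the dynamic convolution operations in DSMM use these kernels to activate salient object locations in two groups of high-level features. Meanwhile, in ESAM, cross-scale edge information extracted from two groups of low-level features is self-aligned through an L2 loss and used for detail enhancement. Finally, starting from the highest-level features, the decoder infers salient objects based on the accurate locations and fine details contained in the outputs of the two modules. Extensive experiments on two public datasets show that the lightweight SeaNet not only outperforms most state-of-the-art lightweight methods but also achieves accuracy comparable to state-of-the-art conventional methods. The code and results are available at https://github.com/MathLee/SeaNet.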

Recently, many methods for salient object detection in optical remote sensing images (ORSI-SOD) that rely on convolutional neural networks (CNNs) have been proposed. However, most methods ignore the large number of parameters and the computational cost brought by CNNs, and only a few pay attention to portability and mobility. To facilitate practical applications, in this paper, we propose a novel lightweight network for ORSI-SOD based on semantic matching and edge alignment, termed SeaNet. Specifically, SeaNet includes a lightweight MobileNet-V2 for feature extraction, a dynamic semantic matching module (DSMM) for high-level features, an edge self-alignment module (ESAM) for low-level features, and a portable decoder for inference. First, the high-level features are compressed into semantic kernels. Then, semantic kernels are used to activate salient object locations in two groups of high-level features through dynamic convolution operations in DSMM. Meanwhile, in ESAM, cross-scale edge information extracted from two groups of low-level features is self-aligned through L2 loss and used for detail enhancement. Finally, starting from the highest-level features, the decoder infers salient objects based on the accurate locations and fine details contained in the outputs of the two modules. Extensive experiments on two public datasets demonstrate that our lightweight SeaNet not only outperforms most state-of-the-art lightweight methods but also yields accuracy comparable to state-of-the-art conventional methods, while having only 2.76M parameters and running with 1.7G FLOPs for 288x288 inputs. Our code and results are available at https://github.com/MathLee/SeaNet.

Translated: 2023-04-04 23:23:52 Published: 2023-04-03

# CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior ( http://arxiv.org/abs/2301.02379v2 )

License: see the linked page

Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

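(Reference translation) Speech-driven 3D facial animation has been widely studied, but a gap to realism and vividness remains because the problem is highly ill-posed and audio-visual data are scarce. Existing work typically formulates the cross-modal mapping as a regression task, which suffers from the regression-to-mean problem and leads to over-smoothed facial motion. This paper proposes to cast speech-driven facial animation as a code query task in the finite proxy space of a learned codebook. The codebook is learned by self-reconstruction over real facial motions and therefore embeds realistic facial motion priors. Over this discrete motion space, a temporal autoregressive model sequentially synthesizes facial motions from the input speech signal, which ensures lip synchronization and plausible facial expressions. The proposed approach is shown to outperform current state-of-the-art methods both qualitatively and quantitatively, and a user study further supports its advantage in perceptual quality.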

Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature and scarcity of audio-visual data. Existing works typically formulate the cross-modal mapping into a regression task, which suffers from the regression-to-mean problem leading to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty. The codebook is learned by self-reconstruction over real facial motions and thus embedded with realistic facial motion priors. Over the discrete motion space, a temporal autoregressive model is employed to sequentially synthesize facial motions from the input speech signal, which guarantees lip-sync as well as plausible facial expressions. We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. Also, a user study further justifies our superiority in perceptual quality.
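The "code query" idea in the abstract above can be illustrated with a nearest-neighbour lookup in a small codebook. This is only a minimal sketch of generic vector quantization: the function names, the toy 2-D codebook and its values are assumptions made for illustration, and the abstract does not say how CodeTalker actually learns or queries its codebook.

```python
import math

def nearest_code(feature, codebook):
    """Return (index, code) of the codebook entry closest to `feature`."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: math.dist(feature, codebook[i]),
    )
    return best_index, codebook[best_index]

def quantize_sequence(features, codebook):
    """Map each continuous feature vector to the index of its nearest code."""
    return [nearest_code(f, codebook)[0] for f in features]

# Hypothetical codebook of three 2-D "motion primitives" (illustrative values).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

if __name__ == "__main__":
    frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]
    print(quantize_sequence(frames, CODEBOOK))
```

Because every output is one of a finite set of codes, a predictor over code indices cannot average several plausible motions into a blurred one, which is the intuition behind avoiding regression to the mean.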

Translated: 2023-04-04 23:23:10 Published: 2023-04-03

# MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology ( http://arxiv.org/abs/2301.02228v3 )

License: see the linked page

Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

(Reference translation) This paper considers enhancing medical visual-language pre-training (VLP) with domain-specific knowledge by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field, and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling the ability for medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, and benchmark on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, the model shows strong performance compared with previous methods on disease classification and grounding.

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field, and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling the ability for medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, and benchmark on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model has demonstrated strong performance compared with previous methods on disease classification and grounding.

Translated: 2023-04-04 23:22:51 Published: 2023-04-03

# Random-depth Quantum Amplitude Estimation ( http://arxiv.org/abs/2301.00528v3 )

License: see the linked page

Xi Lu and Hongwei Lin

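(Reference translation) Quantum amplitude estimation is an important task in quantum computing and the foundation of quantum numerical integration. The maximum likelihood amplitude estimation (MLAE) algorithm is a practical solution to the quantum amplitude estimation problem, with a theoretical quadratic speedup over the classical Monte Carlo method. Because MLAE does not require the quantum Fourier transform (QFT), it is more likely than QFT-based algorithms to be widely used in the near future. However, MLAE turns out not to be unbiased because of so-called critical points, which are one of the main causes of its inaccuracy. We propose random-depth quantum amplitude estimation (RQAE) to avoid the critical points. Numerical experiments show that the algorithm is approximately unbiased and outperforms MLAE and other quantum amplitude estimation algorithms.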

Quantum amplitude estimation is a critical task in quantum computing and the foundation of quantum numerical integration. The maximum likelihood amplitude estimation (MLAE) algorithm is a practical solution to the quantum amplitude estimation problem and has a theoretical quadratic speedup over the classical Monte Carlo method. Since MLAE does not require the quantum Fourier transform (QFT), it is more likely than QFT-based algorithms to be widely used in the near future. However, we find that MLAE is not unbiased due to the so-called critical points, which are one of the major causes of its inaccuracy. We propose random-depth quantum amplitude estimation (RQAE) to avoid critical points. We also run numerical experiments to show that our algorithm is approximately unbiased and outperforms MLAE and other quantum amplitude estimation algorithms.
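For readers unfamiliar with MLAE, the sketch below shows the classical post-processing step of plain MLAE only, not the random-depth variant proposed in the paper. It assumes the standard model in which an amplitude a = sin^2(theta) measured after m amplification rounds yields a "hit" with probability sin^2((2m+1)theta). The quantum circuit is replaced by a classical sampler, and the depth schedule, shot count and grid size are illustrative assumptions.

```python
import math
import random

def simulate_hits(theta, depth, shots, rng):
    """Classically sample the number of 'good' outcomes at a given depth."""
    p = math.sin((2 * depth + 1) * theta) ** 2
    return sum(rng.random() < p for _ in range(shots))

def log_likelihood(theta, depths, shots, hits):
    """Log-likelihood of the observed hit counts for a candidate theta."""
    total = 0.0
    for m, h in zip(depths, hits):
        p = math.sin((2 * m + 1) * theta) ** 2
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total += h * math.log(p) + (shots - h) * math.log(1.0 - p)
    return total

def mlae_estimate(depths, shots, hits, grid=5000):
    """Grid-search the maximum-likelihood theta and return a = sin^2(theta)."""
    thetas = [(i + 0.5) * (math.pi / 2) / grid for i in range(grid)]
    best = max(thetas, key=lambda t: log_likelihood(t, depths, shots, hits))
    return math.sin(best) ** 2

if __name__ == "__main__":
    rng = random.Random(0)
    true_theta = math.asin(math.sqrt(0.3))
    depths = [0, 1, 2, 4]
    hits = [simulate_hits(true_theta, m, 1000, rng) for m in depths]
    print(mlae_estimate(depths, 1000, hits))
```

A brute-force grid search is used here because the likelihood is highly oscillatory in theta at larger depths, so a local optimizer could settle on the wrong peak.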

Translated: 2023-04-04 23:22:18 Published: 2023-04-03

# Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models ( http://arxiv.org/abs/2212.14704v2 )

License: see the linked page

Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, Xiaohu Qie, Shenghua Gao

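(Reference translation) Recent CLIP-guided 3D optimization methods, such as DreamFields and PureCLIPNeRF, have achieved impressive results in zero-shot text-to-3D synthesis. However, because they train from scratch with random initialization and no prior knowledge, these methods often fail to generate accurate and faithful 3D structures that match the input text. This paper makes the first attempt to introduce explicit 3D shape priors into the CLIP-guided 3D optimization process. Specifically, a high-quality 3D shape is first generated from the input text in a text-to-shape stage and serves as the 3D shape prior. It is then used to initialize a neural radiance field, which is optimized with the full prompt. For the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between images synthesized by the text-to-image diffusion model and the shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. The method, Dream3D, can generate imaginative 3D content with better visual quality and shape accuracy than state-of-the-art methods.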

Recent CLIP-guided 3D optimization methods, such as DreamFields and PureCLIPNeRF, have achieved impressive results in zero-shot text-to-3D synthesis. However, due to scratch training and random initialization without prior knowledge, these methods often fail to generate accurate and faithful 3D structures that conform to the input text. In this paper, we make the first attempt to introduce explicit 3D shape priors into the CLIP-guided 3D optimization process. Specifically, we first generate a high-quality 3D shape from the input text in the text-to-shape stage as a 3D shape prior. We then use it as the initialization of a neural radiance field and optimize it with the full prompt. To address the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between the images synthesized by the text-to-image diffusion model and shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy compared to state-of-the-art methods.

Translated: 2023-04-04 23:21:49 Published: 2023-04-03

# Aggregation of Disentanglement: Reconsidering Domain Variations in Domain Generalization ( http://arxiv.org/abs/2302.02350v4 )

License: see the linked page

Daoan Zhang, Mingkai Chen, Chenming Li, Lingyun Huang, Jianguo Zhang

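(Reference translation) Domain Generalization (DG) is a fundamental challenge for machine learning models that aims to improve model generalization across various domains. Previous methods focus on generating domain-invariant features from multiple source domains. However, domain variations also contain information that is useful for downstream tasks, namely classification-aware information, and this has been largely ignored. Instead of learning domain-invariant features from the source domains, we decouple the input images into domain expert features and noise. The proposed domain expert features lie in a learned latent space in which the images of each domain can be classified independently, which allows classification-aware domain variations to be used implicitly. Based on this analysis, we propose a new paradigm called the Domain Disentanglement Network (DDN), which disentangles domain expert features from the source-domain images and aggregates the source-domain expert features to represent the target test domain. We also propose a new contrastive learning method that guides the domain expert features toward a more balanced and separable feature space. Experiments on the widely used benchmarks PACS, VLCS, OfficeHome, DomainNet, and TerraIncognita demonstrate the competitive performance of the method compared with recently proposed alternatives.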

Domain Generalization (DG) is a fundamental challenge for machine learning models, which aims to improve model generalization on various domains. Previous methods focus on generating domain-invariant features from various source domains. However, we argue that the domain variations also contain useful information, i.e., classification-aware information, for downstream tasks, which has been largely ignored. Rather than learning domain-invariant features from source domains, we decouple the input images into Domain Expert Features and noise. The proposed domain expert features lie in a learned latent space where the images in each domain can be classified independently, enabling the implicit use of classification-aware domain variations. Based on this analysis, we propose a novel paradigm called Domain Disentanglement Network (DDN) to disentangle the domain expert features from the source domain images and aggregate the source domain expert features for representing the target test domain. We also propose a new contrastive learning method to guide the domain expert features to form a more balanced and separable feature space. Experiments on the widely used benchmarks of PACS, VLCS, OfficeHome, DomainNet, and TerraIncognita demonstrate the competitive performance of our method compared to the recently proposed alternatives.

Translated: 2023-04-04 21:41:10 Published: 2023-04-03

# Regulating ChatGPT and other Large Generative AI Models ( http://arxiv.org/abs/2302.02337v5 )

License: see the linked page

Philipp Hacker, Andreas Engel, Marco Mauer

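(Reference translation) Large generative AI models (LGAIMs), such as ChatGPT or Stable Diffusion, are rapidly transforming the way we communicate, illustrate, and create. However, AI regulation, in the EU and beyond, has focused mainly on conventional AI models rather than LGAIMs. This paper situates these new generative models in the current debate on trustworthy AI regulation and asks how the law can be tailored to their capabilities. After laying technical foundations, it proceeds in four steps: (1) direct regulation, (2) data protection, (3) content moderation, and (4) policy proposals. It proposes new terminology that captures the AI value chain in LGAIM settings by distinguishing between LGAIM developers, deployers, professional and non-professional users, and recipients of LGAIM output. We tailor regulatory duties to these different actors along the value chain and propose four strategies to ensure that LGAIMs are trustworthy and deployed for the benefit of society at large. Rules in the AI Act and other direct regulation must match the specificities of pre-trained models. In particular, regulation should focus on concrete high-risk applications rather than on the pre-trained model itself, and should include (i) transparency obligations and (ii) risk management. Non-discrimination provisions (iii) may, however, apply to LGAIM developers. Lastly, (iv) the core of the DSA content moderation rules should be expanded to cover LGAIMs, including notice-and-action mechanisms and trusted flaggers. In all areas, regulators and lawmakers need to act quickly to keep up with the dynamics of ChatGPT and similar models.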

Large generative AI models (LGAIMs), such as ChatGPT or Stable Diffusion, are rapidly transforming the way we communicate, illustrate, and create. However, AI regulation, in the EU and beyond, has primarily focused on conventional AI models, not LGAIMs. This paper will situate these new generative models in the current debate on trustworthy AI regulation, and ask how the law can be tailored to their capabilities. After laying technical foundations, the legal part of the paper proceeds in four steps, covering (1) direct regulation, (2) data protection, (3) content moderation, and (4) policy proposals. It suggests a novel terminology to capture the AI value chain in LGAIM settings by differentiating between LGAIM developers, deployers, professional and non-professional users, as well as recipients of LGAIM output. We tailor regulatory duties to these different actors along the value chain and suggest four strategies to ensure that LGAIMs are trustworthy and deployed for the benefit of society at large. Rules in the AI Act and other direct regulation must match the specificities of pre-trained models. In particular, regulation should focus on concrete high-risk applications, and not the pre-trained model itself, and should include (i) obligations regarding transparency and (ii) risk management. Non-discrimination provisions (iii) may, however, apply to LGAIM developers. Lastly, (iv) the core of the DSA content moderation rules should be expanded to cover LGAIMs. This includes notice and action mechanisms, and trusted flaggers. In all areas, regulators and lawmakers need to act fast to keep up with the dynamics of ChatGPT et al.

Translated: 2023-04-04 21:40:47 Published: 2023-04-03

# Learning Optimal Features via Partial Invariance ( http://arxiv.org/abs/2301.12067v2 )

License: see the linked page

Moulik Choraria, Ibtihal Ferwana, Ankur Mani, Lav R. Varshney

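(Reference translation) Learning models that are robust to distribution shifts is a key concern for real-world applicability. Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments. The success of IRM rests on an important assumption: the underlying causal mechanisms or features remain invariant across environments. We show that when this assumption is not satisfied, IRM can over-constrain the predictor, and we propose a relaxation via $\textit{partial invariance}$ to remedy this. In this work, we theoretically highlight the sub-optimality of IRM and then show how learning from a partition of the training domains can improve invariant models. Experiments in linear settings and with deep neural networks on tasks over both language and image data verify these conclusions.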

Learning models that are robust to distribution shifts is a key concern in the context of their real-life applicability. Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments. The success of IRM requires an important assumption: the underlying causal mechanisms/features remain invariant across environments. We show that when this assumption is not satisfied, IRM can over-constrain the predictor; to remedy this, we propose a relaxation via $\textit{partial invariance}$. In this work, we theoretically highlight the sub-optimality of IRM and then demonstrate how learning from a partition of training domains can help improve invariant models. Several experiments, conducted both in linear settings as well as with deep neural networks on tasks over both language and image data, allow us to verify our conclusions.

Translated: 2023-04-04 21:39:42 Published: 2023-04-03

# Semidefinite Relaxations for Robust Multiview Triangulation ( http://arxiv.org/abs/2301.11431v3 )

License: see the linked page

Linus Härenstam-Nielsen, Niclas Zeller, Daniel Cremers

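(Reference translation) We propose an approach to certifiably optimal robust multiview triangulation based on convex relaxations. To this end, we extend existing relaxation approaches for non-robust multiview triangulation by incorporating a least squares cost function. We propose two formulations, one based on epipolar constraints and one based on fractional reprojection constraints. The first is lower dimensional and remains tight under moderate noise and outlier levels; the second is higher dimensional and therefore slower, but remains tight even under extreme noise and outlier levels. Experiments demonstrate that the proposed approaches can compute provably optimal reconstructions even under significant noise and a large percentage of outliers.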

We propose an approach based on convex relaxations for certifiably optimal robust multiview triangulation. To this end, we extend existing relaxation approaches to non-robust multiview triangulation by incorporating a least squares cost function. We propose two formulations, one based on epipolar constraints and one based on fractional reprojection constraints. The first is lower dimensional and remains tight under moderate noise and outlier levels, while the second is higher dimensional and therefore slower but remains tight even under extreme noise and outlier levels. We demonstrate through extensive experiments that the proposed approaches allow us to compute provably optimal reconstructions even under significant noise and a large percentage of outliers.

Translated: 2023-04-04 21:39:19 Published: 2023-04-03

# Analog de Sitter universe in quantum Hall systems with an expanding edge ( http://arxiv.org/abs/2301.09270v2 )

License: see the linked page

Yasusada Nambu and Masahiro Hotta

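(Reference translation) An expanding edge in a quantum Hall system can serve as a simulator of a quantum 1+1 dimensional expanding universe. In these systems, edge excitations are represented as a chiral scalar field in curved spacetime. We investigate the Hawking radiation and entanglement behavior predicted by this model, assuming that the expansion law of the edge region corresponds to a de Sitter universe. As observable quantities of the quantum field, local spatial modes associated with detection regions are introduced using window functions for the field, and their correlations are evaluated. We examined the effect of the Hawking radiation caused by the edge expansion on the auto-correlation functions of the local modes and confirmed that entanglement death due to Hawking radiation occurs. This entanglement behavior is related to the "quantum-to-classical transition" in cosmic inflation.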

Expanding edges in quantum Hall systems can become a simulator of quantum 1+1 dimensional expanding universes. In these systems, edge excitations are represented as a chiral scalar field in curved spacetimes. We investigate Hawking radiation and entanglement behavior predicted by this model assuming that the expansion law of the edge region corresponds to a de Sitter universe. As observable quantities for the quantum field, local spatial modes associated with detection regions are introduced using window functions for the field, and their correlations are evaluated. We found that Hawking radiation caused by the edge expansion affects the auto-correlation functions of the local modes, and confirmed that entanglement death due to Hawking radiation occurs. This behavior of entanglement is related to the "quantum to classical transition" in cosmic inflation.

Translated: 2023-04-04 21:38:18 Published: 2023-04-03

# Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning ( http://arxiv.org/abs/2301.05169v2 )

License: see the linked page

Yuejiang Liu, Alexandre Alahi, Chris Russell, Max Horn, Dominik Zietlow, Bernhard Schölkopf, Francesco Locatello

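(Reference translation) Recent years have seen growing interest in learning high-level causal representations from low-level image pairs under interventions. However, existing efforts are largely limited to simple synthetic settings that are far from real-world problems. This paper presents Causal Triplet, a causal representation learning benchmark that features visually more complex scenes together with two desiderata: (i) an actionable counterfactual setting, in which only certain object-level variables allow counterfactual observations while others do not; and (ii) an interventional downstream task that emphasizes out-of-distribution robustness from the independent causal mechanisms principle. Through extensive experiments, we find that models built with knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, which points to substantial challenges and opportunities for future work. The code and datasets will be made available at https://sites.google.com/view/causaltriplet.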

Recent years have seen a surge of interest in learning high-level causal representations from low-level image pairs under interventions. Yet, existing efforts are largely limited to simple synthetic settings that are far away from real-world problems. In this paper, we present Causal Triplet, a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works: (i) an actionable counterfactual setting, where only certain object-level variables allow for counterfactual observations whereas others do not; (ii) an interventional downstream task with an emphasis on out-of-distribution robustness from the independent causal mechanisms principle. Through extensive experiments, we find that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, indicating substantial challenges and opportunities for future work. Our code and datasets will be available at https://sites.google.com/view/causaltriplet.

Translated: 2023-04-04 21:38:06 Published: 2023-04-03

# Guaranteed Dynamic Scheduling of Ultra-Reliable Low-Latency Traffic via Conformal Prediction ( http://arxiv.org/abs/2302.07675v2 )

License: see the linked page

Kfir M. Cohen, Sangwoo Park, Osvaldo Simeone, Petar Popovski, and Shlomo Shamai (Shitz)

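(Reference translation) Dynamic scheduling of ultra-reliable and low-latency traffic (URLLC) in the uplink can significantly improve the efficiency of coexisting services, such as enhanced mobile broadband (eMBB) devices, by allocating resources only when necessary. The main challenge comes from the uncertainty in the URLLC packet generation process. In practice, a prediction may overestimate or underestimate the amount of URLLC data to be generated, leading to either an excess or a shortage of the resources pre-emptively allocated to URLLC packets. This paper introduces a new scheduler for URLLC packets that provides formal guarantees on reliability and latency regardless of the quality of the URLLC traffic predictor. The proposed method leverages recent advances in online conformal prediction (CP) and follows the principle of dynamically adjusting the amount of allocated resources so as to meet the reliability and latency requirements set by the designer.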

The dynamic scheduling of ultra-reliable and low-latency traffic (URLLC) in the uplink can significantly enhance the efficiency of coexisting services, such as enhanced mobile broadband (eMBB) devices, by only allocating resources when necessary. The main challenge is posed by the uncertainty in the process of URLLC packet generation, which mandates the use of predictors for URLLC traffic in the coming frames. In practice, such prediction may overestimate or underestimate the amount of URLLC data to be generated, yielding either an excessive or an insufficient amount of resources to be pre-emptively allocated for URLLC packets. In this paper, we introduce a novel scheduler for URLLC packets that provides formal guarantees on reliability and latency irrespective of the quality of the URLLC traffic predictor. The proposed method leverages recent advances in online conformal prediction (CP), and follows the principle of dynamically adjusting the amount of allocated resources so as to meet reliability and latency requirements set by the designer.
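The principle of "dynamically adjusting the amount of allocated resources" can be sketched with a simple online update in the spirit of adaptive conformal methods. The code below is an illustration under stated assumptions, not the scheduler from the paper: the function name, the additive margin, the step size and the toy traffic model are all hypothetical, and the only property it targets is a long-run miss rate, not the paper's reliability and latency guarantees.

```python
import random

def online_margin_scheduler(demands, predictions, target_miss_rate, step=0.5):
    """Allocate prediction + margin in each frame and adapt the margin online.

    Returns (allocations, misses); misses[t] is 1 when the demand in frame t
    exceeded the allocation, else 0.
    """
    margin = 0.0
    allocations, misses = [], []
    for demand, predicted in zip(demands, predictions):
        allocation = max(0.0, predicted + margin)
        miss = 1 if demand > allocation else 0
        allocations.append(allocation)
        misses.append(miss)
        # Grow the margin after a miss, shrink it slightly otherwise.
        margin += step * (miss - target_miss_rate)
    return allocations, misses

if __name__ == "__main__":
    rng = random.Random(0)
    frames = 4000
    demands = [rng.randint(1, 10) for _ in range(frames)]
    predictions = [5.0] * frames  # deliberately poor predictor
    _, misses = online_margin_scheduler(demands, predictions, 0.1)
    print(sum(misses) / frames)
```

The update makes the sum of (miss - target) equal to the net change of the margin divided by the step size. As long as the demands are positive and the prediction errors are bounded, the margin stays bounded, so the empirical miss rate approaches the target whatever the quality of the predictor.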

Translated: 2023-04-04 21:32:11 Published: 2023-04-03

# Extensible Motion-based Identification of XR Users using Non-Specific Motion Data ( http://arxiv.org/abs/2302.07517v2 )

License: see the linked page

Christian Schell, Konstantin Kobs, Tamara Fernando, Andreas Hotho, Marc Erich Latoschik

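(Reference translation) This paper combines the strengths of distance-based and classification-based approaches to identify extended reality users by their movements. To this end, we present an embedding-based approach that leverages deep metric learning. We train the model on a dataset of users playing the VR game "Half-Life: Alyx" and run multiple experiments and analyses, using a state-of-the-art classification-based model as the baseline. The results show that the embedding-based method 1) can identify new users from non-specific movements using only a few minutes of enrollment data, 2) can enroll new users within seconds, whereas retraining the baseline approach takes almost a day, 3) is more reliable than the baseline approach when only little enrollment data is available, and 4) can be used to identify new users from another dataset recorded with different VR devices. Overall, the solution is a foundation for easily extensible XR user identification systems that applies to a wide range of user motions. It also paves the way for production-ready models that XR practitioners could use without needing the expertise, hardware, or data to train deep learning models.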

In this paper, we combine the strengths of distance-based and classification-based approaches for the task of identifying extended reality users by their movements. For this we present an embedding-based approach that leverages deep metric learning. We train the model on a dataset of users playing the VR game "Half-Life: Alyx" and conduct multiple experiments and analyses using a state-of-the-art classification-based model as baseline. The results show that the embedding-based method 1) is able to identify new users from non-specific movements using only a few minutes of enrollment data, 2) can enroll new users within seconds, while retraining the baseline approach takes almost a day, 3) is more reliable than the baseline approach when only little enrollment data is available, 4) can be used to identify new users from another dataset recorded with different VR devices. Altogether, our solution is a foundation for easily extensible XR user identification systems, applicable to a wide range of user motions. It also paves the way for production-ready models that could be used by XR practitioners without the requirements of expertise, hardware, or data for training deep learning models.

Translated: 2023-04-04 21:31:52 Published: 2023-04-03

# Is Distance Matrix Enough for Geometric Deep Learning? ( http://arxiv.org/abs/2302.05743v2 )

License: see the linked page

Zian Li, Xiyuan Wang, Yinan Huang, Muhan Zhang

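(Reference translation) Graph Neural Networks (GNNs) are often used for tasks that involve the geometry of a given graph, such as molecular dynamics simulation. Although the distance matrix of a geometric graph contains complete geometric information, Message Passing Neural Networks (MPNNs) have been shown to be insufficient for learning this geometry. In this work, we extend the families of counterexamples that MPNNs cannot distinguish from their distance matrices by constructing families of novel and symmetric geometric graphs. We then propose $k$-DisGNNs, which can effectively exploit the rich geometry contained in the distance matrix. We demonstrate the high expressive power of our models and prove that some existing well-designed geometric models can be unified by $k$-DisGNNs as special cases. Most importantly, we establish a connection between geometric deep learning and traditional graph representation learning: highly expressive GNN models originally designed for graph structure learning can also be applied to geometric deep learning, and existing complex, equivariant models are not the only solution. Experimental results verify the theory.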

Graph Neural Networks (GNNs) are often used for tasks involving the geometry of a given graph, such as molecular dynamics simulation. Although the distance matrix of a geometric graph contains complete geometric information, it has been demonstrated that Message Passing Neural Networks (MPNNs) are insufficient for learning this geometry. In this work, we expand on the families of counterexamples that MPNNs are unable to distinguish from their distance matrices, by constructing families of novel and symmetric geometric graphs. We then propose $k$-DisGNNs, which can effectively exploit the rich geometry contained in the distance matrix. We demonstrate the high expressive power of our models and prove that some existing well-designed geometric models can be unified by $k$-DisGNNs as special cases. Most importantly, we establish a connection between geometric deep learning and traditional graph representation learning, showing that those highly expressive GNN models originally designed for graph structure learning can also be applied to geometric deep learning problems with impressive performance, and that existing complex, equivariant models are not the only solution. Experimental results verify our theory.
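One reason the distance matrix is an attractive input is that it does not change when the whole point cloud is rotated or translated. The sketch below checks this numerically on a toy set of four 3-D points; the helper names and the coordinates are illustrative assumptions and are not taken from the paper.

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances between 3-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def rotate_about_z(points, angle):
    """Rotate every point by `angle` radians around the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def translate(points, offset):
    """Shift every point by the same offset vector."""
    return [tuple(a + b for a, b in zip(p, offset)) for p in points]

def matrices_close(a, b, tol=1e-9):
    """Element-wise comparison of two equally sized matrices."""
    return all(
        abs(x - y) <= tol
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )

POINTS = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
MOVED = translate(rotate_about_z(POINTS, 0.7), (1.0, -2.0, 3.0))

if __name__ == "__main__":
    print(matrices_close(distance_matrix(POINTS), distance_matrix(MOVED)))
```

A model that consumes only these distances is therefore invariant to rigid motions by construction. The question the paper addresses is whether a given architecture can actually extract the full geometry that the matrix contains.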

Translated: 2023-04-04 21:30:56 Published: 2023-04-03

# An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning ( http://arxiv.org/abs/2302.04419v2 )

License: see the linked page

Jaesik Yoon, Yi-Fu Wu, Heechul Bae, and Sungjin Ahn

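(Reference translation) Unsupervised object-centric representation (OCR) learning has recently attracted attention as a new paradigm of visual representation, because it could become an effective pre-training technique for various downstream tasks in terms of sample efficiency, systematic generalization, and reasoning. Image-based reinforcement learning (RL) is one of the most important and most frequently mentioned of these downstream tasks, yet the benefit in RL has surprisingly not been investigated. Instead, most evaluations have focused on more indirect metrics such as segmentation quality and object property prediction accuracy. This paper investigates the effectiveness of OCR pre-training for image-based reinforcement learning through empirical experiments. For a systematic evaluation, we introduce a simple object-centric visual RL benchmark and run experiments to answer questions such as "Does OCR pre-training improve performance on object-centric tasks?" and "Can OCR pre-training help with out-of-distribution generalization?". The results provide valuable insights into the effectiveness of OCR pre-training for RL and into the potential limitations of its use in certain scenarios. The study also examines key aspects of incorporating OCR pre-training in RL, including performance in visually complex environments and the appropriate pooling layer for aggregating the object representations.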

Unsupervised object-centric representation (OCR) learning has recently drawn attention as a new paradigm of visual representation. This is because of its potential of being an effective pre-training technique for various downstream tasks in terms of sample efficiency, systematic generalization, and reasoning. Although image-based reinforcement learning (RL) is one of the most important and thus frequently mentioned such downstream tasks, the benefit in RL has surprisingly not been investigated systematically thus far. Instead, most of the evaluations have focused on rather indirect metrics such as segmentation quality and object property prediction accuracy. In this paper, we investigate the effectiveness of OCR pre-training for image-based reinforcement learning via empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as "Does OCR pre-training improve performance on object-centric tasks?" and "Can OCR pre-training help with out-of-distribution generalization?". Our results provide empirical evidence and valuable insights into the effectiveness of OCR pre-training for RL and the potential limitations of its use in certain scenarios. Additionally, this study also examines the critical aspects of incorporating OCR pre-training in RL, including performance in a visually complex environment and the appropriate pooling layer to aggregate the object representations.

Translated: 2023-04-04 21:30:12 Published: 2023-04-03

# Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking ( http://arxiv.org/abs/2302.03802v2 )

License: see the linked page

Ziqi Pang, Jie Li, Pavel Tokmakov, Dian Chen, Sergey Zagoruyko, Yu-Xiong Wang

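(Reference translation) This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects, so we call it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, the method adapts the "tracking by attention" framework and represents tracked instances coherently over time with object queries. The "Past Reasoning" module learns to refine the tracks and enhance the object features by cross-attending to queries from previous frames and other objects. The "Future Reasoning" module digests historical information and predicts robust future trajectories. Under long-term occlusions, the method maintains object positions and enables re-association by integrating motion predictions. On the nuScenes dataset, the method improves AMOTA by a large margin and reduces ID-Switches by 90% compared with prior approaches. The code and models are available at https://github.com/TRI-ML/PF-Track.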

This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, our method adapts the "tracking by attention" framework and represents tracked instances coherently over time with object queries. To explicitly use historical cues, our "Past Reasoning" module learns to refine the tracks and enhance the object features by cross-attending to queries from previous frames and other objects. The "Future Reasoning" module digests historical information and predicts robust future trajectories. In the case of long-term occlusions, our method maintains the object positions and enables re-association by integrating motion predictions. On the nuScenes dataset, our method improves AMOTA by a large margin and remarkably reduces ID-Switches by 90% compared to prior approaches, which is an order of magnitude less. The code and models are made available at https://github.com/TRI-ML/PF-Track.

翻訳日:2023-04-04 21:29:50 公開日:2023-04-03

# 2つの損失は1つより優れている: より安価なプロキシを使った最適化の高速化

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy ( http://arxiv.org/abs/2302.03542v2 )

ライセンス: Link先を確認

Blake Woodworth (SIERRA), Konstantin Mishchenko, Francis Bach (SIERRA, PSL)

(参考訳) 本稿では,よりアクセスしやすい関連関数をプロキシとして利用することにより,勾配の計算が困難な目的関数を最小化するアルゴリズムを提案する。このアルゴリズムは,プロキシ上の近似近接点反復と,目的関数からの比較的少数の確率的勾配を組み合わせたものである。目的関数とプロキシの差が$\delta$-smoothである場合,我々のアルゴリズムは,$\delta$-smoothな目的関数に対する確率的勾配降下法に一致する速度での収束を保証し,サンプル効率を大幅に向上させうる。我々のアルゴリズムは機械学習に多くの潜在的な応用があり,合成データ,物理シミュレータ,公開データとプライベートデータの混合などを活用するための原理的な手段を提供する。

We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $\delta$-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
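The abstract above describes proximal point iterations on a proxy, corrected by a few gradients of the true objective. The following is a minimal deterministic 1-D sketch of that general idea, not the authors' algorithm: the paper uses stochastic gradients, and the exact form of the correction term, the step sizes, the toy functions, and the name `proxy_proximal_minimize` are all assumptions made for illustration.

```python
def proxy_proximal_minimize(grad_f, grad_h, x0, eta,
                            outer_steps=10, inner_steps=50, inner_lr=0.1):
    """Minimize f using one gradient of f per outer step and many proxy gradients.

    grad_f: gradient of the expensive objective f (assumed costly to call).
    grad_h: gradient of the cheap proxy h.
    eta:    proximal step size (here chosen as 1/delta, an assumption).
    """
    x = x0
    for _ in range(outer_steps):
        # Single expensive call: correct the proxy gradient at the current point.
        correction = grad_f(x) - grad_h(x)
        # Approximately solve the proximal subproblem
        #   min_y  h(y) + correction * y + (y - x)^2 / (2 * eta)
        # with plain gradient descent that only touches the proxy.
        y = x
        for _ in range(inner_steps):
            g = grad_h(y) + correction + (y - x) / eta
            y -= inner_lr * g
        x = y
    return x


# Toy problem (hypothetical): f(x) = (x - 3)^2, proxy h(x) = 0.9 * (x - 2.5)^2.
# The difference f - h has second derivative 0.2, so it is delta-smooth with delta = 0.2.
grad_f = lambda x: 2.0 * (x - 3.0)
grad_h = lambda x: 1.8 * (x - 2.5)

x_star = proxy_proximal_minimize(grad_f, grad_h, x0=0.0, eta=1.0 / 0.2)
print(x_star)
```

In this toy setting the outer loop calls `grad_f` only ten times, while the inner loop does all remaining work on the proxy; the iterate still approaches the minimizer of `f`, not that of `h`.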

翻訳日:2023-04-04 21:29:33 公開日:2023-04-03

# 星-三角関係からの可積分量子回路

Integrable Quantum Circuits from the Star-Triangle Relation ( http://arxiv.org/abs/2302.12675v2 )

ライセンス: Link先を確認

Yuan Miao, Eric Vernier

(参考訳) 星-三角関係は、古典的な2次元統計力学モデルに対して厳密な結果を提供し、厳密に解けるモデルの領域において重要な役割を果たす。本稿では、星-三角関係を用いた可積分量子回路を構築する。この構成は、星-三角関係によって解かれた統計力学モデルに対して相互に可換な2パラメータ転送行列の族に依存しており、Yang-Baxter可積分頂点モデルに基づく既知の構成とは異なる。スペクトルパラメータの特別な値において、転送行列は可積分な量子回路にマッピングされ、そこでは局所保存電荷の無限の族が導出される。我々は、$Q$状態quditの鎖に作用する回路の2つの例を示す。すなわち、最近Lotkovらによって可積分性が予想された$Q$状態ポッツ回路と、我々の知る限り新しい$\mathbb{Z}_Q$回路である。最初の例では、$Q=3$の場合にZamolodchikov-Fateev 19頂点モデルとの関係を示す。

The star-triangle relation plays an important role in the realm of exactly solvable models, offering exact results for classical two-dimensional statistical mechanical models. In this article, we construct integrable quantum circuits using the star-triangle relation. Our construction relies on families of mutually commuting two-parameter transfer matrices for statistical mechanical models solved by the star-triangle relation, and differs from previously known constructions based on Yang-Baxter integrable vertex models. At special values of the spectral parameter, the transfer matrices are mapped into integrable quantum circuits, for which infinite families of local conserved charges can be derived. We demonstrate the construction by giving two examples of circuits acting on a chain of $Q$-state qudits: $Q$-state Potts circuits, whose integrability has been conjectured recently by Lotkov et al., and $\mathbb{Z}_Q$ circuits, which are novel to our knowledge. In the first example, we present for $Q=3$ a connection to the Zamolodchikov-Fateev 19-vertex model.

翻訳日:2023-04-04 21:22:43 公開日:2023-04-03

# 単一物体追跡における変圧器 : 実験的検討

Transformers in Single Object Tracking: An Experimental Survey ( http://arxiv.org/abs/2302.11867v2 )

ライセンス: Link先を確認

Janani Thangavel, Thanikasalam Kokul, Amirthalingam Ramanan, and Subha Fernando

(参考訳) シングルオブジェクトトラッキングは、コンピュータビジョンにおいてよく知られ、挑戦的な研究トピックである。過去20年間、多くの研究者がこの問題を解くために様々なアルゴリズムを提案し、有望な結果を得た。近年、トランスフォーマーベースのトラッキングアプローチは、追跡ロバスト性が優れているため、単一オブジェクトトラッキングの新しい時代を告げている。トラッカの性能分析のための調査研究がいくつか行われているが、単一物体追跡におけるトランスフォーマーの導入後、別の調査研究が必要である。本研究では,変圧器追跡手法の文献と性能を分析することを目的とした。そこで我々は、Transformer Trackingアプローチの詳細な文献分析を行い、その追跡堅牢性と計算効率を、挑戦的なベンチマークデータセット上で評価する。さらに、異なるトラッキングシナリオでパフォーマンスを測定して、その強度と弱点を見つけました。我々の調査は、Transformer Trackingアプローチの基礎となる原則、直面している課題、今後の方向性に関する洞察を提供する。

Single object tracking is a well-known and challenging research topic in computer vision. Over the last two decades, numerous researchers have proposed various algorithms to solve this problem and achieved promising results. Recently, Transformer-based tracking approaches have ushered in a new era in single object tracking due to their superior tracking robustness. Although several survey studies have been conducted to analyze the performance of trackers, there is a need for another survey study after the introduction of Transformers in single object tracking. In this survey, we aim to analyze the literature and performances of Transformer tracking approaches. Therefore, we conduct an in-depth literature analysis of Transformer tracking approaches and evaluate their tracking robustness and computational efficiency on challenging benchmark datasets. In addition, we measure their performance in different tracking scenarios to find their strengths and weaknesses. Our survey provides insights into the underlying principles of Transformer tracking approaches, the challenges they face, and their future directions.

翻訳日:2023-04-04 21:22:03 公開日:2023-04-03

# 持続可能なオンデマンドライドプールの価格設定とマッチング

Future Aware Pricing and Matching for Sustainable On-demand Ride Pooling ( http://arxiv.org/abs/2302.10510v2 )

ライセンス: Link先を確認

Xianjie Zhang and Pradeep Varakantham and Hao Jiang

(参考訳) オンデマンドのライドプーリングの人気は、顧客(低価格)、タクシードライバー(高い収入)、環境(車両数の削減によるカーボンフットプリントの低減)、そしてUberのような集約企業(高い収入)に提供される利点によるものである。これらの利点を達成するには、2つの重要な相互リンク課題を効果的に解決する必要がある。 (a)価格 --タクシーの顧客要求に価格を設定すること (b)マッチング -- タクシー・車への顧客(価格を受け入れた)の割り当て。伝統的に、これら2つの課題は、将来の要求に対する現在のマッチングの影響を考慮せずに、個別に研究され、(現在の要求のみを考慮する)近視眼的なアプローチが用いられてきた。本稿では,価格とマッチングの問題を同時に取り扱うとともに,価格とマッチング決定の将来の影響も考慮する新たな枠組みを提案する。実世界のタクシーデータセットにおける実験結果では、一定の収益を得るために必要な車両数(最大14%、平均10.6%)と車両の総走行距離(最大11.1%、平均3.7%)を削減することで、持続可能な形で収益(最大17%、平均6.4%)を大幅に改善できることを実証した。つまり、顧客、ドライバー、アグリゲータ(ライドプール会社)に対して高い収益を得ると同時に、環境(道路上の車両の数が少なく、燃料消費も少ないため)に適している、すべての利害関係者(顧客、ドライバー、アグリゲータ、環境)に理想的なウィンウィンシナリオを提供することができるのです。

The popularity of on-demand ride pooling is due to the benefits offered to customers (lower prices), taxi drivers (higher revenue), the environment (lower carbon footprint due to fewer vehicles) and aggregation companies like Uber (higher revenue). To achieve these benefits, two key interlinked challenges have to be solved effectively: (a) pricing -- setting prices for customer requests for taxis; and (b) matching -- assignment of customers (that accepted the prices) to taxis/cars. Traditionally, both these challenges have been studied individually and using myopic approaches (considering only current requests), without considering the impact of current matching on addressing future requests. In this paper, we develop a novel framework that handles the pricing and matching problems together, while also considering the future impact of the pricing and matching decisions. In our experimental results on a real-world taxi dataset, we demonstrate that our framework can significantly improve revenue (up to 17% and on average 6.4%) in a sustainable manner by reducing the number of vehicles (up to 14% and on average 10.6%) required to obtain a given fixed revenue and the overall distance travelled by vehicles (up to 11.1% and on average 3.7%). That is to say, we are able to provide an ideal win-win scenario for all stakeholders involved (customers, drivers, aggregator, environment) by obtaining higher revenue for customers, drivers, and the aggregator (ride pooling company) while being good for the environment (due to fewer vehicles on the road and less fuel consumed).

翻訳日:2023-04-04 21:21:29 公開日:2023-04-03

# 物体検出における連続的領域適応のための領域ギャップの評価

Assessing Domain Gap for Continual Domain Adaptation in Object Detection ( http://arxiv.org/abs/2302.10396v2 )

ライセンス: Link先を確認

Anh-Dzung Doan and Bach Long Nguyen and Surabhi Gupta and Ian Reid and Markus Wagner and Tat-Jun Chin

(参考訳) 自律システムにおける信頼できる物体検出を確保するために、検出器は、日時、天候、季節などの環境要因による外観の変化に対応できなければならない。これらの変更を継続的に取り入れることは有望な解決策であるが、計算コストはかかる。提案手法は,現在のトレーニングデータと同じ分布を持たない新しいデータを用いて,必要なときにのみ検出器を選択的に適応させることである。この目的のために、ドメインギャップ評価のための3つの一般的なメトリクスを調査し、ドメインギャップと検出精度との間に相関があることを見出した。そこで, ドメインギャップを基準として, 検出器の適応時期を決定する。提案手法は, 環境条件が周期的に変化する現実のシナリオにおいて, 検出器全体の性能を犠牲にすることなく, 検出器の動作効率を向上させる可能性を秘めている。私たちのコードはhttps://github.com/dadung/DGE-CDAで公開されています。

To ensure reliable object detection in autonomous systems, the detector must be able to adapt to changes in appearance caused by environmental factors such as time of day, weather, and seasons. Continually adapting the detector to incorporate these changes is a promising solution, but it can be computationally costly. Our proposed approach is to selectively adapt the detector only when necessary, using new data that does not have the same distribution as the current training data. To this end, we investigate three popular metrics for domain gap evaluation and find that there is a correlation between the domain gap and detection accuracy. Therefore, we apply the domain gap as a criterion to decide when to adapt the detector. Our experiments show that our approach has the potential to improve the efficiency of the detector's operation in real-world scenarios, where environmental conditions change in a cyclical manner, without sacrificing the overall performance of the detector. Our code is publicly available at https://github.com/dadung/DGE-CDA.
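The gating idea in the abstract above (adapt only when the domain gap is large) can be sketched as follows. This is an illustrative sketch under stated assumptions: the abstract does not name its three metrics, so the gap used here (squared distance between mean feature vectors), the threshold value, and the function names are hypothetical stand-ins, not the authors' method.

```python
def mean_vector(features):
    """Per-dimension mean of a non-empty list of equal-length feature vectors."""
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def domain_gap(source_features, target_features):
    """Hypothetical gap metric: squared Euclidean distance between feature means."""
    mu_s = mean_vector(source_features)
    mu_t = mean_vector(target_features)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def should_adapt(source_features, target_features, threshold):
    """Trigger (costly) detector adaptation only when the gap exceeds the threshold."""
    return domain_gap(source_features, target_features) > threshold


# Toy 2-D "features" standing in for detector embeddings (made-up numbers).
train = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
same_conditions = [[0.1, 1.0], [0.1, 0.8]]   # looks like the training data
new_conditions = [[1.0, 0.0], [1.2, 0.2]]    # shifted appearance

for name, batch in [("same", same_conditions), ("new", new_conditions)]:
    print(name, should_adapt(train, batch, threshold=0.5))
```

The design point is that computing the gap is cheap relative to retraining, so the detector is only adapted on batches that actually differ from what it has seen.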

翻訳日:2023-04-04 21:21:01 公開日:2023-04-03

# ブートストラップ the original latent: ブラックボックスモデルからプライベートモデルを学ぶ

Bootstrap The Original Latent: Learning a Private Model from a Black-box Model ( http://arxiv.org/abs/2303.03709v4 )

ライセンス: Link先を確認

Shuai Wang, Daoan Zhang, Jianguo Zhang, Weiwei Zhang, and Rui Li

(参考訳) 本稿では,モデル所有者とユーザニーズのデータ/モデルプライバシのバランスを考慮し,ブラックボックス基盤/ソースモデルのバックプロパゲーション結果のガイダンスを用いて,ユーザがプライベートモデルをより良いトレーニングを行うためのBack-Propagated Black-Box Adaptation (BPBA)を提案する。私たちの設定は、ファンデーション/ソースモデルの使用を容易にし、ファンデーション/ソースモデルの漏洩や誤用を防ぎます。さらに,基盤/ソースモデルを完全に活用するためのBootstrap The Original Latent(BTOL)という新たなトレーニング戦略を提案する。当社の戦略はドメインアダプタとフリーズ・アンド・ザウ戦略で構成されています。 3つのデータセットに対してBPBAとBlack-box UDA設定でBTOLを適用します。実験の結果,手作業による拡張を伴わずに,戦略が効率的かつ堅牢であることが確認された。

In this paper, considering the balance of data/model privacy of model owners and user needs, we propose a new setting called Back-Propagated Black-Box Adaptation (BPBA) for users to better train their private models via the guidance of the back-propagated results of a Black-box foundation/source model. Our setting can ease the usage of foundation/source models as well as prevent the leakage and misuse of foundation/source models. Moreover, we also propose a new training strategy called Bootstrap The Original Latent (BTOL) to fully utilize the foundation/source models. Our strategy consists of a domain adapter and a freeze-and-thaw strategy. We apply our BTOL under BPBA and Black-box UDA settings on three different datasets. Experiments show that our strategy is efficient and robust in various settings without manual augmentations.

翻訳日:2023-04-04 21:12:41 公開日:2023-04-03

# 量子コンピュータを用いた高次元離散時間結晶のシミュレーション

Simulation of Higher Dimensional Discrete Time Crystals on a Quantum Computer ( http://arxiv.org/abs/2303.02727v2 )

ライセンス: Link先を確認

Christopher Sims

(参考訳) 位相秩序状態の研究は、量子物質における対称性保護状態への関心が高まっている。近年、この理論は低温での秩序状態を示す量子多体系に拡張されている。この例は離散時間結晶(DTC)であり、実際の量子コンピュータや駆動システムで実証されている。これらの状態は周期的であり、ある程度の障害に対して保護されている。一般に、DTCは安定な多体局在状態(MBL)と不規則な熱状態の2つの段階に分けられる。本研究は, DTCを2次元に一般化することにより, 熱雑音の低減, MBL範囲の動作範囲の増大を実証する。

The study of topologically ordered states has given rise to a growing interest in symmetry-protected states in quantum matter. Recently, this theory has been extended to quantum many-body systems which demonstrate ordered states at low temperature. An example of this is the discrete time crystal (DTC), which has been demonstrated on a real quantum computer and in driven systems. These states are periodic in time and are protected against disorder to a certain extent. In general, DTCs can be classified into two phases: the stable many-body localization (MBL) state and the disordered thermal state. This work demonstrates that, by generalizing the DTC to two dimensions, there is a decrease in thermal noise and an increase in the operating range of the MBL state in the presence of disorder.

翻訳日:2023-04-04 21:12:03 公開日:2023-04-03

# 画像テキスト検索のための共通知識最適化型スタイルトランス

The style transformer with common knowledge optimization for image-text retrieval ( http://arxiv.org/abs/2303.00448v2 )

ライセンス: Link先を確認

Wenrui Li, Zhengyu Ma, Jinqiao Shi, Xiaopeng Fan

(参考訳) 異なるモダリティを関連付ける画像テキスト検索は,その優れた研究価値と広い実世界の応用により,広く注目を集めている。しかし、既存の手法のほとんどは、高レベルの意味的関係(スタイル埋め込み)とマルチモーダルからの共通知識を十分に考慮していない。そこで本稿では,画像テキスト検索のための共通知識最適化(CKSTN)を備えた新しいスタイルトランスフォーマーネットワークを提案する。主なモジュールは共通知識適応器 (CKA) であり、スタイル埋め込み抽出器 (SEE) と共通知識最適化 (CKO) モジュールの両方がある。具体的には、SEEはシーケンシャルアップデート戦略を使用して、SEEの異なるステージの特徴を効果的に接続します。 CKOモジュールは、様々なモダリティから共通知識の潜在概念を動的に捉えるために導入された。さらに、時間的共通知識を一般化するために、SEE内の異なるレイヤの特徴を従来の共通特徴ユニットと効果的に統合するためのシーケンシャルな更新戦略を提案する。 CKSTNは、MSCOCOおよびFlickr30Kデータセット上の画像テキスト検索における最先端手法の優位性を実証する。さらに、CKSTNは、より優れた性能と低いパラメータのため、実際のシーンに適用するためにより便利で実用的な軽量トランスフォーマーに基づいて構築される。

Image-text retrieval, which associates different modalities, has drawn broad attention due to its research value and broad real-world applications. However, most existing methods have not fully considered high-level semantic relationships ("style embedding") and common knowledge from multiple modalities. To this end, we introduce a novel style transformer network with common knowledge optimization (CKSTN) for image-text retrieval. The main module is the common knowledge adaptor (CKA), which contains both the style embedding extractor (SEE) and the common knowledge optimization (CKO) modules. Specifically, the SEE uses a sequential update strategy to effectively connect the features of its different stages. The CKO module is introduced to dynamically capture the latent concepts of common knowledge from different modalities. In addition, to obtain generalized temporal common knowledge, we propose a sequential update strategy that effectively integrates the features of different layers in the SEE with previous common feature units. CKSTN demonstrates superiority over state-of-the-art methods in image-text retrieval on the MSCOCO and Flickr30K datasets. Moreover, CKSTN is built on a lightweight transformer, which makes it more convenient and practical for real-world applications owing to its better performance and fewer parameters.

翻訳日:2023-04-04 21:10:50 公開日:2023-04-03

# 最先端とは何か? 機械学習ベンチマーク性能における多重性の考慮

What is the state of the art? Accounting for multiplicity in machine learning benchmark performance ( http://arxiv.org/abs/2303.07272v2 )

ライセンス: Link先を確認

Kajsa Møllersen and Einar Holsbø

(参考訳) 機械学習手法は一般に評価され、公開リポジトリのデータセットのパフォーマンスによって比較される。これにより、しばしば数千のメソッドが同じ条件下で、時間にわたって評価される。問題における最上位の成績は「最先端(SOTA)パフォーマンス」と呼ばれ、新しい手法を公表するための基準点として用いられる。 SOTAの最大性能を推定として用いることは偏りのある推定器であり、過度に楽観的な結果を与える。マルチプリシティ(multiplicity)は、複数の比較と複数のテストの文脈でよく研究されているトピックであるが、著者たちが認識している限り、SOTAの推定に関する議論からほとんど欠落している。新しい手法を評価するための基準として,楽観的な最先端推定法が用いられ,その結果が著しく劣る手法が容易に見過ごされる。本稿では、複数の分類器の場合の確率分布について、既知の解析手法を適用できるようにし、より優れたSOTA推定値を提供する。独立分類器を用いた模擬例による乗法の影響を実証する。分類子依存性が分散にどのように影響するかを示すとともに,精度が高い場合には影響が限定されることを示した。最後に,2020年のkaggleコンペティションという実例について論じる。

Machine learning methods are commonly evaluated and compared by their performance on data sets from public repositories. This allows for multiple methods, oftentimes several thousands, to be evaluated under identical conditions and across time. The highest ranked performance on a problem is referred to as state-of-the-art (SOTA) performance, and is used, among other things, as a reference point for publication of new methods. The highest-ranked performance is a biased estimator of SOTA, giving overly optimistic results. The mechanisms at play are those of multiplicity, a topic that is well-studied in the context of multiple comparisons and multiple testing, but has, as far as the authors are aware, been nearly absent from the discussion regarding SOTA estimates. The optimistic state-of-the-art estimate is used as a standard for evaluating new methods, and methods with substantially inferior results are easily overlooked. In this article, we provide a probability distribution for the case of multiple classifiers so that known analysis methods can be applied and a better SOTA estimate can be provided. We demonstrate the impact of multiplicity through a simulated example with independent classifiers. We show how classifier dependency impacts the variance, but also that the impact is limited when the accuracy is high. Finally, we discuss a real-world example: a Kaggle competition from 2020.
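The multiplicity effect described in the abstract above can be illustrated with a small simulation. This is a sketch under assumptions, not the authors' analysis: all classifiers are taken to be independent with the same true accuracy, and the function name, test-set size, and number of classifiers are made-up illustration values.

```python
import random


def simulate_max_accuracy(num_classifiers, test_size, true_accuracy, trials, seed=0):
    """Average, over trials, of the best observed test accuracy among the classifiers.

    Every classifier has the same true accuracy, so any excess of the returned
    value over true_accuracy is due purely to picking the maximum of noisy estimates.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best_correct = 0
        for _ in range(num_classifiers):
            # Number of correctly classified test items for one classifier.
            correct = sum(rng.random() < true_accuracy for _ in range(test_size))
            best_correct = max(best_correct, correct)
        total += best_correct / test_size
    return total / trials


single = simulate_max_accuracy(num_classifiers=1, test_size=200,
                               true_accuracy=0.9, trials=200)
best_of_50 = simulate_max_accuracy(num_classifiers=50, test_size=200,
                                   true_accuracy=0.9, trials=200)
print("one classifier:", single)
print("best of 50:    ", best_of_50)
```

With a single classifier the average observed accuracy stays close to the true value, while the best of 50 equally good classifiers is consistently reported above it, which is the optimistic bias the abstract refers to.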

翻訳日:2023-04-04 21:03:10 公開日:2023-04-03

# 動的環境におけるディジタルツインベースv2x通信を実現するマルチモーダルシミュレーションフレームワーク

A Multi-Modal Simulation Framework to Enable Digital Twin-based V2X Communications in Dynamic Environments ( http://arxiv.org/abs/2303.06947v2 )

ライセンス: Link先を確認

Lorenzo Cazzella, Francesco Linsalata, Maurizio Magarini, Matteo Matteucci, Umberto Spagnolini

(参考訳) 近年,物理無線環境のためのDigital Twins (DT) が,物理通信機器における多層決定を可能にする伝搬環境の正確な仮想表現として提案されている。高周波帯では、DTは車体環境を特徴とする高移動環境において生じる課題を克服するのに役立つ。本稿では,V2X通信シナリオのDT作成のための新しいデータ駆動ワークフローと,現実的なセンサデータと正確なmmWave/sub-THz無線チャネルを生成するためのマルチモーダルシミュレーションフレームワークを提案する。提案手法は,Unreal Engineゲームエンジンと正確なレイトレーシングチャネルシミュレータに基づく,自動車シミュレーションおよびテストフレームワークを活用する。都市シナリオのシミュレーションでは、達成可能な現実的なセンサーとチャネルがインフラとエゴ車両の両方でモデル化されている。

Digital Twins (DTs) for physical wireless environments have been recently proposed as accurate virtual representations of the propagation environment that can enable multi-layer decisions at the physical communication equipment. At high frequency bands, DTs can help to overcome the challenges emerging in the high mobility conditions featuring vehicular environments. In this paper, we propose a novel data-driven workflow for the creation of the DT of a Vehicle-to-Everything (V2X) communication scenario and a multi-modal simulation framework for the generation of realistic sensor data and accurate mmWave/sub-THz wireless channels. The proposed method leverages an automotive simulation and testing framework based on the Unreal Engine game engine and an accurate ray-tracing channel simulator. Simulations over an urban scenario show the achievable realistic sensor and channel modelling both at the infrastructure and at an ego-vehicle.

翻訳日:2023-04-04 21:02:49 公開日:2023-04-03

# CoGANPPIS: タンパク質-タンパク質相互作用サイト予測のための共進化強化グローバルアテンションニューラルネットワーク

CoGANPPIS: Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction ( http://arxiv.org/abs/2303.06945v3 )

ライセンス: Link先を確認

Jiaxing Guo, Xuening Zhu, Zixin Hu, Xiaoxi Hu

(参考訳) タンパク質とタンパク質の相互作用は生化学的プロセスにおいて必須である。タンパク質-タンパク質相互作用部位(PPI)の正確な予測は、我々の生物学的メカニズムの理解を深め、新しい医薬品設計に不可欠である。しかし、従来のPPI予測実験手法はコストと時間を要するため、近年多くの計算手法、特にMLベースの手法が開発されている。これらの手法は満足度の高い結果を得たものの、まだ2つの制限がある。(1) 多くのモデルは有用な入力特徴を発掘しているが、残基間の関係の手がかりとなりうる共進化的特徴を考慮に入れていない。(2) 注意ベースモデルは近隣の残基に対してのみ注意重みを割り当てており、対象残基から遠く離れた残基も重要でありうることを無視している。我々は,CoGANPPISと呼ばれるPPI予測のためのシーケンスベースディープラーニングモデルである,共進化強化グローバルアテンションニューラルネットワークを提案する。このモデルは特徴抽出のために3つの層を並列に用いる。(1) 近傍残基の特徴を集約する局所レベル表現集約層、(2) 新しい共進化強化グローバルアテンション機構を用いて同一タンパク質配列上のすべての残基に注意重みを割り当てるグローバルレベル表現学習層、(3) 共進化情報にCNNとプーリングを適用して共進化プロファイル表現を得る共進化情報学習層。そして、3つの出力が連結され、最終予測のために複数の完全連結層に渡される。2つのベンチマークデータセットへの適用により、このモデルの最先端のパフォーマンスが実証された。ソースコードはhttps://github.com/Slam1423/CoGANPPIS_source_codeで公開されている。

Protein-protein interactions are essential in biochemical processes. Accurate prediction of the protein-protein interaction sites (PPIs) deepens our understanding of biological mechanism and is crucial for new drug design. However, conventional experimental methods for PPIs prediction are costly and time-consuming so that many computational approaches, especially ML-based methods, have been developed recently. Although these approaches have achieved gratifying results, there are still two limitations: (1) Most models have excavated some useful input features, but failed to take coevolutionary features into account, which could provide clues for inter-residue relationships; (2) The attention-based models only allocate attention weights for neighboring residues, instead of doing it globally, neglecting that some residues being far away from the target residues might also matter. We propose a coevolution-enhanced global attention neural network, a sequence-based deep learning model for PPIs prediction, called CoGANPPIS. It utilizes three layers in parallel for feature extraction: (1) Local-level representation aggregation layer, which aggregates the neighboring residues' features; (2) Global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all the residues on the same protein sequences; (3) Coevolutionary information learning layer, which applies CNN & pooling to coevolutionary information to obtain the coevolutionary profile representation. Then, the three outputs are concatenated and passed into several fully connected layers for the final prediction. Application on two benchmark datasets demonstrated a state-of-the-art performance of our model. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code.

翻訳日:2023-04-04 21:02:33 公開日:2023-04-03

# テンソル分解における実対数正準閾値の上界とベイズ推定への応用

Upper Bound of Real Log Canonical Threshold of Tensor Decomposition and its Application to Bayesian Inference ( http://arxiv.org/abs/2303.05731v2 )

ライセンス: Link先を確認

Naoki Yoshida and Sumio Watanabe

(参考訳) テンソル分解は現在、データ分析、情報圧縮、知識回復に使われている。しかし、テンソル分解は特異学習機械の1つであるため、その数学的性質はまだ完全には解明されていない。本稿では,代数幾何学的手法を用いてテンソル分解の実対数正準閾値(RLCT)の上界を与え,ベイズ一般化誤差を理論的に導出する。また,その数学的性質を数値実験によって考察する。

Tensor decomposition is now being used for data analysis, information compression, and knowledge recovery. However, the mathematical properties of tensor decomposition are not yet fully clarified because it is a singular learning machine. In this paper, we give an upper bound of the real log canonical threshold (RLCT) of tensor decomposition by using an algebraic-geometric method and derive its Bayesian generalization error theoretically. We also examine its mathematical properties through numerical experiments.

翻訳日:2023-04-04 21:00:54 公開日:2023-04-03

# SUD$^2$:画像再構成のための拡散モデルによるスーパービジョン

SUD$^2$: Supervision by Denoising Diffusion Models for Image Reconstruction ( http://arxiv.org/abs/2303.09642v2 )

ライセンス: Link先を確認

Matthew A. Chan, Sean I. Young, Christopher A. Metzler

(参考訳) 多くのイメージング逆問題(画像依存のin-paintingやdehazingなど)は、前方モデルが未知あるいは未知の潜在パラメータに依存しているため困難である。膨大な量のペアトレーニングデータでニューラルネットワークをトレーニングすることで、そのような問題を解決することができるが、ペアトレーニングデータはしばしば利用できない。本稿では,ペアトレーニングデータが少ない場合に,画像再構成ネットワークをトレーニングするための汎用フレームワークを提案する。特に,画像デノイジングアルゴリズム,ひいてはデノイジング拡散モデルが,ペアトレーニングデータがない場合のネットワークトレーニングを監督できることを示す。

Many imaging inverse problems, such as image-dependent in-painting and dehazing, are challenging because their forward models are unknown or depend on unknown latent parameters. While one can solve such problems by training a neural network with vast quantities of paired training data, such paired training data is often unavailable. In this paper, we propose a generalized framework for training image reconstruction networks when paired training data is scarce. In particular, we demonstrate the ability of image denoising algorithms and, by extension, denoising diffusion models to supervise network training in the absence of paired training data.

翻訳日:2023-04-04 20:54:31 公開日:2023-04-03

# 分割定数近似を超える形状パルスのシミュレーションと設計

Simulation and design of shaped pulses beyond the piecewise-constant approximation ( http://arxiv.org/abs/2303.09458v3 )

ライセンス: Link先を確認

Uluk Rasulov, Anupama Acharya, Marina Carravetta, Ilya Kuprov

(参考訳) 共振回路の応答関数は、入力が急速に変化するとリングアーティファクトを生成する。電磁分光学の物理的限界を探索すると、2種類の問題が発生する。まず、シミュレーション: システムは応答のトランジェントごとに正確に伝達されなければならず、計算コストがかかる。第二に、最適制御:回路応答を考慮に入れなければならない;そのような歪みに耐性のあるパルスを設計することが有利である。両問題の根源は回転するフレームの制御シーケンスに対する一般的な分割定数近似であり、磁気共鳴では初期から持続し、市販のハードウェアに絡み付いている。本稿では,スムーズな制御シーケンスを効率的にシミュレートし最適化できる最近のリー群法の実装とベンチマークについて報告する。

Response functions of resonant circuits create ringing artefacts if their input changes rapidly. When physical limits of electromagnetic spectroscopies are explored, this creates two types of problems. Firstly, simulation: the system must be propagated accurately through every response transient, which may be computationally expensive. Secondly, optimal control: circuit response must be taken into account, and it may be advantageous to design pulses that are resilient to such distortions. At the root of both problems is the popular piecewise-constant approximation for control sequences in the rotating frame; in magnetic resonance it has persisted since the earliest days and has become entrenched in the commercially available hardware. In this paper, we report an implementation and benchmarks of recent Lie-group methods that can efficiently simulate and optimise smooth control sequences.

翻訳日:2023-04-04 20:54:18 公開日:2023-04-03

# タンゴまで1回はかかるが、もっとトラブルを起こすのか? 文脈内学習に必要な実演数

It Takes One to Tango but More Make Trouble? The Number of Demonstrations Needed for In-Context Learning ( http://arxiv.org/abs/2303.08119v2 )

ライセンス: Link先を確認

Jiuhai Chen, LiChang Chen, Chen Zhu, Tianyi Zhou

(参考訳) 大規模言語モデル(LLM)は、インコンテキスト学習(ICL)によっていくつかの入出力デモ(demo)が提供されると複雑な推論を行うことができ、デモの中間的推論ステップ(CoT)が与えられるとより強力になる。ICLでマルチデモを使う必要はあるか? 本稿では,Wei et al. (2022) のタスクにおいて,各テストクエリのデモを減らしてICLについて検討する。驚いたことに、ランダムに選択された1つのデモのみを使用する場合でも、大きな劣化は観察されない。この現象を研究するために、各テストクエリに対して、デモを正しい回答を導く"正しいデモ"と、誤った回答に導く"間違ったデモ"に分類する。私たちの分析では、これらの広く研究されているデータセットに固有のバイアスが示されている。ほとんどのデモは、テストクエリの大部分に対して正しいものであり、これがランダムな1つのデモでも性能が良い理由を説明する。さらに、1つの正しいデモのみを使用するICL(CoTありとなし)は、これまでのほとんどの研究で採用されていた全デモICLよりも大幅に優れており、入力クエリに対する正しいデモを見つける際のLLMの弱点を示している。この弱点はバイアス付きデータセットでは評価が難しい。さらに,マルチデモを用いたICLでは,正しい(誤った)デモをより多く与えると精度が低下(向上)するという直観に反する挙動が観察される。これは、ICLがデモ間の干渉とそれらのスプリアス相関によって容易に誤解されることを意味する。我々の分析では、LLMのトレーニング、ICL、ベンチマーク設計で対処する必要があるいくつかの基本的な課題を取り上げている。

Large language models (LLMs) are capable of performing complex reasoning by in-context learning (ICL) when provided with a few input-output demonstrations (demos), and are more powerful when intermediate reasoning steps ("chain of thoughts (CoT)") of the demos are given. Is it necessary to use multiple demos in ICL? In this paper, we study ICL using fewer demos for each test query on the tasks of Wei et al. (2022). Surprisingly, we do not observe significant degradation when using only one randomly chosen demo. To study this phenomenon, for each test query, we categorize demos into "correct demos" leading to the correct answer, and "wrong demos" resulting in wrong answers. Our analysis reveals an inherent bias in those widely studied datasets: most demos are correct for a majority of test queries, which explains the good performance of using one random demo. Moreover, ICL (with and w/o CoT) using only one correct demo significantly outperforms all-demo ICL adopted by most previous works, indicating the weakness of LLMs in finding correct demo(s) for input queries, which is difficult to evaluate on the biased datasets. Furthermore, we observe a counterintuitive behavior of ICL using multiple demos, i.e., its accuracy degrades (improves) when given more correct (wrong) demos. This implies that ICL can be easily misguided by interference among demos and their spurious correlations. Our analyses highlight several fundamental challenges that need to be addressed in LLM training, ICL, and benchmark design.

翻訳日:2023-04-04 20:52:51 公開日:2023-04-03

# RepoCoder: 反復検索と生成によるリポジトリレベルのコード補完

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation ( http://arxiv.org/abs/2303.12570v2 )

ライセンス: Link先を確認

Fengji Zhang, Bei Chen, Yue Zhang, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

(参考訳) リポジトリレベルのコード補完のタスクは、リポジトリのより広いコンテキストに基づいて未完成のコードを書き続けることです。自動化されたコード補完ツールでは、異なるファイルに散在する有用な情報を利用するのは難しい。この課題に対処するためのシンプルで汎用的で効果的なフレームワークであるRepoCoderを提案する。類似度ベースのレトリバーと事前学習されたコード言語モデルを組み合わせて、リポジトリレベルのコード補完プロセスを合理化し、コード補完にリポジトリレベルの情報の有効利用を可能にし、様々なレベルの粒度でコードを生成する機能を提供する。さらに、RepoCoderは、検索コンテキストと目的とする完了目標とのギャップを埋める、新しい反復検索生成パラダイムを利用する。また、ライン、API呼び出し、ファンクションボディ補完シナリオをカバーする最新かつ高品質な現実世界リポジトリで構成される新しいベンチマークRepoEvalを提案する。コードレトリバーとジェネレータの様々な組み合わせを用いて,レポコーダの性能をテストする。実験の結果,レポコーダはゼロショットコード補完ベースラインを全設定で10%以上向上させ,バニラ検索によるコード補完アプローチを一貫して上回っていることがわかった。さらに,RepoCoderの有効性を総合分析により検証し,今後の研究に有用な知見を提供する。

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. However, it is difficult for automated code completion tools to utilize the useful information scattered in different files. We propose RepoCoder, a simple, generic, and effective framework to address the challenge. It streamlines the repository-level code completion process by incorporating a similarity-based retriever and a pre-trained code language model, which allows for the effective utilization of repository-level information for code completion and grants the ability to generate code at various levels of granularity. Furthermore, RepoCoder utilizes a novel iterative retrieval-generation paradigm that bridges the gap between retrieval context and the intended completion target. We also propose a new benchmark, RepoEval, which consists of the latest and high-quality real-world repositories covering line, API invocation, and function body completion scenarios. We test the performance of RepoCoder by using various combinations of code retrievers and generators. Experimental results indicate that RepoCoder significantly improves the zero-shot code completion baseline by over 10% in all settings and consistently outperforms the vanilla retrieval-augmented code completion approach. Furthermore, we validate the effectiveness of RepoCoder through comprehensive analysis, providing valuable insights for future research.
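The iterative retrieval-generation loop described in the abstract above can be sketched as follows. This is a toy illustration, not RepoCoder's implementation: the whitespace-token Jaccard similarity, the function names, and the stub `generate` callable (which stands in for a pre-trained code language model) are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity between the whitespace-token sets of two strings."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query, snippets, k):
    """Return the k repository snippets most similar to the query."""
    return sorted(snippets, key=lambda s: jaccard(query, s), reverse=True)[:k]


def iterative_completion(unfinished, snippets, generate, iterations=2, k=1):
    """Alternate retrieval and generation.

    The first round retrieves with the unfinished code alone; later rounds
    retrieve with the unfinished code plus the previous completion, so the
    query moves closer to the intended target.
    """
    query = unfinished
    completion = ""
    for _ in range(iterations):
        context = retrieve(query, snippets, k)
        completion = generate(context, unfinished)
        query = unfinished + "\n" + completion
    return completion


# Hypothetical repository snippets and a stub "model" that copies the top hit.
snippets = [
    "def add(a, b): return a + b",
    "def sub(a, b): return a - b",
    "class Foo: pass",
]
stub_generate = lambda context, prefix: context[0] if context else ""

result = iterative_completion("def add(a, b):", snippets, stub_generate)
print(result)
```

In a realistic setting `generate` would call a code language model with the retrieved snippets placed in its prompt; the loop structure is the part this sketch is meant to show.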

翻訳日:2023-04-04 20:45:29 公開日:2023-04-03

# MEGA: 生成AIの多言語評価

MEGA: Multilingual Evaluation of Generative AI ( http://arxiv.org/abs/2303.12528v2 )

ライセンス: Link先を確認

Kabir Ahuja and Rishav Hada and Millicent Ochieng and Prachi Jain and Harshita Diddee and Samuel Maina and Tanuja Ganu and Sameer Segal and Maxamed Axmed and Kalika Bali and Sunayana Sitaram

(参考訳) 生成AIモデルは、言語理解、推論、言語生成など、多くの自然言語処理タスクにおいて印象的なパフォーマンスを持つ。今日のAIコミュニティから求められている最も重要な質問の1つは、これらのモデルの能力と限界についてであり、生成的AIを評価することが非常に難しいことは明らかである。生成型大規模言語モデル(LLM)の研究のほとんどは英語に限られており、これらのモデルが他言語をいかに理解し生成できるかは不明である。そこで本研究では,生成型LLMの初の包括的ベンチマークであるMEGAを提示し,8つの多様なタスクと33の類型的に多様な言語を対象に,標準的なNLPベンチマークでモデルを評価する。また,これらのタスクにおいて生成型LLMの性能を最先端(SOTA)の非自己回帰モデルと比較し,前世代のLLMと比べて生成型モデルがどの程度の性能を示すかを明らかにする。本稿では, 言語間でのモデルの性能を徹底的に分析し, 生成型LLMが現在すべての言語に最適でない理由について論じる。我々は,多言語設定における生成型LLMの評価フレームワークを作成し,今後の発展に向けての方向性を提供する。

Generative AI models have impressive performance on many Natural Language Processing tasks such as language understanding, reasoning and language generation. One of the most important questions that is being asked by the AI community today is about the capabilities and limits of these models, and it is clear that evaluating generative AI is very challenging. Most studies on generative Large Language Models (LLMs) are restricted to English and it is unclear how capable these models are at understanding and generating other languages. We present the first comprehensive benchmarking of generative LLMs - MEGA, which evaluates models on standard NLP benchmarks, covering 8 diverse tasks and 33 typologically diverse languages. We also compare the performance of generative LLMs to State of the Art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform compared to the previous generation of LLMs. We present a thorough analysis of the performance of models across languages and discuss some of the reasons why generative LLMs are currently not optimal for all languages. We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field.

翻訳日:2023-04-04 20:45:06 公開日:2023-04-03

# ニューラルラジアンスフィールドの対話的幾何学的編集

Interactive Geometry Editing of Neural Radiance Fields ( http://arxiv.org/abs/2303.11537v2 )

ライセンス: Link先を確認

Shaoxu Li and Ye Pan

(参考訳) 本稿では,神経放射場操作のためのインタラクティブな幾何学的編集を可能にする手法を提案する。シーンの編集には2つのプロキシケージ(インナーケージと外部ケージ)を使用します。インナーケージは操作対象を定義し、アウターケージは調整空間を定義する。 2つのケージには様々な操作が適用される。ケージ選択後、インナーケージの操作は、インナーケージの所望の変換と外ケージの調整につながる。ユーザーは翻訳、回転、スケーリング、組み合わせでシーンを編集できる。角の操作やケージの端の操作もサポートされている。我々の手法は明示的な3次元幾何表現を必要としない。インタラクティブな幾何編集は、暗黙の神経放射場に直接適用される。広範な実験結果から,本手法の有効性が示された。

In this paper, we propose a method that enables interactive geometry editing of neural radiance fields. We use two proxy cages (an inner cage and an outer cage) to edit a scene. The inner cage defines the operation target, and the outer cage defines the adjustment space. Various operations apply to the two cages. After cage selection, operations on the inner cage lead to the desired transformation of the inner cage and adjustment of the outer cage. Users can edit the scene with translation, rotation, scaling, or combinations of these. Operations on the corners and edges of the cage are also supported. Our method does not need any explicit 3D geometry representation. The interactive geometry editing applies directly to the implicit neural radiance fields. Extensive experimental results demonstrate the effectiveness of our approach.

翻訳日:2023-04-04 20:44:12 公開日:2023-04-03

# 絡み合った送信機を有するマルチアクセスチャネル

The Multiple-Access Channel with Entangled Transmitters ( http://arxiv.org/abs/2303.10456v2 )

ライセンス: Link先を確認

Uzi Pereg, Christian Deppe, and Holger Boche

(参考訳) 従来型マルチアクセスチャネル(mac)と絡み合いリソースとの通信を考慮し,通信開始前に2つの送信機で絡み合いリソースを共有する。 leditzki et al. (2020) は、疑似テレパシーゲームで定義される古典的なmacの例を示し、絡み合った送信機との和率は、そのようなリソースのない最高の達成可能な和率よりも厳密に高いことを示した。ここでは,一般MACのキャパシティ領域とエンタングル送信器の完全なキャパシティ特性を導出し,この結果が特別な場合として得られることを示す。有限次元の補助変数とアンシラを含む単一の文字公式が確立される。これにより、このレート領域を達成するのに十分な絡み合い率が得られる。さらに、メッセージ平均誤差基準の下での古典的なmacの容量領域は、最大誤差基準よりも厳密に大きいことが長年知られている(dueck, 1978)。絡み合った資源が与えられた場合、その領域は一致する。

Communication over a classical multiple-access channel (MAC) with entanglement resources is considered, whereby two transmitters share entanglement resources before communication begins. Leditzky et al. (2020) presented an example of a classical MAC, defined in terms of a pseudo-telepathy game, such that the sum rate with entangled transmitters is strictly higher than the best achievable sum rate without such resources. Here, we derive a full characterization of the capacity region for the general MAC with entangled transmitters, and show that the previous result can be obtained as a special case. A single-letter formula is established involving auxiliary variables and ancillas of finite dimensions. This, in turn, leads to a sufficient entanglement rate to achieve the rate region. Furthermore, it has long been known that the capacity region of the classical MAC under a message-average error criterion can be strictly larger than with a maximal error criterion (Dueck, 1978). We observe that given entanglement resources, the regions coincide.

翻訳日:2023-04-04 20:42:38 公開日:2023-04-03

# TRAK: スケールでのモデル行動への貢献

TRAK: Attributing Model Behavior at Scale ( http://arxiv.org/abs/2303.14186v2 )

ライセンス: Link先を確認

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry

(参考訳) データ帰属の目的は、モデルの予測をトレーニングデータに遡ることである。この目標への長い努力にもかかわらず、データ帰属に対する既存のアプローチは、ユーザに計算の扱いやすさと有効性を選択させる傾向がある。すなわち、計算可能な手法は、非凸設定(ディープニューラルネットワークの文脈など)におけるモデル予測の正確な帰属に苦労するが、そのような手法では、数千のモデルを訓練する必要があるため、大規模モデルやデータセットでは実用的でない。本稿では,大規模で微分可能なモデルに対して,有効かつ計算的に抽出可能なデータ帰属法であるTRAK(Tracing with the Randomly-projected After Kernel)を紹介する。特に、わずかに訓練されたモデルを活用することで、TRAKは何千ものモデルのトレーニングを必要とする属性メソッドのパフォーマンスにマッチすることができる。我々は、イメージネットで訓練された画像分類器、視覚言語モデル(CLIP)、言語モデル(BERT、mT5)のTRAKの有用性を実証する。私たちは https://github.com/MadryLab/trak で TRAK を使用するためのコードを提供しています。

The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak .
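The "randomly projected" part of the method can be illustrated with a small sketch. The abstract does not spell out the estimator, so the code below is only an assumption-laden toy: it compresses per-example gradient vectors with a random Gaussian matrix and scores each training example by its inner product with the test example in the compressed space. The function names (`random_projection`, `project`, `attribution_scores`) and the hand-written gradient vectors are hypothetical, and the sketch leaves out anything the real TRAK library does beyond this single projection step.

```python
import random


def random_projection(dim_in, dim_out, seed=0):
    """Gaussian matrix of shape dim_out x dim_in, scaled so that squared
    norms are preserved in expectation."""
    rng = random.Random(seed)
    scale = dim_out ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, vec):
    """Multiply the projection matrix by one gradient vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def attribution_scores(train_grads, test_grad, dim_out=64, seed=0):
    """Score each training example by the inner product of its projected
    gradient with the projected test gradient (toy stand-in, not TRAK)."""
    proj = random_projection(len(test_grad), dim_out, seed)
    test_feat = project(proj, test_grad)
    return [sum(a * b for a, b in zip(project(proj, g), test_feat))
            for g in train_grads]


# Hypothetical per-example gradients (in practice these come from a model).
test_grad = [1.0, 2.0, -1.0, 0.5]
train_grads = [
    [1.0, 2.0, -1.0, 0.5],    # same direction as the test gradient
    [2.0, -1.0, 0.0, 0.0],    # orthogonal to the test gradient
    [-1.0, -2.0, 1.0, -0.5],  # opposite direction
]
scores = attribution_scores(train_grads, test_grad)
print(scores)
```

The aligned example receives a positive score and the opposed example the exact negative of it; the orthogonal example lands near zero only approximately, because a random projection preserves inner products in expectation rather than exactly.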

翻訳日:2023-04-04 20:36:02 公開日:2023-04-03

# Make-It-3D:拡散前の単一画像からの高忠実度3D創出

Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior ( http://arxiv.org/abs/2303.14184v2 )

ライセンス: Link先を確認

Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, Dong Chen

(参考訳) 本研究では,1枚の画像のみから高忠実度3Dコンテンツを作成する問題について検討する。基本的には下層の3d幾何学を推定し、目に見えないテクスチャを同時に幻覚させる。この課題に対処するために,訓練された2次元拡散モデルからの事前知識を活用し,3次元生成のための3次元認識監督を行う。提案手法であるMake-It-3Dは,2段階の最適化パイプラインを用いており,第1段階は前部からの基準画像からの制約を取り入れ,第2段階は粗いモデルをテクスチャ化された点雲に変換し,第2段階は参照画像から高品質なテクスチャを活用しながら,拡散により現実性を高める。広汎な実験により,本手法は先行研究よりも大きなマージンを達成し,忠実な再建と印象的な視覚的品質を実現した。本手法は,汎用オブジェクトの単一画像から高品質な3D作成を実現するための最初の試みであり,テキスト・ツー・3D作成やテクスチャ編集などの様々な応用を可能にする。

In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by incorporating constraints from the reference image at the frontal view and diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further elevates the realism with diffusion prior while leveraging the high-quality textures from the reference image. Extensive experiments demonstrate that our method outperforms prior works by a large margin, resulting in faithful reconstructions and impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects and enables various applications such as text-to-3D creation and texture editing.

翻訳日:2023-04-04 20:35:39 公開日:2023-04-03

# 物理インフォームドポイントネット:不規則な幾何の測地を同時に解くことができるか? 線形弾性への応用

Physics-informed PointNet: On how many irregular geometries can it solve an inverse problem simultaneously? Application to linear elasticity ( http://arxiv.org/abs/2303.13634v2 )

ライセンス: Link先を確認

Ali Kashefi, Leonidas J. Guibas, Tapan Mukerji

(参考訳) 正規物理情報ニューラルネットワーク(PINN)はスパースラベル付きデータを用いた偏微分方程式の解を1つの領域で予測する。一方、完全に教師付き学習モデルは通常、既知のソリューション(ラベル付きデータ)を持つ数千以上のドメインで訓練され、数百の未知のドメインでそのソリューションを予測する。物理インフォームドポイントネット(PIPN)は、PINN(弱教師付き学習モデル)と完全教師付き学習モデルの間のギャップを埋めるように設計されている。本稿では、PIPNが数百の領域に対して所望の偏微分方程式の解を同時に予測し、スパースラベル付きデータのみを使用することを示した。このフレームワークは、ラベル付きデータしか利用できない業界で高速な幾何学的設計の恩恵を受ける。特に, pipnは, 異なる地形を持つ500以上の領域において, 平面応力問題の解を同時に予測することを示した。さらに,顕著なバッチサイズの概念(すなわち,各サブエポックで pipn に供給されるジオメトリの数)を pipn に実装する先駆者でもある。具体的には,7,14,19,38,76,133のバッチサイズを試す。さらに、損失関数におけるスパースラベルデータの構成成分に対するPIPNサイズ、PIPNアーキテクチャにおける対称関数、および静的および動的重みの影響について検討した。

Regular physics-informed neural networks (PINNs) predict the solution of partial differential equations using sparse labeled data but only over a single domain. On the other hand, fully supervised learning models are first trained usually over a few thousand domains with known solutions (i.e., labeled data) and then predict the solution over a few hundred unseen domains. Physics-informed PointNet (PIPN) is primarily designed to fill this gap between PINNs (as weakly supervised learning models) and fully supervised learning models. In this article, we demonstrate that PIPN predicts the solution of desired partial differential equations over a few hundred domains simultaneously, while it only uses sparse labeled data. This framework benefits fast geometric designs in the industry when only sparse labeled data are available. Particularly, we show that PIPN predicts the solution of a plane stress problem over more than 500 domains with different geometries, simultaneously. Moreover, we pioneer implementing the concept of remarkable batch size (i.e., the number of geometries fed into PIPN at each sub-epoch) into PIPN. Specifically, we try batch sizes of 7, 14, 19, 38, 76, and 133. Additionally, the effect of the PIPN size, symmetric function in the PIPN architecture, and static and dynamic weights for the component of the sparse labeled data in the loss function are investigated.

翻訳日:2023-04-04 20:34:33 公開日:2023-04-03

# アクティブサンプリングを用いた病理組織学におけるデータ効率の良いコントラスト学習

Data Efficient Contrastive Learning in Histopathology using Active Sampling ( http://arxiv.org/abs/2303.16247v2 )

ライセンス: Link先を確認

Tahsin Reasat and David S. Smith

(参考訳) ディープラーニングに基づく診断システムは、デジタル病理学において正確で堅牢な定量的分析を提供することができる。これらのアルゴリズムは、病理組織像の高分解能のため、病理学では実用的でない大量の注釈付きトレーニングデータを必要とする。そこで,アドホックなプレテキストタスクを用いて特徴を学習するための自己指導手法が提案されている。自己教師型トレーニングプロセスは時間がかかり、学習した特徴空間、特にデータ不均衡の下で顕著な制約が欠如しているため、しばしばサブパー機能表現につながる。本研究では,少数のラベルと小さなプロキシネットワークを用いてトレーニングセットを積極的にサンプリングし,サンプル要求を93%削減し,トレーニング時間を99%削減することを提案する。

Deep-learning-based diagnostic systems can provide accurate and robust quantitative analysis in digital pathology. These algorithms require large amounts of annotated training data, which is impractical in pathology due to the high resolution of histopathological images. Hence, self-supervised methods have been proposed to learn features using ad-hoc pretext tasks. The self-supervised training process is time-consuming and often leads to subpar feature representations due to a lack of constraints on the learnt feature space, a problem that is particularly prominent under data imbalance. In this work, we propose to actively sample the training set using a handful of labels and a small proxy network, decreasing the sample requirement by 93% and training time by 99%.

翻訳日:2023-04-04 20:26:20 公開日:2023-04-03

# 群不変多様体の拡散写像

Diffusion Maps for Group-Invariant Manifolds ( http://arxiv.org/abs/2303.16169v2 )

ライセンス: Link先を確認

Paulina Hoyos and Joe Kileel

(参考訳) 本稿では、コンパクトリー群$K$の作用の下でデータセットが不変であるときの多様体学習問題を考察する。私たちのアプローチは、既存のデータポイントの$k$-orbitsを統合してデータ誘発グラフラプラシアンを増強することで、$k$-invariantグラフラプラシアン$l$を得るというものです。 l$ は、k$ のユニタリ既約表現行列を用いて対角化できることを証明し、その固有値と固有関数を計算するための明示的な公式を提供する。さらに、正規化されたラプラシア作用素 $L_N$ がデータ多様体のラプラス・ベルトラミ作用素に収束し、収束率が向上し、対称性群 $K$ の次元で改善が増加することを示す。この研究は、Landa と Shkolnisky のステアブルグラフ Laplacian フレームワークを $\operatorname{SO}(2)$ の場合には任意のコンパクトリー群に拡張する。

In this article, we consider the manifold learning problem when the data set is invariant under the action of a compact Lie group $K$. Our approach consists in augmenting the data-induced graph Laplacian by integrating over the $K$-orbits of the existing data points, which yields a $K$-invariant graph Laplacian $L$. We prove that $L$ can be diagonalized by using the unitary irreducible representation matrices of $K$, and we provide an explicit formula for computing its eigenvalues and eigenfunctions. In addition, we show that the normalized Laplacian operator $L_N$ converges to the Laplace-Beltrami operator of the data manifold with an improved convergence rate, where the improvement grows with the dimension of the symmetry group $K$. This work extends the steerable graph Laplacian framework of Landa and Shkolnisky from the case of $\operatorname{SO}(2)$ to arbitrary compact Lie groups.

翻訳日:2023-04-04 20:25:48 公開日:2023-04-03

# PADME-SoSci: 社会科学のための分析と分散機械学習のためのプラットフォーム

PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences ( http://arxiv.org/abs/2303.18200v2 )

ライセンス: Link先を確認

Zeyd Boukhers and Arnim Bleier and Yeliz Ucer Yediel and Mio Hienstorfer-Heitmann and Mehrshad Jaberansary and Adamantios Koumpis and Oya Beyan

(参考訳) データプライバシと所有権は、社会データ科学において重要であり、法的および倫理的な懸念を提起する。異なるパーティがデータの一部を所有している場合、データの共有と分析は難しい。この課題に対するアプローチは、分析のために収集する前にデータに非識別または匿名化技術を適用することである。しかし、これによりデータの有用性が低下し、再識別のリスクが高まる。これらの制約に対処するため,モデル実装とトレーニングを連携させる分散分析ツールであるPADMEを提案する。 PADMEは、モデルをすべてのパーティによって実装し、デプロイするフェデレートされたアプローチを使用して、トレーニングのために各データロケーションを漸進的に訪問する。これにより、すべてのデータが単一の場所にあるかのようにモデルをトレーニングしながら、ロケーションをまたいだデータ分析が可能になる。元の場所でデータに基づいてモデルをトレーニングすることは、データのオーナシップを保存する。さらに、すべてのデータロケーションで分析が完了するまで結果が提供されず、プライバシを確保し、結果のバイアスを回避する。

Data privacy and ownership are significant in social data science, raising legal and ethical concerns. Sharing and analyzing data is difficult when different parties own different parts of it. An approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach where the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, the results are not provided until the analysis is completed on all data locations to ensure privacy and avoid bias in the results.
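The "model visits each data location incrementally" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not PADME's actual interface: `LinearModel`, `train_at_station`, and `visit_all_stations` are hypothetical names, each "station" is just a local list of `(x, y)` pairs, and the model is a one-feature linear regressor trained by plain SGD. Only the model parameters travel between stations, and the result is returned only after every station has been visited.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Tiny model y = w * x + b; only these two numbers leave a station."""
    w: float = 0.0
    b: float = 0.0


def train_at_station(model, local_data, lr=0.05, epochs=10):
    """Run a few SGD epochs on data that stays at its own location."""
    for _ in range(epochs):
        for x, y in local_data:
            err = model.w * x + model.b - y
            model.w -= lr * err * x
            model.b -= lr * err
    return model


def visit_all_stations(stations, rounds=50):
    """Send the model from station to station; release it only at the end."""
    model = LinearModel()
    for _ in range(rounds):
        for local_data in stations:
            train_at_station(model, local_data)
    return model


# Hypothetical stations, each holding part of the data for y = 2x + 1.
stations = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(-1.0, -1.0)],
]
model = visit_all_stations(stations)
print(model)
```

On this noiseless toy data the visiting model approaches the same line that pooled training would find, which mirrors the abstract's claim about training "as if all data were in a single location".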

翻訳日:2023-04-04 20:17:30 公開日:2023-04-03

# 今日の連続学習アルゴリズムはどの程度効率的か?

How Efficient Are Today's Continual Learning Algorithms? ( http://arxiv.org/abs/2303.18171v2 )

ライセンス: Link先を確認

Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Christopher Kanan

(参考訳) Supervised Continual Learningでは、ラベル付きデータのストリームからディープニューラルネットワーク(DNN)を更新する。ほとんどの研究は破滅的な忘れを克服することに重点を置いているが、継続的学習の背景にある大きな動機の1つは、トレーニングデータセットをスクラッチからトレーニングするのではなく、新しい情報でネットワークを効率的に更新できることだ。最近の連続的な学習手法は破滅的な忘れ問題を主に解決しているが、これらのアルゴリズムの効率性にはほとんど注意が払われていない。本稿では,近年のインクリメンタルなクラス学習手法について検討し,計算,メモリ,記憶の面では非常に非効率であることを示す。スクラッチからトレーニングするよりも多くの計算を必要とするメソッドもあります! 連続学習が現実の応用性を持つためには、研究コミュニティはこれらのアルゴリズムが使用するリソースを無視できない。破滅的な忘れを和らげるより連続的な学習がある。

Supervised continual learning involves updating a deep neural network (DNN) from an ever-growing stream of labeled data. While most work has focused on overcoming catastrophic forgetting, one of the major motivations behind continual learning is being able to efficiently update a network with new information, rather than retraining from scratch on the training dataset as it grows over time. Despite recent continual learning methods largely solving the catastrophic forgetting problem, there has been little attention paid to the efficiency of these algorithms. Here, we study recent methods for incremental class learning and illustrate that many are highly inefficient in terms of compute, memory, and storage. Some methods even require more compute than training from scratch! We argue that for continual learning to have real-world applicability, the research community cannot ignore the resources used by these algorithms. There is more to continual learning than mitigating catastrophic forgetting.

翻訳日:2023-04-04 20:17:14 公開日:2023-04-03

# Poster: トレーニングDNNにおけるバイアス、ノード感度、ロングテール分布の関連性

Poster: Link between Bias, Node Sensitivity and Long-Tail Distribution in trained DNNs ( http://arxiv.org/abs/2303.16589v2 )

ライセンス: Link先を確認

Mahum Naseer and Muhammad Shafique

(参考訳) 優れた学習(と再学習)能力のため、ディープニューラルネットワーク(DNN)は多くの現実世界のアプリケーションで使われている。しかし、これらのデータ駆動機械学習モデルの学習は、トレーニングで利用できるデータと同じくらい一般的に優れている。したがって、長いテール分布を持つトレーニングデータセットは、異なる出力クラス間で異なるレベルの分類性能を提供する可能性があるため、dnnにとって課題となる。このようなネットワークの全体的なバイアスはすでに既存の研究で強調されているが、この研究は異なる出力クラスに対するノードの感度の変化につながるノードバイアスを特定する。私たちの知る限りでは、これはDNNにおけるこのユニークな課題を強調し、その可能性について議論し、この新しい研究の方向性にオープンな課題を提供する最初の作品です。実世界のデータセットでトレーニングされたネットワークの実証的なケーススタディを用いて、推論を支援する。

Owing to their remarkable learning (and relearning) capabilities, deep neural networks (DNNs) find use in numerous real-world applications. However, the learning of these data-driven machine learning models is generally as good as the data available to them for training. Hence, training datasets with long-tail distribution pose a challenge for DNNs, since the DNNs trained on them may provide a varying degree of classification performance across different output classes. While the overall bias of such networks is already highlighted in existing works, this work identifies the node bias that leads to a varying sensitivity of the nodes for different output classes. To the best of our knowledge, this is the first work highlighting this unique challenge in DNNs, discussing its probable causes, and providing open challenges for this new research direction. We support our reasoning using an empirical case study of the networks trained on a real-world dataset.

翻訳日:2023-04-04 20:16:22 公開日:2023-04-03

# グラフニューラルネットワークの事前トレーニングはいつか? データ生成の観点からの答え!

When to Pre-Train Graph Neural Networks? An Answer from Data Generation Perspective! ( http://arxiv.org/abs/2303.16458v2 )

ライセンス: Link先を確認

Yuxuan Cao, Jiarong Xu, Carl Yang, Jiaan Wang, Yunchao Zhang, Chunping Wang, Lei Chen, Yang Yang

(参考訳) 近年,グラフ事前学習が注目されており,グラフデータから伝達可能な知識を学習して下流の性能を向上させることを目指している。これらの最近の試みにもかかわらず、下流タスクにグラフ事前学習モデルを適用する場合、負の転送は大きな問題である。既存の作業は、事前トレーニングの方法と、多数のグラフ事前トレーニングと微調整戦略を設計することで、事前トレーニングの方法の問題に多大な努力を払っていた。しかし、戦略がどんなに進歩しても、「事前訓練と微調整」のパラダイムは依然として明確な利益を得られないケースがある。本稿では,事前トレーニングや微調整を行う前に,事前トレーニングをいつ行うか(つまり,どのような状況でグラフ事前トレーニングを活用できるか)という重要な質問に答える汎用フレームワークw2pgnnを紹介する。まず,新しい視点から,事前学習データから下流データへの複雑な生成メカニズムを探索する。特に、w2pgnnは、まず事前トレーニングされたデータをgraphonベースに適合させ、graphon基底(すなわちgraphon)の各要素は、事前トレーニングされたグラフの集合によって共有される基本的な転送可能なパターンを識別する。グラフェン塩基のすべての凸結合は生成空間を生じさせ、そこから生成されたグラフは、事前学習の恩恵を受ける下流データのための解空間を形成する。これにより、発電機空間内の任意の発電機からの下流データの生成確率として事前学習の実現可能性を定量化することができる。 W2PGNNは、グラフ事前トレーニングモデルの適用範囲の提供、事前トレーニングの実行可能性の定量化、事前トレーニングデータの選択による下流のパフォーマンス向上など、幅広い3つのアプリケーションを提供している。後者の2つの応用について, 理論上, 合理的な解法と広範な経験的正当性を与える。

Recently, graph pre-training has attracted wide research attention; it aims to learn transferable knowledge from unlabeled graph data so as to improve downstream performance. Despite these recent attempts, negative transfer is a major issue when applying graph pre-trained models to downstream tasks. Existing works have made great efforts on the issues of what to pre-train and how to pre-train by designing a number of graph pre-training and fine-tuning strategies. However, there are indeed cases where, no matter how advanced the strategy is, the "pre-train and fine-tune" paradigm still cannot achieve clear benefits. This paper introduces a generic framework, W2PGNN, to answer the crucial question of when to pre-train (i.e., in what situations we could take advantage of graph pre-training) before performing effortful pre-training or fine-tuning. We start from a new perspective to explore the complex generative mechanisms from the pre-training data to downstream data. In particular, W2PGNN first fits the pre-training data into graphon bases, where each element of the graphon bases (i.e., a graphon) identifies a fundamental transferable pattern shared by a collection of pre-training graphs. All convex combinations of graphon bases give rise to a generator space, and the graphs generated from it form the solution space for those downstream data that can benefit from pre-training. In this manner, the feasibility of pre-training can be quantified as the generation probability of the downstream data from any generator in the generator space. W2PGNN provides three broad applications: providing the application scope of graph pre-trained models, quantifying the feasibility of performing pre-training, and helping select pre-training data to enhance downstream performance. We give a theoretically sound solution for the first application and extensive empirical justifications for the latter two applications.
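The generator-space construction can be made concrete with a small sketch. It assumes, purely for illustration, that each graphon basis is a step function stored as a square matrix of edge probabilities; the abstract does not say how W2PGNN represents or fits its bases. The names `convex_combination` and `sample_graph` are hypothetical. The code forms one generator as a convex combination of two bases and then samples a graph from it in the usual graphon fashion: each node draws a latent block, and each pair of nodes is connected with the probability the generator assigns to their blocks.

```python
import random


def convex_combination(bases, weights):
    """Mix step-function graphons (k x k matrices) with simplex weights."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    k = len(bases[0])
    return [[sum(w * basis[i][j] for w, basis in zip(weights, bases))
             for j in range(k)] for i in range(k)]


def sample_graph(graphon, n_nodes, seed=0):
    """Sample an undirected graph: latent block per node, then one
    Bernoulli draw per node pair with the block-pair probability."""
    rng = random.Random(seed)
    k = len(graphon)
    blocks = [min(int(rng.random() * k), k - 1) for _ in range(n_nodes)]
    edges = set()
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < graphon[blocks[i]][blocks[j]]:
                edges.add((i, j))
    return edges


# Two hypothetical bases: a "community" pattern and a "bipartite" pattern.
bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
generator = convex_combination(bases, [0.25, 0.75])
edges = sample_graph(generator, n_nodes=6)
print(generator, sorted(edges))
```

Varying the weights over the simplex sweeps out the generator space; the feasibility score described in the abstract would then ask how likely a given downstream graph is under some generator in that space, which this sketch does not attempt to compute.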

翻訳日:2023-04-04 20:16:09 公開日:2023-04-03

# 安定拡散に対するクエリフリー逆攻撃に関するパイロット研究

A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion ( http://arxiv.org/abs/2303.16378v2 )

ライセンス: Link先を確認

Haomin Zhuang, Yihua Zhang and Sijia Liu

(参考訳) Stable Diffusionによるテキスト・トゥ・イメージ(T2I)生成における記録破りのパフォーマンスにもかかわらず、その敵対的頑健性には研究の注意があまり払われていない。本研究では,Stable Diffusionに対する敵対的攻撃生成の問題について検討し,エンドツーエンドのモデルクエリがなくても,敵対的なテキストプロンプトが得られるかどうかを問う。結果の問題を「クエリフリーアタック生成」と呼ぶ。この問題を解決するために、T2Iモデルの脆弱性は、テキストエンコーダの堅牢性の欠如、例えば、Stable Diffusionの攻撃に使用されるCLIPテキストエンコーダに根ざしていることを示す。このような知見に基づいて,非ターゲットのクエリフリーアタックとターゲットのクエリフリーアタックの両方を提案する。前者は,テキスト埋め込み空間において最も影響力のある次元に基づいて構築されており,我々はこれを「ステアブルキー次元」と呼んでいる。提案する攻撃を活用し,テキストプロンプトに対する5文字の摂動のみが,Stable Diffusionを用いた合成画像の重要なコンテンツシフトを誘発できることを実証的に示す。さらに,提案するターゲット攻撃は拡散モデルを正確に制御し,対象画像コンテンツをスクラブし,非対象画像コンテンツに大きな変化を生じさせないことを示す。私たちのコードは https://github.com/OPTML-Group/QF-Attack で利用可能です。

Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable Diffusion, less research attention is paid to its adversarial robustness. In this work, we study the problem of adversarial attack generation for Stable Diffusion and ask if an adversarial text prompt can be obtained even in the absence of end-to-end model queries. We call the resulting problem 'query-free attack generation'. To resolve this problem, we show that the vulnerability of T2I models is rooted in the lack of robustness of text encoders, e.g., the CLIP text encoder used for attacking Stable Diffusion. Based on such insight, we propose both untargeted and targeted query-free attacks, where the former is built on the most influential dimensions in the text embedding space, which we call steerable key dimensions. By leveraging the proposed attacks, we empirically show that a perturbation of only five characters in the text prompt is able to cause a significant content shift in images synthesized with Stable Diffusion. Moreover, we show that the proposed targeted attack can precisely steer the diffusion model to scrub the targeted image content without causing much change in untargeted image content. Our code is available at https://github.com/OPTML-Group/QF-Attack.

翻訳日:2023-04-04 20:15:38 公開日:2023-04-03

# YOLO-v7特徴量を用いたVVC符号化ビデオにおける物体検出精度の向上

Accuracy Improvement of Object Detection in VVC Coded Video Using YOLO-v7 Features ( http://arxiv.org/abs/2304.00689v1 )

ライセンス: Link先を確認

Takahiro Shindo, Taiju Watanabe, Kein Yamada, Hiroshi Watanabe

(参考訳) ディープラーニングに基づく画像認識技術の進歩に伴い、人工知能による自動ビデオ解析が普及している。画像認識に使用される映像の量が増加するにつれて、このような映像データの効率的な圧縮方法が必要となる。一般的に、画像符号化により画質が劣化すると、画像認識精度も低下する。そこで本稿では,符号化映像に後処理を適用することにより,画像認識精度,特に物体検出精度を向上させるニューラルネットワークに基づく手法を提案する。 Versatile Video Coding (VVC) は, ビデオ圧縮法として, 最高の符号化性能を有する最新のビデオ符号化法である。ニューラルネットワークは、最新のオブジェクト検出モデルであるYOLO-v7の特徴を使ってトレーニングされている。 VVCをビデオ符号化法とし、YOLO-v7を検出モデルとし、低ビットレートでも高い物体検出精度を実現する。実験の結果,提案手法とvvcの組み合わせにより,対象検出精度が通常のvvcよりも高い符号化性能が得られることがわかった。

With advances in image recognition technology based on deep learning, automatic video analysis by Artificial Intelligence is becoming more widespread. As the amount of video used for image recognition increases, efficient compression methods for such video data are necessary. In general, when the image quality deteriorates due to image encoding, the image recognition accuracy also falls. Therefore, in this paper, we propose a neural-network-based approach to improve image recognition accuracy, especially object detection accuracy, by applying post-processing to the encoded video. Versatile Video Coding (VVC) is used as the video compression method, since it is the latest video coding method with the best encoding performance. The neural network is trained using the features of YOLO-v7, the latest object detection model. By using VVC as the video coding method and YOLO-v7 as the detection model, high object detection accuracy is achieved even at low bit rates. Experimental results show that the combination of the proposed method and VVC achieves better coding performance than regular VVC in object detection accuracy.

翻訳日:2023-04-04 16:55:34 公開日:2023-04-03

# マクロな低損失フォノンキャビティによる最小長の制約の改善

Improved Constraints on the Minimum Length with a Macroscopic Low Loss Phonon Cavity ( http://arxiv.org/abs/2304.00688v1 )

ライセンス: Link先を確認

William M. Campbell and Michael E. Tobar and Serge Galliou and Maxim Goryachev

(参考訳) 重力の量子的記述を定式化しようとする多くの理論は、基本的な最小長スケールの存在を示唆している。この最小長を組み込む一般的な方法は、一般化不確実性原理(generalized uncertainty principle, gup)として知られるハイゼンベルクの不確実性原理の修正である。複合システムに適用されたGUPの実験実験は、機械共振器モードの誘導周波数摂動を探索することにより、特定のシナリオにおける最小長の度合いを制限できる。本研究は, 従来の機械式共振器による制約を, 極低温クォーツバルク波共振器を用いて3桁の精度で改善するものである。純粋な機械的共振モードだけでなく、ハイブリッド電気機械的反共振モードも検討し、同じGUP誘発効果に敏感であることを示した。

Many theories that attempt to formulate a quantum description of gravity suggest the existence of a fundamental minimum length scale. A popular method for incorporating this minimum length is through a modification of the Heisenberg uncertainty principle known as the generalised uncertainty principle (GUP). Experimental tests of the GUP applied to composite systems can be performed by searching for the induced frequency perturbations of the modes of mechanical resonators, thus constraining the degree of minimum length in certain scenarios. In this work, previous constraints made with mechanical resonators are improved upon by three orders of magnitude via the utilisation of a cryogenic quartz bulk acoustic wave resonator. In addition to purely mechanical resonant modes, hybrid electromechanical anti-resonant modes are investigated and shown to be sensitive to the same GUP-induced effects.

翻訳日:2023-04-04 16:55:18 公開日:2023-04-03

# 視覚タスクのための視覚言語モデル:調査

Vision-Language Models for Vision Tasks: A Survey ( http://arxiv.org/abs/2304.00685v1 )

ライセンス: Link先を確認

Jingyi Zhang, Jiaxing Huang, Sheng Jin and Shijian Lu

(参考訳) ほとんどの視覚認識研究は、ディープニューラルネットワーク(dnn)トレーニングにおけるクラウドラベルデータに大きく依存しており、それらは通常、単一の視覚認識タスクごとにdnnを訓練し、手間と時間を要する視覚認識パラダイムへと繋がる。この2つの課題に対処するため、視覚言語モデル(VLM)は近年、インターネット上でほぼ無限に利用できるWebスケールの画像テキストペアからリッチな視覚言語相関を学習し、単一のVLMを用いて様々な視覚認識タスクのゼロショット予測を可能にする、集中的に研究されている。 This paper provides a systematic review of visual language models for various visual recognition tasks, including: (1) the background that introduces the development of visual recognition paradigms; (2) the foundations of VLM that summarize the widely-adopted network architectures, pre-training objectives, and downstream tasks; (3) the widely-adopted datasets in VLM pre-training and evaluations; (4) the review and categorization of existing VLM pre-training methods, VLM transfer learning methods, and VLM knowledge distillation methods; (5) the benchmarking, analysis and discussion of the reviewed methods; (6) several research challenges and potential research directions that could be pursued in the future VLM studies for visual recognition. この調査に関連するプロジェクトはhttps://github.com/jingyi0000/vlm_surveyで作成されている。

Most visual recognition studies rely heavily on crowd-labelled data in deep neural network (DNN) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm. To address these two challenges, Vision-Language Models (VLMs) have been intensively investigated recently. A VLM learns rich vision-language correlations from web-scale image-text pairs that are almost infinitely available on the Internet, and it enables zero-shot predictions on various visual recognition tasks with a single model. This paper provides a systematic review of vision-language models for various visual recognition tasks, including: (1) the background that introduces the development of visual recognition paradigms; (2) the foundations of VLM that summarize the widely adopted network architectures, pre-training objectives, and downstream tasks; (3) the widely adopted datasets in VLM pre-training and evaluations; (4) the review and categorization of existing VLM pre-training methods, VLM transfer learning methods, and VLM knowledge distillation methods; (5) the benchmarking, analysis and discussion of the reviewed methods; (6) several research challenges and potential research directions that could be pursued in future VLM studies for visual recognition. A project associated with this survey has been created at https://github.com/jingyi0000/VLM_survey.

翻訳日:2023-04-04 16:55:05 公開日:2023-04-03

# 超伝導量子ビットの最適リセット

Optimizing resetting of superconducting qubits ( http://arxiv.org/abs/2304.00684v1 )

ライセンス: Link先を確認

Ciro M. Diniz, Rogerio J. de Assis, Norton G. de Almeida and Celso J. Villas-Boas

(参考訳) 多くの量子アルゴリズムは、信頼できる統計結果を得るために多数の繰り返しを要求する。したがって、それぞれの繰り返しにおいて、量子ビットを可能な限り短時間で効率よく正確にリセットする必要があるため、量子コンピュータは古典的よりも有利である。本研究では,超伝導量子ビットにおける情報リセットのための3種類のモデルについて詳細な解析を行う。我々の実験装置は、主量子ビットの情報を消去するために使用される、異なる補助散逸系に結合された主量子ビットで構成されている。解析の結果,主キュービットのリセット時間を削減するために補助系に関連する結合や散逸率を増加させるには不十分であり,各研究手法のパラメータの最適集合を見出すことが動機となり,解析した3つのモデルのリセット時間を大幅に減少させることができた。

Many quantum algorithms demand a large number of repetitions to obtain reliable statistical results. Thus, at each repetition it is necessary to reset the qubits efficiently and precisely in the shortest possible time, so that quantum computers actually have advantages over classical ones. In this work, we perform a detailed analysis of three different models for information resetting in superconducting qubits. Our experimental setup consists of a main qubit coupled to different auxiliary dissipative systems, which are used to erase the information stored in the main qubit. Our analysis shows that simply increasing the coupling and the dissipation rate associated with the auxiliary systems is not enough to decrease the resetting time of the main qubit, a fact that motivates us to find the optimal set of parameters for each studied approach, allowing a significant decrease in the reset time of the three models analyzed.
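Why "more dissipation" does not automatically mean "faster reset" can be seen in a textbook toy model, which is an assumption made here for illustration and not one of the three models analyzed in the paper. Take a qubit (restricted to a single excitation) exchanging energy at rate `g` with an auxiliary mode that loses energy at rate `kappa`. The amplitudes obey da/dt = -i g b and db/dt = -i g a - (kappa/2) b, so the qubit amplitude satisfies a'' + (kappa/2) a' + g^2 a = 0 with roots -kappa/4 ± sqrt((kappa/4)^2 - g^2). The slowest root sets the reset time: its decay rate equals kappa/4 for kappa ≤ 4g and falls off roughly as 2g^2/kappa for kappa much larger than g, so the optimum sits at kappa = 4g. The function name `slowest_decay_rate` and the parameter values are hypothetical.

```python
import cmath


def slowest_decay_rate(g, kappa):
    """Amplitude decay rate of the slowest mode of a qubit coupled (g)
    to a lossy auxiliary mode (energy decay rate kappa)."""
    disc = cmath.sqrt((kappa / 4.0) ** 2 - g ** 2)
    roots = (-kappa / 4.0 + disc, -kappa / 4.0 - disc)
    # Decay rate of a mode is minus the real part of its root.
    return min(-root.real for root in roots)


g = 1.0
for kappa in (1.0, 4.0, 40.0):
    print(f"kappa = {kappa:5.1f}  slowest decay rate = "
          f"{slowest_decay_rate(g, kappa):.4f}")
```

In this toy picture, weak dissipation leaves the excitation sloshing back and forth, while very strong dissipation effectively decouples the qubit from its drain; both extremes reset more slowly than the critically damped choice.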

翻訳日:2023-04-04 16:54:46 公開日:2023-04-03

# 極性超強結合:基底状態における量子絡み合い

Polaritonic Ultrastrong Coupling: Quantum Entanglement in Ground State ( http://arxiv.org/abs/2304.00680v1 )

ライセンス: Link先を確認

Qingtian Miao and G.S. Agarwal

(参考訳) 物質の基本励起と微小キャビティモードの超強結合は、完全に解析的な量子力学理論の枠組みで研究されている。初等励起はフォノン、励起子、プラズモンなどである。ハミルトニアンの対角化から、我々はポラリトンハミルトニアンの基底状態を得る。グラウンドステートはガウスクラスに属する。ガウスの性質を用いて基底状態における量子交絡を計算する。量子エンタングルメントには、エンタングルメントエントロピーと対数的負のパラメータの2つの異なる測度を使い、エンタングルメント測度に対してかなり単純な解析式を得る。以上の結果から,超強結合系では基底状態の量子絡み合い量が非常に大きいことがわかった。偏光子周波数の測定から得られる。

The ultrastrong coupling between the elementary excitations of matter and microcavity modes is studied in a fully analytical quantum-mechanical theoretical framework. The elementary excitations could be phonons, excitons, plasmons, etc. From the diagonalization of the Hamiltonian, we obtain the ground state of the polariton Hamiltonian. The ground state belongs to the Gaussian class. Using the Gaussian property, we calculate the quantum entanglement in the ground state. We use two different measures of quantum entanglement, the entanglement entropy and the logarithmic negativity, and obtain rather simple analytical expressions for them. Our findings show that the amount of quantum entanglement in the ground state is quite significant in the ultrastrong coupling regime. It can be obtained from the measurement of the polariton frequencies.
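For Gaussian states, both entanglement measures named above reduce to closed-form functions of a few numbers, which the following sketch illustrates. It does not reproduce the paper's expressions in terms of polariton frequencies; instead it uses, as a stand-in, a two-mode squeezed vacuum with squeezing parameter `r`, a standard example of a pure two-mode Gaussian state. With the convention that the vacuum has symplectic eigenvalue 1/2, the reduced single-mode state has nu = cosh(2r)/2, its entropy is (nu + 1/2) ln(nu + 1/2) - (nu - 1/2) ln(nu - 1/2), and the logarithmic negativity (natural logarithm) is 2|r|. The function names are hypothetical.

```python
import math


def gaussian_mode_entropy(nu):
    """Von Neumann entropy (nats) of a single-mode Gaussian state with
    symplectic eigenvalue nu >= 1/2 (vacuum corresponds to nu = 1/2)."""
    if nu < 0.5:
        raise ValueError("symplectic eigenvalue must be at least 1/2")
    plus, minus = nu + 0.5, nu - 0.5
    entropy = plus * math.log(plus)
    if minus > 0.0:  # the term x * ln(x) vanishes in the limit x -> 0
        entropy -= minus * math.log(minus)
    return entropy


def two_mode_squeezed_vacuum(r):
    """Return (entanglement entropy, logarithmic negativity) for a
    two-mode squeezed vacuum with squeezing parameter r."""
    nu = 0.5 * math.cosh(2.0 * r)
    return gaussian_mode_entropy(nu), 2.0 * abs(r)


for r in (0.0, 0.1, 0.5, 1.0):
    entropy, log_neg = two_mode_squeezed_vacuum(r)
    print(f"r = {r:.1f}  entropy = {entropy:.4f}  log-negativity = {log_neg:.4f}")
```

Both measures vanish at r = 0 and grow with the squeezing, which is the qualitative behaviour one would expect as the light-matter coupling strengthens.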

翻訳日:2023-04-04 16:54:33 公開日:2023-04-03

# cv2x-loca:自律走行車のための路側ユニット対応協調ローカライズフレームワーク

CV2X-LOCA: Roadside Unit-Enabled Cooperative Localization Framework for Autonomous Vehicles ( http://arxiv.org/abs/2304.00676v1 )

ライセンス: Link先を確認

Zilin Huang, Sikai Chen, Yuzhuang Pian, Zihao Sheng, Soyoung Ahn, and David A. Noyce

(参考訳) 都市部での安全な運転を可能にするために、正確なロバストな位置決めシステムは自動運転車(AV)にとって不可欠である。既存のグローバルナビゲーション衛星システム(GNSS)ベースの手法はオープンスキー地域での車両の配置に有効であるが、多層橋の下層や高層道路、トンネルなどの都市キャニオンでの高精度の位置決めは依然として課題である。本稿では,セルラーV2X(C-V2X)無線通信がGNSS環境下でのAVのローカライズ性能を向上させる可能性について検討する。具体的には,C-V2Xチャネル状態情報のみを用いてレーンレベルの位置決め精度を実現する,第1の道路側ユニット(RSU)対応協調ローカライゼーションフレームワーク,CV2X-LOCAを提案する。 CV2X-LOCAは、データ処理モジュール、粗い位置決めモジュール、環境パラメータ修正モジュール、車両軌道フィルタリングモジュールの4つの重要な部分から構成されている。これらのモジュールは、動的C-V2Xネットワークに存在する課題を共同で処理する。 CV2X-LOCAは, 高速走行, スパースRSUのカバー環境において, 騒音条件下であっても, 車両位置決めの最先端性能を実現する。この研究結果は、rsusの費用対効果に関する運輸機関の今後の投資決定に関する洞察も提供する。

An accurate and robust localization system is crucial for autonomous vehicles (AVs) to enable safe driving in urban scenes. While existing global navigation satellite system (GNSS)-based methods are effective at locating vehicles in open-sky regions, achieving high-accuracy positioning in urban canyons such as lower layers of multi-layer bridges, streets beside tall buildings, tunnels, etc., remains a challenge. In this paper, we investigate the potential of cellular-vehicle-to-everything (C-V2X) wireless communications in improving the localization performance of AVs under GNSS-denied environments. Specifically, we propose the first roadside unit (RSU)-enabled cooperative localization framework, namely CV2X-LOCA, that only uses C-V2X channel state information to achieve lane-level positioning accuracy. CV2X-LOCA consists of four key parts: data processing module, coarse positioning module, environment parameter correcting module, and vehicle trajectory filtering module. These modules jointly handle challenges present in dynamic C-V2X networks. Extensive simulation and field experiments show that CV2X-LOCA achieves state-of-the-art performance for vehicle localization even under noisy conditions with high-speed movement and sparse RSUs coverage environments. The study results also provide insights into future investment decisions for transportation agencies regarding deploying RSUs cost-effectively.

翻訳日:2023-04-04 16:54:19 公開日:2023-04-03

# フィルタインバージョンによる部分ビューオブジェクトビュー合成

Partial-View Object View Synthesis via Filtered Inversion ( http://arxiv.org/abs/2304.00673v1 )

ライセンス: Link先を確認

Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

(参考訳) 本研究では,1つか数つの部分ビューからレンダリング可能な3dオブジェクト表現を予測する学習フレームワークおよび最適化プロセスであるfiltering inversion(finv)を提案する。 FINVは、部分的な観察からオブジェクトの新たなビューを合成するという課題に対処する。これを達成するため、finvは3次元生成モデルを訓練して形状事前学習を行う。推測において、新しい現実世界のオブジェクトの1つ以上のビューが与えられたとき、FINVはまず、生成モデルを複数の初期シードから反転させることで、オブジェクトの潜在コードを見つける。潜伏符号のセットの維持、finvフィルタの検証、およびパーティクルフィルタリングのような新しい観察を受けた後の再サンプリング。次にジェネレータは、利用可能なビューの各潜在コードに対して微調整され、新しいオブジェクトに適応する。 FINVは, 合成対象にのみ訓練された場合でも, 現実の物体(例えば, 椅子, テーブル, 車)の新規な視点を合成することに成功した。 sim-to-real問題に対処する能力により、FINVは実際のデータセットなしでオブジェクトカテゴリに使用できる。 FINVは、複数の実世界のデータセット上で最先端のパフォーマンスを達成し、部分的およびスパースなビューからオブジェクトの形状とテクスチャを回復し、閉塞に対して堅牢であり、より多くの観測でその表現を漸進的に改善することができる。

We propose Filtering Inversion (FINV), a learning framework and optimization process that predicts a renderable 3D object representation from one or few partial views. FINV addresses the challenge of synthesizing novel views of objects from partial observations, spanning cases where the object is not entirely in view, is partially occluded, or is only observed from similar views. To achieve this, FINV learns shape priors by training a 3D generative model. At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds. Maintaining the set of latent codes, FINV filters and resamples them after receiving each new observation, akin to particle filtering. The generator is then finetuned for each latent code on the available views in order to adapt to novel objects. We show that FINV successfully synthesizes novel views of real-world objects (e.g., chairs, tables, and cars), even if the generative prior is trained only on synthetic objects. The ability to address the sim-to-real problem allows FINV to be used for object categories without real-world datasets. FINV achieves state-of-the-art performance on multiple real-world datasets, recovers object shape and texture from partial and sparse views, is robust to occlusion, and is able to incrementally improve its representation with more observations.

翻訳日:2023-04-04 16:53:54 公開日:2023-04-03

# CRN: 高精度でロバストで効率的な3D知覚のためのカメラレーダネット

CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception ( http://arxiv.org/abs/2304.00670v1 )

ライセンス: Link先を確認

Youngseok Kim, Sanmin Kim, Juyeb Shin, Jun Won Choi, Dongsuk Kum

(参考訳) 自律運転には、3Dオブジェクトの検出、追跡、セグメンテーションを含む正確で高速な3D知覚システムが必要である。最近の低コストカメラベースのアプローチは有望な結果を示しているが、照明の悪さや悪天候の影響を受けやすいため、局所誤差が大きい。したがって、精密な長距離測定を提供し、すべての環境で確実に作動する低コストのレーダーカメラは有望であるが、まだ十分に調査されていない。本稿では,様々なタスクに対して,意味的にリッチで空間的に正確なbird's-eye-view(bev)特徴マップを生成する,新しいカメラ・レーダー融合フレームワークであるcamer radar net(crn)を提案する。画像中の空間情報の欠如を克服するため、視線ビュー画像の特徴をスパースで正確なレーダーポイントの助けを借りてBEVに変換する。入力間の空間的不一致に対処するために設計されたマルチモーダル変形可能な注意を用いて,bevにおける画像とレーダ特徴マップをさらに集約する。リアルタイム設定のCRNは20FPSで動作し、nuScenes上のLiDAR検出器と同等の性能を達成し、100m設定で遠くでも性能を向上する。さらに、オフライン設定のCRNは、nuScenesテストセットで62.4%のNDS、57.5%のmAPを出力し、全カメラおよびカメラレーダー3Dオブジェクト検出器の中で第1位である。

Autonomous driving requires an accurate and fast 3D perception system that includes 3D object detection, tracking, and segmentation. Although recent low-cost camera-based approaches have shown promising results, they are susceptible to poor illumination or bad weather conditions and have a large localization error. Hence, fusing cameras with low-cost radar, which provides precise long-range measurement and operates reliably in all environments, is promising but has not yet been thoroughly investigated. In this paper, we propose Camera Radar Net (CRN), a novel camera-radar fusion framework that generates a semantically rich and spatially accurate bird's-eye-view (BEV) feature map for various tasks. To overcome the lack of spatial information in an image, we transform perspective view image features to BEV with the help of sparse but accurate radar points. We further aggregate image and radar feature maps in BEV using multi-modal deformable attention designed to tackle the spatial misalignment between inputs. In its real-time setting, CRN operates at 20 FPS while achieving performance comparable to LiDAR detectors on nuScenes, and even outperforms them at far distances in the 100 m setting. Moreover, in its offline setting, CRN yields 62.4% NDS and 57.5% mAP on the nuScenes test set and ranks first among all camera and camera-radar 3D object detectors.

翻訳日:2023-04-04 16:53:28 公開日:2023-04-03

# SAR ATRにおけるディープラーニングの非因性発見と説明

Discovering and Explaining the Non-Causality of Deep Learning in SAR ATR ( http://arxiv.org/abs/2304.00668v1 )

ライセンス: Link先を確認

Weijie Li, Wei Yang, Li Liu, Wenpeng Zhang, Yongxiang Liu

(参考訳) 合成開口レーダ自動目標認識(SAR ATR)は、SAR画像解釈において重要な技術の一つであり、軍事・民間分野で重要な応用分野である。この分野ではディープラーニングが広く使われており、近年ではベンチマークデータセット上で優れた認識率を達成している。しかし、ベンチマークデータセットは単一のデータ収集条件のため、データ選択バイアスに悩まされる。このデータバイアスは、深層学習モデルを強化し、非因果的背景クラッタを過度に適合させる。また,既存の手法ではモデル因果関係を定性的に分析し,このデータバイアスを深く分析していない。本稿では,データ選択バイアスがモデルの非因果性やclutterのスプリアス相関につながることを示す。まず,Shapley値を用いて,学習過程における目標領域,乱れ領域,影領域の寄与を定量化する。乱雑な貢献は、トレーニングプロセス中に大きな割合を占める。第2に、SAR ATRにおけるディープラーニングの非因果性の原因は、データ選択バイアスとモデルテクスチャバイアスである。データ選択バイアスはクラス関連クラッタと偽の特徴表現をもたらす。さらに,トレーニングセットとテストセットの類似した信号対クラッタ比(scr)からクラッタのスプリアス相関が生じる。最後に,クラッタのオーバーフィットを低減するためのランダムscr再重み付け手法を提案する。しかし、モデルのテクスチャバイアスは、データバイアスを取り除いた後にモデルの複雑さとともに増加する。ベンチマークMSTARデータセットの標準動作条件下での異なるモデルの実験結果から,上記の結論が得られた。

Synthetic aperture radar automatic target recognition (SAR ATR) is one of the critical technologies for SAR image interpretation, with important application prospects in military and civilian fields. Deep learning has been widely used in this area and has achieved an excellent recognition rate on the benchmark dataset in recent years. However, the benchmark dataset suffers from data selection bias due to a single data collection condition. This data bias encourages deep learning models to overfit non-causal background clutter. Moreover, existing methods analyze the model causality only qualitatively and do not deeply analyze this data bias. In this paper, we explicitly show that the data selection bias leads to the non-causality of the model and spurious correlation of clutter. First, we quantify the contribution of the target, clutter, and shadow regions during the training process through the Shapley value. The clutter contribution accounts for a large proportion during the training process. Second, the causes of the non-causality of deep learning in SAR ATR include data selection bias and model texture bias. Data selection bias results in class-related clutter and false feature representation. Furthermore, the spurious correlation of clutter arises from the similar signal-to-clutter ratios (SCR) between the training and test sets. Finally, we propose a random SCR re-weighting method to reduce overfitting to clutter. However, the model texture bias increases with model complexity after removing data bias. The experimental results of different models under the standard operating condition of the benchmark MSTAR dataset prove the above conclusions.
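
The Shapley-value attribution over the three image regions can be sketched as follows. This is a minimal illustration of the exact Shapley formula for three players, not the authors' pipeline: the `TOY_SCORES` table is invented for the example (it stands in for a model score measured when only some regions are visible) and none of its numbers come from the paper.

```python
from itertools import permutations

PLAYERS = ("target", "clutter", "shadow")

# Hypothetical score of a classifier when only the listed regions are kept.
# These values are made up for illustration only.
TOY_SCORES = {
    frozenset(): 0.10,
    frozenset({"target"}): 0.55,
    frozenset({"clutter"}): 0.40,
    frozenset({"shadow"}): 0.20,
    frozenset({"target", "clutter"}): 0.90,
    frozenset({"target", "shadow"}): 0.65,
    frozenset({"clutter", "shadow"}): 0.45,
    frozenset({"target", "clutter", "shadow"}): 0.97,
}


def shapley_values(players, value):
    """Exact Shapley values: average marginal gain over all player orderings."""
    contributions = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        coalition = frozenset()
        for player in ordering:
            extended = coalition | {player}
            contributions[player] += value(extended) - value(coalition)
            coalition = extended
    return {p: total / len(orderings) for p, total in contributions.items()}


if __name__ == "__main__":
    print(shapley_values(PLAYERS, TOY_SCORES.__getitem__))
```

The contributions always sum to the score of the full image minus the score of the empty coalition, so a large clutter share directly indicates how much of the prediction rests on background rather than on the target.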

翻訳日:2023-04-04 16:53:01 公開日:2023-04-03

# 人工境界条件を用いた量子力学の量子シミュレーション

Quantum Simulation for Quantum Dynamics with Artificial Boundary Conditions ( http://arxiv.org/abs/2304.00667v1 )

ライセンス: Link先を確認

Shi Jin and Nana Liu and Xiantao Li and Yue Yu

(参考訳) 量子力学 (quantum dynamics) は、時間依存的なシュリンガー方程式(英語版)(Schr\"odinger equation)とエルミート・ハミルトン方程式(英語版)(Hermitian Hamiltonian)という形で表される、量子コンピューティングの自然な応用である。しかし、電子の放出を伴う量子力学をシミュレートする際には、固定領域内で計算を限定するために人工境界条件(ABC)を用いる必要がある。 ABCの導入は力学のハミルトン構造を変え、進化がもはやユニタリではないため、既存の量子アルゴリズムを直接適用することはできない。本稿では,非エルミート力学をschr\"odinger形式に変換するための最近導入されたschr\"odingerization method (jin et al. arxiv:2212.13969 and arxiv:2212.14703) を用いた。本手法は,複素吸収ポテンシャル法,完全整合層法,dirichlet-to-neumann法を含む3種類のabcに対して実装する。これらのアルゴリズムの問合せ複雑性を分析し,数値実験を行い,その妥当性を検証した。これは、非有界領域における量子力学の利用可能な量子アルゴリズムと計算モデルの間のギャップを埋めるのに役立つ。

Quantum dynamics, typically expressed in the form of a time-dependent Schrödinger equation with a Hermitian Hamiltonian, is a natural application for quantum computing. However, when simulating quantum dynamics that involves the emission of electrons, it is necessary to use artificial boundary conditions (ABC) to confine the computation within a fixed domain. The introduction of ABCs alters the Hamiltonian structure of the dynamics, and existing quantum algorithms cannot be directly applied since the evolution is no longer unitary. The current paper utilizes a recently introduced Schrödingerisation method (Jin et al. arXiv:2212.13969 and arXiv:2212.14703) that converts non-Hermitian dynamics to a Schrödinger form, for the artificial boundary problems. We implement this method for three types of ABCs, including the complex absorbing potential technique, perfectly matched layer methods, and Dirichlet-to-Neumann approach. We analyze the query complexity of these algorithms, and perform numerical experiments to demonstrate the validity of this approach. This helps to bridge the gap between available quantum algorithms and computational models for quantum dynamics in unbounded domains.

翻訳日:2023-04-04 16:52:43 公開日:2023-04-03

# テキスト駆動型ソフトマスクによるマルチモーダル表現学習

Multi-Modal Representation Learning with Text-Driven Soft Masks ( http://arxiv.org/abs/2304.00719v1 )

ライセンス: Link先を確認

Jaeyoo Park, Bohyung Han

(参考訳) 本稿では,新しい操作,損失,データ拡張戦略を導入することにより,自己教師付き学習フレームワーク内で視覚言語表現学習手法を提案する。まず、画像中の特定の単語に最も関係のある領域をソフトマスキングすることで、画像テキストマッチング(itm)タスクの多様な特徴を生成する。本フレームワークは細かなアノテーションを伴わない画像キャプチャペアのみに依存するため,マルチモーダルエンコーダを用いて単語条件の視覚的注意を演算することにより,各単語の関連領域を識別する。第2に,画像テキストコントラスト学習(image-text contrastive learning, itc)の目的に対して焦点損失を提示することで,ハードだが多様な例に焦点を合わせることを奨励する。最後に,テキストのマスキングと画像の歪みのレンダリングにより,様々な例をマイニングすることで,自己教師あり学習のためのマルチモーダルデータ拡張を行う。これらの3つのイノベーションの組み合わせは、事前学習されたモデルを学ぶのに効果的であり、複数の視覚言語下流タスクにおいて優れたパフォーマンスをもたらす。

We propose a visual-linguistic representation learning approach within a self-supervised learning framework by introducing a new operation, loss, and data augmentation strategy. First, we generate diverse features for the image-text matching (ITM) task by soft-masking the regions in an image that are most relevant to a certain word in the corresponding caption, instead of completely removing them. Since our framework relies only on image-caption pairs with no fine-grained annotations, we identify the regions relevant to each word by computing the word-conditional visual attention using a multi-modal encoder. Second, we encourage the model to focus more on hard but diverse examples by proposing a focal loss for the image-text contrastive learning (ITC) objective, which alleviates the inherent limitations of overfitting and bias issues. Last, we perform multi-modal data augmentations for self-supervised learning via mining various examples by masking texts and rendering distortions on images. We show that the combination of these three innovations is effective for learning a pretrained model, leading to outstanding performance on multiple vision-language downstream tasks.
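
A focal weighting of a contrastive objective can be sketched as below. This assumes the common focal-loss form, in which the cross-entropy term is scaled by `(1 - p) ** gamma`; the abstract does not give the exact formulation, so the function names, the `gamma` and `temperature` defaults, and the similarity inputs are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_contrastive_loss(similarities, positive_index, gamma=2.0, temperature=0.07):
    """Contrastive cross-entropy for one anchor, down-weighted on easy examples.

    similarities: similarity of the anchor to each candidate (e.g. cosine).
    positive_index: position of the matching candidate.
    gamma = 0 recovers the plain softmax cross-entropy.
    """
    probs = softmax([s / temperature for s in similarities])
    p = probs[positive_index]
    return -((1.0 - p) ** gamma) * math.log(p)


if __name__ == "__main__":
    easy = focal_contrastive_loss([0.9, 0.1, -0.2], positive_index=0)
    hard = focal_contrastive_loss([0.1, 0.9, -0.2], positive_index=0)
    print(easy, hard)
```

Pairs the model already matches confidently contribute almost nothing, so the gradient is dominated by the hard pairs, which is the behaviour the abstract attributes to its focal ITC loss.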

翻訳日:2023-04-04 16:46:25 公開日:2023-04-03

# minirbt:中国製2段蒸留小型プリトレーニングモデル

MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model ( http://arxiv.org/abs/2304.00717v1 )

ライセンス: Link先を確認

Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang

(参考訳) 自然言語処理では、事前訓練された言語モデルが重要な基盤となっている。しかしながら、これらのモデルは、大きなサイズ、長い推論時間、困難なデプロイメントといった問題に悩まされることが多い。さらに、ほとんどの主流の事前訓練モデルは英語に焦点を合わせており、小さな中国の事前訓練モデルについての研究は不十分である。本稿では,中国語の自然言語処理の研究を進めることを目的とした,中国語事前学習モデルMiniRBTを紹介する。 MiniRBTは狭く深い学生モデルを採用し、事前訓練中に全単語のマスキングと2段階の蒸留を取り入れ、下流の作業に適している。機械読解とテキスト分類タスクに関する実験により,MiniRBTはRoBERTaと比較して94%の性能を実現し,6.8倍の高速化を実現した。

In natural language processing, pre-trained language models have become essential infrastructures. However, these models often suffer from issues such as large size, long inference time, and challenging deployment. Moreover, most mainstream pre-trained models focus on English, and there are insufficient studies on small Chinese pre-trained models. In this paper, we introduce MiniRBT, a small Chinese pre-trained model that aims to advance research in Chinese natural language processing. MiniRBT employs a narrow and deep student model and incorporates whole word masking and two-stage distillation during pre-training to make it well-suited for most downstream tasks. Our experiments on machine reading comprehension and text classification tasks reveal that MiniRBT achieves 94% performance relative to RoBERTa, while providing a 6.8x speedup, demonstrating its effectiveness and efficiency.

翻訳日:2023-04-04 16:46:03 公開日:2023-04-03

# 量子チャネルと量子状態のいくつかの絶対性質

Quantum channels and some absolute properties of quantum states ( http://arxiv.org/abs/2304.00711v1 )

ライセンス: Link先を確認

Tapaswini Patro, Kaushiki Mukherjee, Nirman Ganguly

(参考訳) 環境相互作用は、量子情報処理プロトコルの実際の応用においてユビキタスである。このような相互作用は量子資源の枯渇をもたらす。量子情報の文脈における2つの重要なメリットは、完全に絡み合った分数(FEF)と複合量子系の条件エントロピーである。 FEFはテレポーテーションのようなタスクで重要な役割を担います。一方、条件エントロピーは特定の量子状態に対して負となりうるので、負性は密度の高い符号化や状態の融合といったタスクの資源として残っている。 2f $ > 1/d $ a $ d \otimes d $ quantum system は重要なしきい値であるが、いくつかの量子状態において、グローバルユニタリ操作においても閾値以下であり、結果として絶対完全絡み合い分数(afef)を持つ状態として知られる。条件付きフォン・ノイマンエントロピーを含む状態は、大域的ユニタリ作用の下で条件付きエントロピーの非負性を保持する状態があり、絶対的条件付きフォン・ノイマンエントロピー非負性状態 (ACVENN) と呼ばれる。本稿では,量子チャネルの作用を2つの量子ビットと2つのquditで検証し,ある量子状態が非絶対的状態から絶対的状態へと作用することを示す。グローバルなユニタリ操作は絶対的でない状態に戻すことができないため、絡み合いスワッピングネットワークを用いた検索のための処方料を提供する。さらに、絶対性の概念を条件R'enyiエントロピーに拡張し、絶対条件R'enyiエントロピー非負性(ACRENN)を持つ状態に必要な条件を求める。次に、三成分系の限界を含むように作業を拡張し、上記の絶対性に関してそれらの特徴付けを提供する。

Environmental interactions are ubiquitous in any real-world application of a quantum information processing protocol. Such interactions result in depletion of quantum resources. Two important figures of merit in the context of quantum information are the fully entangled fraction (FEF) and the conditional entropy of a composite quantum system. FEF has a key role to play in tasks like teleportation. Conditional entropy, on the other hand, can be negative for certain quantum states, and thus the negativity remains a resource for tasks like dense coding and state merging. FEF $ > 1/d $ for a $ d \otimes d $ quantum system is a significant threshold; however, for some quantum states it remains below the threshold even with global unitary operations, and such states are consequently known as states having absolute fully entangled fraction (AFEF). Pertaining to conditional von Neumann entropy, there are some states which retain the nonnegativity of the conditional entropy under global unitary action, called states with the absolute conditional von Neumann entropy nonnegative (ACVENN) property. In the present submission, we probe the action of some quantum channels in two qubits and two qudits and find that some quantum states move from the non-absolute regime to the absolute regime under the action. Since global unitary operations are unable to retrieve them back to the non-absolute regime, we provide a prescription for the retrieval using an entanglement swapping network. Furthermore, we extend the notion of absoluteness to conditional Rényi entropies and find the required condition for a state to have the absolute conditional Rényi entropy non-negative (ACRENN) property. We then extend the work to include the marginals of a tripartite system and provide their characterization with respect to the aforementioned absolute properties.

翻訳日:2023-04-04 16:45:48 公開日:2023-04-03

# ユニバーサルブレイディング量子ゲート

Universal Braiding Quantum Gates ( http://arxiv.org/abs/2304.00710v1 )

ライセンス: Link先を確認

David Lovitz

(参考訳) ヤン・バクスター方程式と様々な形式は、統計力学、結び目理論、量子情報など多くの分野に応用されている。ブレンド・ヤン・バクスター方程式のユニタリ解は、位相量子コンピュータの量子ゲートとして特に興味深い。量子計算においてユニタリかつ普遍的である任意の次元の解に対する単純な構成を示す。また、ある一般化されたyang-baxter方程式に対する解の族を完全に分類し、方程式の特定の例に等式のスカラー倍である解しか持たないことを証明する。

The Yang-Baxter equation and its various forms have applications in many fields, including statistical mechanics, knot theory, and quantum information. Unitary solutions of the braided Yang-Baxter equation are of particular interest as quantum gates for topological quantum computers. We demonstrate a simple construction for solutions in any dimension, which are both unitary and universal for quantum computation. We also fully classify a family of solutions to certain generalized Yang-Baxter equations and prove that certain instances of the equation only have solutions that are scalar multiples of the identity.

翻訳日:2023-04-04 16:45:13 公開日:2023-04-03

# 調整可能な確率的再構成誤差と平均シフトアウトリアースコアを用いたオートエンコーダに基づくアウトリアー検出の改善

Improving Autoencoder-based Outlier Detection with Adjustable Probabilistic Reconstruction Error and Mean-shift Outlier Scoring ( http://arxiv.org/abs/2304.00709v1 )

ライセンス: Link先を確認

Xu Tan, Jiawei Yang, Junqi Chen, Sylwan Rahardja, Susanto Rahardja

(参考訳) オートエンコーダは多くの機械学習タスクで、強力な学習能力のおかげで広く使われており、異常検出の分野で研究者の間で大きな関心を集めている。しかし,従来のオートエンコーダ方式には2つの側面があった。これにより、異常検出のパフォーマンスが制限された。まず,従来のオートエンコーダにおける平均二乗誤差は,その表現能力を制限したオートエンコーダの判定の不確実性を無視した。第2に、オートエンコーダは異常なリコンストラクション問題に苦しめられ、いくつかのアウトリアーは予期せぬほどうまく再構築され、インリアーからの識別が困難になる。上記の問題を緩和するため,本論文では2つの新しい手法を提案する。まず, 復元バイアスと判断不確実性の両方を考慮し, 確率的再構成誤差(pre)という新しい損失関数を構築した。これら2つの因子のトレードオフをさらに制御するために、前生成型確率的再構成誤差(apre)において2つの重みが導入された。第二に、平均シフト(MSS)に基づく概念的に新しい外れ値スコアリング法が提案され、オートエンコーダによって生じる誤りのインリエを低減する。 32個の実世界の外れ値検出データセットの実験により,提案手法の有効性が確認された。提案手法の組み合わせは, 最良ベースラインと比較して, 性能向上率の41%を達成した。 MSSは複数のオートエンコーダベースのアウトリア検出器の性能を平均20%改善した。提案する2つの手法は, 異常検出におけるオートエンコーダの開発を促進する可能性を秘めている。コードはwww.outliernet.comで再現可能である。

Autoencoders were widely used in many machine learning tasks thanks to their strong learning ability, which has drawn great interest among researchers in the field of outlier detection. However, conventional autoencoder-based methods lacked considerations in two aspects, which limited their performance in outlier detection. First, the mean squared error used in conventional autoencoders ignored the judgment uncertainty of the autoencoder, which limited their representation ability. Second, autoencoders suffered from the abnormal reconstruction problem: some outliers can be unexpectedly reconstructed well, making them difficult to distinguish from the inliers. To mitigate the aforementioned issues, two novel methods were proposed in this paper. First, a novel loss function named Probabilistic Reconstruction Error (PRE) was constructed to factor in both reconstruction bias and judgment uncertainty. To further control the trade-off of these two factors, two weights were introduced in PRE, producing Adjustable Probabilistic Reconstruction Error (APRE), which benefited the outlier detection in different applications. Second, a conceptually new outlier scoring method based on mean-shift (MSS) was proposed to reduce the false inliers caused by the autoencoder. Experiments on 32 real-world outlier detection datasets proved the effectiveness of the proposed methods. The combination of the proposed methods achieved a 41% relative performance improvement compared to the best baseline. The MSS improved the performance of multiple autoencoder-based outlier detectors by an average of 20%. The proposed two methods have the potential to advance autoencoder's development in outlier detection. The code is available on www.OutlierNet.com for reproducibility.
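
The abstract does not specify how the mean-shift score is computed, so the following is only one plausible reading, offered as a sketch: each point is repeatedly replaced by the mean of its `k` nearest neighbours, and the score is how far the point travelled. The function name, `k`, `iterations`, and the toy data are assumptions, and the coupling to an autoencoder's reconstruction is deliberately left out.

```python
import math


def mean_shift_scores(points, k=3, iterations=3):
    """Score each point by its displacement under k-NN mean shifting.

    points: list of equal-length coordinate tuples.
    Each iteration moves every point to the mean of its k nearest
    neighbours (the point itself counts as its own nearest neighbour).
    """
    shifted = [list(p) for p in points]
    for _ in range(iterations):
        updated = []
        for p in shifted:
            neighbours = sorted(shifted, key=lambda q: math.dist(p, q))[:k]
            updated.append([sum(axis) / k for axis in zip(*neighbours)])
        shifted = updated
    # Isolated points are dragged a long way; points inside a cluster barely move.
    return [math.dist(p, s) for p, s in zip(points, shifted)]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
    print(mean_shift_scores(data))
```

In this toy set the isolated point at `(5.0, 5.0)` is pulled toward the cluster and therefore receives the largest score.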

翻訳日:2023-04-04 16:45:04 公開日:2023-04-03

# 滑らかな共分散を伴うオンライン最小二乗SGDの高次元スケーリング限界と揺らぎ

High-dimensional scaling limits and fluctuations of online least-squares SGD with smooth covariance ( http://arxiv.org/abs/2304.00707v1 )

ライセンス: Link先を確認

Krishnakumar Balasubramanian, Promit Ghosal, Ye He

(参考訳) オンライン最小二乗確率勾配降下(sgd)アルゴリズムの高次元スケーリング限界とゆらぎを,データ生成モデルの特性を明示的に考慮して導出する。提案手法では,SGDを相互作用粒子系として繰り返し処理し,その相互作用は入力の共分散構造によって特徴づけられる。 8階までのモーメント上の滑らか性条件を仮定し、ガウス性を明確に仮定することなく、無限次元常微分方程式(odes)または確率微分方程式(sdes)の形で高次元のスケーリング限界とゆらぎを確立する。その結果,イテレートの正確な3段階の相転移が明らかになった。弾道性から拡散性,そしてノイズのばらつきが低レベルから中程度に,そして極端に高いノイズ設定へと変化する。低雑音環境では、(スケールした)反復の正確なゆらぎを無限次元のSDEとして特徴づける。また、導出制限ODEとSDEに対する解の存在と特異性を示す。その結果, 限界平均二乗推定や予測誤差のキャラクタリゼーションや, 限界方程式を解析的あるいは数値的に解くことで得られる変動など, いくつかの応用が得られた。

We derive high-dimensional scaling limits and fluctuations for the online least-squares Stochastic Gradient Descent (SGD) algorithm by taking the properties of the data generating model explicitly into consideration. Our approach treats the SGD iterates as an interacting particle system, where the expected interaction is characterized by the covariance structure of the input. Assuming smoothness conditions on moments of order up to eight, and without explicitly assuming Gaussianity, we establish the high-dimensional scaling limits and fluctuations in the form of infinite-dimensional Ordinary Differential Equations (ODEs) or Stochastic Differential Equations (SDEs). Our results reveal a precise three-step phase transition of the iterates: their behavior goes from ballistic, to diffusive, and finally to purely random as the noise variance goes from low, to moderate, and finally to very high. In the low-noise setting, we further characterize the precise fluctuations of the (scaled) iterates as infinite-dimensional SDEs. We also show the existence and uniqueness of solutions to the derived limiting ODEs and SDEs. Our results have several applications, including characterization of the limiting mean-square estimation or prediction errors and their fluctuations, which can be obtained by analytically or numerically solving the limiting equations.

翻訳日:2023-04-04 16:44:38 公開日:2023-04-03

# d-score:突然変異演算子に基づくcnnのホワイトボックス診断スコア

D-Score: A White-Box Diagnosis Score for CNNs Based on Mutation Operators ( http://arxiv.org/abs/2304.00697v1 )

ライセンス: Link先を確認

Xin Zhang and Yuqi Song and Xiaofeng Wang and Fei Zuo

(参考訳) 畳み込みニューラルネットワーク(cnns)は、自動運転や医療診断など、多くの安全クリティカルな領域に広く適用されている。標準テスト方法はテストセットにおけるモデルのパフォーマンスを評価するが、低品質で不十分なテストセットは信頼性の低い評価結果につながり、予期せぬ結果をもたらす可能性がある。したがって、cnnを総合的に評価する方法と、評価結果に基づいて、信頼度を高める方法が緊急対応すべき重要な課題である。以前の研究では、cnnのテストセットを評価するために突然変異試験を用いた。しかし、評価スコアはブラックボックスであり、テスト対象として十分に明示されていない。本稿では,突然変異演算子と画像変換を用いてモデルの特徴と注意分布を算出し,さらに,モデルのロバスト性とデータセットへの適合性を反映したd-scoreという診断スコアを提示するホワイトボックス診断手法を提案する。また,D-Scoreに基づくデータ拡張手法を提案し,CNNの性能を翻訳や再スケーリングに拡張する。広く使われている2つのデータセットと3つのCNNに関する総合的な実験は、我々のアプローチの有効性を実証している。

Convolutional neural networks (CNNs) have been widely applied in many safety-critical domains, such as autonomous driving and medical diagnosis. However, concerns have been raised with respect to the trustworthiness of these models: The standard testing method evaluates the performance of a model on a test set, while low-quality and insufficient test sets can lead to unreliable evaluation results, which can have unforeseeable consequences. Therefore, how to comprehensively evaluate CNNs and, based on the evaluation results, how to enhance their trustworthiness are the key problems to be urgently addressed. Prior work has used mutation tests to evaluate the test sets of CNNs. However, the evaluation scores are black boxes and not explicit enough for what is being tested. In this paper, we propose a white-box diagnostic approach that uses mutation operators and image transformation to calculate the feature and attention distribution of the model and further present a diagnosis score, namely D-Score, to reflect the model's robustness and fitness to a dataset. We also propose a D-Score based data augmentation method to enhance the CNN's performance to translations and rescalings. Comprehensive experiments on two widely used datasets and three commonly adopted CNNs demonstrate the effectiveness of our approach.

翻訳日:2023-04-04 16:44:14 公開日:2023-04-03

# 熱拡散関数(TSF):物理誘導材料分類

Thermal Spread Functions (TSF): Physics-guided Material Classification ( http://arxiv.org/abs/2304.00696v1 )

ライセンス: Link先を確認

Aniket Dashpute, Vishwanath Saragadam, Emma Alexander, Florian Willomitzer, Aggelos Katsaggelos, Ashok Veeraraghavan, Oliver Cossairt

(参考訳) ロバストで非破壊的な物質分類は、多くの視覚応用において難しいが重要な第一歩である。本研究では,物体の熱特性に依存する物理誘導材料分類フレームワークを提案する。我々の重要な観察は、物体の加熱と冷却の速度が、材料の固有の性質、すなわち放射率と拡散率に依存することである。熱カメラが加熱・冷却過程の計測を捉えている間、この観察を低出力レーザーで一定期間温め、それをオフにすることで活用する。次に、この空間的および時間的「熱拡散関数」(TSF)を用いて、有限差分法による逆熱方程式を解き、空間的に微分率と放射率を推定する。これらのタプルは、各空間画素で微細な材料ラベルを生成する分類器の訓練に使用される。提案手法は小型光源(低出力レーザー)とサーマルカメラのみを極端に必要とし,16クラスで86%の精度でロバストな分類結果を生成する。

Robust and non-destructive material classification is a challenging but crucial first step in numerous vision applications. We propose a physics-guided material classification framework that relies on thermal properties of the object. Our key observation is that the rate of heating and cooling of an object depends on the unique intrinsic properties of the material, namely the emissivity and diffusivity. We leverage this observation by gently heating the objects in the scene with a low-power laser for a fixed duration and then turning it off, while a thermal camera captures measurements during the heating and cooling process. We then take this spatial and temporal "thermal spread function" (TSF) to solve an inverse heat equation using the finite-differences approach, resulting in a spatially varying estimate of diffusivity and emissivity. These tuples are then used to train a classifier that produces a fine-grained material label at each spatial pixel. Our approach is extremely simple, requiring only a small light source (low-power laser) and a thermal camera, and produces robust classification results with 86% accuracy over 16 classes.
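
The idea of recovering diffusivity from a recorded temperature sequence with finite differences can be sketched in one dimension. This is a simplified stand-in, not the paper's solver: it ignores emissivity, the laser source term, and the second spatial dimension, and the grid spacing, time step, and function names are assumptions chosen for the example.

```python
def simulate_heat_1d(initial, alpha, dx, dt, steps):
    """Explicit finite-difference solution of dT/dt = alpha * d2T/dx2.

    The two end points are held at their initial temperature.
    Stability of this scheme requires alpha * dt / dx**2 <= 0.5.
    """
    frames = [list(initial)]
    for _ in range(steps):
        prev = frames[-1]
        nxt = prev[:]
        for i in range(1, len(prev) - 1):
            laplacian = (prev[i - 1] - 2.0 * prev[i] + prev[i + 1]) / dx ** 2
            nxt[i] = prev[i] + alpha * dt * laplacian
        frames.append(nxt)
    return frames


def estimate_diffusivity(frames, dx, dt):
    """Least-squares fit of alpha in dT/dt = alpha * laplacian(T)."""
    numerator = denominator = 0.0
    for prev, nxt in zip(frames, frames[1:]):
        for i in range(1, len(prev) - 1):
            laplacian = (prev[i - 1] - 2.0 * prev[i] + prev[i + 1]) / dx ** 2
            time_derivative = (nxt[i] - prev[i]) / dt
            numerator += laplacian * time_derivative
            denominator += laplacian * laplacian
    return numerator / denominator


if __name__ == "__main__":
    start = [0.0] * 21
    start[10] = 10.0  # a heated spot, standing in for the laser-warmed pixel
    frames = simulate_heat_1d(start, alpha=0.2, dx=1.0, dt=1.0, steps=50)
    print(estimate_diffusivity(frames, dx=1.0, dt=1.0))
```

On real thermal video the frames would be noisy measurements rather than a simulation, so the fit would be approximate; here the data come from the same discretisation, which makes the recovered value match the one used to generate them.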

翻訳日:2023-04-04 16:43:56 公開日:2023-04-03

# マトリックスプロファイルによるリチウムイオン電池オンライン膝のオンセット検出

Lithium-ion Battery Online Knee Onset Detection by Matrix Profile ( http://arxiv.org/abs/2304.00691v1 )

ライセンス: Link先を確認

Kate Qi Zhou, Yan Qin, Chau Yuen

(参考訳) リチウムイオン電池(LiBs)は膝の発症までわずかに劣化し、その後劣化は寿命(EOL)に加速する。加速劣化速度の開始を示す膝の発症は、電池の性能変化を早期に警告する上で重要である。しかし、オンライン膝の特定に関する文献は限られている。また、簡便に収集した測定値を用いてその識別を行うことが好ましい。これらの課題を解決するために、放電データ内の時間情報を利用してオンライン膝のオンセット識別法を開発した。第1に、わずかな劣化段階から放電電圧サイクルに埋め込まれた時間的ダイナミクスを動的時間ゆがみによって抽出する。第2に、サブシーケンス類似性探索中に、異常をマトリックスプロファイルで露呈する。新しいサイクルの時間的ダイナミクスが制御限界を超え、プロファイル指標がレジームの変化を示すと、膝の発症が検出される。最後に、識別された膝のオンセットを使用して、電池のEOLサイクルとの強い相関により、バッテリーを長距離または短距離のカテゴリに分類する。電池分類と同一統計分布下で得られたトレーニングデータのサポートにより,提案したSOH推定モデルは,ルート平均2乗誤差を0.22%以下に向上した推定結果が得られる。

Lithium-ion batteries (LiBs) degrade slightly until the knee onset, after which the deterioration accelerates to end of life (EOL). The knee onset, which marks the initiation of the accelerated degradation rate, is crucial in providing an early warning of the battery's performance changes. However, there is only limited literature on online knee onset identification. Furthermore, it is good to perform such identification using easily collected measurements. To solve these challenges, an online knee onset identification method is developed by exploiting the temporal information within the discharge data. First, the temporal dynamics embedded in the discharge voltage cycles from the slight degradation stage are extracted by the dynamic time warping. Second, the anomaly is exposed by Matrix Profile during subsequence similarity search. The knee onset is detected when the temporal dynamics of the new cycle exceed the control limit and the profile index indicates a change in regime. Finally, the identified knee onset is utilized to categorize the battery into long-range or short-range categories by its strong correlation with the battery's EOL cycles. With the support of the battery categorization and the training data acquired under the same statistic distribution, the proposed SOH estimation model achieves enhanced estimation results with a root mean squared error as low as 0.22%.
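
For readers unfamiliar with the Matrix Profile, a brute-force version can be sketched as follows. It only illustrates the general definition (for every subsequence, the z-normalised distance to its nearest non-overlapping neighbour); the dynamic-time-warping feature extraction and the control limit described in the abstract are not reproduced, and the exclusion-zone width and the toy series are assumptions.

```python
import math


def z_normalise(window):
    """Return the window shifted to zero mean and scaled to unit variance."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Naive O(n^2 * m) matrix profile for subsequences of length m."""
    count = len(series) - m + 1
    windows = [z_normalise(series[i:i + m]) for i in range(count)]
    exclusion = max(1, m // 2)  # skip trivial matches with overlapping windows
    profile = []
    for i in range(count):
        best = math.inf
        for j in range(count):
            if abs(i - j) < exclusion:
                continue
            best = min(best, math.dist(windows[i], windows[j]))
        profile.append(best)
    return profile


if __name__ == "__main__":
    series = [0.0, 1.0, 2.0, 1.0] * 16
    series[30] = 10.0  # injected anomaly
    profile = matrix_profile(series, m=4)
    print(max(range(len(profile)), key=profile.__getitem__))
```

Windows that repeat elsewhere in the series get a profile value near zero, while windows covering the injected anomaly have no close match, so a jump in the profile marks a change of regime. Production code would normally use an optimised library rather than this quadratic loop.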

翻訳日:2023-04-04 16:43:38 公開日:2023-04-03

# 野生における3次元セマンティックセマンティックセグメンテーション--逆導電点雲の一般モデル学習

3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds ( http://arxiv.org/abs/2304.00690v1 )

ライセンス: Link先を確認

Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

(参考訳) 全天候条件下でのロバストポイントクラウド解析は、自動運転におけるレベル5の自律性に不可欠である。しかしながら、一般的な3Dセマンティックセグメンテーション(DSS)モデルを学習する方法はほとんど無視されている。我々は,ポイントレベルの密接なアノテーションを提供し,様々な気象条件下で3dsを解析可能な,悪天候のポイントクラウドデータセットであるsemanticstfを紹介する。全天候3DSSモデリングを2つの設定で検討する。 1) 正常ウェザーデータから悪ウェザーデータに適応するドメイン適応型3DSS 2) ドメイン一般化可能な3DSSは, 通常の天候データから全天候3DSSモデルを学習する。本研究は,既存の3DSS手法が悪天候データに遭遇する際の課題を明らかにするものである。さらに,点雲の幾何学的スタイルをランダム化し,それらの埋め込みを集約するドメインランダム化手法を考案し,その結果,様々な悪天候下で3dsを効果的に改善できる一般化モデルを構築した。 SemanticSTFと関連するコードは、 \url{https://github.com/xiaoaoran/SemanticSTF}で入手できる。

Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model is largely neglected, as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and allows 3DSS to be studied under various adverse weather conditions. We study all-weather 3DSS modeling under two setups: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalizable 3DSS that learns all-weather 3DSS models from normal-weather data. Our studies reveal the challenges that existing 3DSS methods face when they encounter adverse-weather data, showing the great value of SemanticSTF in steering future endeavors along this very meaningful research direction. In addition, we design a domain randomization technique that alternately randomizes the geometry styles of point clouds and aggregates their embeddings, ultimately leading to a generalizable model that can effectively improve 3DSS under various adverse weather conditions. The SemanticSTF dataset and related code are available at \url{https://github.com/xiaoaoran/SemanticSTF}.

翻訳日:2023-04-04 16:43:19 公開日:2023-04-03

# 変分オートエンコーダを用いたデバイス画像-IVマッピングによる逆設計と前方予測

Device Image-IV Mapping using Variational Autoencoder for Inverse Design and Forward Prediction ( http://arxiv.org/abs/2304.00738v1 )

ライセンス: Link先を確認

Thomas Lu, Albert Lu, and Hiu Yung Wong

(参考訳) 本稿では,変分オートエンコーダ(vae)に基づく新しい枠組みを用いて,デバイス構造画像を対応する電流電圧(iv)特性にマッピングすることで,基礎となるデバイス物理の学習を実証する。 VAEは使用されるため、ドメインの専門知識は必要とせず、フレームワークはどんな新しいデバイスや測定にも素早くデプロイできる。これは、デバイス横断画像と電気的特性しか利用できない場合(例えば、新しい新興メモリ)に、新しいデバイスのコンパクトなモデリングに有用であることが期待される。実演には技術コンピュータ支援設計(tcad)と手描きの金属酸化物半導体(mos)デバイス画像とノイズドレイン電流ゲート電圧曲線(idvg)を用いた。このフレームワークは2つのVAE(画像多様体学習用とIDVG多様体学習用)を積み重ねて形成され、潜在変数を介して相互に通信する。異なる強度を持つ5つの独立変数が使用される。逆設計(所定のIDVGの設計構造を生成する)と前方予測(所定の構造画像に対する予測IDVG)をうまく行うことができ、画像がデバイスパラメータとして扱われる場合のコンパクトなモデリングに使用できる。多様体学習が用いられるため、機械は入力(手書き画像とノイズIDVG曲線)のノイズに対して頑健であり、弱い独立変数と無関係な独立変数に混同されない。

This paper demonstrates the learning of the underlying device physics by mapping device structure images to their corresponding Current-Voltage (IV) characteristics using a novel framework based on variational autoencoders (VAE). Since VAE is used, domain expertise is not required and the framework can be quickly deployed on any new device and measurement. This is expected to be useful in the compact modeling of novel devices when only device cross-sectional images and electrical characteristics are available (e.g. novel emerging memory). Technology Computer-Aided Design (TCAD) generated and hand-drawn Metal-Oxide-Semiconductor (MOS) device images and noisy drain-current-gate-voltage curves (IDVG) are used for the demonstration. The framework is formed by stacking two VAEs (one for image manifold learning and one for IDVG manifold learning) which communicate with each other through the latent variables. Five independent variables with different strengths are used. It is shown that it can perform inverse design (generate a design structure for a given IDVG) and forward prediction (predict IDVG for a given structure image, which can be used for compact modeling if the image is treated as device parameters) successfully. Since manifold learning is used, the machine is shown to be robust against noise in the inputs (i.e. using hand-drawn images and noisy IDVG curves) and not confused by weak and irrelevant independent variables.

翻訳日:2023-04-04 16:37:43 公開日:2023-04-03

# SparDL: 効率的なスパース通信による分散ディープラーニングトレーニング

SparDL: Distributed Deep Learning Training with Efficient Sparse Communication ( http://arxiv.org/abs/2304.00737v1 )

ライセンス: Link先を確認

Minjun Zhao, Yichen Yin, Yuren Mao, Lu Chen, Yunjun Gao

(参考訳) Top-$k$スパース化は近年,分散ディープラーニングにおける通信量削減に広く利用されているが,Gradient Accumulation (GA) ジレンマにより,Top-$k$スパース化の性能は依然として限られている。 GAジレンマの処理にはいくつかの方法が提案されているが,(1)大量の余剰送信を導入するため通信複雑性が高いこと,(2)ワーカー数が2のべき乗でない場合に柔軟に対応できないこと,の2つの欠点がある。これら2つの問題を解決するために,SparDLと呼ばれるフレキシブルで効率的なスパース通信フレームワークを提案する。 SparDLはSpar-Reduce-Scatterアルゴリズムを用いて、追加の通信操作なしでGAジレンマを解決し,任意のワーカー数に柔軟に対応する。さらに,通信複雑性をさらに低減し,通信複雑性におけるレイテンシと帯域幅コストの比率を調整するために,SparDLの一部としてSpar-All-Gatherアルゴリズムを提案する。広範な実験によりSparDLの優位性を検証する。

Top-$k$ sparsification has recently been widely used to reduce the communication volume in distributed deep learning; however, due to the Gradient Accumulation (GA) dilemma, the performance of top-$k$ sparsification is still limited. Several methods have been proposed to handle the GA dilemma but have two drawbacks: (1) they suffer from high communication complexity because they introduce a large amount of extra transmission; (2) they are not flexible for non-power-of-two numbers of workers. To solve these two problems, we propose a flexible and efficient sparse communication framework, dubbed SparDL. SparDL uses the Spar-Reduce-Scatter algorithm to solve the GA dilemma without additional communication operations and is flexible to any number of workers. Besides, to further reduce the communication complexity and adjust the proportion of latency and bandwidth cost in communication complexity, we propose the Spar-All-Gather algorithm as part of SparDL. Extensive experiments validate the superiority of SparDL.
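As a rough illustration of the top-$k$ sparsification the abstract refers to, the following minimal sketch keeps only the $k$ largest-magnitude gradient entries per worker and then sums the sparse results. It is not SparDL's Spar-Reduce-Scatter or Spar-All-Gather algorithm; the function names `top_k_sparsify` and `sum_sparse` and the toy gradients are assumptions made for this example only.

```python
def top_k_sparsify(gradient, k):
    """Keep the k largest-magnitude entries as an {index: value} dict."""
    # Sort indices by magnitude, largest first; ties go to the lower index.
    order = sorted(range(len(gradient)), key=lambda i: (-abs(gradient[i]), i))
    return {i: gradient[i] for i in sorted(order[:k])}


def sum_sparse(sparse_gradients):
    """Sum several sparse gradients index by index."""
    total = {}
    for sparse in sparse_gradients:
        for index, value in sparse.items():
            total[index] = total.get(index, 0.0) + value
    return total


# Two hypothetical workers, each keeping k = 2 entries of a length-4 gradient.
worker_a = top_k_sparsify([0.5, -2.0, 0.25, 1.5], k=2)
worker_b = top_k_sparsify([3.0, 0.25, -1.0, 0.5], k=2)
merged = sum_sparse([worker_a, worker_b])
print(worker_a, worker_b, len(merged))
```

Because the two workers select different indices, the summed gradient here has four non-zero entries even though each worker sent only two, which shows how aggregation can reduce sparsity.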

翻訳日:2023-04-04 16:37:17 公開日:2023-04-03

# フェムト秒分解能における量子エミッタのマルチポーラロンダイナミクス検出のための液相単粒子分光法

Solution-phase single-particle spectroscopy for probing multi-polaronic dynamics in quantum emitters at femtosecond resolution ( http://arxiv.org/abs/2304.00735v1 )

ライセンス: Link先を確認

Jiaojian Shi, Yuejun Shen, Feng Pan, Weiwei Sun, Anudeep Mangu, Cindy Shi, Amy McKeown-Green, Parivash Moradifar, Moungi G. Bawendi, William E. Moerner, Jennifer A. Dionne, Fang Liu, Aaron M. Lindenberg

(参考訳) 多くの光量子技術の発展は、ほぼ完全な光コヒーレンスを持つ固体単一量子エミッタの可用性に依存する。しかしながら、系統的な改善を制限するスタンディング問題は、単一のエミッタレベルと超高速時間スケールでの微視的エネルギーフローの重大なサンプルの不均一性と機械的な理解の欠如である。フェムト秒分解能で前例のない明快さで単一分子および/または欠陥状態におけるサンプル平均ダイナミクスをキャプチャする光子相関検出を用いた溶液相単粒子ポンププローブ分光を開発した。我々はこの手法を2次元六方晶窒化ホウ素の単一量子エミッタに適用し, 高い不均一性と低い量子効率に苦しむ。ミリ秒からナノ秒の時間スケールでは、翻訳拡散、準安定状態関連肩、回転ダイナミクス、および反バンキング特性は、それぞれの異なる光子相関時間スケールによって切り離され、正規化された2光子放出量子収率を定量化する。フェムト秒分解能、スペクトル選択率、超低ノイズ(固体法よりも2桁改善)を活用することで、単一欠陥レベルで時間領域における電子-フォノンカップリングを可視化し、多電子励起によるポーラロン形成の加速を検出する。理論的ポーラロンモデルの結果と合致して、サンプル平均光子忠実性がカスケード放出効率と光デコヒーレンス時間にどのように変換されるかを示す。我々の研究は、単一エミッタ、分子、欠陥の超高速分光のための枠組みを提供し、量子情報応用のための超大規模キャラクタリゼーションと合成改善の新たな道を開く。

The development of many optical quantum technologies depends on the availability of solid-state single quantum emitters with near-perfect optical coherence. However, a standing issue that limits systematic improvement is the significant sample heterogeneity and lack of mechanistic understanding of microscopic energy flow at the single emitter level and ultrafast timescales. Here we develop solution-phase single-particle pump-probe spectroscopy with photon correlation detection that captures sample-averaged dynamics in single molecules and/or defect states with unprecedented clarity at femtosecond resolution. We apply this technique to single quantum emitters in two-dimensional hexagonal boron nitride, which suffers from significant heterogeneity and low quantum efficiency. From millisecond to nanosecond timescales, the translation diffusion, metastable-state-related bunching shoulders, rotational dynamics, and antibunching features are disentangled by their distinct photon-correlation timescales, which collectively quantify the normalized two-photon emission quantum yield. Leveraging its femtosecond resolution, spectral selectivity and ultralow noise (two orders of magnitude improvement over solid-state methods), we visualize electron-phonon coupling in the time domain at the single defect level, and discover the acceleration of polaronic formation driven by multi-electron excitation. Corroborated with results from a theoretical polaron model, we show how this translates to sample-averaged photon fidelity characterization of cascaded emission efficiency and optical decoherence time. Our work provides a framework for ultrafast spectroscopy in single emitters, molecules, or defects prone to photoluminescence intermittency and heterogeneity, opening new avenues of extreme-scale characterization and synthetic improvements for quantum information applications.

翻訳日:2023-04-04 16:36:58 公開日:2023-04-03

# 重力誘起低温原子の絡み合い

Gravitationally-induced entanglement in cold atoms ( http://arxiv.org/abs/2304.00734v1 )

ライセンス: Link先を確認

Richard Howl, Nathan Cooper, Lucia Hackermüller

(参考訳) 実験室で量子重力をテストするための有望なルートは、2つ以上の量子物質間の重力誘起絡み合い(GIE)を探すことである。主に、N00N状態や高スクイーズ状態のような非古典状態のマイクロソリッドシステムを用いている。ここでは、初めて、2つの冷たい原子ガス間のGIEを量子重力のテストとして考える。本稿では、2つの原子干渉計を並列に配置し、GIEと量子重力の証拠として出力ポートにおける原子数の相関関係を求める。 N00N や Schrödinger cat のような挑戦的なマクロな重ね合わせ状態はなく、代わりに原子の古典的な「コヒーレント」状態がある。これにより、原子干渉計の総質量はプランク質量スケールと長い積分時間でなければならない。しかし、現在最先端の量子スクイーズでは、質量スケールは接近可能なレベルに還元でき、そのような質量スケールが近い将来どのように達成されるかについて概説する。

A promising route to testing quantum gravity in the laboratory is to look for gravitationally-induced entanglement (GIE) between two or more quantum matter systems. Principally, proposals for such tests have used microsolid systems, with highly non-classical states, such as N00N states or highly-squeezed states. Here, we consider, for the first time, GIE between two cold atomic gases as a test of quantum gravity. We propose placing two atom interferometers next to each other in parallel and looking for correlations in the number of atoms at the output ports as evidence of GIE and quantum gravity. There are no challenging macroscopic superposition states, such as N00N or Schrödinger cat states; instead, classical-like 'coherent' states of atoms are used. This requires the total mass of the atom interferometers to be on the Planck mass scale, and long integration times. With current state-of-the-art quantum squeezing in cold atoms, however, we argue that the mass scale can be reduced to approachable levels and outline how such a mass scale can be achieved in the near future.

翻訳日:2023-04-04 16:36:26 公開日:2023-04-03

# ビデオにおける未バイアスシーングラフ生成

Unbiased Scene Graph Generation in Videos ( http://arxiv.org/abs/2304.00733v1 )

ライセンス: Link先を確認

Sayak Nag, Kyle Min, Subarna Tripathi, Amit K. Roy Chowdhury

(参考訳) 映像からの動的シーングラフ生成(SGG)の課題は、シーン固有のダイナミクス、モデル予測の時間的変動、画像ベースSGGの既存の課題に加えて、視覚的関係の長期分布などにより複雑かつ困難である。動的sggの既存の手法は、上述の課題、特に長期にわたる関係の分散に対処せずに、複雑なアーキテクチャを用いて時空間的コンテキストを捉えることに重点を置いている。これはしばしばバイアス付きシーングラフの生成につながる。これらの課題に対処するために,我々はテンプラと呼ばれる新しいフレームワークを紹介している。 TEMPURAは、トランスフォーマーに基づくシーケンスモデリングによりオブジェクトレベルの時間的整合性を採用し、メモリ誘導学習を用いて非バイアス関係表現を合成し、ガウス混合モデル(GMM)を用いて視覚関係の予測的不確実性を減衰させる。広範囲な実験により,既存の手法に比べて,より偏りのないシーングラフの生成において,性能が大幅に向上すること(場合によっては最大10%)を実証した。

The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of model predictions, and the long-tailed distribution of the visual relationships in addition to the already existing challenges in image-based SGG. Existing methods for dynamic SGG have primarily focused on capturing spatio-temporal context using complex architectures without addressing the challenges mentioned above, especially the long-tailed distribution of relationships. This often leads to the generation of biased scene graphs. To address these challenges, we introduce a new framework called TEMPURA: TEmporal consistency and Memory Prototype guided UnceRtainty Attenuation for unbiased dynamic SGG. TEMPURA employs object-level temporal consistencies via transformer-based sequence modeling, learns to synthesize unbiased relationship representations using memory-guided training, and attenuates the predictive uncertainty of visual relations using a Gaussian Mixture Model (GMM). Extensive experiments demonstrate that our method achieves significant (up to 10% in some cases) performance gain over existing methods highlighting its superiority in generating more unbiased scene graphs.

翻訳日:2023-04-04 16:36:06 公開日:2023-04-03

# 時空間流体プロセスの適応サンプリングのための予測モデルの利用

Leveraging Predictive Models for Adaptive Sampling of Spatiotemporal Fluid Processes ( http://arxiv.org/abs/2304.00732v1 )

ライセンス: Link先を確認

Sandeep Manjanna and Tom Z. Jiahao and M. Ani Hsieh

(参考訳) 時空間流体プロセスの永続的なモニタリングには、データのサンプリングと監視中のプロセスの予測モデルが必要である。本稿では,時空間過程の予測モデルに基づく適応サンプリングを行うPASSTアルゴリズムを提案する。 PASSTは、予測モデルを活用する適応型ロボットサンプリングアルゴリズムで、特定の領域における流体プロセスの効率的かつ永続的な監視を行う。本アルゴリズムは,学習した予測モデルから予測を活用し,自律走行車両が関心領域を適応的かつ効率的にサーベイする経路を計画する。次に、サンプルデータを用いて予測モデルに更新初期状態を与えることにより、より良い予測を得る。予測モデルの場合、流体過程のモデルを訓練するために知識に基づく神経常微分方程式を用いる。これらのモデルのサイズは桁違いに小さく、流体過程やその他の計算流体モデルを記述する偏微分方程式の直接数値シミュレーションから得られた流体データよりもずっと高速である。経路計画には、フィールド予測を報酬関数として使用する強化学習に基づく計画アルゴリズムを用いる。数値シミュレーションされた流体データと実世界の海流データの両方に対する適応的サンプリング経路計画アルゴリズムを評価し,与えられた領域の時空間場を長時間地平線でサンプリングできることを示した。また,学習モデルの学習レパートリーにない流体プロセスからサンプルを得るためのパストアルゴリズムの一般化能力を評価する。

Persistent monitoring of a spatiotemporal fluid process requires data sampling and predictive modeling of the process being monitored. In this paper we present the PASST algorithm: Predictive-model based Adaptive Sampling of a Spatio-Temporal process. PASST is an adaptive robotic sampling algorithm that leverages predictive models to efficiently and persistently monitor a fluid process in a given region of interest. Our algorithm makes use of the predictions from a learned prediction model to plan a path for an autonomous vehicle to adaptively and efficiently survey the region of interest. In turn, the sampled data is used to obtain better predictions by giving an updated initial state to the predictive model. For the predictive model, we use Knowledge-based Neural Ordinary Differential Equations to train models of fluid processes. These models are orders of magnitude smaller in size and run much faster than fluid data obtained from direct numerical simulations of the partial differential equations that describe the fluid processes or other comparable computational fluids models. For path planning, we use reinforcement learning based planning algorithms that use the field predictions as reward functions. We evaluate our adaptive sampling path planning algorithm on both numerically simulated fluid data and real-world nowcast ocean flow data to show that we can sample the spatiotemporal field in the given region of interest for long time horizons. We also evaluate the PASST algorithm's generalization ability to sample from fluid processes that are not in the training repertoire of the learned models.

翻訳日:2023-04-04 16:35:45 公開日:2023-04-03

# 規則表現学習者に基づく解釈可能なローン債権評価方法

An Interpretable Loan Credit Evaluation Method Based on Rule Representation Learner ( http://arxiv.org/abs/2304.00731v1 )

ライセンス: Link先を確認

Zihao Chen, Xiaomeng Wang, Yuanjiang Huang, Tao Jia

(参考訳) モデルの解釈性は、ハイステイクフィールドにおける幅広い応用の障害の1つとなっている。解釈可能性を得る一般的な方法は、まずブラックボックスを構築し、次にポストホックメソッドを使って説明することである。しかし,ポストホック法による説明は必ずしも信頼できない。代わりに、レンディングクラブデータセットのためのrrl(rule representation learner)に基づいた本質的な解釈可能なモデルを設計する。具体的には、特徴はそれぞれの特性に応じて3つのカテゴリに分けられ、それぞれ3つのサブネットワークを構築することができ、それぞれが単一の隠蔽層を持つニューラルネットワークに似ているが、等価にルールのセットに変換できる。トレーニング中、私たちは以前の研究からバイナリ重みを効果的にトレーニングするためのトリックを学びました。最後に,本モデルと木モデルを比較した。その結果, 金融機関と借入業者の両方にとって実用上重要なブラックボックスに近い, 性能上, 解釈可能な決定木よりも, はるかに優れたモデルが得られた。さらに,本モデルはポストホック法で生成された説明の正確性をテストするために用いられ,ポストホック法が必ずしも信頼できるとは限らないことを示す。

The interpretability of models has become one of the obstacles to their wide application in high-stakes fields. The usual way to obtain interpretability is to build a black-box model first and then explain it using post-hoc methods. However, the explanations provided by post-hoc methods are not always reliable. Instead, we design an intrinsically interpretable model based on RRL (Rule Representation Learner) for the Lending Club dataset. Specifically, features are divided into three categories according to their characteristics, and a sub-network is built for each category; each sub-network is similar to a neural network with a single hidden layer but can be equivalently converted into a set of rules. During training, we adopt tricks from previous research to effectively train binary weights. Finally, our model is compared with the tree-based model. The results show that our model performs much better than the interpretable decision tree and is close to other black-box models, which is of practical significance to both financial institutions and borrowers. More importantly, our model is used to test the correctness of the explanations generated by the post-hoc method; the results show that the post-hoc method is not always reliable.

翻訳日:2023-04-04 16:35:23 公開日:2023-04-03

# CG-3DSRGAN:低線量PET画像からの画質回復のための分類ガイド付き3次元生成対向ネットワーク

CG-3DSRGAN: A classification guided 3D generative adversarial network for image quality recovery from low-dose PET images ( http://arxiv.org/abs/2304.00725v1 )

ライセンス: Link先を確認

Yuxin Xue, Yige Peng, Lei Bi, and Dagan Feng, Jinman Kim

(参考訳) ポジトロン・エミッション・トモグラフィー (PET) は, 現代医療において, 最も感度の高い分子イメージング法である。注入トレーサー線量による高放射能はPET画像における主要な関心事であり、臨床応用を制限している。しかし、投与量を減少させると、画像品質が不適切な診断に繋がる。低線量で高画質な画像を作成する必要性から、低線量で高画質なPET合成のための畳み込みニューラルネットワーク(CNN)ベースの手法が開発されている。従来のCNNによる研究は通常、低用量PETを異なる線量還元レベルを考慮せずに特徴空間に直接マッピングする。本研究では,CG-3DSRGAN (Classification-Guided Generative Adversarial Network with Super Resolution Refinement) という新しい手法を提案する。具体的には、分類ヘッドによって誘導されるマルチタスク粗いジェネレータにより、低線量データに存在するノイズレベルの特徴をより包括的に理解し、画像合成を改善することができる。さらに,標準PETの空間的詳細を回復するために,第2段階のトレーニングとして補助的な超解像ネットワークであるContextual-Netを提案し,粗い予測と標準PETのギャップを狭める。本手法を全身PETにおける各線量低減因子 (DRF) の経時的変化と比較した。実験により、我々の手法は全てのDRFにおいて他よりも優れることを示した。

Positron emission tomography (PET) is the most sensitive molecular imaging modality routinely applied in our modern healthcare. High radioactivity caused by the injected tracer dose is a major concern in PET imaging and limits its clinical applications. However, reducing the dose leads to inadequate image quality for diagnostic practice. Motivated by the need to produce high quality images with a minimal dose, Convolutional Neural Network (CNN) based methods have been developed for high quality PET synthesis from its low-dose counterparts. Previous CNN-based studies usually directly map low-dose PET into feature space without consideration of different dose reduction levels. In this study, a novel approach named CG-3DSRGAN (Classification-Guided Generative Adversarial Network with Super Resolution Refinement) is presented. Specifically, a multi-tasking coarse generator, guided by a classification head, allows for a more comprehensive understanding of the noise-level features present in the low-dose data, resulting in improved image synthesis. Moreover, to recover spatial details of standard PET, an auxiliary super resolution network - Contextual-Net - is proposed as a second-stage training to narrow the gap between coarse prediction and standard PET. We compared our method to the state-of-the-art methods on whole-body PET with different dose reduction factors (DRFs). Experiments demonstrate our method can outperform others on all DRFs.

翻訳日:2023-04-04 16:35:07 公開日:2023-04-03

# 参照自由テキスト品質評価における大規模言語モデルの利用を探る:予備的実証的研究

Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: A Preliminary Empirical Study ( http://arxiv.org/abs/2304.00723v1 )

ライセンス: Link先を確認

Yi Chen, Rui Wang, Haiyun Jiang, Shuming Shi, Ruifeng Xu

(参考訳) 自然言語処理において,生成テキストの品質評価は難しい課題である。この困難は本文の複雑さと多様性から生じる。最近では,openaiの大規模言語モデル(llm)であるchatgptが,さまざまなタスクのパフォーマンス向上によって注目を浴びている。そこで本報告では,LLM,特にChatGPTの有効性について検討し,テキスト品質評価におけるそれらの使用方法を検討する。 chatgptまたは類似のllmに基づく3種類の参照フリー評価手法を比較した。実験の結果,ChatGPTは様々な視点からテキスト品質を効果的に評価でき,既存の自動メトリクスよりも優れた性能を示すことがわかった。特に,ChatGPTを用いてテキスト品質を計測する数値スコアを生成するExplicit Scoreは,この3つの手法の中で最も効果的で信頼性の高い手法である。しかし、ChatGPTを用いて2つのテキストの品質を直接比較することは、最適以下の結果をもたらす可能性がある。本稿では,ChatGPT などの LLM を用いたテキスト品質評価手法の選択について,貴重な知見を提供する。

Evaluating the quality of generated text is a challenging task in natural language processing. This difficulty arises from the inherent complexity and diversity of text. Recently, OpenAI's ChatGPT, a powerful large language model (LLM), has garnered significant attention due to its impressive performance in various tasks. Therefore, we present this report to investigate the effectiveness of LLMs, especially ChatGPT, and explore ways to optimize their use in assessing text quality. We compared three kinds of reference-free evaluation methods based on ChatGPT or similar LLMs. The experimental results show that ChatGPT is capable of evaluating text quality effectively from various perspectives without reference and demonstrates performance superior to most existing automatic metrics. In particular, the Explicit Score, which utilizes ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable method among the three exploited approaches. However, directly comparing the quality of two texts using ChatGPT may lead to suboptimal results. We hope this report will provide valuable insights into selecting appropriate methods for evaluating text quality with LLMs such as ChatGPT.
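The Explicit Score idea, asking an LLM for a numeric quality score, can be sketched without committing to any particular API. The snippet below only builds a prompt and parses a numeric reply; the prompt wording, the 1 to 5 scale, and the helper names `build_prompt` and `parse_score` are assumptions for illustration, not the prompts or code used in the report, and no model is actually called.

```python
import re


def build_prompt(text, aspect="fluency", low=1, high=5):
    """Build a hypothetical scoring prompt for one quality aspect."""
    return (
        f"Rate the {aspect} of the following text on a scale "
        f"from {low} to {high}. Reply with the number only.\n\n"
        f"Text: {text}"
    )


def parse_score(reply, low=1, high=5):
    """Return the first number in the reply as a float, or None if no
    number is found or it falls outside [low, high].

    Taking the first number is a simplification: a reply that restates
    the scale before giving the score would be misread.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    score = float(match.group())
    return score if low <= score <= high else None


prompt = build_prompt("The cat sat on the mat.")
print(parse_score("Score: 4"))
print(parse_score("I cannot rate this text."))
```

Returning `None` for unparseable or out-of-range replies lets a caller decide whether to retry or discard the sample rather than silently recording a bad score.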

翻訳日:2023-04-04 16:34:44 公開日:2023-04-03

# 集合的エミッションの確立における貯水池記憶の影響--非マルコビアン性-

Reservoir Memory Effect in the Establishment of Collective Emission: Non-Markovianity beyond Retardation ( http://arxiv.org/abs/2304.00722v1 )

ライセンス: Link先を確認

Yu-Xiang Zhang

(参考訳) 集団放出を確立するために、アンサンブル内の原子は仮想光子を交換することでその挙動を調整しなければならない。我々は、この非マルコフ過程を1次元(1次元)導波路に結合したサブ波長原子鎖で研究し、非マルコフ性の唯一の原因ではないことを発見した。もう1つの要因は光子環境の記憶であり、そこでは1つの励起原子は二次崩壊であるゼノレジームから指数崩壊に至る有限の時間を必要とする。導波路の設定では、このクロスオーバーは遅延よりも長い時間スケールを持ち、それによって集団行動の構築に影響を及ぼす。完全な量子処理と遅延効果のみを組み込んだ近似を比較することで、原子励起の集団によって特徴づけられるフィールドメモリ効果は、単一光子超放射において単一原子の崩壊よりもずっと顕著であることが分かる。

To establish a collective emission, the atoms in an ensemble must coordinate their behavior by exchanging virtual photons. We study this non-Markovian process in a subwavelength atom chain coupled to a one-dimensional (1D) waveguide and find that retardation is not the only cause of non-Markovianity. The other factor is the memory of the photonic environment, for which a single excited atom needs a finite time to cross from quadratic decay, the Zeno regime, to exponential decay. In the waveguide setup, this crossover has a time scale longer than the retardation and thus affects the build-up of collective behavior. By comparing a full quantum treatment with an approximation incorporating only the retardation effect, we find that the field memory effect, characterized by the population of atomic excitation, is much more pronounced in single-photon superradiance than in the decay of a single atom.

翻訳日:2023-04-04 16:34:27 公開日:2023-04-03

# 例外点近傍のピーターマン因子と位相剛性

Petermann factors and phase rigidities near exceptional points ( http://arxiv.org/abs/2304.00764v1 )

ライセンス: Link先を確認

Jan Wiersig

(参考訳) ピーターマン因子と位相剛性は、摂動に対するエネルギー固有値の感度やレーザーにおける量子過剰ノイズの大きさなど、オープン量子および波動系の様々な側面に便利な尺度である。非エルミート退化に近い2つの重要な量の挙動を議論する。小型の一般摂動の場合、例外点のスペクトル応答強度との関係を示す解析的明示的な公式を導出する。一般理論の予測は、おもちゃモデルの解析解と比較に成功している。さらに, ピーターマン係数とスペクトル応答強度との関係は, 後者を計算するための効率的な数値計算の基礎となることが示されている。

The Petermann factor and the phase rigidity are convenient measures for various aspects of open quantum and wave systems, such as the sensitivity of energy eigenvalues to perturbations or the magnitude of quantum excess noise in lasers. We discuss the behavior of these two important quantities near non-Hermitian degeneracies, so-called exceptional points. For small generic perturbations, we derive analytically explicit formulas which reveal a relation to the spectral response strength of the exceptional point. The predictions of the general theory are successfully compared to analytical solutions of a toy model. Moreover, it is demonstrated that the connection between Petermann factor and spectral response strength provides the basis for an efficient numerical scheme to calculate the latter.
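To make the two quantities concrete, here is a small sketch using one common set of definitions, which the abstract itself does not spell out: for left and right eigenvectors $\langle L|$ and $|R\rangle$, the Petermann factor is $K = \langle L|L\rangle\langle R|R\rangle / |\langle L|R\rangle|^2$ and the phase rigidity is $r = |\langle L|R\rangle| / \sqrt{\langle L|L\rangle\langle R|R\rangle}$, so $K = 1/r^2$. The $2\times 2$ matrix $H = \begin{pmatrix}0 & 1\\ \varepsilon & 0\end{pmatrix}$, with an exceptional point at $\varepsilon = 0$, is an assumed textbook example and not necessarily the toy model of the paper.

```python
import cmath


def petermann_and_rigidity(eps):
    """Petermann factor K and phase rigidity r for H = [[0, 1], [eps, 0]].

    The eigenvalue is lam = sqrt(eps). The right eigenvector is (1, lam)
    and the left (row) eigenvector is (lam, 1), i.e. left @ H = lam * left.
    """
    lam = cmath.sqrt(eps)
    right = (1.0, lam)
    left = (lam, 1.0)
    # <L|R>: the row vector `left` is applied without complex conjugation.
    overlap = abs(left[0] * right[0] + left[1] * right[1])
    norm_left = sum(abs(x) ** 2 for x in left)
    norm_right = sum(abs(x) ** 2 for x in right)
    rigidity = overlap / (norm_left * norm_right) ** 0.5
    petermann = norm_left * norm_right / overlap ** 2
    return petermann, rigidity


for eps in (1e-2, 1e-4):
    K, r = petermann_and_rigidity(eps)
    print(eps, K, r)
```

For this matrix the closed form is $K = (1 + |\varepsilon|)^2 / (4|\varepsilon|)$, so $K$ grows without bound and $r$ tends to zero as the perturbation $\varepsilon$ shrinks toward the exceptional point.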

翻訳日:2023-04-04 16:27:52 公開日:2023-04-03

# bollwm:インドの綿畑からボルワーム害虫をモニタリングする現実世界のデータセット

BOLLWM: A real-world dataset for bollworm pest monitoring from cotton fields in India ( http://arxiv.org/abs/2304.00763v1 )

ライセンス: Link先を確認

Jerome White, Chandan Agrawal, Anmol Ojha, Apoorv Agnihotri, Makkunda Sharma, Jigar Doshi

(参考訳) 本稿では,インド全土の小規模農家や農業拡張労働者が5年間にわたって収集した農薬画像のデータセットについて述べる。このデータセットは、農夫の害虫管理決定を支援するために人工知能に依存するモバイルアプリケーションをサポートするために使用されている。作成は、組織化されたデータ収集の混合と、制御の少ないモバイルアプリケーションの使用から行われた。これにより、データセットは害虫検出コミュニティ内でユニークになり、他の非農業目的の検知データセットに近い多くの特徴が示される。これは、データセットを将来の害虫管理アプリケーションに適用するだけでなく、他の様々な研究課題への扉を開く。

This paper presents a dataset of agricultural pest images captured over five years by thousands of smallholder farmers and farming extension workers across India. The dataset has been used to support a mobile application that relies on artificial intelligence to assist farmers with pest management decisions. Creation came from a mix of organized data collection and less controlled mobile application usage. This makes the dataset unique within the pest detection community, exhibiting a number of characteristics that place it closer to other non-agricultural object detection datasets. This not only makes the dataset applicable to future pest management applications, but also opens the door to a wide variety of other research agendas.

翻訳日:2023-04-04 16:27:40 公開日:2023-04-03

# 3d衣料アニメーションのための学習アンカー変換

Learning Anchor Transformations for 3D Garment Animation ( http://arxiv.org/abs/2304.00761v1 )

ライセンス: Link先を確認

Fang Zhao, Zekun Li, Shaoli Huang, Junwu Weng, Tianfei Zhou, Guo-Sen Xie, Jue Wang, Ying Shan

(参考訳) 本稿では,アンカーに基づく変形モデル,すなわちアンカーDEFを提案し,身体動作シーケンスから3次元衣料アニメーションを予測する。これは、余分な非線形変位を持つ剛性変換の混合により、衣料メッシュテンプレートを変形させる。メッシュ表面を囲む一連のアンカーは、剛性変換行列の学習を導くために導入された。アンカー変換が見つかると、衣服テンプレートの頂点ごとの非線形変位を正準空間に回帰させることができるため、変形空間学習の複雑さが軽減される。変換されたアンカーを位置、正規および方向の成分を満たすように明示的に制約することにより、空間における学習されたアンカー変換の物理的意味がより一般化するために保証される。さらに,代表的アンカー変換を学習するために,局所メッシュトポロジを意識してアンカー位置を最適化するアダプティブアンカー更新を提案する。異なる種類の衣服の質的および定量的実験により、アンコールDEFは、特にゆるやかな衣服において、動作中の3次元衣服の変形予測における最先端の性能を達成することを示した。

This paper proposes an anchor-based deformation model, namely AnchorDEF, to predict 3D garment animation from a body motion sequence. It deforms a garment mesh template by a mixture of rigid transformations with extra nonlinear displacements. A set of anchors around the mesh surface is introduced to guide the learning of rigid transformation matrices. Once the anchor transformations are found, per-vertex nonlinear displacements of the garment template can be regressed in a canonical space, which reduces the complexity of deformation space learning. By explicitly constraining the transformed anchors to satisfy the consistencies of position, normal and direction, the physical meaning of learned anchor transformations in space is guaranteed for better generalization. Furthermore, an adaptive anchor updating is proposed to optimize the anchor position by being aware of local mesh topology for learning representative anchor transformations. Qualitative and quantitative experiments on different types of garments demonstrate that AnchorDEF achieves the state-of-the-art performance on 3D garment deformation prediction in motion, especially for loose-fitting garments.
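The phrase "a mixture of rigid transformations with extra nonlinear displacements" can be read as a weighted blend of per-anchor rigid motions applied to a displaced template vertex. The sketch below follows that reading under stated assumptions: the displacement is added in the canonical (template) space before the anchors are applied, the blend weights sum to one, and all names (`apply_rigid`, `deform_vertex`) and values are hypothetical. It is not AnchorDEF's actual formulation, which the abstract does not give in detail.

```python
def apply_rigid(rotation, translation, point):
    """Apply a 3x3 rotation matrix and a translation to a 3D point."""
    return tuple(
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    )


def deform_vertex(vertex, displacement, anchors, weights):
    """Blend per-anchor rigid transforms of a displaced template vertex.

    anchors: list of (rotation, translation) pairs, one per anchor.
    weights: blend weights for this vertex, assumed to sum to 1.
    """
    # Nonlinear displacement is added in the canonical space (assumption).
    moved = tuple(v + d for v, d in zip(vertex, displacement))
    blended = [0.0, 0.0, 0.0]
    for (rotation, translation), weight in zip(anchors, weights):
        transformed = apply_rigid(rotation, translation, moved)
        for axis in range(3):
            blended[axis] += weight * transformed[axis]
    return tuple(blended)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
no_shift = (0.0, 0.0, 0.0)

result = deform_vertex(
    vertex=(1.0, 0.0, 0.0),
    displacement=(0.0, 0.0, 0.25),
    anchors=[(identity, no_shift), (quarter_turn_z, no_shift)],
    weights=[0.5, 0.5],
)
print(result)
```

In this toy case the vertex is pulled halfway between its rest position and its position after a quarter turn about the z axis, while the canonical-space displacement raises it by 0.25.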

翻訳日:2023-04-04 16:27:30 公開日:2023-04-03

# FedIN: モデル不均一性のためのフェデレーション中間層学習

FedIN: Federated Intermediate Layers Learning for Model Heterogeneity ( http://arxiv.org/abs/2304.00759v1 )

ライセンス: Link先を確認

Chan Yun-Hin, Jiang Zhihan, Deng Jing, Ngai C.-H. Edith

(参考訳) フェデレートラーニング(FL)は、エッジデバイスがローカルおよびプライベートにトレーニングデータを維持しながら、グローバルな共有モデルを協調的にトレーニングすることを促進する。しかし、FLにおける一般的だが非現実的な仮定は、参加するエッジデバイスは同じリソースを持ち、同じグローバルモデルアーキテクチャを共有することである。本研究では,FedIN(Federated Intermediate Layers Learning)と呼ばれる新しいFL手法を提案する。 FedINのトレーニングモデルは、抽出器、中間層、分類器を含む3つの部分に分けられる。抽出器と分類器のモデルアーキテクチャは、中間層の特徴の一貫性を維持するためにすべてのデバイスで同じであるが、中間層のアーキテクチャはリソース容量に応じて異種デバイスに対して異なる。特徴から知識を生かすため、我々は、他のクライアントの機能に合わせて中間層を訓練し、訓練することを提案する。さらに,INトレーニングと局所トレーニングの競合によって引き起こされる勾配分散問題を緩和するため,凸最適化問題を定式化し,解決する。実験結果から,FedINは異種モデル環境において,最先端のアルゴリズムと比較して最高の性能を発揮することが示された。さらに,本研究では,イントレーニングの有効性と凸最適化問題に対する解法を示す。

Federated learning (FL) facilitates edge devices to cooperatively train a global shared model while maintaining the training data locally and privately. However, a common but impractical assumption in FL is that the participating edge devices possess the same required resources and share identical global model architecture. In this study, we propose a novel FL method called Federated Intermediate Layers Learning (FedIN), supporting heterogeneous models without utilizing any public dataset. The training models in FedIN are divided into three parts, including an extractor, the intermediate layers, and a classifier. The model architectures of the extractor and classifier are the same in all devices to maintain the consistency of the intermediate layer features, while the architectures of the intermediate layers can vary for heterogeneous devices according to their resource capacities. To exploit the knowledge from features, we propose IN training, training the intermediate layers in line with the features from other clients. Additionally, we formulate and solve a convex optimization problem to mitigate the gradient divergence problem induced by the conflicts between the IN training and the local training. The experiment results show that FedIN achieves the best performance in the heterogeneous model environment compared with the state-of-the-art algorithms. Furthermore, our ablation study demonstrates the effectiveness of IN training and the solution to the convex optimization problem.

翻訳日:2023-04-04 16:27:10 公開日:2023-04-03

# Spot-the-Camel: 安全な道路のためのコンピュータビジョン

Spot-the-Camel: Computer Vision for Safer Roads ( http://arxiv.org/abs/2304.00757v1 )

ライセンス: Link先を確認

Khalid Alnujaidi, Ghada Alhabib, Abdulaziz Alodhieb

(参考訳) 人口が増加し、土地が都市化に利用されていくにつれて、私たちの道路や車によって生態系は混乱しています。このインフラストラクチャーの拡大は野生生物の領域を縮小し、多くの野生動物と車両の衝突(wvc)を引き起こした。これらのWVCの事例は、グローバルな社会経済的影響を持つ世界的な問題であり、数十億ドルの財産損害と、時には自動車利用者の死亡率をもたらす。サウジアラビアでは、この問題は同様のものであり、カメル・ヴェイクル衝突(CVC)の事例はラクダの大型化によって特に致命率 [1] の25% となる。この研究の焦点は、道路上でラクダを検出するタスクに基づいて、異なる物体検出モデルをテストすることである。実験で使用されるDeep Learning(DL)オブジェクト検出モデルは、Center Net、Efficient Det、Faster R-CNN、SSD、YOLOv8である。実験の結果, YOLOv8の精度は最高であり, トレーニングでは最も効率的であった。将来的には、田舎道をより安全にするシステムを開発することで、この事業を拡大する計画だ。

As the population grows and more land is being used for urbanization, ecosystems are disrupted by our roads and cars. This expansion of infrastructure cuts through wildlife territories, leading to many instances of Wildlife-Vehicle Collision (WVC). These instances of WVC are a global issue that is having a global socio-economic impact, resulting in billions of dollars in property damage and, at times, fatalities for vehicle occupants. In Saudi Arabia, this issue is similar, with instances of Camel-Vehicle Collision (CVC) being particularly deadly due to the large size of camels, which results in a 25% fatality rate [1]. The focus of this work is to test different object detection models on the task of detecting camels on the road. The Deep Learning (DL) object detection models used in the experiments are: Center Net, Efficient Det, Faster R-CNN, SSD, and YOLOv8. Results of the experiments show that YOLOv8 performed the best in terms of accuracy and was the most efficient in training. In the future, the plan is to expand on this work by developing a system to make countryside roads safer.

翻訳日:2023-04-04 16:26:50 公開日:2023-04-03

# 構造情報原理による効果的で安定な役割ベース多エージェント協調

Effective and Stable Role-Based Multi-Agent Collaboration by Structural Information Principles ( http://arxiv.org/abs/2304.00755v1 )

ライセンス: Link先を確認

Xianghua Zeng, Hao Peng, Angsheng Li

(参考訳) ロールベース学習はマルチエージェント強化学習(marl)の性能を向上させるための有望なアプローチである。しかしながら、現在のロールベースのメソッドでは、事前に定義されたロール構造か、ハイパーパラメータを選択するための実践的な経験のいずれかを前提として、複雑なタスクを効果的に分解する一連のロールを安定して発見することは保証できない。本稿では、SIRDという数学的構造情報原理に基づくロールディスカバリ手法を提案し、マルチエージェント協調のためのSIRD最適化MARLフレームワーク、SR-MARLを提案する。 SIRDはロール発見を階層的なアクション空間クラスタリングに変換する。具体的には、SIRDは構造化、スパーシフィケーション、最適化モジュールで構成され、最適なエンコーディングツリーを生成して、役割を発見するための抽象化を実行する。 SIRDは特定のMARLアルゴリズムに非依存であり、様々な値関数分解アプローチと柔軟に統合される。 StarCraft IIマイクロマネジメントベンチマークの実証的な評価は、最先端のMARLアルゴリズムと比較して、SR-MARLフレームワークは平均テストの勝利率を0.17%、6.08%、3.24%改善し、容易でハードなシナリオ下では16.67%、30.80%、66.30%の偏差を減少させることを示した。

Role-based learning is a promising approach to improving the performance of Multi-Agent Reinforcement Learning (MARL). Nevertheless, without manual assistance, current role-based methods cannot guarantee stably discovering a set of roles to effectively decompose a complex task, as they assume either a predefined role structure or practical experience for selecting hyperparameters. In this article, we propose a mathematical Structural Information principles-based Role Discovery method, namely SIRD, and then present a SIRD optimizing MARL framework, namely SR-MARL, for multi-agent collaboration. The SIRD transforms role discovery into a hierarchical action space clustering. Specifically, the SIRD consists of structuralization, sparsification, and optimization modules, where an optimal encoding tree is generated to perform abstracting to discover roles. The SIRD is agnostic to specific MARL algorithms and flexibly integrated with various value function factorization approaches. Empirical evaluations on the StarCraft II micromanagement benchmark demonstrate that, compared with state-of-the-art MARL algorithms, the SR-MARL framework improves the average test win rate by 0.17%, 6.08%, and 3.24%, and reduces the deviation by 16.67%, 30.80%, and 66.30%, under easy, hard, and super hard scenarios.

翻訳日:2023-04-04 16:26:26 公開日:2023-04-03

# 3DポイントクラウドのセマンティックセグメンテーションをU-Nextフレームワークで強化する

Small but Mighty: Enhancing 3D Point Clouds Semantic Segmentation with U-Next Framework ( http://arxiv.org/abs/2304.00749v1 )

ライセンス: Link先を確認

Ziyin Zeng and Qingyong Hu and Zhong Xie and Jian Zhou and Yongyang Xu

(参考訳) 大規模3次元点雲のセマンティックセグメンテーションの問題点を考察する。近年,局所的特徴集約,損失関数の改善,サンプリング戦略など,多くの研究が進められている。ポイントクラウドセマンティックセグメンテーションの基本的なフレームワークはほとんど見過ごされているが、既存のアプローチのほとんどはデフォルトではU-Netアーキテクチャに依存している。本稿では,ポイントクラウドセマンティクスセグメンテーション用に設計された,小型だが強力なフレームワークであるu-nextを提案する。このフレームワークの鍵は、意味的に類似した特徴写像からマルチスケール階層表現を学ぶことである。具体的には,複数のU-Net$L^1$コーデックをネストした高密度な方法で積み重ねることで,セマンティックギャップを最小限に抑えるとともに,機能マップをスケールにわたって融合させて,詳細な詳細を効果的に回収する。また,よりスムーズな勾配伝搬とネットワーク最適化を実現するため,マルチレベル深層監視機構を考案した。 S3DIS、Toronto3D、SensatUrbanの3つの大規模ベンチマークで実施された大規模な実験は、提案したU-Nextアーキテクチャの優位性と有効性を示している。我々のU-Nextアーキテクチャは、さまざまなタスクやベースラインモデルにまたがる一貫性と可視性の向上を示し、将来の研究の一般的なフレームワークとして機能する可能性を示している。

We study the problem of semantic segmentation of large-scale 3D point clouds. In recent years, significant research efforts have been directed toward local feature aggregation, improved loss functions and sampling strategies. However, the fundamental framework of point cloud semantic segmentation has been largely overlooked, with most existing approaches relying on the U-Net architecture by default. In this paper, we propose U-Next, a small but mighty framework designed for point cloud semantic segmentation. The key to this framework is to learn multi-scale hierarchical representations from semantically similar feature maps. Specifically, we build our U-Next by stacking multiple U-Net $L^1$ codecs in a nested and densely arranged manner to minimize the semantic gap, while simultaneously fusing the feature maps across scales to effectively recover the fine-grained details. We also devised a multi-level deep supervision mechanism to further smooth gradient propagation and facilitate network optimization. Extensive experiments conducted on three large-scale benchmarks including S3DIS, Toronto3D, and SensatUrban demonstrate the superiority and the effectiveness of the proposed U-Next architecture. Our U-Next architecture shows consistent and visible performance improvements across different tasks and baseline models, indicating its great potential to serve as a general framework for future research.

翻訳日:2023-04-04 16:25:56 公開日:2023-04-03

# OTS: 歴史的文書におけるテキストスポッティングのワンショット学習手法

OTS: A One-shot Learning Approach for Text Spotting in Historical Manuscripts ( http://arxiv.org/abs/2304.00746v1 )

ライセンス: Link先を確認

Wen-Bo Hu, Hong-Jian Zhan, Cong Liu, Bing Yin, Yue Lu

(参考訳) 歴史文書処理は、限定的な注釈付きトレーニングデータや新しいクラスの出現といった課題を提起する。そこで本研究では,新しい文字を1つの注釈付きサポートサンプルで正確にかつ確実に検出する,ワンショット学習ベースのテキストスポッティング(OTS)手法を提案する。認知研究からインスピレーションを得た空間アライメントモジュールを導入し、一つの支援画像に基づいてクエリ画像の最も識別性の高い空間領域を探索し、注目し、学習する。特に,低リソーススポッティングタスクは,例えば不均衡の問題に直面することが多いため,距離計量の埋め込み空間をより識別可能な,トーラス損失と呼ばれる新しい損失関数を提案する。我々のアプローチは非常に効率的で、わずかなトレーニングサンプルしか必要とせず、新しい文字やシンボルを扱う素晴らしい能力を示しています。データセットの多様性を高めるために、古代ドンバヒエログリフィクス(dbh)を含む新しい写本データセットを作成する。我々は、利用可能なVML-HD、TKH、NCデータセット、新しいDBHデータセットについて実験を行う。実験の結果,OTSは1ショットテキストスポッティングにおいて最先端の手法よりも優れていた。提案手法は,歴史写本のテキストスポッティング分野における有望な応用を提供する。

Historical manuscript processing poses challenges like limited annotated training data and novel class emergence. To address this, we propose a novel One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters with just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on one support image. In particular, since the low-resource spotting task often faces the problem of example imbalance, we propose a novel loss function called torus loss, which makes the embedding space of the distance metric more discriminative. Our approach is highly efficient and requires only a few training samples while exhibiting a remarkable ability to handle novel characters and symbols. To enhance dataset diversity, a new manuscript dataset that contains the ancient Dongba hieroglyphics (DBH) is created. We conduct experiments on the publicly available VML-HD, TKH, and NC datasets, and on the newly proposed DBH dataset. The experimental results demonstrate that OTS outperforms the state-of-the-art methods in one-shot text spotting. Overall, our proposed method offers promising applications in the field of text spotting in historical manuscripts.

翻訳日:2023-04-04 16:25:31 公開日:2023-04-03

# DeGPR:マルチクラス細胞検出・カウントのためのディープガイド後正則化

DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting ( http://arxiv.org/abs/2304.00741v1 )

ライセンス: Link先を確認

Aayush Kumar Tyagi, Chirag Mohapatra, Prasenjit Das, Govind Makharia, Lalita Mehra, Prathosh AP, Mausam

(参考訳) マルチクラス細胞検出とカウントは多くの病理診断に必須の課題である。手動計数は面倒で、しばしば病理学者の間でのサーバ間差につながる。複数の汎用的深層学習に基づく物体検出と計数方法が存在するが、限られたデータ、小さな重なり合った物体の存在、複数の細胞タイプ、重篤なクラス不均衡、細胞のサイズと形状の微妙な違いなどにより、医療画像中の細胞の検出と計数に容易に移行できない可能性がある。そこで本研究では,細胞間における識別的特徴を活かし,物体検出を支援するガイド付き後正則化(DeGPR)を提案する。これらの特徴は、病理学者によって提供されたり、視覚データから直接推測される。我々は2つの公開データセット(CoNSePとMoNuSAC)と、コントリビュートした新しいデータセットであるMuCeDでモデルを検証した。 MuCeDは、盲腸疾患を予測するためのヒト十二指腸の生検画像55枚からなる。 3つのデータセットで3つのオブジェクト検出ベースラインで広範な実験を行い、degprがモデルに依存しないことを示し、9%までの(絶対的な)マップゲインを得るベースラインを一貫して改善した。

Multi-class cell detection and counting is an essential task for many pathological diagnoses. Manual counting is tedious and often leads to inter-observer variations among pathologists. While there exist multiple, general-purpose, deep learning-based object detection and counting methods, they may not readily transfer to detecting and counting cells in medical images, due to the limited data, presence of tiny overlapping objects, multiple cell types, severe class-imbalance, minute differences in size/shape of cells, etc. In response, we propose guided posterior regularization (DeGPR), which assists an object detector by guiding it to exploit discriminative features among cells. The features may be pathologist-provided or inferred directly from visual data. We validate our model on two publicly available datasets (CoNSeP and MoNuSAC), and on MuCeD, a novel dataset that we contribute. MuCeD consists of 55 biopsy images of the human duodenum for predicting celiac disease. We perform extensive experimentation with three object detection baselines on three datasets to show that DeGPR is model-agnostic, and consistently improves baselines obtaining up to 9% (absolute) mAP gains.

翻訳日:2023-04-04 16:25:11 公開日:2023-04-03

# 言語モデルにおける知識表現の測定と操作

Measuring and Manipulating Knowledge Representations in Language Models ( http://arxiv.org/abs/2304.00740v1 )

ライセンス: Link先を確認

Evan Hernandez, Belinda Z. Li, Jacob Andreas

(参考訳) ニューラルネットワークモデル(lms)は、テキストで記述された世界の事実を表す。しばしばこれらの事実は訓練データに由来する(ほとんどのlmsではバナナという言葉の表現はバナナが果物であるという事実を象徴している)。時々、事実は入力テキスト自体に由来する("I poured the bottle"という文の表現は、ボトルが空になったという事実をエンコードしている)。 LMファクト表現の検査と修正を行うツールは、世界が変化した時に更新したり、バイアスのソースをローカライズしたり削除したり、生成されたテキストのエラーを識別したりできる。 LMにおける事実知識のクエリと修正のためのアプローチであるREMEDIについて述べる。 REMEDIは、LMの内部表現システムにおいて、テキストクエリから事実エンコーディングへのマップを学習する。これらのエンコーディングは知識エディタとして使用できる。lm隠れ表現に追加することで、下流生成を変更でき、新しい事実と一致させることができる。 REMEDIエンコーディングは、モデルプローブとしても使用することができる: LM表現と比較することで、LMが言及したエンティティにどの特性があるかを確認し、背景知識や入力テキストと矛盾する出力を生成するタイミングを予測することができる。したがって、REMEDIは、探索、プロンプト、モデル編集の研究をリンクし、LMにおける知識のきめ細かい検査と制御のための一般的なツールへのステップを提供する。

Neural language models (LMs) represent facts about the world described by text. Sometimes these facts derive from training data (in most LMs, a representation of the word banana encodes the fact that bananas are fruits). Sometimes facts derive from input text itself (a representation of the sentence "I poured out the bottle" encodes the fact that the bottle became empty). Tools for inspecting and modifying LM fact representations would be useful almost everywhere LMs are used: making it possible to update them when the world changes, to localize and remove sources of bias, and to identify errors in generated text. We describe REMEDI, an approach for querying and modifying factual knowledge in LMs. REMEDI learns a map from textual queries to fact encodings in an LM's internal representation system. These encodings can be used as knowledge editors: by adding them to LM hidden representations, we can modify downstream generation to be consistent with new facts. REMEDI encodings can also be used as model probes: by comparing them to LM representations, we can ascertain what properties LMs attribute to mentioned entities, and predict when they will generate outputs that conflict with background knowledge or input text. REMEDI thus links work on probing, prompting, and model editing, and offers steps toward general tools for fine-grained inspection and control of knowledge in LMs.

翻訳日:2023-04-04 16:24:49 公開日:2023-04-03

# ソース不要のドメイン適応に必要な微調整は少ない

Few-shot Fine-tuning is All You Need for Source-free Domain Adaptation ( http://arxiv.org/abs/2304.00792v1 )

ライセンス: Link先を確認

Suho Lee, Seungwon Seo, Jihyo Kim, Yejin Lee, Sangheum Hwang

(参考訳) 近年、ラベル付きソースデータが常にアクセス可能であると仮定するアン教師なしドメイン適応(UDA)と比較して、ソースフリーなアン教師なしドメイン適応(SFUDA)が実用的で実現可能なアプローチとして出現している。しかし、SFUDAアプローチに関連する重要な制限はしばしば見過ごされ、現実のアプリケーションにおける実用性を制限する。これらの制限には、最適なハイパーパラメータを決定するための原則的な方法の欠如と、未ラベルのターゲットデータが、ソースデータに対するクローズドセットや同一ラベルの分布のような特定の要件を満たすことができない場合のパフォーマンス劣化が含まれる。これらの制限はすべて、SFUDAが完全にラベルのないターゲットデータに依存しているという事実に由来する。実世界のシナリオにおける既存のsfudaメソッドの限界を実証し、対象データへの分散やラベルの分散シフトを実証し、これらの方法が現実世界の設定に安全に適用できないことを検証した。実験結果から,SFUDAの限界を回避するために,ラベル付きデータ(例:1-または3-shot)で事前訓練したソースモデルを微調整することが,実用的で信頼性の高いソリューションであると主張している。一般的な信念とは対照的に、注意深い微調整モデルでは、ラベル付きデータのみをトレーニングしても過度な適合に悩まされず、サンプリングバイアスによるパフォーマンスの変化もほとんどない。様々なドメイン適応ベンチマークにおける実験結果から, マイナショットの微調整手法は標準sfuda設定で比較し, 現実的なシナリオで比較手法を上回った。私たちのコードはhttps://github.com/daintlab/fewshot-SFDAで利用可能です。

Recently, source-free unsupervised domain adaptation (SFUDA) has emerged as a more practical and feasible approach compared to unsupervised domain adaptation (UDA), which assumes that labeled source data are always accessible. However, significant limitations associated with SFUDA approaches are often overlooked, which limits their practicality in real-world applications. These limitations include a lack of principled ways to determine optimal hyperparameters and performance degradation when the unlabeled target data fail to meet certain requirements such as a closed-set and identical label distribution to the source data. All these limitations stem from the fact that SFUDA entirely relies on unlabeled target data. We empirically demonstrate the limitations of existing SFUDA methods in real-world scenarios including out-of-distribution and label distribution shifts in target data, and verify that none of these methods can be safely applied to real-world settings. Based on our experimental results, we claim that fine-tuning a source pretrained model with a few labeled data (e.g., 1- or 3-shot) is a practical and reliable solution to circumvent the limitations of SFUDA. Contrary to common belief, we find that carefully fine-tuned models do not suffer from overfitting even when trained with only a few labeled data, and also show little change in performance due to sampling bias. Our experimental results on various domain adaptation benchmarks demonstrate that the few-shot fine-tuning approach performs comparably under the standard SFUDA settings, and outperforms comparison methods under realistic scenarios. Our code is available at https://github.com/daintlab/fewshot-SFDA .
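
As a rough illustration of the few-shot setting mentioned above (e.g., 1- or 3-shot), the sketch below draws k labeled target examples per class, which would then be used for fine-tuning. The function name `sample_k_shot` and the `(example, label)` data layout are assumptions made for illustration; they are not taken from the authors' repository.

```python
import random
from collections import defaultdict


def sample_k_shot(labeled_pool, k, seed=0):
    """Pick k examples per class from a list of (example, label) pairs.

    Hypothetical helper: the real selection procedure may differ.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_class = defaultdict(list)
    for example, label in labeled_pool:
        by_class[label].append(example)

    subset = []
    for label in sorted(by_class):
        examples = by_class[label]
        if len(examples) < k:
            raise ValueError(f"class {label!r} has fewer than {k} examples")
        # Sample without replacement within each class.
        for example in rng.sample(examples, k):
            subset.append((example, label))
    return subset
```

Repeating the draw with several seeds is one simple way to probe how much the result depends on which few examples happen to be labeled.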

翻訳日:2023-04-04 16:18:49 公開日:2023-04-03

# リアルタイムWindowsによる動的車両ルーティング問題を解決するための機械学習の組合せ最適化

Combinatorial Optimization enriched Machine Learning to solve the Dynamic Vehicle Routing Problem with Time Windows ( http://arxiv.org/abs/2304.00789v1 )

ライセンス: Link先を確認

Léo Baty, Kai Jungel, Patrick S. Klein, Axel Parmentier, Maximilian Schiffer

(参考訳) eコマースの台頭と顧客要求の増加により、ロジスティクスサービスプロバイダは日々の計画において新たな複雑さに直面している。既存のマルチステージ確率最適化アプローチは、基礎となる動的車両ルーティングの問題を解決するには、オンライン設定のアプリケーションには計算コストがかかりすぎるか、あるいは強化学習の場合、高次元の組合せ問題にうまく対応できない。これらの欠点を緩和するために,組合せ最適化層を組み込んだ新しい機械学習パイプラインを提案する。最近,EURO Meets NeurIPS Vehicle Routing Competition at NeurIPS 2022で推進されているディスパッチ波を用いた動的車両ルーティング問題に適用した。提案手法は,提案した動的車両経路問題の解法において,他の全ての手法よりも優れていた。本研究は,提案するパイプラインの有効性とメリットを,例えば,未確認のインスタンスやシナリオに対して符号化されたポリシの堅牢性を示すことで,競争で達成された結果を超えて強調する。

With the rise of e-commerce and increasing customer requirements, logistics service providers face a new complexity in their daily planning, mainly due to the need to handle same-day deliveries efficiently. Existing multi-stage stochastic optimization approaches that can solve the underlying dynamic vehicle routing problem are either computationally too expensive for an application in online settings or, in the case of reinforcement learning, struggle to perform well on high-dimensional combinatorial problems. To mitigate these drawbacks, we propose a novel machine learning pipeline that incorporates a combinatorial optimization layer. We apply this general pipeline to a dynamic vehicle routing problem with dispatching waves, which was recently promoted in the EURO Meets NeurIPS Vehicle Routing Competition at NeurIPS 2022. Our methodology ranked first in this competition, outperforming all other approaches in solving the proposed dynamic vehicle routing problem. With this work, we provide a comprehensive numerical study that further highlights the efficacy and benefits of the proposed pipeline beyond the results achieved in the competition, e.g., by showcasing the robustness of the encoded policy against unseen instances and scenarios.

翻訳日:2023-04-04 16:18:21 公開日:2023-04-03

# 3次元アノテーションを伴わないオープンボキャブラリポイントクラウド物体検出

Open-Vocabulary Point-Cloud Object Detection without 3D Annotation ( http://arxiv.org/abs/2304.00788v1 )

ライセンス: Link先を確認

Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

(参考訳) open-vocabulary detectionの目的は、任意のテキスト記述に基づいて新しいオブジェクトを識別することである。本稿では,オープンな3次元ポイントクラウド検出を分割・コンカレンス戦略により解決する。 1)各種オブジェクトのローカライズのための汎用表現を学習可能なポイントクラウド検出器の開発 2)テキスト表現とポイントクラウド表現を接続することで,検出者がテキストプロンプトに基づいて新たなオブジェクトカテゴリを分類できる。具体的には、2dプリトレーニングされた検出器から予測された2dバウンディングボックスの監督下で、ポイントクラウド検出器がオブジェクトのローカライズを学習するリッチイメージプリトレーニングモデルを用いる。さらに,画像,点雲,テキストのモダリティを結合し,視覚言語による事前学習モデル(CLIP)の恩恵を受けるために,非偏差三重項比較学習を提案する。ポイントクラウド検出器に画像と視覚言語を事前訓練した新しいモデルを使用することで、3Dアノテーションを必要とせずにオープンな3Dオブジェクト検出が可能になる。実験により,ScanNet および SUN RGB-D データセット上での幅広いベースラインに対して,少なくとも 3.03 点と 7.47 点の改善が得られた。さらに,アプローチが機能する理由を説明するために,包括的な分析を行う。

The goal of open-vocabulary detection is to identify novel objects based on arbitrary textual descriptions. In this paper, we address open-vocabulary 3D point-cloud detection by a divide-and-conquer strategy, which involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting. Specifically, we resort to rich image pre-trained models, by which the point-cloud detector learns to localize objects under the supervision of predicted 2D bounding boxes from 2D pre-trained detectors. Moreover, we propose a novel de-biased triplet cross-modal contrastive learning to connect the modalities of image, point-cloud and text, thereby enabling the point-cloud detector to benefit from vision-language pre-trained models, i.e., CLIP. The novel use of image and vision-language pre-trained models for point-cloud detectors allows for open-vocabulary 3D object detection without the need for 3D annotations. Experiments demonstrate that the proposed method improves by at least 3.03 points and 7.47 points over a wide range of baselines on the ScanNet and SUN RGB-D datasets, respectively. Furthermore, we provide a comprehensive analysis to explain why our approach works.
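
Step 2) above, classifying a detected object from text prompts, can be pictured with a minimal sketch: compare the object's feature vector with one text embedding per category name and pick the most similar. The function names and the toy 3-dimensional embeddings are assumptions for illustration only; in practice the vectors would come from a point-cloud encoder and a vision-language text encoder that share an embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def classify_open_vocab(object_feature, text_embeddings):
    """Return (best_label, scores) for one object feature.

    text_embeddings maps a category name to its (hypothetical) text
    embedding; adding a new category only requires a new entry here.
    """
    scores = {
        label: cosine(object_feature, embedding)
        for label, embedding in text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores
```

Because the category list is just a dictionary of text embeddings, new classes can be added at inference time without retraining, which is the point of the open-vocabulary setting.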

翻訳日:2023-04-04 16:17:58 公開日:2023-04-03

# 画像マッティングのための異方性事前学習

Disentangled Pre-training for Image Matting ( http://arxiv.org/abs/2304.00784v1 )

ライセンス: Link先を確認

Yanda Li, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao

(参考訳) 画像マッチングは、近年の文献における深層モデルのトレーニングを支援するために、高品質なピクセルレベルの人間のアノテーションを必要とする。このようなアノテーションは費用がかかり、スケールが難しいが、研究の発展を著しく妨げている。本研究では,無限個のデータを利用してマットング性能を向上させる自己教師付き事前学習手法を提案することで,この問題への最初の試みを行う。プリトレーニングタスクは、ランダムなトリマップとアルファマットを生成して画像不等角化目標を達成するイメージマットングと似た方法で設計される。次に、事前訓練されたモデルは、微調整のための下流マットングタスクの初期化として使用される。広範な実験評価により,提案手法は最先端のマットング法と他の自己教師付き初期化手法を大差で上回ることがわかった。また,異なるバックボーンアーキテクチャ上で提案手法の堅牢性を示す。コードとモデルは一般公開される予定だ。

Image matting requires high-quality pixel-level human annotations to support the training of a deep model in recent literature. However, such annotation is costly and hard to scale, significantly holding back the development of the research. In this work, we make the first attempt towards addressing this problem, by proposing a self-supervised pre-training approach that can leverage an unlimited amount of data to boost the matting performance. The pre-training task is designed in a similar manner to image matting, where a random trimap and alpha matte are generated to achieve an image disentanglement objective. The pre-trained model is then used as an initialisation of the downstream matting task for fine-tuning. Extensive experimental evaluations show that the proposed approach outperforms both the state-of-the-art matting methods and other alternative self-supervised initialisation approaches by a large margin. We also show the robustness of the proposed approach over different backbone architectures. The code and models will be publicly available.
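
For readers unfamiliar with the terms, the sketch below shows the standard compositing relation behind matting, I = alpha * F + (1 - alpha) * B, and one simple way to turn an alpha matte into a trimap. It is an illustration under stated assumptions, not the pre-training procedure of the paper: the abstract says the trimap and alpha matte are generated randomly, whereas here the trimap is derived from a given alpha with hypothetical thresholds `low` and `high`.

```python
def composite(alpha, foreground, background):
    """Per-pixel compositing I = alpha * F + (1 - alpha) * B.

    All inputs are 2D lists of floats with the same shape (grayscale).
    """
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(row_a, row_f, row_b)]
        for row_a, row_f, row_b in zip(alpha, foreground, background)
    ]


def trimap_from_alpha(alpha, low=0.05, high=0.95):
    """Label pixels as background (0), unknown (128) or foreground (255).

    The thresholds are illustrative choices, not values from the paper.
    """
    def label(a):
        if a <= low:
            return 0
        if a >= high:
            return 255
        return 128

    return [[label(a) for a in row] for row in alpha]
```

A matting model receives the composited image and the trimap and is asked to recover alpha in the unknown region.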

翻訳日:2023-04-04 16:17:35 公開日:2023-04-03

# nemf:neural microflakeフィールドを用いた逆ボリュームレンダリング

NeMF: Inverse Volume Rendering with Neural Microflake Field ( http://arxiv.org/abs/2304.00782v1 )

ライセンス: Link先を確認

Youjia Zhang, Teng Xu, Junqing Yu, Yuteng Ye, Junle Wang, Yanqing Jing, Jingyi Yu, Wei Yang

(参考訳) 未知の照明下で撮影された画像から物体の外観の物理的特性を復元することは、写真リアルなレンダリングには不可欠である。 Recent approaches adopt the emerging implicit scene representations and have shown impressive results.However, they unanimously adopt a surface-based representation,and hence can not well handle scenes with very complex geometry, translucent object and etc.In this paper, we propose to conduct inverse volume rendering, in contrast to surface-based, by representing a scene using microflake volume, which assumes the space is filled with infinite small flakes and light reflects or scatters at each spatial location according to microflake distributions. 我々はさらに、マイクロフレークボリュームを暗黙的にエンコードする座標ネットワークを採用し、原理的にエンド・ツー・エンドでネットワークをトレーニングするための微分可能なマイクロフレークボリュームレンダを開発し、我々のNeMFは、高度に複雑な幾何学や散乱物体の外観特性を効果的に回復し、高品質なリライティング、素材編集を可能にし、特に表面ベースアプローチでは不可能な散乱などのボリュームレンダリング効果をシミュレートする。

Recovering the physical attributes of an object's appearance from its images captured under an unknown illumination is challenging yet essential for photo-realistic rendering. Recent approaches adopt the emerging implicit scene representations and have shown impressive results. However, they unanimously adopt a surface-based representation, and hence cannot handle scenes with very complex geometry, translucent objects, etc. well. In this paper, we propose to conduct inverse volume rendering, in contrast to surface-based rendering, by representing a scene using a microflake volume, which assumes the space is filled with infinitely small flakes and light reflects or scatters at each spatial location according to microflake distributions. We further adopt coordinate networks to implicitly encode the microflake volume, and develop a differentiable microflake volume renderer to train the network in an end-to-end way in principle. Our NeMF enables effective recovery of appearance attributes for highly complex geometry and scattering objects, enables high-quality relighting and material editing, and especially simulates volume rendering effects, such as scattering, which is infeasible for surface-based approaches.

翻訳日:2023-04-04 16:17:22 公開日:2023-04-03

# デンス予測のための確率的確率的プロンプト学習

Probabilistic Prompt Learning for Dense Prediction ( http://arxiv.org/abs/2304.00779v1 )

ライセンス: Link先を確認

Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn

(参考訳) 決定論的素早い学習の最近の進歩は、様々な下流視覚タスクの代替となり、事前学習された視覚言語モデルの助けを借りて、モデルが強力な視覚表現を学習できるようになる。しかしながら、このアプローチは、単一の決定論的記述が画像全体を十分に表現できないため、より複雑で多様なオブジェクトを扱う必要のある密集した予測タスクのパフォーマンスを制限している。本稿では,高次予測タスクにおいて視覚言語知識を十分に活用するための新しい確率的プロンプト学習を提案する。まず,オブジェクトクラス全体の共通属性を記述するために,学習可能なクラス非依存属性プロンプトを導入する。属性は、クラス固有のテキスト分布を定義するために、クラス情報と視覚コンテキスト知識とを組み合わせる。テキスト表現をサンプル化し、確率的画素テキストマッチング損失を用いて高密度予測タスクを導出し、提案手法の安定性と一般化能力を高める。様々な密集予測タスクとアブレーション研究の広範な実験により,提案手法の有効性が示された。

Recent progress in deterministic prompt learning has become a promising alternative to various downstream vision tasks, enabling models to learn powerful visual representations with the help of pre-trained vision-language models. However, this approach results in limited performance for dense prediction tasks that require handling more complex and diverse objects, since a single and deterministic description cannot sufficiently represent the entire image. In this paper, we present a novel probabilistic prompt learning to fully exploit the vision-language knowledge in dense prediction tasks. First, we introduce learnable class-agnostic attribute prompts to describe universal attributes across the object class. The attributes are combined with class information and visual-context knowledge to define the class-specific textual distribution. Text representations are sampled and used to guide the dense prediction task using the probabilistic pixel-text matching loss, enhancing the stability and generalization capability of the proposed method. Extensive experiments on different dense prediction tasks and ablation studies demonstrate the effectiveness of our proposed method.

翻訳日:2023-04-04 16:17:03 公開日:2023-04-03

# 思考連鎖予測制御

Chain-of-Thought Predictive Control ( http://arxiv.org/abs/2304.00776v1 )

ライセンス: Link先を確認

Zhiwei Jia, Fangchen Liu, Vineet Thumuluri, Linghao Chen, Zhiao Huang, Hao Su

(参考訳) 複雑な低レベル制御タスク(コンタクトリッチオブジェクト操作など)の実証から、一般化可能なポリシー学習を研究する。本稿では,時間的抽象概念と階層的RL(HRL)の計画能力を,新規かつ効果的な方法で組み込んだ模倣学習手法を提案する。意思決定基盤モデルへのステップとして、当社の設計はスケーラブルで、高度に最適化されたデモを活用できます。具体的には、デモの短い部分列、すなわち CoT は、タスクのサブゴールの完了を示すことでそれらの階層構造を反映する。本モデルでは,CoT全体を協調的かつ構造化された長期アクションガイダンスとして動的に予測し,典型的な2段階のサブゴール条件のポリシーを一貫して上回っている。一方、このようなCoTは、デモ間で共有される決定パターン(重騒音やランダム性のあるものでさえ)を実証するため、一般化可能な政策学習を促進する。提案手法であるChain-of-Thought Predictive Control (CoTPC) は,スケーラブルかつ高度に最適化されたデモから,低レベルの操作タスクに挑戦する上で,既存のものよりも優れています。

We study generalizable policy learning from demonstrations for complex low-level control tasks (e.g., contact-rich object manipulations). We propose an imitation learning method that incorporates the idea of temporal abstraction and the planning capabilities from Hierarchical RL (HRL) in a novel and effective manner. As a step towards decision foundation models, our design can utilize scalable, albeit highly sub-optimal, demonstrations. Specifically, we find that certain short subsequences of the demos, i.e., the chain-of-thought (CoT), reflect their hierarchical structures by marking the completion of subgoals in the tasks. Our model learns to dynamically predict the entire CoT as coherent and structured long-term action guidance and consistently outperforms typical two-stage subgoal-conditioned policies. On the other hand, such CoT facilitates generalizable policy learning as it exemplifies the decision patterns shared among demos (even those with heavy noise and randomness). Our method, Chain-of-Thought Predictive Control (CoTPC), significantly outperforms existing ones on challenging low-level manipulation tasks from scalable yet highly sub-optimal demos.

翻訳日:2023-04-04 16:16:46 公開日:2023-04-03

# MRIスキャンによるMGMTプロモーターメチル化状態の予測深層学習モデルの広範な実験評価

MGMT promoter methylation status prediction using MRI scans? An extensive experimental evaluation of deep learning models ( http://arxiv.org/abs/2304.00774v1 )

ライセンス: Link先を確認

Numan Saeed, Muhammad Ridzuan, Hussain Alasmawi, Ikboljon Sobirov, Mohammad Yaqub

(参考訳) 医学的診断のための深層学習の研究は増えており、これらのシステムは臨床医を上回っているとしばしば主張されている。しかし、医療効果を示すシステムはごくわずかである。この観点から,高齢者の致死性脳腫瘍であるグリオブラスト腫(glioblastoma)に対する幅広い深層学習アルゴリズムについて検討した。手術、化学療法、放射線療法はグリオブラスト腫の標準的な治療である。腫瘍に特異的な遺伝子配列であるmgmtプロモーターのメチル化状態は化学療法の効果に影響する。 MGMTプロモーターメチル化は、いくつかのがんにおける化学療法反応と生存を改善する。 MGMTプロモーターメチル化は腫瘍組織生検によって決定され、遺伝子検査される。この長期かつ侵襲的な処置は、感染やその他の合併症のリスクを高める。そこで、研究者は深層学習モデルを用いて、脳MRIスキャンから腫瘍を調べ、MGMTプロモーターのメチル化状態を決定する。 MRIスキャンを用いてMGMTプロモーターのメチル化状態を予測するため,深層学習モデルと585人の参加者の公開MRIデータセットの1つを用いた。我々はこれらのモデルをGrad-CAM、オクルージョン感度、特徴可視化、学習損失景観を用いてテストする。以上の結果から, 癌診断における深層学習システムの精度と信頼性を確保するために, 外部コホートデータを用いてこれらのモデルの性能を検証すべきであることが示唆された。

The number of studies on deep learning for medical diagnosis is expanding, and these systems are often claimed to outperform clinicians. However, only a few systems have shown medical efficacy. From this perspective, we examine a wide range of deep learning algorithms for the assessment of glioblastoma - a common brain tumor in older adults that is lethal. Surgery, chemotherapy, and radiation are the standard treatments for glioblastoma patients. The methylation status of the MGMT promoter, a specific genetic sequence found in the tumor, affects chemotherapy's effectiveness. MGMT promoter methylation improves chemotherapy response and survival in several cancers. MGMT promoter methylation is determined by a tumor tissue biopsy, which is then genetically tested. This lengthy and invasive procedure increases the risk of infection and other complications. Thus, researchers have used deep learning models to examine the tumor from brain MRI scans to determine the MGMT promoter's methylation state. We employ deep learning models and one of the largest public MRI datasets of 585 participants to predict the methylation status of the MGMT promoter in glioblastoma tumors using MRI scans. We test these models using Grad-CAM, occlusion sensitivity, feature visualizations, and training loss landscapes. Our results show no correlation between these two, indicating that external cohort data should be used to verify these models' performance to assure the accuracy and reliability of deep learning systems in cancer diagnosis.

翻訳日:2023-04-04 16:16:25 公開日:2023-04-03

# オンライン確率ニュートン法による幾何中央値の推定と応用

Online stochastic Newton methods for estimating the geometric median and applications ( http://arxiv.org/abs/2304.00770v1 )

ライセンス: Link先を確認

Antoine Godichon-Baggioni (LPSM (UMR_8001)), Wei Lu (LMI)

(参考訳) 大規模なサンプルの場合、少数の個体が平均のような基本的な統計指標を損なうことがある。非定型的個人を自動的に検出することは困難であり、別の戦略は堅牢なアプローチである。本稿では,中心傾向のロバスト指標である確率変数の幾何学的中央値の推定に着目する。逐次到着するデータの大量のサンプルを扱うために,幾何中央値の推定を行うオンライン確率ニュートンアルゴリズムを導入し,その収束率を示す。中央値とヘッセン行列の推定値を再帰的に更新できるので、任意の指定された方向における中央値の信頼区間を決定し、オンライン統計検査を行う。

In the context of large samples, a small number of individuals might spoil basic statistical indicators like the mean. It is difficult to detect these atypical individuals automatically, and an alternative strategy is to use robust approaches. This paper focuses on estimating the geometric median of a random variable, which is a robust indicator of central tendency. In order to deal with large samples of data arriving sequentially, online stochastic Newton algorithms for estimating the geometric median are introduced and we give their rates of convergence. Since estimates of the median and those of the Hessian matrix can be recursively updated, we also determine confidence intervals of the median in any designated direction and perform online statistical tests.
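
To make the idea of a recursively updated median estimate concrete, here is a minimal sketch of an online estimator that processes one observation at a time. Note that it uses a simpler first-order scheme (averaged stochastic gradient), not the stochastic Newton algorithms of the paper, and it keeps no Hessian estimate. The step-size constants `c` and `alpha` and the helper `contaminated_sample` are illustrative assumptions.

```python
import math
import random


def online_geometric_median(stream, dim, c=1.0, alpha=0.66):
    """Averaged stochastic gradient estimate of the geometric median.

    Each observation moves the iterate a step of length c / n**alpha
    towards the observation; the returned value is the running average
    of the iterates. First-order stand-in, not a Newton method.
    """
    m = [0.0] * dim       # current iterate
    m_bar = [0.0] * dim   # running average of the iterates
    for n, x in enumerate(stream, start=1):
        diff = [xi - mi for xi, mi in zip(x, m)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            step = c / n ** alpha
            # Unit-length direction: far-away outliers pull no harder
            # than nearby points, which is the source of robustness.
            m = [mi + step * d / norm for mi, d in zip(m, diff)]
        m_bar = [b + (mi - b) / n for b, mi in zip(m_bar, m)]
    return m_bar


def contaminated_sample(n, seed=0):
    """Toy data: Gaussian around (1, -2) with about 5% outliers at (100, 100)."""
    rng = random.Random(seed)
    for _ in range(n):
        if rng.random() < 0.05:
            yield (100.0, 100.0)
        else:
            yield (rng.gauss(1.0, 1.0), rng.gauss(-2.0, 1.0))
```

On such contaminated data the sample mean is dragged far towards the outliers, while the median estimate stays near the bulk of the observations, which is the motivation given at the start of the abstract.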

翻訳日:2023-04-04 16:16:05 公開日:2023-04-03

# トポロジー行動による電力グリッドの管理--高度なルールベースと強化学習エージェントの比較研究

Managing power grids through topology actions: A comparative study between advanced rule-based and reinforcement learning agents ( http://arxiv.org/abs/2304.00765v1 )

ライセンス: Link先を確認

Malte Lehna and Jan Viebahn and Christoph Scholz and Antoine Marot and Sven Tomforde

(参考訳) 電力網の運用は、現在の上昇と再生可能エネルギー生産の増加により、ますます複雑になっている。その結果、アクティブグリッド管理は従来のアプローチで限界に達している。パワーネットワークの課題を実行するための学習の文脈において、強化学習(rl)は効率良く信頼性の高いアプローチであり、自動グリッド操作の可能性がかなり高いことが示されている。本稿では、Binbinchenから提出されたエージェントを分析し、RLとルールベースのアプローチの両方において、エージェントを改善するための新しい戦略を提供する。主な改善点はN-1戦略であり、1行が切断されてもグリッドを安定に保つトポロジー作用を考える。さらに、元のグリッドへのトポロジーの回帰も提案するが、これは有益であることが証明された。改善は、チャレンジテストセットの参照アプローチに対してテストされ、ルールベースのエージェントのパフォーマンスを27%向上することができる。ルールベースとRLエージェントを直接比較すると、同様の性能が得られる。しかし、rlエージェントには明確な計算上の利点がある。また、サンプルケースの振る舞いをより詳細に分析して、さらなる洞察を与えます。ここでは,n-1戦略を通じて,エージェントの行動がより多様化するのを観察した。

The operation of electricity grids has become increasingly complex due to the current upheaval and the increase in renewable energy production. As a consequence, active grid management is reaching its limits with conventional approaches. In the context of the Learning to Run a Power Network challenge, it has been shown that Reinforcement Learning (RL) is an efficient and reliable approach with considerable potential for automatic grid operation. In this article, we analyse the submitted agent from Binbinchen and provide novel strategies to improve the agent, both for the RL and the rule-based approach. The main improvement is an N-1 strategy, where we consider topology actions that keep the grid stable, even if one line is disconnected. Moreover, we propose a topology reversion to the original grid, which proved to be beneficial. The improvements are tested against reference approaches on the challenge test sets and are able to increase the performance of the rule-based agent by 27%. In direct comparison between rule-based and RL agent we find similar performance. However, the RL agent has a clear computational advantage. We also analyse the behaviour in an exemplary case in more detail to provide additional insights. Here, we observe that through the N-1 strategy, the actions of the agents become more diversified.
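
The N-1 idea, a grid state that still holds up when any single line is lost, can be illustrated with a deliberately simplified check. The sketch below treats the grid as an undirected graph and only tests whether it stays connected after each single-line outage. This is an assumption made for illustration: an actual N-1 assessment would run power-flow computations and look at line loadings, and the function names here are hypothetical.

```python
from collections import deque


def is_connected(nodes, lines):
    """Breadth-first search connectivity test on an undirected graph."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {node: set() for node in nodes}
    for a, b in lines:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(nodes)


def survives_any_single_outage(nodes, lines):
    """True if the graph is connected and stays so after removing any one line."""
    lines = list(lines)
    if not is_connected(nodes, lines):
        return False
    return all(
        is_connected(nodes, lines[:i] + lines[i + 1:])
        for i in range(len(lines))
    )
```

A candidate topology action could be screened by applying it to a copy of the line list and keeping it only if this check (or, realistically, a load-based one) still passes.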

翻訳日:2023-04-04 16:15:54 公開日:2023-04-03

# 文書レベル関係抽出のための識別性とロバスト性の統合に向けて

Towards Integration of Discriminability and Robustness for Document-Level Relation Extraction ( http://arxiv.org/abs/2304.00824v1 )

ライセンス: Link先を確認

Jia Guo, Stanley Kok, Lidong Bing

(参考訳) ドキュメントレベル関係抽出(docre)は、ドキュメントの長距離コンテキスト依存推論に依存するエンティティペアの関係を予測します。典型的なマルチラベル分類問題として、docreは、少数のポジティブな関係と多くのネガティブな関係を効果的に区別するという課題に直面している。この課題は、データセットにかなりの数のアノテーションエラーがある場合、さらに克服が困難になる。本研究では,DocRE問題に対する差別性と堅牢性の両方をよりよく統合することを目指している。具体的には,まず,確率的出力と内部表現の両方に対して高い識別性を与える効果的な損失関数を設計する。我々は,エントロピー最小化と教師付きコントラスト学習を革新的にカスタマイズした。ラベル誤りの影響を改善するため,本手法はモデルのロバスト性を高めるために,新しい負のラベルサンプリング戦略を導入した。さらに,アノテーションエラーを伴うより現実的なシナリオを模倣する2つの新しいデータレジームを導入し,サンプリング戦略を評価する。実験により,各コンポーネントの有効性を検証し,提案手法がDocREDデータセット,最近クリーン化したRe-DocRED,提案したデータレシスタンスにおいて,新たな最先端結果を実現することを示す。

Document-level relation extraction (DocRE) predicts relations for entity pairs that rely on long-range context-dependent reasoning in a document. As a typical multi-label classification problem, DocRE faces the challenge of effectively distinguishing a small set of positive relations from the majority of negative ones. This challenge becomes even more difficult to overcome when there exists a significant number of annotation errors in the dataset. In this work, we aim to achieve better integration of both the discriminability and robustness for the DocRE problem. Specifically, we first design an effective loss function to endow high discriminability to both probabilistic outputs and internal representations. We innovatively customize entropy minimization and supervised contrastive learning for the challenging multi-label and long-tailed learning problems. To ameliorate the impact of label errors, we equip our method with a novel negative label sampling strategy to strengthen the model robustness. In addition, we introduce two new data regimes to mimic more realistic scenarios with annotation errors and evaluate our sampling strategy. Experimental results verify the effectiveness of each component and show that our method achieves new state-of-the-art results on the DocRED dataset, its recently cleaned version, Re-DocRED, and the proposed data regimes.

翻訳日:2023-04-04 16:08:36 公開日:2023-04-03

# 水中気泡の非線形振動による音楽の創造性

Musical creativity enabled by nonlinear oscillations of a bubble in water ( http://arxiv.org/abs/2304.00822v1 )

ライセンス: Link先を確認

Ivan S. Maksymov

(参考訳) オリジナルとアレンジされた既存の音楽成果は、習得に何年もの学習と実践を要する芸術である。しかし、aiによる音楽創造の分野における絶え間ない進歩にもかかわらず、質の高い音楽結果の生産は、まだ人間の前兆である。ここでは,水中の1つの気泡が,古典音楽の一片を符号化する音響圧力信号の下で非線形に振動するときに,創造的な音楽結果を生み出すために使用できることを実証する。バブルの応答の音声信号は、オリジナルの作曲のエレキギターバージョンに似ている。我々は,このバブルの性質が,音楽の配置と構成において人間の創造性をシミュレートできる物理に着想を得たAIシステムの構築に有効である,という理論的支持論を提案し,提案する。

Producing original and arranging existing musical outcomes is an art that takes years of learning and practice to master. Yet, despite the constant advances in the field of AI-powered musical creativity, production of quality musical outcomes remains a prerogative of humans. Here we demonstrate that a single bubble in water can be used to produce creative musical outcomes when it oscillates nonlinearly under an acoustic pressure signal that encodes a piece of classical music. The audio signal of the response of the bubble resembles an electric guitar version of the original composition. We suggest, and provide plausible theoretical supporting arguments, that this property of the bubble can be used to create physics-inspired AI systems capable of simulating human creativity in arrangement and composition of music.

翻訳日:2023-04-04 16:08:15 公開日:2023-04-03

# 適応的メッシュリファインメントのためのSwarm強化学習

Swarm Reinforcement Learning For Adaptive Mesh Refinement ( http://arxiv.org/abs/2304.00818v1 )

ライセンス: Link先を確認

Niklas Freymuth, Philipp Dahlinger, Tobias Würth, Luise Kärger, Gerhard Neumann

(参考訳) アダプティブメッシュリファインメント(AMR)は、メッシュの解像度を動的に調整し、計算コストとシミュレーション精度のトレードオフを可能にするため、メッシュベースのシミュレーションには不可欠である。しかし、既存のAMRの方法はタスク依存のヒューリスティックス、高価なエラー推定器を使うか、より大きなメッシュやより複雑な問題にうまくスケールしない。本稿では、AMRをSwarm Reinforcement Learning問題として定式化し、メッシュの各要素を単純で均一なエージェントの協調システムの一部として見る。この問題の定式化とエージェントワイド報酬関数とグラフニューラルネットワークを組み合わせることで、任意の方程式系の信頼性とスケーラブルな洗練戦略を学習することができる。複雑なシミュレーションの精度と効率を改善するためのアプローチの有効性を実験的に実証した。その結果,学習ベースラインを上回って,推論中にエラーインジケータを必要とせず,従来のエラーベースのamrリファインメント戦略と同等のリファインメント品質を達成できた。

Adaptive Mesh Refinement (AMR) is crucial for mesh-based simulations, as it allows for dynamically adjusting the resolution of a mesh to trade off computational cost with the simulation accuracy. Yet, existing methods for AMR either use task-dependent heuristics, expensive error estimators, or do not scale well to larger meshes or more complex problems. In this paper, we formalize AMR as a Swarm Reinforcement Learning problem, viewing each element of a mesh as part of a collaborative system of simple and homogeneous agents. We combine this problem formulation with a novel agent-wise reward function and Graph Neural Networks, allowing us to learn reliable and scalable refinement strategies on arbitrary systems of equations. We experimentally demonstrate the effectiveness of our approach in improving the accuracy and efficiency of complex simulations. Our results show that we outperform learned baselines and achieve a refinement quality that is on par with a traditional error-based AMR refinement strategy without requiring error indicators during inference.
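
For context, the traditional error-based refinement that the abstract uses as a reference point can be sketched in one dimension: split every interval whose local error indicator exceeds a tolerance, and repeat. The sketch below is not the swarm reinforcement learning method of the paper; the indicator (the gap between the function value at the midpoint and the linear interpolant) and the name `refine_mesh` are assumptions chosen for illustration.

```python
def refine_mesh(f, nodes, tol, max_rounds=20):
    """Refine a 1D mesh where a midpoint error indicator exceeds tol.

    f     : function being approximated by piecewise-linear interpolation
    nodes : initial mesh nodes
    tol   : an interval is split if |f(mid) - linear interpolant at mid| > tol
    """
    nodes = sorted(nodes)
    for _ in range(max_rounds):
        new_nodes = [nodes[0]]
        changed = False
        for left, right in zip(nodes, nodes[1:]):
            mid = 0.5 * (left + right)
            # Error of the linear interpolant at the interval midpoint.
            error = abs(f(mid) - 0.5 * (f(left) + f(right)))
            if error > tol:
                new_nodes.append(mid)
                changed = True
            new_nodes.append(right)
        nodes = new_nodes
        if not changed:
            break
    return nodes
```

Applied to a function with a sharp peak, this keeps coarse intervals where the function is nearly flat and concentrates nodes around the peak, which is the cost versus accuracy trade-off described at the start of the abstract. A midpoint indicator can miss features that happen to cancel at the midpoint, one reason practical indicators are more elaborate.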

翻訳日:2023-04-04 16:08:00 公開日:2023-04-03

# 暗黙の談話関係をクラウドソーシングするための設計選択--タスク設計によるバイアスの顕在化

Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design ( http://arxiv.org/abs/2304.00815v1 )

ライセンス: Link先を確認

Valentina Pyatkin, Frances Yung, Merel C.J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg

(参考訳) 自然言語アノテーションにおける不一致は、これまで主にアノテータやアノテーションフレームワークがもたらすバイアスの観点から研究されてきた。本稿では、別のバイアス源であるタスク設計バイアス(task design bias)の分析を提案する。これは、自然言語を用いて一般の(専門家でない)アノテータの解釈を引き出すクラウドソーシング型の言語アノテーションに特に強い影響を与える。この目的のために、関係の曖昧さから困難であることが繰り返し示されてきた暗黙の談話関係アノテーションを取り上げる。 2つの異なるアノテーションタスクを用いて得られた1,200の談話関係のアノテーションを比較し、4つの異なるドメインにわたって両手法のバイアスを定量化する。どちらの手法もクラウドソーシング用に設計された自然言語アノテーションタスクである。タスク設計はアノテータを特定の関係へと誘導しうること、また一部の談話関係の意味(sense)はどちらか一方のアノテーション手法でよりよく引き出せることを示す。さらに、モデルの訓練およびテストの際にはこの種のバイアスを考慮すべきであると結論付ける。

Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.

翻訳日:2023-04-04 16:07:41 公開日:2023-04-03

# ディープニューラルネットワークのモデル非依存的到達性解析

Model-Agnostic Reachability Analysis on Deep Neural Networks ( http://arxiv.org/abs/2304.00813v1 )

ライセンス: Link先を確認

Chi Zhang, Wenjie Ruan, Fu Wang, Peipei Xu, Geyong Min, Xiaowei Huang

(参考訳) 検証は安全クリティカルシステムの形式解析において重要な役割を果たす。現在の検証手法の多くは、ディープニューラルネットワーク(DNN)を扱う際に特定の要件を課す。すなわち、フィードフォワードニューラルネットワーク(FNN)のような特定のネットワークカテゴリ、あるいはReLUのような特定の活性化関数を持つネットワークのいずれかを対象としている。本稿では、DeepAgnと呼ばれるモデル非依存の検証フレームワークを開発し、FNN、リカレントニューラルネットワーク(RNN)、あるいは両者の混合に適用可能であることを示す。リプシッツ連続性の仮定の下で、DeepAgnは、大域的収束を保証する新しい最適化スキームに基づいて、DNNの到達可能性を解析する。レイヤやパラメータといったネットワークの内部構造にアクセスする必要はない。到達可能性解析を通じて、DeepAgnは、与えられた入力に対する最大安全半径の計算や、真の(ground-truth)敵対的例の生成など、よく知られた頑健性問題に取り組むことができる。また、非常に深い層と数百万のニューロンを持つFNNとRNNを含む、より広いクラスのディープニューラルネットワークを扱う上で、DeepAgnが他の最先端の検証手法よりも優れた能力と効率を持つことを実証的に示す。

Verification plays an essential role in the formal analysis of safety-critical systems. Most current verification methods have specific requirements when working on Deep Neural Networks (DNNs). They either target one particular network category, e.g., Feedforward Neural Networks (FNNs), or networks with specific activation functions, e.g., ReLU. In this paper, we develop a model-agnostic verification framework, called DeepAgn, and show that it can be applied to FNNs, Recurrent Neural Networks (RNNs), or a mixture of both. Under the assumption of Lipschitz continuity, DeepAgn analyses the reachability of DNNs based on a novel optimisation scheme with a global convergence guarantee. It does not require access to the network's internal structures, such as layers and parameters. Through reachability analysis, DeepAgn can tackle several well-known robustness problems, including computing the maximum safe radius for a given input, and generating the ground-truth adversarial examples. We also empirically demonstrate DeepAgn's superior capability and efficiency in handling a broader class of deep neural networks, including both FNNs, and RNNs with very deep layers and millions of neurons, than other state-of-the-art verification approaches.

翻訳日:2023-04-04 16:07:24 公開日:2023-04-03
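
To make the Lipschitz assumption in the abstract above concrete, here is a minimal one-dimensional sketch of how a Lipschitz constant turns black-box queries into a certified output range. It is not DeepAgn's optimisation scheme: the function `lipschitz_bounds`, the uniform grid, and the toy `model` are all assumptions made for illustration.

```python
def lipschitz_bounds(f, lo, hi, lipschitz, num_samples=101):
    """Bracket min f over [lo, hi] using only black-box queries of f.

    Assumes |f(x) - f(y)| <= lipschitz * |x - y| on the interval.
    Returns (certified_lower_bound, best_value_found).
    """
    step = (hi - lo) / (num_samples - 1)
    samples = [f(lo + i * step) for i in range(num_samples)]
    best = min(samples)
    # Every point of [lo, hi] lies within step / 2 of a grid point, so f
    # cannot fall more than lipschitz * step / 2 below the best sample.
    return best - lipschitz * step / 2.0, best


# Toy stand-in for a network output (hypothetical); its Lipschitz constant is 1.
def model(x):
    return abs(x - 0.3) + 1.0


lower, upper = lipschitz_bounds(model, 0.0, 1.0, lipschitz=1.0)
```

The true minimum of `model` lies between `lower` and `upper`, and the gap shrinks as the grid is refined. A uniform grid scales poorly with input dimension, which is one reason a dedicated optimisation scheme is needed in practice.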

# メソスコピックキャビティ-QEDシステムにおける深い光・物質相互作用の非摂動効果

Non-perturbative effects of deep-strong light-matter interaction in a mesoscopic cavity-QED system ( http://arxiv.org/abs/2304.00805v1 )

ライセンス: Link先を確認

Andrey Kudlis, Denis Novokreschenov, Ivan Iorsh, Ilya Tokatly

(参考訳) 量子ダイマーの2つの群を共通の電磁空洞に配置し、その群のいずれかに静的外部電位を選択的に印加することにより制御するシステムを考える。真空電磁ゆらぎへの強い結合の過程において、二量体間の創発的な光子アシスト相互作用は、第2群に適用されるポテンシャルに対する第1の偏りのない二量体群の強い非線形量子化クロスポーラライゼーション応答をもたらすことを示す。全体分極は、数と位置が群内の二量体の数のパリティに依存するような、ほぼ理想的なステップの連続を示す。この非摂動効果は、有限個のダイマーからなるメソスコピック系の特徴的な特徴であり、一般化されたディッケモデルの予測によく用いられる熱力学的極限で消失する。

We consider a system comprising two groups of quantum dimers placed in a common electromagnetic cavity, and controlled by selectively applying a static external potential to one of the groups. We show that in the regime of deep strong coupling to vacuum electromagnetic fluctuations, the emergent photon-assisted interaction between the dimers leads to a strongly non-linear quantized cross-polarization response of the first, unbiased group of dimers to the potential applied to the second group. The total polarization shows a series of almost ideal steps whose number and position depend on the parity of the numbers of dimers in the groups. This non-perturbative effect is a distinctive feature of mesoscopic systems comprising a finite number of dimers and disappears in the thermodynamic limit, which is commonly used in the description of the generalized Dicke models.

翻訳日:2023-04-04 16:07:04 公開日:2023-04-03

# 強化学習入門

A Tutorial Introduction to Reinforcement Learning ( http://arxiv.org/abs/2304.00803v1 )

ライセンス: Link先を確認

Mathukumalli Vidyasagar

(参考訳) 本稿では,Stochastic Approximation(SA)を統一テーマとして,強化学習(RL)に関する簡単な調査を行う。論文の範囲はMarkov Reward Processes、Markov Decision Processes、Stochastic Approximation Algorithm、時間差分学習や$Q$-learningといった広く使われているアルゴリズムを含む。

In this paper, we present a brief survey of Reinforcement Learning (RL), with particular emphasis on Stochastic Approximation (SA) as a unifying theme. The scope of the paper includes Markov Reward Processes, Markov Decision Processes, Stochastic Approximation algorithms, and widely used algorithms such as Temporal Difference Learning and $Q$-learning.

翻訳日:2023-04-04 16:06:45 公開日:2023-04-03
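
As an illustration of the stochastic-approximation view mentioned in the abstract above, the following sketch runs tabular TD(0) on a two-state Markov reward process. The chain, the rewards, the discount factor, and the step-size schedule are hypothetical choices for this example, not taken from the tutorial.

```python
import random


def td0(transitions, rewards, gamma=0.5, steps=50000, seed=0):
    """Estimate the value function of a Markov reward process with TD(0).

    transitions[s] is a list of (next_state, probability) pairs and
    rewards[s] is the reward received when leaving state s.
    """
    rng = random.Random(seed)
    values = [0.0] * len(transitions)
    visits = [0] * len(transitions)
    state = 0
    for _ in range(steps):
        next_states, probs = zip(*transitions[state])
        next_state = rng.choices(next_states, weights=probs)[0]
        visits[state] += 1
        # Step sizes n^-0.7 satisfy the usual stochastic-approximation
        # conditions: their sum diverges, the sum of their squares converges.
        step_size = visits[state] ** -0.7
        td_error = rewards[state] + gamma * values[next_state] - values[state]
        values[state] += step_size * td_error
        state = next_state
    return values


# Two-state chain in which every transition is a fair coin flip.
transitions = [[(0, 0.5), (1, 0.5)], [(0, 0.5), (1, 0.5)]]
rewards = [1.0, 0.0]
values = td0(transitions, rewards)
```

Solving the Bellman equation V = r + gamma * P * V by hand for this chain gives V = [1.5, 0.5], so the estimates in `values` should land near those numbers. Each update is a noisy fixed-point iteration on that equation, which is what places TD learning inside the stochastic-approximation framework.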

# Soft-Diceによるノイズのある画像セグメンテーション

Noisy Image Segmentation With Soft-Dice ( http://arxiv.org/abs/2304.00801v1 )

ライセンス: Link先を確認

Marcus Nordström, Henrik Hult, Atsuto Maki, Fredrik Löfman

(参考訳) 本稿では、医用画像セグメンテーションにおいて最も一般的な損失関数の1つであるsoft-Dice損失について、対象ラベルにノイズが存在する状況で検討する。特に、最適解の集合を特徴づけ、これらの解の体積バイアスに対する鋭い境界を与える。さらに、最適なsoft-Diceに収束するソフトセグメンテーションの列は、しきい値処理によってハードセグメンテーションに変換した場合にも最適なDiceに収束することを示す。soft-DiceはDice指標を最大化するための代理としてしばしば用いられるため、これは重要な結果である。最後に、理論結果を確認する実験を示す。

This paper presents a study on the soft-Dice loss, one of the most popular loss functions in medical image segmentation, for situations where noise is present in target labels. In particular, the set of optimal solutions is characterized and sharp bounds on the volume bias of these solutions are provided. It is further shown that a sequence of soft segmentations converging to optimal soft-Dice also converges to optimal Dice when converted to hard segmentations using thresholding. This is an important result because soft-Dice is often used as a proxy for maximizing the Dice metric. Finally, experiments confirming the theoretical results are provided.

翻訳日:2023-04-04 16:06:37 公開日:2023-04-03
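
For readers unfamiliar with the quantities named in the abstract above, here is a small sketch of one common form of the soft-Dice score and of the thresholding step that turns a soft segmentation into a hard one. The exact definition used in the paper may differ (some variants square the terms in the denominator); the toy vectors, the `eps` smoothing term, and the 0.5 threshold are assumptions.

```python
def soft_dice(pred, target, eps=1e-8):
    """Soft-Dice score between a soft prediction and a target mask."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def threshold(pred, level=0.5):
    """Convert a soft segmentation into a hard (0/1) segmentation."""
    return [1.0 if p >= level else 0.0 for p in pred]


soft_pred = [0.9, 0.8, 0.6, 0.2, 0.1]   # per-pixel foreground probabilities
labels = [1.0, 1.0, 0.0, 0.0, 0.0]      # hard target labels

soft_score = soft_dice(soft_pred, labels)             # about 3.4 / 4.6
hard_score = soft_dice(threshold(soft_pred), labels)  # ordinary Dice, 4 / 5
```

When both arguments are binary, `soft_dice` reduces to the ordinary Dice coefficient, which is why the same function serves for `hard_score`. The corresponding loss is usually taken as one minus the score.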

# マイクロ波量子ダイオード

Microwave quantum diode ( http://arxiv.org/abs/2304.00799v1 )

ライセンス: Link先を確認

Rishabh Upadhyay, Dmitry S. Golubev, Yu-Cheng Chang, George Thomas, Andrew Guthrie, Joonas T. Peltonen, and Jukka P. Pekola

(参考訳) 量子回路の脆弱な性質は、スケーラブルな量子アプリケーションにとって大きなボトルネックである。極低温で動作する量子回路は、増幅器のバックアクションや外部ノイズに対して非常に脆弱である。この目的のために、サーキュレータやアイソレータなどの非相反マイクロ波デバイスが使用される。これらのデバイスはクライオスタット内でかなりの設置面積を占め、量子回路のスケーラビリティを制限している。本稿では、超伝導磁束量子ビットの非線形性を利用した小型マイクロ波ダイオードアーキテクチャを提案する。量子ビットの縮退点において、逆方向に伝送される電力レベルに有意な差があることを実験的に示す。観測結果は提案する理論モデルと一致している。入力電力-99 dBm、量子ビットと共振器の擬交差領域付近において、伝送整流比は6.81 GHzから6.86 GHzまでの50 MHz幅の周波数範囲で90%を超え、6.67 GHzから6.91 GHzまでの250 MHzの範囲で60%を超えることを報告する。提示するアーキテクチャはコンパクトで、複数の読み出しチャネルへ容易に拡張可能であり、量子情報、マイクロ波読み出し、オプトメカニクスにおいて多様な機会を開く可能性がある。

The fragile nature of quantum circuits is a major bottleneck to scalable quantum applications. Operating at cryogenic temperatures, quantum circuits are highly vulnerable to amplifier backaction and external noise. Non-reciprocal microwave devices such as circulators and isolators are used for this purpose. These devices have a considerable footprint in cryostats, limiting the scalability of quantum circuits. We present a compact microwave diode architecture, which exploits the non-linearity of a superconducting flux qubit. At the qubit degeneracy point we experimentally demonstrate a significant difference between the power levels transmitted in opposite directions. The observations align with the proposed theoretical model. At -99 dBm input power, and near the qubit-resonator avoided crossing region, we report the transmission rectification ratio exceeding 90% for a 50 MHz wide frequency range from 6.81 GHz to 6.86 GHz, and over 60% for the 250 MHz range from 6.67 GHz to 6.91 GHz. The presented architecture is compact, and easily scalable towards multiple readout channels, potentially opening up diverse opportunities in quantum information, microwave read-out and optomechanics.

翻訳日:2023-04-04 16:06:28 公開日:2023-04-03

# FinnWoodlands データセット

FinnWoodlands Dataset ( http://arxiv.org/abs/2304.00793v1 )

ライセンス: Link先を確認

Juan Lagos, Urho Lempiö and Esa Rahtu

(参考訳) 大規模で多様なデータセットが利用可能になったことは、自動運転や屋内アプリケーションにおいて大きなブレークスルーをもたらしたが、林業アプリケーションはまだ遅れており、新しい森林データセットは、森林のようなシナリオのためのデータ駆動手法の開発において大きな進歩をもたらすだろう。本稿では、RGBステレオ画像、点群、スパース深度マップ、およびセマンティック、インスタンス、パノプティックセグメンテーションのための手動アノテーションによる正解データからなる森林データセットFinnWoodlandsを紹介する。FinnWoodlandsは手動でアノテーションされた合計4226個のオブジェクトを含み、そのうち2562個(60.6%)は、"Spruce Tree"、"Birch Tree"、"Pine Tree"の3つの異なるインスタンスカテゴリに分類される樹幹に対応している。樹幹の他に、インスタンスとして"Obstacles"オブジェクトを、またセマンティックなstuffクラスとして"Lake"、"Ground"、"Track"をアノテーションした。本データセットは、環境の全体的な表現が重要となる林業アプリケーションで使用できる。インスタンスセグメンテーション、パノプティックセグメンテーション、深度補完の3つのモデルを用いた初期ベンチマークを提供し、このような非構造化シナリオがもたらす課題を説明する。

While the availability of large and diverse datasets has contributed to significant breakthroughs in autonomous driving and indoor applications, forestry applications are still lagging behind and new forest datasets would most certainly contribute to achieving significant progress in the development of data-driven methods for forest-like scenarios. This paper introduces a forest dataset called FinnWoodlands, which consists of RGB stereo images, point clouds, and sparse depth maps, as well as ground truth manual annotations for semantic, instance, and panoptic segmentation. FinnWoodlands comprises a total of 4226 objects manually annotated, out of which 2562 objects (60.6%) correspond to tree trunks classified into three different instance categories, namely "Spruce Tree", "Birch Tree", and "Pine Tree". Besides tree trunks, we also annotated "Obstacles" objects as instances as well as the semantic stuff classes "Lake", "Ground", and "Track". Our dataset can be used in forestry applications where a holistic representation of the environment is relevant. We provide an initial benchmark using three models for instance segmentation, panoptic segmentation, and depth completion, and illustrate the challenges that such unstructured scenarios introduce.

翻訳日:2023-04-04 16:06:10 公開日:2023-04-03

# GreekBART: 最初の事前学習済みギリシャ語Sequence-to-Sequenceモデル

GreekBART: The First Pretrained Greek Sequence-to-Sequence Model ( http://arxiv.org/abs/2304.00869v1 )

ライセンス: Link先を確認

Iakovos Evdaimon, Hadi Abdine, Christos Xypolopoulos, Stamatis Outsios, Michalis Vazirgiannis, Giorgos Stamou

(参考訳) 転移学習の時代は、コンピュータビジョンと自然言語処理の分野に革命をもたらし、様々なタスクで卓越した性能を示す強力な事前学習モデルを生み出した。特に、自然言語処理タスクはトランスフォーマーベースの言語モデルによって支配されている。自然言語推論および自然言語生成タスクでは、BERTモデルとその変種、ならびにGPTモデルとその後継が模範的な性能を示した。しかし、これらのモデルの大半は、主に英語または多言語コーパスを対象に事前学習・評価されている。本稿では、BART-baseアーキテクチャに基づき、大規模ギリシャ語コーパスで事前学習した最初のSeq2SeqモデルであるGreekBARTを紹介する。GreekBARTをBART-random、Greek-BERT、XLM-Rと様々な識別タスクで評価・比較する。さらに、新たに導入されたギリシャ語要約データセットであるGreekSUMの2つのNLGタスクにおける性能を検討する。モデル、コード、新しい要約データセットは公開される予定である。

The era of transfer learning has revolutionized the fields of Computer Vision and Natural Language Processing, bringing powerful pretrained models with exceptional performance across a variety of tasks. Specifically, Natural Language Processing tasks have been dominated by transformer-based language models. In Natural Language Inference and Natural Language Generation tasks, the BERT model and its variants, as well as the GPT model and its successors, demonstrated exemplary performance. However, the majority of these models are pretrained and assessed primarily for the English language or on a multilingual corpus. In this paper, we introduce GreekBART, the first Seq2Seq model based on BART-base architecture and pretrained on a large-scale Greek corpus. We evaluate and compare GreekBART against BART-random, Greek-BERT, and XLM-R on a variety of discriminative tasks. In addition, we examine its performance on two NLG tasks from GreekSUM, a newly introduced summarization dataset for the Greek language. The model, the code, and the new summarization dataset will be publicly available.

翻訳日:2023-04-04 16:00:50 公開日:2023-04-03

# ランダウ準位およびより高いチャーン数を持つその類似物の一意性

Uniqueness of Landau levels and their analogs with higher Chern numbers ( http://arxiv.org/abs/2304.00866v1 )

ライセンス: Link先を確認

Bruno Mera, Tomoki Ozawa

(参考訳) 最低ランダウ準位の波動関数は、一様磁場下の2次元荷電粒子のハミルトニアンの固有状態である。それらは実空間と運動量空間の両方で正則であることが知られており、運動量空間において一様で並進不変な幾何学的性質も示す。本稿では、ストーン=フォン・ノイマンの定理を用いて、最低ランダウ準位の波動関数が、これらの条件を満たす単位チャーン数を持つ唯一の可能な状態であることを示す。また、より高いチャーン数を持つ直接的な類似物の一意性を証明し、その表式を与える。

Lowest Landau level wavefunctions are eigenstates of the Hamiltonian of a charged particle in two dimensions under a uniform magnetic field. They are known to be holomorphic both in real and momentum spaces, and also exhibit uniform, translationally invariant, geometrical properties in momentum space. In this paper, using the Stone-von Neumann theorem, we show that lowest Landau level wavefunctions are indeed the only possible states with unit Chern number satisfying these conditions. We also prove the uniqueness of their direct analogs with higher Chern numbers and provide their expressions.

翻訳日:2023-04-04 16:00:32 公開日:2023-04-03

# コヒーレント制御と非コヒーレント制御による2量子ビットシステムの最適状態操作

Optimal State Manipulation for a Two-Qubit System Driven by Coherent and Incoherent Controls ( http://arxiv.org/abs/2304.00863v1 )

ライセンス: Link先を確認

Oleg Morzhin, Alexander Pechen

(参考訳) 2量子ビット量子系の最適制御は、2量子ビットゲートの生成から、スピン鎖に沿ってコヒーレンス行列を転送するための受信機の最適化に至るまでの応用により、高い関心を集めている。状態の準備と操作は、このような系について研究すべき重要な課題の1つである。通常、整形されたレーザーパルスなどのコヒーレント制御が2量子ビット系の操作に用いられる。しかし、環境も非コヒーレントな制御資源として利用できる。本稿では、ゴリーニ-コサコフスキー-スダルシャン-リンドブラッドのマスター方程式によってダイナミクスが支配される2量子ビット系の最適状態操作を考察する。ここで、コヒーレント制御はハミルトニアンに入り、非コヒーレント制御は(ラムシフトを介して)ハミルトニアンと散逸の超演算子の両方に入る。コヒーレント制御との相互作用について物理的に異なる2つのクラスを利用し、最終密度行列と目標密度行列の間のヒルベルト・シュミット重なりを最適化する。これには、重なりを与えられた値へ誘導する最適化も含まれる。コヒーレント制御と非コヒーレント制御がともにゼロである場合にポントリャーギンの最大原理を満たす条件、さらにそれらが目的汎関数の停留点をなす条件を見出す。加えて、この停留点が重なりの大域的最小値を与える場合を見出す。重なりの上界と下界を用いて、関数としての制御を扱う1段階および2段階の勾配射影法を開発する。

Optimal control of two-qubit quantum systems attracts high interest due to applications ranging from two-qubit gate generation to optimization of receiver for transferring coherence matrices along spin chains. State preparation and manipulation is among important tasks to study for such systems. Typically coherent control, e.g. a shaped laser pulse, is used to manipulate two-qubit systems. However, the environment can also be used as an incoherent control resource. In this article, we consider optimal state manipulation for a two-qubit system whose dynamics is governed by the Gorini-Kossakowski-Sudarshan-Lindblad master equation, where coherent control enters into the Hamiltonian and incoherent control into both the Hamiltonian (via Lamb shift) and the superoperator of dissipation. We exploit two physically different classes of interaction with coherent control and optimize the Hilbert-Schmidt overlap between final and target density matrices, including optimization of its steering to a given value. We find the conditions when zero coherent and incoherent controls satisfy the Pontryagin maximum principle, and in addition, when they form a stationary point of the objective functional. Moreover, we find a case when this stationary point provides the globally minimal value of the overlap. Using upper and lower bounds for the overlap, we develop one- and two-step gradient projection methods operating with functional controls.

翻訳日:2023-04-04 16:00:21 公開日:2023-04-03
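
The objective named in the abstract above, the Hilbert-Schmidt overlap Tr(ρσ) between two density matrices, can be evaluated in a few lines. The sketch below only computes the overlap for fixed two-qubit states; it does not model the master-equation dynamics or the controls, and the helper names `hs_overlap` and `projector` are introduced here for illustration.

```python
def hs_overlap(rho, sigma):
    """Hilbert-Schmidt overlap Tr(rho sigma) of two square matrices."""
    n = len(rho)
    total = sum(rho[i][k] * sigma[k][i] for i in range(n) for k in range(n))
    return complex(total).real  # real for Hermitian rho and sigma


def projector(state):
    """Density matrix |psi><psi| of a pure state given as a list of amplitudes."""
    return [[a * complex(b).conjugate() for b in state] for a in state]


ket00 = projector([1.0, 0.0, 0.0, 0.0])                # |00><00|
bell = projector([2 ** -0.5, 0.0, 0.0, 2 ** -0.5])     # (|00> + |11>) / sqrt(2)
mixed = [[0.25 if i == k else 0.0 for k in range(4)] for i in range(4)]

overlap_bell_00 = hs_overlap(bell, ket00)    # 1/2
overlap_mixed_00 = hs_overlap(mixed, ket00)  # 1/4
```

For any two density matrices the overlap lies between 0 and 1, reaching 1 only when both are the same pure state, so it acts as a similarity score between the final and target states.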

# 音声感情認識システムの設計と評価:IEMOCAPを用いた実環境チェックケーススタディ

Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP ( http://arxiv.org/abs/2304.00860v1 )

ライセンス: Link先を確認

Nikolaos Antoniou and Athanasios Katsamanis and Theodoros Giannakopoulos and Shrikanth Narayanan

(参考訳) 音声感情認識(SER)の直接的かつ公平な比較を可能にするガイドラインと標準テストセットが喫緊に必要とされている。Interactive Emotional Dyadic Motion Capture (IEMOCAP) データベースのようなリソースは、研究者がSERのモデルを開発・テストするための参照コーパスとして広く採用されてきたが、公表された研究を見ると、その利用における前提は多岐にわたり、使い方もさまざまであり、再現性と一般化を困難にしている。IEMOCAPをユースケースとしたSERの最新の進歩に対する批判的レビューに基づき、本研究は2つの貢献を目指す。第1に、最近の文献の分析(そこで置かれた前提や用いられた指標を含む)に基づいて、SER評価のためのガイドラインを提示する。第2に、オープンソース実装を伴う最近の出版物を用いて、SERにおける再現性の評価に焦点を当てる。

There is an imminent need for guidelines and standard test sets to allow direct and fair comparisons of speech emotion recognition (SER). While resources, such as the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, have emerged as widely-adopted reference corpora for researchers to develop and test models for SER, published work reveals a wide range of assumptions and variety in its use that challenge reproducibility and generalization. Based on a critical review of the latest advances in SER using IEMOCAP as the use case, our work aims at two contributions: First, using an analysis of the recent literature, including assumptions made and metrics used therein, we provide a set of SER evaluation guidelines. Second, using recent publications with open-sourced implementations, we focus on reproducibility assessment in SER.

翻訳日:2023-04-04 15:59:56 公開日:2023-04-03

# 自己教師型骨格に基づく行動認識のためのFocalized Contrastive View-invariant Learning

Focalized Contrastive View-invariant Learning for Self-supervised Skeleton-based Action Recognition ( http://arxiv.org/abs/2304.00858v1 )

ライセンス: Link先を確認

Qianhui Men, Edmond S. L. Ho, Hubert P. H. Shum, Howard Leung

(参考訳) ビュー不変表現の学習は、骨格に基づく行動認識における特徴識別能力を向上させる鍵である。既存のアプローチは、暗黙的なビュー依存表現のために、視点の影響を効果的に除去できない。本研究では、視点が粗く整列された表現空間においてビュー固有の情報を大幅に抑制する、Focalized Contrastive View-invariant Learning(FoCoViL)と呼ばれる自己教師ありフレームワークを提案する。多視点サンプルペア間の効果的なコントラスト損失によって相互情報量を最大化することで、FoCoViLは行動を共通のビュー不変な性質に関連付けると同時に、類似しないものを分離する。さらに、ペアワイズ類似度に基づく適応的な焦点化法を提案し、学習空間におけるクラスタ境界がより明確になるようコントラスト学習を強化する。教師あり分類器に大きく依存する多くの既存の自己教師あり表現学習研究とは異なり、FoCoViLは教師なし分類器と教師あり分類器の両方で優れた認識性能を示す。広範な実験により、提案するコントラストに基づく焦点化がより識別的な潜在表現を生成することも示されている。

Learning view-invariant representation is a key to improving feature discrimination power for skeleton-based action recognition. Existing approaches cannot effectively remove the impact of viewpoint due to the implicit view-dependent representations. In this work, we propose a self-supervised framework called Focalized Contrastive View-invariant Learning (FoCoViL), which significantly suppresses the view-specific information on the representation space where the viewpoints are coarsely aligned. By maximizing mutual information with an effective contrastive loss between multi-view sample pairs, FoCoViL associates actions with common view-invariant properties and simultaneously separates the dissimilar ones. We further propose an adaptive focalization method based on pairwise similarity to enhance contrastive learning for a clearer cluster boundary in the learned space. Different from many existing self-supervised representation learning work that rely heavily on supervised classifiers, FoCoViL performs well on both unsupervised and supervised classifiers with superior recognition performance. Extensive experiments also show that the proposed contrastive-based focalization generates a more discriminative latent representation.

翻訳日:2023-04-04 15:59:32 公開日:2023-04-03
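
The abstract above refers to a contrastive loss between multi-view sample pairs. As a point of reference, this is a minimal sketch of the generic InfoNCE contrastive loss, in which embeddings of the same action from two viewpoints form the positive pair. It is not FoCoViL's loss: the focalization weighting is not reproduced, and the toy two-dimensional embeddings and the temperature are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax score of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[0]


view_a = [1.0, 0.0]                      # action seen from camera A
view_b = [0.9, 0.1]                      # same action seen from camera B
others = [[0.0, 1.0], [-1.0, 0.0]]       # different actions

aligned_loss = info_nce(view_a, view_b, others)
misaligned_loss = info_nce(view_a, others[0], [view_b, others[1]])
```

The loss is small when the two views of the same action are closer to each other than to other actions, which is the behaviour a view-invariant representation should produce.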

# ハイパースペクトル画像ノイズ除去のためのスペクトル強調矩形Transformer

Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising ( http://arxiv.org/abs/2304.00844v1 )

ライセンス: Link先を確認

Miaoyu Li, Ji Liu, Ying Fu, Yulun Zhang and Dejing Dou

(参考訳) ノイズ除去は、ハイパースペクトル画像(HSI)アプリケーションにとって重要なステップである。深層学習は大きな力を示しているものの、既存のHSIノイズ除去手法は、非局所的な自己相似性を捉える上での限界に悩まされている。Transformerは長距離依存を捉える可能性を示しているが、HSIの空間的およびスペクトル的相関をモデル化するために特別に設計されたTransformerを用いた試みはほとんど行われていない。本稿では、スペクトル強調矩形Transformerを提案することでこれらの問題に対処し、HSIの非局所的な空間的類似性と大域的なスペクトル低ランク性を探索させる。前者については、空間領域における非局所的類似性を捉えるために、矩形自己注意を水平および垂直方向に利用する。後者については、空間・スペクトル立方体の大域的な低ランク性を抽出してノイズを抑制し、かつ重なり合わない空間矩形間の相互作用を可能にするスペクトル強調モジュールを設計する。合成ノイズHSIと実ノイズHSIの両方で広範な実験を行い、客観的な指標と主観的な視覚品質の両面から提案手法の有効性を示した。コードはhttps://github.com/MyuLi/SERTで入手できる。

Denoising is a crucial step for hyperspectral image (HSI) applications. Though witnessing the great power of deep learning, existing HSI denoising methods suffer from limitations in capturing the non-local self-similarity. Transformers have shown potential in capturing long-range dependencies, but few attempts have been made with specifically designed Transformer to model the spatial and spectral correlation in HSIs. In this paper, we address these issues by proposing a spectral enhanced rectangle Transformer, driving it to explore the non-local spatial similarity and global spectral low-rank property of HSIs. For the former, we exploit the rectangle self-attention horizontally and vertically to capture the non-local similarity in the spatial domain. For the latter, we design a spectral enhancement module that is capable of extracting global underlying low-rank property of spatial-spectral cubes to suppress noise, while enabling the interactions among non-overlapping spatial rectangles. Extensive experiments have been conducted on both synthetic noisy HSIs and real noisy HSIs, showing the effectiveness of our proposed method in terms of both objective metric and subjective visual quality. The code is available at https://github.com/MyuLi/SERT.

翻訳日:2023-04-04 15:59:02 公開日:2023-04-03

# MetaHead: リアルなデジタルヘッドを作るためのエンジン

MetaHead: An Engine to Create Realistic Digital Head ( http://arxiv.org/abs/2304.00838v1 )

ライセンス: Link先を確認

Dingyun Zhang, Chenglai Zhong, Yudong Guo, Yang Hong, Juyong Zhang

(参考訳) トレーニングデータの収集とラベル付けは学習ベースの手法にとって重要なステップであるが、その過程は時間がかかり、バイアスも伴う。顔分析タスクでは、いくつかの生成モデルを顔データの生成に利用できるが、生成の多様性、再構成精度、3D一貫性、高忠実度の視覚品質、編集の容易さのうち一部しか達成できない。最近の関連研究としてグラフィックスベースの生成手法があるが、高い計算コストでリアリズムの低い頭部しかレンダリングできない。本稿では、統一的でフル機能の制御可能なデジタルヘッドエンジンであるMetaHeadを提案する。MetaHeadは、ビュー一貫性のある制御可能な3Dデジタルヘッドを超リアルに生成または再構成する制御可能な頭部放射場(MetaHead-F)と、与えられたカスタマイズ可能な特徴ラベルに整合するデジタルヘッドを生成する汎用的なトップダウン画像生成フレームワークLabelHeadからなる。実験により、本エンジンが最先端の生成視覚品質と再構成精度を達成することを確認した。さらに、生成されたラベル付きデータは実際のトレーニングデータを補助でき、トレーニング効果の点でグラフィックスベースの手法が生成したラベル付きデータを大きく上回る。

Collecting and labeling training data is one important step for learning-based methods because the process is time-consuming and biased. For face analysis tasks, although some generative models can be used to generate face data, they can only achieve a subset of generation diversity, reconstruction accuracy, 3D consistency, high-fidelity visual quality, and easy editability. One recent related work is the graphics-based generative method, but it can only render low realism head with high computation cost. In this paper, we propose MetaHead, a unified and full-featured controllable digital head engine, which consists of a controllable head radiance field(MetaHead-F) to super-realistically generate or reconstruct view-consistent 3D controllable digital heads and a generic top-down image generation framework LabelHead to generate digital heads consistent with the given customizable feature labels. Experiments validate that our controllable digital head engine achieves the state-of-the-art generation visual quality and reconstruction accuracy. Moreover, the generated labeled data can assist real training data and significantly surpass the labeled data generated by graphics-based methods in terms of training effect.

翻訳日:2023-04-04 15:58:31 公開日:2023-04-03

# 無秩序不変な暗黙的ニューラル表現

Disorder-invariant Implicit Neural Representation ( http://arxiv.org/abs/2304.00837v1 )

ライセンス: Link先を確認

Hao Zhu, Shaowen Xie, Zhen Liu, Fengyi Liu, Qi Zhang, You Zhou, Yi Lin, Zhan Ma, Xun Cao

(参考訳) 暗黙的ニューラル表現(INR)は、信号の属性を対応する座標の関数として特徴づけるもので、逆問題を解くための強力な手段として登場している。しかし、INRの表現力は、ネットワーク訓練におけるスペクトルバイアスによって制限される。本稿では、入力信号の座標を再配置することでこのような周波数に関連する問題を大幅に解決できることを見出し、従来のINRバックボーンにハッシュテーブルを付加した無秩序不変な暗黙的ニューラル表現(DINER)を提案する。属性のヒストグラムが同じで配置順序が異なる離散信号が与えられたとき、ハッシュテーブルは座標を同一の分布に射影でき、写像後の信号は後続のINRネットワークでよりよくモデル化できるため、スペクトルバイアスが大幅に軽減される。さらに、DINERの表現力はハッシュテーブルの幅によって決まる。異なる幅は属性空間における異なる幾何学的要素に対応し、例えば幅を$1$、$2$、$3$に設定すると、それぞれ1次元曲線、2次元曲面、3次元の曲がった体積に対応する。幾何学的要素が覆う領域が広いほど、表現力は強くなる。実験では、異なるINRバックボーン(MLPとSIREN)および様々なタスク(画像・ビデオ表現、位相回復、屈折率回復、ニューラル放射場の最適化)に対するDINERの汎用性が明らかになるだけでなく、品質と速度の両面で最先端アルゴリズムに対する優位性も示される。プロジェクトページ: https://ezio77.github.io/DINER-website/

Implicit neural representation (INR) characterizes the attributes of a signal as a function of corresponding coordinates which emerges as a sharp weapon for solving inverse problems. However, the expressive power of INR is limited by the spectral bias in the network training. In this paper, we find that such a frequency-related problem could be greatly solved by re-arranging the coordinates of the input signal, for which we propose the disorder-invariant implicit neural representation (DINER) by augmenting a hash-table to a traditional INR backbone. Given discrete signals sharing the same histogram of attributes and different arrangement orders, the hash-table could project the coordinates into the same distribution for which the mapped signal can be better modeled using the subsequent INR network, leading to significantly alleviated spectral bias. Furthermore, the expressive power of the DINER is determined by the width of the hash-table. Different width corresponds to different geometrical elements in the attribute space, e.g., 1D curve, 2D curved-plane and 3D curved-volume when the width is set as $1$, $2$ and $3$, respectively. More covered areas of the geometrical elements result in stronger expressive power. Experiments not only reveal the generalization of the DINER for different INR backbones (MLP vs. SIREN) and various tasks (image/video representation, phase retrieval, refractive index recovery, and neural radiance field optimization) but also show the superiority over the state-of-the-art algorithms both in quality and speed. Project page: https://ezio77.github.io/DINER-website/

翻訳日:2023-04-04 15:58:11 公開日:2023-04-03
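
The claim in the abstract above, that signals with the same histogram but different arrangement can be mapped to one common signal by re-arranging coordinates, can be illustrated with a toy lookup table. In this sketch a fixed sort stands in for whatever mapping the method's hash-table actually provides; `build_index_map` and `remap` are hypothetical helpers and no network is involved.

```python
def build_index_map(signal):
    """Map each original coordinate to its rank in the sorted signal."""
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    mapped = [0] * len(signal)
    for rank, index in enumerate(order):
        mapped[index] = rank
    return mapped


def remap(signal, mapped):
    """Place every sample at its mapped coordinate."""
    out = [None] * len(signal)
    for index, rank in enumerate(mapped):
        out[rank] = signal[index]
    return out


# Two signals with the same values (same histogram) in different orders.
signal_a = [3, 1, 2, 0]
signal_b = [0, 2, 3, 1]

remapped_a = remap(signal_a, build_index_map(signal_a))
remapped_b = remap(signal_b, build_index_map(signal_b))
```

Both remapped signals are the same monotone ramp, which contains far less high-frequency content than either original ordering. That is the intuition for why a coordinate re-arrangement can ease spectral bias in the network that follows.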

# AUDIT: 潜在拡散モデルを用いた指示に従う音声編集

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models ( http://arxiv.org/abs/2304.00830v1 )

ライセンス: Link先を確認

Yuancheng Wang, Zeqian Ju, Xu Tan, Lei He, Zhizheng Wu, Jiang Bian, Sheng Zhao

(参考訳) オーディオ編集は、背景の音響効果の追加、楽器の置き換え、損傷したオーディオの修復など、様々な目的に適用できる。近年、出力音声のテキスト記述を条件とした拡散・デノイジング過程を用いて、ゼロショット音声編集を実現する拡散ベースの手法がいくつか提案されている。しかし、これらの手法にはまだいくつか問題がある。 1) 編集タスクで訓練されておらず、良好な編集効果を保証できない。 2) 編集を必要としないオーディオセグメントを誤って変更することがある。 3) 出力音声の完全な記述が必要であるが、実用シナリオでは必ずしも利用可能でも必要でもない。本研究では、潜在拡散モデルに基づく命令誘導型の音声編集モデルであるAUDITを提案する。具体的には、AUDITには3つの主要な設計上の特徴がある。 1) 様々なオーディオ編集タスクのためのトリプレット訓練データ(命令、入力オーディオ、出力オーディオ)を構築し、命令と入力(編集対象)オーディオを条件として出力(編集済み)オーディオを生成する拡散モデルを訓練する。 2) 入力オーディオと出力オーディオの差を比較することで、編集が必要なセグメントのみを変更することを自動的に学習できる。 3) テキスト入力として、目標オーディオの完全な記述ではなく編集命令のみを必要とする。AUDITは、いくつかのオーディオ編集タスク(追加、削除、置換、インペインティング、超解像など)の客観的および主観的な指標で最先端の結果を達成する。デモサンプルはhttps://audit-demo.github.io/で入手できる。

Audio editing is applicable for various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods achieved zero-shot audio editing by using a diffusion and denoising process conditioned on the text description of the output audio. However, these methods still have some problems: 1) they have not been trained on editing tasks and cannot ensure good editing effects; 2) they can erroneously modify audio segments that do not require editing; 3) they need a complete description of the output audio, which is not always available or necessary in practical scenarios. In this work, we propose AUDIT, an instruction-guided audio editing model based on latent diffusion models. Specifically, AUDIT has three main design features: 1) we construct triplet training data (instruction, input audio, output audio) for different audio editing tasks and train a diffusion model using instruction and input (to be edited) audio as conditions and generating output (edited) audio; 2) it can automatically learn to only modify segments that need to be edited by comparing the difference between the input and output audio; 3) it only needs edit instructions instead of full target audio descriptions as text input. AUDIT achieves state-of-the-art results in both objective and subjective metrics for several audio editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution). Demo samples are available at https://audit-demo.github.io/.

翻訳日:2023-04-04 15:57:41 公開日:2023-04-03

# 多粒度情報融合によるソーシャルメディア上のマルチモーダルフェイクニュース検出

Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion ( http://arxiv.org/abs/2304.00827v1 )

ライセンス: Link先を確認

Yangming Zhou, Yuzhou Yang, Qichao Ying, Zhenxing Qian and Xinpeng Zhang

(参考訳) ソーシャルメディア上でマルチメディアコンテンツを容易に共有できるようになったことで、フェイクニュースが急速に拡散し、社会の安定と安全を脅かしている。そのため、フェイクニュース検出はソーシャルフォレンジクスの分野で幅広い研究の関心を集めている。現在の手法は主にテキスト特徴と視覚特徴の統合に集中しているが、細粒度と粗粒度の両方のレベルでマルチモーダル情報を効果的に活用できていない。さらに、モダリティ間の相関の欠如や、各モダリティによる判断の矛盾に起因する曖昧さの問題を抱えている。これらの課題を克服するため、フェイクニュース検出のためのMulti-grained Multi-modal Fusion Network(MMFN)を提案する。人間がニュースの真正性を評価する際の多粒度のプロセスに着想を得て、2つのTransformerベースの事前学習モデルを用い、テキストと画像それぞれからトークンレベルの特徴を符号化する。マルチモーダルモジュールは、CLIPエンコーダで符号化された粗粒度の特徴を考慮しつつ、細粒度の特徴を融合する。曖昧さの問題に対処するため、類似度に基づく重み付けを備えた単一モーダル分岐を設計し、マルチモーダル特徴の利用を適応的に調整する。実験の結果、提案フレームワークは広く使われている3つのデータセットにおいて最先端の手法を上回った。

The easy sharing of multimedia content on social media has caused a rapid dissemination of fake news, which threatens society's stability and security. Therefore, fake news detection has garnered extensive research interest in the field of social forensics. Current methods primarily concentrate on the integration of textual and visual features but fail to effectively exploit multi-modal information at both fine-grained and coarse-grained levels. Furthermore, they suffer from an ambiguity problem due to a lack of correlation between modalities or a contradiction between the decisions made by each modality. To overcome these challenges, we present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection. Inspired by the multi-grained process of human assessment of news authenticity, we respectively employ two Transformer-based pre-trained models to encode token-level features from text and images. The multi-modal module fuses fine-grained features, taking into account coarse-grained features encoded by the CLIP encoder. To address the ambiguity problem, we design uni-modal branches with similarity-based weighting to adaptively adjust the use of multi-modal features. Experimental results demonstrate that the proposed framework outperforms state-of-the-art methods on three prevalent datasets.

翻訳日:2023-04-04 15:57:17 公開日:2023-04-03

# LAHM: マルチドメイン・多言語ヘイトスピーチ識別のための大規模注釈付きデータセット

LAHM : Large Annotated Dataset for Multi-Domain and Multilingual Hate Speech Identification ( http://arxiv.org/abs/2304.00913v1 )

ライセンス: Link先を確認

Ankit Yadav, Shubham Chandel, Sushant Chatufale and Anil Bandhakavi

(参考訳) ヘイトスピーチ分析に関する現在の研究は、典型的には単言語かつ単一の分類タスクに向けられている。本稿では、英語、ヒンディー語、アラビア語、フランス語、ドイツ語、スペイン語を対象とし、ヘイトスピーチの複数のドメイン(虐待、人種差別、性差別、宗教的ヘイト、過激主義)にまたがる新しい多言語ヘイトスピーチ分析データセットを提示する。我々の知る限り、本論文は、これら6言語、これら5つの広いドメインにおいて様々な種類のヘイトスピーチを識別する問題に取り組んだ最初の研究である。本稿では、データセットの作成方法、異なるドメインに対する高レベルおよび低レベルのアノテーションの作成方法、そして現在の最先端の多言語・マルチタスク学習アプローチのテストへの利用方法を説明する。様々な単言語、クロスリンガル、機械翻訳の分類設定でデータセットを評価し、このタスクのために集約・統合したオープンソースの英語データセットと比較する。最後に、このアプローチを大規模ヘイトスピーチデータセットの作成にどう利用できるか、またヘイトスピーチの検出と分類全般を改善するために我々のアノテーションをどう活用できるかを論じる。

Current research on hate speech analysis is typically oriented towards monolingual and single classification tasks. In this paper, we present a new multilingual hate speech analysis dataset for English, Hindi, Arabic, French, German and Spanish, covering multiple domains of hate speech: Abuse, Racism, Sexism, Religious Hate and Extremism. To the best of our knowledge, this paper is the first to address the problem of identifying various types of hate speech in these five broad domains in these six languages. In this work, we describe how we created the dataset, how we created high-level and low-level annotations for the different domains, and how we use the dataset to test current state-of-the-art multilingual and multitask learning approaches. We evaluate our dataset in various monolingual, cross-lingual and machine translation classification settings and compare it against open source English datasets that we aggregated and merged for this task. Then we discuss how this approach can be used to create large-scale hate speech datasets and how to leverage our annotations in order to improve hate speech detection and classification in general.

翻訳日:2023-04-04 15:50:36 公開日:2023-04-03

# Laplace-fPINNs: Laplace-based fractional physics-informed neural networks for solving forward and inverse problems of subdiffusion

Laplace-fPINNs: Laplace-based fractional physics-informed neural networks for solving forward and inverse problems of subdiffusion ( http://arxiv.org/abs/2304.00909v1 )

ライセンス: Link先を確認

Xiong-Bin Yan and Zhi-Qin John Xu and Zheng Ma

(参考訳) 物理インフォームドニューラルネットワーク(PINN)の使用は、分数拡散方程式の前方および逆問題の解法において有望であることを示している。しかし、分数微分には自動微分が適用できないため、PINNを用いた分数拡散方程式の解法はさらなる課題に対処する必要がある。この問題に対処するため,本論文ではラプラス型分数物理学インフォームドニューラルネットワーク (laplace-fpinns) と呼ばれるピンの拡張を提案する。このアプローチは補助点の質量の導入を回避し、損失関数を単純化する。いくつかの例を用いてLaplace-fPINNsアプローチの有効性を検証する。その結果,ラプラス-fpinns法は高次元分数拡散方程式の前方および逆問題の両方を効果的に解くことができることがわかった。

The use of physics-informed neural networks (PINNs) has shown promise in solving forward and inverse problems of fractional diffusion equations. However, because automatic differentiation is not applicable to fractional derivatives, solving fractional diffusion equations with PINNs poses additional challenges. To address this issue, this paper proposes an extension to PINNs called Laplace-based fractional physics-informed neural networks (Laplace-fPINNs), which can effectively solve the forward and inverse problems of fractional diffusion equations. This approach avoids introducing a large number of auxiliary points and simplifies the loss function. We validate the effectiveness of the Laplace-fPINNs approach using several examples. Our numerical results demonstrate that the Laplace-fPINNs method can effectively solve both the forward and inverse problems of high-dimensional fractional diffusion equations.
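
As a hedged illustration of why moving to the Laplace domain can sidestep the automatic-differentiation problem, consider a Caputo time derivative of order $0<\alpha<1$ and a model subdiffusion equation $\partial_t^{\alpha} u = \Delta u + f$. Both the derivative type and the model equation are assumptions made for this sketch; the abstract does not state them. The Laplace transform turns the fractional derivative into an algebraic factor:

```latex
\mathcal{L}\{\partial_t^{\alpha} u\}(s) = s^{\alpha}\,\hat{u}(s) - s^{\alpha-1}\,u(0),
\qquad 0 < \alpha < 1,
\\[4pt]
\partial_t^{\alpha} u = \Delta u + f
\;\;\Longrightarrow\;\;
s^{\alpha}\,\hat{u}(x,s) - s^{\alpha-1}\,u(x,0) = \Delta \hat{u}(x,s) + \hat{f}(x,s).
```

Under these assumptions the transformed equation contains only integer-order spatial derivatives of $\hat{u}$, which automatic differentiation can handle directly.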

翻訳日:2023-04-04 15:50:17 公開日:2023-04-03

# ScandEval: スカンジナビアの自然言語処理ベンチマーク

ScandEval: A Benchmark for Scandinavian Natural Language Processing ( http://arxiv.org/abs/2304.00906v1 )

ライセンス: Link先を確認

Dan Saattrup Nielsen

(参考訳) 本稿では,スカンジナビア言語の4つの異なるタスクに対して事前学習されたモデルをベンチマークできる,スカンジナビアのベンチマークプラットフォームであるScandEvalを紹介する。言語受容性と質問応答という2つのタスクで使用されるデータセットは新しいものである。我々は,Hugging Face Hubにアップロードされたモデルを,再現可能な結果でベンチマークすることができるPythonパッケージとコマンドラインインターフェースであるscandevalを開発し,リリースする。このパッケージを使って100以上のスカンジナビア語または多言語モデルのベンチマークを行い、それらの結果をインタラクティブなオンラインリーダーボードに提示し、結果の分析を提供する。この分析は、大陸スカンジナビア語群(デンマーク語、スウェーデン語、ノルウェー語)の間でかなりの言語間転移が存在する一方、大陸スカンジナビア語群と島嶼スカンジナビア語群(アイスランド語、フェロー語)の間の言語間転移は限定的であることを示している。ベンチマークの結果は、ノルウェー、スウェーデン、デンマークにおける言語技術への投資が、XLM-RoBERTaやmDeBERTaV3のような大規模多言語モデルよりも優れた言語モデルを生み出したことを示している。パッケージとリーダーボードの両方のソースコードをリリースします。

This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained model on four different tasks in the Scandinavian languages. The datasets used in two of the tasks, linguistic acceptability and question answering, are new. We develop and release a Python package and command-line interface, scandeval, which can benchmark any model that has been uploaded to the Hugging Face Hub, with reproducible results. Using this package, we benchmark more than 100 Scandinavian or multilingual models and present the results of these in an interactive online leaderboard, as well as provide an analysis of the results. The analysis shows that there is substantial cross-lingual transfer among the Mainland Scandinavian languages (Danish, Swedish and Norwegian), with limited cross-lingual transfer between the group of Mainland Scandinavian languages and the group of Insular Scandinavian languages (Icelandic and Faroese). The benchmarking results also show that the investment in language technology in Norway, Sweden and Denmark has led to language models that outperform massively multilingual models such as XLM-RoBERTa and mDeBERTaV3. We release the source code for both the package and leaderboard.

翻訳日:2023-04-04 15:50:00 公開日:2023-04-03

# 学校における適応型学習プラットフォームの導入:教師のエンゲージメントに影響を与える要因を明らかにする

Adoption of Adaptive Learning Platforms in Schools: Unveiling Factors Influencing Teachers Engagement ( http://arxiv.org/abs/2304.00903v1 )

ライセンス: Link先を確認

Mutlu Cukurova, Xin Miao, Richard Brooker

(参考訳) AIベースの適応学習プラットフォームの影響に関する証拠は存在するが、彼らの学校における大規模採用は、せいぜい遅い。さらに、学校で採用されるAIツールは、常に研究コミュニティで検討され研究されている製品であるとは限らない。そのため、採用に影響を与える要因の特定や、これらの要因が適応型学習プラットフォームへの教師の関与を予測できる程度に研究が進められている。そこで我々は,教師が学校における適応型学習プラットフォームを採用する上で,より包括的要因を測定するための信頼性の高い尺度を開発した。さらに,学校教師(n=792)を大国人からサンプリングし,このデータを用いて,学校における適応学習プラットフォームとの現実的な関わりを予測した。以上の結果から,教師の知識,信頼度,製品品質がすべて重要な要因であるにもかかわらず,教師が学校におけるaiプラットフォームと関わる上で最も重要な要因であるとは限らない。追加の作業負荷、教師の所有と信頼の増大、支援のメカニズムの生成、倫理的問題が最小化されていることを保証することは、学校でAIを採用する上でも不可欠であり、プラットフォームへの教師の関与をより良く予測する可能性がある。本論文は, 予測モデルの変動率を増大させ, 実装変動を実際に減少させることにより, 適応学習プラットフォームの現実的普及と有効性を高める要因の価値について考察した。

Despite existing evidence about the impact of AI-based adaptive learning platforms, their scaled adoption in schools is slow at best. In addition, AI tools adopted in schools may not always be the considered and studied research products of the research community. Therefore, there have been increasing concerns about identifying factors influencing adoption, and studying the extent to which these factors can be used to predict teachers' engagement with adaptive learning platforms. To address this, we developed a reliable instrument to measure more holistic factors influencing teachers' adoption of adaptive learning platforms in schools. In addition, we present the results of its implementation with school teachers (n=792) sampled from a large country-level population and use this data to predict teachers' real-world engagement with the adaptive learning platform in schools. Our results show that although teachers' knowledge, confidence and product quality are all important factors, they are not necessarily the only factors, and may not even be the most important ones, influencing teachers' engagement with AI platforms in schools. Not generating any additional workload, increasing teacher ownership and trust, generating support mechanisms for help, and assuring that ethical issues are minimised are also essential for the adoption of AI in schools and may predict teachers' engagement with the platform better. We conclude the paper with a discussion on the value of the factors identified for increasing the real-world adoption and effectiveness of adaptive learning platforms by increasing the dimensions of variability in prediction models and decreasing the implementation variability in practice.

翻訳日:2023-04-04 15:49:35 公開日:2023-04-03

# パラメトリックマルチロス最適化による可変畳み込み

Tunable Convolutions with Parametric Multi-Loss Optimization ( http://arxiv.org/abs/2304.00898v1 )

ライセンス: Link先を確認

Matteo Maggioni, Thomas Tanay, Francesca Babiloni, Steven McDonagh, Ale\v{s} Leonardis

(参考訳) ニューラルネットワークの振舞いは、トレーニング中に使用される特定の損失とデータによって不可分に決定される。しかしながら、ユーザの好みやデータの動的特性といった外部要因に基づいて、推論時にモデルをチューニングすることが望ましい場合が多い。これは、不適切な画像から画像への変換タスクの知覚歪曲トレードオフのバランスをとるために特に重要である。本研究では,多数の異なるカーネルを含むパラメトリック可変畳み込み層を,同じ数の目的を含むパラメトリックマルチロスを用いて最適化することを提案する。私たちの重要な洞察は、パラメータの共有セットを使用して、目的とカーネルの両方を動的に補間することです。トレーニング中、これらのパラメータはランダムにサンプリングされ、目的のすべての可能な組み合わせを明示的に最適化し、その結果、対応するカーネルにその効果を乱す。推論の間、これらのパラメータはモデルのインタラクティブな入力となり、モデルの振る舞いを信頼できる一貫した制御を可能にします。広範な実験結果から,既存のニューラルネットワークにおける従来の畳み込みの代替として,画像のデノイジング,デブラリング,スーパーレゾリューション,スタイル転送など,幅広いアプリケーションにおいて最先端の制御戦略を上回って,従来の畳み込みの代替として効果的に動作することが分かった。

The behavior of neural networks is irremediably determined by the specific loss and data used during training. However, it is often desirable to tune the model at inference time based on external factors such as preferences of the user or dynamic characteristics of the data. This is especially important to balance the perception-distortion trade-off of ill-posed image-to-image translation tasks. In this work, we propose to optimize a parametric tunable convolutional layer, which includes a number of different kernels, using a parametric multi-loss, which includes an equal number of objectives. Our key insight is to use a shared set of parameters to dynamically interpolate both the objectives and the kernels. During training, these parameters are sampled at random to explicitly optimize all possible combinations of objectives and consequently disentangle their effect into the corresponding kernels. During inference, these parameters become interactive inputs of the model, hence enabling reliable and consistent control over the model behavior. Extensive experimental results demonstrate that our tunable convolutions effectively work as a drop-in replacement for traditional convolutions in existing neural networks at virtually no extra computational cost, outperforming state-of-the-art control strategies in a wide range of applications, including image denoising, deblurring, super-resolution, and style transfer.
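
The shared-parameter idea in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' layer: the 1-D convolution, the helper names (`interpolate`, `tunable_conv1d`, `multi_loss`, `sample_params`) and the choice to normalise the sampled parameters to sum to one are all hypothetical.

```python
import random


def interpolate(params, items):
    """Blend equally sized flat lists with one weight per list."""
    return [sum(p * item[i] for p, item in zip(params, items))
            for i in range(len(items[0]))]


def tunable_conv1d(signal, kernels, params):
    """1-D 'valid' cross-correlation using a kernel blended by params."""
    kernel = interpolate(params, kernels)
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def multi_loss(losses, params):
    """Blend per-objective loss values with the SAME params as the kernels."""
    return sum(p * loss for p, loss in zip(params, losses))


def sample_params(n, rng):
    """Training-time sampling (assumption: non-negative, normalised to 1)."""
    raw = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Two hypothetical kernels: a pass-through kernel and a summing kernel.
kernels = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
signal = [1.0, 2.0, 3.0, 4.0]
only_first = tunable_conv1d(signal, kernels, [1.0, 0.0])
half_half = tunable_conv1d(signal, kernels, [0.5, 0.5])
```

At inference time the same `params` vector would be supplied by the user instead of being sampled, which is what makes the behaviour tunable in this sketch.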

翻訳日:2023-04-04 15:49:08 公開日:2023-04-03

# ディープラーニングモデルのエネルギー消費量の推定は、精度だけではありません。

Accuracy is not the only Metric that matters: Estimating the Energy Consumption of Deep Learning Models ( http://arxiv.org/abs/2304.00897v1 )

ライセンス: Link先を確認

Johannes Getzner, Bertrand Charpentier, Stephan G\"unnemann

(参考訳) 現代の機械学習モデルは、膨大な量のエネルギーを消費し始めており、大きな炭素フットプリントを生み出している(Strubell et al., 2019)。この問題に対処するため,我々は,実際の運用やトレーニングを行わずに,事前にモデルのエネルギーニーズを見積もることのできる,エネルギー推定パイプラインを開発した。そこで我々は,高品質なエネルギーデータを収集し,推定層エネルギーを蓄積することによりDLモデルのエネルギー消費を予測できる第1ベースラインモデルを構築した。

Modern machine learning models have started to consume incredible amounts of energy, thus incurring large carbon footprints (Strubell et al., 2019). To address this issue, we have created an energy estimation pipeline, which allows practitioners to estimate the energy needs of their models in advance, without actually running or training them. We accomplished this by collecting high-quality energy data and building a first baseline model capable of predicting the energy consumption of DL models by accumulating their estimated layer-wise energies.
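
The "accumulate layer-wise estimates" step can be sketched as follows. Everything here is hypothetical: the layer descriptions, the per-layer predictor functions and especially the coefficients, which are arbitrary placeholders in arbitrary units rather than measured energy values from the paper.

```python
def linear_energy(layer):
    """Placeholder predictor: cost proportional to multiply-accumulates."""
    return 2.0 * layer["in_features"] * layer["out_features"]


def relu_energy(layer):
    """Placeholder predictor: cost proportional to the element count."""
    return 1.0 * layer["num_elements"]


# Hypothetical registry mapping a layer type to its energy predictor.
PREDICTORS = {"linear": linear_energy, "relu": relu_energy}


def estimate_model_energy(layers, predictors=PREDICTORS):
    """Sum the predicted energy of every layer without running the model."""
    total = 0.0
    for layer in layers:
        total += predictors[layer["type"]](layer)
    return total


model = [
    {"type": "linear", "in_features": 10, "out_features": 5},
    {"type": "relu", "num_elements": 5},
]
estimate = estimate_model_energy(model)
```

In a real pipeline each predictor would presumably be fitted to collected measurements; the sketch only shows the accumulation structure.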

翻訳日:2023-04-04 15:48:46 公開日:2023-04-03

# エッジでのディープラーニングアプリケーションにおける階層推論のためのオンラインアルゴリズム

Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge ( http://arxiv.org/abs/2304.00891v1 )

ライセンス: Link先を確認

Vishnu Narayanan Moothedath, Jaya Prakash Champati, James Gross

(参考訳) 本稿では,リソース制約のあるエッジデバイス(ED)に,汎用分類アプリケーション用の小型MLモデル(S-ML)と,大規模MLモデル(L-ML)をホストするエッジサーバ(ES)について検討する。 S-MLの推論精度はL-MLよりも低いため、すべてのデータサンプルをESにオフロードすると高い推測精度が得られるが、EDにS-MLを埋め込むことの目的を損なうとともに、遅延低減、帯域幅の節約、ローカル推論のエネルギー効率を損なう。 S-ML推論が正しい場合にのみ受け入れられる階層推論(hierarchical Inference, HI)の考え方を検討する。そうでなければ、データサンプルはL-ML推論のためにオフロードされる。しかし、HIの理想的な実装は、S-ML推論の正しさがEDに知られていないため、実現不可能である。そこで我々は,S-ML推論の正確性を予測するオンラインメタ学習フレームワークを提案する。その結果、オンライン学習の問題は、エキスパートアドバイザによる予測(Expert Advice:PEA)問題であることがわかった。我々は、edが推論を受け入れると、s-mlの正しさに関するフィードバックを受信する全フィードバックシナリオと、edが分類の根拠となる真理を受信しない非局所フィードバックシナリオを検討し、hil-f と hil-n アルゴリズムを提案し、データサンプル数に準ずる後悔の限界を証明する。我々は,画像分類用アルゴリズムであるImagenette, Imagewoof, MNIST, CIFAR-10の4つのデータセットを用いて,提案アルゴリズムの性能評価と評価を行った。

We consider a resource-constrained Edge Device (ED) embedded with a small-size ML model (S-ML) for a generic classification application, and an Edge Server (ES) that hosts a large-size ML model (L-ML). Since the inference accuracy of S-ML is lower than that of L-ML, offloading all the data samples to the ES results in high inference accuracy, but it defeats the purpose of embedding S-ML on the ED and forfeits the benefits of reduced latency, bandwidth savings, and energy efficiency of doing local inference. To get the best of both worlds, i.e., the benefits of doing inference on the ED and the benefits of doing inference on the ES, we explore the idea of Hierarchical Inference (HI), wherein an S-ML inference is accepted only when it is correct; otherwise, the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible because the correctness of the S-ML inference is not known to the ED. We thus propose an online meta-learning framework to predict the correctness of the S-ML inference. The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with continuous expert space. We consider the full feedback scenario, where the ED receives feedback on the correctness of the S-ML once it accepts the inference, and the no-local feedback scenario, where the ED does not receive the ground truth for the classification. We propose the HIL-F and HIL-N algorithms for these scenarios and prove a regret bound that is sublinear in the number of data samples. We evaluate and benchmark the performance of the proposed algorithms for image classification applications using four datasets, namely, Imagenette, Imagewoof, MNIST, and CIFAR-10.
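
To make the expert-advice framing concrete, here is a deliberately simplified sketch, not the HIL-F or HIL-N algorithm. It assumes that each expert is a confidence threshold on a discretised grid (the paper works with a continuous expert space), that correctness is revealed for every sample, and that costs are 0 for a correct accepted inference, 1 for an incorrect accepted one and a fixed `offload_cost` for offloading. The cost values and function names are hypothetical.

```python
import math


def decide(confidence, threshold):
    """Accept the local S-ML inference or offload the sample."""
    return "accept" if confidence >= threshold else "offload"


def hedge_thresholds(samples, thresholds, offload_cost=0.5, eta=0.5):
    """Exponential-weights update over threshold experts.

    samples: iterable of (confidence, is_correct) pairs.
    Returns the normalised weight of each threshold.
    """
    weights = [1.0] * len(thresholds)
    for confidence, is_correct in samples:
        for k, theta in enumerate(thresholds):
            if decide(confidence, theta) == "accept":
                cost = 0.0 if is_correct else 1.0
            else:
                cost = offload_cost
            weights[k] *= math.exp(-eta * cost)
    total = sum(weights)
    return [w / total for w in weights]


# Toy stream: confident predictions are right, unconfident ones are wrong.
stream = [(0.9, True), (0.3, False)] * 10
grid = [0.0, 0.5, 1.0]
learned = hedge_thresholds(stream, grid)
best_threshold = grid[learned.index(max(learned))]
```

On this toy stream the middle threshold accumulates the lowest cost, because it accepts the confident, correct samples and offloads the unconfident, incorrect ones.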

翻訳日:2023-04-04 15:48:35 公開日:2023-04-03

# Dialog-to-Actions: アクションレベル生成によるタスク指向対話システムの構築

Dialog-to-Actions: Building Task-Oriented Dialogue System via Action-Level Generation ( http://arxiv.org/abs/2304.00884v1 )

ライセンス: Link先を確認

Yuncheng Hua, Xiangyu Xi, Zheng Jiang, Guanwei Zhang, Chaobo Sun, Guanglu Wan, Wei Ye

(参考訳) タスク指向対話システムでは、エンドツーエンド生成に基づくアプローチが研究され、適用されている。しかし、産業シナリオでは、既存の手法は制御可能性(ドメイン一貫性のない応答、繰り返し問題など)と効率(例えば、長い計算時間など)のボトルネックに直面します。本稿では,アクションレベル生成によるタスク指向対話システムを提案する。具体的には,まず,大規模対話から対話行動を構築し,対話行動の列として各自然言語(nl)応答を表現する。さらに、対話履歴を入力として対話アクションのシーケンスを出力するシーケンスツーシーケンスモデルをトレーニングする。生成された対話動作は、音声応答に変換される。実験の結果, 軽量化手法は競争性能が向上し, 制御性と効率性が向上した。

End-to-end generation-based approaches have been investigated and applied in task-oriented dialogue systems. However, in industrial scenarios, existing methods face bottlenecks in controllability (e.g., domain-inconsistent responses and repetition) and efficiency (e.g., long computation time). In this paper, we propose a task-oriented dialogue system based on action-level generation. Specifically, we first construct dialogue actions from large-scale dialogues and represent each natural language (NL) response as a sequence of dialogue actions. Further, we train a sequence-to-sequence model that takes the dialogue history as input and outputs a sequence of dialogue actions. The generated dialogue actions are then transformed into verbal responses. Experimental results show that our lightweight method achieves competitive performance and has advantages in controllability and efficiency.
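
The last step, turning a generated action sequence into a verbal response, might look like the sketch below. The abstract does not say how this transformation is done; the template lookup, the action names and the template strings are all hypothetical and only illustrate why action-level output is easy to control.

```python
# Hypothetical action inventory: each action maps to one fixed template.
TEMPLATES = {
    "greet": "Hello!",
    "ask_slot": "Could you tell me your {slot}?",
    "confirm": "Your {slot} is {value}, correct?",
}


def realize(actions, templates=TEMPLATES):
    """Render a sequence of (action_name, arguments) pairs as one response."""
    parts = []
    for name, args in actions:
        parts.append(templates[name].format(**args))
    return " ".join(parts)


# A sequence such as a sequence-to-sequence model could emit.
predicted = [("greet", {}), ("ask_slot", {"slot": "order number"})]
response = realize(predicted)
```

Because every response is assembled from a closed set of actions, an off-domain sentence cannot be produced in this sketch; an unknown action name raises a `KeyError` instead.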

翻訳日:2023-04-04 15:48:01 公開日:2023-04-03

# smproblog:確率的議論のためのproblogの安定モデルセマンティクス

smProbLog: Stable Model Semantics in ProbLog for Probabilistic Argumentation ( http://arxiv.org/abs/2304.00879v1 )

ライセンス: Link先を確認

Pietro Totis, Angelika Kimmig, Luc De Raedt

(参考訳) 議論問題は、それらの関係構造から一連の引数の受け入れ可能性を決定することに関係している。利用可能な情報が不確実な場合、確率論的議論フレームワークは、それを説明するモデリングツールを提供する。この論文の最初の貢献は、確率的議論フレームワークを確率的論理プログラムとして新しい解釈である。確率論理プログラム(probabilistic logic program)は、いくつかの事実に確率を付記した論理プログラムである。本稿では,確率論的論理プログラミング(PLP)のセマンティクスにおいて,確率論的議論フレームワークを表すプログラムが共通の前提を満たしていないことを示す。この論文の第二の貢献は、確率的事実の選択が論理原子の真理割り当てを一意に決定しないプログラムのための新しいPLP意味論である。本論文の3番目の貢献は,この意味論をサポートするplpシステムの実装であるsmproblogの実装である。 smProbLogは確率論理型プログラミング言語ProbLogをベースにした新しいPLPフレームワークである。 smproblogはplpの典型的な推論や学習タスクをサポートしており、私たちの最初の貢献とともに確率的議論のための新しい推論ツールを提供しています。本手法は,提案アルゴリズムの計算コストを解析し,議論問題のデータセットに適用する実験を用いて評価する。

Argumentation problems are concerned with determining the acceptability of a set of arguments from their relational structure. When the available information is uncertain, probabilistic argumentation frameworks provide modelling tools to account for it. The first contribution of this paper is a novel interpretation of probabilistic argumentation frameworks as probabilistic logic programs. Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. We show that the programs representing probabilistic argumentation frameworks do not satisfy a common assumption in probabilistic logic programming (PLP) semantics, namely, that probabilistic facts fully capture the uncertainty in the domain under investigation. The second contribution of this paper is then a novel PLP semantics for programs where a choice of probabilistic facts does not uniquely determine the truth assignment of the logical atoms. The third contribution of this paper is the implementation of a PLP system supporting this semantics: smProbLog. smProbLog is a novel PLP framework based on the probabilistic logic programming language ProbLog. smProbLog supports many inference and learning tasks typical of PLP, which, together with our first contribution, provide novel reasoning tools for probabilistic argumentation. We evaluate our approach with experiments analyzing the computational cost of the proposed algorithms and their application to a dataset of argumentation problems.

翻訳日:2023-04-04 15:47:47 公開日:2023-04-03

# 動的行動空間強化学習におけるアクションピックアップ

Action Pick-up in Dynamic Action Space Reinforcement Learning ( http://arxiv.org/abs/2304.00873v1 )

ライセンス: Link先を確認

Jiaqi Ye, Xiaodong Li, Pangjing Wu, Feng Wang

(参考訳) ほとんどの強化学習アルゴリズムはマルコフ決定過程(MDP)が定常であるという重要な仮定に基づいている。しかし、動的アクション空間を持つ非定常MDPは、実世界のシナリオにおいて一様である。しかし, 動的行動空間強化学習の課題は, これまでにも数多く研究されてきたが, 学習効率を向上させるために, 新たな, 目に見えない行動から, どのように価値ある行動を選択するかは未定のままである。この問題に対処するために,我々は,新たなアクション群からパフォーマンスを最も高める可能性のある有用なアクションを自律的に選択するインテリジェントアクションピックアップ(ap)アルゴリズムを提案する。本稿では,まず,事前の最適政策が有用な知識と経験を提供することで,行動ピックアップにおいて重要な役割を果たすことを理論的に分析し,発見する。次に,事前の最適ポリシーに基づいて,周波数ベースグローバル法と状態クラスタリングベースローカル法という2つの異なるap法を設計する。最後に,動作空間が時間とともに変化する2つのシミュレーション環境におけるAPの評価を行った。実験の結果,提案したAPは学習効率のベースラインよりも優れていることがわかった。

Most reinforcement learning algorithms are based on the key assumption that Markov decision processes (MDPs) are stationary. However, non-stationary MDPs with dynamic action spaces are omnipresent in real-world scenarios. Although dynamic action space reinforcement learning has been studied in many previous works, how to choose valuable actions from new and unseen actions to improve learning efficiency remains unaddressed. To tackle this problem, we propose an intelligent Action Pick-up (AP) algorithm to autonomously choose the valuable actions that are most likely to boost performance from a set of new actions. In this paper, we first theoretically analyze the problem and find that a prior optimal policy plays an important role in action pick-up by providing useful knowledge and experience. Then, based on the prior optimal policy, we design two different AP methods: a frequency-based global method and a state clustering-based local method. Finally, we evaluate AP on two simulated but challenging environments where action spaces vary over time. Experimental results demonstrate that our proposed AP has advantages over baselines in learning efficiency.

翻訳日:2023-04-04 15:47:26 公開日:2023-04-03

# 離散潜在変数を持つ表現の学習スパーシティ

Learning Sparsity of Representations with Discrete Latent Variables ( http://arxiv.org/abs/2304.00935v1 )

ライセンス: Link先を確認

Zhao Xu, Daniel Onoro Rubio, Giuseppe Serra, Mathias Niepert

(参考訳) ディープラーニングの強みと確率モデルとをエレガントな方法で組み合わせる能力によって、深い潜在生成モデルに注目が集まっている。モデルで学んだデータ表現は、しばしば連続的で密度が高い。しかし、多くのアプリケーションでは、教師なし環境でデータのスパースな高次元埋め込みを学習したり、教師なし環境で数千の候補タグからマルチラベルを学習したりといったスパース表現が期待されている。いくつかのシナリオでは、スパーシティの程度にさらに制限がある可能性がある: 表現の0でない特徴の数は、予め定義されたしきい値 $l_0$ よりも大きくはならない。本稿では,スパース性の程度を明示的にモデル化し,定量化されたスパース性制約によりデータのスパース構造を学習するためのスパース深部潜在生成モデルsdlgmを提案する。表現の空間性は固定されていないが、事前に定義された制限の下で観察そのものに適合する。特に、各観測値 $i$ を補助確率変数 $l_i$ に導入し、その表現のスパーシティをモデル化する。スパース表現は、2つのGumbel-Softmax分布を介して2段階のサンプリングプロセスで生成される。推論と学習のために,mc勾配推定法に基づく不定形変分法を開発した。結果として生じるスパース表現はバックプロパゲーションで微分可能である。教師なしおよび教師なしの学習問題に対する複数のデータセットに対する実験評価は,提案手法の利点を示す。

Deep latent generative models have attracted increasing attention due to their capacity to combine the strengths of deep learning and probabilistic models in an elegant way. The data representations learned with these models are often continuous and dense. However, in many applications, sparse representations are expected, such as learning sparse high-dimensional embeddings of data in an unsupervised setting, and learning multi-labels from thousands of candidate tags in a supervised setting. In some scenarios, there could be a further restriction on the degree of sparsity: the number of non-zero features of a representation cannot be larger than a pre-defined threshold $L_0$. In this paper we propose a sparse deep latent generative model, SDLGM, to explicitly model the degree of sparsity and thus enable learning the sparse structure of the data under the quantified sparsity constraint. The resulting sparsity of a representation is not fixed, but adapts to the observation itself under the pre-defined restriction. In particular, for each observation $i$ we introduce an auxiliary random variable $L_i$, which models the sparsity of its representation. The sparse representations are then generated with a two-step sampling process via two Gumbel-Softmax distributions. For inference and learning, we develop an amortized variational method based on an MC gradient estimator. The resulting sparse representations are differentiable with backpropagation. The experimental evaluation on multiple datasets for unsupervised and supervised learning problems shows the benefits of the proposed method.
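
For readers unfamiliar with the building block named here, a single relaxed Gumbel-Softmax draw can be sketched with the standard library alone. This shows only the generic sampling trick; how the paper chains its two draws (for $L_i$ and for the representation) is not stated in the abstract and is not reproduced. The function name and the clamping constant are assumptions of this sketch.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a categorical distribution."""
    noisy = []
    for logit in logits:
        # Clamp away from 0 so that log() never receives 0.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Softmax with max-subtraction for numerical stability.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
soft_sample = gumbel_softmax([1.0, 2.0, 3.0], temperature=0.5, rng=rng)
```

Lower temperatures push the sample towards a one-hot vector, while the output stays a smooth function of the logits, which is what allows gradients to flow through the sampling step.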

翻訳日:2023-04-04 15:41:55 公開日:2023-04-03

# 連続学習表現における知識蓄積と特徴提示の課題

Knowledge Accumulation in Continually Learned Representations and the Issue of Feature Forgetting ( http://arxiv.org/abs/2304.00933v1 )

ライセンス: Link先を確認

Timm Hess, Eli Verwimp, Gido M. van de Ven, Tinne Tuytelaars

(参考訳) ニューラルネットワークはデフォルトで、すべてのトレーニングデータを一度に学習する。このようなモデルを新しいデータのシーケンシャルなチャンクでトレーニングする場合、古いデータの扱い方を壊滅的に忘れる傾向があります。本研究では,連続学習者が表現を学習し,忘れる方法について検討する。我々は,知識蓄積,時間とともに表現が向上する現象と,タスク固有の表現の喪失という2つの現象を観察する。両現象をよりよく理解するために,タスク排他比較と呼ばれる新しい分析手法を導入する。モデルがタスクを見ていて、タスク固有のすべての機能を忘れていない場合、そのタスクの表現は、同様のタスクでトレーニングされたモデルよりも優れているが、正確なものではない。画像分類実験の結果,タスク固有の特徴の多くは,これまで提案されてきたものと対照的に,すぐに忘れられることがわかった。さらに,リプレイや表現学習からのアイデアといった連続学習手法が,継続的に学習される表現に与える影響を実証する。表現品質は連続学習性能と密接な相関関係にあると結論づけた。

By default, neural networks learn on all training data at once. When such a model is trained on sequential chunks of new data, it tends to catastrophically forget how to handle old data. In this work we investigate how continual learners learn and forget representations. We observe two phenomena: knowledge accumulation, i.e. the improvement of a representation over time, and feature forgetting, i.e. the loss of task-specific representations. To better understand both phenomena, we introduce a new analysis technique called task exclusion comparison. If a model has seen a task and it has not forgotten all the task-specific features, then its representation for that task should be better than that of a model that was trained on similar tasks, but not that exact one. Our image classification experiments show that most task-specific features are quickly forgotten, in contrast to what has been suggested in the past. Further, we demonstrate how some continual learning methods, like replay, and ideas from representation learning affect a continually learned representation. We conclude by observing that representation quality is tightly correlated with continual learning performance.

翻訳日:2023-04-04 15:41:34 公開日:2023-04-03

# HypLiLoc: ハイパーボリック核融合によるLiDARの効率的な回帰を目指して

HypLiLoc: Towards Effective LiDAR Pose Regression with Hyperbolic Fusion ( http://arxiv.org/abs/2304.00932v1 )

ライセンス: Link先を確認

Sijie Wang, Qiyu Kang, Rui She, Wei Wang, Kai Zhao, Yang Song, Wee Peng Tay

(参考訳) LiDARの再ローカライゼーションは、ロボット工学、自律運転、コンピュータビジョンなど、多くの分野で重要な役割を果たしている。データベースからのLiDARベースの検索は、通常、高い計算ストレージコストを発生させ、データベースがスパースすぎる場合、世界中の不正確なポーズ推定につながる可能性がある。一方、ポーズ回帰手法では、画像や雲を入力として捉え、エンドツーエンドでグローバルポーズを直接レグレッションする。データベースマッチングは行わず、検索技術よりも計算効率が高い。我々は、LiDARポーズ回帰の新しいモデルであるHypLiLocを提案する。 2つの分岐したバックボーンを用いてそれぞれ3次元特徴と2次元投影特徴を抽出する。より効率的な特徴表現を得るために,ユークリッド空間と双曲空間のマルチモーダル特徴融合を考える。実験結果から,HypLiLocは屋外および屋内の両方のデータセットで最先端の性能を達成することが示された。また,マルチモーダル特徴抽出とマルチスペース埋め込みの有効性を示すフレームワーク設計に関する広範なアブレーション研究を行う。私たちのコードは、https://github.com/sijieaaa/HypLiLocでリリースされています。

LiDAR relocalization plays a crucial role in many fields, including robotics, autonomous driving, and computer vision. LiDAR-based retrieval from a database typically incurs high computation and storage costs and can lead to globally inaccurate pose estimations if the database is too sparse. On the other hand, pose regression methods take images or point clouds as inputs and directly regress global poses in an end-to-end manner. They do not perform database matching and are more computationally efficient than retrieval techniques. We propose HypLiLoc, a new model for LiDAR pose regression. We use two branched backbones to extract 3D features and 2D projection features, respectively. We consider multi-modal feature fusion in both Euclidean and hyperbolic spaces to obtain more effective feature representations. Experimental results indicate that HypLiLoc achieves state-of-the-art performance in both outdoor and indoor datasets. We also conduct extensive ablation studies on the framework design, which demonstrate the effectiveness of multi-modal feature extraction and multi-space embedding. Our code is released at: https://github.com/sijieaaa/HypLiLoc
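
One common way to place a Euclidean feature vector in hyperbolic space is the exponential map at the origin of the Poincaré ball. The sketch below assumes that model and a curvature parameter `c`; the abstract does not say which hyperbolic model or mapping HypLiLoc uses, so this illustrates the general idea rather than the released code.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean vector into the Poincare ball of curvature -c."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # the origin maps to itself
    root_c = math.sqrt(c)
    scale = math.tanh(root_c * norm) / (root_c * norm)
    return [scale * x for x in v]


def euclidean_norm(v):
    return math.sqrt(sum(x * x for x in v))


feature = [3.0, 4.0]          # hypothetical Euclidean feature
embedded = expmap0(feature)   # lies strictly inside the unit ball for c = 1
```

The map keeps the direction of the vector and compresses its length to `tanh(sqrt(c) * norm) / sqrt(c)`, so every output stays inside the ball of radius `1 / sqrt(c)`.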

翻訳日:2023-04-04 15:41:16 公開日:2023-04-03

# オンボード映像からのオンラインレーングラフ抽出

Online Lane Graph Extraction from Onboard Video ( http://arxiv.org/abs/2304.00930v1 )

ライセンス: Link先を確認

Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool

(参考訳) 自動運転には、周囲の道路網の構造化された理解が必要である。そのような理解の最も一般的で有用な表現の1つは、BEVレーングラフの形でなされる。本研究では,車載カメラからの映像ストリームを用いて,周囲のレーングラフのオンライン抽出を行う。単一の画像ではなくビデオを使うことは、入力が異なる時間ステップからの情報を組み合わせることのメリットと課題の両方をもたらす。我々は3つの異なるアプローチで出現した課題を調査した。第1のアプローチは、単一フレームレーングラフの推定値を統一レーングラフにマージ可能な、後処理ステップである。第2のアプローチでは、トランスフォーマーに時空間埋め込みを組み込むことで、ネットワークが最適な時間的集約戦略を発見することができる。最後に、第3の手法である提案手法は、明示的なBEV投影とフレームワイド特徴のアライメントによる初期時間的アグリゲーションである。提案手法の単一モデルでは、1枚の場合を含む任意の枚数の画像を処理して正確なレーングラフを生成することができる。nuScenesおよびArgoverseデータセットを用いた実験では,提案手法の優越性を強調しながら,すべてのアプローチの有効性を示す。コードは公開されます。

Autonomous driving requires a structured understanding of the surrounding road network to navigate. One of the most common and useful representations of such an understanding takes the form of BEV lane graphs. In this work, we use the video stream from an onboard camera for online extraction of the surrounding lane graph. Using video instead of a single image as input poses both benefits and challenges in terms of combining the information from different timesteps. We study the emerging challenges using three different approaches. The first approach is a post-processing step that is capable of merging single-frame lane graph estimates into a unified lane graph. The second approach uses spatiotemporal embeddings in the transformer to enable the network to discover the best temporal aggregation strategy. Finally, the third approach, which is the proposed method, is an early temporal aggregation through explicit BEV projection and alignment of framewise features. A single model of this simple yet effective method can process any number of images, including one, to produce accurate lane graphs. The experiments on the nuScenes and Argoverse datasets show the validity of all the approaches while highlighting the superiority of the proposed method. The code will be made public.

翻訳日:2023-04-04 15:41:01 公開日:2023-04-03

# オンライン第三者追跡による二酸化炭素排出量の定量化

Quantifying Carbon Emissions due to Online Third-Party Tracking ( http://arxiv.org/abs/2304.00927v1 )

ライセンス: Link先を確認

Michalis Pachilakis, Savino Dambra, Iskander Sanchez-Rola, Leyla Bilge

(参考訳) 過去10年間で、地球温暖化はいくつかの話題を巻き起こし、世界中の注目を集めた。炭素フットプリントは温室効果ガスの排出量を増加させ、惑星の温度上昇をもたらす主要な要因である。公共の注目は、輸送、食品消費、家庭用活動による二酸化炭素排出量の削減に向けられているが、オンライン活動によるCO2eq排出量の寄与は無視する。現在の情報化時代には、オンラインでのブラウジングに多くの時間を費やしています。この活性はco2eqを生成する電気を消費する。ウェブサイトのブラウジングは温室効果ガスの発生に寄与するが、インターネットが環境に与える影響は、Web追跡の実践によってさらに悪化している。実際、ほとんどのウェブページは、主に広告、データ分析、ユーザビリティの改善に使用されるコンテンツを追跡することで、非常にロードされている。この余分な内容は、電力消費が増加し、温室効果ガスの排出が増加するという大きなデータ伝達を意味する。本研究では,Webのトラッキングによるオーバーヘッドに着目し,そのネットワークと炭素のフットプリントを解析する。 1万人のユーザのブラウジングテレメトリと270万のwebサイトをクロールする実験結果を利用することで、web追跡はデータ送信を21%以上増加させることがわかり、これは毎年大気中の温室効果ガスの約11 mtの追加放出を示唆する。このような貢献は無視できないものではなく、肉の生産、輸送、さらには暗号通貨採掘といった現代の生活の多くの活動に匹敵するものである。また、異なる国、ウェブサイトカテゴリー、追跡機関の足跡を考慮すると、いくつかのアクターが他のアクターよりもはるかに大きな不平等が存在することも明らかにした。

In the past decade, global warming has made several headlines and drawn the attention of the whole world. Carbon footprint is the main factor that drives greenhouse emissions up and results in the temperature increase of the planet, with dire consequences. While the attention of the public is turned to reducing carbon emissions from transportation, food consumption and household activities, we ignore the contribution of CO2eq emissions produced by online activities. In the current information era, we spend a large part of our days browsing online. This activity consumes electricity, which in turn produces CO2eq. While website browsing contributes to the production of greenhouse gas emissions, the impact of the Internet on the environment is further exacerbated by the practice of web tracking. Indeed, most webpages are heavily loaded with tracking content used mostly for advertising, data analytics and usability improvements. This extra content implies large data transmissions, which result in higher electricity consumption and thus higher greenhouse gas emissions. In this work, we focus on the overhead caused by web tracking and analyse both its network and carbon footprint. By leveraging the browsing telemetry of 100k users and the results of a crawling experiment of 2.7M websites, we find that web tracking increases data transmissions by upwards of 21%, which in turn implies the additional emission of around 11 Mt of greenhouse gases into the atmosphere every year. We find such a contribution to be far from negligible, and comparable to many activities of modern life, such as meat production, transportation, and even cryptocurrency mining. Our study also highlights that there exist significant inequalities when considering the footprint of different countries, website categories, and tracking organizations, with a few actors contributing to a much greater extent than the remaining ones.

翻訳日:2023-04-04 15:40:46 公開日:2023-04-03

# Abstraqt: 抽象安定化器シミュレーションによる量子回路の解析

Abstraqt: Analysis of Quantum Circuits via Abstract Stabilizer Simulation ( http://arxiv.org/abs/2304.00921v1 )

ライセンス: Link先を確認

Benjamin Bichsel, Maximilian Baader, Anouk Paradis, Martin Vechev

(参考訳) 安定化器シミュレーションはクリフォードゲートのみからなる量子回路の重要なクラスを効率的にシミュレートすることができる。しかし、このシミュレーションの非クリフォードゲートを含む任意の量子回路への既存の拡張はすべて指数関数的ランタイムに苦しむ。本研究では、任意の量子回路上での効率的な安定化器シミュレーションのための新しい手法を、損失精度で提示することで、この問題に対処する。私たちのキーとなるアイデアは、量子状態の指数和表現を、(少なくとも)起こるすべてのサマンドをカバーする単一の抽象的なサマンドに圧縮することです。これにより,クリフォードゲート,非クリフォードゲート,(内部)計測などの回路操作の効果を過剰に吸収することにより,抽象サムマンドを効率的に操作できる抽象安定化シミュレータを導入することができる。我々はAbstraqtと呼ばれるツールに抽象シミュレータを実装し、既存の手法で回路特性を抽出できることを実験的に実証した。

Stabilizer simulation can efficiently simulate an important class of quantum circuits consisting exclusively of Clifford gates. However, all existing extensions of this simulation to arbitrary quantum circuits including non-Clifford gates suffer from an exponential runtime. In this work, we address this challenge by presenting a novel approach for efficient stabilizer simulation on arbitrary quantum circuits, at the cost of lost precision. Our key idea is to compress an exponential sum representation of the quantum state into a single abstract summand covering (at least) all occurring summands. This allows us to introduce an abstract stabilizer simulator that efficiently manipulates abstract summands by over-abstracting the effect of circuit operations including Clifford gates, non-Clifford gates, and (internal) measurements. We implemented our abstract simulator in a tool called Abstraqt and experimentally demonstrate that Abstraqt can establish circuit properties intractable for existing techniques.

翻訳日:2023-04-04 15:40:17 公開日:2023-04-03

# ノード分類における不確実性伝播

Uncertainty Propagation in Node Classification ( http://arxiv.org/abs/2304.00918v1 )

ライセンス: Link先を確認

Zhao Xu, Carolin Lawrence, Ammar Shaker, Raman Siarheyeu

(参考訳) ニューラルネットワークの予測の不確かさの定量化が最近注目を集めている。本研究では,ノード分類作業におけるグラフニューラルネットワーク(GNN)の不確実性の測定に焦点をあてる。既存のGNNはノード間のメッセージパッシングをモデル化している。メッセージはしばしば決定論的です。メッセージに不確実性はあるか? このような不確実性を、メッセージとともにグラフ上でどのように伝播させるのか? これらの問題に対処するために,gnnをベイズモデリングフレームワークに組み込むベイズ不確実性伝播(bup)法を提案し,予測確率とメッセージの不確かさのベイズ信頼度を有するノード分類の予測不確実性をモデル化する。本手法はガウスモデルにインスパイアされた新しい不確実性伝播機構を提案する。さらに,GNNが学習手順における予測不確実性を明瞭に統合できるようにするノード分類における不確実性指向損失を提案する。その結果、予測の不確実性が大きいトレーニング例が罰せられる。予測信頼性とアウト・オブ・ディストリビューション(OOD)予測に関して,BUPを実証する。学習された不確実性も深く分析される。 ood症例における不確かさとグラフトポロジーの関係,および予測不確実性との関係を広範な実験により検討した。人気のあるベンチマークデータセットを用いた実験結果は,提案手法の優れた性能を示す。

Quantifying predictive uncertainty of neural networks has recently attracted increasing attention. In this work, we focus on measuring uncertainty of graph neural networks (GNNs) for the task of node classification. Most existing GNNs model message passing among nodes. The messages are often deterministic. Questions naturally arise: Does there exist uncertainty in the messages? How could we propagate such uncertainty over a graph together with messages? To address these issues, we propose a Bayesian uncertainty propagation (BUP) method, which embeds GNNs in a Bayesian modeling framework, and models predictive uncertainty of node classification with Bayesian confidence of predictive probability and uncertainty of messages. Our method proposes a novel uncertainty propagation mechanism inspired by Gaussian models. Moreover, we present an uncertainty oriented loss for node classification that allows the GNNs to clearly integrate predictive uncertainty in learning procedure. Consequently, the training examples with large predictive uncertainty will be penalized. We demonstrate the BUP with respect to prediction reliability and out-of-distribution (OOD) predictions. The learned uncertainty is also analyzed in depth. The relations between uncertainty and graph topology, as well as predictive uncertainty in the OOD cases are investigated with extensive experiments. The empirical results with popular benchmark datasets demonstrate the superior performance of the proposed method.

翻訳日:2023-04-04 15:40:03 公開日:2023-04-03

# 拡散橋の混合輸送, Schrödinger bridge問題と生成モデル

Diffusion Bridge Mixture Transports, Schrödinger Bridge Problems and Generative Modeling ( http://arxiv.org/abs/2304.00917v1 )

ライセンス: Link先を確認

Stefano Peluchetti

(参考訳) 動的Schrödinger bridge問題は、2つの目標確率測度間の輸送を定義する確率過程のうち、クルバック・ライブラー情報量の意味で参照過程に最も近いものを求める問題である。本稿では,動的Schrödinger bridge問題を解くために,新しいサンプリングベース反復アルゴリズムである反復拡散橋混合輸送 (IDBM) を提案する。 IDBM手順は、各ステップにおける目標測度間の有効な結合を実現するという魅力的な性質を示す。我々はIDBM手順に関する最初の理論的研究を行い、その収束特性を確立した。理論的な結果は、様々な応用におけるIDBM手順の競合性能を実証する多数の数値実験によって補完される。生成モデリングの最近の進歩は、拡散過程の時間反転を用いて、単純な分布をデータ分布に近似的に輸送する生成過程を定義する。代替案として, IDBM手順の第1イテレーションを, この輸送を実現する近似フリー手法として用いることを提案する。このアプローチは生成過程のダイナミクスを選択する際の柔軟性を高め、より長い離散化間隔でより高速なトレーニングと優れたサンプル品質を示す。実装面では、必要な修正は最小限であり、トレーニング損失の計算に限定され、生成サンプリングには変更を必要としない。

The dynamic Schrödinger bridge problem seeks a stochastic process that defines a transport between two target probability measures, while optimally satisfying the criteria of being closest, in terms of Kullback-Leibler divergence, to a reference process. We propose a novel sampling-based iterative algorithm, the iterated diffusion bridge mixture transport (IDBM), aimed at solving the dynamic Schrödinger bridge problem. The IDBM procedure exhibits the attractive property of realizing a valid coupling between the target measures at each step. We perform an initial theoretical investigation of the IDBM procedure, establishing its convergence properties. The theoretical findings are complemented by numerous numerical experiments illustrating the competitive performance of the IDBM procedure across various applications. Recent advancements in generative modeling employ the time-reversal of a diffusion process to define a generative process that approximately transports a simple distribution to the data distribution. As an alternative, we propose using the first iteration of the IDBM procedure as an approximation-free method for realizing this transport. This approach offers greater flexibility in selecting the generative process dynamics and exhibits faster training and superior sample quality over longer discretization intervals. In terms of implementation, the necessary modifications are minimally intrusive, being limited to the training loss computation, with no changes necessary for generative sampling.
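
The problem stated in the first sentence of this abstract is commonly written as a constrained minimisation. The following is only a sketch in assumed notation, none of which appears in the abstract: $R$ denotes the reference process on a time interval $[0,T]$, $\mu$ and $\nu$ the two target measures, and $P$ ranges over path measures with initial marginal $P_0$ and terminal marginal $P_T$.

```latex
P^{\star} \;=\; \operatorname*{arg\,min}_{P \,:\, P_0 = \mu,\; P_T = \nu} \; \mathrm{KL}\!\left(P \,\|\, R\right)
```

In this notation, any $P$ satisfying the two marginal constraints induces a coupling of $\mu$ and $\nu$ through its joint law at times $0$ and $T$, which is the sense in which a procedure can "realize a valid coupling" before it has converged.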

翻訳日:2023-04-04 15:39:44 公開日:2023-04-03

# DreamAvatar: 拡散モデルによる3次元人体アバター生成

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models ( http://arxiv.org/abs/2304.00916v1 )

ライセンス: Link先を確認

Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

(参考訳) 筆者はdreamavatarという,高品質な3dアバターを制御可能なポーズで生成するためのテキスト・アンド・シェイプガイドフレームワークを提案する。近年,テキストガイドによる3次元共通物体生成の手法が提案されているが,人体の形状・ポーズ・外観が複雑化しているため,高品質なアバターの生成が課題となっている。この課題に対処するためにDreamAvatarを提案する。これは3Dポイントの密度と色の特徴を予測するためのトレーニング可能なNeRFと、2Dセルフスーパービジョンを提供するための事前訓練されたテキスト-画像拡散モデルである。具体的には、SMPLモデルを利用して、生成のための粗いポーズと形状ガイダンスを提供する。我々は、標準空間と観測空間からなる双対空間設計を導入する。これは、学習可能な変形場によってNeRFを介して関連付けられ、最適化されたテクスチャと幾何を標準空間から目標とするアバターへ転送することができる。さらに,より詳細な形状とテクスチャを持ったより鮮明な生成を可能にするために,正規性正規化を利用する。広範な評価を通じて,DreamAvatarは既存の手法を著しく上回り,テキスト・アンド・シェイプ3次元世代のための新しい最先端技術を確立した。

We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. While encouraging results have been produced by recent methods on text-guided 3D common object generation, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. We propose DreamAvatar to tackle this challenge, which utilizes a trainable NeRF for predicting density and color features for 3D points and a pre-trained text-to-image diffusion model for providing 2D self-supervision. Specifically, we leverage SMPL models to provide rough pose and shape guidance for the generation. We introduce a dual space design that comprises a canonical space and an observation space, which are related by a learnable deformation field through the NeRF, allowing for the transfer of well-optimized texture and geometry from the canonical space to the target posed avatar. Additionally, we exploit a normal-consistency regularization to allow for more vivid generation with detailed geometry and texture. Through extensive evaluations, we demonstrate that DreamAvatar significantly outperforms existing methods, establishing a new state-of-the-art for text-and-shape guided 3D human generation.

翻訳日:2023-04-04 15:39:22 公開日:2023-04-03

# $e^{-i\pi n/2}\nabla_x^{n}\Psi = (E-\Delta(x))\Psi$ における束縛状態の量子化条件

Quantization Condition of the Bound States in $e^{-i\pi n/2}\nabla_x^{n}\Psi = (E-\Delta(x))\Psi$ ( http://arxiv.org/abs/2304.00914v1 )

ライセンス: Link先を確認

Xiong Fan

(参考訳) 方程式 $e^{-i\pi n/2}\nabla_x^{n}\Psi = [E-\Delta(x)]\Psi$ のポテンシャル井戸内の束縛状態に対する一般的な近似量子化規則 $\int_{L_{E}}^{R_{E}} k_0\,dx = (N+\frac{1}{2})\pi$ を証明する。ここで $k_0=(E-\Delta)^{1/n}$, $N\in\mathbb{N}_{0}$ であり、$n$ は偶数の自然数、$L_{E}$ と $R_{E}$ は古典的に禁止された領域と許容領域との境界点である。唯一の仮定は、指数関数的に増大する成分がすべて無視できることであり、これは狭くない井戸に対して妥当である。 Schrödinger方程式やBogoliubov-de Gennes方程式を含む応用について論じる。

We will prove a general approximate quantization rule $\int_{L_{E}}^{R_{E}} k_0\,dx=(N+\frac{1}{2})\pi$ for the bound states in the potential well of the equations $e^{-i\pi n/2}\nabla_x^{n}\Psi =[E-\Delta (x)]\Psi$, where $k_0=(E-\Delta )^{1/n}$, $N\in\mathbb{N}_{0}$, $n$ is an even natural number, and $L_{E}$ and $R_{E}$ are the boundary points between the classically forbidden regions and the allowed region. The only hypothesis is that all exponentially growing components are negligible, which is appropriate for wells that are not narrow. Applications including the Schrödinger equation and the Bogoliubov-de Gennes equation will be discussed.
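
As a quick sanity check of the rule, consider the case $n=2$ with a parabolic well. The choice $\Delta(x)=x^{2}$ is an assumption made here purely for illustration and is not taken from the abstract. The integral is the area of a half-disc of radius $\sqrt{E}$.

```latex
\begin{align*}
n = 2:\quad & e^{-i\pi}\,\nabla_x^{2}\Psi = -\nabla_x^{2}\Psi = \bigl[E - x^{2}\bigr]\Psi,
  \qquad k_0 = (E - x^{2})^{1/2}, \qquad L_E = -\sqrt{E},\; R_E = \sqrt{E}, \\
& \int_{-\sqrt{E}}^{\sqrt{E}} \sqrt{E - x^{2}}\; dx \;=\; \frac{\pi E}{2}
  \;=\; \Bigl(N + \tfrac{1}{2}\Bigr)\pi
  \;\Longrightarrow\; E_N = 2N + 1, \qquad N \in \mathbb{N}_0 .
\end{align*}
```

For this particular well, the result coincides with the exact spectrum of $-\Psi'' + x^{2}\Psi = E\Psi$, so the approximate rule happens to be exact in this example.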

翻訳日:2023-04-04 15:38:56 公開日:2023-04-03

# ランダム関数型ニューラルネットワークの特性と応用の可能性

Properties and Potential Applications of Random Functional-Linked Types of Neural Networks ( http://arxiv.org/abs/2304.00957v1 )

ライセンス: Link先を確認

Guang-Yong Chen, Yong-Hang Yu, Min Gan, C. L. Philip Chen, Wenzhong Guo

(参考訳) ランダム関数リンク型ニューラルネットワーク(RFLNN)、例えば、極端学習機械(ELM)と広範学習システム(BLS)は、時間を要するトレーニングプロセスを回避し、深層構造における学習の代替手段を提供する。 RFLNNは様々な分類と回帰タスクで優れた性能を達成しているが、これらのネットワークの性質と説明は以前の研究では無視されている。本稿では、RFLNNの特性について、周波数領域の観点から考察し、これらのネットワークにおける周波数原理の存在、すなわち、トレーニングプロセス中にまず低周波成分を迅速に捕捉し、その後に高周波成分に適合することを見出した。これらの発見は、RFLNNの理解と応用拡大に有用である。周波数原理に基づき、より優れた性能のBLSネットワークを生成する方法を提案し、ヤコビ反復法とBLSネットワークに現れる異なる周波数原理の観点から、ポアソン方程式を解くための効率的なアルゴリズムを設計する。

Random functional-linked types of neural networks (RFLNNs), e.g., the extreme learning machine (ELM) and the broad learning system (BLS), avoid a time-consuming training process and offer an alternative to learning in deep structures. RFLNNs have achieved excellent performance in various classification and regression tasks; however, the properties and explanations of these networks have been ignored in previous research. This paper gives some insights into the properties of RFLNNs from the viewpoint of the frequency domain and discovers the presence of a frequency principle in these networks: they capture low-frequency components quickly and then fit the high-frequency components during the training process. These findings are valuable for understanding RFLNNs and expanding their applications. Guided by the frequency principle, we propose a method to generate a BLS network with better performance, and design an efficient algorithm for solving Poisson's equation in view of the different frequency principles present in the Jacobi iterative method and the BLS network.

翻訳日:2023-04-04 15:32:06 公開日:2023-04-03

# AirLoc: オブジェクトベースの屋内再ローカライゼーション

AirLoc: Object-based Indoor Relocalization ( http://arxiv.org/abs/2304.00954v1 )

ライセンス: Link先を確認

Aryan, Bowen Li, Sebastian Scherer, Yun-Jou Lin, Chen Wang

(参考訳) 屋内再ローカライズは、自律探索のようなロボットのタスクと、ショッピングモールでの携帯電話によるナビゲーションのような民間用途の両方に不可欠である。従来の手法の一部は、キーポイントの特徴や局所的なテクスチャなどの幾何学的情報を用いて屋内再ローカライズを行うが、視覚的に類似したシーンを持つ環境では容易に失敗するか、多くのデータベース画像を必要とする。人間がユニークなランドマークを認識して場所を覚えることが多いという事実に着想を得て、我々は幾何学的要素よりも情報量の多い物体を利用する。そこで本研究では,AirLocと呼ばれるシンプルかつ効果的なオブジェクトベースの屋内再ローカライズ手法を提案する。オブジェクト再識別とオブジェクト関係の記憶という重要な課題を克服するために,オブジェクトごとの外観埋め込みとオブジェクト間の幾何学的関係を抽出する。幾何学的特徴と外観特徴を統合して累積的なシーン特徴を生成する。その結果、ロバストで正確でポータブルな屋内再ローカライズシステムとなり、室内レベルの再ローカライズにおいて最先端の手法をPR-AUCで9.5%、精度で7%上回った。徹底的な評価に加えて, 実世界テストも実施し、重度の遮蔽, 知覚的エイリアス, 視点シフト, 変形などの課題においてAirLocがロバストであることを示す。

Indoor relocalization is vital for both robotic tasks like autonomous exploration and civil applications such as navigation with a cell phone in a shopping mall. Some previous approaches adopt geometrical information such as key-point features or local textures to carry out indoor relocalization, but they either easily fail in an environment with visually similar scenes or require many database images. Inspired by the fact that humans often remember places by recognizing unique landmarks, we resort to objects, which are more informative than geometry elements. In this work, we propose a simple yet effective object-based indoor relocalization approach, dubbed AirLoc. To overcome the critical challenges of object reidentification and remembering object relationships, we extract object-wise appearance embedding and inter-object geometric relationships. The geometry and appearance features are integrated to generate cumulative scene features. This results in a robust, accurate, and portable indoor relocalization system, which outperforms the state-of-the-art methods in room-level relocalization by 9.5% of PR-AUC and 7% of accuracy. In addition to exhaustive evaluation, we also carry out real-world tests, where AirLoc shows robustness in challenges like severe occlusion, perceptual aliasing, viewpoint shift, and deformation.

翻訳日:2023-04-04 15:31:47 公開日:2023-04-03

# バイナリニューラルネットワークにおけるデータフローの最適化

Optimizing data-flow in Binary Neural Networks ( http://arxiv.org/abs/2304.00952v1 )

ライセンス: Link先を確認

L. Vorabbi, D. Maltoni, S. Santi

(参考訳) バイナリニューラルネットワーク(BNN)は、高価な浮動小数点演算をビット演算に置き換えることで、ニューラルネットワークの推論時間を著しく加速することができる。しかし、既存のソリューションの多くはBNN層のデータフローを完全に最適化していないため、1ビットから16/32ビットへの中間変換は効率を損なうことが多い。我々は,BNNパイプラインにおけるデータフローと並列性を向上する新たなトレーニング手法を提案し,具体的には,データ幅を32ビットから8ビットに削減するクリッピングブロックを提案する。さらに、通常32ビットで保持されるバイナリ層の内部アキュムレータのサイズを小さくし、精度を損なうことなくデータのオーバーフローを防止する。さらに、レイテンシを低減し、デプロイを簡単にするBatch Normalizationレイヤの最適化も提供しています。最後に、ARM命令セットに対するバイナリ直接変換の最適化実装を提案する。実験の結果,少なくとも1つの完全精度モデルに対して精度を低下させることなく,推論速度を一貫した改善(最先端の2つのBNNフレームワークと比較して最大1.91と2.73倍)した。

Binary Neural Networks (BNNs) can significantly accelerate the inference of a neural network by replacing its expensive floating-point arithmetic with bitwise operations. Most existing solutions, however, do not fully optimize data flow through the BNN layers, and intermediate conversions from 1 to 16/32 bits often further hinder efficiency. We propose a novel training scheme that can increase data flow and parallelism in the BNN pipeline; specifically, we introduce a clipping block that decreases the data width from 32 bits to 8. Furthermore, we reduce the size of a binary layer's internal accumulator, which is usually kept at 32 bits to prevent data overflow, without losing accuracy. Additionally, we provide an optimization of the Batch Normalization layer that both reduces latency and simplifies deployment. Finally, we present an optimized implementation of the Binary Direct Convolution for ARM instruction sets. Our experiments show a consistent improvement in inference speed (up to 1.91x and 2.73x compared to two state-of-the-art BNN frameworks) with no drop in accuracy for at least one full-precision model.
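
To make "replacing floating-point arithmetic with bitwise operations" concrete, here is a minimal Python sketch of the standard XNOR-and-popcount dot product for vectors with entries in {+1, -1}. It is not the paper's implementation. All function names are hypothetical, and `clip_to_int8` is only an illustrative stand-in for the idea of limiting a value to 8 bits, not the clipping block the authors describe.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into one int, one bit per value (bit 1 encodes +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two length-n {+1, -1} vectors using XNOR and popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_word ^ b_word) & mask   # bit is 1 wherever the two signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # agreements minus disagreements


def clip_to_int8(x):
    """Saturate a value to the signed 8-bit range (hypothetical illustration)."""
    return max(-128, min(127, x))


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))            # ordinary arithmetic
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))   # bitwise version
print(reference, fast)
```

The two results agree because each matching sign pair contributes +1 and each mismatch contributes -1. On real hardware the same idea is applied to whole machine words at once, which is where the speed-up comes from.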

翻訳日:2023-04-04 15:31:25 公開日:2023-04-03

# 人工樹状体計算 : 神経形回路における樹状体の場合

Artificial Dendritic Computation: The case for dendrites in neuromorphic circuits ( http://arxiv.org/abs/2304.00951v1 )

ライセンス: Link先を確認

Daniel John Mannion, Anthony John Kenyon

(参考訳) バイオインスパイアされたコンピューティングは、ニューロンとシナプスに焦点を当て、大きな成功を収めている。しかし、これらのデンドライトのつながりも重要な役割を担っている。本稿では,デンドリティック計算を複製する動機について検討し,その構築における今後の試みを導く枠組みを提案する。このフレームワークはデンドライトの重要な性質を特定し,音像定位処理におけるデンドライト計算の例を示す。我々は,BiLSTMニューラルネットワークの性能に及ぼすデンドライトの影響を評価し,デンドライト前処理がしきい値性能に必要なネットワークサイズを減らすことを発見した。

Bio-inspired computing has focused on neurons and synapses with great success. However, the connections between these, the dendrites, also play an important role. In this paper, we investigate the motivation for replicating dendritic computation and present a framework to guide future attempts at their construction. The framework identifies key properties of dendrites and presents an example of dendritic computation in the task of sound localisation. We evaluate the impact of dendrites on a BiLSTM neural network's performance, finding that dendrite pre-processing reduces the size of network required to reach a threshold performance.

翻訳日:2023-04-04 15:31:09 公開日:2023-04-03

# 複数の産業エンティティの半自動コンピュータビジョンによる追跡 -フレームワークとデータセット作成アプローチ-

Semi-Automated Computer Vision based Tracking of Multiple Industrial Entities -- A Framework and Dataset Creation Approach ( http://arxiv.org/abs/2304.00950v1 )

ライセンス: Link先を確認

Jérôme Rutinowski, Hazem Youssef, Sven Franke, Irfan Fachrudin Priyanta, Frederik Polachowski, Moritz Roidl, Christopher Reining

(参考訳) この貢献は、産業エンティティ(例えばパレット、クレート、バレル)を6台のrgbカメラのネットワーク上で連続的に追跡するためのフレームワークであるtomie framework (tracking of multiple industrial entities)を提示している。このフレームワークは、複数のセンサー、データパイプライン、データアノテーション手順を使用しており、このコントリビューションで詳細に説明されている。産業部門のための完全自動化トラッキングシステムのビジョンを念頭に置いて、研究者は産業環境で効率的に高品質なデータをキャプチャできる。このフレームワークを使用すると、画像データセットであるTOMIEデータセットが作成され、同時にフレームワークの妥当性を評価するために使用される。このデータセットには、112,860フレームのアノテーションファイルと640,936のエンティティインスタンスが含まれている。このデータセットは、同等のデータセットを4倍にスケールし、ウェアハウスセクターの産業アプリケーションから引き出されたシナリオで構成されている。このデータセットにはByteTrack、Bot-Sort、SiamMOTという3つのトラッキングアルゴリズムが適用される。

This contribution presents the TOMIE framework (Tracking Of Multiple Industrial Entities), a framework for the continuous tracking of industrial entities (e.g., pallets, crates, barrels) over a network of, in this example, six RGB cameras. This framework makes use of multiple sensors, data pipelines and data annotation procedures, and is described in detail in this contribution. With the vision of a fully automated tracking system for industrial entities in mind, it enables researchers to efficiently capture high-quality data in an industrial setting. Using this framework, an image dataset, the TOMIE dataset, is created, which at the same time is used to gauge the framework's validity. This dataset contains annotation files for 112,860 frames and 640,936 entity instances that are captured from a set of six cameras that perceive a large indoor space. This dataset out-scales comparable datasets by a factor of four and is made up of scenarios drawn from industrial applications in the warehousing sector. Three tracking algorithms, namely ByteTrack, Bot-Sort and SiamMOT, are applied to this dataset, serving as a proof of concept and providing tracking results that are comparable to the state of the art.

翻訳日:2023-04-04 15:31:00 公開日:2023-04-03

# VTAE:マニフォールド学習を用いた変分変換器オートエンコーダ

VTAE: Variational Transformer Autoencoder with Manifolds Learning ( http://arxiv.org/abs/2304.00948v1 )

ライセンス: Link先を確認

Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Dacheng Tao, Xuelong Li

(参考訳) 深層生成モデルは、複数の潜伏変数を通して非線形データ分布を学習する成功例を示し、これらのモデルは潜伏サンプルをデータ空間にマッピングするために非線形関数(ジェネレータ)を使用する。一方、ジェネレータの非線形性は、潜在空間がデータ空間の不満足な投影を示し、表現学習が不十分であることを意味する。しかし、この弱射影はリーマン計量によって対処することができ、リーマン多様体上のデータサンプル間の測地計算と正確な補間が、深い生成モデルの性能を大幅に改善できることを示す。本稿では、リーマン多様体上の測地線を最小化し、表現学習を改善するために、変分空間変換オートエンコーダ(VTAE)を提案する。特に,空間変換器を符号化した変分オートエンコーダを慎重に設計し,潜在変数モデルをリーマン多様体上のデータに明示的に拡張し,大域的文脈モデリングを実現する。さらに, 2つの異なる対象の潜在表現間を横断しながら, 滑らかで妥当な補間を行うため, 性能の劣る線形補間を用いる既存モデルとは異なる測地補間ネットワークを提案する。ベンチマーク実験により,画像補間や再構成を含む様々なコンピュータビジョンタスクに対して,提案モデルにより予測精度と汎用性を向上できることが示された。

Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables and these models use a nonlinear function (generator) to map latent samples into the data space. On the other hand, the nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. This weak projection, however, can be addressed by a Riemannian metric, and we show that geodesics computation and accurate interpolations between data samples on the Riemannian manifold can substantially improve the performance of deep generative models. In this paper, a Variational spatial-Transformer AutoEncoder (VTAE) is proposed to minimize geodesics on a Riemannian manifold and improve representation learning. In particular, we carefully design the variational autoencoder with an encoded spatial-Transformer to explicitly expand the latent variable model to data on a Riemannian manifold, and obtain global context modelling. Moreover, to have smooth and plausible interpolations while traversing between two different objects' latent representations, we propose a geodesic interpolation network different from the existing models that use linear interpolation with inferior performance. Experiments on benchmarks show that our proposed model can improve predictive accuracy and versatility over a range of computer vision tasks, including image interpolations, and reconstructions.

翻訳日:2023-04-04 15:30:38 公開日:2023-04-03

# RePAST: 相対ポーズ注意シーン表現トランスフォーマ

RePAST: Relative Pose Attention Scene Representation Transformer ( http://arxiv.org/abs/2304.00947v1 )

ライセンス: Link先を確認

Aleksandr Safin, Daniel Duckworth, Mehdi S. M. Sajjadi

(参考訳) SRT(Scene Representation Transformer)はインタラクティブなレートで新しいビューを描画する手法である。 SRTは任意に選択された参照カメラに対してカメラポーズを使用するため、入力ビューの順序に不変ではない。その結果、SRTは参照フレームを定期的に変更する必要がある大規模シーンには直接適用できない。本研究では,入力に基準フレームを固定する代わりに,対方向の相対カメラポーズ情報をトランスフォーマの注意機構に直接注入する相対ポーズ注意srt(repast)を提案する。これは定義上、任意のグローバル参照フレームの選択に不変でありながら、元のメソッドの完全な能力を保っているモデルにつながる。経験的な結果は、モデルにこの不変性を加えると品質が低下しないことを示している。これは、完全に潜在的なトランスフォーマーベースのレンダリング方法を大規模シーンに適用するためのステップであると考えています。

The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.
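
The invariance claim can be illustrated with a small sketch: pairwise relative poses do not change when every camera pose is re-expressed in a different global frame, whereas the absolute poses do. The example below makes several simplifying assumptions that are not from the abstract: it uses 2D rigid transforms instead of 3D camera poses, treats poses as camera-to-world matrices, and says nothing about how the relative pose is actually injected into attention. All function names are hypothetical.

```python
import math


def pose(theta, tx, ty):
    """Camera-to-world rigid transform in 2D as a 3x3 homogeneous matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse(m):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    r00, r01, tx = m[0]
    r10, r11, ty = m[1]
    return [[r00, r10, -(r00 * tx + r10 * ty)],
            [r01, r11, -(r01 * tx + r11 * ty)],
            [0.0, 0.0, 1.0]]


def relative_pose(t_i, t_j):
    """Pose of camera j expressed in the frame of camera i."""
    return matmul(inverse(t_i), t_j)


def max_abs_diff(a, b):
    """Largest entry-wise difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


cam_a = pose(0.3, 1.0, 2.0)
cam_b = pose(-1.1, 4.0, -0.5)
global_shift = pose(2.0, -7.0, 3.0)  # arbitrary change of the world frame

before = relative_pose(cam_a, cam_b)
after = relative_pose(matmul(global_shift, cam_a), matmul(global_shift, cam_b))

invariance_error = max_abs_diff(before, after)                    # close to zero
absolute_change = max_abs_diff(cam_a, matmul(global_shift, cam_a))  # clearly non-zero
print(invariance_error, absolute_change)
```

Algebraically, the global transform cancels: the inverse of `G @ T_i` times `G @ T_j` equals the inverse of `T_i` times `T_j`. This is why a model that only ever sees pairwise relative poses cannot depend on the choice of reference frame.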

翻訳日:2023-04-04 15:30:14 公開日:2023-04-03

# MoLo:Few-shot行動認識のためのモーション強化ロングショートコントラスト学習

MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition ( http://arxiv.org/abs/2304.00946v1 )

ライセンス: Link先を確認

Xiang Wang, Shiwei Zhang, Zhiwu Qing, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

(参考訳) 現在の最先端のfew-shot行動認識手法は、学習した視覚特徴に対してフレームレベルのマッチングを行うことで有望な性能を達成している。しかし、一般的には2つの制限がある。 i) 長期的な時間的知覚を強制するガイダンスが欠如しているため、局所フレーム間のマッチング手続きが不正確になりやすいこと。 ii) 明示的な動き学習が通常無視され、部分的な情報損失につながること。これらの問題に対処するために、長短コントラスト目的関数とモーションオートデコーダという2つの重要なコンポーネントを含む運動強化長短コントラスト学習法(MoLo)を開発した。具体的には、長短コントラスト目的関数は、同じクラスに属するビデオのグローバルトークンとの一致を最大化することで、局所フレーム特徴に長期的な時間認識を与える。モーションオートデコーダは、差分特徴からピクセルの動きを再構築する軽量なアーキテクチャで、ネットワークにモーションダイナミクスを明示的に組み込む。これにより、MoLoは、広範囲の時間的コンテキストとモーションキューを同時に学習し、包括的なfew-shotマッチングを行うことができる。提案手法の有効性を示すために,MoLoを5つの標準ベンチマークで評価し,MoLoが最近の先進的手法を上回ることを示す。ソースコードは https://github.com/alibaba-mmai-research/MoLo で入手できる。

Current state-of-the-art approaches for few-shot action recognition achieve promising performance by conducting frame-level matching on learned visual features. However, they generally suffer from two limitations: i) the matching procedure between local frames tends to be inaccurate due to the lack of guidance to force long-range temporal perception; ii) explicit motion learning is usually ignored, leading to partial information loss. To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder. Specifically, the long-short contrastive objective is to endow local frame features with long-form temporal awareness by maximizing their agreement with the global token of videos belonging to the same class. The motion autodecoder is a lightweight architecture to reconstruct pixel motions from the differential features, which explicitly embeds the network with motion dynamics. By this means, MoLo can simultaneously learn long-range temporal context and motion cues for comprehensive few-shot matching. To demonstrate the effectiveness, we evaluate MoLo on five standard benchmarks, and the results show that MoLo favorably outperforms recent advanced methods. The source code is available at https://github.com/alibaba-mmai-research/MoLo.

翻訳日:2023-04-04 15:29:59 公開日:2023-04-03

# VCR修復の教訓: カリフォルニア州消費者プライバシ法(CCPA)によるAndroidアプリ開発者のコンプライアンス

Lessons in VCR Repair: Compliance of Android App Developers with the California Consumer Privacy Act (CCPA) ( http://arxiv.org/abs/2304.00944v1 )

ライセンス: Link先を確認

Nikita Samarin, Shayna Kothari, Zaina Siyed, Oscar Bjorkman, Reena Yuan, Primal Wijesekera, Noura Alomar, Jordan Fischer, Chris Hoofnagle and Serge Egelman

(参考訳) カリフォルニア州消費者プライバシ法(CCPA)は、カリフォルニア州住民に幅広いプライバシー保護と権利を付与している。本研究では、Androidアプリ開発者がCCPAの規定にどの程度準拠しているかを調査した。これらの規定は、消費者に正確なプライバシー通知を提供すること、および、ビジネス目的や商業目的のために消費者について収集、利用、共有した個人情報を開示することで「検証可能な消費者要求」(VCR)に対応することを開発者に求めている。我々は、CCPAに従う必要があると考えられる109のアプリの実際のネットワークトラフィックを、アプリがプライバシポリシで収集すると述べているデータ、およびアプリ開発者に提出した「知る権利」要求への回答に含まれるデータと比較した。我々の要求に実質的な回答を寄せた69のアプリ開発者のうち、1者を除く全員が(カテゴリ情報のみではなく)特定の個人データを提供した。しかし、識別子(55のアプリ、80%)、位置情報データ(21のアプリ、30%)、感覚データ(18のアプリ、26%)など、開示されていない情報を収集しているアプリがかなりの割合を占めた。我々は、アプリ開発者が「知る権利」要求やその他の関連する規則に従うのに役立つCCPAの改善について議論する。

The California Consumer Privacy Act (CCPA) provides California residents with a range of enhanced privacy protections and rights. Our research investigated the extent to which Android app developers comply with the provisions of the CCPA that require them to provide consumers with accurate privacy notices and respond to "verifiable consumer requests" (VCRs) by disclosing personal information that they have collected, used, or shared about consumers for a business or commercial purpose. We compared the actual network traffic of 109 apps that we believe must comply with the CCPA to the data that apps state they collect in their privacy policies and the data contained in responses to "right to know" requests that we submitted to the app's developers. Of the 69 app developers who substantively replied to our requests, all but one provided specific pieces of personal data (as opposed to only categorical information). However, a significant percentage of apps collected information that was not disclosed, including identifiers (55 apps, 80%), geolocation data (21 apps, 30%), and sensory data (18 apps, 26%) among other categories. We discuss improvements to the CCPA that could help app developers comply with "right to know" requests and other related regulations.

翻訳日:2023-04-04 15:29:36 公開日:2023-04-03

# 半局所結合型ポテンシャルエネルギー面を用いた機械反応研究のためのマルチスケールプロトコル

Multi-scale Protocol for Mechanistic Reaction Studies Using Semi-local Fitted Potential Energy Surfaces ( http://arxiv.org/abs/2304.00942v1 )

ライセンス: Link先を確認

Tomislav Piskor, Peter Pinski, Thilo Mast, Vladimir V. Rybkin

(参考訳) 本研究では,化学反応機構の日常的理論的研究のためのマルチスケールプロトコルを提案する。安価な電子構造法により駆動されるNudged-Elastic Band (NEB) 法を用いて, 本システムの初期反応経路をサンプリングした。経路上の一組の点に対するより正確な電子構造理論で再計算された力は、半局所反応性ポテンシャルエネルギー表面(PES)を生成するための機械学習技術(この場合、対称勾配領域機械学習またはsGDML)を装着し、反応体、生成物、遷移状態(TS)領域を受け入れる。このアプローチは単分子(エンジインのベルグマン環化)と双分子(S$_\text{N}$2置換)反応にうまく適用されている。特に, 正確な参照法(casscfとccsd)を用いた50～150のエネルギー-力評価では, 静止点ジオメトリ, 固有反応-配位, バリアに対して定性的合意を与える半局所的pesを構築することが可能である。さらに, 振動周波数と反応速度係数の定性的な一致を見出した。この手法の性能の重要な側面は、計算の労力を省くだけでなく、反応経路に沿って有意義な情報を抽出することを可能にするマルチスケールな性質である。 TSの性質や計算経済によらず、このプロトコルは容易に自動化され、機械的反応の研究に日常的に利用できる。

In this work, we propose a multi-scale protocol for routine theoretical studies of chemical reaction mechanisms. The initial reaction paths of our investigated systems are sampled using the Nudged-Elastic Band (NEB) method driven by a cheap electronic structure method. Forces recalculated at the more accurate electronic structure theory for a set of points on the path are fitted with a machine-learning technique (in our case symmetric gradient domain machine learning or sGDML) to produce a semi-local reactive Potential Energy Surface (PES), embracing reactants, products and transition state (TS) regions. This approach has been successfully applied to a unimolecular (Bergman cyclization of enediyne) and a bimolecular (S$_\text{N}$2 substitution) reaction. In particular, we demonstrate that with only 50 to 150 energy-force evaluations with the accurate reference methods (here CASSCF and CCSD) it is possible to construct a semi-local PES giving qualitative agreement for stationary-point geometries, intrinsic reaction-coordinates and barriers. Furthermore, we find a qualitative agreement in vibrational frequencies and reaction rate coefficients. The key aspect of the method's performance is its multi-scale nature, which not only saves computational effort but also allows extracting meaningful information along the reaction path, characterized by zero gradients in all but one direction. Agnostic to the nature of the TS and computationally economic, the protocol can be readily automated and routinely used for mechanistic reaction studies.

翻訳日:2023-04-04 15:29:13 公開日:2023-04-03

# 都市景観における共同2次元3次元マルチタスク学習:3次元検出,セグメンテーション,深さ推定

Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation ( http://arxiv.org/abs/2304.00971v1 )

ライセンス: Link先を確認

Hanrong Ye

(参考訳) 本報告は、Cityscapes-3Dに基づく新しい2D-3Dマルチタスク学習ベンチマークの実装を詳述したTaskPrompterの補足文書として機能する。 TaskPrompterが学習を統一する革新的なマルチタスクプロンプトフレームワークを発表 (i)タスクジェネリック表現 (ii)タスク固有の表現、及び (iii)これらの学習目的を異なるネットワークモジュールに分離する従来のアプローチとは対照的に,クロスタスクインタラクション。この統一されたアプローチは、巧妙な経験的構造設計の必要性を低減させるだけでなく、モデル全体の能力が3つの目的を同時に最適化することに集中するため、マルチタスクネットワークの表現学習能力を大幅に向上させる。 taskprompterはcityscapes-3dデータセットに基づく新しいマルチタスクベンチマークを導入している。これは、モノクロ3d車両検出、セマンティックセグメンテーション、モノクロ深度推定の予測を同時生成するマルチタスクモデルを必要とする。これらのタスクは、特に自律運転システムの開発において、視覚シーンの2D-3Dの共同理解を達成するために不可欠である。この難解なベンチマークでは,マルチタスクモデルは,単一タスクのステート・オブ・ザ・アート法と比較して強い性能を示し,挑戦的な3次元検出と深さ推定タスクにおいて新たな最先端結果を確立する。

This report serves as a supplementary document for TaskPrompter, detailing its implementation on a new joint 2D-3D multi-task learning benchmark based on Cityscapes-3D. TaskPrompter presents an innovative multi-task prompting framework that unifies the learning of (i) task-generic representations, (ii) task-specific representations, and (iii) cross-task interactions, as opposed to previous approaches that separate these learning objectives into different network modules. This unified approach not only reduces the need for meticulous empirical structure design but also significantly enhances the multi-task network's representation learning capability, as the entire model capacity is devoted to optimizing the three objectives simultaneously. TaskPrompter introduces a new multi-task benchmark based on Cityscapes-3D dataset, which requires the multi-task model to concurrently generate predictions for monocular 3D vehicle detection, semantic segmentation, and monocular depth estimation. These tasks are essential for achieving a joint 2D-3D understanding of visual scenes, particularly in the development of autonomous driving systems. On this challenging benchmark, our multi-task model demonstrates strong performance compared to single-task state-of-the-art methods and establishes new state-of-the-art results on the challenging 3D detection and depth estimation tasks.

翻訳日:2023-04-04 15:23:09 公開日:2023-04-03

# QSARのための等角予測手法の開発と評価

Development and Evaluation of Conformal Prediction Methods for QSAR ( http://arxiv.org/abs/2304.00970v1 )

ライセンス: Link先を確認

Yuting Xu, Andy Liaw, Robert P. Sheridan, Vladimir Svetnik

(参考訳) 定量的構造活性相関(QSAR: quantitative structure-activity relationship)回帰モデルは、分子記述子を用いて化合物の生物活性を予測するために広く用いられる手法である。 QSARモデルからの予測は、例えば分子構造を最適化し、さらなる実験的試験に向けて化合物に優先順位を付け、毒性を推定するのに役立つ。活性の正確な推定に加えて、予測に関連する不確実性の推定値を得ること、例えば、あらかじめ指定した確率(70%、90%、95%など)で真の分子活性を含む予測区間(PI)を計算することが強く望まれる。課題は、優れた予測性能を達成する機械学習(ML)アルゴリズムのほとんどが、予測の不確実性を推定するために何らかの追加手法を必要とすることである。これらのアルゴリズムの開発は統計およびMLコミュニティによる活発な研究領域であるが、QSARモデリングへの実装は限定的である。共形予測(CP)は有望なアプローチである。予測アルゴリズムに依存せず、データ分布に関する弱い仮定の下で有効な予測区間を生成できる。我々は,Deep Neural NetworksやGradient Boosting Machinesなど,最も高度なMLモデルに適した計算効率の高いCPアルゴリズムを提案する。提案する共形予測器の有効性と効率は,多様なQSARデータセット群とシミュレーション研究で実証された。

The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting biological activities of compounds using their molecular descriptors. Predictions from QSAR models can help, for example, to optimize molecular structure; prioritize compounds for further experimental testing; and estimate their toxicity. In addition to the accurate estimation of the activity, it is highly desirable to obtain some estimate of the uncertainty associated with the prediction, e.g., calculate a prediction interval (PI) containing the true molecular activity with a pre-specified probability, say 70%, 90% or 95%. The challenge is that most machine learning (ML) algorithms that achieve superior predictive performance require some add-on methods for estimating uncertainty of their prediction. The development of these algorithms is an active area of research by statistical and ML communities but their implementation for QSAR modeling remains limited. Conformal prediction (CP) is a promising approach. It is agnostic to the prediction algorithm and can produce valid prediction intervals under some weak assumptions on the data distribution. We proposed computationally efficient CP algorithms tailored to the most advanced ML models, including Deep Neural Networks and Gradient Boosting Machines. The validity and efficiency of proposed conformal predictors are demonstrated on a diverse collection of QSAR datasets as well as simulation studies.
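
For readers unfamiliar with conformal prediction, the generic split (inductive) variant for regression can be sketched in a few lines. This is the textbook recipe, not the computationally efficient algorithms proposed in the paper. The toy data, the function names and the fixed `predict` function (a stand-in for any model fitted on separate training data) are all assumptions made for illustration.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def calibrate(predict, calib_x, calib_y, alpha):
    """Half-width of the interval, from absolute residuals on held-out data."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    return conformal_quantile(scores, alpha)


def predict(x):
    """Stand-in for a fitted model; any point predictor could be used here."""
    return 2.0 * x


rng = random.Random(0)


def sample(n):
    """Toy data: y = 2x plus Gaussian noise (purely synthetic)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


calib_x, calib_y = sample(500)
test_x, test_y = sample(2000)

half_width = calibrate(predict, calib_x, calib_y, alpha=0.1)

# Fraction of test points whose true value falls inside the interval.
hits = sum(predict(x) - half_width <= y <= predict(x) + half_width
           for x, y in zip(test_x, test_y))
coverage = hits / len(test_x)
print(half_width, coverage)
```

With `alpha=0.1` the empirical coverage should land near 90%, up to sampling noise. The guarantee relies only on the calibration and test points being exchangeable, which is the kind of weak distributional assumption the abstract refers to, and it holds regardless of how good `predict` is; a poor model simply yields wider intervals.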

Translated: 2023-04-04 15:22:44 Published: 2023-04-03

# Is More Always Better? The Effects of Personal Characteristics and Level of Detail on the Perception of Explanations in a Recommender System

Is More Always Better? The Effects of Personal Characteristics and Level of Detail on the Perception of Explanations in a Recommender System ( http://arxiv.org/abs/2304.00969v1 )

License: see the linked page

Mohamed Amine Chatti and Mouadh Guesmi and Laura Vorgerd and Thao Ngo and Shoeb Joarder and Qurat Ul Ain and Arham Muslim

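(Reference translation) Although the perception of explanations can vary considerably between end users, explainable recommender systems (RS) have traditionally followed a one-size-fits-all model, giving every user the same explanation level of detail without considering the individual user's context, that is, their goals and personal characteristics. To fill this research gap, this paper aims to shift from a one-size-fits-all approach to a personalized approach to explainable recommendation by giving users agency in deciding which explanation they want to see. We developed a transparent Recommendation and Interest Modeling Application (RIMA) that provides on-demand personalized explanations of its recommendations at three levels of detail (basic, intermediate, advanced) to meet the demands of different types of end users. We conducted a within-subject study (N=31) to investigate the relationship between users' personal characteristics and the explanation level of detail, and the effects of these two variables on the perception of the explainable RS with regard to different explanation goals. The results show that the perception of an explainable RS with different levels of detail is affected to different degrees by the explanation goal and the user type. Based on these findings, we propose theoretical and design guidelines to support the systematic design of explanatory interfaces in RS tailored to the user's context.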

Despite the acknowledgment that the perception of explanations may vary considerably between end-users, explainable recommender systems (RS) have traditionally followed a one-size-fits-all model, whereby the same explanation level of detail is provided to each user, without taking into consideration individual user's context, i.e., goals and personal characteristics. To fill this research gap, we aim in this paper at a shift from a one-size-fits-all to a personalized approach to explainable recommendation by giving users agency in deciding which explanation they would like to see. We developed a transparent Recommendation and Interest Modeling Application (RIMA) that provides on-demand personalized explanations of the recommendations, with three levels of detail (basic, intermediate, advanced) to meet the demands of different types of end-users. We conducted a within-subject study (N=31) to investigate the relationship between user's personal characteristics and the explanation level of detail, and the effects of these two variables on the perception of the explainable RS with regard to different explanation goals. Our results show that the perception of explainable RS with different levels of detail is affected to different degrees by the explanation goal and user type. Consequently, we suggested some theoretical and design guidelines to support the systematic design of explanatory interfaces in RS tailored to the user's context.

Translated: 2023-04-04 15:22:25 Published: 2023-04-03

# Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction ( http://arxiv.org/abs/2304.00967v1 )

License: see the linked page

Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu

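(Reference translation) This paper proposes Historical Object Prediction (HoP), a new paradigm for multi-view 3D detection that uses temporal information more effectively. Given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature for timestamp t-k from its adjacent frames and use this feature to predict the object set at timestamp t-k. The approach is motivated by the observation that forcing the detector to capture both the spatial location and the temporal motion of objects at historical timestamps leads to more accurate BEV feature learning. First, we carefully design short-term and long-term temporal decoders that generate the pseudo BEV feature for timestamp t-k without the corresponding camera images. Second, an additional object decoder is flexibly attached to predict the object targets from the generated pseudo BEV feature. Because HoP runs only during training, the method adds no overhead at inference time. As a plug-and-play approach, HoP can easily be incorporated into state-of-the-art BEV detection frameworks, including BEVFormer and the BEVDet series. The auxiliary HoP approach is also complementary to prevalent temporal modeling methods and yields significant performance gains. Extensive experiments evaluate HoP on the nuScenes dataset, using representative methods such as BEVFormer and BEVDet4D-Depth. Surprisingly, HoP achieves 68.5% NDS and 62.4% mAP on the nuScenes test set, outperforming all 3D object detectors on the leaderboard. Code will be available at https://github.com/Sense-X/HoP.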

In this paper, we propose a new paradigm, named Historical Object Prediction (HoP) for multi-view 3D detection to leverage temporal information more effectively. The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and utilize this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that enforcing the detector to capture both the spatial location and temporal motion of objects occurring at historical timestamps can lead to more accurate BEV feature learning. First, we elaborately design short-term and long-term temporal decoders, which can generate the pseudo BEV feature for timestamp t-k without the involvement of its corresponding camera images. Second, an additional object decoder is flexibly attached to predict the object targets using the generated pseudo BEV feature. Note that we only perform HoP during training, thus the proposed method does not introduce extra overheads during inference. As a plug-and-play approach, HoP can be easily incorporated into state-of-the-art BEV detection frameworks, including BEVFormer and BEVDet series. Furthermore, the auxiliary HoP approach is complementary to prevalent temporal modeling methods, leading to significant performance gains. Extensive experiments are conducted to evaluate the effectiveness of the proposed HoP on the nuScenes dataset. We choose the representative methods, including BEVFormer and BEVDet4D-Depth to evaluate our method. Surprisingly, HoP achieves 68.5% NDS and 62.4% mAP with ViT-L on nuScenes test, outperforming all the 3D object detectors on the leaderboard. Codes will be available at https://github.com/Sense-X/HoP.

Translated: 2023-04-04 15:21:59 Published: 2023-04-03

# Robust Text-driven Image Editing Method that Adaptively Explores Directions in Latent Spaces of StyleGAN and CLIP

Robust Text-driven Image Editing Method that Adaptively Explores Directions in Latent Spaces of StyleGAN and CLIP ( http://arxiv.org/abs/2304.00964v1 )

License: see the linked page

Tsuyoshi Baba, Kosuke Nishida, Kyosuke Nishida

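(Reference translation) Automatic image editing is in great demand because of its many applications, and natural language instructions are essential for flexible, intuitive editing that matches what the user imagines. StyleCLIP, a pioneering work in text-driven image editing, finds an edit direction in the CLIP space and then edits the image by mapping that direction to the StyleGAN space. However, it is difficult to tune the appropriate inputs other than the original image and the text instruction. In this study, we propose a method that adaptively constructs the edit direction in the StyleGAN and CLIP spaces using an SVM. The model represents the edit direction as the normal vector, in the CLIP space, of an SVM trained to classify positive and negative images. The images are retrieved from a large-scale image corpus, originally used for pre-training StyleGAN, according to the CLIP similarity between each image and the text instruction. We confirmed that the proposed method performs as well as the StyleCLIP baseline while allowing simple inputs without increasing computation time.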

Automatic image editing has great demands because of its numerous applications, and the use of natural language instructions is essential to achieving flexible and intuitive editing as the user imagines. A pioneering work in text-driven image editing, StyleCLIP, finds an edit direction in the CLIP space and then edits the image by mapping the direction to the StyleGAN space. At the same time, it is difficult to tune appropriate inputs other than the original image and text instructions for image editing. In this study, we propose a method to construct the edit direction adaptively in the StyleGAN and CLIP spaces with SVM. Our model represents the edit direction as a normal vector in the CLIP space obtained by training a SVM to classify positive and negative images. The images are retrieved from a large-scale image corpus, originally used for pre-training StyleGAN, according to the CLIP similarity between the images and the text instruction. We confirmed that our model performed as well as the StyleCLIP baseline, whereas it allows simple inputs without increasing the computational time.

Translated: 2023-04-04 15:21:22 Published: 2023-04-03

# Controllable generation of mechanical quadrature squeezing via dark-mode engineering in cavity optomechanics

Controllable generation of mechanical quadrature squeezing via dark-mode engineering in cavity optomechanics ( http://arxiv.org/abs/2304.00963v1 )

License: see the linked page

Jian Huang, Deng-Gao Lai, and Jie-Qiao Liao

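(Reference translation) Quantum squeezing is an important resource in modern quantum technologies such as quantum precision measurement and continuous-variable quantum information processing. Generating squeezed states of mechanical modes is a significant task in cavity optomechanics. Motivated by recent interest in multimode optomechanics, creating quadrature squeezing in multiple mechanical resonators has become a topic of interest. However, in optomechanical systems with multiple degenerate mechanical modes, the dark-mode effect strongly suppresses quantum effects in the mechanical modes. Here we study the generation of mechanical squeezing in a two-mechanical-mode optomechanical system by breaking the dark-mode effect with the synthetic-gauge-field method. We find that, when the mechanical modes are at a finite temperature, the mechanical squeezing is weak or even vanishes because of the dark-mode effect, whereas strong mechanical squeezing is generated once the dark-mode effect is broken. In particular, the thermal-phonon-occupation tolerance of the mechanical squeezing is about three orders of magnitude larger than without breaking the dark-mode effect. We also generalize the method to break the dark modes and create mechanical squeezing in a multiple-mechanical-mode optomechanical system. The results describe a general physical mechanism and pave the way toward generating noise-resistant quantum resources.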

Quantum squeezing is an important resource in modern quantum technologies, such as quantum precision measurement and continuous-variable quantum information processing. The generation of squeezed states of mechanical modes is a significant task in cavity optomechanics. Motivated by recent interest in multimode optomechanics, it becomes an interesting topic to create quadrature squeezing in multiple mechanical resonators. However, in the multiple-degenerate-mechanical-mode optomechanical systems, the dark-mode effect strongly suppresses the quantum effects in mechanical modes. Here we study the generation of mechanical squeezing in a two-mechanical-mode optomechanical system by breaking the dark-mode effect with the synthetic-gauge-field method. We find that when the mechanical modes work at a finite temperature, the mechanical squeezing is weak or even disappeared due to the dark-mode effect, while the strong mechanical squeezing can be generated once the dark-mode effect is broken. In particular, the thermal-phonon-occupation tolerance of the mechanical squeezing is approximately three orders of magnitude larger than that without breaking the dark-mode effect. We also generalize this method to break the dark modes and to create the mechanical squeezing in a multiple-mechanical-mode optomechanical system. Our results describe a general physical mechanism and pave the way towards the generation of noise-resistant quantum resources.

Translated: 2023-04-04 15:21:02 Published: 2023-04-03

# RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding ( http://arxiv.org/abs/2304.00962v1 )

License: see the linked page

Jihan Yang, Runyu Ding, Zhe Wang, Xiaojuan Qi

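(Reference translation) Existing 3D scene understanding methods achieve high performance on closed-set benchmarks but cannot handle novel categories in real-world applications. We therefore propose RegionPLC, a Regional Point-Language Contrastive learning framework for open-world 3D scene understanding that equips models trained on closed-set datasets with open-vocabulary recognition capabilities. We propose dense visual prompts that elicit region-level visual-language knowledge from 2D foundation models through captioning. We then design a point-discriminative contrastive learning objective that enables point-independent learning from captions for dense scene understanding. We run extensive experiments on the ScanNet, ScanNet200, and nuScenes datasets. RegionPLC significantly outperforms previous base-annotated 3D open-world scene understanding approaches, by an average of 11.6% for semantic segmentation and 6.6% for instance segmentation. It also shows promising open-world results without any human annotation, at low training and inference cost. Code will be released.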

Existing 3D scene understanding tasks have achieved high performance on closed-set benchmarks but fail to handle novel categories in real-world applications. To this end, we propose a Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, which equips models trained on closed-set datasets with open-vocabulary recognition capabilities. We propose dense visual prompts to elicit region-level visual-language knowledge from 2D foundation models via captioning, which further allows us to build dense regional point-language associations. Then, we design a point-discriminative contrastive learning objective to enable point-independent learning from captions for dense scene understanding. We conduct extensive experiments on the ScanNet, ScanNet200, and nuScenes datasets. Our RegionPLC significantly outperforms previous base-annotated 3D open-world scene understanding approaches by an average of 11.6% and 6.6% for semantic and instance segmentation, respectively. It also shows promising open-world results in the absence of any human annotation, with low training and inference costs. Code will be released.

Translated: 2023-04-04 15:20:39 Published: 2023-04-03

# Self-Ordering Point Clouds

Self-Ordering Point Clouds ( http://arxiv.org/abs/2304.00961v1 )

License: see the linked page

Pengwan Yang, Yuki M. Asano, Cees G. M. Snoek

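(Reference translation) This paper addresses the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering. Only a few works have tackled this challenging vision problem, all of them relying on hard-to-obtain point and cloud labels. In contrast, we introduce the task of point-wise ordering in 3D point clouds through self-supervision, which we call self-ordering. We also contribute the first end-to-end trainable network that learns a point-wise ordering in a self-supervised fashion. It uses a novel differentiable point scoring-sorting strategy and builds a hierarchical contrastive scheme to obtain self-supervision signals. We extensively ablate the method and show its scalability and superior performance, even compared with supervised ordering methods, on multiple datasets and tasks, including zero-shot ordering of point clouds from unseen categories.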

In this paper we address the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering. Only a few works have tried to address this challenging vision problem, all with the help of hard to obtain point and cloud labels. Different from these works, we introduce the task of point-wise ordering in 3D point clouds through self-supervision, which we call self-ordering. We further contribute the first end-to-end trainable network that learns a point-wise ordering in a self-supervised fashion. It utilizes a novel differentiable point scoring-sorting strategy and it constructs an hierarchical contrastive scheme to obtain self-supervision signals. We extensively ablate the method and show its scalability and superior performance even compared to supervised ordering methods on multiple datasets and tasks including zero-shot ordering of point clouds from unseen categories.

Translated: 2023-04-04 15:20:21 Published: 2023-04-03

# DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains

DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains ( http://arxiv.org/abs/2304.00958v1 )

License: see the linked page

Yanis Labrak and Adrien Bazoge and Richard Dufour and Mickael Rouvier and Emmanuel Morin and Béatrice Daille and Pierre-Antoine Gourraud

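(Reference translation) In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. The first models were trained on general-domain data, but specialized models have emerged to handle specific domains more effectively. This paper presents an original study of PLMs for French in the medical domain. For the first time, we compare the performance of PLMs trained on public data from the web and on private data from healthcare establishments. We also evaluate different learning strategies on a set of biomedical tasks. In particular, we show that an existing biomedical PLM in a foreign language can be exploited by further pre-training it on our target data. Finally, we release the first specialized PLMs for the French biomedical field, called DrBERT, together with the largest corpus of medical data under a free license on which these models are trained.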

In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general domain data, specialized ones have emerged to more effectively treat specific domains. In this paper, we propose an original study of PLMs in the medical domain for the French language. We compare, for the first time, the performance of PLMs trained on both public data from the web and private data from healthcare establishments. We also evaluate different learning strategies on a set of biomedical tasks. In particular, we show that we can take advantage of already existing biomedical PLMs in a foreign language by further pre-training them on our targeted data. Finally, we release the first specialized PLMs for the biomedical field in French, called DrBERT, as well as the largest corpus of medical data under free license on which these models are trained.

Translated: 2023-04-04 15:20:06 Published: 2023-04-03

# Federated Kalman Filter for Secure IoT-based Device Monitoring Services

Federated Kalman Filter for Secure IoT-based Device Monitoring Services ( http://arxiv.org/abs/2304.00991v1 )

License: see the linked page

Marc Jayson Baucas and Petros Spachos

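(Reference translation) Device monitoring services have grown in popularity with recent advances in technology and the steadily increasing number of Internet of Things (IoT) devices. Popular services include those that use device location information. However, these services run into privacy issues because of the way data are collected and transmitted. In this work, we introduce a platform that combines a Federated Kalman Filter (FKF) with a federated learning approach and private blockchain technology for privacy preservation. We analyze the accuracy of the proposed design against a standard Kalman Filter (KF) implementation of localization based on the Received Signal Strength Indicator (RSSI). The experimental results show significant potential for improved data estimation for RSSI-based localization in device monitoring.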

Device monitoring services have increased in popularity with the evolution of recent technology and the continuously increased number of Internet of Things (IoT) devices. Among the popular services are the ones that use device location information. However, these services run into privacy issues due to the nature of data collection and transmission. In this work, we introduce a platform incorporating Federated Kalman Filter (FKF) with a federated learning approach and private blockchain technology for privacy preservation. We analyze the accuracy of the proposed design against a standard Kalman Filter (KF) implementation of localization based on the Received Signal Strength Indicator (RSSI). The experimental results reveal significant potential for improved data estimation for RSSI-based localization in device monitoring.
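As a rough illustration of the standard Kalman Filter baseline mentioned above, the sketch below smooths a noisy RSSI trace with a scalar filter. It is not the paper's federated design; the constant-state model, the noise variances, and the sample readings in `rssi_dbm` are all assumptions chosen for illustration.

```python
def kalman_smooth(measurements, process_var=0.01, measurement_var=4.0):
    """Scalar Kalman filter for a slowly varying signal (constant-state model)."""
    estimate = measurements[0]
    error_var = measurement_var
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        error_var += process_var
        # Update: blend the prediction and the new measurement via the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


# Hypothetical RSSI readings in dBm from one fixed transmitter.
rssi_dbm = [-60.0, -63.5, -58.2, -61.1, -59.4, -62.8, -60.3]
smoothed_rssi = kalman_smooth(rssi_dbm)
print([round(value, 2) for value in smoothed_rssi])
```

A larger `measurement_var` relative to `process_var` makes the filter trust its running estimate more, which gives a smoother but slower-reacting RSSI curve.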

Translated: 2023-04-04 15:12:53 Published: 2023-04-03

# Efficient human-in-loop deep learning model training with iterative refinement and statistical result validation

Efficient human-in-loop deep learning model training with iterative refinement and statistical result validation ( http://arxiv.org/abs/2304.00990v1 )

License: see the linked page

Manuel Zahn, Douglas P. Perrin

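(Reference translation) Annotating and labeling images is one of the biggest challenges in applying deep learning to medical data. Current processes are time- and cost-intensive and therefore limit wide adoption of the technology. In addition, validating that measured performance improvements are significant is important for selecting the best model. In this paper, we demonstrate a method for creating segmentations, a necessary part of data cleaning for ultrasound imaging machine learning pipelines. We propose a four-step method that leverages automatically generated training data and fast human visual checks to improve model accuracy while keeping time, effort, and cost low. We also run the experiments multiple times so that statistical analysis can be applied. Poor-quality automated ground truth data and quick visual inspections efficiently train an initial base model, which is then refined with a small set of more expensive human-generated ground truth data. The method is demonstrated on a cardiac ultrasound segmentation task that removes background data, including static PHI. Significance is shown by running the experiments multiple times and applying Student's t-test to the performance distributions. The initial segmentation accuracy of 92%, obtained with a simple thresholding algorithm, was improved to 98%. The performance of models trained on complicated algorithms can be matched or beaten by pre-training with the poorer-performing algorithms and a small quantity of high-quality data. Introducing statistical significance analysis for deep learning models helps validate the measured performance improvements. The method offers a cost-effective and fast route to high-accuracy models while minimizing the cost and effort of acquiring high-quality training data.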

Annotation and labeling of images are some of the biggest challenges in applying deep learning to medical data. Current processes are time- and cost-intensive and, therefore, a limiting factor for the wide adoption of the technology. Additionally, validating that measured performance improvements are significant is important for selecting the best model. In this paper, we demonstrate a method for creating segmentations, a necessary part of data cleaning for ultrasound imaging machine learning pipelines. We propose a four-step method to leverage automatically generated training data and fast human visual checks to improve model accuracy while keeping the time, effort, and cost low. We also showcase running experiments multiple times to allow the use of statistical analysis. Poor-quality automated ground truth data and quick visual inspections efficiently train an initial base model, which is refined using a small set of more expensive human-generated ground truth data. The method is demonstrated on a cardiac ultrasound segmentation task, removing background data, including static PHI. Significance is shown by running the experiments multiple times and using Student's t-test on the performance distributions. The initial segmentation accuracy of a simple thresholding algorithm of 92% was improved to 98%. The performance of models trained on complicated algorithms can be matched or beaten by pre-training with the poorer-performing algorithms and a small quantity of high-quality data. The introduction of statistical significance analysis for deep learning models helps to validate the performance improvements measured. The method offers a cost-effective and fast approach to achieving high-accuracy models while minimizing the cost and effort of acquiring high-quality training data.
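The significance check described above, repeated runs compared with Student's t-test, can be sketched as follows. The accuracy values in `baseline` and `refined` are hypothetical, not results from the paper, and the pooled-variance form assumes both groups have similar variance.

```python
import math
import statistics


def student_t_statistic(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance; returns (t, dof)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    dof = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, dof


# Hypothetical segmentation accuracies from five repeated training runs each.
baseline = [0.915, 0.921, 0.918, 0.924, 0.920]
refined = [0.978, 0.981, 0.975, 0.983, 0.979]

t_stat, dof = student_t_statistic(refined, baseline)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

With 8 degrees of freedom, the two-sided 5% critical value is about 2.306, so a t statistic above that threshold would indicate that the difference between the two groups of runs is unlikely to be due to run-to-run noise alone.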

Translated: 2023-04-04 15:12:42 Published: 2023-04-03

# Does Hawking effect always degrade fidelity of quantum teleportation in Schwarzschild spacetime?

Does Hawking effect always degrade fidelity of quantum teleportation in Schwarzschild spacetime? ( http://arxiv.org/abs/2304.00984v1 )

License: see the linked page

Xiao-Wei Fan, Hao-Yu Wu, Rui-Di Wang, Xiao-Li Huang, Hao-Sheng Zeng, Shu-Min Wu

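(Reference translation) Previous studies have shown that the Hawking effect always destroys quantum correlations and the fidelity of quantum teleportation in the Schwarzschild black hole. Here, we investigate the fidelity of quantum teleportation of Dirac fields between users in Schwarzschild spacetime. We find that, as the Hawking temperature increases, the fidelity of quantum teleportation can increase monotonically, decrease monotonically, or increase non-monotonically, depending on the choice of the initial state; that is, the Hawking effect can create net fidelity of quantum teleportation. This striking result overturns the widespread belief that the Hawking effect of a black hole can only destroy the fidelity of quantum teleportation. We also find that quantum steering cannot fully guarantee the fidelity of quantum teleportation in Schwarzschild spacetime. This new and unexpected source may offer a new idea for obtaining experimental evidence of the Hawking effect.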

Previous studies have shown that the Hawking effect always destroys quantum correlations and the fidelity of quantum teleportation in the Schwarzschild black hole. Here, we investigate the fidelity of quantum teleportation of Dirac fields between users in Schwarzschild spacetime. We find that, with the increase of the Hawking temperature, the fidelity of quantum teleportation can monotonically increase, monotonically decrease, or non-monotonically increase, depending on the choice of the initial state, which means that the Hawking effect can create net fidelity of quantum teleportation. This striking result overturns the widespread belief that the Hawking effect of the black hole can only destroy the fidelity of quantum teleportation. We also find that quantum steering cannot fully guarantee the fidelity of quantum teleportation in Schwarzschild spacetime. This new unexpected source may provide a new idea for the experimental evidence of the Hawking effect.

Translated: 2023-04-04 15:11:46 Published: 2023-04-03

# Long-lived valley states in bilayer graphene quantum dots

Long-lived valley states in bilayer graphene quantum dots ( http://arxiv.org/abs/2304.00980v1 )

License: see the linked page

Rebekka Garreis and Chuyao Tong and Jocelyn Terle and Max Josef Ruckriegel and Jonas Daniel Gerber and Lisa Maria Gächter and Kenji Watanabe and Takashi Taniguchi and Thomas Ihn and Klaus Ensslin and Wei Wister Huang

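(Reference translation) Bilayer graphene (BLG) is emerging as a promising host material for electrically controllable qubits in a two-dimensional system. Of particular interest is the ability to encode quantum information not only in spin qubits but also in the so-called valley degree of freedom, a two-fold orbital degeneracy that arises from the symmetry of the hexagonal Bravais lattice. Known spin-mixing and orbital-mixing mechanisms are unlikely to act on valleys. Moreover, the topological nature of valley states in BLG promises unique routes for coherent qubit manipulation. Gate-defined BLG quantum-dot devices have recently been established as a versatile building block for accessing high-quality spin and valley quantum states. However, the relaxation time of valley states, which ultimately limits the coherence properties of these qubits and hence their suitability as practical qubits, had not been measured so far. Here we report the first measurement of the characteristic relaxation times of spin and valley states, obtained in a high-quality BLG device containing gate-defined double quantum dots. We show that valley states can be distinguished with a fidelity well above $99\,\%$ and report a remarkably long relaxation time between valley triplets and singlets, exceeding $500\,\text{ms}$ at $B=250\,\text{mT}$, more than one order of magnitude longer than the relaxation times measured for spin states. Our approach to isolating valley states paves the way to measuring coherent valley-qubit oscillations and, combined with the recently demonstrated electrical tunability of the valley $g$-factor, could provide a practical platform for the electrical control of long-lived valley qubits in BLG.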

Bilayer graphene (BLG) is emerging as a promising host material for electrically controllable qubits in a two-dimensional system. Of particular interest is the ability to encode quantum information not only as spin qubits but also in the so-called valley degree of freedom, a two-fold orbital degeneracy that arises from the symmetry of the hexagonal Bravais lattice. Known spin-mixing and orbital-mixing mechanisms are unlikely to be at work for valleys. Moreover, the topological nature of valley states in BLG promises unique routes for coherent qubit manipulation. Gate-defined BLG quantum-dot devices have been recently established as a versatile building block for accessing high-quality spin and valley quantum states. However, measurements of the relaxation time of valley states -- which ultimately limits these qubits' coherence properties and therefore their suitability as practical qubits -- have so far remained elusive. Here we report the first measurement of the characteristic relaxation times of spin and valley states, obtained in a high-quality BLG device containing gate-defined double quantum dots. We show that valley states can be distinguished with a fidelity of well above $99\,\%$ and report a remarkably long relaxation time between valley triplets and singlets, exceeding $500\,\text{ms}$ at $B=250\,\text{mT}$, more than one order of magnitude longer than the relaxation times we measure for spin states. Our approach to isolating valley states paves the way to measuring coherent valley-qubit oscillations and, in combination with the recently demonstrated electrical tunability of the valley $g$-factor, could provide a practical platform for the electrical control of long-lived valley qubits in BLG.

Translated: 2023-04-04 15:11:27 Published: 2023-04-03

# A Latent Fingerprint in the Wild Database

A Latent Fingerprint in the Wild Database ( http://arxiv.org/abs/2304.00979v1 )

License: see the linked page

Xinwei Liu, Kiran Raja, Renfang Wang, Hong Qiu, Hucheng Wu, Dechao Sun, Qiguang Zheng, Nian Liu, Xiaoxia Wang, Gehang Huang, Raghavendra Ramachandra, Christoph Busch

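(Reference translation) Latent fingerprints are among the most important and widely used evidence at crime scenes, in digital forensics, and in law enforcement. Despite the advances reported in recent work, we note that significant open issues, such as independent benchmarking and the lack of large-scale evaluation databases for improving the algorithms, are inadequately addressed. Most available databases are semi-public and lack acquisition in wild environments and post-processing pipelines. Moreover, they do not represent a realistic capture scenario similar to real crime scenes against which the robustness of algorithms can be benchmarked. Further, existing databases for latent fingerprint recognition do not contain a large number of unique subjects or fingerprint instances, or do not provide ground truth (reference) fingerprint images for cross-comparison against the latent prints. In this paper, we introduce a new large-scale latent fingerprint database captured in the wild that includes five different acquisition scenarios: reference fingerprints from (1) optical and (2) capacitive sensors, (3) smartphone fingerprints, and latent fingerprints captured from (4) a wall surface, (5) an iPad surface, and (6) an aluminium foil surface. The new database consists of 1,318 unique fingerprint instances captured in all of the above settings. In total, this work provides 2,636 reference fingerprints from optical and capacitive sensors, 1,318 fingerphotos from smartphones, and 9,224 latent fingerprints from each of the 132 subjects. The dataset is constructed with various age groups and equal representation of genders and backgrounds in mind. In addition, we provide an extensive analysis of various subset evaluations to highlight open challenges for future directions in latent fingerprint recognition research.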

Latent fingerprints are among the most important and widely used evidence in crime scenes, digital forensics and law enforcement worldwide. Despite the number of advancements reported in recent works, we note that significant open issues such as independent benchmarking and lack of large-scale evaluation databases for improving the algorithms are inadequately addressed. The available databases are mostly of semi-public nature, lack of acquisition in the wild environment, and post-processing pipelines. Moreover, they do not represent a realistic capture scenario similar to real crime scenes, to benchmark the robustness of the algorithms. Further, existing databases for latent fingerprint recognition do not have a large number of unique subjects/fingerprint instances or do not provide ground truth/reference fingerprint images to conduct a cross-comparison against the latent. In this paper, we introduce a new wild large-scale latent fingerprint database that includes five different acquisition scenarios: reference fingerprints from (1) optical and (2) capacitive sensors, (3) smartphone fingerprints, latent fingerprints captured from (4) wall surface, (5) Ipad surface, and (6) aluminium foil surface. The new database consists of 1,318 unique fingerprint instances captured in all above mentioned settings. A total of 2,636 reference fingerprints from optical and capacitive sensors, 1,318 fingerphotos from smartphones, and 9,224 latent fingerprints from each of the 132 subjects were provided in this work. The dataset is constructed considering various age groups, equal representations of genders and backgrounds. In addition, we provide an extensive set of analysis of various subset evaluations to highlight open challenges for future directions in latent fingerprint recognition research.

Translated: 2023-04-04 15:10:54 Published: 2023-04-03

# Wide-field quantitative magnetic imaging of superconducting vortices using perfectly aligned quantum sensors

Wide-field quantitative magnetic imaging of superconducting vortices using perfectly aligned quantum sensors ( http://arxiv.org/abs/2304.01024v1 )

License: see the linked page

Shunsuke Nishimura, Taku Kobayashi, Daichi Sasaki, Takeyuki Tsuji, Takayuki Iwasaki, Mutsuko Hatano, Kento Sasaki, and Kensuke Kobayashi

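(Reference translation) Various techniques have been applied to visualize superconducting vortices, providing clues to their electromagnetic response. Here, we present wide-field, quantitative imaging of the stray field of vortices in a superconducting thin film using perfectly aligned diamond quantum sensors. Our analysis, which mitigates the influence of sensor inhomogeneities, visualizes the magnetic flux of single vortices in YBa$_2$Cu$_3$O$_{7-\delta}$ with an accuracy of $\pm10~\%$. The obtained vortex shape is consistent with the theoretical model, and the penetration depth and its temperature dependence agree with previous studies, which demonstrates the accuracy and broad applicability of the technique. This wide-field imaging works in principle even under extreme conditions and allows the characterization of various superconductors.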

Various techniques have been applied to visualize superconducting vortices, providing clues to their electromagnetic response. Here, we present a wide-field, quantitative imaging of the stray field of the vortices in a superconducting thin film using perfectly aligned diamond quantum sensors. Our analysis, which mitigates the influence of the sensor inhomogeneities, visualizes the magnetic flux of single vortices in YBa$_2$Cu$_3$O$_{7-\delta}$ with an accuracy of $\pm10~\%$. The obtained vortex shape is consistent with the theoretical model, and penetration depth and its temperature dependence agree with previous studies, proving our technique's accuracy and broad applicability. This wide-field imaging, which in principle works even under extreme conditions, allows the characterization of various superconductors.

Translated: 2023-04-04 15:04:30 Published: 2023-04-03

# Self-Supervised learning for Neural Architecture Search (NAS)

Self-Supervised learning for Neural Architecture Search (NAS) ( http://arxiv.org/abs/2304.01023v1 )

License: see the linked page

Samuel Ducros

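(Reference translation) The objective of this internship is to propose an innovative method that uses unlabelled data, that is, data from which the AI can automatically learn to predict the correct outcome. To reach this stage, the following steps are taken: (1) survey the state of the art and position ourselves relative to it, (2) come up with ideas for development paths, (3) implement these ideas, and (4) test them to position ourselves against the state of the art, then start the sequence again. During the internship this sequence was carried out several times, which gives the tracks explored during the internship.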

The objective of this internship is to propose an innovative method that uses unlabelled data, i.e. data that will allow the AI to automatically learn to predict the correct outcome. To reach this stage, the steps to be followed can be defined as follows: (1) consult the state of the art and position ourselves against it, (2) come up with ideas for development paths, (3) implement these ideas, and (4) finally test them to position ourselves against the state of the art, and then start the sequence again. During my internship, this sequence was carried out several times, which yields the tracks explored during the internship.

Translated: 2023-04-04 15:04:15 Published: 2023-04-03

# Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval ( http://arxiv.org/abs/2304.01019v1 )

License: see the linked page

Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang

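(Reference translation) The advent of multilingual language models has revived interest in cross-lingual information retrieval (CLIR), the task of searching documents in one language with queries from another. However, rapid progress has led to a confusing array of methods, and reproducibility has lagged behind the state of the art. In this context, our work makes two contributions. First, we provide a conceptual framework for organizing different approaches to cross-lingual retrieval, using multi-stage architectures for monolingual retrieval as a scaffold. Second, we implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for the test collections of the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese. Our efforts build on a collaboration between the two teams that submitted the most effective runs to the TREC evaluation. These contributions provide a firm foundation for future advances.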

The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another. However, the rapid pace of progress has led to a confusing panoply of methods and reproducibility has lagged behind the state of the art. In this context, our work makes two important contributions: First, we provide a conceptual framework for organizing different approaches to cross-lingual retrieval using multi-stage architectures for mono-lingual retrieval as a scaffold. Second, we implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for test collections from the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese. Our efforts are built on a collaboration of the two teams that submitted the most effective runs to the TREC evaluation. These contributions provide a firm foundation for future advances.

Translated: 2023-04-04 15:04:01 Published: 2023-04-03

# Understanding Individual and Team-based Human Factors in Detecting Deepfake Texts

Understanding Individual and Team-based Human Factors in Detecting Deepfake Texts ( http://arxiv.org/abs/2304.01002v1 )

License: see the linked page

Adaku Uchendu, Jooyoung Lee, Hua Shen, Thai Le, Ting-Hao 'Kenneth' Huang, Dongwon Lee

(Reference translation) In recent years, Natural Language Generation (NLG) techniques in AI (e.g., T5, GPT-3, ChatGPT) have improved greatly and can now generate long, coherent, human-like texts at scale, producing so-called deepfake texts. Despite its benefits, this advance can also cause security and privacy problems (e.g., plagiarism, identity obfuscation, disinformation attacks). It has therefore become important to develop effective, practical, and scalable solutions for distinguishing deepfake texts from human-written texts. Toward this goal, we investigate how factors such as skill level and collaboration affect how humans identify deepfake texts, studying three research questions: (1) Do collaborative teams detect deepfake texts better than individuals? (2) Do experts detect deepfake texts better than non-experts? (3) Which factors maximize human detection performance? We implement these questions on two platforms: (1) non-expert humans or asynchronous teams on Amazon Mechanical Turk (AMT) and (2) expert humans or synchronous teams on Upwork. Analyzing detection performance and the factors that affected it, our key findings are: (1) expert humans detect deepfake texts significantly better than non-expert humans, (2) synchronous teams on Upwork detect deepfake texts significantly better than individuals, while asynchronous teams on AMT detect them only weakly better than individuals, and (3) among various error categories, examining coherence and consistency in texts is useful for detecting deepfake texts. In conclusion, our work could inform the design of future tools and frameworks that improve collaborative human detection of deepfake texts.

In recent years, Natural Language Generation (NLG) techniques in AI (e.g., T5, GPT-3, ChatGPT) have shown a massive improvement and are now capable of generating human-like long coherent texts at scale, yielding so-called deepfake texts. This advancement, despite its benefits, can also cause security and privacy issues (e.g., plagiarism, identity obfuscation, disinformation attacks). As such, it has become critically important to develop effective, practical, and scalable solutions to differentiate deepfake texts from human-written texts. Toward this challenge, in this work, we investigate how factors such as skill levels and collaborations impact how humans identify deepfake texts, studying three research questions: (1) do collaborative teams detect deepfake texts better than individuals? (2) do expert humans detect deepfake texts better than non-expert humans? (3) what are the factors that maximize the detection performance of humans? We implement these questions on two platforms: (1) non-expert humans or asynchronous teams on Amazon Mechanical Turk (AMT) and (2) expert humans or synchronous teams on Upwork. By analyzing the detection performance and the factors that affected performance, some of our key findings are: (1) expert humans detect deepfake texts significantly better than non-expert humans, (2) synchronous teams on Upwork detect deepfake texts significantly better than individuals, while asynchronous teams on AMT detect deepfake texts weakly better than individuals, and (3) among various error categories, examining coherence and consistency in texts is useful in detecting deepfake texts. In conclusion, our work could inform the design of future tools and frameworks to improve collaborative human detection of deepfake texts.

翻訳日:2023-04-04 15:01:48 公開日:2023-04-03

# VoxelFormer:多視点3Dオブジェクト検出のためのデュアルビューアテンションに基づく鳥瞰図(BEV)特徴生成

VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection ( http://arxiv.org/abs/2304.01054v1 )

ライセンス: Link先を確認

Zhuoling Li, Chuanrui Zhang, Wei-Chiu Ma, Yipin Zhou, Linyan Huang, Haoqian Wang, SerNam Lim, Hengshuang Zhao

(参考訳) 近年,トランスフォーマーベースの検出器は2次元視覚知覚タスクにおいて顕著な性能を示している。しかし、多視点3Dオブジェクト検出におけるそれらの性能は、畳み込みニューラルネットワークに基づく検出器の最先端(SOTA)に依然として劣っている。本研究では,バードアイビュー(BEV)特徴生成の観点から,この問題を考察する。具体的には,トランスフォーマーベースのSOTAであるBEVFormerが採用するBEV特徴生成手法を検討し,その2つの限界を特定する。 (i) BEVからのみアテンション重みを生成するため、監督信号としてのLiDAR点群の利用が妨げられる。 (ii) デフォーマブルサンプリングによりカメラビュー特徴をBEVに集約するため、少数の特徴のみが選択され、すべての情報を活用できない。これらの制約を克服するため、BEVとカメラビューの両方からアテンション重みを生成する新しいBEV特徴生成手法、デュアルビューアテンションを提案する。この方法は、すべてのカメラ特徴をBEV特徴にエンコードする。デュアルビューアテンションとBEVFormerアーキテクチャを組み合わせることで、VoxelFormerという新しい検出器を構築する。 nuScenesベンチマークで大規模な実験を行い、デュアルビューアテンションとVoxelFormerの優位性を検証する。トレーニング時に3つのエンコーダと1つの履歴フレームしか用いない場合でも、VoxelFormerはBEVFormerを大幅に上回る。同じ設定でトレーニングした場合、VoxelFormerはNDSでBEVFormerを4.9%ポイント上回る。コードは https://github.com/Lizhuoling/VoxelFormer-public.git で入手できる。

In recent years, transformer-based detectors have demonstrated remarkable performance in 2D visual perception tasks. However, their performance in multi-view 3D object detection remains inferior to the state-of-the-art (SOTA) of convolutional neural network based detectors. In this work, we investigate this issue from the perspective of bird's-eye-view (BEV) feature generation. Specifically, we examine the BEV feature generation method employed by the transformer-based SOTA, BEVFormer, and identify its two limitations: (i) it only generates attention weights from BEV, which precludes the use of lidar points for supervision, and (ii) it aggregates camera view features to the BEV through deformable sampling, which only selects a small subset of features and fails to exploit all information. To overcome these limitations, we propose a novel BEV feature generation method, dual-view attention, which generates attention weights from both the BEV and camera view. This method encodes all camera features into the BEV feature. By combining dual-view attention with the BEVFormer architecture, we build a new detector named VoxelFormer. Extensive experiments are conducted on the nuScenes benchmark to verify the superiority of dual-view attention and VoxelFormer. We observe that even when adopting only 3 encoders and 1 historical frame during training, VoxelFormer still outperforms BEVFormer significantly. When trained in the same setting, VoxelFormer can surpass BEVFormer by 4.9% NDS points. Code is available at: https://github.com/Lizhuoling/VoxelFormer-public.git.

翻訳日:2023-04-04 14:54:19 公開日:2023-04-03

# ViT-DAE: 組織像解析のためのトランスフォーマー駆動拡散オートエンコーダ

ViT-DAE: Transformer-driven Diffusion Autoencoder for Histopathology Image Analysis ( http://arxiv.org/abs/2304.01053v1 )

ライセンス: Link先を確認

Xuan Xu, Saarthak Kapse, Rajarsi Gupta, Prateek Prasanna

(参考訳) 生成AIは、元のデータソースによく似たデータを合成する能力によって、近年かなりの注目を集めている。 Generative Adversarial Networks (GANs) は病理組織画像解析に革新的なアプローチを提供してきたが、モード崩壊や識別器の過学習といった限界を抱えている。近年,デノイジング拡散モデルがコンピュータビジョンにおいて有望な結果を示している。これらのモデルはトレーニング中に優れた安定性を示し、分布のカバレッジが高く、高品質で多様な画像を生成する。さらに、ノイズや摂動に対して高い頑健性を示すため、画像が一般にアーティファクトを含み、染色に大きなばらつきがあるデジタル病理学での使用に適している。本稿では,視覚トランスフォーマー(ViT)と拡散オートエンコーダを統合し,高品質な病理組織画像合成を行う新しいアプローチであるViT-DAEを提案する。計算病理学において拡散オートエンコーダにViTが導入されたのはこれが初めてであり、これによりモデルは病理組織画像の複雑で入り組んだ詳細をよりよく捉えることができる。公開されている3つのデータセットでViT-DAEの有効性を示す。提案手法は, リアルな画像の生成において、最近のGANベースの手法とバニラDAE法を上回る。

Generative AI has received substantial attention in recent years due to its ability to synthesize data that closely resembles the original data source. While Generative Adversarial Networks (GANs) have provided innovative approaches for histopathological image analysis, they suffer from limitations such as mode collapse and overfitting in discriminator. Recently, Denoising Diffusion models have demonstrated promising results in computer vision. These models exhibit superior stability during training, better distribution coverage, and produce high-quality diverse images. Additionally, they display a high degree of resilience to noise and perturbations, making them well-suited for use in digital pathology, where images commonly contain artifacts and exhibit significant variations in staining. In this paper, we present a novel approach, namely ViT-DAE, which integrates vision transformers (ViT) and diffusion autoencoders for high-quality histopathology image synthesis. This marks the first time that ViT has been introduced to diffusion autoencoders in computational pathology, allowing the model to better capture the complex and intricate details of histopathology images. We demonstrate the effectiveness of ViT-DAE on three publicly available datasets. Our approach outperforms recent GAN-based and vanilla DAE methods in generating realistic images.

翻訳日:2023-04-04 14:53:57 公開日:2023-04-03

# 光ボックスポテンシャル中の準1次元ボースガスの最適制御

Optimal control of quasi-1D Bose gases in optical box potentials ( http://arxiv.org/abs/2304.01051v1 )

ライセンス: Link先を確認

Andreas Deutschmann-Olek and Katharina Schrom and Nikolaus Würkner and Jörg Schmiedmayer and Sebastian Erne and Andreas Kugi

(参考訳) 本稿では, 最適制御手法を用いて, 高度に伸長したポテンシャルに閉じ込められた準1次元ボースガスの操作について検討する。気体の有効な平均場ダイナミクスは、1次元の非多項式シュレーディンガー方程式によって記述できる。我々は Winckel と Borzi (2008) による Gross-Pitaevskii 方程式の間接最適制御法を拡張し、状態およびエネルギーのコスト汎関数に対する最適性の必要条件を得る。このアプローチを、最適性条件を数値的に解くことにより、(光学的)ボックスポテンシャル中の準1次元ボースガスを最適に圧縮すること、すなわち、いわゆる断熱への近道(shortcut to adiabaticity)を見つけることに適用する。最後に、提案手法の挙動を解析し,縮約基底関数を用いた単純な直接最適化手法と比較する。シミュレーションの結果,提案手法の実現可能性が示された。

In this paper, we investigate the manipulation of quasi-1D Bose gases that are trapped in a highly elongated potential by optimal control methods. The effective mean-field dynamics of the gas can be described by a one-dimensional non-polynomial Schrödinger equation. We extend the indirect optimal control method for the Gross-Pitaevskii equation by Winckel and Borzi (2008) to obtain necessary optimality conditions for state and energy cost functionals. This approach is then applied to optimally compress a quasi-1D Bose gas in an (optical) box potential, i.e., to find a so-called short-cut to adiabaticity, by solving the optimality conditions numerically. The behavior of the proposed method is finally analyzed and compared to simple direct optimization strategies using reduced basis functions. Simulation results demonstrate the feasibility of the proposed approach.

翻訳日:2023-04-04 14:53:36 公開日:2023-04-03

# Polytuplet Loss: 読解と論理的推論モデルの学習への逆アプローチ

Polytuplet Loss: A Reverse Approach to Training Reading Comprehension and Logical Reasoning Models ( http://arxiv.org/abs/2304.01046v1 )

ライセンス: Link先を確認

Jeffrey Lu, Ivan Rodriguez

(参考訳) 学校教育を通じて、生徒は読解力と論理的推論力を試験される。生徒はこうした試験を解くための様々な戦略を編み出しており、その一部は一般に他より優れていると考えられている。そのような戦略の1つは、絶対的な正確さよりも相対的な正確さを重視するものであり、理論的には、問題を解くのに必要な情報を完全に知らなくても正解を導くことができる。本稿では,この戦略を転移学習モデルの学習に適用し,読解と論理的推論の問題を解く際の有効性を検証する。モデルは、難易度の高い読解・論理的推論ベンチマークであるReClorデータセットで評価した。これまでの研究は論理的推論のスキルを対象としていたが,我々は汎用的な学習方法とモデルアーキテクチャに焦点を当てる。各選択肢の真の正確さを学習することよりも、選択肢間の相対的な正しさを学習することを優先させるため、三重項損失(triplet loss)関数の拡張であるポリタプレット損失関数を提案する。その結果,ポリタプレット損失を用いたモデルは既存のベースラインモデルより優れていることが示された。ポリタプレット損失は他のコントラスト損失関数の有望な代替であるが、その利点を定量化するにはさらなる研究が必要である。

Throughout schooling, students are tested on reading comprehension and logical reasoning. Students have developed various strategies for completing such exams, some of which are generally thought to outperform others. One such strategy involves emphasizing relative accuracy over absolute accuracy and can theoretically produce the correct answer without full knowledge of the information required to solve the question. This paper examines the effectiveness of applying such a strategy to train transfer learning models to solve reading comprehension and logical reasoning questions. The models were evaluated on the ReClor dataset, a challenging reading comprehension and logical reasoning benchmark. While previous studies targeted logical reasoning skills, we focus on a general training method and model architecture. We propose the polytuplet loss function, an extension of the triplet loss function, to ensure prioritization of learning the relative correctness of answer choices over learning the true accuracy of each choice. Our results indicate that models employing polytuplet loss outperform existing baseline models. Although polytuplet loss is a promising alternative to other contrastive loss functions, further research is required to quantify the benefits it may present.
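
The abstract describes the polytuplet loss only as an extension of the triplet loss that favors relative correctness among answer choices. As a rough illustration, and not the authors' formulation, the sketch below applies a triplet-style hinge to one correct choice and every incorrect choice at once. The function name `polytuplet_style_loss`, the Euclidean distance, and the `margin` value are all assumptions made for this example.

```python
import math

def polytuplet_style_loss(context, correct, incorrect, margin=1.0):
    """Sum of triplet-style hinge terms, one per incorrect choice (hypothetical form)."""
    d_correct = math.dist(context, correct)
    loss = 0.0
    for choice in incorrect:
        # Penalize an incorrect choice unless it is at least `margin`
        # farther from the context embedding than the correct choice.
        loss += max(0.0, d_correct - math.dist(context, choice) + margin)
    return loss

# Toy 2-D embeddings: one question context, one correct and three incorrect choices.
context = [0.0, 0.0]
correct = [0.0, 1.0]
incorrect = [[3.0, 0.0], [0.0, 1.5], [0.0, 5.0]]
loss = polytuplet_style_loss(context, correct, incorrect)
print(loss)  # 0.5: only the choice at distance 1.5 violates the margin
```

Only the ordering of distances matters here, which is one way to read "relative accuracy over absolute accuracy"; the loss used in the paper may differ.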

翻訳日:2023-04-04 14:53:19 公開日:2023-04-03

# DivClust: ディープクラスタリングにおける多様性の制御

DivClust: Controlling Diversity in Deep Clustering ( http://arxiv.org/abs/2304.01042v1 )

ライセンス: Link先を確認

Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, Ioannis Patras

(参考訳) クラスタリングは機械学習の分野における主要な研究トピックであり、最近ではDeep Learningの適用が大きな成功を収めている。しかし、既存のディープクラスタリング手法が対処していない側面として、所与のデータセットに対して複数の多様な分割を効率的に生成することがある。これは特に重要である。コンセンサスクラスタリングは単一のクラスタリングに頼るよりも優れた頑健な結果をもたらすことが知られており、それには多様なベースクラスタリングの集合が必要だからである。このギャップに対処するために、既存のディープクラスタリングフレームワークに組み込んで、所望の多様性を持つ複数のクラスタリングを生成できる多様性制御損失であるDivClustを提案する。複数のデータセットとディープクラスタリングフレームワークで実験を行い、以下を示す。 a) 本手法は、ごくわずかな追加計算コストで、フレームワークやデータセットを問わず多様性を効果的に制御する。 b) DivClustが学習したクラスタリングの集合には、単一クラスタリングのベースラインを大きく上回る解が含まれる。 c) 既成のコンセンサスクラスタリングアルゴリズムを用いることで、DivClustは単一クラスタリングのベースラインを一貫して上回るコンセンサスクラスタリング解を生成し、ベースとなるディープクラスタリングフレームワークの性能を効果的に向上させる。

Clustering has been a major research topic in the field of machine learning, one to which Deep Learning has recently been applied with significant success. However, an aspect of clustering that is not addressed by existing deep clustering methods, is that of efficiently producing multiple, diverse partitionings for a given dataset. This is particularly important, as a diverse set of base clusterings are necessary for consensus clustering, which has been found to produce better and more robust results than relying on a single clustering. To address this gap, we propose DivClust, a diversity controlling loss that can be incorporated into existing deep clustering frameworks to produce multiple clusterings with the desired degree of diversity. We conduct experiments with multiple datasets and deep clustering frameworks and show that: a) our method effectively controls diversity across frameworks and datasets with very small additional computational cost, b) the sets of clusterings learned by DivClust include solutions that significantly outperform single-clustering baselines, and c) using an off-the-shelf consensus clustering algorithm, DivClust produces consensus clustering solutions that consistently outperform single-clustering baselines, effectively improving the performance of the base deep clustering framework.

翻訳日:2023-04-04 14:53:02 公開日:2023-04-03

# 非エルミート超可積分系

Non-Hermitian superintegrable systems ( http://arxiv.org/abs/2304.01039v1 )

ライセンス: Link先を確認

Francisco Correa, Luis Inzunza, Ian Marquette

(参考訳) マースデン-ワインスタイン還元法の非エルミート一般化を導入し、$n$次元球面 $S^n$ 上の量子 $\mathcal{PT}$ 対称超可積分モデルの族を構成する。このメカニズムを、それぞれ $u(2)$ と $u(3)$ リー代数に関連する1次元および2次元の例で説明し、実スペクトルと自発的な $\mathcal{PT}$ 対称性の破れを持つ新しい量子モデルを与える。ある極限において、モデルは既知の非エルミート系や、以前に研究された実超可積分系の複素拡張に帰着する。

A non-Hermitian generalisation of the Marsden--Weinstein reduction method is introduced to construct families of quantum $\mathcal{PT}$-symmetric superintegrable models over an $n$-dimensional sphere $S^n$. The mechanism is illustrated with one- and two-dimensional examples, related to $u(2)$ and $u(3)$ Lie algebras respectively, providing new quantum models with real spectra and spontaneous $\mathcal{PT}$-symmetric breaking. In certain limits, the models reduce to known non-Hermitian systems and complex extensions of previously studied real superintegrable systems.

翻訳日:2023-04-04 14:52:41 公開日:2023-04-03

# 臨界1+1Dアベリアン・ヒッグス模型のスペクトル特性

Spectral properties of critical 1+1D Abelian-Higgs model ( http://arxiv.org/abs/2304.01030v1 )

ライセンス: Link先を確認

Titas Chanda, Marcello Dalmonte, Maciej Lewenstein, Jakub Zakrzewski, Luca Tagliacozzo

(参考訳) 1+1次元におけるゲージ対称性は、動的ゲージボソンの存在を意味しないため、冗長であることが知られている。その結果、連続体では、光子と相互作用するボソン物質の理論であるアベリアン・ヒッグス模型は、高次元におけるヒッグス相とクーロン相が非摂動効果によってつながるため、単一の相しか持たない。しかし、[Phys. Rev. Lett. 128, 090601 (2022)] で発表された最近の研究により、系を格子上で離散化すると予期せぬ相転移が現れることが明らかになった。この転移は中心電荷 $c=3/2$ の共形場理論によって記述される。本稿では、この $c=3/2$ 理論の2つの成分、すなわち自由マヨラナフェルミオン部分とボソン部分を、平衡および非平衡のスペクトル解析によって特徴づけることを目的とする。

The presence of gauge symmetry in 1+1D is known to be redundant, since it does not imply the existence of dynamical gauge bosons. As a consequence, in the continuum, the Abelian-Higgs model, the theory of bosonic matter interacting with photons, just possesses a single phase, as the higher dimensional Higgs and Coulomb phases are connected via non-perturbative effects. However, recent research published in [Phys. Rev. Lett. 128, 090601 (2022)] has revealed an unexpected phase transition when the system is discretized on the lattice. This transition is described by a conformal field theory with a central charge of $c=3/2$. In this paper, we aim to characterize the two components of this $c=3/2$ theory -- namely the free Majorana fermionic and bosonic parts -- through equilibrium and out-of-equilibrium spectral analyses.

翻訳日:2023-04-04 14:52:31 公開日:2023-04-03

# 知識蒸留による作物セグメンテーションのためのドメイン一般化

Domain Generalization for Crop Segmentation with Knowledge Distillation ( http://arxiv.org/abs/2304.01029v1 )

ライセンス: Link先を確認

Simone Angarano, Mauro Martini, Alessandro Navone, Marcello Chiaberge

(参考訳) 近年、精密農業は、圃場管理に関わるあらゆる活動を支援するために、農業を徐々に自動化プロセスへと近づけている。サービスロボティクスは、監視、散布、収穫といった作業を人間の介入なしに行いながら圃場を移動できる自律エージェントを配備することで、この進化において主要な役割を果たす。これらの精密な動作を実行するには、移動ロボットは周囲を理解し、実環境で対象を特定するリアルタイム認識システムを必要とする。ラベル付きサンプルはほとんど利用できないため、新しい作物や環境条件への一般化は実用上きわめて重要である。本稿では,作物セグメンテーションの問題を調査し,知識蒸留によってドメイン一般化を高める新しい手法を提案する。提案フレームワークでは,ソースドメインごとに個別に訓練されたモデルのアンサンブルから,未知のターゲットドメインに適応可能な生徒モデルへ知識を転移する。提案手法を評価するため、様々な形状の植物を含み、異なる地形、気象条件、光条件をカバーする、5万以上のサンプルからなる作物セグメンテーション用の合成マルチドメインデータセットを提示する。我々は最先端手法に対する大幅な性能向上を示す。このアプローチは作物セグメンテーションにおけるドメイン一般化の有望な解決策を提供し、精密農業の応用を促進する可能性を秘めている。

In recent years, precision agriculture has gradually oriented farming closer to automation processes to support all the activities related to field management. Service robotics plays a predominant role in this evolution by deploying autonomous agents that can navigate fields while performing tasks without human intervention, such as monitoring, spraying, and harvesting. To execute these precise actions, mobile robots need a real-time perception system that understands their surroundings and identifies their targets in the wild. Generalizing to new crops and environmental conditions is critical for practical applications, as labeled samples are rarely available. In this paper, we investigate the problem of crop segmentation and propose a novel approach to enhance domain generalization using knowledge distillation. In the proposed framework, we transfer knowledge from an ensemble of models individually trained on source domains to a student model that can adapt to unseen target domains. To evaluate the proposed method, we present a synthetic multi-domain dataset for crop segmentation containing plants of variegate shapes and covering different terrain styles, weather conditions, and light scenarios for more than 50,000 samples. We demonstrate significant improvements in performance over state-of-the-art methods. Our approach provides a promising solution for domain generalization in crop segmentation and has the potential to enhance precision agriculture applications.

翻訳日:2023-04-04 14:52:15 公開日:2023-04-03

# ゲージ場理論におけるベル-CHSH不等式のBRST不変な定式化

BRST invariant formulation of the Bell-CHSH inequality in gauge field theories ( http://arxiv.org/abs/2304.01028v1 )

ライセンス: Link先を確認

David Dudal, Philipe De Fabritiis, Marcelo S. Guimaraes, Giovani Peruzzo, Silvio P. Sorella

(参考訳) ゲージ場理論におけるベル-CHSH不等式の研究を提示する。フォック空間におけるBRST電荷コホモロジーのKugo-Ojima解析を用いて、ベル-CHSH不等式を明白にBRST不変な形で定式化する。自由な4次元マクスウェル理論とアベリアン・ヒッグス模型の例を詳しく調べる。不等式はBRST不変なスクイーズド状態を用いて調べられ、Tsirelson限界に近い大きなベル-CHSH不等式の破れが可能になる。量子力学における2つのスピン $1/2$ 粒子のエンタングル状態との比較も示す。

A study of the Bell-CHSH inequality in gauge field theories is presented. By using the Kugo-Ojima analysis of the BRST charge cohomology in Fock space, the Bell-CHSH inequality is formulated in a manifestly BRST invariant way. The examples of the free four-dimensional Maxwell theory and the Abelian Higgs model are scrutinized. The inequality is probed by using BRST invariant squeezed states, allowing for large Bell-CHSH inequality violations, close to Tsirelson's bound. An illustrative comparison with the entangled state of two $1/2$ spin particles in Quantum Mechanics is provided.
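
The quantum-mechanical comparison case mentioned at the end of the abstract can be checked numerically. The sketch below is not the BRST-invariant field-theory construction; it only evaluates the textbook CHSH combination for the spin-1/2 singlet state, using the standard correlation E(a, b) = -cos(a - b) for measurement directions at angles a and b. The function names and the chosen angles are assumptions of this example.

```python
import math

def singlet_correlation(angle_a, angle_b):
    """Standard singlet-state spin correlation E(a, b) = -cos(a - b)."""
    return -math.cos(angle_a - angle_b)

def chsh_value(a, a_prime, b, b_prime):
    """CHSH combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b')."""
    return (singlet_correlation(a, b) + singlet_correlation(a, b_prime)
            + singlet_correlation(a_prime, b) - singlet_correlation(a_prime, b_prime))

# Measurement angles that maximize |S| for the singlet state.
S = chsh_value(0.0, math.pi / 2, math.pi / 4, -math.pi / 4)
classical_bound = 2.0
tsirelson_bound = 2.0 * math.sqrt(2.0)
print(abs(S), classical_bound, tsirelson_bound)
```

With these angles |S| reaches 2√2, the Tsirelson bound, while any local hidden-variable model satisfies |S| ≤ 2.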

翻訳日:2023-04-04 14:51:53 公開日:2023-04-03

# 人工ニューラルネットワークと計数時系列:非線形INGARCHモデルの一クラス

Artificial neural networks and time series of counts: A class of nonlinear INGARCH models ( http://arxiv.org/abs/2304.01025v1 )

ライセンス: Link先を確認

Malte Jahn

(参考訳) 計数時系列は、条件付き不均一分散を持つ一般化整数値自己回帰(INGARCH)モデルを用いて分析されることが多い。これらのモデルは応答関数を用いて、過去の観測値と過去の条件付き期待値のベクトルを、現在の観測値の条件付き期待値に写像する。本稿では,INGARCHモデルと人工ニューラルネットワーク(ANN)の応答関数を組み合わせることで,非線形INGARCHモデルのクラスを得る方法を示す。 ANNフレームワークにより、既存の多くのINGARCHモデルを、対応するニューラルモデルの退化版として解釈できる。最尤推定、限界効果、信頼区間の詳細を与える。有界および非有界の計数時系列の実証分析により、ニューラルINGARCHモデルは、情報損失の観点で、妥当な退化競合モデルを上回りうることが示された。

Time series of counts are frequently analyzed using generalized integer-valued autoregressive models with conditional heteroskedasticity (INGARCH). These models employ response functions to map a vector of past observations and past conditional expectations to the conditional expectation of the present observation. In this paper, it is shown how INGARCH models can be combined with artificial neural network (ANN) response functions to obtain a class of nonlinear INGARCH models. The ANN framework allows for the interpretation of many existing INGARCH models as a degenerate version of a corresponding neural model. Details on maximum likelihood estimation, marginal effects and confidence intervals are given. The empirical analysis of time series of bounded and unbounded counts reveals that the neural INGARCH models are able to outperform reasonable degenerate competitor models in terms of the information loss.
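
To make the role of the response function concrete, the sketch below simulates a Poisson INGARCH(1,1) process twice: once with the usual linear response and once with an added sigmoid hidden unit. This is a minimal illustration under assumed parameter values, not the model specification from the paper; `neural_response`, its weights, and the single hidden unit are hypothetical. Setting `hidden_weight=0` recovers the linear model, which is one way to picture the "degenerate version" relationship the abstract mentions.

```python
import math
import random

def linear_response(x_prev, lam_prev, omega=1.0, alpha=0.3, beta=0.4):
    """Linear INGARCH(1,1) response: conditional mean of the next count."""
    return omega + alpha * x_prev + beta * lam_prev

def neural_response(x_prev, lam_prev, hidden_weight=0.5):
    """Linear part plus one sigmoid hidden unit (weights are illustrative)."""
    hidden = 1.0 / (1.0 + math.exp(-(0.1 * x_prev - 0.2 * lam_prev)))
    return linear_response(x_prev, lam_prev) + hidden_weight * hidden

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate(response, n, seed=0):
    """Draw n counts, feeding each count and mean back into the response."""
    rng = random.Random(seed)
    x, lam, counts = 0, 1.0, []
    for _ in range(n):
        lam = response(x, lam)       # conditional expectation given the past
        x = sample_poisson(lam, rng) # new observation
        counts.append(x)
    return counts

counts = simulate(neural_response, 200)
print(sum(counts) / len(counts))
```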

翻訳日:2023-04-04 14:51:42 公開日:2023-04-03

# LIGHT:統合型マルチタスク学習ネットワークによる衛星画像からの個別建物抽出と高さ推定

LIGHT: Joint Individual Building Extraction and Height Estimation from Satellite Images through a Unified Multitask Learning Network ( http://arxiv.org/abs/2304.01090v1 )

ライセンス: Link先を確認

Yongqiang Mao, Xian Sun, Xingliang Huang, Kaiqiang Chen

(参考訳) 建物抽出と高さ推定はリモートセンシング画像解釈における2つの重要な基本課題であり、都市計画、実世界の3D構築、その他の分野で広く利用されている。既存研究のほとんどは、この2つの課題を独立した研究とみなしている。そのため、高さ情報を建物抽出の精度向上に十分活用できず、その逆も同様である。本研究では,統合マルチタスク学習ネットワーク(LIGHT)により、個別建物抽出(individuaL buIlding extraction)と高さ推定(heiGHt estimation)を初めて組み合わせ、建物の高さマップ,バウンディングボックス,セグメンテーションマスクマップを同時に出力する。具体的には、LIGHTはインスタンスセグメンテーションブランチと高さ推定ブランチで構成される。特に,マルチスケール特徴ブランチを効果的に統合し,ブランチ間の特徴の隔たりを緩和するために,ブランチ間の特徴相互作用を効率的に行うGCTI (Gated Cross Task Interaction) モジュールを提案する。 DFC2023データセットの実験では、LIGHTは優れた性能を達成し、ResNet101をバックボーンとしたGCTIモジュールは、マルチタスク学習の性能をAP50で2.8%、delta1で6.5%、それぞれ大幅に向上させることが示された。

Building extraction and height estimation are two important basic tasks in remote sensing image interpretation, which are widely used in urban planning, real-world 3D construction, and other fields. Most of the existing research regards the two tasks as independent studies. Therefore the height information cannot be fully used to improve the accuracy of building extraction and vice versa. In this work, we combine the individuaL buIlding extraction and heiGHt estimation through a unified multiTask learning network (LIGHT) for the first time, which simultaneously outputs a height map, bounding boxes, and a segmentation mask map of buildings. Specifically, LIGHT consists of an instance segmentation branch and a height estimation branch. In particular, so as to effectively unify multi-scale feature branches and alleviate feature spans between branches, we propose a Gated Cross Task Interaction (GCTI) module that can efficiently perform feature interaction between branches. Experiments on the DFC2023 dataset show that our LIGHT can achieve superior performance, and our GCTI module with ResNet101 as the backbone can significantly improve the performance of multitask learning by 2.8% AP50 and 6.5% delta1, respectively.

翻訳日:2023-04-04 14:45:51 公開日:2023-04-03

# RPTQ:大規模言語モデルのためのリオーダーベースポストトレーニング量子化

RPTQ: Reorder-based Post-training Quantization for Large Language Models ( http://arxiv.org/abs/2304.01089v1 )

ライセンス: Link先を確認

Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

(参考訳) 大規模言語モデル(LLM)は様々なタスクにおいて優れた性能を示しているが、その巨大なモデルサイズのためにデプロイは困難である。本稿では,LLMの量子化における主な課題が,外れ値の問題だけでなく,チャネル間のアクティベーション範囲の違いに起因することを明らかにし,LLMのアクティベーションの量子化の問題に対処する,新しいリオーダーベースの量子化手法であるRPTQを提案する。 RPTQはアクティベーションのチャネルを並べ替え、クラスタ単位で量子化することで、チャネル間の範囲差の影響を低減する。さらに,明示的な並べ替えを回避することで,ストレージと計算のオーバーヘッドを削減する。このアプローチを実装することで,LLMを初めて3ビットアクティベーションまで押し進めるという大きなブレークスルーを達成した。

Large-scale language models (LLMs) have demonstrated outstanding performance on various tasks, but their deployment poses challenges due to their enormous model size. In this paper, we identify that the main challenge in quantizing LLMs stems from the different activation ranges between the channels, rather than just the issue of outliers. We propose a novel reorder-based quantization approach, RPTQ, that addresses the issue of quantizing the activations of LLMs. RPTQ rearranges the channels in the activations and then quantizes them in clusters, thereby reducing the impact of range differences between channels. In addition, we reduce the storage and computation overhead by avoiding explicit reordering. By implementing this approach, we achieved a significant breakthrough by pushing LLM models to 3-bit activation for the first time.
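
The idea of reordering channels and quantizing them in clusters can be illustrated on a toy activation matrix. The sketch below is a simplification under stated assumptions, not the RPTQ implementation: channels are sorted by their value range and split into equal-sized contiguous groups (a stand-in for whatever clustering the paper uses), and each group gets its own uniform quantization grid. All function names are hypothetical, and the paper's technique for avoiding explicit reordering is not shown.

```python
def fake_quantize(values, bits):
    """Uniform asymmetric quantize-then-dequantize over the values' own range."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def per_tensor_error(acts, bits):
    """Baseline: one quantization grid shared by all channels."""
    flat = [v for row in acts for v in row]
    return mse(flat, fake_quantize(flat, bits))

def reorder_and_cluster_error(acts, bits, num_clusters):
    """Sort channels by range, group neighbours, quantize each group separately."""
    num_channels = len(acts[0])
    columns = [[row[c] for row in acts] for c in range(num_channels)]
    order = sorted(range(num_channels),
                   key=lambda c: max(columns[c]) - min(columns[c]))
    size = -(-num_channels // num_clusters)  # ceiling division
    original, restored = [], []
    for start in range(0, num_channels, size):
        cluster = [v for c in order[start:start + size] for v in columns[c]]
        original += cluster
        restored += fake_quantize(cluster, bits)
    return order, mse(original, restored)

# Rows are tokens, columns are channels; channels 1 and 3 have much larger ranges.
acts = [[0.1, -10.0, 0.2, 9.0],
        [-0.1, 10.0, -0.2, -8.0],
        [0.05, 3.0, 0.1, 1.0]]
tensor_err = per_tensor_error(acts, bits=4)
order, cluster_err = reorder_and_cluster_error(acts, bits=4, num_clusters=2)
print(order, tensor_err, cluster_err)
```

In this toy case the small-range channels no longer share a grid with the large-range ones, so their rounding error drops sharply while the large-range channels are quantized exactly as before.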

翻訳日:2023-04-04 14:45:13 公開日:2023-04-03

# 自己構築型ニューラルネットワーク

Self-building Neural Networks ( http://arxiv.org/abs/2304.01086v1 )

ライセンス: Link先を確認

Andrea Ferigo, Giovanni Iacca

(参考訳) 生涯の初期に、脳はシナプス形成と呼ばれる過程を通じて、学習しながら発達する。ニューロンは成長し、互いに相互作用してシナプスを形成する。しかし、最終的に脳はそれらのシナプスを刈り込む。従来の研究は学習と刈り込みを独立に扱っていたが、本研究では、ヘブ学習と刈り込みの組み合わせによってシナプス形成過程を模擬することを目指す、生物学的に妥当なモデルを提案する。このようにして、タスクの解き方を学習しながら、エージェントはその経験を特定のネットワーク構造に変換する。すなわち、ネットワーク構造はタスクの実行中に自らを構築する。このアプローチを自己構築ニューラルネットワーク(SBNN)と呼ぶ。提案するSBNNと従来のニューラルネットワーク(NN)を、OpenAIの3つの古典的な制御タスクで比較する。その結果,我々のモデルは概して従来のNNよりも性能がよいことがわかった。また,刈り込み率を上げたときの性能低下は、NNよりも本モデルのほうが小さいことが観察された。最後に,検証テストとして、学習段階では見ていないタスクでモデルをテストする。この場合、特に重みの $80\%$ 以上が刈り込まれたときに、SBNNは従来のNNよりも新しいタスクにうまく適応できることが示された。

During the first part of life, the brain develops while it learns through a process called synaptogenesis. The neurons, growing and interacting with each other, create synapses. However, eventually the brain prunes those synapses. While previous work focused on learning and pruning independently, in this work we propose a biologically plausible model that, thanks to a combination of Hebbian learning and pruning, aims to simulate the synaptogenesis process. In this way, while learning how to solve the task, the agent translates its experience into a particular network structure. Namely, the network structure builds itself during the execution of the task. We call this approach Self-building Neural Network (SBNN). We compare our proposed SBNN with traditional neural networks (NNs) over three classical control tasks from OpenAI. The results show that our model performs generally better than traditional NNs. Moreover, we observe that the performance decay while increasing the pruning rate is smaller in our model than with NNs. Finally, we perform a validation test, testing the models over tasks unseen during the learning phase. In this case, the results show that SBNNs can adapt to new tasks better than the traditional NNs, especially when over $80\%$ of the weights are pruned.
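
The combination of Hebbian learning and pruning can be sketched in a few lines. This is only a generic illustration, assuming the plain Hebbian rule (weight change proportional to the product of pre- and post-synaptic activity) and magnitude-based pruning; the abstract does not give the exact update rule or pruning criterion used for SBNNs, and the function names below are hypothetical.

```python
def hebbian_step(weights, pre, post, lr=0.1):
    """Plain Hebbian update: strengthen w[i][j] when pre[i] and post[j] co-activate."""
    for i, x in enumerate(pre):
        for j, y in enumerate(post):
            weights[i][j] += lr * x * y

def prune_mask(weights, rate):
    """Keep-mask that drops the `rate` fraction of connections with the smallest magnitude."""
    ranked = sorted((abs(w), i, j)
                    for i, row in enumerate(weights)
                    for j, w in enumerate(row))
    n_prune = int(rate * len(ranked))
    mask = [[True] * len(row) for row in weights]
    for _, i, j in ranked[:n_prune]:
        mask[i][j] = False
    return mask

# 3 input neurons, 2 output neurons, all connections start at zero.
weights = [[0.0, 0.0] for _ in range(3)]
hebbian_step(weights, pre=[1.0, 0.0, 0.5], post=[1.0, 0.2])
mask = prune_mask(weights, rate=0.5)
print(mask)  # [[True, True], [False, False], [True, False]]
```

The silent input neuron (index 1) never strengthens its connections, so they are pruned first; the surviving structure reflects the activity seen during the task.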

翻訳日:2023-04-04 14:44:49 公開日:2023-04-03

# ソースデータなしの教師なしクロスドメイン肺結節検出

Unsupervised Cross-domain Pulmonary Nodule Detection without Source Data ( http://arxiv.org/abs/2304.01085v1 )

ライセンス: Link先を確認

Rui Xu, Yong Luo, Bo Du

(参考訳) クロスドメインの肺結節検出は、ソースドメインとターゲットドメインの間のデータ分布の大きなシフトにより、性能劣化に悩まされる。また、医療データのアノテーションコストが高いことから、ターゲット画像にはラベルがないと仮定されることが多い。既存のアプローチは、この教師なしドメイン適応の設定で大きく進歩した。しかし、プライバシー上の懸念からソースの医療データにアクセスできないことが多いため、この設定が医療応用で成り立つことは依然として少ない。そこで本研究では,肺結節検出のためのソースフリーな教師なしクロスドメイン手法(SUP)を提案する。まず、インスタンスレベルのコントラスト学習を利用して、ソースモデルをターゲットドメインに適応させる。次に、適応したモデルを教師-生徒の相互作用方式で訓練し、精度をさらに向上させるために重み付きエントロピー損失を組み込む。事前学習済みのソースモデルを3つの一般的な肺結節データセットに適応させる広範な実験により,本手法の有効性を実証した。

Cross domain pulmonary nodule detection suffers from performance degradation due to large shift of data distributions between the source and target domain. Besides, considering the high cost of medical data annotation, it is often assumed that the target images are unlabeled. Existing approaches have made much progress for this unsupervised domain adaptation setting. However, this setting is still rarely plausible in the medical application since the source medical data are often not accessible due to the privacy concerns. This motivates us to propose a Source-free Unsupervised cross-domain method for Pulmonary nodule detection (SUP). It first adapts the source model to the target domain by utilizing instance-level contrastive learning. Then the adapted model is trained in a teacher-student interaction manner, and a weighted entropy loss is incorporated to further improve the accuracy. Extensive experiments by adapting a pre-trained source model to three popular pulmonary nodule datasets demonstrate the effectiveness of our method.

翻訳日:2023-04-04 14:44:29 公開日:2023-04-03

# 大きな言語モデルの推論ロジックは、シンボリックな概念に分解できるだろうか?

Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts? ( http://arxiv.org/abs/2304.01083v1 )

ライセンス: Link先を確認

Wen Shen, Lei Cheng, Yuxiao Yang, Mingjie Li, Quanshi Zhang

(参考訳) 本稿では,大規模言語モデル(LLM)の推論ロジックを記号的概念の集合として説明する。最近の多くの研究で、従来のDNNは通常、スパースな記号的概念を符号化していることが発見されている。しかし、LLMは従来のDNNよりもはるかに多くのパラメータを持つため、LLMもスパースな記号的概念を符号化するかどうかは未解決の問題である。そこで本研究では,対話タスクにおけるLLMの推論スコアを,少数の記号的概念に分解することを提案する。入力文の任意のマスキング状態すべてについて,これらのスパースな概念を用いてLLMの推論スコアをよく推定できることを検証する。また、LLMが符号化した概念の転移可能性を評価し、記号的概念が類似の入力文間で通常高い転移可能性を示すことを検証する。より重要なことに、これらの記号的概念は、LLMの予測誤りの原因となる正確な理由を説明するために使用できる。

In this paper, we explain the inference logic of large language models (LLMs) as a set of symbolic concepts. Many recent studies have discovered that traditional DNNs usually encode sparse symbolic concepts. However, because an LLM has many more parameters than traditional DNNs, whether the LLM also encodes sparse symbolic concepts is still an open problem. Therefore, in this paper, we propose to disentangle the inference score of LLMs for dialogue tasks into a small number of symbolic concepts. We verify that we can use those sparse concepts to accurately estimate all inference scores of the LLM on all arbitrarily masked states of the input sentence. We also evaluate the transferability of concepts encoded by an LLM and verify that symbolic concepts usually exhibit high transferability across similar input sentences. More crucially, those symbolic concepts can be used to explain the exact reasons accountable for the LLM's prediction errors.

翻訳日:2023-04-04 14:44:16 公開日:2023-04-03

# FMGNN:融合多様体グラフニューラルネットワーク

FMGNN: Fused Manifold Graph Neural Network ( http://arxiv.org/abs/2304.01081v1 )

ライセンス: Link先を確認

Cheng Deng, Fan Xu, Jiaxing Ding, Luoyi Fu, Weinan Zhang, Xinbing Wang

(参考訳) グラフ表現学習は広く研究され、様々なグラフタスクで有効性が示されている。既存のほとんどの研究はグラフデータをユークリッド空間に埋め込むが、近年の研究は埋め込みモデルを双曲空間や球面空間に拡張し、階層構造や環構造のような複雑な構造を持つグラフでより良い性能を達成している。異なる多様体からの埋め込みを融合すれば、異なるグラフ構造に対する埋め込み能力をさらに活用できる。しかし、既存の埋め込み融合法は、主に出力埋め込みの連結や総和に焦点を当てており、異なる多様体上の同じ頂点の埋め込みの相互作用や整合を考慮していないため、最終的な融合結果に歪みや不正確さをもたらす可能性がある。さらに、異なる座標系にある同じ頂点の埋め込みを融合することも困難である。これらの課題に対し,新しいGNNアーキテクチャであるFused Manifold Graph Neural Network (FMGNN)を提案する。FMGNNは、訓練中に多様体間の相互作用と整合を行いながらグラフを異なるリーマン多様体に埋め込み、頂点と選択されたランドマーク(幾何学的コアセット)との間の各多様体上の距離を通じて頂点埋め込みを融合する。実験により,FMGNNはノード分類とリンク予測タスクのベンチマークにおいて,強力なベースラインよりも優れた性能を示すことがわかった。

Graph representation learning has been widely studied and demonstrated effectiveness in various graph tasks. Most existing works embed graph data in the Euclidean space, while recent works extend the embedding models to hyperbolic or spherical spaces to achieve better performance on graphs with complex structures, such as hierarchical or ring structures. Fusing the embedding from different manifolds can further take advantage of the embedding capabilities over different graph structures. However, existing embedding fusion methods mostly focus on concatenating or summing up the output embeddings, without considering interacting and aligning the embeddings of the same vertices on different manifolds, which can lead to distortion and imprecision in the final fusion results. Besides, it is also challenging to fuse the embeddings of the same vertices from different coordinate systems. In the face of these challenges, we propose the Fused Manifold Graph Neural Network (FMGNN), a novel GNN architecture that embeds graphs into different Riemannian manifolds with interaction and alignment among these manifolds during training and fuses the vertex embeddings through the distances on different manifolds between vertices and selected landmarks, geometric coresets. Our experiments demonstrate that FMGNN yields superior performance over strong baselines on the benchmarks of node classification and link prediction tasks.

翻訳日:2023-04-04 14:44:00 公開日:2023-04-03

# 線形相補性プログラミングを用いた時系列の等角予測領域

Conformal Prediction Regions for Time Series using Linear Complementarity Programming ( http://arxiv.org/abs/2304.01075v1 )

ライセンス: Link先を確認

Matthew Cleaveland, Insup Lee, George J. Pappas, Lars Lindemann

(参考訳) コンフォーマル予測は、機械学習モデルに対して高い確率で有効な予測領域を生成する統計ツールである。しかし、時系列データにコンフォーマル予測を適用すると、保守的な予測領域が生じる。実際、$T$ 時間ステップにわたる予測領域を信頼度 $1-\delta$ で得るために、従来の研究では、個々の予測領域がそれぞれ信頼度 $1-\delta/T$ で有効であることを要求する。我々は、学習ベースの時系列予測器を使用する際に長期ホライズンの計画と検証を可能にするため、この保守性を低減する最適化ベースの手法を提案する。各時間ステップで予測誤差を個別に考慮する代わりに、複数の時間ステップにわたるパラメータ化された予測誤差を考える。追加のデータセット上でパラメータを最適化することにより、保守的でない予測領域を見つける。この問題が混合整数線形相補性計画 (MILCP) として定式化でき、それを線形相補性計画 (LCP) に緩和できることを示す。さらに、緩和されたLCPが元のMILCPと同じ最適コストを持つことを証明する。最後に,歩行者軌道予測器を用いたケーススタディにより,本手法の有効性を示す。

Conformal prediction is a statistical tool for producing prediction regions of machine learning models that are valid with high probability. However, applying conformal prediction to time series data leads to conservative prediction regions. In fact, to obtain prediction regions over $T$ time steps with confidence $1-\delta$, previous works require that each individual prediction region is valid with confidence $1-\delta/T$. We propose an optimization-based method for reducing this conservatism to enable long horizon planning and verification when using learning-enabled time series predictors. Instead of considering prediction errors individually at each time step, we consider a parameterized prediction error over multiple time steps. By optimizing the parameters over an additional dataset, we find prediction regions that are not conservative. We show that this problem can be cast as a mixed integer linear complementarity program (MILCP), which we then relax into a linear complementarity program (LCP). Additionally, we prove that the relaxed LCP has the same optimal cost as the original MILCP. Finally, we demonstrate the efficacy of our method on a case study using pedestrian trajectory predictors.
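
The two ways of building multi-step prediction regions can be compared on toy calibration data. The sketch below assumes standard split conformal prediction: the region radius is the ⌈(n+1)(1-α)⌉-th smallest calibration score. `union_bound_regions` applies it per time step at level δ/T, as the abstract attributes to previous works. `weighted_max_regions` uses a single score, the weighted maximum error over all steps, at level δ; here the weights are fixed by hand, whereas the paper optimizes such parameters on an additional dataset via an LCP, which is not reproduced. Function names and data are assumptions of this example.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal radius: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this level
    return sorted(scores)[rank - 1]

def union_bound_regions(errors, delta):
    """One region per time step, each valid at level 1 - delta/T."""
    horizon = len(errors[0])
    return [conformal_quantile([traj[t] for traj in errors], delta / horizon)
            for t in range(horizon)]

def weighted_max_regions(errors, delta, weights):
    """Single score max_t w_t * e_t at level 1 - delta; radius at step t is C / w_t."""
    scores = [max(w * e for w, e in zip(weights, traj)) for traj in errors]
    c = conformal_quantile(scores, delta)
    return [c / w for w in weights]

# Calibration errors for 7 trajectories over T = 2 steps (errors grow with time).
errors = [[i, 2 * i] for i in range(1, 8)]
union_radii = union_bound_regions(errors, delta=0.5)
max_radii = weighted_max_regions(errors, delta=0.5, weights=[1.0, 1.0])
print(union_radii, max_radii)  # [6, 12] [8.0, 8.0]
```

With equal weights neither choice dominates at every step, which is why the choice of weights matters and why the paper treats it as an optimization problem.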

翻訳日:2023-04-04 14:43:35 公開日:2023-04-03

# 節の絡み合い, 理論を探る例

Entanglement of Sections, Examples Looking for a Theory ( http://arxiv.org/abs/2304.01072v1 )

ライセンス: Link先を確認

M. H. Freedman and M. B. Hastings

(参考訳) 量子情報は状態の絡み合いに関するものである。この出発点にパラメータを追加し、単一の状態がバンドルの非バナッシングセクションとなるようにします。例を通してセクションの絡み合いのパターンを考察する。

Quantum information is about the entanglement of states. To this starting point we add parameters whereby a single state becomes a non-vanishing section of a bundle. We consider through examples the possible entanglement patterns of sections.

翻訳日:2023-04-04 14:43:17 公開日:2023-04-03

# HyperThumbnail: レート歪み最適化によるリアルタイム6Kイメージ再スケーリング

HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization ( http://arxiv.org/abs/2304.01064v1 )

ライセンス: Link先を確認

Chenyang Qi, Xin Yang, Ka Leong Cheng, Ying-Cong Chen, Qifeng Chen

(参考訳) 現代の画像再スケーリングは、HR画像再構成のための埋め込み情報を含む低解像度(LR)サムネイル画像に高解像度(HR)画像を埋め込むことを目的としている。従来の超解像とは異なり、LRサムネイルに埋め込まれた情報から、元の画像に忠実な高忠実なHR画像復元を可能にする。しかし、最先端画像再スケーリング手法では、LR画像ファイルサイズを効率的な共有のために最適化しておらず、超高解像度(例えば6K)画像再構成ではリアルタイム性能に達しない。これら2つの課題に対処するために、リアルタイム6Kレート歪み認識画像再スケーリングのための新しいフレームワーク(HyperThumbnail)を提案する。提案する量子化予測モジュールにより,まずHR画像のJPEG LRサムネイルへの埋め込みを行い,HR再構成の品質を最大化しながら,埋め込みLR JPEGサムネイルのファイルサイズを最小化する。そして、効率的な周波数認識復号器は、LRサムネイルから高忠実度HR画像をリアルタイムに再構成する。広範な実験により,従来の画像再スケーリングベースラインよりも性能が優れており,リアルタイムに6K画像再構成が可能となった。

Contemporary image rescaling aims at embedding a high-resolution (HR) image into a low-resolution (LR) thumbnail image that contains embedded information for HR image reconstruction. Unlike traditional image super-resolution, this enables high-fidelity HR image restoration faithful to the original one, given the embedded information in the LR thumbnail. However, state-of-the-art image rescaling methods do not optimize the LR image file size for efficient sharing and fall short of real-time performance for ultra-high-resolution (e.g., 6K) image reconstruction. To address these two challenges, we propose a novel framework (HyperThumbnail) for real-time 6K rate-distortion-aware image rescaling. Our framework first embeds an HR image into a JPEG LR thumbnail by an encoder with our proposed quantization prediction module, which minimizes the file size of the embedding LR JPEG thumbnail while maximizing HR reconstruction quality. Then, an efficient frequency-aware decoder reconstructs a high-fidelity HR image from the LR one in real time. Extensive experiments demonstrate that our framework outperforms previous image rescaling baselines in rate-distortion performance and can perform 6K image reconstruction in real time.

翻訳日:2023-04-04 14:43:13 公開日:2023-04-03

# 多層平均場ネットワークによる深度分離

Depth Separation with Multilayer Mean-Field Networks ( http://arxiv.org/abs/2304.01063v1 )

ライセンス: Link先を確認

Yunwei Ren, Mo Zhou, Rong Ge

(参考訳) 深層ネットワークが浅層ネットワークよりも強力な理由である深層分離は、ディープラーニング理論において大きな問題となっている。以前の結果はしばしば表現力にフォーカスする。例えば、arXiv:1904.06984は3層ネットワークを使って簡単に近似できる関数を構築したが、2層ネットワークでは近似できない。本稿では,arxiv:1904.06984によって構築された関数を,多項式数のニューロンを効率的に持つ超パラメータネットワークを用いて学習することができる。その結果、平均場限界を多層ネットワークに拡張する新しい方法と、無限幅平均場ネットワークの離散化によって生じる誤差を要因とする損失の分解に依拠する。

Depth separation -- why a deeper network is more powerful than a shallower one -- has been a major problem in deep learning theory. Previous results often focus on representation power. For example, arXiv:1904.06984 constructed a function that is easy to approximate using a 3-layer network but not approximable by any 2-layer network. In this paper, we show that this separation is in fact algorithmic: one can learn the function constructed by arXiv:1904.06984 using an overparameterized network with polynomially many neurons efficiently. Our result relies on a new way of extending the mean-field limit to multilayer networks, and a decomposition of loss that factors out the error introduced by the discretization of infinite-width mean-field networks.

翻訳日:2023-04-04 14:42:52 公開日:2023-04-03

# テキスト教師付き意味セグメンテーションによる空間的一貫性のあるグループ化

Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation ( http://arxiv.org/abs/2304.01114v1 )

ライセンス: Link先を確認

Yabo Zhang, Zihao Wang, Jun Hao Liew, Jingjia Huang, Manyu Zhu, Jiashi Feng, Wangmeng Zuo

(参考訳) 本研究では,画像-文対の学習を通してのみ意味的セグメンテーションを行う。アノテーションが不足しているため、既存のテキスト管理手法では、ピクセル非感性フィードバックによってイメージをセマンティック領域にグループ化することしか学べない。その結果、グループ化された結果は粗く、しばしば小さなスプリアス領域を含んでおり、セグメンテーションの上限性能を制限している。一方,自己教師モデルによるグループ化の結果は,より意味的に一貫性があり,既存の手法のボトルネックを解消している。そこで本研究では,テキスト教師付きセマンティックセマンティックセグメンテーションを用いた自己教師付き空間一貫性グループを提案する。部分的グループ化の結果を考えると、2つのコア設計による画像レベルから領域レベル認識へのテキスト教師付きモデルの適用をさらに進める。まず,一方向の名詞と地域間の対比損失による微粒なアライメントを奨励し,不一致な名詞と地域間のペアを減らす。第2に、すべてのグループ領域の同時認識を可能にするために、コンテキスト対応マスキング戦略を採用する。空間的に一貫性のあるグループ化と領域適応認識を併用して,パスカルVOCおよびパスカルコンテキストのベンチマークにおいて59.2% mIoUと32.4% mIoUを達成し,最先端の手法をはるかに上回っている。

In this work, we investigate performing semantic segmentation solely through the training on image-sentence pairs. Due to the lack of dense annotations, existing text-supervised methods can only learn to group an image into semantic regions via pixel-insensitive feedback. As a result, their grouped results are coarse and often contain small spurious regions, limiting the upper-bound performance of segmentation. On the other hand, we observe that grouped results from self-supervised models are more semantically consistent and break the bottleneck of existing methods. Motivated by this, we propose to associate self-supervised spatially-consistent grouping with text-supervised semantic segmentation. Considering the part-like grouped results, we further adapt a text-supervised model from image-level to region-level recognition with two core designs. First, we encourage fine-grained alignment with a one-way noun-to-region contrastive loss, which reduces the mismatched noun-region pairs. Second, we adopt a contextually aware masking strategy to enable simultaneous recognition of all grouped regions. Coupled with spatially-consistent grouping and region-adapted recognition, our method achieves 59.2% mIoU and 32.4% mIoU on Pascal VOC and Pascal Context benchmarks, significantly surpassing the state-of-the-art methods.

翻訳日:2023-04-04 14:36:18 公開日:2023-04-03

# mcmcにおける神経制御変動の理論的保証

Theoretical guarantees for neural control variates in MCMC ( http://arxiv.org/abs/2304.01111v1 )

ライセンス: Link先を確認

Denis Belomestny, Artur Goldman, Alexey Naumov, Sergey Samsonov

(参考訳) 本稿では,加法制御変数に基づくマルコフ連鎖の分散低減手法と,漸近的分散に対する適切な推定値の最小化を提案する。制御変数がディープニューラルネットワークとして表現される場合、特に注目する。マルコフ連鎖上の様々なエルゴード性仮定の下での漸近分散の最適収束率を導出する。提案手法は分散還元アルゴリズムと関数近似理論の確率的誤差に関する最近の結果に依拠する。

In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance. We focus on the particular case when control variates are represented as deep neural networks. We derive the optimal convergence rate of the asymptotic variance under various ergodicity assumptions on the underlying Markov chain. The proposed approach relies upon recent results on the stochastic errors of variance reduction algorithms and function approximation theory.
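
The basic control-variate idea behind this abstract can be sketched in a few lines. This is a simplified illustration, not the paper's method: i.i.d. Gaussian draws stand in for a Markov chain, a single linear control variate replaces the neural network class, and the empirical variance replaces the asymptotic-variance estimate. All variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)      # i.i.d. draws standing in for a Markov chain
f = np.exp(x)                    # quantity of interest, E[f] = exp(0.5)
g = x                            # control variate with known mean E[g] = 0

# Coefficient minimizing the empirical variance of f - beta * g.
beta = np.cov(f, g)[0, 1] / np.var(g, ddof=1)
adjusted = f - beta * g          # same expectation as f, smaller variance

var_plain = f.var(ddof=1)
var_adjusted = adjusted.var(ddof=1)
```

Since `g` has mean zero, `adjusted.mean()` estimates the same expectation as `f.mean()`, while `var_adjusted` is no larger than `var_plain` by construction of `beta`.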

翻訳日:2023-04-04 14:35:54 公開日:2023-04-03

# AutoLabel: オープンセットビデオドメイン適応のためのCLIPベースのフレームワーク

AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation ( http://arxiv.org/abs/2304.01110v1 )

ライセンス: Link先を確認

Giacomo Zara, Subhankar Roy, Paolo Rota, Elisa Ricci

(参考訳) Open-set Unsupervised Video Domain Adaptation (OUVDA) は、ラベル付きソースドメインから、ターゲットに存在するがソースに存在しない"ターゲット-プライベート"カテゴリを含むラベルなしターゲットドメインへアクション認識モデルを適応させるタスクを扱う。本研究は、事前学習された言語と視覚モデル(CLIP)の使用を提案することにより、特定のオープンセット分類器や重み付けされた敵対的学習を訓練する以前の作業から逸脱する。 CLIPは、リッチな表現とゼロショット認識機能のために、OUVDAに適している。しかし、CLIPのゼロショットプロトコルでターゲットプライベートなインスタンスを拒否するには、ターゲットプライベートなラベル名に関するオラクルの知識が必要である。本稿では,ラベル名の知識の欠如を回避するために,オブジェクト中心の合成候補クラス名を自動的に発見・生成するAutoLabelを提案する。その単純さにもかかわらず、AutoLabelを装備したCLIPは、ターゲットプライベートなインスタンスを十分に拒否できるため、2つのドメインの共有クラス間のアライメントがより容易になる。コードは利用可能です。

Open-set Unsupervised Video Domain Adaptation (OUVDA) deals with the task of adapting an action recognition model from a labelled source domain to an unlabelled target domain that contains "target-private" categories, which are present in the target but absent in the source. In this work we deviate from the prior work of training a specialized open-set classifier or weighted adversarial learning by proposing to use pre-trained Language and Vision Models (CLIP). The CLIP is well suited for OUVDA due to its rich representation and the zero-shot recognition capabilities. However, rejecting target-private instances with the CLIP's zero-shot protocol requires oracle knowledge about the target-private label names. To circumvent the impossibility of the knowledge of label names, we propose AutoLabel that automatically discovers and generates object-centric compositional candidate target-private class names. Despite its simplicity, we show that CLIP when equipped with AutoLabel can satisfactorily reject the target-private instances, thereby facilitating better alignment between the shared classes of the two domains. The code is available.

翻訳日:2023-04-04 14:35:48 公開日:2023-04-03

# 偶然の世代

Coincidental Generation ( http://arxiv.org/abs/2304.01108v1 )

ライセンス: Link先を確認

Jordan W. Suchow and Necdet Gürkan

Generative AI models are emerging as a versatile tool across diverse industries, with applications in synthetic data generation, computational art, personalization of products and services, and immersive entertainment. Here, we introduce a new privacy concern in the adoption and use of generative AI models: that of coincidental generation. Coincidental generation occurs when a model's output inadvertently bears a likeness to a real-world entity. Consider, for example, synthetic portrait generators, which are today deployed in commercial applications such as virtual modeling agencies and synthetic stock photography. We argue that the low intrinsic dimensionality of human face perception implies that every synthetically generated face will coincidentally resemble an actual person, all but guaranteeing a privacy violation in the form of a misappropriation of likeness.

翻訳日:2023-04-04 14:35:26 公開日:2023-04-03

# crossword: マスキングによるデータ圧縮への意味的アプローチ

Crossword: A Semantic Approach to Data Compression via Masking ( http://arxiv.org/abs/2304.01106v1 )

ライセンス: Link先を確認

Mingxiao Li, Rui Jin, Liyao Xiang, Kaiming Shen, Shuguang Cui

(参考訳) データ圧縮の伝統的な手法は、典型的には記号レベルの統計に基づいており、情報ソースはi.i.d.確率変数や確率過程の長い系列としてモデル化され、可逆圧縮についてはエントロピー、非可逆圧縮については相互情報量として基本的な限界を確立する。しかし、現実世界のソース(テキスト、音楽、音声を含む)は、人間の知覚と密接な関係があるため、統計的に定義できないことが多いため、モデル駆動のアプローチはかなり最適ではない。本研究は英語テキストに注意を集中させ,その意味的側面を利用して圧縮効率をさらに高める。主なアイデアはパズルのクロスワードに由来するもので、いくつかのキー文字が提供される限り、隠された単語を正確に再構築することができる。提案手法は上記のゲームに類似している。簡単に言えば、エンコーダは意味的損失に応じて各単語の意味的重要性を評価し、その後、マイナーな単語をマスキングし、デコーダはTransformerを用いて意味的文脈からマスクされた単語を復元する。実験により,提案手法はHuffman符号やUTF-8符号のような従来の手法に比べて圧縮効率が向上すると同時に,目的とするテキストの意味をかなり保持できることを示した。

The traditional methods for data compression are typically based on the symbol-level statistics, with the information source modeled as a long sequence of i.i.d. random variables or a stochastic process, thus establishing the fundamental limit as entropy for lossless compression and as mutual information for lossy compression. However, the source (including text, music, and speech) in the real world is often statistically ill-defined because of its close connection to human perception, and thus the model-driven approach can be quite suboptimal. This study places careful emphasis on English text and exploits its semantic aspect to enhance the compression efficiency further. The main idea stems from the puzzle crossword, observing that the hidden words can still be precisely reconstructed so long as some key letters are provided. The proposed masking-based strategy resembles the above game. In a nutshell, the encoder evaluates the semantic importance of each word according to the semantic loss and then masks the minor ones, while the decoder aims to recover the masked words from the semantic context by means of the Transformer. Our experiments show that the proposed semantic approach can achieve much higher compression efficiency than the traditional methods such as Huffman code and UTF-8 code, while preserving the meaning in the target text to a great extent.

翻訳日:2023-04-04 14:35:15 公開日:2023-04-03

# RunBugRun - プログラムの自動修復のための実行可能なデータセット

RunBugRun -- An Executable Dataset for Automated Program Repair ( http://arxiv.org/abs/2304.01102v1 )

ライセンス: Link先を確認

Julian Aron Prenner and Romain Robbes

(参考訳) 近年、APR(Automated Program repair)において、特にディープニューラルネットワークへのデータ駆動技術への移行が注目されている。これは数十万、あるいは数百万の実行不能なコードフラグメントのトレーニングを伴います。我々は、ニューラルプログラム修復(NPR)でしばしば無視されるコードの側面、すなわちその実行にもっと注意を向けたいと思います。コード実行にはいくつかの大きな利点がある。候補修正をテストベースで評価することができ、修復を支援する貴重な情報を提供することができる。本研究では,8つの異なるプログラミング言語で書かれたプログラム競合サイトに提出された,45万個の小さなバグ/修正プログラムペアの完全な実行データセットを示す。データセットとともに、プログラムをコンパイル、安全に実行、テストするためのインフラストラクチャと、きめ細かいバグタイプのラベルを提供します。参照点を与えるため,提案手法は2つのベースラインに対して,1つは生成と検証に基づく評価結果であり,もう1つは深層学習に関する評価結果である。このデータセットでは、完全な静的コード表現を超えて、ニューラルプログラムの修復を強化し、実行ベースの機能の使用を促進し、いくつかの異なる言語を含めることで、現在のAPRデータセットとベンチマークの状況において、Javaの優位性と相反する、いくつかの目標を達成したいと考えています。

Recently, we can notice a transition to data-driven techniques in Automated Program Repair (APR), in particular towards deep neural networks. This entails training on hundreds of thousands or even millions of non-executable code fragments. We would like to bring more attention to an aspect of code often neglected in Neural Program Repair (NPR), namely its execution. Code execution has several significant advantages. It allows for test-based evaluation of candidate fixes and can provide valuable information to aid repair. In this work we present a fully executable dataset of 450,000 small buggy/fixed program pairs originally submitted to programming competition websites written in eight different programming languages. Along with the dataset we provide infrastructure to compile, safely execute and test programs as well as fine-grained bug-type labels. To give a point of reference, we provide basic evaluation results for two baselines, one based on a generate-and-validate approach and one on deep learning. With this dataset we follow several goals: we want to lift Neural Program Repair beyond fully static code representations, foster the use of execution-based features and, by including several different languages, counterbalance the predominance of Java in the current landscape of APR datasets and benchmarks.

翻訳日:2023-04-04 14:34:50 公開日:2023-04-03

# Dsfer-Net:近代ホップフィールドネットワークを用いたバイテンポラル変化検出のための深層スーパービジョンと特徴検索ネットワーク

Dsfer-Net: A Deep Supervision and Feature Retrieval Network for Bitemporal Change Detection Using Modern Hopfield Networks ( http://arxiv.org/abs/2304.01101v1 )

ライセンス: Link先を確認

Shizhen Chang, Michael Kopp, Pedram Ghamisi

(参考訳) 高解像度リモートセンシング画像への重要な応用として,地表面の変化の監視と解析が目的である。高解像度リモートセンシングデータの量の増加とテクスチャの特徴の複雑さにより、多くの定量的深層学習手法が提案されている。これらの手法は,深部特徴抽出と空間時空間情報の組み合わせによって従来の変化検出手法を上回っているが,検出性能向上における深い特徴の作用についての合理的な説明はいまだに欠けている。本研究では,現代のホップフィールドネットワーク層がセマンティック理解においてかなりの性能を発揮することを示す。本稿では,バイテンポラル変化検出のためのDeep Supervision and feature Retrieval Network (Dsfer-Net)を提案する。具体的には,完全畳み込み型シャムネットワークを用いて,バイテンポラル画像の高度に代表される深い特徴を抽出する。バイテンポラル画像の逐次的地理情報に基づいて特徴検索モジュールを設計し,特徴特徴を抽出し,識別情報を深く教師された方法で活用する。また,教師付き特徴検索モジュールは,提案するネットワークの深い層における意味的理解について説明可能な証明を与える。最後に、このエンドツーエンドネットワークは、異なるレイヤから取得した特徴と特徴ペアを集約することで、新しいフレームワークを実現する。 3つの公開データセット(LEVIR-CD、WHU-CD、CDD)で実施された実験は、提案したDsfer-Netが他の最先端手法よりも優れていることを確認した。コードはオンラインで入手できる(https://github.com/ShizhenChang/Dsfer-Net)。

Change detection, as an important application for high-resolution remote sensing images, aims to monitor and analyze changes in the land surface over time. With the rapid growth in the quantity of high-resolution remote sensing data and the complexity of texture features, a number of quantitative deep learning-based methods have been proposed. Although these methods outperform traditional change detection methods by extracting deep features and combining spatial-temporal information, reasonable explanations about how deep features work on improving the detection performance are still lacking. In our investigations, we find that modern Hopfield network layers achieve considerable performance in semantic understandings. In this paper, we propose a Deep Supervision and FEature Retrieval network (Dsfer-Net) for bitemporal change detection. Specifically, the highly representative deep features of bitemporal images are jointly extracted through a fully convolutional Siamese network. Based on the sequential geo-information of the bitemporal images, we then design a feature retrieval module to retrieve the difference feature and leverage discriminative information in a deeply supervised manner. We also note that the deeply supervised feature retrieval module gives explainable proofs about the semantic understandings of the proposed network in its deep layers. Finally, this end-to-end network achieves a novel framework by aggregating the retrieved features and feature pairs from different layers. Experiments conducted on three public datasets (LEVIR-CD, WHU-CD, and CDD) confirm the superiority of the proposed Dsfer-Net over other state-of-the-art methods. Code will be available online (https://github.com/ShizhenChang/Dsfer-Net).

翻訳日:2023-04-04 14:34:29 公開日:2023-04-03

# doctorglm:中国の医師の微調整はハーキュリアンの仕事ではない

DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task ( http://arxiv.org/abs/2304.01097v1 )

ライセンス: Link先を確認

Honglin Xiong, Sheng Wang, Yitao Zhu, Zihao Zhao, Yuxiao Liu, Qian Wang, Dinggang Shen

(参考訳) chatgptやgpt-4を含む大規模言語モデル(llm)の最近の進歩は、人間の指示に対する理解と応答において顕著である。にもかかわらず、これらのモデルは英語でよく機能し、医学領域で明示的に訓練されていないため、診断、医薬品の推奨、その他の医療アドバイスにおいて最適でない精度をもたらす。加えて、対話モデルの訓練と展開は、まだ病院にとって不可能であると考えられており、LLMの推進を妨げる。これらの課題に対処するため,我々はchatgptの助けを借りて,中国語の医療対話データベースを収集し,容易に展開できるllmの訓練手法をいくつか採用した。注目すべきは、ChatGLM-6Bを1台のA100 80Gで13時間で微調整できたことです。 DoctorGLMは現在、様々な誤りを含む初期段階のエンジニアリングの試みである。私たちは、医療に焦点を当てた機能を改善するためのフィードバックや提案を広くコミュニティと共有しています。

The recent progress of large language models (LLMs), including ChatGPT and GPT-4, in comprehending and responding to human instructions has been remarkable. Nevertheless, these models typically perform better in English and have not been explicitly trained for the medical domain, resulting in suboptimal precision in diagnoses, drug recommendations, and other medical advice. Additionally, training and deploying a dialogue model is still believed to be impossible for hospitals, hindering the promotion of LLMs. To tackle these challenges, we have collected databases of medical dialogues in Chinese with ChatGPT's help and adopted several techniques to train an easy-to-deploy LLM. Remarkably, we were able to fine-tune the ChatGLM-6B on a single A100 80G in 13 hours, which means having a healthcare-purpose LLM can be very affordable. DoctorGLM is currently an early-stage engineering attempt and contains various mistakes. We are sharing it with the broader community to invite feedback and suggestions to improve its healthcare-focused capabilities: https://github.com/xionghonglin/DoctorGLM.

翻訳日:2023-04-04 14:34:02 公開日:2023-04-03

# 忍における人間行動の緩和に向けたニューラルネットワークの進化III : 忍者師範の復帰

Evolving Artificial Neural Networks To Imitate Human Behaviour In Shinobi III : Return of the Ninja Master ( http://arxiv.org/abs/2304.01096v1 )

ライセンス: Link先を確認

Maximilien Le Clei

(参考訳) 私たちの社会はますます計算ツールが好きだ。この現象は、新しい人工知能パラダイムの出現に続いて、過去10年間で大きく増加している。具体的には、Deep Neural NetworksとStochastic Gradient Descentという2つのアルゴリズム技術の結合によって、計算能力が指数関数的に増加し、多くの現代技術において主要な資産となり続けている。しかし、進歩が進むにつれ、他の方法がこのような様々なハードウェアの進歩の恩恵を享受できるかどうか疑問視する向きもある。この研究をさらに進めるために、我々はこの論文を進化的アルゴリズムとその動的ニューラルネットワークへの応用に当てはめている。強力な計算資源を活用しながら新たな手法を考案することで、様々なベンチマークで強力なパフォーマンスを持つエージェントを開発できるだけでなく、ゲームshinobi iiiの人間と非常によく似たエージェントを開発できることがわかった。

Our society is increasingly fond of computational tools. This phenomenon has greatly increased over the past decade following, among other factors, the emergence of a new Artificial Intelligence paradigm. Specifically, the coupling of two algorithmic techniques, Deep Neural Networks and Stochastic Gradient Descent, driven by an exponentially increasing computing capacity, has become, and continues to be, a major asset in many modern technologies. However, as progress takes its course, some still wonder whether other methods could similarly or even more greatly benefit from these various hardware advances. In order to further this study, we delve in this thesis into Evolutionary Algorithms and their application to Dynamic Neural Networks, two techniques which despite enjoying many advantageous properties have yet to find their niche in contemporary Artificial Intelligence. We find that by elaborating new methods while exploiting strong computational resources, it becomes possible to develop strongly performing agents on a variety of benchmarks but also some other agents behaving very similarly to human subjects on the video game Shinobi III : Return of The Ninja Master, typical complex tasks previously out of reach for non-gradient-based optimization.

翻訳日:2023-04-04 14:33:45 公開日:2023-04-03

# キャプションの変更:リモートセンシングによる変更キャプションのための注意ネットワーク

Changes to Captions: An Attentive Network for Remote Sensing Change Captioning ( http://arxiv.org/abs/2304.01091v1 )

ライセンス: Link先を確認

Shizhen Chang and Pedram Ghamisi

(参考訳) 近年,自然言語処理(NLP)技術を用いたリモートセンシング画像の直接学習と解析に注目が集まっている。多時期リモートセンシング画像における変化を正確に記述する能力は,地理空間の理解や土地計画においてますます重要になっている。自然画像変化キャプションタスクとは異なり、リモートセンシング変化キャプションは、照明、季節効果、複雑な土地被覆など、さまざまな要因に関わらず、最も重要な変化を捉えることを目的としている。本研究では,リモートセンシング画像の変化を正確に記述することの重要性を強調し,自然画像と合成画像とリモートセンシング画像における変化キャプションタスクの比較を行う。正確なキャプション生成の課題に対処するため,両時間リモートセンシング画像に対して,Chg2Capと呼ばれる注意的変更対キャプションネットワークを提案する。ネットワークは3つの主要コンポーネントから構成される。 1) 画像ペアごとに高レベル表現を収集するシームズCNNに基づく特徴抽出器 2) 画像埋め込みを生成するための変更関連特徴の特定のための階層的自己注意ブロック及び残留ブロックを含む注意的復号器 3) 画像埋め込みと記述への単語埋め込みの関係をデコードするトランスベースのキャプション生成装置。提案するChg2Capネットワークを2つの代表的なリモートセンシングデータセットで評価し,総合的な実験分析を行った。コードと事前訓練されたモデルは https://github.com/ShizhenChang/Chg2Cap からオンラインで入手できる。

In recent years, advanced research has focused on the direct learning and analysis of remote sensing images using natural language processing (NLP) techniques. The ability to accurately describe changes occurring in multi-temporal remote sensing images is becoming increasingly important for geospatial understanding and land planning. Unlike natural image change captioning tasks, remote sensing change captioning aims to capture the most significant changes, irrespective of various influential factors such as illumination, seasonal effects, and complex land covers. In this study, we highlight the significance of accurately describing changes in remote sensing images and present a comparison of the change captioning task for natural and synthetic images and remote sensing images. To address the challenge of generating accurate captions, we propose an attentive changes-to-captions network, called Chg2Cap for short, for bi-temporal remote sensing images. The network comprises three main components: 1) a Siamese CNN-based feature extractor to collect high-level representations for each image pair; 2) an attentive decoder that includes a hierarchical self-attention block to locate change-related features and a residual block to generate the image embedding; and 3) a transformer-based caption generator to decode the relationship between the image embedding and the word embedding into a description. The proposed Chg2Cap network is evaluated on two representative remote sensing datasets, and a comprehensive experimental analysis is provided. The code and pre-trained models will be available online at https://github.com/ShizhenChang/Chg2Cap.

翻訳日:2023-04-04 14:33:25 公開日:2023-04-03

# 非可積分非負超マーチンガールに対する拡張ヴィルの不等式

The extended Ville's inequality for nonintegrable nonnegative supermartingales ( http://arxiv.org/abs/2304.01163v1 )

ライセンス: Link先を確認

Hongjian Wang and Aaditya Ramdas

(参考訳) ロビンズの最初の研究に続いて、積分性も有限性も必要とせず、非負超行列の延長理論を厳格に提示する。特に、ロビンズによって予見された重要な極大不等式が導出され、これは拡張ヴィルの不等式と呼ばれ、古典ヴィルの不等式(可積分な非負のスーパーマーチンガール)を強化し、また我々の非可積分な設定にも適用される。我々は混合法の拡張を導出し、拡張された非負超行列の$\sigma$-finite混合に適用する。非パラメトリックな信頼シーケンスの導出における不適切な混合(プライアー)や(拡張された)e-プロセスの使用など、シーケンシャルな統計に対する我々の理論のいくつかの意味を示す。

Following initial work by Robbins, we rigorously present an extended theory of nonnegative supermartingales, requiring neither integrability nor finiteness. In particular, we derive a key maximal inequality foreshadowed by Robbins, which we call the extended Ville's inequality, that strengthens the classical Ville's inequality (for integrable nonnegative supermartingales), and also applies to our nonintegrable setting. We derive an extension of the method of mixtures, which applies to $\sigma$-finite mixtures of our extended nonnegative supermartingales. We present some implications of our theory for sequential statistics, such as the use of improper mixtures (priors) in deriving nonparametric confidence sequences and (extended) e-processes.
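
For reference, the classical Ville's inequality that this abstract extends can be written as follows. The notation ($M_t$ for the process, $c$ for the threshold) is chosen here for illustration and is not taken from the abstract.

```latex
\[
  \Pr\Bigl( \sup_{t \ge 0} M_t \ge c \Bigr) \;\le\; \frac{\mathbb{E}[M_0]}{c}
  \qquad \text{for every } c > 0,
\]
```

Here $(M_t)_{t \ge 0}$ is an integrable nonnegative supermartingale. The bound is only meaningful when $\mathbb{E}[M_0]$ is finite, which is the integrability requirement the abstract says the extended inequality removes.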

翻訳日:2023-04-04 14:26:57 公開日:2023-04-03

# 確率的ミラー降下は敵の遅延攻撃に弱いか? 交通割り当てのレジリエンスに関する研究

Is Stochastic Mirror Descent Vulnerable to Adversarial Delay Attacks? A Traffic Assignment Resilience Study ( http://arxiv.org/abs/2304.01161v1 )

ライセンス: Link先を確認

Yunian Pan, Tao Li, and Quanyan Zhu

(参考訳) Intelligent Navigation Systems (INS) は、データ収集プロセス中に、INSとトランスポートネットワークの間の通信チャネルをインターセプトする情報攻撃ベクトルの増加に晒される。 INSの弾力性を測定するために、ウォードロップ非平衡解 (Wardrop Non-Equilibrium Solution, WANES) という概念を用いる。集中不等式の議論を用いることで、任意の有界なフィードバック遅延攻撃は、遅延ミラー降下(DMD)オンライン学習フレームワーク内のトラフィックフローの軌跡に沿って、$\tilde{\mathcal{O}}(\sqrt{{d^3}{T^{-1}}})$ のオーダーまで体系的なパフォーマンスを低下させるだけであることが判明した。この性能低下は軽度の仮定だけで起こり得る。以上の結果から,学習ベースのINSインフラストラクチャは,情報構造に一定期間の混乱が生じても,ウォードロップ非平衡を実現できることが示唆された。これらの発見は、輸送エコシステムの異なる層にまたがる妨害攻撃に対する防御メカニズムを設計するための貴重な洞察を提供する。

Intelligent Navigation Systems (INS) are exposed to an increasing number of informational attack vectors, which often intercept through the communication channels between the INS and the transportation network during the data collecting process. To measure the resilience of INS, we use the concept of a Wardrop Non-Equilibrium Solution (WANES), which is characterized by the probabilistic outcome of learning within a bounded number of interactions. By using concentration arguments, we have discovered that any bounded feedback delaying attack only degrades the systematic performance up to order $\tilde{\mathcal{O}}(\sqrt{{d^3}{T^{-1}}})$ along the traffic flow trajectory within the Delayed Mirror Descent (DMD) online-learning framework. This degradation in performance can occur with only mild assumptions imposed. Our result implies that learning-based INS infrastructures can achieve Wardrop Non-equilibrium even when experiencing a certain period of disruption in the information structure. These findings provide valuable insights for designing defense mechanisms against possible jamming attacks across different layers of the transportation ecosystem.

翻訳日:2023-04-04 14:26:39 公開日:2023-04-03

# DribbleBot: 野生での動的レッグ操作

DribbleBot: Dynamic Legged Manipulation in the Wild ( http://arxiv.org/abs/2304.01159v1 )

ライセンス: Link先を確認

Yandong Ji, Gabriel B. Margolis, Pulkit Agrawal

(参考訳) ドリブルボット(DribbleBot、Dexterous Ball Manipulation with a Legged Robot)は、人間と同じ現実の条件下でサッカーボールをドリブルできるロボットシステムである。我々は,強化学習を用いたシミュレーションにおけるトレーニング政策のパラダイムを採用し,それらを現実世界に移す。異なる地形における変動球運動の計算における批判的課題を克服し,オンボード・コンピューティングの制約下でボディマウントカメラを用いて球を知覚する。以上の結果から,現在の四足歩行プラットフォームは,同時移動と感覚観察から直接操作を含む動的全身制御問題の研究に適していることを示す。

DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged robotic system that can dribble a soccer ball under the same real-world conditions as humans (i.e., in-the-wild). We adopt the paradigm of training policies in simulation using reinforcement learning and transferring them into the real world. We overcome critical challenges of accounting for variable ball motion dynamics on different terrains and perceiving the ball using body-mounted cameras under the constraints of onboard computing. Our results provide evidence that current quadruped platforms are well-suited for studying dynamic whole-body control problems involving simultaneous locomotion and manipulation directly from sensory observations.

翻訳日:2023-04-04 14:26:19 公開日:2023-04-03

# 測定の超循環系

Hypercyclic systems of measurements ( http://arxiv.org/abs/2304.01155v1 )

ライセンス: Link先を確認

Victor H. Cervantes and Ehtibar N. Dzhafarov

(参考訳) 循環系は量子力学の基礎、特に文脈分析において主要な役割を担ってきた。現在までには、文脈性とその性質の異なる尺度を含む、外乱のない循環系に関する本質的に完全な理論が存在する。本稿では, 循環システムのクラスを一般化し, 構造的特性のいくつかを保存した新しい種類の測定システムを紹介する。これらの超循環系の理論的および実験的解析は、文脈性の理論の発展に有用であることが示唆される。

Cyclic systems have played a dominant role in the foundations of quantum mechanics, especially in contextuality analysis. By now we have an essentially complete theory of the cyclic systems, both without and with disturbance, including different measures of contextuality and their properties. In this concept paper we introduce a new class of systems of measurements, one that generalizes the class of cyclic systems while preserving some of their structural characteristics. We suggest that theoretical and experimental analysis of these hypercyclic systems may prove to be beneficial in developing theories of contextuality.

翻訳日:2023-04-04 14:26:07 公開日:2023-04-03

# 空間ネットワークのための代数的および幾何学的モデル

Algebraic and Geometric Models for Space Networking ( http://arxiv.org/abs/2304.01150v1 )

ライセンス: Link先を確認

William Bernardoni, Robert Cardona, Jacob Cleveland, Justin Curry, Robert Green, Brian Heller, Alan Hylton, Tung Lam, Robert Kassouf-Short

(参考訳) 本稿では,ネットワーク空間通信における代数的および幾何学的視点を紹介する。我々の主な貢献は、実数直線 P(R) の部分集合の値を持つ行列の項で定義される時間変化グラフ(TVG)の新たな定義である。我々は、P(R) の半環特性を利用して、行列乗算と切り離されたクリーネ星を用いたテレビGにおけるマルチホップ通信をモデル化する。これにより、無作為に選択されたSTARLINK衛星の大規模なサンプルに対して、ライフタイムカーブと呼ばれるTVGの通信能力に関する新たな統計が生み出される。トポロジカルデータ解析(TDA)にインスパイアされた新しい指標を用いて,STARLINKの大規模サブサンプルが時間的に強く連結されている場合の判定を行う。地球と火星の間のネットワークシナリオをより良くモデル化するために,伝播遅延をモデル化できる様々なセミリングと,保存・フォワードなどの遅延耐性ネットワーク(DTN)に共通するプロトコルを導入する。最後に,異なる宇宙ネットワークの実現に向けたzigzagの持続性の適用可能性を示し,k-nearest neighbors (knn) 分類による時変トポロジーのみを用いた地球・月衛星の識別の有効性を示す。

In this paper we introduce some new algebraic and geometric perspectives on networked space communications. Our main contribution is a novel definition of a time-varying graph (TVG), defined in terms of a matrix with values in subsets of the real line P(R). We leverage semi-ring properties of P(R) to model multi-hop communication in a TVG using matrix multiplication and a truncated Kleene star. This leads to novel statistics on the communication capacity of TVGs called lifetime curves, which we generate for large samples of randomly chosen STARLINK satellites, whose connectivity is modeled over day-long simulations. Determining when a large subsample of STARLINK is temporally strongly connected is further analyzed using novel metrics introduced here that are inspired by topological data analysis (TDA). To better model networking scenarios between the Earth and Mars, we introduce various semi-rings capable of modeling propagation delay as well as protocols common to Delay Tolerant Networking (DTN), such as store-and-forward. Finally, we illustrate the applicability of zigzag persistence for featurizing different space networks and demonstrate the efficacy of K-Nearest Neighbors (KNN) classification for distinguishing Earth-Mars and Earth-Moon satellite systems using time-varying topology alone.
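
The matrix view of a time-varying graph described above can be sketched as follows. This is a toy illustration under explicit assumptions, not the paper's construction: subsets of the real line are restricted to finite unions of closed intervals, set union plays the role of semiring addition and intersection the role of multiplication, hops are instantaneous (no propagation delay), and the truncated Kleene star is not implemented. All names (`union`, `intersect`, `matmul`, `ALWAYS`) are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Subset = List[Interval]          # finite union of closed intervals
Matrix = List[List[Subset]]

def union(a: Subset, b: Subset) -> Subset:
    """Assumed semiring 'addition': merge overlapping intervals of a and b."""
    merged: Subset = []
    for lo, hi in sorted(a + b):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def intersect(a: Subset, b: Subset) -> Subset:
    """Assumed semiring 'multiplication': times contained in both a and b."""
    out: Subset = []
    for lo1, hi1 in a:
        for lo2, hi2 in b:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo <= hi:
                out.append((lo, hi))
    return union(out, [])

def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Entry (i, j): times at which some relay k links i to k and k to j."""
    n = len(A)
    C: Matrix = [[[] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] = union(C[i][j], intersect(A[i][k], B[k][j]))
    return C

ALWAYS: Subset = [(float("-inf"), float("inf"))]
# Three nodes: 0 and 1 are in contact during [0, 5], 1 and 2 during [3, 8].
A: Matrix = [
    [ALWAYS, [(0.0, 5.0)], []],
    [[(0.0, 5.0)], ALWAYS, [(3.0, 8.0)]],
    [[], [(3.0, 8.0)], ALWAYS],
]
two_hop = matmul(A, A)
```

In this example `two_hop[0][2]` is the window during which node 0 can reach node 2 through node 1, namely the overlap of the two contact windows.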

翻訳日:2023-04-04 14:25:59 公開日:2023-04-03
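
The semiring construction described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a contact window is a finite union of closed intervals, that semiring addition is set union and multiplication is set intersection (so a two-hop route exists only while both links are up at once, with no propagation delay), and that a finite `horizon` interval stands in for the whole real line. The names `TimeSet`, `truncated_star` and `measure` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
TimeSet = List[Interval]  # sorted, pairwise-disjoint closed intervals


def normalize(s: TimeSet) -> TimeSet:
    """Sort intervals and merge any that overlap or touch."""
    out: TimeSet = []
    for a, b in sorted(s):
        if a > b:
            continue  # skip empty intervals
        if out and a <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], b))
        else:
            out.append((a, b))
    return out


def union(x: TimeSet, y: TimeSet) -> TimeSet:
    """Semiring addition: times at which either set is active."""
    return normalize(x + y)


def intersect(x: TimeSet, y: TimeSet) -> TimeSet:
    """Semiring multiplication: times at which both sets are active."""
    out: TimeSet = []
    for a, b in x:
        for c, d in y:
            lo, hi = max(a, c), min(b, d)
            if lo <= hi:
                out.append((lo, hi))
    return normalize(out)


def matmul(A, B):
    """Matrix product over the (union, intersection) semiring."""
    n = len(A)
    C = [[[] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc: TimeSet = []
            for k in range(n):
                acc = union(acc, intersect(A[i][k], B[k][j]))
            C[i][j] = acc
    return C


def truncated_star(A, hops: int, horizon: Interval):
    """I + A + A^2 + ... + A^hops: times at which j is reachable from i in at most `hops` hops."""
    n = len(A)
    identity = [[[horizon] if i == j else [] for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(hops):
        power = matmul(power, A)
        total = [[union(total[i][j], power[i][j]) for j in range(n)] for i in range(n)]
    return total


def measure(s: TimeSet) -> float:
    """Total length of a time set (one possible 'lifetime' summary)."""
    return sum(b - a for a, b in s)


# Three nodes: link 0->1 is up during [0, 5], link 1->2 during [3, 8].
A = [[[], [(0.0, 5.0)], []],
     [[], [], [(3.0, 8.0)]],
     [[], [], []]]
S = truncated_star(A, hops=2, horizon=(0.0, 10.0))
print(S[0][2])  # [(3.0, 5.0)]: node 2 is reachable from node 0 only while both links overlap
```

Modelling propagation delay or store-and-forward, as the abstract mentions, would need a different semiring than the plain intersection used here.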

# 頭を使う: ロングテールビデオ認識の改善

Use Your Head: Improving Long-Tail Video Recognition ( http://arxiv.org/abs/2304.01143v1 )

ライセンス: Link先を確認

Toby Perrett, Saptarshi Sinha, Tilo Burghardt, Majid Mirmehdi, Dima Damen

(参考訳) 本稿では,ロングテールビデオ認識について検討する。自然に収集されたビデオデータセットや既存のロングテール画像ベンチマークとは異なり、現在のビデオベンチマークは複数のロングテール特性で不足していることを示す。最も重要なのは、テールに少数ショットのクラスが欠けていることである。そこで本研究では,SSv2とVideoLTの2つのデータセットからサブセットをサンプリングすることで,ロングテール認識をより適切に評価する新しいビデオベンチマークを提案する。次に,少数ショットクラスのインスタンスをヘッドクラスのサンプルの重み付き組み合わせとして再構成することで,それらへの過剰適合を低減するLong-Tail Mixed Reconstruction(LMR)を提案する。 LMRはさらにラベル混合を用いてロバストな決定境界を学習する。 EPIC-KITCHENS と提案した SSv2-LT と VideoLT-LT で最先端の平均クラス精度を実現する。ベンチマークとコード: tobyperrett.github.io/lmr

This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sampling subsets from two datasets: SSv2 and VideoLT. We then propose a method, Long-Tail Mixed Reconstruction, which reduces overfitting to instances from few-shot classes by reconstructing them as weighted combinations of samples from head classes. LMR then employs label mixing to learn robust decision boundaries. It achieves state-of-the-art average class accuracy on EPIC-KITCHENS and the proposed SSv2-LT and VideoLT-LT. Benchmarks and code at: tobyperrett.github.io/lmr

翻訳日:2023-04-04 14:25:34 公開日:2023-04-03
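
The idea of reconstructing a few-shot sample as a weighted combination of head-class samples and then mixing labels can be sketched as follows. This is a hypothetical illustration, not the released LMR code. The softmax over dot-product similarities, the `keep` coefficient and the function name `reconstruct_tail_sample` are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def reconstruct_tail_sample(
    tail_feat: List[float],
    head_feats: List[List[float]],
    head_labels: List[int],
    tail_label: int,
    num_classes: int,
    keep: float = 0.5,  # assumed share of the original sample that is retained
) -> Tuple[List[float], List[float]]:
    """Blend a tail-class feature with a similarity-weighted mix of head features.

    Returns the mixed feature and a soft label that mixes the tail label
    with the head labels using the same weights.
    """
    weights = softmax([dot(tail_feat, h) for h in head_feats])
    recon = [
        sum(w * h[d] for w, h in zip(weights, head_feats))
        for d in range(len(tail_feat))
    ]
    mixed_feat = [keep * t + (1.0 - keep) * r for t, r in zip(tail_feat, recon)]

    mixed_label = [0.0] * num_classes
    mixed_label[tail_label] += keep
    for w, y in zip(weights, head_labels):
        mixed_label[y] += (1.0 - keep) * w
    return mixed_feat, mixed_label


feat, label = reconstruct_tail_sample(
    tail_feat=[1.0, 0.0],
    head_feats=[[1.0, 0.0], [0.0, 1.0]],
    head_labels=[0, 1],
    tail_label=2,
    num_classes=3,
)
print(feat, label)
```

The soft label always sums to one, so it can be used directly with a cross-entropy loss over soft targets.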

# 2つの非エルゴード可逆セルオートマトン、1つは古典的、もう1つは量子的

On two non-ergodic reversible cellular automata, one classical, the other quantum ( http://arxiv.org/abs/2304.01130v1 )

ライセンス: Link先を確認

Tomaz Prosen

(参考訳) 本稿では, 1+1次元のセルオートマトンと, さらなる研究や応用を保証できる, 単純で興味深い性質を持つ2種類の運動粒子モデルを提案し, 議論する。最初のモデルは2種類の準粒子を記述した決定論的で可逆的なオートマトンであり、安定な質量を持たない物質粒子は速度$\pm 1$で動き、不安定で立ち上がり(速度ゼロ)の磁場粒子である。モデルの3つの保存電荷に対する2つの異なる連続性方程式について議論する。最初の2つの電荷と対応する電流は、3つの(3)格子サイトをサポートし、保存されたエネルギー-運動量テンソルの格子類似体を表すが、9つの(9)サイトをサポートする追加の保存電荷と電流を見つける。 2つ目のモデルは、最近導入され研究された荷電ハードポイント格子気体の量子(または確率的)変形を表しており、異なる二元電荷(\pm 1$)と二元速度(\pm 1$)の粒子は弾性衝突散乱によって非自明に混合することができる。このモデルのユニタリ進化則は、ヤン・バクスター方程式を完全に満たさないが、局所保存作用素の無限集合、いわゆるグライダー作用素を産む興味深い関連する同一性を満たす。

We propose and discuss two variants of kinetic particle models - cellular automata in 1+1 dimensions - which are appealing for their simplicity and for intriguing properties that could warrant further research and applications. The first model is a deterministic and reversible automaton describing two species of quasiparticles: stable massless matter particles moving with velocity $\pm 1$ and unstable, standing (zero velocity) field particles. We discuss two distinct continuity equations for three conserved charges of the model. While the first two charges and the corresponding currents have support on three (3) lattice sites and represent a lattice analogue of the conserved energy-momentum tensor, we find an additional conserved charge and current with support on nine (9) sites, implying non-ergodic behaviour and potentially signalling integrability of the model with a highly nested R-matrix structure. The second model represents a quantum (or stochastic) deformation of a recently introduced and studied charged hardpoint lattice gas, where particles of different binary charge ($\pm 1$) and binary velocity ($\pm 1$) can nontrivially mix upon elastic collisional scattering. We show that while the unitary evolution rule of this model does not satisfy the full Yang-Baxter equation, it still satisfies an intriguing related identity which gives rise to an infinite set of local conserved operators, the so-called glider operators.

翻訳日:2023-04-04 14:25:21 公開日:2023-04-03

# 定量的表現と高次元分布距離を用いた合成パラメータ効果検出

Synthesis parameter effect detection using quantitative representations and high dimensional distribution distances ( http://arxiv.org/abs/2304.01120v1 )

ライセンス: Link先を確認

Alex Hagen, Shane Jackson

(参考訳) 合成過程のパラメータが材料の微細構造に及ぼす影響の検出は、材料科学の重要で、しかし、理解に足らない目標である。我々は,Pu(III)オキサレートから酸化プルトニウムを合成する設計実験を解析するために,コプラ理論,高次元分布距離,置換統計に基づく効果の検出法を開発した。結果の酸化プルトニウムの微細構造に及ぼすストライクオーダーとシュウ酸フィードの影響を,文献とよく一致させた。また, 酸性濃度, ストライクオーダー, 降水温度の2つのペア間の過剰な二変量効果も検出した。

Detection of effects of the parameters of the synthetic process on the microstructure of materials is an important, yet elusive goal of materials science. We develop a method for detecting effects based on copula theory, high dimensional distribution distances, and permutational statistics to analyze a designed experiment synthesizing plutonium oxide from Pu(III) Oxalate. We detect effects of strike order and oxalic acid feed on the microstructure of the resulting plutonium oxide, which match the literature well. We also detect excess bivariate effects between the pairs of acid concentration, strike order and precipitation temperature.

翻訳日:2023-04-04 14:24:51 公開日:2023-04-03
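
Effect detection with a distribution distance and a permutation test, as described in the abstract above, can be illustrated with a small sketch. It deliberately swaps in a simpler setup than the paper's: a one-dimensional energy distance instead of copula-based high-dimensional distances, and made-up sample values. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def mean_abs_diff(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Average |x - y| over all pairs drawn from xs and ys."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Energy distance between two 1-D samples (zero when the samples coincide)."""
    return 2 * mean_abs_diff(xs, ys) - mean_abs_diff(xs, xs) - mean_abs_diff(ys, ys)


def permutation_p_value(
    xs: Sequence[float], ys: Sequence[float], n_perm: int = 500, seed: int = 0
) -> float:
    """Share of label shufflings whose distance is at least the observed one."""
    rng = random.Random(seed)
    observed = energy_distance(xs, ys)
    pooled: List[float] = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_distance(pooled[: len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Illustrative data only: a microstructure descriptor under two synthesis settings.
setting_a = [0.1 * i for i in range(10)]
setting_b = [5.0 + 0.1 * i for i in range(10)]
print(permutation_p_value(setting_a, setting_b))
```

A small p-value suggests that the synthesis setting changes the distribution of the descriptor. Testing bivariate effects, as in the abstract, would need a design that permutes within levels of the other factor.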

# データサイエンスのための解釈可能なシンボリック回帰:2022年競争の分析

Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition ( http://arxiv.org/abs/2304.01117v1 )

ライセンス: Link先を確認

F. O. de Franca, M. Virgolin, M. Kommenda, M. S. Majumder, M. Cranmer, G. Espada, L. Ingelse, A. Fonseca, M. Landajuela, B. Petersen, R. Glatt, N. Mundhenk, C. S. Lee, J. D. Hochhalter, D. L. Randall, P. Kamienny, H. Zhang, G. Dick, A. Simon, B. Burlacu, Jaan Kasak, Meera Machado, Casper Wilstrup, W. G. La Cava

(参考訳) 記号回帰は、研究対象の現象を正確に記述する解析式を探索する。このアプローチの主な魅力は、ユーザにとって洞察力のある解釈可能なモデルを返すことだ。歴史的に、記号回帰のアルゴリズムの大半は進化的アルゴリズムに基づいている。しかし、最近、列挙アルゴリズム、混合整数線形計画法、ニューラルネットワーク、ベイズ最適化のようなアプローチを利用する新しい提案が急増している。これらの新しいアプローチが現実世界のデータでしばしば直面する共通の課題に対してどのように振る舞うかを評価するために、私たちは2022年のGenetic and Evolutionary Computation Conferenceで、参加者には伏せられた様々な合成データセットと実世界データセットからなるコンペティションを開催した。実世界のトラックでは,ドメインエキスパートを用いて候補モデルの信頼性を判断し,現実的に解釈可能性を評価した。このコンペで得られた結果の詳細な分析を行い,記号回帰アルゴリズムの現在の課題について議論し,今後のコンペティションの改善の可能性を明らかにする。

Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, the majority of algorithms for symbolic regression have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead utilize approaches such as enumeration algorithms, mixed linear integer programming, neural networks, and Bayesian optimization. In order to assess how well these new approaches behave on a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets which were blind to entrants. For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms and highlight possible improvements for future competitions.

翻訳日:2023-04-04 14:24:41 公開日:2023-04-03

# ReMoDiffuse:Retrieval-Augmented Motion Diffusion Model

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model ( http://arxiv.org/abs/2304.01116v1 )

ライセンス: Link先を確認

Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu

(参考訳) 3Dモーション生成はクリエイティブ産業にとって不可欠だ。最近の進歩は、テキスト駆動モーション生成のためのドメイン知識を持つ生成モデルに依存している。しかし、より多様な動きでの演奏は満足できないままである。本研究では,検索機構を統合した拡散モデルに基づく動き生成フレームワーク remodiffuse を提案する。 ReMoDiffuseは3つの重要な設計でテキスト駆動モーション生成の一般化性と多様性を高める 1) ハイブリッド検索は, 意味的およびキネマティックな類似性の観点から, データベースから適切な参照を求める。 2)Semantic-Modulated Transformerは検索知識を選択的に吸収し,検索したサンプルと対象の動作シーケンスの差に適応する。 3) 条件混合は, 推論中に検索データベースをより活用し, 分類器フリーガイダンスの尺度感度を克服する。広範な実験により、remodiffuseは、特により多様なモーション生成のために、テキスト・モーションの一貫性と動作品質の両方をバランスさせることにより、最先端の手法よりも優れていることが示されている。

3D human motion generation is crucial for the creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, the performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. ReMoDiffuse enhances the generalizability and diversity of text-driven motion generation with three key designs: 1) Hybrid Retrieval finds appropriate references from the database in terms of both semantic and kinematic similarities. 2) Semantic-Modulated Transformer selectively absorbs retrieval knowledge, adapting to the difference between retrieved samples and the target motion sequence. 3) Condition Mixture better utilizes the retrieval database during inference, overcoming the scale sensitivity in classifier-free guidance. Extensive experiments demonstrate that ReMoDiffuse outperforms state-of-the-art methods by balancing both text-motion consistency and motion quality, especially for more diverse motion generation.

翻訳日:2023-04-04 14:24:24 公開日:2023-04-03

# 量子インスツルメンテーション制御キットシステムを用いた絡み合った光子対源デモンストレータ

Entangled Photon Pair Source Demonstrator using the Quantum Instrumentation Control Kit System ( http://arxiv.org/abs/2304.01190v1 )

ライセンス: Link先を確認

Si Xie, Leandro Stefanazzi, Christina Wang, Cristian Pena, Raju Valivarthi, Lautaro Narvaez, Gustavo Cancelo, Keshav Kapoor, Boris Korzh, Matthew Shaw, Panagiotis Spentzouris, Maria Spiropulu

(参考訳) 本稿では,RFSoC-FPGA技術を用いた量子計測制御キット(QICK)システムによる光子対の絡み合った光源の駆動と光子信号の検出について報告する。 QICKシステムでは、同時計数対偶発計数比(coincidence-to-accidental ratio)が150を超え、エンタングルメントの可視性は95%を超え、従来の波形生成器を用いた性能測定値と一致している。また,QICKのディジタル化機能を用いた検出器の同時読み出しを行い,内部システム同期の時間分解能3.2 psを実現した。本稿では,量子ネットワークの動作において,商用波形生成器とタイムタガーをRFSoC-FPGA技術で置き換える実現可能性を明確に示し,コストを1桁以上削減することを示す。

We report the first demonstration of using the Quantum Instrumentation and Control Kit (QICK) system on RFSoC-FPGA technology to drive an entangled photon pair source and to detect the photon signals. With the QICK system, we achieve high levels of performance metrics including coincidence-to-accidental ratio exceeding 150, and entanglement visibility exceeding 95%, consistent with performance metrics achieved using conventional waveform generators. We also demonstrate simultaneous detector readout using the digitization functionality of QICK, achieving internal system synchronization time resolution of 3.2 ps. The work reported in this paper represents an explicit demonstration of the feasibility for replacing commercial waveform generators and time taggers with RFSoC-FPGA technology in the operation of a quantum network, representing a cost reduction of more than an order of magnitude.

翻訳日:2023-04-04 14:17:16 公開日:2023-04-03
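
The coincidence-to-accidental ratio quoted in the abstract above can be illustrated with a short sketch that works on lists of detector time tags. It is not QICK code and uses no QICK API. The window width, the displaced delay used to estimate accidentals, and the plain ratio (rather than a background-subtracted one) are assumptions, and the time tags are made up.

```python
from typing import Sequence


def count_coincidences(
    tags_a: Sequence[float], tags_b: Sequence[float], window: float, delay: float = 0.0
) -> int:
    """Count pairs with |t_b - (t_a + delay)| <= window / 2. Both inputs must be sorted."""
    count = 0
    j = 0
    for ta in tags_a:
        lo = ta + delay - window / 2
        hi = ta + delay + window / 2
        # Advance past tags that are too early for this and all later t_a.
        while j < len(tags_b) and tags_b[j] < lo:
            j += 1
        k = j
        while k < len(tags_b) and tags_b[k] <= hi:
            count += 1
            k += 1
    return count


def coincidence_to_accidental_ratio(
    tags_a: Sequence[float], tags_b: Sequence[float], window: float, accidental_delay: float
) -> float:
    """Coincidences at zero delay divided by coincidences in an equally wide displaced window."""
    true_counts = count_coincidences(tags_a, tags_b, window)
    accidental_counts = count_coincidences(tags_a, tags_b, window, delay=accidental_delay)
    if accidental_counts == 0:
        return float("inf")
    return true_counts / accidental_counts


# Made-up time tags in arbitrary units.
signal = [0.0, 100.0, 200.0, 300.0]
idler = [1.0, 101.0, 201.0, 350.0]
print(coincidence_to_accidental_ratio(signal, idler, window=10.0, accidental_delay=50.0))  # 3.0
```

In practice the accidental rate is usually averaged over many displaced windows to reduce statistical noise.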

# Poseをフォローする: Pose-Guided Text-to-Video Generation by Pose-free Videos

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos ( http://arxiv.org/abs/2304.01186v1 )

ライセンス: Link先を確認

Yue Ma, Yingqing He, Xiaodong Cun, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen

(参考訳) テキスト編集可能でポーズ制御可能なキャラクタビデオの生成は、さまざまなデジタルヒューマンを作成する上で強く求められている。それでも、このタスクは、ビデオとポーズのキャプションを対にした包括的なデータセットと、ビデオの生成的事前モデルが存在しないことで制限されている。本研究では,手軽に得られるデータセット(画像・ポーズのペアとポーズフリービデオ)と事前学習されたテキスト・ツー・イメージ(T2I)モデルを活用し,ポーズ制御可能なキャラクタビデオを得ることのできる,新たな2段階学習方式を提案する。具体的には、第1段階では、キーポイントと画像のペアのみを、制御可能なテキストからの画像生成に使用する。我々はポーズ情報をエンコードするゼロ初期化畳み込みエンコーダを学習する。第2段階では,学習可能な時間的自己注意ブロックと再構成されたクロスフレーム自己注意ブロックを付加することにより,ポーズフリービデオデータセットを介して,上記ネットワークの動作を微調整する。本手法は,新たな設計により,事前学習したT2Iモデルの編集と概念合成能力を維持しつつ,連続的なポーズ制御可能なキャラクタビデオの生成に成功している。コードとモデルは公開される予定だ。

Generating text-editable and pose-controllable character videos is in high demand for creating various digital humans. Nevertheless, this task has been restricted by the absence of a comprehensive dataset featuring paired video-pose captions and of generative prior models for videos. In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image-pose pairs and pose-free videos) and the pre-trained text-to-image (T2I) model to obtain pose-controllable character videos. Specifically, in the first stage, only the keypoint-image pairs are used, for controllable text-to-image generation. We learn a zero-initialized convolutional encoder to encode the pose information. In the second stage, we finetune the motion of the above network via a pose-free video dataset by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. Powered by our new designs, our method successfully generates continuously pose-controllable character videos while keeping the editing and concept composition ability of the pre-trained T2I model. The code and models will be made publicly available.

翻訳日:2023-04-04 14:17:00 公開日:2023-04-03

# weaktr: 弱教師付き意味セグメンテーションのためのプレーンビジョントランスフォーマの検討

WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation ( http://arxiv.org/abs/2304.01184v1 )

ライセンス: Link先を確認

Lianghui Zhu, Yingyue Li, Jieming Fang, Yan Liu, Hao Xin, Wenyu Liu, Xinggang Wang

(参考訳) 本稿では,Weakly-supervised Semantic Segmentation (WSSS) のためのプレーンビジョン変換器 (ViT) の特性について検討する。クラスアクティベーションマップ(CAM)は、分類ネットワークを理解してWSSSを起動する上で非常に重要である。我々は、ViTの異なるアテンションヘッドが異なる画像領域に焦点を当てていることを観察する。そこで, より完全な対象を持つ傾向のある高品質CAM結果に対して, 自己注意マップを適応的に融合させながら, 注目ヘッドの重要性をエンドツーエンドで推定する重みベースの手法を提案する。さらに,CAMの結果をオンラインリトレーニングしてWSSSタスクを完了するためのViTベースの勾配クリッピングデコーダを提案する。我々はこの平易なTransformerベースのWeakly教師付き学習フレームワークをWeakTrと名付けた。標準的なベンチマークでは、PASCAL VOC 2012のvalセットでは78.4% mIoU、COCO 2014のvalセットでは50.3% mIoUという最先端のWSSS性能を達成する。コードはhttps://github.com/hustvl/WeakTr で入手できる。

This paper explores the properties of the plain Vision Transformer (ViT) for Weakly-supervised Semantic Segmentation (WSSS). The class activation map (CAM) is of critical importance for understanding a classification network and launching WSSS. We observe that different attention heads of ViT focus on different image areas. Thus a novel weight-based method is proposed to end-to-end estimate the importance of attention heads, while the self-attention maps are adaptively fused for high-quality CAM results that tend to have more complete objects. Besides, we propose a ViT-based gradient clipping decoder for online retraining with the CAM results to complete the WSSS task. We name this plain Transformer-based Weakly-supervised learning framework WeakTr. It achieves the state-of-the-art WSSS performance on standard benchmarks, i.e., 78.4% mIoU on the val set of PASCAL VOC 2012 and 50.3% mIoU on the val set of COCO 2014. Code is available at https://github.com/hustvl/WeakTr.

翻訳日:2023-04-04 14:16:38 公開日:2023-04-03

# Schrödinger方程式の非線形拡大の完全可解モデル

Exactly solvable models of nonlinear extensions of the Schrödinger equation ( http://arxiv.org/abs/2304.01183v1 )

ライセンス: Link先を確認

Tom Dodge and Peter Schweitzer

(参考訳) Schrödinger方程式の完全可解な非線形拡張を構成する方法を提案する。この方法は、完全に解ける通常のSchrödinger方程式と、完全に解ける非線形理論の間に一定の条件下で確立できる対応を利用する。本手法の具体例をいくつか紹介する。我々はよく知られたソリトン解を再導出し、様々な空間次元において、我々の知る限り文献でまだ議論されていない新しい完全可解な非線形理論を見つける。この手法は、さらなる非線形理論の構築に利用でき、相対論的ソリトン理論に一般化することができ、多くの応用が期待できる。

A method is presented to construct exactly solvable nonlinear extensions of the Schrödinger equation. The method explores a correspondence which can be established under certain conditions between exactly solvable ordinary Schrödinger equations and exactly solvable nonlinear theories. We provide several examples illustrating the method. We rederive well-known soliton solutions and find new exactly solvable nonlinear theories in various space dimensions which, to the best of our knowledge, have not yet been discussed in the literature. Our method can be used to construct further nonlinear theories and generalized to relativistic soliton theories, and may have many applications.

翻訳日:2023-04-04 14:16:20 公開日:2023-04-03

# 双極子対称性破壊からの非フェルミ液体

Non-Fermi Liquids from Dipolar Symmetry Breaking ( http://arxiv.org/abs/2304.01181v1 )

ライセンス: Link先を確認

Amogh Anakru, Zhen Bi

(参考訳) フラクトロニック位相の出現と量子力学の新しい普遍性クラスは、凝縮系における双極子対称性の重要性を強調している。本研究では,種々の空間次元のフェルミオンモデルにおける双極子対称性の対称性破断相の性質について検討する。このような系では、フェルミオンは双極子凝縮によってエネルギー分散を得る。変換対称性と双極子対称性の間の非自明な可換性のため、二極子縮合の金石モードは分散フェルミオンに強く結合し、自然に低エネルギーで非フェルミ液体を生じさせる。双極子対称性の破れ相のIR記述は、創発的U(1)ゲージ場と結合するフェルミ曲面のよく知られた理論に類似している。また,双極子対称性がわずかに破れた場合の交叉挙動と異方性双極子保存の場合についても論じる。

The emergence of fractonic topological phases and novel universality classes for quantum dynamics highlights the importance of dipolar symmetry in condensed matter systems. In this work, we study the properties of symmetry-breaking phases of the dipolar symmetries in fermionic models in various spatial dimensions. In such systems, fermions obtain energy dispersion through dipole condensation. Due to the nontrivial commutation between the translation symmetry and dipolar symmetry, the Goldstone modes of the dipolar condensate are strongly coupled to the dispersive fermions and naturally give rise to non-Fermi liquids at low energies. The IR description of the dipolar symmetry-breaking phase is analogous to the well-known theory of a Fermi surface coupled to an emergent U(1) gauge field. We also discuss the crossover behavior when the dipolar symmetry is slightly broken and the cases with anisotropic dipolar conservation.

翻訳日:2023-04-04 14:16:10 公開日:2023-04-03

# BERTを用いたパーラーにおけるヘイトスピーチターゲット検出

Hate Speech Targets Detection in Parler using BERT ( http://arxiv.org/abs/2304.01179v1 )

ライセンス: Link先を確認

Nadav Schneider, Shimon Shouei, Saleem Ghantous, Elad Feldman

(参考訳) オンラインソーシャルネットワークは、私たちの日常生活の基本的な構成要素となっている。残念ながら、これらのプラットフォームはヘイトスピーチの舞台でもある。人気ソーシャルネットワークはヘイトスピーチに対する規則を定めている。その結果、ParlerやGabのように、言論の自由のプラットフォームであると提唱し主張するソーシャルネットワークが発展してきた。これらのプラットフォームは、様々なターゲットに対するヘイトスピーチの場となっている。本稿では、ヘイトスピーチとそのターゲットを検出するパイプラインを提示し、それを用いてParlerにおけるヘイトターゲットの分布を作成する。パイプラインは2つのモデルで構成されており、1つはヘイトスピーチ検出用、もう1つはターゲット分類用で、いずれもBERTに基づき、結果を改善するために逆翻訳(Back-Translation)とデータ前処理を用いる。この作業で使用されるソースコードと他の関連するソースは、https://github.com/NadavSc/HateRecognition.git で公開されている。

Online social networks have become a fundamental component of our everyday life. Unfortunately, these platforms are also a stage for hate speech. Popular social networks have regularized rules against hate speech. Consequently, social networks like Parler and Gab advocating and claiming to be free speech platforms have evolved. These platforms have become a district for hate speech against diverse targets. We present in our paper a pipeline for detecting hate speech and its targets and use it for creating Parler hate targets' distribution. The pipeline consists of two models; one for hate speech detection and the second for target classification, both based on BERT with Back-Translation and data pre-processing for improved results. The source code used in this work, as well as other relevant sources, are available at: https://github.com/NadavSc/HateRecognition.git

翻訳日:2023-04-04 14:15:55 公開日:2023-04-03

# エンタングルメントスペクトル平坦性による非安定化性の定量化

Quantifying non-stabilizerness through entanglement spectrum flatness ( http://arxiv.org/abs/2304.01175v1 )

ライセンス: Link先を確認

Emanuele Tirrito, Poetri Sonya Tarabunga, Gugliemo Lami, Titas Chanda, Lorenzo Leone, Salvatore F.E. Oliviero, Marcello Dalmonte, Mario Collura, and Alioscia Hamma

(参考訳) 非安定化性(non-stabilizerness)は、口語的にマジックとも呼ばれ、量子コンピューティングにおける優位性のための資源であり、非クリフォード演算へのアクセスに由来する。非安定化性がどのように定量化され、他の量子資源とどのように関連しているかを包括的に理解することは、量子複雑性の起源の研究と特徴付けに不可欠である。本研究では、純粋量子状態に対する非安定化性とエンタングルメントスペクトルの平坦性との直接的な関係を確立する。この関係を利用して、ノイズがあっても非安定化性を効率よく調べられることを示す。以上の結果から,非安定化性とエンタングルメント応答の直接関係を明らかにし,冷却原子および固体プラットフォームにおける非安定化性を調べるための明確な実験プロトコルを定義した。

Non-stabilizerness - also colloquially referred to as magic - is a resource for advantage in quantum computing and lies in the access to non-Clifford operations. Developing a comprehensive understanding of how non-stabilizerness can be quantified and how it relates to other quantum resources is crucial for studying and characterizing the origin of quantum complexity. In this work, we establish a direct connection between non-stabilizerness and entanglement spectrum flatness for a pure quantum state. We show that this connection can be exploited to efficiently probe non-stabilizerness even in the presence of noise. Our results reveal a direct connection between non-stabilizerness and entanglement response, and define a clear experimental protocol to probe non-stabilizerness in cold atom and solid-state platforms.

翻訳日:2023-04-04 14:15:42 公開日:2023-04-03

# 3次元認識画像生成のための生成多面ニューラルラミアンス

Generative Multiplane Neural Radiance for 3D-Aware Image Generation ( http://arxiv.org/abs/2304.01172v1 )

ライセンス: Link先を確認

Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

(参考訳) 本稿では,複数のターゲットビューにわたってビュー一貫性のある3D認識高解像度画像を効率よく生成する手法を提案する。 GMNRと呼ばれる提案された多面体ニューラルラジアンスモデルは、ビュー依存情報を学習するための新しい {\alpha} 誘導ビュー依存表現({\alpha}-VdR)モジュールから構成される。 {\alpha}-VdR モジュールは {\alpha} で誘導されたピクセルサンプリング技術により実現され、ビュー方向と位置係数を学習することで、ビュー依存表現を効率的に計算する。さらに、複数のビューにまたがって光度類似性を強制するビュー一貫性損失を提案する。 GMNRモデルは、トレーニング時間と推論時間の両方で計算効率を保ちながら、複数のカメラのポーズに一貫性のある3D対応高解像度画像を生成することができる。 3つのデータセットに関する実験により、提案するモジュールの有効性が示され、既存のアプローチと比較して、生成品質と推論時間の両方において良好な結果が得られた。我々のGMNRモデルは、単一のV100上で17.6FPSの1024×1024ピクセルの3D認識画像を生成する。コード: https://github.com/VIROBO-15/GMNR

We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views. The proposed multiplane neural radiance model, named GMNR, consists of a novel {\alpha}-guided view-dependent representation ({\alpha}-VdR) module for learning view-dependent information. The {\alpha}-VdR module, facilitated by an {\alpha}-guided pixel sampling technique, computes the view-dependent representation efficiently by learning viewing direction and position coefficients. Moreover, we propose a view-consistency loss to enforce photometric similarity across multiple views. The GMNR model can generate 3D-aware high-resolution images that are view-consistent across multiple camera poses, while maintaining the computational efficiency in terms of both training and inference time. Experiments on three datasets demonstrate the effectiveness of the proposed modules, leading to favorable results in terms of both generation quality and inference time, compared to existing approaches. Our GMNR model generates 3D-aware images of 1024 X 1024 pixels with 17.6 FPS on a single V100. Code: https://github.com/VIROBO-15/GMNR

翻訳日:2023-04-04 14:15:30 公開日:2023-04-03

# 自然画像マッティングにおけるコンテキストアグリゲーションの再考

Rethinking Context Aggregation in Natural Image Matting ( http://arxiv.org/abs/2304.01171v1 )

ライセンス: Link先を確認

Qinglin Liu, Shengping Zhang, Quanling Meng, Ru Li, Bineng Zhong, Liqiang Nie

(参考訳) 自然画像マッティングでは、特に前景と背景を区別することが困難である場合、文脈情報がアルファマットの推定において重要な役割を果たす。既存のディープラーニングベースの手法は、特別に設計されたコンテキストアグリゲーションモジュールを利用してエンコーダ特徴を洗練する。しかし、これらのモジュールの有効性は十分に調査されていない。本稿では,広範な実験を行い,コンテキストアグリゲーションモジュールが実際には期待したほど効果的ではないことを明らかにする。また,大きなイメージパッチで学習すると,より大きな受容野を持つ基本的なエンコーダ・デコーダネットワークは,コンテキストを効果的に集約し,より優れた性能を実現できることを実証する。以上の知見に基づき,エンコーダに外観強調軸方向学習ブロックを組み込み,ハイブリッドトランスフォーマデコーダを採用することで受容野を拡大する,簡易かつ効果的なマッティングネットワークAEMatterを提案する。 4つのデータセットに対する実験結果から、我々のAEMatterは最先端のマッティング手法を大幅に上回っていることが示されている(例えばAdobe Composition-1Kデータセットでは、MatteFormerと比較して、SADとMSEがそれぞれ25%と40%削減される)。コードとモデルは https://github.com/QLYoo/AEMatter で利用可能である。

For natural image matting, context information plays a crucial role in estimating alpha mattes, especially when it is challenging to distinguish the foreground from its background. Existing deep learning-based methods exploit specifically designed context aggregation modules to refine encoder features. However, the effectiveness of these modules has not been thoroughly explored. In this paper, we conduct extensive experiments to reveal that the context aggregation modules are actually not as effective as expected. We also demonstrate that when learned on large image patches, basic encoder-decoder networks with a larger receptive field can effectively aggregate context to achieve better performance. Building on these findings, we propose a simple yet effective matting network, named AEMatter, which enlarges the receptive field by incorporating an appearance-enhanced axis-wise learning block into the encoder and adopting a hybrid-transformer decoder. Experimental results on four datasets demonstrate that our AEMatter significantly outperforms state-of-the-art matting methods (e.g., on the Adobe Composition-1K dataset, 25% and 40% reduction in terms of SAD and MSE, respectively, compared against MatteFormer). The code and model are available at https://github.com/QLYoo/AEMatter.

翻訳日:2023-04-04 14:15:12 公開日:2023-04-03

# DeepAccident: V2X自動運転の動作と事故予測ベンチマーク

DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving ( http://arxiv.org/abs/2304.01168v1 )

ライセンス: Link先を確認

Tianqi Wang, Sukmin Kim, Wenxuan Ji, Enze Xie, Chongjian Ge, Junsong Chen, Zhenguo Li, Ping Luo

(参考訳) 安全は自動運転の優先事項である。それでも、現在公表されているデータセットは、自律運転の直接的かつ説明可能な安全性評価をサポートしていない。本研究では,実世界の運転時に頻繁に発生する多様な事故シナリオを含む現実的なシミュレータを用いて生成された大規模データセットであるdeepaccidentを提案する。提案するdeepaccidentデータセットは57kの注釈付きフレームと285kの注釈付きサンプルを含み、40kの注釈付きサンプルを持つ大規模nuscenesデータセットの約7倍である。さらに,提案したデータセットに基づいて,新たなタスク,エンドツーエンド動作と事故予測を提案し,異なる自律運転アルゴリズムの事故予測能力を直接評価することができる。さらに,各シナリオに対して,データ記録のための4台の車両と1台のインフラを設定し,事故シナリオの多様な視点を提供し,V2X(車間通信)による知覚と予測タスクの実現を可能にした。最後に,V2XFormerと呼ばれるベースラインV2Xモデルを提案する。

Safety is the primary priority of autonomous driving. Nevertheless, no published dataset currently supports the direct and explainable safety evaluation for autonomous driving. In this work, we propose DeepAccident, a large-scale dataset generated via a realistic simulator containing diverse accident scenarios that frequently occur in real-world driving. The proposed DeepAccident dataset contains 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset with 40k annotated samples. In addition, we propose a new task, end-to-end motion and accident prediction, based on the proposed dataset, which can be used to directly evaluate the accident prediction ability for different autonomous driving algorithms. Furthermore, for each scenario, we set four vehicles along with one infrastructure to record data, thus providing diverse viewpoints for accident scenarios and enabling V2X (vehicle-to-everything) research on perception and prediction tasks. Finally, we present a baseline V2X model named V2XFormer that demonstrates superior performance for motion and accident prediction and 3D object detection compared to the single-vehicle model.

翻訳日:2023-04-04 14:14:45 公開日:2023-04-03

# 準メトリック学習による最適ゴールリーチ強化学習

Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning ( http://arxiv.org/abs/2304.01203v1 )

ライセンス: Link先を確認

Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang

(参考訳) 目標到達強化学習(rl)では、最適値関数は準メトリック構造と呼ばれる特定の幾何学を持つ。本稿では,準メトリックモデルを用いて最適値関数を学習する新しい rl 手法である quasimetric reinforcement learning (qrl) を提案する。従来のアプローチとは違い、QRLの目標は特に準計量のために設計されており、強力な理論的回復保証を提供する。実験的に、離散化されたマウンテンカー環境を徹底的に分析し、QRLの特性と代替品に対する優位性を識別する。オフラインおよびオンラインの目標達成ベンチマークでは、QRLは、状態ベースと画像ベースの両方で、サンプル効率とパフォーマンスが改善されている。

In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.

翻訳日:2023-04-04 14:08:34 公開日:2023-04-03
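
The quasimetric structure mentioned in the abstract above can be checked on a toy example. The sketch below does not implement QRL. It assumes a small deterministic directed graph with positive step costs, computes all-pairs shortest-path costs with Floyd-Warshall, and takes the optimal goal-reaching value to be the negated cost. The graph, the cost values and the function names are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def shortest_path_costs(n: int, edges: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """All-pairs shortest-path costs on a directed graph (Floyd-Warshall)."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (u, v), cost in edges.items():
        d[u][v] = min(d[u][v], cost)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_quasimetric(d: List[List[float]], tol: float = 1e-9) -> bool:
    """Check d(x, x) = 0, non-negativity and the triangle inequality. Symmetry is not required."""
    n = len(d)
    for i in range(n):
        if d[i][i] != 0:
            return False
        for j in range(n):
            if d[i][j] < 0:
                return False
            for k in range(n):
                if d[i][j] > d[i][k] + d[k][j] + tol:
                    return False
    return True


def optimal_value(d: List[List[float]], state: int, goal: int) -> float:
    """Optimal goal-reaching value, taken here as the negated shortest-path cost."""
    return -d[state][goal]


# A hill with three positions: moving right (downhill) costs 1, moving left (uphill) costs 3.
edges = {(0, 1): 1.0, (1, 2): 1.0, (1, 0): 3.0, (2, 1): 3.0}
d = shortest_path_costs(3, edges)
print(d[0][2], d[2][0])       # 2.0 6.0: the cost is asymmetric
print(is_quasimetric(d))      # True
print(optimal_value(d, 0, 2)) # -2.0
```

The asymmetry between `d[0][2]` and `d[2][0]` is the reason an ordinary metric, which is symmetric, would be the wrong model class here.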

# 視覚ロコモーション制御のためのニューラルボリュームメモリ

Neural Volumetric Memory for Visual Locomotion Control ( http://arxiv.org/abs/2304.01201v1 )

ライセンス: Link先を確認

Ruihan Yang, Ge Yang, Xiaolong Wang

(参考訳) 脚のあるロボットは、舗装道路を超えて自律性の範囲を広げる可能性がある。本研究では,1つの前方深度カメラを用いて,挑戦的地形における移動の難しさについて考察する。問題の部分的な観測性のため、ロボットは現在足元にある地形を推定するために過去の観測に頼らなければならない。この問題を解決するために,シーンの3次元形状を明示的にモデル化するコンピュータビジョンのパラダイムに従い,3次元世界のSE(3)同変性を明示的に考慮した幾何学的メモリアーキテクチャであるNeural Volumetric Memory (NVM)を提案する。 NVMは、複数のカメラビューの特徴量を、まずロボットのエゴ中心のフレームに戻すことで集約する。我々は,学習した視覚運動ポリシーを物理ロボットでテストし,学習中に幾何学的事前知識を明示的に導入する手法が,よりナイーブな手法よりも優れた性能をもたらすことを示す。また,アブレーション研究を行い,Neural Volumetric Memoryに記憶されている表現が,シーンを再構築するための十分な幾何学的情報を取得することを示した。ビデオ付きプロジェクトページはhttps://rchalyang.github.io/NVM です。

Legged robots have the potential to expand the reach of autonomy beyond paved roads. In this work, we consider the difficult problem of locomotion on challenging terrains using a single forward-facing depth camera. Due to the partial observability of the problem, the robot has to rely on past observations to infer the terrain currently beneath it. To solve this problem, we follow the paradigm in computer vision that explicitly models the 3D geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric memory architecture that explicitly accounts for the SE(3) equivariance of the 3D world. NVM aggregates feature volumes from multiple camera views by first bringing them back to the ego-centric frame of the robot. We test the learned visual-locomotion policy on a physical robot and show that our approach, which explicitly introduces geometric priors during training, offers better performance than more naive methods. We also include ablation studies and show that the representations stored in the neural volumetric memory capture sufficient geometric information to reconstruct the scene. Our project page with videos is https://rchalyang.github.io/NVM .

翻訳日:2023-04-04 14:08:23 公開日:2023-04-03

# オープンワールドにおけるビデオインスタンスセグメンテーション

Video Instance Segmentation in an Open-World ( http://arxiv.org/abs/2304.01200v1 )

ライセンス: Link先を確認

Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

(参考訳) 既存のビデオ・インスタンス・セグメンテーション(VIS)のアプローチは一般的にクローズド・ワールドの仮定に従い、推論時には既知カテゴリのインスタンスのみが識別され、時空間的にセグメント化される。オープンワールドの定式化は、次のように閉世界の静的学習の仮定を緩和する。 (a) まず、既知のカテゴリの集合を区別するとともに、未知のオブジェクトを「未知」とラベル付けし、次に (b) 対応するセマンティックラベルが利用可能になった時点で、未知のクラスを漸進的に学習する。我々は、OW-VISFormerという名前の最初のオープンワールドVISアプローチを提案し、新しい特徴強化機構と時空間オブジェクトネス(STO)モジュールを導入する。軽量補助ネットワークに基づく特徴強化機構は,背景からの正確な画素レベルの(未知の)オブジェクトの輪郭抽出と,カテゴリ固有の既知のセマンティッククラスの識別を目的としている。 STOモジュールは、対照的な損失によって前景のアクティベーションを強化することで、インスタンスレベルの擬似ラベルを生成する。さらに、OW-VISの特性を測定するための広範な実験プロトコルも導入する。我々のOW-VISFormerはOW-VIS設定において、堅実なベースラインに対して良好に動作する。さらに,最新のSeqFormerに組み込むことで,標準の完全教師付きVIS設定における我々の貢献を評価し, Youtube-VIS 2019 val セットにおいて 1.6% AP の絶対ゲインを実現した。最後に,オープンワールド検出(OWOD)設定に対する我々の貢献の汎用性を示し、文献中の既存の最良のOWOD手法を上回る。コード、モデル、OW-VISスプリットは https://github.com/OmkarThawakar/OWVISFormer で入手できる。

Existing video instance segmentation (VIS) approaches generally follow a closed-world assumption, where only seen category instances are identified and spatio-temporally segmented at inference. Open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories as well as labels an unknown object as `unknown' and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available. We propose the first open-world VIS approach, named OW-VISFormer, that introduces a novel feature enrichment mechanism and a spatio-temporal objectness (STO) module. The feature enrichment mechanism based on a light-weight auxiliary network aims at accurate pixel-level (unknown) object delineation from the background as well as distinguishing category-specific known semantic classes. The STO module strives to generate instance-level pseudo-labels by enhancing the foreground activations through a contrastive loss. Moreover, we also introduce an extensive experimental protocol to measure the characteristics of OW-VIS. Our OW-VISFormer performs favorably against a solid baseline in OW-VIS setting. Further, we evaluate our contributions in the standard fully-supervised VIS setting by integrating them into the recent SeqFormer, achieving an absolute gain of 1.6% AP on the Youtube-VIS 2019 val set. Lastly, we show the generalizability of our contributions for the open-world detection (OWOD) setting, outperforming the best existing OWOD method in the literature. Code, models along with OW-VIS splits are available at https://github.com/OmkarThawakar/OWVISFormer.

翻訳日:2023-04-04 14:08:04 公開日:2023-04-03

# 人間の行動認識における3次元ポーズとトラッキングの利点について

On the Benefits of 3D Pose and Tracking for Human Action Recognition ( http://arxiv.org/abs/2304.01199v1 )

ライセンス: Link先を確認

Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik

(参考訳) 本研究では,行動認識のためのトラッキングと3Dポーズの利点について検討する。これを達成するために、空間の定点ではなく、人間の運動の軌道上の行動を分析するラグランジュ的視点を採る。この立場を取ることで、人々のトラックレットを使って行動を予測することができます。この精神の中では、まず3Dのポーズを用いて行動を推測し、対人インタラクションを研究することの利点を示す。次に,トラックレット上での3次元ポーズと文脈的外観を用いてラグランジュ的行動認識モデルを提案する。そこで本手法は,AVA v2.2データセットのポーズのみの設定と標準ベンチマーク設定の両方で,最先端のパフォーマンスを実現する。ポーズキューのみを用いてアクションを推論すると、ポーズモデルは対応する最先端モデルに対して+10.0mAP、融合モデルは最高の最先端モデルに対して+2.8mAPとなる。コードと結果は以下の通りである。 https://brjathu.github.io/lart

In this work we study the benefits of using tracking and 3D poses for action recognition. To achieve this, we take the Lagrangian view on analysing actions over a trajectory of human motion rather than at a fixed point in space. Taking this stand allows us to use the tracklets of people to predict their actions. In this spirit, first we show the benefits of using 3D pose to infer actions, and study person-person interactions. Subsequently, we propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets. To this end, our method achieves state-of-the-art performance on the AVA v2.2 dataset on both pose only settings and on standard benchmark settings. When reasoning about the action using only pose cues, our pose model achieves +10.0 mAP gain over the corresponding state-of-the-art while our fused model has a gain of +2.8 mAP over the best state-of-the-art model. Code and results are available at: https://brjathu.github.io/LART

翻訳日:2023-04-04 14:07:36 公開日:2023-04-03

# デカップリングワンパスネットワークを用いたゼロショットセマンティクスセグメンテーション

Zero-Shot Semantic Segmentation with Decoupled One-Pass Network ( http://arxiv.org/abs/2304.01198v1 )

ライセンス: Link先を確認

Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma

(参考訳) 近年,ゼロショット意味セグメンテーション問題に注目が集まっており,提案マスク生成のためのストリームと事前学習されたビジュアル言語モデルを用いたセグメンテーション分類という,2つのストリームネットワークに基づく手法が最適である。しかし、既存の2ストリーム手法では、非常に非効率な視覚言語モデルに大量の(最大100まで)画像作物を渡す必要がある。この問題に対処するために、入力画像ごとに視覚言語モデルに1回だけパスする必要のあるネットワークを提案する。具体的には,まず,事前学習した視覚エンコーダ内のパッチ埋め込み間の有害干渉を制限するために,パッチ切断と呼ぶ新しいネットワーク適応手法を提案する。そこで我々は,ネットワークがより差別的な特徴に着目するように,分類アンカー学習を提案する。実験の結果,提案手法は最先端の手法を4倍から7倍の速さで上回り,優れた性能を発揮することが示された。コードをhttps://github.com/CongHan0808/DeOP.gitでリリースします。

Recently, the zero-shot semantic segmentation problem has attracted increasing attention, and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pre-trained visual-language model. However, existing two-stream methods require passing a great number of (up to a hundred) image crops into the visual-language model, which is highly inefficient. To address the problem, we propose a network that only needs a single pass through the visual-language model for each input image. Specifically, we first propose a novel network adaptation approach, termed patch severance, to restrict the harmful interference between the patch embeddings in the pre-trained visual encoder. We then propose classification anchor learning to encourage the network to spatially focus on more discriminative features for classification. Extensive experiments demonstrate that the proposed method achieves outstanding performance, surpassing state-of-the-art methods while being 4 to 7 times faster at inference. We release our code at https://github.com/CongHan0808/DeOP.git.

翻訳日:2023-04-04 14:07:23 公開日:2023-04-03

# あらゆるデスクにテレプレゼンスをもたらす

Bringing Telepresence to Every Desk ( http://arxiv.org/abs/2304.01197v1 )

ライセンス: Link先を確認

Shengze Wang, Ziheng Wang, Ryan Schmelzle, Liujie Zheng, YoungJoong Kwon, Soumyadip Sengupta, Henry Fuchs

(参考訳) 本稿では,すべてのデスクトップにテレプレゼンスを導入することに取り組む。商用システムとは異なり、パーソナル3Dビデオ会議システムは、平均的な消費者にとって経済的かつ計算的に実現可能でありながら、高品質なビデオをレンダリングしなければならない。そこで本研究では,4台のコンシューマ向けRGBDカメラのみを必要とし,ユーザと環境の高品質な自由視点映像を合成するキャプチャ・レンダリングシステムを提案する。実験の結果,オブジェクトテンプレートや重い前処理を使わずに高品質な自由視点映像をレンダリングできることがわかった。リアルタイムではないものの、システムは高速であり、ビデオ単位の最適化を必要としない。さらに,複雑な手のジェスチャーや衣服に対してロバストなシステムであり,新たなユーザに一般化することができる。この作業は、さらなる最適化のための強力な基盤を提供し、近い将来、すべてのデスクにテレプレゼンスをもたらすのに役立ちます。コードとデータセットは当社のwebサイトhttps://mcmvmc.github.io/PersonalTelepresence/で利用可能になります。

In this paper, we work to bring telepresence to every desktop. Unlike commercial systems, personal 3D video conferencing systems must render high-quality videos while remaining financially and computationally viable for the average consumer. To this end, we introduce a capturing and rendering system that only requires 4 consumer-grade RGBD cameras and synthesizes high-quality free-viewpoint videos of users as well as their environments. Experimental results show that our system renders high-quality free-viewpoint videos without using object templates or heavy pre-processing. While not real-time, our system is fast and does not require per-video optimizations. Moreover, our system is robust to complex hand gestures and clothing, and it can generalize to new users. This work provides a strong basis for further optimization, and it will help bring telepresence to every desk in the near future. The code and dataset will be made available on our website https://mcmvmc.github.io/PersonalTelepresence/.

翻訳日:2023-04-04 14:07:05 公開日:2023-04-03

# Baize: セルフチャットデータに基づくパラメータ効率チューニングを備えたオープンソースのチャットモデル

Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data ( http://arxiv.org/abs/2304.01196v1 )

ライセンス: Link先を確認

Canwen Xu and Daya Guo and Nan Duan and Julian McAuley

(参考訳) ChatGPTのようなチャットモデルは印象的な機能を示しており、多くのドメインで急速に採用されている。しかし、これらのモデルは制限付きAPIを通じてのみアクセス可能であり、この分野における新たな研究と進歩の障壁となる。そこで本研究では,chatgptを利用して対話を行うことで,高品質なマルチターンチャットコーパスを自動生成するパイプラインを提案する。その後,オープンソースの大規模言語モデルであるLLaMAを強化するためにパラメータ効率のチューニングを用いる。得られたモデルBaizeは、潜在的なリスクを最小限に抑えるガードレールとのマルチターン対話において、優れたパフォーマンスを示す。

Chat models, such as ChatGPT, have shown impressive capabilities and have been rapidly adopted across numerous domains. However, these models are only accessible through a restricted API, creating barriers for new research and progress in the field. We propose a pipeline that can automatically generate a high-quality multi-turn chat corpus by leveraging ChatGPT to engage in a conversation with itself. Subsequently, we employ parameter-efficient tuning to enhance LLaMA, an open-source large language model. The resulting model, named Baize, demonstrates good performance in multi-turn dialogues with guardrails that minimize potential risks.
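
The self-chat idea in the abstract above can be sketched as a loop in which one chat model writes both sides of a dialogue. This is a minimal illustration, not the authors' pipeline: `self_chat`, the `chat_fn` callback, the role names, and the turn limit are all hypothetical, and the prompting used in the paper may differ.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def self_chat(
    seed: str,
    chat_fn: Callable[[List[Message], str], str],
    max_turns: int = 4,
) -> List[Message]:
    """Build a multi-turn transcript by letting one model play both roles.

    `chat_fn(transcript, next_role)` is a hypothetical callback that returns
    the next utterance for `next_role`; in practice it would wrap a call to
    a chat model. The seed opens the dialogue as the first human turn.
    """
    transcript: List[Message] = [{"role": "human", "content": seed}]
    for _ in range(max_turns - 1):
        # Alternate speakers: the model answers itself on every turn.
        next_role = "ai" if transcript[-1]["role"] == "human" else "human"
        reply = chat_fn(transcript, next_role)
        transcript.append({"role": next_role, "content": reply})
    return transcript
```

Transcripts collected this way would form the multi-turn corpus; the parameter-efficient tuning step is not shown here.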

翻訳日:2023-04-04 14:06:49 公開日:2023-04-03

# すべての機能が重要ではない: 適応的な事前リファインメントによるFew-shot CLIPの強化

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement ( http://arxiv.org/abs/2304.01195v1 )

ライセンス: Link先を確認

Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao

(参考訳) Contrastive Language-Image Pre-Training (CLIP) の人気は、様々な下流視覚タスクへの応用を促している。下流タスクの能力を向上させるために、数発の学習が広く採用されている。しかし、既存の方法は限られた性能を示すか、過剰な学習可能パラメータに悩まされる。本稿では,CLIP の事前学習知識に対する適応的事前 rEfinement 手法である APE を提案する。APE は高い計算効率で優れた精度を達成する。先行改良モジュールを用いて下流データにおけるクラス間格差を分析し,そのドメイン固有の知識をCLIP抽出キャッシュモデルから分離する。それに加えて、トレーニング不要のAPEとトレーニングを必要とするAPE-Tの2つのモデル変種を導入する。テスト画像,事前キャッシュモデル,テキスト表現間の三者間の親和性を探索し,軽量なカテゴリ残差モジュールのみを学習可能とする。 11のベンチマークの平均精度では、APEとAPE-Tはいずれも最先端に達し、16ショットにおいて学習可能なパラメータを30分の1に抑えつつ、2番目に良い手法をそれぞれ+1.59%、+1.99%上回っている。

The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks. To improve its capacity on downstream tasks, few-shot learning has become a widely-adopted technique. However, existing methods either exhibit limited performance or suffer from excessive learnable parameters. In this paper, we propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency. Via a prior refinement module, we analyze the inter-class disparity in the downstream data and decouple the domain-specific knowledge from the CLIP-extracted cache model. On top of that, we introduce two model variants, a training-free APE and a training-required APE-T. We explore the trilateral affinities between the test image, prior cache model, and textual representations, and only enable a lightweight category-residual module to be trained. For the average accuracy over 11 benchmarks, both APE and APE-T attain state-of-the-art performance and outperform the second-best method by +1.59% and +1.99%, respectively, under 16 shots with 30x fewer learnable parameters.

翻訳日:2023-04-04 14:06:41 公開日:2023-04-03

# Burstormer:バーストイメージ復元と強化トランスフォーマー

Burstormer: Burst Image Restoration and Enhancement Transformer ( http://arxiv.org/abs/2304.01194v1 )

ライセンス: Link先を確認

Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

(参考訳) シャッタープレスでは、現代のハンドヘルドカメラが高速に複数の画像をキャプチャし、それらをマージして単一の画像を生成する。しかし、バースト内の個々のフレームは避けられない動きのために不整列であり、複数の劣化を含む。課題は、連続した画像を適切に調整し、その補完的な情報をマージして高品質な出力を達成することである。本稿では,バースト画像の復元と拡張のための新しいトランスフォーマーアーキテクチャであるBurstormerを提案する。既存の手法と比較して,マルチスケールの局所的特徴と非局所的特徴を活用し,アライメントと機能融合の改善を図る。私たちのキーとなるアイデアは、バーストワイドコンテキストをモデル化しながら、情報集約とプログレッシブフュージョンのためのバースト地区でのフレーム間通信を可能にすることです。しかし、入力バーストフレームは、情報を融合する前に適切に整列する必要がある。そこで本論文では,バースト特徴を参照フレームにアライメントするための拡張変形可能なアライメントモジュールを提案する。既存の手法と異なり,提案するアライメントモジュールはバースト特徴の整列だけでなく,複雑な動きの処理を容易にする参照ベース機能拡張機構を通じて,特徴情報を交換し,参照フレームとの集中的なコミュニケーションを維持する。マルチレベルアライメントおよびエンリッチメントの後、循環バーストサンプリングモジュールを用いてバースト内のフレーム間通信を再強調する。最後に、提案したバースト機能融合モジュールを用いてフレーム間情報を集約し、さらにプログレッシブアップサンプリングを行う。私たちのBurstormerは、バースト超解像、バーストデノイング、バースト低照度向上の最先端手法よりも優れています。私たちのコードと事前訓練済みモデルはhttps://github.com/akshaydudhane16/Burstormerで利用可能です。

On a shutter press, modern handheld cameras capture multiple images in rapid succession and merge them to generate a single image. However, individual frames in a burst are misaligned due to inevitable motions and contain multiple degradations. The challenge is to properly align the successive image shots and merge their complementary information to achieve high-quality outputs. Towards this direction, we propose Burstormer: a novel transformer-based architecture for burst image restoration and enhancement. In comparison to existing works, our approach exploits multi-scale local and non-local features to achieve improved alignment and feature fusion. Our key idea is to enable inter-frame communication in the burst neighborhoods for information aggregation and progressive fusion while modeling the burst-wide context. However, the input burst frames need to be properly aligned before fusing their information. Therefore, we propose an enhanced deformable alignment module for aligning burst features with respect to the reference frame. Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions. After multi-level alignment and enrichment, we re-emphasize inter-frame communication within the burst using a cyclic burst sampling module. Finally, the inter-frame information is aggregated using the proposed burst feature fusion module followed by progressive upsampling. Our Burstormer outperforms state-of-the-art methods on burst super-resolution, burst denoising and burst low-light enhancement. Our code and pretrained models are available at https://github.com/akshaydudhane16/Burstormer

翻訳日:2023-04-04 14:06:21 公開日:2023-04-03

# 画像で特定されたオブジェクトへのナビゲート

Navigating to Objects Specified by Images ( http://arxiv.org/abs/2304.01192v1 )

ライセンス: Link先を確認

Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

(参考訳) イメージは、具体化エージェントがナビゲートすべき特定のオブジェクトインスタンスを指定するための便利な方法である。この課題を解決するには、未知の環境の視覚的推論と探索が必要である。本稿では,この課題をシミュレーションと実世界の両方で行うシステムを提案する。モジュール方式は探索,目標インスタンスの再同定,目標位置特定,局所ナビゲーションといったサブタスクを解決する。特徴マッチングを用いてゴールインスタンスを再同定し、一致した特徴をマップに投影することでゴールインスタンスをローカライズする。各サブタスクは、ゼロの微調整を必要とするオフザシェルフコンポーネントを使用して解決される。 HM3D InstanceImageNavベンチマークでは、このシステムはベースラインのエンドツーエンドのRLポリシー7xと最先端のImageNavモデル2.3x(56%対25%の成功)を上回っている。我々は,このシステムを移動ロボットプラットフォームにデプロイし,実世界の効果的なパフォーマンスを実証し,家庭とオフィス環境全体で88%の成功率を達成した。

Images are a convenient way to specify which particular object instance an embodied agent should navigate to. Solving this task requires semantic visual reasoning and exploration of unknown environments. We present a system that can perform this task in both simulation and the real world. Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation. We re-identify the goal instance in egocentric vision using feature-matching and localize the goal instance by projecting matched features to a map. Each sub-task is solved using off-the-shelf components requiring zero fine-tuning. On the HM3D InstanceImageNav benchmark, this system outperforms a baseline end-to-end RL policy 7x and a state-of-the-art ImageNav model 2.3x (56% vs 25% success). We deploy this system to a mobile robot platform and demonstrate effective real-world performance, achieving an 88% success rate across a home and an office environment.

翻訳日:2023-04-04 14:05:57 公開日:2023-04-03

# SynthVSR:Synthetic Supervisionによる視覚音声認識のスケールアップ

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision ( http://arxiv.org/abs/2303.17200v2 )

ライセンス: Link先を確認

Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

(参考訳) 最近報告された、視覚音声認識(VSR)における最先端の結果は、しばしば大量のビデオデータに依存するが、公開されている書き起こし付きビデオデータセットのサイズは限られている。本稿では,VSRに合成視覚データを活用する可能性について,初めて考察する。SynthVSRと呼ぶ本手法は,合成唇運動を用いてVSRシステムの性能を大幅に向上させる。 SynthVSRの背後にある重要なアイデアは、入力音声を条件として唇の動きを生成する音声駆動の唇アニメーションモデルを活用することである。音声駆動のリップアニメーションモデルはラベルなしの音声ビジュアルデータセットでトレーニングされ、ラベル付きビデオが利用可能であれば、事前訓練されたVSRモデルに向けてさらに最適化することができる。多くの書き起こし付き音響データと顔画像が利用可能であるので、提案するリップアニメーションモデルを用いて半教師付きVSRトレーニングのための大規模な合成データを生成することができる。提案手法を,最大の公開VSRベンチマークであるLip Reading Sentences 3 (LRS3)で評価した。 SynthVSR は実際のラベル付きデータ 30 時間のみで WER 43.3% を達成し、何千時間ものビデオを使った既成のアプローチよりも優れている。 LRS3の438時間すべてのラベル付きデータを使用すると、WERはさらに27.9%に削減され、これは最先端の自己教師型AV-HuBERT法と同等である。さらに、大規模な擬似ラベル音声視覚データと組み合わせると、SynthVSRは公開されているデータのみを使用して新しい最先端のVSR WER 16.9%を達成し、29倍の非公開の機械書き起こしビデオデータ(90,000時間)でトレーニングされた最新の最先端のアプローチを上回っている。最後に,提案手法における各成分の効果を理解するため,広範なアブレーション研究を行った。

Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on increasingly large amounts of video data, while the publicly available transcribed video datasets are limited in size. In this paper, for the first time, we study the potential of leveraging synthetic visual data for VSR. Our method, termed SynthVSR, substantially improves the performance of VSR systems with synthetic lip movements. The key idea behind SynthVSR is to leverage a speech-driven lip animation model that generates lip movements conditioned on the input speech. The speech-driven lip animation model is trained on an unlabeled audio-visual dataset and could be further optimized towards a pre-trained VSR model when labeled videos are available. As plenty of transcribed acoustic data and face images are available, we are able to generate large-scale synthetic data using the proposed lip animation model for semi-supervised VSR training. We evaluate the performance of our approach on the largest public VSR benchmark - Lip Reading Sentences 3 (LRS3). SynthVSR achieves a WER of 43.3% with only 30 hours of real labeled data, outperforming off-the-shelf approaches using thousands of hours of video. The WER is further reduced to 27.9% when using all 438 hours of labeled data from LRS3, which is on par with the state-of-the-art self-supervised AV-HuBERT method. Furthermore, when combined with large-scale pseudo-labeled audio-visual data SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours). Finally, we perform extensive ablation studies to understand the effect of each component in our proposed method.
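
The results above are reported as word error rate (WER). As a reminder of what that number measures, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration; the paper's own evaluation scripts and text normalization are not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Tokenization here is plain whitespace splitting, which is an
    illustrative simplification.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a WER of 16.9% corresponds to roughly 17 word-level errors per 100 reference words.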

翻訳日:2023-04-04 11:47:45 公開日:2023-04-03

# 光路変調を用いた表面音波の定量的光学画像化法

Quantitative optical imaging method for surface acoustic wave using optical path modulation ( http://arxiv.org/abs/2212.07369v4 )

ライセンス: Link先を確認

Ryusuke Hisatomi, Kotaro Taga, Ryo Sasaki, Yoichi Shiota, Takahiro Moriyama, Teruo Ono

(参考訳) レイリー型表面音響波(SAW)は、その表面局在化、高電気制御性、低伝搬損失により、古典的および量子情報キャリアとして様々な分野で用いられている。 SAWと他の物理系、例えば磁化、電子電荷、電子スピンとの結合とハイブリダイゼーションは、最近のフォノニクスやスピントロニクスの焦点である。表面波振幅の精密測定は結合強度を議論するためにしばしば必要となる。しかし、そのような測定技術はごくわずかであり、概してかなり複雑な分析を必要とする。そこで我々は,SAWを定量的に特徴付ける簡単な測定手法を開発し,実証する。この技術は、光路変調により、コヒーレント駆動SAWによる表面の揺動を光学的に検出する。また、ショットノイズ制限状態で測定システムが動作した場合、光路変調信号から光スポットの表面傾斜及び変位を導出することができる。我々の実証技術は,SAW関連研究にとって重要なツールとなる。

Rayleigh-type surface acoustic wave (SAW) is used in various fields as classical and quantum information carriers because of its surface localization, high electrical controllability, and low propagation loss. Coupling and hybridization between the SAW and other physical systems such as magnetization, electron charge, and electron spin are the recent focuses in phononics and spintronics. Precise measurement of surface wave amplitude is often necessary to discuss the coupling strengths. However, there are only a few such measurement techniques and they generally require a rather complex analysis. Here we develop and demonstrate a straightforward measurement technique that can quantitatively characterize the SAW. The technique optically detects the surface waving due to the coherently driven SAW by the optical path modulation. Furthermore, when the measurement system operates in the shot-noise-limited regime, the surface slope and displacement at the optical spot can be deduced from the optical path modulation signal. Our demonstrated technique will be an important tool for SAW-related research.

翻訳日:2023-04-04 11:46:18 公開日:2023-04-03

# 不確実性のあるユニバーサルドメイン適応

Provably Uncertainty-Guided Universal Domain Adaptation ( http://arxiv.org/abs/2209.09616v7 )

ライセンス: Link先を確認

Yifan Wang, Lin Zhang, Ran Song, Paul L. Rosin, Yibin Li, and Wei Zhang

(参考訳) ユニバーサルドメイン適応(UniDA)は、ラベルセットに関する仮定なしに、ラベル付きソースドメインからラベルなしターゲットドメインに知識を転送することを目的としており、ターゲットドメインにおいて未知のサンプルを既知のサンプルから区別する必要がある。 UniDAの主な課題は、同一でないラベルセットが2つのドメイン間のミスアライメントを引き起こすことである。さらに、ドメインの不一致とソースドメインにおける教師付き目的は、モデル全体を共通クラスに偏りやすくし、未知のサンプルに対して過信な予測を生成する。上記の課題に対処するため、我々は新しい不確実性誘導型UniDAフレームワークを提案する。まず、対象サンプルが未知のクラスに属する確率の経験的推定を導入し、潜在空間における対象サンプルの分布を完全に活用する。次に,この推定に基づいて,$\delta$-filter を備えた線形部分空間における新しい近傍探索スキームを提案し,対象サンプルの不確かさスコアを推定し,未知のサンプルを発見する。これは、対象サンプルとソースドメイン内のその近傍との関係を完全に活用し、ドメインのミスアライメントの影響を回避する。第二に,発見された未知のサンプルの信頼度に基づく不確実性誘導マージン損失により,既知のサンプルと未知のサンプルの両方に対する予測の信頼度をバランスさせ,未知クラスに対する既知クラスのクラス内分散の差を低減する。最後に,3つの公開データセットを用いた実験により,本手法が既存の最先端手法を大幅に上回ることを示した。

Universal domain adaptation (UniDA) aims to transfer the knowledge from a labeled source domain to an unlabeled target domain without any assumptions of the label sets, which requires distinguishing the unknown samples from the known ones in the target domain. A main challenge of UniDA is that the nonidentical label sets cause the misalignment between the two domains. Moreover, the domain discrepancy and the supervised objectives in the source domain easily lead the whole model to be biased towards the common classes and produce overconfident predictions for unknown samples. To address the above challenging problems, we propose a new uncertainty-guided UniDA framework. Firstly, we introduce an empirical estimation of the probability of a target sample belonging to the unknown class which fully exploits the distribution of the target samples in the latent space. Then, based on the estimation, we propose a novel neighbors searching scheme in a linear subspace with a $\delta$-filter to estimate the uncertainty score of a target sample and discover unknown samples. It fully utilizes the relationship between a target sample and its neighbors in the source domain to avoid the influence of domain misalignment. Secondly, this paper well balances the confidences of predictions for both known and unknown samples through an uncertainty-guided margin loss based on the confidences of discovered unknown samples, which can reduce the gap between the intra-class variances of known classes with respect to the unknown class. Finally, experiments on three public datasets demonstrate that our method significantly outperforms existing state-of-the-art methods.

翻訳日:2023-04-04 11:45:32 公開日:2023-04-03

# 不完全情報を用いた総合型ゲームの準最適学習

Near-Optimal Learning of Extensive-Form Games with Imperfect Information ( http://arxiv.org/abs/2202.01752v3 )

ライセンス: Link先を確認

Yu Bai, Chi Jin, Song Mei, Tiancheng Yu

(参考訳) 本稿では,バンディットフィードバックから不完全情報展開型ゲームを学習するための,最適に近いアルゴリズムを設計するという未解決問題を解決する。 2人ゼロサムゲームにおいて $\varepsilon$-近似ナッシュ均衡を見つけるために,$\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ エピソードのプレイのみを必要とする初めてのアルゴリズム群を示す。ここで $X,Y$ は情報集合の数,$A,B$ は2人のプレイヤーの行動の数である。これにより、既知の最良のサンプル複雑性 $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ が $\widetilde{\mathcal{O}}(\max\{X, Y\})$ 倍改善され、対数因子を除いて情報理論的下限に一致する。我々はこのサンプル複雑性を2つの新しいアルゴリズム: Balanced Online Mirror Descent と Balanced Counterfactual Regret Minimization によって達成する。どちらのアルゴリズムも、古典的手法に「balanced exploration policies」を統合する新しい手法に依存している。また,結果をマルチプレイヤー一般和ゲームにおける粗相関均衡の学習にも拡張する。

This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating \emph{balanced exploration policies} into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.
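
The improvement factor quoted above can be checked with one line of algebra. The derivation below is a sketch that uses only the symbols defined in the abstract and ignores the logarithmic factors hidden by the tilde notation:

$$
X^2A + Y^2B \;=\; X\cdot(XA) + Y\cdot(YB) \;\le\; \max\{X,Y\}\,(XA + YB)
\quad\Longrightarrow\quad
\frac{(X^2A+Y^2B)/\varepsilon^2}{(XA+YB)/\varepsilon^2} \;\le\; \max\{X,Y\}.
$$

In the symmetric case $X=Y$, $A=B$ the ratio equals $X$ exactly, so the factor $\max\{X,Y\}$ cannot be tightened in general.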

翻訳日:2023-04-04 11:45:03 公開日:2023-04-03

# aiチャットボットは、エンジニアリングの基本(fe)とエンジニアリングの原則と実践(pe)構造試験に合格できるか?

Can AI Chatbots Pass the Fundamentals of Engineering (FE) and Principles and Practice of Engineering (PE) Structural Exams? ( http://arxiv.org/abs/2303.18149v2 )

ライセンス: Link先を確認

M.Z. Naser, Brandon Ross, Jennier Ogle, Venkatesh Kodur, Rami Hawileh, Jamal Abdalla, Huu-Tai Thai

(参考訳) エンジニアリングコミュニティは最近、openai chatgpt-4とgoogle bardのリリースでチャットボット技術の出現を目撃した。これらのチャットボットは、医療や法律の試験を含む様々な標準試験に合格することが報告されているが、このフォーラムの論文は、これらのチャットボットがエンジニアリングの基本(fe)とエンジニアリングの原則と実践(pe)試験にも合格できるかどうかを考察している。 FE試験やPE試験で一般的に見られるように、様々な土木工学や環境工学の質問やシナリオがチャットボットのパフォーマンスを評価するために使用される。チャットボットの応答は,その関連性,正確性,明確性に基づいて分析し,NCEES(National Council of Examiners for Engineering and Surveying)の勧告と比較した。調査の結果,ChatGPT-4 と Bard はそれぞれ FE 試験で 70.9% と 39.2%,PE 試験で 46.2% と 41% を獲得した。現在のChatGPT-4はFE試験に合格する可能性があることは明らかである。将来の版は両方の試験に合格する可能性が高いが、この研究はチャットボットをアシスタントや指導エンジニアとして使う可能性を強調している。

The engineering community has recently witnessed the emergence of chatbot technology with the release of OpenAI ChatGPT-4 and Google Bard. While these chatbots have been reported to perform well and even pass various standardized tests, including medical and law exams, this forum paper explores whether these chatbots can also pass the Fundamentals of Engineering (FE) and Principles and Practice of Engineering (PE) exams. A diverse range of civil and environmental engineering questions and scenarios, as commonly present in the FE and PE exams, are used to evaluate the chatbots' performance. The chatbots' responses were analyzed based on their relevance, accuracy, and clarity and then compared against the recommendations of the National Council of Examiners for Engineering and Surveying (NCEES). Our report shows that ChatGPT-4 and Bard scored 70.9% and 39.2%, respectively, in the FE exam and 46.2% and 41%, respectively, in the PE exam. It is evident that the current version of ChatGPT-4 could potentially pass the FE exam. While future editions are much more likely to pass both exams, this study also highlights the potential of using chatbots as teaching assistants and guiding engineers.

翻訳日:2023-04-04 11:38:16 公開日:2023-04-03

# 深層ニューラルネットワーク学習のための2レベルkfac法の解析と比較

Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks ( http://arxiv.org/abs/2303.18083v2 )

ライセンス: Link先を確認

Abdoulaye Koroko, Ani Anciaux-Sedrakian, Ibtihel Ben Gharbia, Valérie Garès, Mounir Haddou, Quang Huy Tran

(参考訳) 2次の方法として、Natural Gradient Descent (NGD)はニューラルネットワークのトレーニングを高速化する能力を持っている。しかし、計算とFIM(Fiher Information Matrix)の反転の禁止された計算とメモリコストのため、NGDをディープニューラルネットワーク(DNN)にスケーラブルにするには効率的な近似が必要である。多くの近似が試みられている。最も洗練されたKFACは、FIMをブロック対角行列として近似し、各ブロックはニューラルネットワークの層に対応する。これにより、KFACは異なるレイヤ間の相互作用を無視します。本研究では,二段階法を用いて層間の低周波相互作用を復元する関心について検討する。領域分解から着想を得て、異なる粗い空間を用いたKFACの2段階補正を提案し、評価した。その結果, この方法で層間相互作用を組み込むことで, KFACの性能は向上しないことがわかった。このことは、ブロック対角法が計算時間において十分に堅牢で正確かつ経済的であるため、FIMの対角ブロックを破棄することは安全であることを示している。

As a second-order method, the Natural Gradient Descent (NGD) has the ability to accelerate training of neural networks. However, due to the prohibitive computational and memory costs of computing and inverting the Fisher Information Matrix (FIM), efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been attempted. The most sophisticated of these is KFAC, which approximates the FIM as a block-diagonal matrix, where each block corresponds to a layer of the neural network. By doing so, KFAC ignores the interactions between different layers. In this work, we investigate the interest of restoring some low-frequency interactions between the layers by means of two-level methods. Inspired by domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The obtained results show that incorporating the layer interactions in this fashion does not really improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate and economical in computation time.
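
To make the block-diagonal structure concrete, the sketch below shows why a single KFAC layer block is cheap to invert. In standard KFAC each block is approximated by a Kronecker product A ⊗ G of two small covariance matrices, a detail that comes from the KFAC literature rather than from this abstract. The function names, the plain additive damping, and the NumPy setting are assumptions for illustration; the two-level corrections studied in the paper are not shown.

```python
import numpy as np


def kronecker_factors(acts, back_grads, damping=1e-3):
    """Estimate the two KFAC factors for one fully connected layer.

    acts:       (n, d_in)  layer inputs for n samples
    back_grads: (n, d_out) gradients w.r.t. the layer's pre-activations
    Damping is added to each factor separately, a simplification that is
    not identical to damping the full Kronecker product.
    """
    n = acts.shape[0]
    A = acts.T @ acts / n + damping * np.eye(acts.shape[1])
    G = back_grads.T @ back_grads / n + damping * np.eye(back_grads.shape[1])
    return A, G


def kfac_solve(A, G, grad_w):
    """Apply (A kron G)^{-1} to a (d_out, d_in) gradient without forming it.

    With column-major vec, (A kron G) vec(V) = vec(G V A^T), so for
    symmetric A the solution is G^{-1} grad_w A^{-1}.
    """
    left = np.linalg.solve(G, grad_w)        # G^{-1} grad_w
    return np.linalg.solve(A.T, left.T).T    # (G^{-1} grad_w) A^{-1}
```

Only a d_in x d_in and a d_out x d_out system are solved, instead of one of size (d_in * d_out) squared, which is the economy the abstract attributes to the block-diagonal approach.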

翻訳日:2023-04-04 11:37:53 公開日:2023-04-03

# グローバルローカルコンテキスト特徴を用いたゼロショット参照画像分割

Zero-shot Referring Image Segmentation with Global-Local Context Features ( http://arxiv.org/abs/2303.17811v2 )

ライセンス: Link先を確認

Seonghoon Yu, Paul Hongsuck Seo, Jeany Son

(参考訳) 参照画像セグメンテーション(RIS)は、入力画像の領域にグラウンディングされた参照表現が与えられたときに、セグメンテーションマスクを見つけることを目的とする。しかし、このタスクのためのラベル付きデータセットの収集はコストと労力がかかることで悪名高い。この問題を克服するために,CLIPから事前学習したクロスモーダル知識を利用した,シンプルで効果的なゼロショット参照画像セグメンテーション手法を提案する。入力テキストにグラウンディングされたセグメンテーションマスクを得るために,入力画像のグローバルおよびローカルな文脈情報をキャプチャするマスク誘導型ビジュアルエンコーダを提案する。本手法は,既製のマスク提案手法から得られたインスタンスマスクを利用することで,細かなインスタンスレベルのグラウンディングをセグメント化できる。また、グローバル特徴が入力表現全体の複雑な文レベルの意味をキャプチャし、ローカル特徴が依存構文解析器によって抽出された対象名詞句に焦点を当てるグローバルローカルテキストエンコーダも導入する。実験では,提案手法は,このタスクの複数のゼロショットベースラインや,弱教師付き参照表現セグメンテーション手法さえも,かなりの差で上回る。私たちのコードはhttps://github.com/Seonghoon-Yu/Zero-shot-RISで利用可能です。

Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP. In order to obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures global and local contextual information of an input image. By utilizing instance masks obtained from off-the-shelf mask proposal techniques, our method is able to segment fine-detailed instance-level groundings. We also introduce a global-local text encoder where the global feature captures complex sentence-level semantics of the entire input expression while the local feature focuses on the target noun phrase extracted by a dependency parser. In our experiments, the proposed method outperforms several zero-shot baselines of the task and even the weakly supervised referring expression segmentation method with substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.

翻訳日:2023-04-04 11:37:35 公開日:2023-04-03

# 軽量ビジョントランスにおける局所認識の再考

Rethinking Local Perception in Lightweight Vision Transformer ( http://arxiv.org/abs/2303.17803v2 )

ライセンス: Link先を確認

Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He

(参考訳) 視覚変換器(ViT)は様々な視覚タスクに有効であることが示されている。しかし、それらをモバイルフレンドリーなサイズにリサイズすると、パフォーマンスが大幅に低下する。そのため、軽量な視覚トランスフォーマーの開発は重要な研究分野となっている。本稿では,コンテキスト対応の局所拡張を利用した軽量視覚トランスフォーマであるcloformerを紹介する。 cloformerは、バニラ畳み込み演算子でよく使われるグローバルな共有重みと注意を向けるトークン固有のコンテキスト認識重みの関係を探求し、高頻度の局所情報をキャプチャする効果的で簡単なモジュールを提案する。 CloFormerでは、注意スタイルの畳み込み演算子であるAttnConvを紹介します。提案するattnconvは、共有重みを使ってローカル情報を集約し、注意深く設計されたコンテキストアウェア重みを配置し、ローカル機能を強化する。 CloFormerのFLOPを減らすためにプールを使用するAttnConvとバニラアテンションを組み合わせることで、モデルは高周波と低周波の情報を認識することができる。画像分類,物体検出,意味セグメンテーションなどの広範な実験を行い,cloformerの優位性を実証した。

Vision Transformers (ViTs) have been shown to be effective in various vision tasks. However, resizing them to a mobile-friendly size leads to significant performance degradation. Therefore, developing lightweight vision transformers has become a crucial area of research. This paper introduces CloFormer, a lightweight vision transformer that leverages context-aware local enhancement. CloFormer explores the relationship between globally shared weights often used in vanilla convolutional operators and token-specific context-aware weights appearing in attention, then proposes an effective and straightforward module to capture high-frequency local information. In CloFormer, we introduce AttnConv, a convolution operator in attention's style. The proposed AttnConv uses shared weights to aggregate local information and deploys carefully designed context-aware weights to enhance local features. The combination of the AttnConv and vanilla attention which uses pooling to reduce FLOPs in CloFormer enables the model to perceive high-frequency and low-frequency information. Extensive experiments were conducted in image classification, object detection, and semantic segmentation, demonstrating the superiority of CloFormer.

翻訳日:2023-04-04 11:37:14 公開日:2023-04-03

# 半弱教師付き物体運動予測

Semi-Weakly Supervised Object Kinematic Motion Prediction ( http://arxiv.org/abs/2303.17774v2 )

ライセンス: Link先を確認

Gengxin Liu, Qian Sun, Haibin Huang, Chongyang Ma, Yulan Guo, Li Yi, Hui Huang, Ruizhen Hu

(参考訳) 3Dオブジェクトが与えられた場合、運動予測は可動部と対応する運動パラメータを識別することを目的としている。 3Dオブジェクトのトポロジ的構造と幾何学的詳細の両方に大きなバリエーションがあるため、これは依然として困難な課題であり、大規模ラベル付きデータの欠如もディープラーニングに基づくアプローチの性能を制限している。本稿では,物体運動予測問題に半弱教師付き方式で取り組む。私たちの重要な観察は2つある。まず、完全に注釈付けされたモーションラベルを持つ3Dデータセットは限られているが、オブジェクト部分のセマンティックセグメンテーションのための大規模なデータセットや手法は存在する。第2に、セマンティック部分のセグメンテーションと可動部分のセグメンテーションは必ずしも一致しないが、基盤となる3D構造から可動部分を検出することは可能である。この目的に向けて,階層的部分レベルのセグメンテーションと可動部パラメータとの間の写像を学習するグラフニューラルネットワークを提案し,その結果を幾何学的アライメントに基づいてさらに洗練する。このネットワークは、まず完全なラベル付きモビリティ情報を持つPartNet-Mobilityデータセットでトレーニングし、次に細粒度で階層的な部分レベルのセグメンテーションを持つPartNetデータセットに適用することができる。ネットワーク予測は、擬似ラベル付きモビリティ情報を持つ大規模な3Dオブジェクトを生成し、既存のセグメンテーションによる弱教師付き学習にも利用できる。実験の結果, 3D部分スキャン上の運動予測のために設計された従来手法において,拡張データにより顕著な性能向上が見られた。

Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters. Due to the large variations in both topological structure and geometric details of 3D objects, this remains a challenging task, and the lack of large-scale labeled data also constrains the performance of deep learning based approaches. In this paper, we tackle the object kinematic motion prediction problem in a semi-weakly supervised manner. Our key observations are two-fold. First, although 3D datasets with fully annotated motion labels are limited, there are existing datasets and methods for object part semantic segmentation at large scale. Second, semantic part segmentation and mobile part segmentation are not always consistent, but it is possible to detect the mobile parts from the underlying 3D structure. Towards this end, we propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile parts parameters, which are further refined based on geometric alignment. This network can be first trained on the PartNet-Mobility dataset with fully labeled mobility information and then applied on the PartNet dataset with fine-grained and hierarchical part-level segmentation. The network predictions yield a large scale of 3D objects with pseudo labeled mobility information and can further be used for weakly-supervised learning with pre-existing segmentation. Our experiments show there are significant performance boosts with the augmented data for a previous method designed for kinematic motion prediction on 3D partial scans.

Translated: 2023-04-04 11:36:55 Published: 2023-04-03

# Aligning a medium-size GPT model in English to a small closed domain in Spanish using reinforcement learning ( http://arxiv.org/abs/2303.17649v2 )

License: see the linked page

Oscar R. Navarrete-Parra, Victor Uc-Cetina, Jorge Reyes-Magana

In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish. The model is fine-tuned for the question answering task. To achieve this, we also needed to train and implement another neural network (which we called the reward model) that could score an answer and determine whether it is appropriate for a given question. This component served to improve the decoding and generation of the system's answers. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was also used to compare the decoding technique with others. Finally, the results favored the proposed method, and it was determined that it is feasible to use a reward model to align the generation of responses.
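
The abstract above mentions two ingredients that a small example can make concrete: perplexity as an evaluation metric, and a reward model that scores candidate answers during decoding. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `perplexity`, `rerank`, and `toy_reward` are hypothetical, and `toy_reward` is a crude word-overlap stand-in for a trained reward network.

```python
import math
from typing import Callable, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity: exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rerank(question: str,
           candidates: Sequence[str],
           reward_fn: Callable[[str, str], float]) -> str:
    """Return the candidate answer that the reward function scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate answer")
    return max(candidates, key=lambda answer: reward_fn(question, answer))


def toy_reward(question: str, answer: str) -> float:
    """Hypothetical stand-in for a reward model: fraction of answer words
    that also occur in the question. A real reward model would be a trained
    neural network; this is only for illustration."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / max(len(a_words), 1)


if __name__ == "__main__":
    question = "what is the capital of spain"
    candidates = ["madrid is the capital of spain", "i like turtles"]
    print(rerank(question, candidates, toy_reward))
    # Four tokens, each with probability 0.25, give a perplexity close to 4.
    print(perplexity([math.log(0.25)] * 4))
```

Reranking a fixed candidate list is only one way a reward score could steer decoding; the abstract does not say which mechanism the paper uses.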

Translated: 2023-04-04 11:36:31 Published: 2023-04-03

# Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version] ( http://arxiv.org/abs/2303.17618v2 )

License: see the linked page

Adrien Banse, Licio Romao, Alessandro Abate, Raphaël M. Jungers

We introduce an adaptive refinement procedure for smart and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we use it as a loss function. Our technique lends itself to data-driven frameworks, but is not restricted to them. We also study properties of the above-mentioned metric between Markov chains, which we believe could be applied for wider purposes. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than using classical linear programming techniques.
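
The abstract does not define its Kantorovich-inspired metric between Markov chains, so the sketch below does not attempt to reproduce it. As background only, it computes the classical Kantorovich (Wasserstein-1) distance between two probability distributions supported on the same sorted points of the real line, where the optimal transport cost reduces to the area between the two cumulative distribution functions. The function name `kantorovich_1d` is hypothetical.

```python
from typing import Sequence


def kantorovich_1d(points: Sequence[float],
                   p: Sequence[float],
                   q: Sequence[float]) -> float:
    """Classical Kantorovich (Wasserstein-1) distance between two discrete
    distributions p and q on the same strictly increasing support points.

    In one dimension the distance equals the integral of |CDF_p - CDF_q|,
    so no linear program is needed for this special case.
    """
    if not (len(points) == len(p) == len(q)):
        raise ValueError("points, p and q must have the same length")
    if any(b <= a for a, b in zip(points, points[1:])):
        raise ValueError("points must be strictly increasing")
    if abs(sum(p) - 1.0) > 1e-9 or abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("p and q must each sum to 1")

    total = 0.0
    cdf_gap = 0.0  # running value of CDF_p - CDF_q
    for i in range(len(points) - 1):
        cdf_gap += p[i] - q[i]
        total += abs(cdf_gap) * (points[i + 1] - points[i])
    return total


if __name__ == "__main__":
    # All mass moves from point 0 to point 2, a distance of 2.
    print(kantorovich_1d([0.0, 1.0, 2.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
```

On a general finite metric space the same distance is usually posed as a linear program over transport plans, which is the classical approach the abstract compares against.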

Translated: 2023-04-04 11:36:18 Published: 2023-04-03

PDF registration status (publication date: 20230403)