Fugu-MT: Translations of arXiv papers

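This site translates into Japanese those arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.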

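The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from use of the site (including all information and translations).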

The papers below have a publication date of 2023-05-28.

Title	Authors	Abstract	Publication date / translation date
# Early Verification of Legal Compliance via Bounded Satisfiability Checking ( http://arxiv.org/abs/2209.04052v3 ) License: see link	Nick Feng, Lina Marsso, Mehrdad Sabetzadeh, Marsha Chechik	Legal properties involve reasoning about data values and time. Metric first-order temporal logic (MFOTL) provides a rich formalism for specifying legal properties. While MFOTL has been successfully used for verifying legal properties over operational systems via runtime monitoring, no solution exists for MFOTL-based verification in early-stage system development captured by requirements. Given a legal property and system requirements, both formalized in MFOTL, the compliance of the property can be verified on the requirements via satisfiability checking. In this paper, we propose a practical, sound, and complete (within a given bound) satisfiability checking approach for MFOTL. The approach, based on satisfiability modulo theories (SMT), employs a counterexample-guided strategy to incrementally search for a satisfying solution. We implemented our approach using the Z3 SMT solver and evaluated it on five case studies spanning the healthcare, business administration, banking and aviation domains. Our results indicate that our approach can efficiently determine whether legal properties of interest are met, or generate counterexamples that lead to compliance violations.	Translated: 2023-10-24 14:54:49 Published: 2023-05-28
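The entry above checks compliance by asking whether "requirements AND NOT property" is satisfiable within a bound; a satisfying trace is a counterexample. The paper does this with an SMT-based, counterexample-guided search in Z3 over full MFOTL. The toy sketch below only illustrates the bounded idea by brute-force enumeration of propositional traces. The propositions, both requirements and the property are hypothetical, and MFOTL's data values and quantifiers are omitted.

```python
from itertools import product

# Hypothetical toy vocabulary: at each time point two propositions are true or false.
PROPS = ("request", "response")

def legal_property(trace, deadline=2):
    """Hypothetical legal property: every request is answered within `deadline`
    steps. Only requests whose full window fits inside the bounded trace are checked."""
    n = len(trace)
    return all(
        any(trace[i]["response"] for i in range(t, t + deadline + 1))
        for t in range(n - deadline)
        if trace[t]["request"]
    )

def requirement_optional_response(trace):
    """Hypothetical requirement: a response may only follow a request one step earlier."""
    return all(not s["response"] or (t > 0 and trace[t - 1]["request"])
               for t, s in enumerate(trace))

def requirement_prompt_response(trace):
    """Hypothetical requirement: every request is answered at the same or next step."""
    n = len(trace)
    return all(s["response"] or (t + 1 < n and trace[t + 1]["response"])
               for t, s in enumerate(trace) if s["request"])

def bounded_check(requirement, prop, max_len):
    """Enumerate every trace of length 1..max_len and return the first one that
    satisfies the requirement but violates the property, or None."""
    k = len(PROPS)
    for n in range(1, max_len + 1):
        for bits in product((False, True), repeat=n * k):
            trace = [dict(zip(PROPS, bits[i * k:(i + 1) * k])) for i in range(n)]
            if requirement(trace) and not prop(trace):
                return trace  # counterexample: a compliance violation
    return None
```

With `requirement_optional_response`, a length-3 counterexample (a request that is never answered) is found; with `requirement_prompt_response`, no counterexample exists up to the bound. Exhaustive enumeration grows exponentially with the bound, which is one reason a solver-based search is used in practice.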
# RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring ( http://arxiv.org/abs/2305.17708v1 ) License: see link	Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong Ji	Refactoring is an indispensable practice of improving the quality and maintainability of source code in software evolution. Rename refactoring is the most frequently performed refactoring that suggests a new name for an identifier to enhance readability when the identifier is poorly named. However, most existing works only identify renaming activities between two versions of source code, while few works address how to suggest a new name. In this paper, we study automatic rename refactoring on variable names, which is considered more challenging than other rename refactoring activities. We first point out the connections between rename refactoring and various prevalent learning paradigms, and the difference between rename refactoring and general text generation in natural language processing. Based on our observations, we propose RefBERT, a two-stage pre-trained framework for rename refactoring on variable names. RefBERT first predicts the number of sub-tokens in the new name and then generates sub-tokens accordingly. Several techniques, including constrained masked language modeling, contrastive learning, and the bag-of-tokens loss, are incorporated into RefBERT to tailor it for automatic rename refactoring on variable names. Through extensive experiments on our constructed refactoring datasets, we show that the generated variable names of RefBERT are more accurate and meaningful than those produced by the existing method.	Translated: 2023-10-24 05:17:34 Published: 2023-05-28
# The Life and Death of Software Ecosystems ( http://arxiv.org/abs/2306.10020v1 ) License: see link	Raula Gaikovina Kula and Gregorio Robles	Software ecosystems have gained a lot of attention in recent times. Industry and developers gather around technologies and collaborate on their advancement; when the boundaries of such an effort go beyond a certain number of projects, we are witnessing the appearance of Free/Libre and Open Source Software (FLOSS) ecosystems. In this chapter, we explore two aspects that contribute to a healthy ecosystem, related to the attraction (and detraction) and the death of ecosystems. To function and survive, ecosystems need to attract people, get them onboarded and retain them. In Section One we explore possibilities with provocative research questions for attracting and detracting contributors (and users): the lifeblood of FLOSS ecosystems. Then, in Section Two, we focus on the death of systems, exploring some presumed-dead systems and their state in the afterlife.	Translated: 2023-10-23 19:25:19 Published: 2023-05-28
# JutePestDetect: An Intelligent Approach for Jute Pest Identification Using Fine-Tuned Transfer Learning ( http://arxiv.org/abs/2308.05179v1 ) License: see link	Md. Simul Hasan Talukder, Mohammad Raziuddin Chowdhury, Md Sakib Ullah Sourav, Abdullah Al Rakin, Shabbir Ahmed Shuvo, Rejwan Bin Sulaiman, Musarrat Saberin Nipun, Muntarin Islam, Mst Rumpa Islam, Md Aminul Islam, Zubaer Haque	In certain Asian countries, Jute is one of the primary sources of income and Gross Domestic Product (GDP) for the agricultural sector. Like many other crops, Jute is prone to pest infestations, and pest identification is typically done visually in countries like Bangladesh, India, Myanmar, and China. In addition, this method is time-consuming, challenging, and somewhat imprecise, which poses a substantial financial risk. To address this issue, the study proposes a high-performing and resilient transfer learning (TL) based JutePestDetect model to identify jute pests at an early stage. Firstly, we prepared a jute pest dataset containing 17 classes and around 380 photos per pest class, which were evaluated after manual and automatic pre-processing and cleaning, such as background removal and resizing. Subsequently, five prominent pre-trained models (DenseNet201, InceptionV3, MobileNetV2, VGG19, and ResNet50) were selected from a previous study to design the JutePestDetect model. Each model was revised by replacing the classification layer with a global average pooling layer and incorporating a dropout layer for regularization. To evaluate the models' performance, various metrics such as precision, recall, F1 score, ROC curve, and confusion matrix were employed. These analyses provided additional insights for determining the efficacy of the models. Among them, the customized regularized DenseNet201-based proposed JutePestDetect model outperformed the others, achieving an impressive accuracy of 99%. As a result, our proposed method and strategy offer an enhanced approach to pest identification in the case of Jute, which can significantly benefit farmers worldwide.	Translated: 2023-10-23 14:52:33 Published: 2023-05-28
# GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning ( http://arxiv.org/abs/2306.13089v1 ) License: see link	Haiteng Zhao, Shengchao Liu, Chang Ma, Hannan Xu, Jie Fu, Zhi-Hong Deng, Lingpeng Kong, Qi Liu	Molecule property prediction has gained significant attention in recent years. The main bottleneck is the label insufficiency caused by expensive lab experiments. In order to alleviate this issue and to better leverage textual knowledge for tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples encoding of the graph from task instructions in the attention mechanism, enhancing the generalization of graph features across novel tasks. We construct a dataset consisting of more than two thousand molecule tasks with corresponding instructions derived from task descriptions. We pretrain GIMLET on the molecule tasks along with instructions, enabling the model to transfer effectively to a broad range of tasks. Experimental results demonstrate that GIMLET significantly outperforms molecule-text baselines in instruction-based zero-shot learning, even achieving results close to those of supervised GNN models on tasks such as ToxCast and MUV.	Translated: 2023-06-26 01:11:56 Published: 2023-05-28
# Towards a Technology-Driven Adaptive Decision Support System for Integrated Pavement and Maintenance strategies (TDADSS-IPM): focus on risk assessment framework for climate change adaptation ( http://arxiv.org/abs/2306.01769v1 ) License: see link	Shahrzad Pour, Amir Masoumi, Niels Skov Dujardin	Decision Support Systems (DSSs) for pavement and maintenance strategies have traditionally been designed as silos, leading to locally optimal systems. Moreover, because the big data usage that Industry 4.0 enables today did not yet exist, DSSs were not initially designed to adapt to sources of uncertainty, which led to rigid decisions. Motivated by the vulnerability of road assets to climate phenomena, this paper takes a visionary step towards introducing a Technology-Driven Adaptive Decision Support System for Integrated Pavement and Maintenance activities called TDADSS-IPM. As part of such a DSS, a bottom-up risk assessment model is built via Bayesian Belief Networks (BBN) to assess the actual condition of Danish roads under weather conditions. Such a model fills gaps in the knowledge domain and develops a platform that can be trained over time and applied in real time to actual events.	Translated: 2023-06-11 13:58:39 Published: 2023-05-28
# A Quantitative Review on Language Model Efficiency Research ( http://arxiv.org/abs/2306.01768v1 ) License: see link	Meng Jiang, Hy Dang, Lingbo Tong	Language models (LMs) are being scaled up and becoming more powerful. Improving their efficiency is one of the core research topics in neural information processing systems. Tay et al. (2022) provided a comprehensive overview of efficient Transformers, which have become an indispensable staple in the field of NLP. However, in the section "On Evaluation", they left open the question of "which fundamental efficient Transformer one should consider," answered by "still a mystery" because "many research papers select their own benchmarks." Unfortunately, there was no quantitative analysis of the performance of Transformers on any benchmark. Moreover, state space models (SSMs) have demonstrated their ability to model long-range sequences with non-attention mechanisms, which were not discussed in the prior review. This article presents a meta-analysis of the results from a set of papers on efficient Transformers as well as those on SSMs. It provides a quantitative review of LM efficiency research and gives suggestions for future research.	Translated: 2023-06-11 13:58:20 Published: 2023-05-28
# A Method for Detecting Murmurous Heart Sounds based on Self-similar Properties ( http://arxiv.org/abs/2306.05283v1 ) License: see link	Dixon Vimalajeewa, Chihoon Lee, Brani Vidakovic	A heart murmur is an atypical sound produced by the flow of blood through the heart. It can be a sign of a serious heart condition, so detecting heart murmurs is critical for identifying and managing cardiovascular diseases. However, current methods for identifying murmurous heart sounds do not fully utilize the valuable insights that can be gained by exploring intrinsic properties of heart sound signals. To address this issue, this study proposes a new discriminatory set of multiscale features based on the self-similarity and complexity properties of heart sounds, as derived in the wavelet domain. Self-similarity is characterized by assessing fractal behaviors, while complexity is explored by calculating wavelet entropy. We evaluated the diagnostic performance of these proposed features for detecting murmurs using a set of standard classifiers. When applied to a publicly available heart sound dataset, our proposed wavelet-based multiscale features achieved comparable performance to existing methods with fewer features. This suggests that self-similarity and complexity properties in heart sounds could be potential biomarkers for improving the accuracy of murmur detection.	Translated: 2023-06-11 13:16:49 Published: 2023-05-28
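The heart-sound entry above measures complexity with wavelet entropy. A minimal sketch of one common definition follows: the Shannon entropy of the relative energy in each decomposition level. It uses a hand-written orthonormal Haar transform so that only the standard library is needed; the wavelet family and number of levels used in the paper are not stated here, so both are assumptions, and the fractal (self-similarity) features are not shown.

```python
import math

def haar_dwt(signal, levels):
    """Multilevel orthonormal Haar DWT. Returns [cD1, ..., cDL, cAL]."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd length by repeating the last sample
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs

def wavelet_entropy(signal, levels=4):
    """Shannon entropy (in nats) of the relative wavelet energy per band."""
    energies = [sum(c * c for c in band) for band in haar_dwt(signal, levels)]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

A signal whose energy sits in a single band (for example a constant) has entropy 0, while broadband noise spreads energy across bands and approaches the maximum of `log(levels + 1)`.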
# Reconstructing Sea Surface Temperature Images: A Masked Autoencoder Approach for Cloud Masking and Reconstruction ( http://arxiv.org/abs/2306.00835v1 ) License: see link	Angelina Agabin (1) and J. Xavier Prochaska (1) ((1) University of California, Santa Cruz)	This thesis presents a new algorithm to mitigate cloud masking in the analysis of sea surface temperature (SST) data generated by remote sensing technologies. Clouds interfere with the analysis of all remote sensing data using wavelengths shorter than 12 microns, significantly limiting the quantity of usable data and creating a biased geographical distribution (towards equatorial and coastal regions). To address this issue, we propose an unsupervised machine learning algorithm called Enki, which uses a Vision Transformer with Masked Autoencoding to reconstruct masked pixels. We train four different models of Enki with varying mask ratios (t) of 10%, 35%, 50%, and 75% on the generated Ocean General Circulation Model (OGCM) dataset referred to as LLC4320. To evaluate performance, we reconstruct a validation set of LLC4320 SST images with random "clouds" corrupting p = 10%, 20%, 30%, 40%, and 50% of the images with individual patches of 4x4 pixels. We consistently find that at all levels of p there are one or more models that reconstruct the images with a mean RMSE of less than 0.03 K, i.e., lower than the estimated sensor error of VIIRS data. Similarly, at the individual patch level, the reconstructions have an RMSE 8x smaller than the fluctuations in the patch. And, as anticipated, reconstruction errors are larger for images with a higher degree of complexity. Our analysis also reveals that patches along the image border have systematically higher reconstruction error; we recommend ignoring these in production. We conclude that Enki shows great promise to surpass in-painting as a means of reconstructing cloud-masked regions. Future research will develop Enki to reconstruct real-world data.	Translated: 2023-06-02 14:46:57 Published: 2023-05-28
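The evaluation protocol in the SST entry above can be sketched as follows: hide a fraction p of an image as non-overlapping 4x4 patches, then score a reconstruction by RMSE. Computing RMSE over the masked pixels only, and placing patches on a fixed grid, are assumptions made for illustration. The Enki model itself is not shown.

```python
import math
import random

def random_patch_mask(height, width, patch, p, rng):
    """Boolean mask (True = hidden) covering a fraction p of the
    non-overlapping patch x patch tiles, chosen uniformly at random."""
    rows, cols = height // patch, width // patch
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    hidden = set(rng.sample(tiles, round(p * len(tiles))))
    return [[(y // patch, x // patch) in hidden for x in range(width)]
            for y in range(height)]

def masked_rmse(truth, recon, mask):
    """RMSE over the masked (reconstructed) pixels only."""
    errs = [(t - r) ** 2
            for trow, rrow, mrow in zip(truth, recon, mask)
            for t, r, m in zip(trow, rrow, mrow) if m]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0
```

For example, `random_patch_mask(64, 64, 4, 0.3, random.Random(0))` hides 77 of the 256 tiles. Passing a seeded `random.Random` keeps the "clouds" reproducible across the models being compared.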
# Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching ( http://arxiv.org/abs/2305.18379v1 ) License: see link	Ilgee Hong, Sen Na, Michael W. Mahoney, Mladen Kolar	We consider solving equality-constrained nonlinear, nonconvex optimization problems. This class of problems appears widely in a variety of applications in machine learning and engineering, ranging from constrained deep neural networks, to optimal control, to PDE-constrained optimization. We develop an adaptive inexact Newton method for this problem class. In each iteration, we solve the Lagrangian Newton system inexactly via a randomized iterative sketching solver, and select a suitable stepsize by performing line search on an exact augmented Lagrangian merit function. The randomized solvers have advantages over deterministic linear system solvers by significantly reducing per-iteration flops complexity and storage cost, when equipped with suitable sketching matrices. Our method adaptively controls the accuracy of the randomized solver and the penalty parameters of the exact augmented Lagrangian, to ensure that the inexact Newton direction is a descent direction of the exact augmented Lagrangian. This allows us to establish a global almost sure convergence. We also show that a unit stepsize is admissible locally, so that our method exhibits a local linear convergence. Furthermore, we prove that the linear convergence can be strengthened to superlinear convergence if we gradually sharpen the adaptive accuracy condition on the randomized solver. We demonstrate the superior performance of our method on benchmark nonlinear problems in the CUTEst test set, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.	Translated: 2023-05-31 22:04:56 Published: 2023-05-28
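The entry above solves each Newton system with a randomized iterative sketching solver. As a self-contained illustration of the sketching idea only, the randomized Kaczmarz method below is the special case that sketches the system with one randomly chosen row per iteration, sampled with probability proportional to its squared norm. The paper's sketching matrices, adaptive accuracy control and augmented-Lagrangian line search are not shown, and this is not the authors' solver.

```python
import random

def randomized_kaczmarz(A, b, iters, seed=0):
    """Solve a consistent linear system A x = b by projecting the iterate
    onto one randomly sampled row constraint a_i . x = b_i per iteration."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    row_norms = [sum(a * a for a in row) for row in A]
    for _ in range(iters):
        # Sample row i with probability proportional to ||a_i||^2.
        i = rng.choices(range(len(A)), weights=row_norms)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_norms[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x
```

Each iteration touches a single row, which is the source of the low per-iteration cost the abstract mentions; the price is that many cheap iterations replace one exact factorization.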
# Disentanglement via Latent Quantization ( http://arxiv.org/abs/2305.18378v1 ) License: see link	Kyle Hsu and Will Dorrell and James C. R. Whittington and Jiajun Wu and Chelsea Finn	In disentangled representation learning, a model is asked to tease apart a dataset's underlying sources of variation and represent them independently of one another. Since the model is provided with no ground truth information about these sources, inductive biases take a paramount role in enabling disentanglement. In this work, we construct an inductive bias towards compositionally encoding and decoding data by enforcing a harsh communication bottleneck. Concretely, we do this by (i) quantizing the latent space into learnable discrete codes with a separate scalar codebook per dimension and (ii) applying strong model regularization via an unusually high weight decay. Intuitively, the quantization forces the encoder to use a small number of latent values across many datapoints, which in turn enables the decoder to assign a consistent meaning to each value. Regularization then serves to drive the model towards this parsimonious strategy. We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models. In order to reliably assess these models, we also propose InfoMEC, new metrics for disentanglement that are cohesively grounded in information theory and fix well-established shortcomings in previous metrics. Together with regularization, latent quantization dramatically improves the modularity and explicitness of learned representations on a representative suite of benchmark datasets. In particular, our quantized-latent autoencoder (QLAE) consistently outperforms strong methods from prior work in these key disentanglement properties without compromising data reconstruction.	Translated: 2023-05-31 22:04:31 Published: 2023-05-28
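Step (i) in the entry above, quantizing each latent dimension with its own scalar codebook, can be sketched as a nearest-value lookup per dimension. The codebook values below are hypothetical fixed numbers. In the paper they are learnable, and training details (such as how gradients pass through the non-differentiable lookup, and the high weight decay) are omitted here.

```python
def quantize_latent(z, codebooks):
    """Snap each latent dimension to the nearest value in that dimension's
    own scalar codebook. Returns (quantized vector, chosen code indices)."""
    quantized, indices = [], []
    for value, book in zip(z, codebooks):
        idx = min(range(len(book)), key=lambda k: abs(book[k] - value))
        indices.append(idx)
        quantized.append(book[idx])
    return quantized, indices

# Hypothetical 2-dimensional latent with 3 and 2 codes per dimension.
codebooks = [[-1.0, 0.0, 1.0], [-0.5, 0.5]]
print(quantize_latent([0.3, -0.9], codebooks))
```

Because each dimension draws from only a handful of values, many inputs share the same code per dimension, which is the "harsh communication bottleneck" the abstract relies on to encourage a consistent meaning per value.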
# BadLabel: A Robust Perspective on Evaluating and Enhancing Label-noise Learning ( http://arxiv.org/abs/2305.18377v1 ) License: see link	Jingfeng Zhang, Bo Song, Haohan Wang, Bo Han, Tongliang Liu, Lei Liu, Masashi Sugiyama	Label-noise learning (LNL) aims to increase the model's generalization given training data with noisy labels. To facilitate practical LNL algorithms, researchers have proposed different label noise types, ranging from class-conditional to instance-dependent noises. In this paper, we introduce a novel label noise type called BadLabel, which can significantly degrade the performance of existing LNL algorithms by a large margin. BadLabel is crafted based on the label-flipping attack against standard classification, where specific samples are selected and their labels are flipped to other labels so that the loss values of clean and noisy labels become indistinguishable. To address the challenge posed by BadLabel, we further propose a robust LNL method that perturbs the labels in an adversarial manner at each epoch to make the loss values of clean and noisy labels again distinguishable. Once we select a small set of (mostly) clean labeled data, we can apply the techniques of semi-supervised learning to train the model accurately. Empirically, our experimental results demonstrate that existing LNL algorithms are vulnerable to the newly introduced BadLabel noise type, while our proposed robust LNL method can effectively improve the generalization performance of the model under various types of label noise. The new dataset of noisy labels and the source codes of robust LNL algorithms are available at https://github.com/zjfheart/BadLabels.	Translated: 2023-05-31 22:04:08 Published: 2023-05-28
# Fast and Accurate Dual-Way Streaming PARAFAC2 for Irregular Tensors -- Algorithm and Application ( http://arxiv.org/abs/2305.18376v1 ) License: see link	Jun-Gi Jang, Jeongyoung Lee, Yong-chan Park, U Kang	How can we efficiently and accurately analyze an irregular tensor in a dual-way streaming setting where the sizes of two dimensions of the tensor increase over time? What types of anomalies are there in the dual-way streaming setting? An irregular tensor is a collection of matrices whose column lengths are the same while their row lengths are different. In a dual-way streaming setting, both new rows of existing matrices and new matrices arrive over time. PARAFAC2 decomposition is a crucial tool for analyzing irregular tensors. Although real-time analysis is necessary in the dual-way streaming setting, static PARAFAC2 decomposition methods fail to work efficiently in this setting since they perform PARAFAC2 decomposition for accumulated tensors whenever new data arrive. Existing streaming PARAFAC2 decomposition methods work in a limited setting and fail to handle new rows of matrices efficiently. In this paper, we propose Dash, an efficient and accurate PARAFAC2 decomposition method working in the dual-way streaming setting. When new data are given, Dash efficiently performs PARAFAC2 decomposition by carefully dividing the terms related to old and new data and avoiding naive computations involved with old data. Furthermore, applying a forgetting factor makes Dash follow recent movements. Extensive experiments show that Dash achieves up to 14.0x faster speed than existing PARAFAC2 decomposition methods for newly arrived data. We also provide discoveries for detecting anomalies in real-world datasets, including the Subprime Mortgage Crisis and COVID-19.	Translated: 2023-05-31 22:03:41 Published: 2023-05-28
# Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling ( http://arxiv.org/abs/2305.18375v1 ) License: see link	Tianqi Chen and Mingyuan Zhou	Learning to denoise has emerged as a prominent paradigm to design state-of-the-art deep generative models for natural images. How to use it to model the distributions of both continuous real-valued data and categorical data has been well studied in recently proposed diffusion models. However, this paper finds that it has limited ability in modeling some other types of data, such as count and non-negative continuous data, that are often highly sparse, skewed, heavy-tailed, and/or overdispersed. To this end, we propose learning to jump as a general recipe for generative modeling of various types of data. Using a forward count thinning process to construct learning objectives to train a deep neural network, it employs a reverse count thickening process to iteratively refine its generation through that network. We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better. For example, learning to jump is recommended when the training data is non-negative and exhibits strong sparsity, skewness, heavy-tailedness, and/or heterogeneity.	Translated: 2023-05-31 22:03:17 Published: 2023-05-28
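The forward process in the entry above thins latent counts. A minimal sketch using binomial thinning, the standard way to thin a count (keep each of its x units independently with probability alpha), is shown below. Whether the paper's forward process uses exactly this per-step schedule is an assumption, and the learned reverse thickening network is not shown.

```python
import random

def thin(counts, alpha, rng):
    """Binomial thinning: replace each count x with a Binomial(x, alpha) draw,
    i.e. keep each of its x units independently with probability alpha."""
    return [sum(rng.random() < alpha for _ in range(x)) for x in counts]

def forward_chain(x0, alphas, rng):
    """Apply thinning step by step; counts can only shrink along the chain."""
    chain = [list(x0)]
    for a in alphas:
        chain.append(thin(chain[-1], a, rng))
    return chain
```

Unlike adding Gaussian noise, thinning keeps every intermediate state a non-negative integer, which matches the count data the abstract targets.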
# 純スペクトルグラフ埋め込み:Top-Nレコメンデーションのためのグラフ畳み込みの再解釈 Pure Spectral Graph Embeddings: Reinterpreting Graph Convolution for Top-N Recommendation ( http://arxiv.org/abs/2305.18374v1 ) ライセンス: Link先を確認	Edoardo D'Amico, Aonghus Lawlor, Neil Hurley	(参考訳) レコメンダシステムアルゴリズムの開発におけるグラフ畳み込みの利用は、最近、協調フィルタリングタスク(CF)において最先端の結果を達成している。グラフ畳み込み演算がグラフスペクトル領域のフィルタリング操作に結びついていることが証明されているが、なぜこれが協調フィルタリング問題により高い性能をもたらすのか理論的根拠は分かっていない。提示された作品には2つの貢献がある。まず,ユーザおよびアイテム表現学習プロセス全体におけるグラフ畳み込みの利用の効果について検討し,正規化隣接行列の最大固有値に対応する固有ベクトルが張る部分空間へ,学習される潜在特徴がフィルタリング操作によってどのように押し込まれるか,および,この部分空間上にあるベクトルがトレーニングデータ上の予測関数の総和に関連する目的関数の最適解となることを示す。次に、グラフ畳み込みによって得られる解をエミュレートするために固有ベクトルを直接利用し、時間を要する勾配降下訓練手順の必要性をなくし、3つの実世界のデータセットで高いパフォーマンスを提供するアプローチを提案する。 The use of graph convolution in the development of recommender system algorithms has recently achieved state-of-the-art results in the collaborative filtering task (CF). While it has been demonstrated that the graph convolution operation is connected to a filtering operation on the graph spectral domain, the theoretical rationale for why this leads to higher performance on the collaborative filtering problem remains unknown. The presented work makes two contributions. First, we investigate the effect of using graph convolution throughout the user and item representation learning processes, demonstrating how the latent features learned are pushed from the filtering operation into the subspace spanned by the eigenvectors associated with the highest eigenvalues of the normalised adjacency matrix, and how vectors lying on this subspace are the optimal solutions for an objective function related to the sum of the prediction function over the training data. Then, we present an approach that directly leverages the eigenvectors to emulate the solution obtained through graph convolution, eliminating the requirement for a time-consuming gradient descent training procedure while also delivering higher performance on three real-world datasets.	翻訳日:2023-05-31 22:02:54 公開日:2023-05-28
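The idea of scoring user-item pairs directly from the top eigenvectors of the normalized adjacency matrix, with no gradient descent, can be sketched with NumPy as below. This is a minimal illustration under our own assumptions (a symmetric D^{-1/2} A D^{-1/2} normalization of the bipartite user-item graph and plain inner products of the top-k eigenvector rows as scores); it is not the authors' implementation.

```python
import numpy as np

def normalized_adjacency(R):
    """Symmetric normalization D^{-1/2} A D^{-1/2} of the user-item bipartite graph."""
    n_users, n_items = R.shape
    A = np.zeros((n_users + n_items, n_users + n_items))
    A[:n_users, n_users:] = R      # user -> item edges
    A[n_users:, :n_users] = R.T    # item -> user edges
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    inv_sqrt[deg == 0] = 0.0       # isolated nodes contribute nothing
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def spectral_scores(R, k):
    """Return the k largest eigenvalues and an (n_users x n_items) score matrix.

    score[u, i] is the inner product of the user row and the item row of the
    matrix whose columns are the top-k eigenvectors.
    """
    n_users = R.shape[0]
    vals, vecs = np.linalg.eigh(normalized_adjacency(R))  # ascending eigenvalues
    top = vecs[:, -k:]
    emb_users, emb_items = top[:n_users], top[n_users:]
    return vals[-k:], emb_users @ emb_items.T
```

For a connected interaction graph the largest eigenvalue of this normalized adjacency is 1, which gives a quick sanity check on a toy matrix.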
# KAFA:視覚言語モデルの知識付加的特徴適応による画像広告理解の再考 KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models ( http://arxiv.org/abs/2305.18373v1 ) ライセンス: Link先を確認	Zhiwei Jia and Pradyumna Narayana and Arjun R. Akula and Garima Pruthi and Hao Su and Sugato Basu and Varun Jampani	(参考訳) 画像広告の理解は、幅広い現実世界のアプリケーションにとって重要な課題だ。多様な非定型シーン、現実世界の実体、シーンテキストの推論の関与は極めて困難であるが、画像広告の解釈方法は、特に目覚しい一般化性と適応性を特徴とする基礎的な視覚言語モデル(VLM)の時代において、比較的過小評価されている。本稿では、事前学習したvlmのレンズを通して、画像広告理解に関する最初の実証研究を行う。我々は、これらのVLMを画像広告理解に適用するための実践的な課題をベンチマークし、明らかにする。本稿では,画像広告にマルチモーダル情報を効果的に融合し,実世界の知識を付与するシンプルな特徴適応戦略を提案する。我々は、この研究が、広告業界に広く関連する画像広告理解にさらに注意を向けることを望む。 Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world entities, and reasoning over scene-texts, how to interpret image ads is relatively under-explored, especially in the era of foundational vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper, we perform the first empirical study of image ad understanding through the lens of pre-trained VLMs. We benchmark and reveal practical challenges in adapting these VLMs to image ad understanding. We propose a simple feature adaptation strategy to effectively fuse multimodal information for image ads and further empower it with knowledge of real-world entities. We hope our study draws more attention to image ad understanding which is broadly relevant to the advertising industry.	翻訳日:2023-05-31 22:02:33 公開日:2023-05-28
# AnoRand:ランダムラベルによる半教師付きディープラーニング異常検出手法 AnoRand: A Semi Supervised Deep Learning Anomaly Detection Method by Random Labeling ( http://arxiv.org/abs/2305.18389v1 ) ライセンス: Link先を確認	Mansour Zoubeirou A Mayaki and Michel Riveill	(参考訳) 異常検出(英: Anomaly detection)、より一般的には外れ値検出(英: outliers detection)は、理論的および応用機械学習において最も一般的で困難な課題の一つである。最大の課題は、一般的にラベル付きデータがごくわずかしかないか、ラベルがまったくないことです。本稿では,深層学習アーキテクチャとランダムな合成ラベル生成を組み合わせることで,AnoRandと呼ばれる半教師付き異常検出手法を提案する。提案アーキテクチャは,(1)フィードフォワードパーセプトロンからなるノイズ検出(ND)ブロックと(2)オートエンコーダ(AE)ブロックの2つの構成ブロックを有する。この新しいアーキテクチャの主な考え方は、1つのクラス(例えば、異常検出の場合の多数クラス)を学習することであり、潜在空間でデータを表現できるオートエンコーダの能力と、データが高度に不均衡な場合に1つのクラスを学ぶためのフィードフォワードパーセプトロン(FFP)の能力を活用することである。まず、トレーニングセットから少数のサンプル(例えば2%)をランダムに乱す(ノイズを加える)ことにより、合成異常を作成する。第2に, モデルへの入力として, 正常試料と合成試料を用いる。提案手法の性能を,合成データセットと57の実世界データセットにおいて,17の最先端の教師なし異常検出法と比較した。提案手法は一般に最先端の手法よりも優れており,ほとんどの参照データセット上で最高の性能(AUC ROCとAUC PR)を有することを示す。また、実際のラベルを使ってモデルをトレーニングすることで、教師ありの方法で手法をテストした。その結果、最先端の教師付きアルゴリズムと比較して非常に優れた性能を示した。 Anomaly detection or more generally outliers detection is one of the most popular and challenging subjects in theoretical and applied machine learning. The main challenge is that in general we have access to very little labeled data or no labels at all. In this paper, we present a new semi-supervised anomaly detection method called AnoRand by combining a deep learning architecture with random synthetic label generation. The proposed architecture has two building blocks: (1) a noise detection (ND) block composed of a feed-forward perceptron and (2) an autoencoder (AE) block. The main idea of this new architecture is to learn one class (e.g. the majority class in case of anomaly detection) as well as possible by taking advantage of the ability of autoencoders to represent data in a latent space and the ability of Feed Forward Perceptron (FFP) to learn one class when the data is highly imbalanced. First, we create synthetic anomalies by randomly disturbing (adding noise to) a few samples (e.g. 2%) from the training set. Second, we use the normal and the synthetic samples as input to our model. We compared the performance of the proposed method to 17 state-of-the-art unsupervised anomaly detection methods on synthetic datasets and 57 real-world datasets. Our results show that this new method generally outperforms most of the state-of-the-art methods and has the best performance (AUC ROC and AUC PR) on the vast majority of reference datasets. We also tested our method in a supervised way by using the actual labels to train the model. The results show that it has very good performance compared to most state-of-the-art supervised algorithms.	翻訳日:2023-05-31 21:53:56 公開日:2023-05-28
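The synthetic-anomaly step (perturb a small fraction of training samples with noise and label them as anomalies) can be sketched as follows. The Gaussian noise, its scale, and the function name `make_synthetic_anomalies` are our assumptions for illustration; the paper's exact perturbation may differ.

```python
import random

def make_synthetic_anomalies(data, frac=0.02, noise_std=1.0, seed=0):
    """Append noisy copies of a random `frac` of the rows, labeled 1 (anomaly).

    Original rows are labeled 0 (normal). Gaussian noise is an assumption.
    """
    rng = random.Random(seed)
    n_synth = max(1, round(frac * len(data)))
    picked = rng.sample(range(len(data)), n_synth)
    synthetic = [[x + rng.gauss(0.0, noise_std) for x in data[i]] for i in picked]
    samples = [list(row) for row in data] + synthetic
    labels = [0] * len(data) + [1] * n_synth
    return samples, labels
```

The resulting labels are what a noise-detection block could be trained on, while the autoencoder block sees the same samples.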
# 価値推定のための分位点時間差分学習の統計的利点 The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation ( http://arxiv.org/abs/2305.18388v1 ) ライセンス: Link先を確認	Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney	(参考訳) 強化学習における時間差分に基づく方策評価の問題について検討する。特に,この課題に対して,分布強化学習アルゴリズムである分位点時間差分学習(QTD)を用いて分析を行う。実務者が平均以外のリターン分布に関心がない場合でも、QTD(リターンの完全な分布について予測を学ぶ)は、表形式の設定であっても、平均リターンのみを予測する古典的TD学習のようなアプローチよりも優れたパフォーマンスを提供する可能性があるという驚くべき結論に達した。 We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions about the full distribution of returns) may offer performance superior to approaches such as classical TD learning, which predict only the mean return, even in the tabular setting.	翻訳日:2023-05-31 21:53:27 公開日:2023-05-28
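A tabular QTD update, with the mean-return estimate taken as the average of the learned quantiles, can be sketched as below. This follows the standard quantile-regression TD form (quantile levels tau_i = (2i+1)/(2m)); the dictionary-of-lists table layout and the function names are our own illustrative choices.

```python
def qtd_update(theta, s, r, s_next, gamma, alpha, terminal=False):
    """One tabular quantile TD (QTD) update for state s.

    theta[s] holds m quantile estimates at levels tau_i = (2i + 1) / (2m).
    Each estimate moves by alpha * mean_j(tau_i - 1{target_j < theta_i}),
    where the targets are r + gamma * theta[s_next][j] (or just r if terminal).
    """
    m = len(theta[s])
    taus = [(2 * i + 1) / (2 * m) for i in range(m)]
    targets = [r] * m if terminal else [r + gamma * q for q in theta[s_next]]
    new = []
    for tau, q in zip(taus, theta[s]):
        grad = sum(tau - (1.0 if t < q else 0.0) for t in targets) / len(targets)
        new.append(q + alpha * grad)
    theta[s] = new

def value_estimate(theta, s):
    """Mean-return estimate: the average of the learned quantiles."""
    return sum(theta[s]) / len(theta[s])
```

Each step moves every quantile by at most alpha, regardless of the reward scale. This bounded update is one intuition for why QTD can be statistically well behaved even when only the mean is of interest.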
# 敵対的生成ネットワークを用いたキャラクタデザイナの創造性向上 Augmenting Character Designers Creativity Using Generative Adversarial Networks ( http://arxiv.org/abs/2305.18387v1 ) ライセンス: Link先を確認	Mohammad Lataifeh, Xavier Carrasco, Ashraf Elnagar, Naveed Ahmed	(参考訳) GAN(Generative Adversarial Networks)の最近の進歩は、様々な分野の研究者の注目を集めている。最近のGANはリアリズムに重点を置いているが、ハイパーリアルなアウトプットを生成することは、この仕事の場合のように、いくつかのドメインにとって優先事項ではない。生成された結果は、様々なマルチメディアプロジェクトのために新しいキャラクターを概念化しながら、キャラクターデザイナーの創造性を高める認知コンポーネントとして使われる。このような創造的な文脈で最も適したGANを選択するために、まず、単一のグラフィックス処理ユニットを用いて新しいビジュアルキャラクターデータセットをスクラッチからトレーニングした場合に、異なるGANアーキテクチャとそれらのパフォーマンスの比較を示す。また,この分野の多くの研究者が直面する課題である計算資源の制限を克服するために,転移学習やデータ拡張といった代替手法も検討する。さらに, 新しいキャラクターを概念化するキャラクターデザイナーの主体性に対して生成した視覚が持つ認知的価値を評価するために, 混合手法を用いている。その結果,キャラクターデザインプロセスへの早期適用が示すように,この文脈において極めて効果的であることが証明された。この研究の延長として、提案手法は人間と機械間の新しい共同設計プロセスとしてさらに評価され、生成した概念がどのように相互作用し、設計プロセスの結果に影響を与えるかを調査する。 Recent advances in Generative Adversarial Networks (GANs) continue to attract the attention of researchers in different fields due to the wide range of applications devised to take advantage of their key features. Most recent GANs are focused on realism; however, generating hyper-realistic output is not a priority for some domains, as in the case of this work. The generated outcomes are used here as cognitive components to augment character designers' creativity while conceptualizing new characters for different multimedia projects. To select the best-suited GANs for such a creative context, we first present a comparison between different GAN architectures and their performance when trained from scratch on a new visual characters dataset using a single Graphics Processing Unit. We also explore alternative techniques, such as transfer learning and data augmentation, to overcome computational resource limitations, a challenge faced by many researchers in the domain. Additionally, mixed methods are used to evaluate the cognitive value of the generated visuals on character designers' agency when conceptualizing new characters. The results discussed proved highly effective for this context, as demonstrated by early adaptations to the character design process. As an extension for this work, the presented approach will be further evaluated as a novel co-design process between humans and machines to investigate where and how the generated concepts are interacting with and influencing the design process outcome.	翻訳日:2023-05-31 21:53:17 公開日:2023-05-28
# エアロフォイル空力学における計算流体力学の合成のためのオートエンコーダと敵対的生成ネットワークを用いた相乗的枠組み A Synergistic Framework Leveraging Autoencoders and Generative Adversarial Networks for the Synthesis of Computational Fluid Dynamics Results in Aerofoil Aerodynamics ( http://arxiv.org/abs/2305.18386v1 ) ライセンス: Link先を確認	Tanishk Nandal, Vaibhav Fulara, Raj Kumar Singh	(参考訳) 計算流体力学(CFD)では、空力挙動の正確な予測は翼の設計と最適化において重要な役割を果たす。本研究では,CFD結果を生成するために,オートエンコーダとGAN(Generative Adversarial Networks)を相乗的に組み合わせた新しい手法を提案する。我々の革新的なフレームワークは、オートエンコーダの本質的な能力を利用して、エアロフォイルジオメトリーを圧縮された長さ20のベクトル表現にエンコードする。その後、条件付きGANネットワークは、固定された風速、迎角、乱流レベルの仕様を考慮しつつ、このベクトルを正確な圧力分布プロットに変換する。トレーニングプロセスは、JavaFoilソフトウェアから取得した、細心の注意を払ってキュレートされたデータセットを使用し、広範囲の翼のジオメトリを包含する。提案手法は空力予測にかかわる時間とコストを低減し, 翼の性能を効率的に評価できる可能性を示す。この結果は流体力学における計算技術の進歩に寄与し、空気力学における設計および最適化プロセスの強化への道を開いた。 In the realm of computational fluid dynamics (CFD), accurate prediction of aerodynamic behaviour plays a pivotal role in aerofoil design and optimization. This study proposes a novel approach that synergistically combines autoencoders and Generative Adversarial Networks (GANs) for the purpose of generating CFD results. Our innovative framework harnesses the intrinsic capabilities of autoencoders to encode aerofoil geometries into a compressed and informative 20-length vector representation. Subsequently, a conditional GAN network adeptly translates this vector into precise pressure-distribution plots, accounting for fixed wind velocity, angle of attack, and turbulence level specifications. The training process utilizes a meticulously curated dataset acquired from JavaFoil software, encompassing a comprehensive range of aerofoil geometries. The proposed approach exhibits profound potential in reducing the time and costs associated with aerodynamic prediction, enabling efficient evaluation of aerofoil performance. The findings contribute to the advancement of computational techniques in fluid dynamics and pave the way for enhanced design and optimization processes in aerodynamics.	翻訳日:2023-05-31 21:52:54 公開日:2023-05-28
# Heterophily グラフのための自己注意型デュアル埋め込み Self-attention Dual Embedding for Graphs with Heterophily ( http://arxiv.org/abs/2305.18385v1 ) ライセンス: Link先を確認	Yurui Lai, Taiyan Zhang, Rui Fan	(参考訳) グラフニューラルネットワーク(GNN)はノード分類タスクにおいて高い成功を収めている。 GNNはグラフがホモフィル性である、すなわち、隣接するノードは同じクラスに属する可能性が高いと仮定する。しかし、多くの実世界のグラフはヘテロ親和性があり、標準のGNNを用いた分類精度ははるかに低い。本研究ではヘテロ親和性グラフとホモ親和性グラフの両方に有効な新しいGNNを設計する。私たちの仕事は3つの主要な観察に基づいている。まず、ノードの特徴とグラフトポロジが異なるグラフで異なる量の情報を提供するため、それらを独立してエンコードし、適応的に優先順位付けする必要があることを示す。第2に,グラフトポロジ情報を伝播する際の負の注意重み付けを行うことで,精度が向上することを示す。最後に,ノード間の非対称な注意重み付けが有効であることを示す。我々は、これらの観測を新しい自己認識機構を通じて活用するGNNを設計する。本アルゴリズムは,数千から数百万のノードを含む実世界のグラフ上で評価し,既存のGNNと比較して最先端の結果が得られることを示す。また,設計の主成分が異なるグラフ上で有効であることも分析した。 Graph Neural Networks (GNNs) have been highly successful for the node classification task. GNNs typically assume graphs are homophilic, i.e. neighboring nodes are likely to belong to the same class. However, a number of real-world graphs are heterophilic, and this leads to much lower classification accuracy using standard GNNs. In this work, we design a novel GNN which is effective for both heterophilic and homophilic graphs. Our work is based on three main observations. First, we show that node features and graph topology provide different amounts of informativeness in different graphs, and therefore they should be encoded independently and prioritized in an adaptive manner. Second, we show that allowing negative attention weights when propagating graph topology information improves accuracy. Finally, we show that asymmetric attention weights between nodes are helpful. We design a GNN which makes use of these observations through a novel self-attention mechanism. We evaluate our algorithm on real-world graphs containing thousands to millions of nodes and show that we achieve state-of-the-art results compared to existing GNNs. We also analyze the effectiveness of the main components of our design on different graphs.	翻訳日:2023-05-31 21:52:34 公開日:2023-05-28
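The abstract's second and third observations are that negative attention weights and asymmetric weights between nodes both help. Both can be illustrated with a toy aggregation that scores neighbors with tanh(q_i · k_j) using separate query and key projections. This scoring rule and the function names are purely our assumptions for illustration; the paper's self-attention mechanism is not reproduced here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(W, x):
    return [dot(row, x) for row in W]

def signed_attention(H, neighbors, Wq, Wk):
    """Hypothetical signed, asymmetric attention over graph neighbors.

    weight(i, j) = tanh(q_i . k_j) with q_i = Wq h_i and k_j = Wk h_j.
    tanh allows negative weights (unlike softmax), and separate query/key
    projections make weight(i, j) differ from weight(j, i) in general.
    """
    Q = [matvec(Wq, h) for h in H]
    K = [matvec(Wk, h) for h in H]
    out, weights = [], {}
    for i, nbrs in enumerate(neighbors):
        agg = [0.0] * len(H[i])
        for j in nbrs:
            w = math.tanh(dot(Q[i], K[j]))
            weights[(i, j)] = w
            agg = [a + w * x for a, x in zip(agg, H[j])]
        norm = max(1, len(nbrs))  # mean aggregation; isolated nodes stay zero
        out.append([a / norm for a in agg])
    return out, weights
```

A negative weight lets a node push its representation away from a dissimilar neighbor, which is one intuition for why signed weights can help on heterophilic graphs.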
# インクリメンタル学習者に対するバックドア攻撃 : 実証的評価研究 Backdoor Attacks Against Incremental Learners: An Empirical Evaluation Study ( http://arxiv.org/abs/2305.18384v1 ) ライセンス: Link先を確認	Yiqi Zhong, Xianming Liu, Deming Zhai, Junjun Jiang, Xiangyang Ji	(参考訳) 時系列のシーケンシャルなデータを扱う際に発生する破滅的忘却の問題を軽減するために、多数のインクリメンタル学習(IL)アルゴリズムが提案されている。しかし、インクリメンタル学習器の敵対的堅牢性は広く検証されておらず、潜在的なセキュリティリスクが残されている。具体的には、ポイズニングベースのバックドア攻撃において、ILにおけるストリーミングデータの性質は、分散攻撃およびクロスタスク攻撃の可能性を生み出すことで、敵対者にとって非常に好都合であると論じる。すなわち敵対者は、極めて少量のバックドアサンプル(我々の観察では例えば0.1%)を注入するだけで、任意の時点・時系列でのデータ汚染により、未知の過去または将来の任意のタスクに影響を与えうる。研究コミュニティの注目を集めるため,本稿では,3つの学習シナリオにおけるポイズニングベースのバックドア攻撃に対する11の典型的なインクリメンタル学習器の高い脆弱性を,特にバックドア知識のクロスタスク汎化効果について,毒化率が5%から0.1%という低さに至る範囲で実証的に明らかにした。最後に、アクティベーションクラスタリングに基づく防御機構が、我々のトリガーパターンを検出して潜在的なセキュリティリスクを軽減するのに有効であることが判明した。 A large number of incremental learning (IL) algorithms have been proposed to alleviate the catastrophic forgetting issue that arises when dealing with sequential data over a time series. However, the adversarial robustness of incremental learners has not been widely verified, leaving potential security risks. Specifically, for poisoning-based backdoor attacks, we argue that the nature of streaming data in IL provides great convenience to the adversary by creating the possibility of distributed and cross-task attacks -- an adversary can affect any unknown previous or subsequent task by data poisoning at any time or time series with an extremely small amount of injected backdoor samples (e.g., 0.1% based on our observations). To attract the attention of the research community, in this paper, we empirically reveal the high vulnerability of 11 typical incremental learners against poisoning-based backdoor attacks in 3 learning scenarios, especially the cross-task generalization effect of backdoor knowledge, while the poison ratios range from 5% to as low as 0.1%. Finally, the defense mechanism based on activation clustering is found to be effective in detecting our trigger pattern to mitigate potential security risks.	翻訳日:2023-05-31 21:52:16 公開日:2023-05-28
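An activation-clustering defense works on the activations of the samples assigned to one class. It splits them into two clusters and flags the smaller cluster as potentially poisoned when that cluster is small enough. The sketch below is a simplified, dependency-free version with plain 2-means. The 0.35 size cutoff is an arbitrary choice for this example, and the paper's exact defense configuration is not reproduced.

```python
def _dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def _mean(points):
    n = len(points)
    return [sum(c) / n for c in zip(*points)]

def two_means(points, iters=20):
    """Plain 2-means clustering; returns a 0/1 cluster label per point."""
    c0 = points[0]
    c1 = max(points, key=lambda p: _dist2(p, c0))  # farthest point as 2nd seed
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [0 if _dist2(p, c0) <= _dist2(p, c1) else 1 for p in points]
        g0 = [p for p, a in zip(points, assign) if a == 0]
        g1 = [p for p, a in zip(points, assign) if a == 1]
        if g0:
            c0 = _mean(g0)
        if g1:
            c1 = _mean(g1)
    return assign

def flag_suspicious(activations, max_fraction=0.35):
    """Flag the smaller cluster of one class's activations as possibly poisoned,
    but only if it holds at most `max_fraction` of the samples."""
    assign = two_means(activations)
    n1 = sum(assign)
    n0 = len(assign) - n1
    small = 1 if n1 < n0 else 0
    if min(n0, n1) / len(assign) > max_fraction:
        return []
    return [i for i, a in enumerate(assign) if a == small]
```

The defense relies on poisoned samples activating the network differently from clean samples of the same class, so that they form their own small cluster.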
# ネットワーク・プルーニングの3レジームモデル A Three-regime Model of Network Pruning ( http://arxiv.org/abs/2305.18383v1 ) ライセンス: Link先を確認	Yefan Zhou, Yaoqing Yang, Arin Chang, Michael W. Mahoney	(参考訳) 最近の研究は、訓練のハイパーパラメータ(例えば訓練エポック数)が機械学習モデルのプルーニング可能性(prunability)に複雑な影響を与え得ることを強調している。おそらく意外なことに、特定のハイパーパラメータの調整がプルーニング可能性にどのように影響するかを正確に予測する体系的なアプローチは、いまだに見つかっていない。このギャップに対処するために,学習の統計力学に基づく現象論的モデルを導入する。提案手法は,ニューラルネットワーク(NN)トレーニングハイパーパラメータが刈り取り性能に与える影響をモデル化するために,温度様パラメータと負荷様パラメータを用いる。我々が特定した重要な経験的結果は鋭い遷移現象である。すなわち,プルーニング後のモデルにおける負荷様パラメータの値に応じて,プルーニング前のモデルにおける温度様パラメータの値を増加させると,その後のプルーニング性能が向上する場合も損なわれる場合もある。この遷移に基づき,プルーニングされたNNの損失景観のグローバル構造を分類することにより,3レジームモデルを構築した。本モデルでは, 高温の二分的な効果は, プルーニング後のモデルにおける異なるタイプの大域構造間の遷移と関係していることを明らかにした。結果に基づき,次の3つのケーススタディを提示する。 1) 刈り取り改善のためにハイパーパラメータを増加させるか減少させるかの判定 2) モデル群からプルーニングに最適なモデルを選択すること,及び 3) シャープネス認識最小化法のハイパーパラメータを調整し, 刈り取り性能を向上させること。 Recent work has highlighted the complex influence training hyperparameters, e.g., the number of training epochs, can have on the prunability of machine learning models. Perhaps surprisingly, a systematic approach to predict precisely how adjusting a specific hyperparameter will affect prunability remains elusive. To address this gap, we introduce a phenomenological model grounded in the statistical mechanics of learning. Our approach uses temperature-like and load-like parameters to model the impact of neural network (NN) training hyperparameters on pruning performance. A key empirical result we identify is a sharp transition phenomenon: depending on the value of a load-like parameter in the pruned model, increasing the value of a temperature-like parameter in the pre-pruned model may either enhance or impair subsequent pruning performance. Based on this transition, we build a three-regime model by taxonomizing the global structure of the pruned NN loss landscape. Our model reveals that the dichotomous effect of high temperature is associated with transitions between distinct types of global structures in the post-pruned model. Based on our results, we present three case studies: 1) determining whether to increase or decrease a hyperparameter for improved pruning; 2) selecting the best model to prune from a family of models; and 3) tuning the hyperparameter of the Sharpness Aware Minimization method for better pruning performance.	翻訳日:2023-05-31 21:51:52 公開日:2023-05-28
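The pruning step itself can be made concrete with global magnitude pruning at a given density. Reading the load-like parameter as the fraction of weights that remain is our assumption, made for illustration only. The sketch shows what "the pruned model" means operationally and does not implement the paper's phenomenological model.

```python
def magnitude_prune(weights, density):
    """Keep the `density` fraction of weights with the largest magnitude; zero the rest."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must lie in [0, 1]")
    n_keep = round(density * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]
```

Sweeping the density and a temperature-like training knob (for example, learning rate or batch size), then comparing post-pruning accuracy, is the kind of experiment in which the reported sharp transition would show up.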
# トランスフォーマーを用いた効率的な時系列予測のための訓練中の適応的スパース性レベル Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers ( http://arxiv.org/abs/2305.18382v1 ) ライセンス: Link先を確認	Zahra Atashgahi, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu	(参考訳) リアルタイムアプリケーション、特にディープニューラルネットワーク(DNN)では、効率的な時系列予測が重要になっている。 DNNの効率性は、疎結合とモデルサイズの削減によって達成できる。しかしながら、トレーニング中に自動的にスパース性レベルを見つけることは、データセット間で損失とスパース性のトレードオフが不均一であるため、依然として難しい課題である。本稿では,事前定義されたスパース性レベルを必要とせずに,損失とスパース性の最適なバランスを自動的に求める手法として, Pruning with Adaptive Sparsity Level (PALS) を提案する。 PALSはスパーストレーニングと訓練中(during-training)手法の両方から着想を得ている。スパースニューラルネットワークのトレーニングにおいて新しい"expand"メカニズムを導入し、モデルが動的に縮小、拡張、あるいは安定を保つことで、適切なスパース性レベルを見つけられるようにする。本稿では,優れた時系列予測性能を持つ一方で計算コストが高いことで知られるトランスフォーマーの効率化に着目する。それでも、PALSは任意のDNNに直接適用することができる。実際、DLinearモデルにおいてもその有効性を示す。 6つのベンチマークデータセットと5つの最先端トランスフォーマーによる実験結果から,PALSは密モデルに匹敵する性能を維持しながら,モデルサイズを大幅に削減することが示された。さらに興味深いことに、PALSは、MSEとMAEの損失でそれぞれ30例中12例と14例において密モデルを上回り、平均でパラメータ数を65%、FLOPsを63%削減している。私たちのコードは、論文の受理時に公開されます。 Efficient time series forecasting has become critical for real-world applications, particularly with deep neural networks (DNNs). Efficiency in DNNs can be achieved through sparse connectivity and reducing the model size. However, finding the sparsity level automatically during training remains a challenging task due to the heterogeneity in the loss-sparsity tradeoffs across the datasets. In this paper, we propose Pruning with Adaptive Sparsity Level (PALS) to automatically seek an optimal balance between loss and sparsity, all without the need for a predefined sparsity level. PALS draws inspiration from both sparse training and during-training methods. It introduces the novel "expand" mechanism in training sparse neural networks, allowing the model to dynamically shrink, expand, or remain stable to find a proper sparsity level. In this paper, we focus on achieving efficiency in transformers known for their excellent time series forecasting performance but high computational cost. Nevertheless, PALS can be applied directly to any DNN. We also demonstrate its effectiveness on the DLinear model. Experimental results on six benchmark datasets and five state-of-the-art transformer variants show that PALS substantially reduces model size while maintaining comparable performance to the dense model. More interestingly, PALS even outperforms the dense model in 12 and 14 out of 30 cases in terms of MSE and MAE loss, respectively, while reducing the parameter count by 65% and FLOPs by 63% on average. Our code will be publicly available upon acceptance of the paper.	翻訳日:2023-05-31 21:51:30 公開日:2023-05-28
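The shrink / expand / stable behaviour described for PALS can be pictured as a controller that adjusts the sparsity level according to how the validation loss moved. The rule below is entirely hypothetical (the relative tolerance, step size and thresholds are our choices) and is meant only to show the shape of such a decision, not PALS's actual criterion.

```python
def adjust_sparsity(sparsity, prev_loss, new_loss, step=0.05, tol=0.01,
                    min_sparsity=0.0, max_sparsity=0.99):
    """Hypothetical shrink/expand/stable rule in the spirit of PALS.

    - loss improved by more than tol (relative): shrink, i.e. raise sparsity
    - loss worsened by more than tol: expand, i.e. lower sparsity
    - otherwise: keep the current sparsity
    """
    change = (new_loss - prev_loss) / max(abs(prev_loss), 1e-12)
    if change < -tol:
        return min(max_sparsity, sparsity + step), "shrink"
    if change > tol:
        return max(min_sparsity, sparsity - step), "expand"
    return sparsity, "stable"
```

A rule of this kind lets the sparsity level settle wherever the loss-sparsity tradeoff of a given dataset allows, instead of fixing a target sparsity in advance.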
# 大量鉱石から溶出する金: 重要サンプル選択による効率的なデータセット蒸留 Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection ( http://arxiv.org/abs/2305.18381v1 ) ライセンス: Link先を確認	Yue Xu, Yong-Lu Li, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang	(参考訳) データ効率の学習は、特にデータセットの蒸留が有効な解となる大規模なマルチモーダルモデルの現在の傾向を考えると、大きな注目を集めている。しかし、データセットの蒸留プロセス自体は依然として非常に非効率である。本研究では,情報理論を参考に蒸留問題をモデル化する。データセットの蒸留に重大なデータ冗長性が存在することを観察し、トレーニングサンプルの有用性をより重視すべきだと主張する。最適なデータ選択の包括的分析によって検証される,最も価値のあるサンプルを活用するための一連の手法を提案する。新しい戦略はトレーニングコストを大幅に削減し、既存の様々な蒸留アルゴリズムをより大きく、より多様なデータセットに拡張する。例えば、場合によってはトレーニングデータのわずか0.04%で同等の蒸留性能を得るのに十分である。さらに, この戦略は一貫して性能を向上させ, 蒸留とネットワークのダイナミクスに関する新たな分析への道を開く可能性がある。本手法は,ImageNet-1K や Kinetics-400 など,より大規模なデータセットや不均一なデータセットに蒸留アルゴリズムを拡張できる。私たちのコードは公開されます。 Data-efficient learning has drawn significant attention, especially given the current trend of large multi-modal models, where dataset distillation can be an effective solution. However, the dataset distillation process itself is still very inefficient. In this work, we model the distillation problem with reference to information theory. Observing that severe data redundancy exists in dataset distillation, we argue for placing more emphasis on the utility of the training samples. We propose a family of methods to exploit the most valuable samples, which is validated by our comprehensive analysis of the optimal data selection. The new strategy significantly reduces the training cost and extends a variety of existing distillation algorithms to larger and more diversified datasets, e.g., in some cases only 0.04% of the training data is sufficient for comparable distillation performance. Moreover, our strategy consistently enhances the performance, which may open up new analyses on the dynamics of distillation and networks. Our method is able to extend the distillation algorithms to much larger-scale datasets and more heterogeneous datasets, e.g. ImageNet-1K and Kinetics-400. Our code will be made publicly available.	翻訳日:2023-05-31 21:50:56 公開日:2023-05-28
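Keeping only the most valuable training samples before running a distillation algorithm reduces to "score, rank, keep the top fraction". The sketch assumes that some per-sample utility score is already available. Using, for example, a per-sample loss as that score is our assumption, not necessarily the paper's utility measure.

```python
def select_critical(samples, utility, fraction):
    """Keep the `fraction` of samples with the highest utility score (at least one)."""
    n_keep = max(1, round(fraction * len(samples)))
    ranked = sorted(range(len(samples)), key=lambda i: utility[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original sample order
    return [samples[i] for i in keep]
```

The distillation algorithm would then run on the returned subset only. This is where the reported savings in training cost come from.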
# 自律走行車両の協調RL試験におけるポテンシャルベースのクレジット割り当て Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles ( http://arxiv.org/abs/2305.18380v1 ) ライセンス: Link先を確認	Utku Ayvaz, Chih-Hong Cheng, Hao Shen	(参考訳) 自律走行車(AV)は、一般的な現実のケースでは極めてよく機能するが、予期せぬケースでは不合理な動作が重大な安全上の懸念を引き起こす。本稿では,AVの計画・意思決定モジュールのための挑戦的なテストケースを生成するための協調強化学習(RL)の概念を提案する。協調RLの重要な課題の1つはクレジット割り当て問題であり、すべてのパラメータとタイミングを考慮して、交通シナリオで相互作用する複数のエージェントに適切に報酬を割り当てることは自明ではないことが判明している。この課題に対処するために,クレジット割り当て問題を解決するための,反事実分析に着想を得た新たなポテンシャルベースの報酬整形手法を提案する。シミュレーション環境における評価は,局所的および大域的な報酬を用いた他の手法に対する提案手法の優位性を示す。 While autonomous vehicles (AVs) may perform remarkably well in generic real-life cases, their irrational actions in some unforeseen cases lead to critical safety concerns. This paper introduces the concept of collaborative reinforcement learning (RL) to generate challenging test cases for AV planning and decision-making module. One of the critical challenges for collaborative RL is the credit assignment problem, where a proper assignment of rewards to multiple agents interacting in the traffic scenario, considering all parameters and timing, turns out to be non-trivial. In order to address this challenge, we propose a novel potential-based reward-shaping approach inspired by counterfactual analysis for solving the credit-assignment problem. The evaluation in a simulated environment demonstrates the superiority of our proposed approach against other methods using local and global rewards.	翻訳日:2023-05-31 21:50:38 公開日:2023-05-28
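Potential-based reward shaping in its standard form (Ng, Harada and Russell, 1999) adds F(s, s') = γΦ(s') − Φ(s) to the environment reward. The sketch applies it along one trajectory with precomputed potentials. The paper's counterfactual-inspired choice of Φ for each agent is not reproduced here.

```python
def shaped_rewards(rewards, potentials, gamma):
    """Potential-based shaping: r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t).

    `potentials` has one more entry than `rewards`: phi(s_0), ..., phi(s_T).
    """
    if len(potentials) != len(rewards) + 1:
        raise ValueError("need one potential per visited state")
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(rewards)]
```

With gamma = 1 the shaping terms telescope: the shaped return equals the original return plus Φ(s_T) − Φ(s_0). This telescoping is why potential-based shaping leaves the optimal policy unchanged while still redistributing credit across time steps.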
# 初期化時の等長埋め込み獲得における活性化と正規化の影響について On the impact of activation and normalization in obtaining isometric embeddings at initialization ( http://arxiv.org/abs/2305.18399v1 ) ライセンス: Link先を確認	Amir Joudaki, Hadi Daneshmand, Francis Bach	(参考訳) 本稿では,入力のバッチに対応する出力のペアワイズ内積を含む,ディープニューラルネットワークにおける最終層の一つ前(penultimate)のグラム行列の構造について検討する。いくつかのアーキテクチャでは、このグラム行列は初期化時に深さとともに縮退し、トレーニングが劇的に遅くなることが観察されている。バッチやレイヤの正規化といった正規化層は、ランクの崩壊を防止する上で重要な役割を果たす。有望な進歩にもかかわらず、既存の理論結果は (i) トランスフォーマーで広く使用される層正規化には拡張されず、 (ii) 正規化のバイアスを有限深さで定量的に特徴づけることができない。このギャップを埋めるために, 層正規化が活性化層と組み合わさることで, 初期化時に多層パーセプトロンのグラム行列を深さに対して指数関数的な速度で等長性へと偏らせることを証明した。活性化関数のエルミート展開を用いてこの速度を定量化し、等長性へのバイアスにおける高次($\ge 2$)エルミート係数の重要性を強調する。 In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs. In several architectures it has been observed that this Gram matrix becomes degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, play a pivotal role in preventing the rank collapse issue. Despite promising advances, the existing theoretical results (i) do not extend to layer normalization, which is widely used in transformers, and (ii) cannot characterize the bias of normalization quantitatively at finite depth. To bridge this gap, we provide a proof that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards isometry at an exponential rate with depth at initialization. We quantify this rate using the Hermite expansion of the activation function, highlighting the importance of higher order ($\ge 2$) Hermite coefficients in the bias towards isometry.	翻訳日:2023-05-31 21:44:34 公開日:2023-05-28
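The Hermite coefficients mentioned above can be computed numerically. For Z ~ N(0, 1) and the probabilists' Hermite polynomials He_k, the normalized coefficients a_k = E[f(Z) He_k(Z)] / sqrt(k!) satisfy E[f(Z)^2] = Σ_k a_k². The sketch evaluates them with Simpson's rule on a truncated interval. The truncation limit and grid size are our choices, and the paper's exact normalization convention may differ.

```python
import math

def hermite_he(k, x):
    """Probabilists' Hermite polynomial: He_0 = 1, He_1 = x, He_{n+1} = x He_n - n He_{n-1}."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, x
    for n in range(1, k):
        h_prev, h = h, x * h - n * h_prev
    return h

def hermite_coeffs(f, max_k, lim=10.0, steps=4000):
    """Normalized coefficients a_k = E[f(Z) He_k(Z)] / sqrt(k!) for Z ~ N(0, 1),
    computed with Simpson's rule on [-lim, lim] (steps must be even)."""
    h = 2 * lim / steps
    coeffs = []
    for k in range(max_k + 1):
        total = 0.0
        for i in range(steps + 1):
            x = -lim + i * h
            w = 1 if i in (0, steps) else (4 if i % 2 else 2)
            total += w * f(x) * hermite_he(k, x) * math.exp(-x * x / 2)
        expectation = total * h / 3 / math.sqrt(2 * math.pi)
        coeffs.append(expectation / math.sqrt(math.factorial(k)))
    return coeffs
```

For a linear activation all of the energy sits in a_1. An odd activation such as tanh has vanishing even coefficients but nonzero coefficients of order 3 and above. Per the abstract, it is these coefficients of order 2 and higher that drive the bias towards isometry.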
# 画像生成における不適切さの軽減:世界の醜さを反映することに価値はあるか? Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness? ( http://arxiv.org/abs/2305.18398v1 ) ライセンス: Link先を確認	Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting	(参考訳) テキスト条件付き画像生成モデルは近年,画像品質とテキストアライメントにおいて驚くべき結果を達成し,急速に増加するアプリケーションに採用されている。これらは高度にデータ駆動であり、ウェブからランダムにスクレイピングされた数十億規模のデータセットに依存しているため、不適切な人間の行動も再現してしまう。具体的には,様々なテキストから画像への生成モデルにおいて不適切な退化が大規模に生じることを実証し,それらをデプロイ時に監視・調整する必要性を示す。そのために,不適切なコンテンツの生成を抑制する推論時の緩和戦略を評価する。以上の結果から,世界の醜さに関するモデルの表現を,モデルを人間の好みに整合させるために活用できることが示された。 Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the web, they also reproduce inappropriate human behavior. Specifically, we demonstrate inappropriate degeneration on a large-scale for various generative text-to-image models, thus motivating the need for monitoring and moderating them at deployment. To this end, we evaluate mitigation strategies at inference to suppress the generation of inappropriate content. Our findings show that we can use models' representations of the world's ugliness to align them with human preferences.	翻訳日:2023-05-31 21:44:16 公開日:2023-05-28
# ソーシャルメディアデータを用いた2023年トルコ大統領選挙結果の予測 Prediction of the 2023 Turkish Presidential Election Results Using Social Media Data ( http://arxiv.org/abs/2305.18397v1 ) ライセンス: Link先を確認	Aysun Bozanta, Fuad Bayrak, Ayse Basar	(参考訳) ソーシャルメディアプラットフォームは政治キャンペーンの運営方法に影響を与えるため、政治家が市民と直接対話するための重要なツールとなっている。各国の選挙は、ソーシャルメディアのデータが選挙結果に大きな影響を及ぼす可能性があることを示している。本研究では,2023年トルコ総選挙における政党の投票シェアを,様々なプラットフォームからのソーシャルメディアデータと従来の投票データを組み合わせて予測することを目的とする。私たちのアプローチは、コンテンツよりもソーシャルメディアの対話の数を考えるボリュームベースのアプローチです。様々な時間窓の予測モデルを比較した。その結果、全ての時間ウィンドウにおいて、ARIMAXモデルは他のアルゴリズムよりも優れていることがわかった。 Social media platforms influence the way political campaigns are run and therefore they have become an increasingly important tool for politicians to directly interact with citizens. Previous elections in various countries have shown that social media data may significantly impact election results. In this study, we aim to predict the vote shares of parties participating in the 2023 elections in Turkey by combining social media data from various platforms together with traditional polling data. Our approach is a volume-based approach that considers the number of social media interactions rather than content. We compare several prediction models across varying time windows. Our results show that for all time windows, the ARIMAX model outperforms the other algorithms.	翻訳日:2023-05-31 21:44:03 公開日:2023-05-28
# LLMは暗号化プロンプトを理解できる:プライバシー計算に適したトランスフォーマーを目指して LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers ( http://arxiv.org/abs/2305.18396v1 ) ライセンス: Link先を確認	Xuanqi Liu and Zhuotao Liu	(参考訳) 以前の研究では、サーバ・クライアント設定でトランスフォーマーベースの大規模言語モデル(LLM)用のプライベート推論フレームワークを構築しようとしており、そこではサーバがモデルパラメータを保持し、クライアントが推論のためにプライベートデータを入力する。しかし、これらのフレームワークは、プライベートな入力が元のLLMを通じて順伝播されるときに大きなオーバーヘッドを課す。本稿では,トランスフォーマーアーキテクチャにおける計算・通信コストの高い演算子をプライバシー計算に適した近似で置き換えることにより,モデル性能への影響を小さく抑えつつ,プライベート推論コストを大幅に削減できることを示す。最先端のIron (NeurIPS 2022)と比較して、我々のプライバシー計算に適したモデル推論パイプラインは、ほぼ同じ精度を維持しながら、計算を5倍高速化し、通信オーバーヘッドを80%削減する。 Prior works have attempted to build private inference frameworks for transformer-based large language models (LLMs) in a server-client setting, where the server holds the model parameters and the client inputs the private data for inference. However, these frameworks impose significant overhead when the private inputs are forward propagated through the original LLMs. In this paper, we show that substituting the computation- and communication-heavy operators in the transformer architecture with privacy-computing friendly approximations can greatly reduce the private inference costs with minor impact on model performance. Compared to the state-of-the-art Iron (NeurIPS 2022), our privacy-computing friendly model inference pipeline achieves a 5× acceleration in computation and an 80% reduction in communication overhead, while retaining nearly identical accuracy.	翻訳日:2023-05-31 21:43:53 公開日:2023-05-28
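To see what substituting an operator with a privacy-computing friendly approximation looks like, take softmax. Its exponential is costly under secure computation. A quadratic surrogate of the kind used in prior secure-inference work replaces exp(x) with (x + c)², which needs only additions, multiplications and one division. Whether the paper uses this particular substitution, and the value c = 5.0, are assumptions of this sketch.

```python
import math

def softmax(xs):
    """Reference softmax (numerically stabilized)."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def quad_softmax(xs, c=5.0):
    """Softmax surrogate with exp(x) replaced by (x + c)^2.

    Uses only additions, multiplications and one division, which are cheaper
    than exp under secure computation. It preserves ranking only for inputs
    above -c, and is undefined if every input equals -c.
    """
    q = [(x + c) ** 2 for x in xs]
    s = sum(q)
    return [v / s for v in q]
```

The surrogate still produces a normalized distribution. The accuracy cost of such substitutions is what the abstract reports as a "minor impact on model performance".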
# 知識集約型タスクにおける小言語モデルの知識強化推論蒸留 Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks ( http://arxiv.org/abs/2305.18395v1 ) ライセンス: Link先を確認	Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang	(参考訳) 大規模言語モデル(LLM)は、知識の複雑な理解を必要とする知識集約的推論タスクにおいて、有望な性能を示す。しかし、LLMの実際のアプリケーションへの展開は、高い計算要求とデータプライバシに関する懸念のために困難である可能性がある。従来の研究は、ラベル付きデータで微調整したり、LLMを蒸留することで、タスク固有の小言語モデル(LM)の構築に重点を置いてきた。しかしながら、これらのアプローチは、必要となる知識を記憶する小さなLMの能力に制限があるため、知識集約的推論タスクには不向きである。記憶に関する理論的解析に動機付けられ,外部知識ベースから検索した知識を付加して根拠(rationale)を生成するよう小さなLMを微調整する新しい手法であるKARD(Knowledge-Augmented Reasoning Distillation)を提案する。さらに,根拠生成に関連する文書を得るためのニューラルリランカも提案する。我々は、KARDが知識集約的推論データセットであるMedQA-USMLEとStrategyQAにおいて、小さなT5モデルとFlan-T5モデルの性能を著しく向上させることを示す。特に,本手法により2億5000万パラメータのモデルが,MedQA-USMLEとStrategyQAの両ベンチマークにおいて,12倍のパラメータを持つファインチューニング済みの3Bモデルを上回る性能を達成する。 Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small language models (LMs) by fine-tuning them with labeled data or distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks due to the limited capacity of small LMs in memorizing the knowledge required. Motivated by our theoretical analysis on memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales with augmented knowledge retrieved from an external knowledge base. Moreover, we further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and Flan-T5 models on the challenging knowledge-intensive reasoning datasets, namely MedQA-USMLE and StrategyQA. Notably, our method enables the 250M models to outperform the fine-tuned 3B models, which have 12 times as many parameters, on both the MedQA-USMLE and StrategyQA benchmarks.	翻訳日:2023-05-31 21:43:34 公開日:2023-05-28
# バイレベル学習による最適正規化パラメータについて On Optimal Regularization Parameters via Bilevel Learning ( http://arxiv.org/abs/2305.18394v1 ) ライセンス: Link先を確認	Matthias J. Ehrhardt, Silvia Gazzola and Sebastian J. Scott (Department of Mathematical Sciences, University of Bath, Bath, UK)	(参考訳) 変分正規化は線形逆問題を解くためによく使われ、データ忠実度項に正規化項を加える。正規化項は事前情報を促進するために使用され、正規化パラメータによって重み付けされる。適切な正規化パラメータの選択は重要であり、選択によって全く異なる再構成が得られる。不一致原理やL曲線といった既存の戦略を用いて適切なパラメータ値を決定することができるが、近年はバイレベル学習と呼ばれる教師付き機械学習アプローチが採用されている。バイレベル学習は最適パラメータを決定する強力なフレームワークであり、入れ子になった最適化問題を解くことを含む。従来の戦略には様々な理論的結果があるが、この設定におけるバイレベル学習の適切性(well-posedness)はいまだ発展途上の分野である。必要な性質の1つは、決定された正則化パラメータの正値性である。本研究では,既存の理論よりも最適正則化パラメータの正値性をよりよく特徴付ける新しい条件を提案する。数値計算により、小次元・大次元の両方の問題でこの新条件を検証・検討する。 Variational regularization is commonly used to solve linear inverse problems, and involves augmenting a data fidelity by a regularizer. The regularizer is used to promote a priori information, and is weighted by a regularization parameter. Selection of an appropriate regularization parameter is critical, with various choices leading to very different reconstructions. Existing strategies such as the discrepancy principle and L-curve can be used to determine a suitable parameter value, but in recent years a supervised machine learning approach called bilevel learning has been employed. Bilevel learning is a powerful framework to determine optimal parameters, and involves solving a nested optimisation problem. While previous strategies enjoy various theoretical results, the well-posedness of bilevel learning in this setting is still a developing field. One necessary property is positivity of the determined regularization parameter. In this work, we provide a new condition that better characterises positivity of optimal regularization parameters than the existing theory. Numerical results verify and explore this new condition for both small and large dimensional problems.	翻訳日:2023-05-31 21:43:12 公開日:2023-05-28
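A toy instance shows why the positivity of the learned regularization parameter is not automatic. Take Tikhonov denoising as the lower level, x(λ) = argmin_x ||x − y||² + λ||x||² = y / (1 + λ), and the squared error to the ground truth as the upper level. Minimizing over s = 1/(1 + λ) gives s* = ⟨y, x_true⟩ / ⟨y, y⟩, hence λ* = ⟨y, y⟩ / ⟨y, x_true⟩ − 1. This toy problem and its closed form are our own illustration. They are not the condition derived in the paper.

```python
def optimal_lambda(x_true, y):
    """Upper-level optimum for Tikhonov denoising with lower level
    x(lam) = argmin_x ||x - y||^2 + lam * ||x||^2 = y / (1 + lam).

    Minimizing ||x(lam) - x_true||^2 over s = 1 / (1 + lam) gives
    s* = <y, x_true> / <y, y>, so lam* = <y, y> / <y, x_true> - 1.
    lam* > 0 exactly when <y, y> > <y, x_true> > 0.
    """
    yy = sum(v * v for v in y)
    yx = sum(a * b for a, b in zip(y, x_true))
    if yx <= 0:
        raise ValueError("no finite optimum with lam > -1 when <y, x_true> <= 0")
    return yy / yx - 1.0

def upper_loss(lam, x_true, y):
    """Upper-level loss ||x(lam) - x_true||^2."""
    return sum((v / (1.0 + lam) - t) ** 2 for v, t in zip(y, x_true))
```

Typical noisy data gives λ* > 0, but data that is "shrunk" relative to the ground truth (for example y = 0.5 · x_true) gives λ* < 0. Such a negative optimum is exactly the kind of failure a positivity condition has to rule out.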
# 知らないことを知っているプライベートなモデルを訓練する Training Private Models That Know What They Don't Know ( http://arxiv.org/abs/2305.18393v1 ) ライセンス: Link先を確認	Stephan Rabanser, Anvith Thudi, Abhradeep Thakurta, Krishnamurthy Dvijotham, Nicolas Papernot	(参考訳) 自信過剰だが誤った予測を避ける、信頼できるディープラーニングモデルの訓練は長年の課題である。学習が差分プライバシーを満たさなければならない場合、この課題はさらに深刻になる。センシティブなデータに提供される保護は、学習プロセスに追加のランダム性を注入するという代償を伴うからである。本研究では、差分プライバシー制約の下で、選択的分類器(確信が持てない場合に予測を棄権できる)を徹底的に実証調査する。いくつかの一般的な選択的予測アプローチは、プライバシー漏洩のリスクを増大させるため、差分プライベートな設定では効果がないことがわかった。同時に、既製のプライベート学習アルゴリズムが生成するチェックポイントのみを使用する最近のアプローチが、DP下で特に適していることを示す。さらに、差分プライバシーは有用性を損なうだけでなく、選択的分類の性能も低下させることを示す。プライバシーレベルをまたいでこの効果を分析するために、モデルの有用性レベルをまたいで選択的予測性能を分離する新しい評価機構を提案する。実験の結果、プライバシー予算が小さくなるにつれて、非プライベートモデルで達成可能な性能レベルを回復することは可能であるが、かなりのカバレッジの代償を伴うことがわかった。 Training reliable deep learning models which avoid making overconfident but incorrect predictions is a longstanding challenge. This challenge is further exacerbated when learning has to be differentially private: protection provided to sensitive data comes at the price of injecting additional randomness into the learning process. In this work, we conduct a thorough empirical investigation of selective classifiers, which can abstain when they are unsure, under a differential privacy constraint. We find that several popular selective prediction approaches are ineffective in a differentially private setting as they increase the risk of privacy leakage. At the same time, we identify that a recent approach that only uses checkpoints produced by an off-the-shelf private learning algorithm stands out as particularly suitable under DP. Further, we show that differential privacy does not just harm utility but also degrades selective classification performance. To analyze this effect across privacy levels, we propose a novel evaluation mechanism which isolates selective prediction performance across model utility levels. 
Our experimental results show that recovering the performance level attainable by non-private models is possible but comes at a considerable coverage cost as the privacy budget decreases.	翻訳日:2023-05-31 21:42:57 公開日:2023-05-28
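The coverage/accuracy trade-off mentioned in the abstract can be illustrated with the simplest selective classifier, the softmax-response baseline: abstain whenever the top-class probability falls below a threshold. This is a generic sketch, not the checkpoint-based method or the evaluation mechanism proposed in the paper, and it ignores differential privacy entirely; the function names, threshold values and model outputs are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def selective_predict(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Softmax-response rule: predict the argmax class unless its probability is below threshold."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None  # None means "abstain"


def coverage_and_selective_accuracy(all_probs, labels, threshold) -> Tuple[float, float]:
    """Coverage = fraction of inputs accepted; selective accuracy = accuracy on accepted inputs."""
    preds = [selective_predict(p, threshold) for p in all_probs]
    accepted = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(accepted) / len(labels)
    if not accepted:
        return coverage, float("nan")  # nothing accepted: selective accuracy undefined
    correct = sum(1 for p, y in accepted if p == y)
    return coverage, correct / len(accepted)


# Hypothetical model outputs for four inputs with two classes.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
labels = [0, 1, 1, 0]
print(coverage_and_selective_accuracy(probs, labels, 0.0))  # → (1.0, 0.75)
print(coverage_and_selective_accuracy(probs, labels, 0.7))  # → (0.5, 1.0)
```

Raising the threshold trades coverage for accuracy on the accepted inputs; the abstract's finding is that, under a tight privacy budget, recovering non-private accuracy requires giving up considerably more coverage.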
# 不確実性定量化を伴う発音の良さ(GoP)を用いた構音障害音声の明瞭度評価 Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification ( http://arxiv.org/abs/2305.18392v1 ) ライセンス: Link先を確認	Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung	(参考訳) 本稿では、構音障害音声の自動明瞭度評価のために、不確実性定量化(UQ)を利用した改良版GoP(Goodness of Pronunciation)を提案する。現在のGoP手法はニューラルネットワークによる自信過剰な予測に大きく依存しているが、構音障害音声は健常者の音声と音響的に大きく異なるため、これは構音障害音声の評価には適さない。この問題を軽減するため、1)音素予測の正規化(エントロピー、マージン、maxlogit、logit-margin)と、2)スコア関数の変更(スケーリング、事前正規化)によって、GoPにUQ手法を適用した。その結果、事前正規化したmaxlogit GoPが最良の性能を達成し、英語、韓国語、タミル語においてベースラインGoPと比較してそれぞれ5.66%、3.91%、23.65%の相対的な向上を示した。さらに、音素分析を行い、各言語においてどの音素のスコアが明瞭度スコアと有意に相関するかを特定する。 This paper proposes an improved Goodness of Pronunciation (GoP) that utilizes Uncertainty Quantification (UQ) for automatic speech intelligibility assessment for dysarthric speech. Current GoP methods rely heavily on neural network-driven overconfident predictions, which is unsuitable for assessing dysarthric speech due to its significant acoustic differences from healthy speech. To alleviate the problem, UQ techniques were used on GoP by 1) normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin) and 2) modifying the scoring function (scaling, prior normalization). As a result, prior-normalized maxlogit GoP achieves the best performance, with a relative increase of 5.66%, 3.91%, and 23.65% compared to the baseline GoP for English, Korean, and Tamil, respectively. Furthermore, phoneme analysis is conducted to identify which phoneme scores significantly correlate with intelligibility scores in each language.	翻訳日:2023-05-31 21:42:39 公開日:2023-05-28
# MemeGraphs: ミームを知識グラフにリンクする MemeGraphs: Linking Memes to Knowledge Graphs ( http://arxiv.org/abs/2305.18391v1 ) ライセンス: Link先を確認	Vasiliki Kougia, Simon Fetzel, Thomas Kirchmair, Erion Çano, Sina Moayed Baharlou, Sahand Sharifzadeh, Benjamin Roth	(参考訳) ミームは、画像とテキストのモダリティを組み合わせ、ソーシャルメディアやインターネット全般でトレンドやアイデアを伝える一般的な形態である。ユーモアや皮肉を表現できる一方で、攻撃的な内容を含むこともある。ミームの解釈は視覚要素、言語、背景知識の理解に依存するため、ミームの自動分析と分類は難しい。したがって、ミーム全体を分類するには、これらの情報源とそれらの相互作用を意味のある形で表現することが重要である。本研究では、画像をオブジェクトとその視覚的関係によって表現するシーングラフと知識グラフを、Transformerベースのアーキテクチャによるミーム分類のための構造化表現として用いることを提案する。提案手法を、ミームの(構造化表現ではなく)学習された表現のみを用いるマルチモーダルモデルImgBERTと比較し、一貫した改善を確認した。さらに、人手によるグラフアノテーションを付与したデータセットを提供し、自動生成されたグラフおよびエンティティリンキングと比較する。分析の結果、自動手法は人間のアノテーターよりも多くのエンティティをリンクし、自動生成されたグラフはミームのヘイトフルネス分類により適していることが示された。 Memes are a popular form of communicating trends and ideas in social media and on the internet in general, combining the modalities of images and text. They can express humor and sarcasm but can also have offensive content. Analyzing and classifying memes automatically is challenging since their interpretation relies on the understanding of visual elements, language, and background knowledge. Thus, it is important to meaningfully represent these sources and the interaction between them in order to classify a meme as a whole. In this work, we propose to use scene graphs, which express images in terms of objects and their visual relations, and knowledge graphs as structured representations for meme classification with a Transformer-based architecture. We compare our approach with ImgBERT, a multimodal model that uses only learned (instead of structured) representations of the meme, and observe consistent improvements. We further provide a dataset with human graph annotations that we compare to automatically generated graphs and entity linking. Analysis shows that automatic methods link more entities than human annotators and that automatically generated graphs are better suited for hatefulness classification in memes.	
翻訳日:2023-05-31 21:42:22 公開日:2023-05-28
# 事前学習済みTransformerにおける創発的モジュラリティ Emergent Modularity in Pre-trained Transformers ( http://arxiv.org/abs/2305.18390v1 ) ライセンス: Link先を確認	Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou	(参考訳) 本研究は、人間の脳によく見られ、汎用知能に不可欠と考えられている特徴であるモジュラリティが、事前学習済みTransformerに存在するかどうかを調べる。人間の脳との類推から、モジュラリティの2つの主要な特徴を考える。(1)ニューロンの機能的特化:各ニューロンが主に特定の機能に特化しているかどうかを評価し、その答えがイエスであることを確認する。(2)機能に基づくニューロンのグループ化:ニューロンを機能ごとにモジュールへまとめ、各モジュールが対応する機能を担うような構造を探索する。考えられる構造は膨大であるため、有望な候補としてMixture-of-Expertsに注目する。これはニューロンをエキスパートに分割し、通常は入力ごとに異なるエキスパートを活性化する。実験の結果、特定の機能に特化したニューロンが集まった機能エキスパートが存在することがわかった。さらに、機能エキスパートの活性化に摂動を加えると、対応する機能に大きな影響が出る。最後に、事前学習中にモジュラリティがどのように出現するかを調べ、モジュール構造は早い段階で安定化し、それはニューロンの安定化よりも速いことを見出した。これは、Transformerがまずモジュール構造を構築し、その後に細粒度のニューロン機能を学習することを示唆する。コードとデータは https://github.com/THUNLP/modularity-analysis で公開されている。 This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes. (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, and each module works for its corresponding function. Given the enormous amount of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, in which neurons specialized in a certain function are clustered. Moreover, perturbing the activations of functional experts significantly affects the corresponding function. 
Finally, we study how modularity emerges during pre-training, and find that the modular structure is stabilized at the early stage, which is faster than neuron stabilization. It suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis.	翻訳日:2023-05-31 21:42:03 公開日:2023-05-28
# 特徴学習ネットワークは現実的なスケールで幅によらず一貫している Feature-Learning Networks Are Consistent Across Widths At Realistic Scales ( http://arxiv.org/abs/2305.18411v1 ) ライセンス: Link先を確認	Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan	(参考訳) 様々なアーキテクチャとデータセットにわたって、特徴学習ニューラルネットワークのダイナミクスに幅が与える影響を調べる。訓練の初期段階では、オンラインデータで訓練された幅の広いニューラルネットワークは、同一の損失曲線を示すだけでなく、訓練全体を通じて各点のテスト予測も一致する。CIFAR-5mのような単純なタスクでは、現実的な幅のネットワークにおいて訓練全体を通じてこれが成り立つ。また、内部表現、前活性化の分布、安定性の縁(edge of stability)現象、大きな学習率の効果といったモデルの構造的性質が、十分に広い幅の間で一致することも示す。これは、現実的なモデルで見られる現象が無限幅の特徴学習極限によって捉えられるという仮説を動機付ける。より難しいタスク(ImageNetや言語モデリングなど)や訓練の後半では、有限幅による偏差が系統的に大きくなる。これらの幅間の偏差は2つの異なる効果によって生じる。第一に、ネットワーク出力には幅に反比例してスケールする初期化依存の分散があり、これはネットワークをアンサンブルすることで除去できる。しかし、より狭いネットワークのアンサンブルは、単一の幅の広いネットワークよりも性能が劣ることが観察される。これを狭い幅のバイアスと呼ぶ。最後に、この有限幅バイアスの起源についてスペクトルの観点から考察する。 We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data have not only identical loss curves but also agree in their point-wise test predictions throughout training. For simple tasks such as CIFAR-5m this holds throughout training for networks of realistic widths. We also show that structural properties of the models, including internal representations, preactivation distributions, edge of stability phenomena, and large learning rate effects are consistent across large widths. This motivates the hypothesis that phenomena seen in realistic models can be captured by infinite-width, feature-learning limits. For harder tasks (such as ImageNet and language modeling), and later training times, finite-width deviations grow systematically. Two distinct effects cause these deviations across widths. First, the network output has initialization-dependent variance scaling inversely with width, which can be removed by ensembling networks. 
We observe, however, that ensembles of narrower networks perform worse than a single wide network. We call this the bias of narrower width. We conclude with a spectral perspective on the origin of this finite-width bias.	翻訳日:2023-05-31 21:35:44 公開日:2023-05-28
# 乳がん生存の理解:マルチオミクスデータへの因果性と言語モデルの適用 Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data ( http://arxiv.org/abs/2305.18410v1 ) ライセンス: Link先を確認	Mugariya Farooq, Shahad Hardan, Aigerim Zhumbhayeva, Yujia Zheng, Preslav Nakov, Kun Zhang	(参考訳) 医療分野でより使いやすく説明可能な機械学習モデルが求められるにつれ、観察データの解析によって因果関係を発見することを目的とした因果発見アルゴリズムの開発と活用の重要性が高まっている。説明可能なアプローチは、臨床医や生物学者が疾患の予後を予測し、適切な治療を提案するのを助ける。しかし、因果発見、ゲノミクス、乳がんの交差領域での研究はほとんど行われておらず、我々はこのギャップを埋めることを目指す。また、真の因果関係は通常未知であるため、実データ上での因果発見手法の評価は一般に非常に難しい。そこで本稿では、大規模言語モデルを用いてこの評価問題に取り組むことも提案する。具体的には、適切な因果発見アルゴリズムを用いて、ゲノムの様々な摂動が乳がんと診断された患者の生存にどのように影響しうるかを調べる。主な因果発見アルゴリズムとして、PC、Greedy Equivalence Search (GES)、および一般化精度行列に基づく手法の3つを用いた。705人の乳がん患者の変異、コピー数変化、タンパク質レベル、遺伝子発現に関する情報を含むThe Cancer Genome Atlasのサブセットを用いて実験した。因果発見アルゴリズムにより、患者の生存状態(vital status)に関連する重要な因子が明らかになった。しかし、医療分野ではこれらの結果の信頼性が依然として懸念事項である。そこで本研究のもう一つの貢献として、BlueBERTなどの生物医学文献で訓練された言語モデルや、医療コーパスで訓練された他の大規模言語モデルを通じて結果を検証する。我々の結果は、臨床応用のための信頼できる因果関係を明らかにする上で、因果発見アルゴリズムと言語モデルを適切に活用することの有望性を示している。 The need for more usable and explainable machine learning models in healthcare increases the importance of developing and utilizing causal discovery algorithms, which aim to discover causal relations by analyzing observational data. Explainable approaches aid clinicians and biologists in predicting the prognosis of diseases and suggesting proper treatments. However, very little research has been conducted at the crossroads between causal discovery, genomics, and breast cancer, and we aim to bridge this gap. Moreover, evaluation of causal discovery methods on real data is in general notoriously difficult because ground-truth causal relations are usually unknown, and accordingly, in this paper, we also propose to address the evaluation problem with large language models. In particular, we exploit suitable causal discovery algorithms to investigate how various perturbations in the genome can affect the survival of patients diagnosed with breast cancer. 
We use three main causal discovery algorithms: PC, Greedy Equivalence Search (GES), and a Generalized Precision Matrix-based one. We experiment with a subset of The Cancer Genome Atlas, which contains information about mutations, copy number variations, protein levels, and gene expressions for 705 breast cancer patients. Our findings reveal important factors related to the vital status of patients using causal discovery algorithms. However, the reliability of these results remains a concern in the medical domain. Accordingly, as another contribution of the work, the results are validated through language models trained on biomedical literature, such as BlueBERT, and other large language models trained on medical corpora. Our results support the proper use of causal discovery algorithms and language models for revealing reliable causal relations in clinical applications.	翻訳日:2023-05-31 21:35:09 公開日:2023-05-28
# 方向指向型多目的学習:単純で証明可能な確率的アルゴリズム Direction-oriented Multi-objective Learning: Simple and Provable Stochastic Algorithms ( http://arxiv.org/abs/2305.18409v1 ) ライセンス: Link先を確認	Peiyao Xiao, Hao Ban, Kaiyi Ji	(参考訳) 多目的最適化(MOO)は、複数の基準による学習やマルチタスク学習(MTL)など、複数の目的を持つ多くの機械学習問題において重要なフレームワークとなっている。本稿では、MTLにおける平均損失のような目的の線形結合を最適化する方向の近傍に共通降下方向を正則化することで、新たな方向指向型多目的問題を提案する。この定式化はGDとMGDAを特殊ケースとして含み、CAGradと同様の方向指向の利点を持ち、確率的アルゴリズムの設計を容易にする。この問題を解くため、SGD型の単純な更新を用いる確率的方向指向多目的勾配降下法(SDMGrad)と、目的の数が多い設定で効率的な目的サンプリングを行うその変種SDMGrad-OSを提案する。定数レベルの正則化パラメータ $\lambda$ に対して、SDMGradとSDMGrad-OSが、改善された計算量とより緩い仮定のもとでパレート定常点に証明可能な形で収束することを示す。$\lambda$ を増加させる場合、この収束点は目的の線形結合の定常点に帰着する。マルチタスク教師あり学習と強化学習の一連のタスクで、提案手法の優れた性能を示す。コードは https://github.com/ml-opt-lab/sdmgrad で提供されている。 Multi-objective optimization (MOO) has become an influential framework in many machine learning problems with multiple objectives such as learning with multiple criteria and multi-task learning (MTL). In this paper, we propose a new direction-oriented multi-objective problem by regularizing the common descent direction within a neighborhood of a direction that optimizes a linear combination of objectives such as the average loss in MTL. This formulation includes GD and MGDA as special cases, enjoys the direction-oriented benefit as in CAGrad, and facilitates the design of stochastic algorithms. To solve this problem, we propose Stochastic Direction-oriented Multi-objective Gradient descent (SDMGrad) with simple SGD type of updates, and its variant SDMGrad-OS with an efficient objective sampling in the setting where the number of objectives is large. For a constant-level regularization parameter $\lambda$, we show that SDMGrad and SDMGrad-OS provably converge to a Pareto stationary point with improved complexities and milder assumptions. For an increasing $\lambda$, this convergent point reduces to a stationary point of the linear combination of objectives. 
We demonstrate the superior performance of the proposed methods in a series of tasks on multi-task supervised learning and reinforcement learning. Code is provided at https://github.com/ml-opt-lab/sdmgrad.	翻訳日:2023-05-31 21:34:20 公開日:2023-05-28
# 分子マルチモーダル事前学習のための群対称確率微分方程式モデル A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining ( http://arxiv.org/abs/2305.18407v1 ) ライセンス: Link先を確認	Shengchao Liu, Weitao Du, Zhiming Ma, Hongyu Guo, Jian Tang	(参考訳) 分子の事前学習は、AIベースの創薬の性能を高めるための定番の手法に急速になりつつある。分子は自然に、2次元トポロジカルグラフまたは3次元幾何学的点群として表現できる。既存の事前学習手法のほとんどは単一のモダリティのみに焦点を当てているが、最近の研究により、これら2つのモダリティ間の相互情報量(MI)を最大化すると分子の表現能力が高まることが示されている。一方、既存の分子マルチモーダル事前学習手法は、トポロジーと幾何構造から符号化された表現空間に基づいてMIを近似するため、分子の重要な構造情報が失われる。この問題に対処するため、MoleculeSDEを提案する。MoleculeSDEは群対称(例:SE(3)等変かつ反射反対称)な確率微分方程式モデルを利用し、入力空間上で直接、2Dトポロジーから3D幾何構造を生成し、またその逆も行う。これにより、より厳密なMIの下界が得られるだけでなく、従来研究よりも多様な下流タスクが可能になる。17の事前学習ベースラインと比較し、32の下流タスクのうち26で、MoleculeSDEが最先端の性能を持つ表現力豊かな表現を学習できることを実証的に確認する。 Molecule pretraining has quickly become the go-to scheme to boost the performance of AI-based drug discovery. Naturally, molecules can be represented as 2D topological graphs or 3D geometric point clouds. Although most existing pretraining methods focus on only a single modality, recent research has shown that maximizing the mutual information (MI) between these two modalities enhances the molecule representation ability. Meanwhile, existing molecule multi-modal pretraining approaches approximate MI based on the representation space encoded from the topology and geometry, thus resulting in the loss of critical structural information of molecules. To address this issue, we propose MoleculeSDE. MoleculeSDE leverages group symmetric (e.g., SE(3)-equivariant and reflection-antisymmetric) stochastic differential equation models to generate the 3D geometries from 2D topologies, and vice versa, directly in the input space. It not only obtains a tighter MI bound but also enables a richer set of downstream tasks than previous work. By comparing with 17 pretraining baselines, we empirically verify that MoleculeSDE can learn an expressive representation with state-of-the-art performance on 26 out of 32 downstream tasks.	
翻訳日:2023-05-31 21:33:48 公開日:2023-05-28
# マイクロチャネルにおける熱伝達係数の予測に対する機械学習アプローチ A machine learning approach to the prediction of heat-transfer coefficients in micro-channels ( http://arxiv.org/abs/2305.18406v1 ) ライセンス: Link先を確認	Tullio Traverso, Francesco Coletti, Luca Magri, Tassos G. Karayiannis, Omar K. Matar	(参考訳) 小型熱交換器の最適設計と運転には、作動流体、流路形状、プロセス条件の関数として二相熱伝達係数(HTC)を正確に予測することが鍵となる。近年、人工知能研究の進歩により、HTCのデータ駆動型サロゲートモデルを得るための機械学習(ML)アルゴリズムの適用が進んでいる。ほとんどの教師あり学習アルゴリズムにとって、このタスクは非線形回帰問題である。これらのモデルは従来の経験的相関式を上回る性能を示すことが実証されているものの、過学習、不確実性推定の欠如、結果の解釈可能性の低さといった重要な限界がある。これらの限界に対処するため、本稿では多出力ガウス過程回帰(GPR)を用いて、マイクロチャネル内のHTCを質量流量、熱流束、系圧力、流路の直径と長さの関数として推定する。モデルは高忠実度の実験データからなるBrunel Two-Phase Flowデータベースを用いて訓練される。GPRの利点は、データ効率の高さ、訓練すべきハイパーパラメータの少なさ(典型的には入力次元数と同程度)、そして周辺尤度の最大化(ベイズ的アプローチ)によって保証される、データへの適合とモデルの複雑さの間の自動的なトレードオフである。本稿では、外挿におけるGPRベースのモデルの性能を向上させるための研究の方向性を提案する。 The accurate prediction of the two-phase heat transfer coefficient (HTC) as a function of working fluids, channel geometries and process conditions is key to the optimal design and operation of compact heat exchangers. Advances in artificial intelligence research have recently boosted the application of machine learning (ML) algorithms to obtain data-driven surrogate models for the HTC. For most supervised learning algorithms, the task is that of a nonlinear regression problem. Despite the fact that these models have been proven capable of outperforming traditional empirical correlations, they have key limitations such as overfitting the data, the lack of uncertainty estimation, and limited interpretability of the results. To address these limitations, in this paper, we use a multi-output Gaussian process regression (GPR) to estimate the HTC in microchannels as a function of the mass flow rate, heat flux, system pressure and channel diameter and length. The model is trained using the Brunel Two-Phase Flow database of high-fidelity experimental data. 
The advantages of GPR are data efficiency, the small number of hyperparameters to be trained (typically of the same order as the number of input dimensions), and the automatic trade-off between data fit and model complexity guaranteed by the maximization of the marginal likelihood (Bayesian approach). Our paper proposes research directions to improve the performance of the GPR-based model in extrapolation.	翻訳日:2023-05-31 21:33:27 公開日:2023-05-28
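To show what "a few hyperparameters" and "maximization of the marginal likelihood" refer to, here is a minimal, hedged sketch of exact Gaussian process regression in pure Python. It is deliberately much simpler than the paper's model: single output, scalar input, an RBF kernel with fixed (not fitted) hyperparameters, and hypothetical toy data instead of the Brunel database. It returns the posterior mean and the log marginal likelihood, the quantity a full implementation would maximize over `length_scale` and `noise`.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x1: float, x2: float, length_scale: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def cholesky_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):  # forward: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs: Sequence[float], ys: Sequence[float], x_star: float,
               noise: float = 1e-6, length_scale: float = 1.0) -> Tuple[float, float]:
    """Posterior mean at x_star and log marginal likelihood of the training data."""
    n = len(xs)
    K = [[rbf(xi, xj, length_scale) + (noise if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    L = cholesky(K)
    alpha = cholesky_solve(L, ys)  # alpha = K^{-1} y, where K already includes the noise
    mean = sum(rbf(x_star, xi, length_scale) * ai for xi, ai in zip(xs, alpha))
    # log p(y) = -1/2 y^T alpha - sum_i log L_ii - n/2 log(2*pi): data fit minus complexity
    lml = (-0.5 * sum(yi * ai for yi, ai in zip(ys, alpha))
           - sum(math.log(L[i][i]) for i in range(n))
           - 0.5 * n * math.log(2 * math.pi))
    return mean, lml


xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]  # hypothetical toy data
mean, lml = gp_predict(xs, ys, 1.0)  # mean is close to the training target 1.0
```

The first term of the log marginal likelihood rewards fitting the data and the log-determinant term penalizes overly flexible kernels, which is the automatic fit/complexity trade-off the abstract refers to. A production model would typically use a library such as scikit-learn or GPyTorch and optimize the hyperparameters.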
# Dink-Net: 大きなグラフ上のニューラルクラスタリング Dink-Net: Neural Clustering on Large Graphs ( http://arxiv.org/abs/2305.18405v1 ) ライセンス: Link先を確認	Yue Liu, Ke Liang, Jun Xia, Sihang Zhou, Xihong Yang, Xinwang Liu, Stan Z. Li	(参考訳) ディープグラフクラスタリングは、ディープニューラルネットワークによってグラフのノードを互いに素なクラスタにグループ化することを目的とし、近年有望な進展を遂げている。しかし、既存手法は数百万ノード規模の大規模グラフにスケールできない。この問題を解決するため、膨張(dilation)と収縮(shrink)のアイデアに基づくスケーラブルなディープグラフクラスタリング手法(Dink-Net)を提案する。まず、ノードが拡張(augmentation)によって破損しているかどうかを識別することで、自己教師ありの方法で表現を学習する。同時に、クラスタ中心を学習可能なニューラルパラメータとして初期化する。次に、提案するクラスタ膨張損失とクラスタ収縮損失を敵対的に最小化することで、クラスタリング分布を最適化する。これらの設定により、2段階のクラスタリング、すなわち表現学習とクラスタリング最適化をエンドツーエンドのフレームワークに統合し、ネットワークがクラスタリングに適した特徴を学習するよう導く。さらに、設計した損失関数はミニバッチデータを用いて性能を低下させることなくクラスタリング分布を最適化できるため、Dink-Netは大規模グラフにもよくスケールする。実験結果と理論解析の双方が本手法の優位性を示している。次点の手法と比較して、Dink-Netは1億1100万ノードと16億エッジを持つogbn-papers100Mデータセットで9.62%のNMI改善を達成した。ソースコードは https://github.com/yueliu1999/Dink-Net で公開されている。また、ディープグラフクラスタリングに関するコレクション(論文、コード、データセット)を https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering で共有している。 Deep graph clustering, which aims to group the nodes of a graph into disjoint clusters with deep neural networks, has achieved promising progress in recent years. However, the existing methods fail to scale to large graphs with millions of nodes. To solve this problem, a scalable deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink. First, representations are learned in a self-supervised manner by discriminating whether nodes have been corrupted by augmentations. Meanwhile, the cluster centres are initialized as learnable neural parameters. Subsequently, the clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss in an adversarial manner. With these settings, we unify the two-step clustering, i.e., representation learning and clustering optimization, into an end-to-end framework, guiding the network to learn clustering-friendly features. 
Besides, Dink-Net scales well to large graphs since the designed loss functions adopt the mini-batch data to optimize the clustering distribution even without performance drops. Both experimental results and theoretical analyses demonstrate the superiority of our method. Compared to the runner-up, Dink-Net achieves 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges. The source code is released at https://github.com/yueliu1999/Dink-Net. Besides, a collection (papers, codes, and datasets) of deep graph clustering is shared at https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering.	翻訳日:2023-05-31 21:33:04 公開日:2023-05-28
# 多肢選択式質問応答のための大規模言語モデルを用いたコンフォーマル予測 Conformal Prediction with Large Language Models for Multi-Choice Question Answering ( http://arxiv.org/abs/2305.18404v1 ) ライセンス: Link先を確認	Bhawesh Kumar, Charlie Lu, Gauri Gupta, Anil Palepu, David Bellamy, Ramesh Raskar, Andrew Beam	(参考訳) 大規模言語モデルの開発が広く進むにつれて、高リスクなシナリオで安全にデプロイするためには、頑健な不確実性定量化手法が不可欠になる。本研究では、多肢選択式質問応答という特定のタスクにおいて、コンフォーマル予測を用いて言語モデルの不確実性を定量化する方法を探る。コンフォーマル予測による不確実性推定は、予測精度と密接に相関していることがわかった。この観察は、選択的分類や低品質な予測の除外といった下流の応用に役立ちうる。また、コンフォーマル予測が必要とする交換可能性の仮定を、分野外(out-of-subject)の質問に適用した場合についても検討する。これは多くの実用的応用において、より現実的なシナリオである可能性がある。本研究は、誤り率に対する頑健な保証が求められる安全性が重要な状況において、大規模言語モデルをより信頼でき確実な形で利用することに貢献する。 As large language models continue to be widely developed, robust uncertainty quantification techniques will become crucial for their safe deployment in high-stakes scenarios. In this work, we explore how conformal prediction can be used to provide uncertainty quantification in language models for the specific task of multiple-choice question-answering. We find that the uncertainty estimates from conformal prediction are tightly correlated with prediction accuracy. This observation can be useful for downstream applications such as selective classification and filtering out low-quality predictions. We also investigate the exchangeability assumption required by conformal prediction when it is applied to out-of-subject questions, which may be a more realistic scenario for many practical applications. Our work contributes towards more trustworthy and reliable usage of large language models in safety-critical situations, where robust guarantees of error rate are required.	翻訳日:2023-05-31 21:32:39 公開日:2023-05-28
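As a hedged illustration of how conformal prediction turns a language model's option probabilities into prediction sets, here is a minimal sketch of standard split conformal prediction for multiple-choice questions. It assumes the nonconformity score is 1 minus the softmax probability of the correct option. This is one common choice, not necessarily the paper's exact setup. The calibration data and function names are hypothetical. Under exchangeability of calibration and test questions, the resulting sets contain the correct answer with probability at least 1 − alpha (marginally), which is why the out-of-subject exchangeability question studied in the paper matters.

```python
import math
from typing import List, Sequence


def conformal_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_labels: Sequence[int], alpha: float) -> float:
    """Split conformal calibration with the nonconformity score s = 1 - p(correct option)."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # not enough calibration data: every option is kept
    return scores[k - 1]


def prediction_set(probs: Sequence[float], qhat: float) -> List[int]:
    """Indices of answer options whose score 1 - p is at most the calibrated threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical calibration data: softmax over options A-D, correct answer is always option 0.
cal_probs = [[p, (1 - p) / 3, (1 - p) / 3, (1 - p) / 3]
             for p in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)]
cal_labels = [0] * len(cal_probs)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
print(prediction_set([0.5, 0.25, 0.15, 0.1], qhat))  # → [0, 1]
```

Larger prediction sets signal higher uncertainty, which is the link to the abstract's observation that set size tracks accuracy and can drive selective classification.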
# プルーニングと低ランクなパラメータ効率的ファインチューニングの融合 Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2305.18403v1 ) ライセンス: Link先を確認	Mingyang Zhang and Haozhen and Chunhua Shen and Zhen Yang and Linlin Ou and Xinyi Yu and Bohan Zhuang	(参考訳) LLaMAやViT-Gのような大規模事前学習モデル(LPM)は、様々なタスクで卓越した性能を示している。これらの大規模モデルを下流タスク向けに低コストでファインチューニングするためにパラメータ効率的ファインチューニング(PEFT)が登場したが、その巨大なモデル規模と計算コストのために、デプロイは依然として困難である。ニューラルネットワークのプルーニングは冗長なパラメータを除去することでモデル圧縮の解決策を提供するが、既存手法のほとんどはパラメータ勾配の計算に依存している。しかし、LPMでは勾配の計算コストが法外に高く、代替手法の探索が必要となる。そこで我々は、LPMの効率的なファインチューニングとデプロイのための統一フレームワークLoRAPruneを提案する。まず、重要度の推定に事前学習パラメータの勾配ではなく、低ランク適応(LoRA)の値と勾配を用いる、PEFTを考慮したプルーニング基準を設計する。次に、PEFTの利点を最大限に活かしつつ冗長なパラメータを除去する反復的なプルーニング手順を提案する。これにより、LoRAPruneは非常に高い費用対効果で、効率的な推論のための高精度かつコンパクトなモデルを提供する。様々なタスクでの実験結果から、本手法が最先端の結果を達成することが示された。例えば、VTAB-1kベンチマークでは、LoRAPruneは訓練可能なパラメータのわずか0.76%を用いるだけで、マグニチュードプルーニングおよびムーブメントプルーニングを大きく上回り、平均Top-1精度がそれぞれ5.7%、4.3%高い。さらに、本手法はPEFT手法と同等の性能を達成しており、プルーニングの利点を享受しつつ高品質な結果を提供できることを示している。 Large pre-trained models (LPMs), such as LLaMA and ViT-G, have shown exceptional performance across various tasks. Although parameter-efficient fine-tuning (PEFT) has emerged to cheaply fine-tune these large models on downstream tasks, their deployment is still hindered by the vast model scale and computational costs. Neural network pruning offers a solution for model compression by removing redundant parameters, but most existing methods rely on computing parameter gradients. However, obtaining the gradients is computationally prohibitive for LPMs, which necessitates the exploration of alternative approaches. To this end, we propose a unified framework for efficient fine-tuning and deployment of LPMs, termed LoRAPrune. We first design a PEFT-aware pruning criterion, which utilizes the values and gradients of Low-Rank Adaptation (LoRA), rather than the gradients of pre-trained parameters for importance estimation. We then propose an iterative pruning procedure to remove redundant parameters while maximizing the advantages of PEFT. 
Thus, our LoRAPrune delivers an accurate, compact model for efficient inference in a highly cost-effective manner. Experimental results on various tasks demonstrate that our method achieves state-of-the-art results. For instance, in the VTAB-1k benchmark, LoRAPrune utilizes only 0.76% of the trainable parameters and outperforms magnitude and movement pruning methods by a significant margin, achieving a mean Top-1 accuracy that is 5.7% and 4.3% higher, respectively. Moreover, our approach achieves comparable performance to PEFT methods, highlighting its efficacy in delivering high-quality results while benefiting from the advantages of pruning.	翻訳日:2023-05-31 21:32:26 公開日:2023-05-28
# Neural Sculpting: プルーニングとネットワーク解析による階層的モジュラーなタスク構造の解明 Neural Sculpting: Uncovering hierarchically modular task structure through pruning and network analysis ( http://arxiv.org/abs/2305.18402v1 ) ライセンス: Link先を確認	Shreyas Malakarjun Patil, Loizos Michael, Constantine Dovrolis	(参考訳) 自然な目標関数やタスクは通常、階層的モジュラリティを示す。すなわち、階層的に組織された、より単純な部分関数に分解できる。このような部分関数には2つの重要な特徴がある。それぞれ異なる入力の集合を持ち(入力分離性)、階層のより上位で入力として再利用される(再利用性)。従来の研究により、本質的に疎である階層的モジュラーなニューラルネットワークが、学習効率、汎化、マルチタスク学習、転移可能性などの利点をもたらすことが確立されている。しかし、与えられたタスクの根底にある部分関数とその階層構造を特定することは難しい場合がある。本研究の高レベルな問いは、十分に深いニューラルネットワークでタスクを学習したとき、そのタスクの根底にある部分関数の階層をどのように明らかにできるか、である。出発点として、タスクが階層的モジュラーかどうかを判定しやすいブール関数の領域を検討する。訓練中の反復的なユニットおよびエッジのプルーニングと、モジュール検出および階層推論のためのネットワーク解析を組み合わせたアプローチを提案する。最後に、この手法によって、幅広いブール関数と、MNIST数字データセットに基づく2つの視覚タスクの階層的モジュラリティを明らかにできることを示す。 Natural target functions and tasks typically exhibit hierarchical modularity: they can be broken down into simpler sub-functions that are organized in a hierarchy. Such sub-functions have two important features: they have a distinct set of inputs (input-separability) and they are reused as inputs higher in the hierarchy (reusability). Previous studies have established that hierarchically modular neural networks, which are inherently sparse, offer benefits such as learning efficiency, generalization, multi-task learning, and transferability. However, identifying the underlying sub-functions and their hierarchical structure for a given task can be challenging. The high-level question in this work is: if we learn a task using a sufficiently deep neural network, how can we uncover the underlying hierarchy of sub-functions in that task? As a starting point, we examine the domain of Boolean functions, where it is easier to determine whether a task is hierarchically modular. We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference. 
Finally, we demonstrate that this method can uncover the hierarchical modularity of a wide range of Boolean functions and two vision tasks based on the MNIST digits dataset.	翻訳日:2023-05-31 21:31:54 公開日:2023-05-28
# 信頼できる連合学習における保護メカニズムのパラメータ調整のためのメタラーニングフレームワーク A Meta-learning Framework for Tuning Parameters of Protection Mechanisms in Trustworthy Federated Learning ( http://arxiv.org/abs/2305.18400v1 ) ライセンス: Link先を確認	Xiaojin Zhang, Yan Kang, Lixin Fan, Kai Chen, Qiang Yang	(参考訳) 信頼できる連合学習(TFL)は通常、プライバシーを保証するために保護メカニズムを活用する。しかし、保護メカニズムはデータプライバシーを保護する一方で、必然的にユーティリティの損失や効率の低下をもたらす。したがって、プライバシー漏洩、ユーティリティ損失、効率低下の間で最適なトレードオフをとるように、保護メカニズムとそのパラメータを慎重に選択する必要がある。そのため、連合学習の実務者には、これら3つの要因を測定し、それらの間のトレードオフを最適化して、対象のアプリケーションに最も適した保護メカニズムを選択するためのツールが必要である。この要求を動機として、(1)プライバシー漏洩、ユーティリティ損失、効率低下の間のトレードオフを最適化する保護メカニズムを見つける問題としてTFLを定式化し、(2)3つの要因の有界な測定値を形式的に定義するフレームワークを提案する。次に、この最適化問題を近似するメタラーニングアルゴリズムを提案し、ランダム化、準同型暗号、秘密分散、圧縮といった代表的な保護メカニズムの最適な保護パラメータを求める。さらに、実用的な水平連合学習の設定において、得られた最適保護パラメータを定量化する推定アルゴリズムを設計し、推定誤差の理論解析を行う。 Trustworthy Federated Learning (TFL) typically leverages protection mechanisms to guarantee privacy. However, protection mechanisms inevitably introduce utility loss or efficiency reduction while protecting data privacy. Therefore, protection mechanisms and their parameters should be carefully chosen to strike an optimal tradeoff between privacy leakage, utility loss, and efficiency reduction. To this end, federated learning practitioners need tools to measure the three factors and optimize the tradeoff between them to choose the protection mechanism that is most appropriate to the application at hand. Motivated by this requirement, we propose a framework that (1) formulates TFL as a problem of finding a protection mechanism to optimize the tradeoff between privacy leakage, utility loss, and efficiency reduction and (2) formally defines bounded measurements of the three factors. 
We then propose a meta-learning algorithm to approximate this optimization problem and find optimal protection parameters for representative protection mechanisms, including Randomization, Homomorphic Encryption, Secret Sharing, and Compression. We further design estimation algorithms to quantify these found optimal protection parameters in a practical horizontal federated learning setting and provide a theoretical analysis of the estimation error.	翻訳日:2023-05-31 21:31:33 公開日:2023-05-28
# HyperTime: 時間的分布シフトに対処するためのハイパーパラメータ最適化 HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts ( http://arxiv.org/abs/2305.18421v1 ) ライセンス: Link先を確認	Shaokun Zhang, Yiran Wu, Zhonghua Zheng, Qingyun Wu, Chi Wang	(参考訳) 本研究では、未知のテストデータにおける潜在的な時間的分布シフトに対して頑健なハイパーパラメータを見つけるため、ハイパーパラメータ最適化手法HyperTimeを提案する。本研究は、多くの場合、ハイパーパラメータ最適化によって時間的に頑健な予測性能を達成できるという重要な観察に動機付けられている。この観察に基づき、ロバスト最適化の文献における「最悪ケース指向」の考え方を活用して、そのような頑健なハイパーパラメータ設定を見つける。HyperTimeは、時系列順の検証セットにおける平均検証損失と最悪ケース検証損失に辞書式の優先順位を課す。期待テスト損失の上界について理論解析を行い、提案手法特有の利点を明らかにする。また、時間的分布シフトを伴う複数の機械学習タスクにおいて、提案手法の高い実験的性能を示す。 In this work, we propose a hyperparameter optimization method named HyperTime to find hyperparameters robust to potential temporal distribution shifts in the unseen test data. Our work is motivated by an important observation that it is, in many cases, possible to achieve temporally robust predictive performance via hyperparameter optimization. Based on this observation, we leverage the "worst-case-oriented" philosophy from the robust optimization literature to help find such robust hyperparameter configurations. HyperTime imposes a lexicographic priority order on average validation loss and worst-case validation loss over chronological validation sets. We perform a theoretical analysis on the upper bound of the expected test loss, which reveals the unique advantages of our approach. We also demonstrate the strong empirical performance of the proposed method on multiple machine learning tasks with temporal distribution shifts.	翻訳日:2023-05-31 21:25:50 公開日:2023-05-28
# 分散再現型ロバストQ-ラーニングのサンプル複雑度 Sample Complexity of Variance-reduced Distributionally Robust Q-learning ( http://arxiv.org/abs/2305.18420v1 ) ライセンス: Link先を確認	Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou	(参考訳) 分布シフト下での動的意思決定は、強化学習の理論と応用において基本的な関心事であり、データが収集される環境の分布は、モデルがデプロイされる環境と異なる可能性がある。本稿では,分布的変化にもかかわらずロバストなポリシを効果的に学習できる,分布的ロバストなq-learningアルゴリズムと分散低減アルゴリズムについて述べる。これらのアルゴリズムは、Kullback-Leiblerの不確実性を伴う無限水平$\gamma$-discounted robust Markov決定過程の$q$関数を、エントリワイズ$\epsilon$-degreeの精度で効率的に近似するように設計されている。さらに,分散還元分布ロバストなq-learningは,同期q-learningと分散還元技術を組み合わせて,その性能を向上させる。その結果,$S$ と $A$ が状態空間と作用空間を表す場合,$\tilde O(|S||A|(1-\gamma)^{-4}\epsilon^{-2})$ の上限値のminimaxサンプル複雑性が得られる。これは不確実性サイズ$\delta$から独立した最初の複雑性結果であり、新しい複雑性理論的な洞察を提供する。さらに、一連の数値実験により、分布シフトを扱うアルゴリズムの理論的知見と効率が確認された。 Dynamic decision making under distributional shifts is of fundamental interest in theory and applications of reinforcement learning: The distribution of the environment on which the data is collected can differ from that of the environment on which the model is deployed. This paper presents two novel model-free algorithms, namely the distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts. These algorithms are designed to efficiently approximate the $q$-function of an infinite-horizon $\gamma$-discounted robust Markov decision process with Kullback-Leibler uncertainty set to an entry-wise $\epsilon$-degree of precision. Further, the variance-reduced distributionally robust Q-learning combines the synchronous Q-learning with variance-reduction techniques to enhance its performance. Consequently, we establish that it attains a minimax sample complexity upper bound of $\tilde O(|S||A|(1-\gamma)^{-4}\epsilon^{-2})$, where $S$ and $A$ denote the state and action spaces. This is the first complexity result that is independent of the uncertainty size $\delta$, thereby providing new complexity theoretic insights. 
Additionally, a series of numerical experiments confirm the theoretical findings and the efficiency of the algorithms in handling distributional shifts.	翻訳日:2023-05-31 21:25:35 公開日:2023-05-28
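A robust Bellman update with a Kullback-Leibler uncertainty set needs the worst-case expected next-state value. One standard way to compute it is the dual form $\sup_{\alpha>0}\{-\alpha\log\mathbb{E}_{P_0}[e^{-V/\alpha}]-\alpha\delta\}$, shown below as a minimal sketch. This is an assumption about how one might evaluate that quantity, not the authors' code. The supremum is approximated on a hypothetical log-spaced grid of $\alpha$ values.

```python
import math


def kl_robust_expectation(values, probs, delta, alphas=None):
    """Approximate inf over {P : KL(P || P0) <= delta} of E_P[V] via the dual
    sup_{alpha > 0} [ -alpha * log E_{P0}[exp(-V / alpha)] - alpha * delta ].
    Every grid point gives a valid lower bound (weak duality)."""
    if alphas is None:
        alphas = [10 ** (k / 20) for k in range(-60, 61)]  # 1e-3 .. 1e3
    v_min = min(values)
    best = -math.inf
    for alpha in alphas:
        # Shift by v_min before exponentiating for numerical stability.
        s = sum(p * math.exp(-(v - v_min) / alpha) for v, p in zip(values, probs))
        if s > 0:
            best = max(best, v_min - alpha * math.log(s) - alpha * delta)
    return best


# Next-state values under a nominal two-outcome transition P0.
values, probs = [0.0, 1.0], [0.5, 0.5]
print(kl_robust_expectation(values, probs, delta=1e-8))  # close to the nominal 0.5
print(kl_robust_expectation(values, probs, delta=0.1))   # smaller (pessimistic)
```

As $\delta$ shrinks, the robust value approaches the nominal expectation. A larger uncertainty set pushes it toward the smallest attainable value.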
# 双方向言語モデルによるセマンティックセグメンテーションによる長期ASRの改善 Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR ( http://arxiv.org/abs/2305.18419v1 ) ライセンス: Link先を確認	W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath	(参考訳) 音声中の意味論的完全文を分離し,長文音声を分割する手法を提案する。これにより、ASRデコーダは不要に遠くのコンテキストを処理できなくなると同時に、現在の文内で関連するコンテキストが失われることを防ぐことができる。意味論的に完全な文境界は典型的には句読点によって区切られるが、残念ながら実世界の発話には句読点がほとんど含まれない。本研究は,文章・句読点に基づく双方向教師言語モデル(LM)から句読点知識を抽出することにより,この制限に対処する。本研究は, LM教師から蒸留したセグメンタと, 他の作品で使用されている音響ポーズベースの教師から蒸留したセグメンタとを, ストリーミングASRパイプラインで比較した。当社のsegmenterを使ったパイプラインは、youtubeのキャプションタスクにおいて、平均60msのレイテンシ削減とともに、平均3.2%のWERゲインを達成しています。 We propose a method of segmenting long-form speech by separating semantically complete sentences within the utterance. This prevents the ASR decoder from needlessly processing faraway context while also preventing it from missing relevant context within the current sentence. Semantically complete sentence boundaries are typically demarcated by punctuation in written text; but unfortunately, spoken real-world utterances rarely contain punctuation. We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text. We compare our segmenter, which is distilled from the LM teacher, against a segmenter distilled from an acoustic-pause-based teacher used in other works, on a streaming ASR pipeline. The pipeline with our segmenter achieves a 3.2% relative WER gain along with a 60 ms median end-of-segment latency reduction on a YouTube captioning task.	翻訳日:2023-05-31 21:25:09 公開日:2023-05-28
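The teacher/student split described above can be sketched with two hypothetical helpers. In the first, end-of-sentence labels are derived from punctuated text; this is a stand-in for the paper's bidirectional teacher LM, which predicts punctuation rather than reading it off. In the second, a student's per-word boundary probabilities cut an unpunctuated stream into segments. The threshold of 0.5 is an assumption.

```python
import re


def boundary_labels(punctuated_text):
    """Teacher-side labels: 1 if a word ends a sentence (., ?, !), else 0."""
    words = punctuated_text.split()
    return [1 if re.search(r"[.?!]$", w) else 0 for w in words]


def segment(words, boundary_probs, threshold=0.5):
    """Student-side inference: cut the unpunctuated word stream wherever the
    predicted end-of-sentence probability exceeds the threshold."""
    segments, current = [], []
    for word, p in zip(words, boundary_probs):
        current.append(word)
        if p > threshold:
            segments.append(current)
            current = []
    if current:  # flush a trailing, not-yet-closed segment
        segments.append(current)
    return segments


labels = boundary_labels("it is late. shall we go?")
print(labels)  # [0, 0, 1, 0, 0, 1]
words = ["it", "is", "late", "shall", "we", "go"]
print(segment(words, [0.1, 0.1, 0.9, 0.2, 0.1, 0.8]))
# [['it', 'is', 'late'], ['shall', 'we', 'go']]
```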
# ビデオ連続学習のための時間情報の再検討 Just a Glimpse: Rethinking Temporal Information for Video Continual Learning ( http://arxiv.org/abs/2305.18418v1 ) ライセンス: Link先を確認	Lama Alssum, Juan Leon Alcazar, Merey Ramazanova, Chen Zhao, Bernard Ghanem	(参考訳) クラス増分学習は、現実世界のアプリケーションシナリオによく似ているため、継続的学習の研究において最も重要な設定の1つである。メモリサイズが制限されると、クラスやタスクの数が増えると、壊滅的な忘れることになる。ビデオ領域での継続的な学習は、ビデオデータが大量のフレームを含んでいるため、リプレイメモリにより高い負担がかかるため、さらに課題となる。現在の一般的なプラクティスは、ビデオストリームからサブサンプルのフレームをリプレイメモリに格納することです。本稿では,個別フレームに基づく効果的なビデオ連続学習のための新しい再生機構SMILEを提案する。広範にわたる実験により,映像の多様性は時間的情報よりも重要な役割を担っていることが明らかとなった。そこで本手法は,多数の一意なビデオを表す少数のフレームから学習することに焦点を当てている。 3つの代表的なビデオデータセット、kinetics, ucf101, activitynetにおいて、提案手法は最先端の性能を最大21.49%向上させた。 Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles real-world application scenarios. With constrained memory sizes, catastrophic forgetting arises as the number of classes/tasks increases. Studying continual learning in the video domain poses even more challenges, as video data contains a large number of frames, which places a higher burden on the replay memory. The current common practice is to sub-sample frames from the video stream and store them in the replay memory. In this paper, we propose SMILE, a novel replay mechanism for effective video continual learning based on individual/single frames. Through extensive experimentation, we show that under extreme memory constraints, video diversity plays a more significant role than temporal information. Therefore, our method focuses on learning from a small number of frames that represent a large number of unique videos. On three representative video datasets, Kinetics, UCF101, and ActivityNet, the proposed method achieves state-of-the-art performance, outperforming the previous state-of-the-art by up to 21.49%.	翻訳日:2023-05-31 21:24:53 公開日:2023-05-28
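The diversity-versus-temporal-information trade-off can be made concrete with a fixed frame budget. The sketch below is a hypothetical replay-memory builder, not SMILE itself. For the same budget, storing one frame per video covers many more distinct videos than storing whole clips. The names `build_replay_memory` and `frames_per_video` are illustrative assumptions.

```python
import random


def build_replay_memory(videos, budget, frames_per_video=1, seed=0):
    """Fill a fixed frame budget with `frames_per_video` frames from as many
    distinct videos as fit. `videos` maps video id -> list of frames."""
    rng = random.Random(seed)
    ids = list(videos)
    rng.shuffle(ids)
    memory = []
    for vid in ids:
        if len(memory) + frames_per_video > budget:
            break
        frames = videos[vid]
        k = min(frames_per_video, len(frames))
        memory.extend((vid, f) for f in rng.sample(frames, k))
    return memory


videos = {f"v{i}": list(range(16)) for i in range(100)}  # 100 clips, 16 frames each
single = build_replay_memory(videos, budget=32, frames_per_video=1)
clips = build_replay_memory(videos, budget=32, frames_per_video=16)
print(len({v for v, _ in single}), len({v for v, _ in clips}))  # 32 2
```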
# 配電系統の一般化を支援する格子符号上の決定点プロセスの注意 Determinantal Point Process Attention Over Grid Codes Supports Out of Distribution Generalization ( http://arxiv.org/abs/2305.18417v1 ) ライセンス: Link先を確認	Shanka Subhra Mondal, Steven Frankland, Taylor Webb, and Jonathan D. Cohen	(参考訳) ディープニューラルネットワークは、人間のような知性をエミュレートする上で大きな進歩を遂げており、脳がそれに依存する複雑な計算問題をどう解決するかを理解する方法として、ますます使われている。しかし、これらはまだ不足しているため、脳が人間の能力の強い一般化をサポートする方法についての洞察を得られていない。そのようなケースの1つは、out-of-distribution (ood) generalization - トレーニングセットの配布外にあるテスト例での成功したパフォーマンスである。ここでは、この能力に寄与する可能性のある脳内処理の特性を同定する。本稿では,ood一般化を実現するために,神経計算の具体的特徴を浮き彫りにした2部アルゴリズムについて述べるとともに,二つの難解な認知タスクにおける性能評価による概念実証を提供する。まず、哺乳類の脳がグリッドのような表現(例えば、円錐皮質)を用いて計量空間を表すという事実を描き出す: 表現空間をカバーする繰り返しモチーフで組織された関係構造の抽象表現。次に,DPP-A(Determinantal Point Process)を用いて,これらのグリッド表現上での注意機構を提案する。本稿では,標準タスク最適化エラーと DPP-A を併用した損失関数がグリッド符号の繰り返しモチーフを利用でき,共通アーキテクチャと統合してアナログおよび算術タスクのOOD一般化性能を向上できることを示す。これは、哺乳類の脳におけるグリッドコードがどのように一般化性能に寄与するかの解釈と、ニューラルネットワークにおけるそのような能力を改善する潜在的な手段の両方を提供する。 Deep neural networks have made tremendous gains in emulating human-like intelligence, and have been used increasingly as ways of understanding how the brain may solve the complex computational problems on which this relies. However, these models still fall short of the strong forms of generalization of which humans are capable, and therefore fail to provide insight into how the brain supports them. One such case is out-of-distribution (OOD) generalization -- successful performance on test examples that lie outside the distribution of the training set. Here, we identify properties of processing in the brain that may contribute to this ability. We describe a two-part algorithm that draws on specific features of neural computation to achieve OOD generalization, and provide a proof of concept by evaluating performance on two challenging cognitive tasks. 
First, we draw on the fact that the mammalian brain represents metric spaces using grid-like representations (e.g., in the entorhinal cortex): abstract representations of relational structure, organized in recurring motifs that cover the representational space. Second, we propose an attentional mechanism that operates over these grid representations using a determinantal point process (DPP-A) -- a transformation that ensures maximum sparseness in the coverage of that space. We show that a loss function that combines standard task-optimized error with DPP-A can exploit the recurring motifs in grid codes, and can be integrated with common architectures to achieve strong OOD generalization performance on analogy and arithmetic tasks. This provides both an interpretation of how grid codes in the mammalian brain may contribute to generalization performance, and at the same time a potential means for improving such capabilities in artificial neural networks.	翻訳日:2023-05-31 21:24:35 公開日:2023-05-28
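The diversity-favoring behavior of a determinantal point process comes from the determinant of a similarity-kernel submatrix: near-duplicate items make it close to zero. The sketch below uses greedy MAP selection to illustrate that property. It is a generic DPP routine, not the paper's DPP-A attention over grid codes. The kernel values are made up for illustration.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def greedy_dpp(kernel, k):
    """Greedily pick k items that maximize det(kernel[S, S])."""
    selected = []
    for _ in range(k):
        remaining = [i for i in range(len(kernel)) if i not in selected]

        def score(i):
            idx = selected + [i]
            return det([[kernel[a][b] for b in idx] for a in idx])

        selected.append(max(remaining, key=score))
    return selected


# Items 0 and 1 are near-duplicates; item 2 is dissimilar to both.
kernel = [[1.0, 0.99, 0.0],
          [0.99, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
print(greedy_dpp(kernel, 2))  # [0, 2]
```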
# インメモリコンピューティングにおける多様なハードウェアノイズを軽減するバッチノルム最適化の役割と限界の検討 Examining the Role and Limits of Batchnorm Optimization to Mitigate Diverse Hardware-noise in In-memory Computing ( http://arxiv.org/abs/2305.18416v1 ) ライセンス: Link先を確認	Abhiroop Bhattacharjee, Abhishek Moitra, Youngeun Kim, Yeshwanth Venkatesha, and Priyadarshini Panda	(参考訳) アナログクロスバーなどのインメモリコンピューティング(imc)プラットフォームは、高面積および計算効率の低精度ディープニューラルネットワーク(dnn)の高速化を促進するため、注目されている。しかし、しばしば非決定論的かつ非線形であるクロスバーの固有の非理想性は、デプロイされたdnnの性能を低下させる。量子化誤差に加えて、推論中に最も頻繁に遭遇する非理想性には、クロスバー回路レベルの寄生抵抗や、確率的読み取りノイズや時間ドリフトのようなデバイスレベルの非理想性が含まれる。本研究では,これら非理想性がアナログクロスバーのドット生成操作に与える影響を詳細に検討し,非理想性の影響を軽減するために,バッチノルムパラメータのクロスバーアウェア微調整により,ほぼトレーニングレスな解の実現可能性を検討することを目的とする。これにより、メモリとトレーニングエネルギーの観点からハードウェアコストを削減し、クロスバー上のDNN重みの再トレーニングをIMCが認識する。 In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area- & compute-efficiencies. However, the intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs. In addition to quantization errors, the most frequently encountered non-idealities during inference include crossbar circuit-level parasitic resistances and device-level non-idealities such as stochastic read noise and temporal drift. In this work, our goal is to closely examine the distortions caused by these non-idealities on the dot-product operations in analog crossbars and explore the feasibility of a nearly training-less solution via crossbar-aware fine-tuning of batchnorm parameters in real-time to mitigate the impact of the non-idealities. Compared with IMC noise-aware retraining of the DNN weights on crossbars, this reduces hardware costs in terms of memory and training energy.	翻訳日:2023-05-31 21:24:06 公開日:2023-05-28
# 幾何代数変換器 Geometric Algebra Transformers ( http://arxiv.org/abs/2305.18415v1 ) ライセンス: Link先を確認	Johann Brehmer, Pim de Haan, Sönke Behrends, Taco Cohen	(参考訳) 幾何学的データに関わる問題は、コンピュータビジョン、ロボティクス、化学、物理学など様々な分野で発生する。このようなデータは、点、方向ベクトル、平面、変換などの多くの形式を取ることができるが、これまでは、それらの対称性を尊重しながら、そのような様々な幾何学的タイプに適用できる単一のアーキテクチャは存在しない。本稿では,幾何学データのための汎用アーキテクチャであるGeometric Algebra Transformer (GATr)を紹介する。 GATrは射影幾何学代数における入力、出力、隠れ状態を表し、共通幾何学的対象の16次元ベクトル空間表現とそれらに作用する作用素を提供する。 GATr は E(3) に対して同変であり、3次元ユークリッド空間の対称性群である。トランスとしては、GATrはスケーラブルで表現力があり、多用途である。 n体モデリングとロボット計画の実験では、GATrは非幾何学的ベースラインよりも強力な改善を示している。 Problems involving geometric data arise in a variety of fields, including computer vision, robotics, chemistry, and physics. Such data can take numerous forms, such as points, direction vectors, planes, or transformations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric algebra, which offers an efficient 16-dimensional vector space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a transformer, GATr is scalable, expressive, and versatile. In experiments with n-body modeling and robotic planning, GATr shows strong improvements over non-geometric baselines.	翻訳日:2023-05-31 21:23:48 公開日:2023-05-28
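The "16-dimensional" figure follows from the algebra's generators. The projective geometric algebra $G(3,0,1)$ has four basis vectors ($e_0$, which squares to 0, and $e_1, e_2, e_3$). Its basis blades are all subsets of those generators, giving $2^4 = 16$ components grouped by grade as $1+4+6+4+1$. The short enumeration below only illustrates that count. It does not implement GATr's layers or its embedding of specific geometric objects.

```python
from itertools import combinations

GENERATORS = ["e0", "e1", "e2", "e3"]  # e0 squares to 0 in G(3,0,1)

# Every subset of generators is a basis blade; the empty subset is the scalar.
blades = [
    "e" + "".join(g[1] for g in combo) if combo else "1"
    for grade in range(len(GENERATORS) + 1)
    for combo in combinations(GENERATORS, grade)
]
grade_sizes = [len(list(combinations(GENERATORS, g))) for g in range(5)]

print(len(blades))   # 16
print(grade_sizes)   # [1, 4, 6, 4, 1]
print(blades[:6])    # ['1', 'e0', 'e1', 'e2', 'e3', 'e01']
```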
# StEik: ニューラルサイン付き距離関数の最適化と有限形状表現の安定化 StEik: Stabilizing the Optimization of Neural Signed Distance Functions and Finer Shape Representation ( http://arxiv.org/abs/2305.18414v1 ) ライセンス: Link先を確認	Huizong Yang, Yuxin Sun, Ganesh Sundaramoorthi, Anthony Yezzi	(参考訳) 形態の暗黙的神経表現(INR)を学習するための新しい知見と新しいパラダイム(StEik)を提案する。特に,INRに符号付き距離関数制約を課すのによく使われるエイコナール損失に光を当てた。ネットワークの表現力が増加するにつれて、最適化は連続極限における偏微分方程式(PDE)に近づき、不安定となることを示す。この不安定性は, 既設のネットワーク最適化において発現し, 再構成表面の不規則性や, 局所的局所最小値への収束を招き, 微妙な幾何学的・位相的構造を捉えることができないことを示す。我々は、現在文献で使われている損失に付加された他の用語が、実際にこれらの不安定性を排除することができるかを分析的に示す。しかし、そのような用語は表面を過度に規則化することができ、微細な形状の表現を妨げている。同様の連続体極限のpde理論に基づき、固有不安定性は相反するが過剰正規化はしない新しい正規化項を導入する。さらに, 安定度は連続限界で保証されているため, この安定化により, より微細な形状の細部を表現できる新しいネットワーク構造も検討できる。このような構造を二次層に導入する。複数のベンチマークデータセットの実験により、我々の新しい正規化とネットワークは、既存の最先端技術よりも正確な形状の詳細と正確なトポロジを捉えることができることが示された。 We present new insights and a novel paradigm (StEik) for learning implicit neural representations (INR) of shapes. In particular, we shed light on the popular eikonal loss used for imposing a signed distance function constraint in INR. We show analytically that as the representation power of the network increases, the optimization approaches a partial differential equation (PDE) in the continuum limit that is unstable. We show that this instability can manifest in existing network optimization, leading to irregularities in the reconstructed surface and/or convergence to sub-optimal local minima, and thus fails to capture fine geometric and topological structure. We show analytically how other terms added to the loss, currently used in the literature for other purposes, can actually eliminate these instabilities. However, such terms can over-regularize the surface, preventing the representation of fine shape detail. Based on a similar PDE theory for the continuum limit, we introduce a new regularization term that still counteracts the eikonal instability but without over-regularizing. 
Furthermore, since stability is now guaranteed in the continuum limit, this stabilization also allows for considering new network structures that are able to represent finer shape detail. We introduce such a structure based on quadratic layers. Experiments on multiple benchmark data sets show that our new regularization and network are able to capture more precise shape details and more accurate topology than existing state-of-the-art.	翻訳日:2023-05-31 21:23:34 公開日:2023-05-28
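The eikonal constraint asks the network output $f$ to satisfy $\lVert\nabla f\rVert = 1$, which a signed distance function does. The common eikonal loss penalizes $(\lVert\nabla f\rVert - 1)^2$ at sample points. This minimal sketch uses central finite differences in place of automatic differentiation (an assumption made to stay dependency-free). It shows that an exact sphere SDF scores about 0, while a rescaled function with the same zero level set does not. It illustrates only the standard eikonal term, not StEik's new regularizer.

```python
import math


def eikonal_residual(f, points, h=1e-5):
    """Mean of (|grad f| - 1)^2 over sample points, using central differences."""
    total = 0.0
    for p in points:
        grad = []
        for i in range(len(p)):
            fwd = list(p)
            bwd = list(p)
            fwd[i] += h
            bwd[i] -= h
            grad.append((f(fwd) - f(bwd)) / (2 * h))
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return total / len(points)


def sphere_sdf(p):
    """Exact signed distance to the unit sphere."""
    return math.sqrt(sum(x * x for x in p)) - 1.0


points = [(0.3, 0.4, 0.5), (1.2, -0.7, 0.1), (-2.0, 0.5, 1.5)]
print(eikonal_residual(sphere_sdf, points))                   # ~0
print(eikonal_residual(lambda p: 2 * sphere_sdf(p), points))  # ~1
```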
# APIから学ぶ: Black-Box Data-Free Meta-Learning Learning to Learn from APIs: Black-Box Data-Free Meta-Learning ( http://arxiv.org/abs/2305.18413v1 ) ライセンス: Link先を確認	Zixuan Hu, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, Dacheng Tao	(参考訳) data-free meta-learning(dfml)の目的は、トレーニングデータにアクセスせずに事前学習されたモデルの集合からメタラーニングすることで、新しいタスクの効率的な学習を可能にすることである。既存のDFML作業はメタ学習しかできない (i)ホワイトボックス、及び (ii)小規模事前訓練モデル (iii)同じアーキテクチャで、任意のモデルアーキテクチャと内部のモデルスケールを備えたAPIへの推論アクセスしか持たない、より実用的な設定を無視します。本稿では,ブラックボックスapiの集合から単一メタモデルへ,より汎用的なメタ知識を転送するためのbi-level data-free meta knowledge distillation (bidf-mkd)フレームワークを提案する。具体的には、APIを照会するだけで、各APIを逆転して、ゼロ階勾配推定器を介してトレーニングデータを回復し、新しい二段階メタ知識蒸留構造を用いてメタラーニングを行い、境界クエリセットの回復手法を設計して、決定境界付近のより情報的なクエリセットを復元する。また,限られたAPI予算の設定内での一般化を促進するため,より補間されたタスクをカバーし,タスク分布の多様化を図るタスクメモリ再生を提案する。 bidf-mkdフレームワークの優れた性能を示す、さまざまな現実世界のシナリオにおける広範囲な実験。 Data-free meta-learning (DFML) aims to enable efficient learning of new tasks by meta-learning from a collection of pre-trained models without access to the training data. Existing DFML work can only meta-learn from (i) white-box and (ii) small-scale pre-trained models (iii) with the same architecture, neglecting the more practical setting where the users only have inference access to the APIs with arbitrary model architectures and model scale inside. To solve this issue, we propose a Bi-level Data-free Meta Knowledge Distillation (BiDf-MKD) framework to transfer more general meta knowledge from a collection of black-box APIs to one single meta model. Specifically, by just querying APIs, we invert each API to recover its training data via a zero-order gradient estimator and then perform meta-learning via a novel bi-level meta knowledge distillation structure, in which we design a boundary query set recovery technique to recover a more informative query set near the decision boundary. 
In addition, to encourage better generalization within the setting of limited API budgets, we propose task memory replay to diversify the underlying task distribution by covering more interpolated tasks. Extensive experiments in various real-world scenarios show the superior performance of our BiDf-MKD framework.	翻訳日:2023-05-31 21:23:10 公開日:2023-05-28
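A zero-order gradient estimator recovers gradient information from function values alone, which is all that black-box API queries provide. The sketch below shows one common variant: a forward-difference estimate averaged over random Gaussian directions. It is an illustration under that assumption, not the BiDf-MKD estimator. `mu` and `num_samples` are hypothetical defaults, and the test function is a simple quadratic with a known gradient.

```python
import random


def zero_order_gradient(f, x, mu=1e-3, num_samples=20000, seed=0):
    """Estimate grad f(x) from function values only (black-box queries):
    average of (f(x + mu*u) - f(x)) / mu * u over Gaussian directions u."""
    rng = random.Random(seed)
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        scale = (f(shifted) - fx) / mu
        for i, ui in enumerate(u):
            grad[i] += scale * ui / num_samples
    return grad


def quadratic(v):
    return sum(t * t for t in v)  # true gradient is 2 * v


print(zero_order_gradient(quadratic, [1.0, 2.0]))  # roughly [2.0, 4.0]
```

The estimate is noisy and costs one query per sampled direction. That query cost is why a limited API budget, as discussed in the abstract, is a real constraint in this setting.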
# ホークスプロセスによる異種事象の短期的時間依存性検出 Short-term Temporal Dependency Detection under Heterogeneous Event Dynamic with Hawkes Processes ( http://arxiv.org/abs/2305.18412v1 ) ライセンス: Link先を確認	Yu Chen, Fengpei Li, Anderson Schneider, Yuriy Nevmyvaka, Asohan Amarasingham, Henry Lam	(参考訳) 多くのイベントシーケンスデータは相互に刺激的あるいは抑制的なパターンを示す。このような時間依存の信頼できる検出は科学的調査に不可欠である。事実上のモデルはマルチ変数ホークスプロセス(MHP)であり、その影響関数はグランガー因果関係の因果構造を自然に符号化する。しかし、既存の手法の大半は、実世界のデータと矛盾する一定のベースラインを持つ標準MHP強度の直接変換または非線形変換を用いる。不規則で不均一な強度の下では、相互相互作用の効果と強度変動の影響を区別するのに苦労するため、時間的依存を捉えることは困難である。本稿では,短期の時間依存検出問題に対処する。 MHPのクロスインパクトに対する最大誤差推定(MLE)は,対象HPではなく相互作用HPのヘテロジニアス強度を用いて,除去できないがマグニチュードで低減できる誤差を有することを示す。そこで我々は、MLEから修正した頑健で計算効率のよい手法を提案し、不均一強度の事前推定に頼らず、データ制限方式(例:少数ショット、反復観察なし)に適用できることを示した。様々なデータセットを広範囲に実験した結果,本手法は神経科学における新たな応用が注目され,既存の手法よりも有意なマージンで勝っていることがわかった。 Many event sequence data exhibit mutually exciting or inhibiting patterns. Reliable detection of such temporal dependency is crucial for scientific investigation. The de facto model is the Multivariate Hawkes Process (MHP), whose impact function naturally encodes a causal structure in Granger causality. However, the vast majority of existing methods use direct or nonlinear transform of standard MHP intensity with constant baseline, inconsistent with real-world data. Under irregular and unknown heterogeneous intensity, capturing temporal dependency is hard as one struggles to distinguish the effect of mutual interaction from that of intensity fluctuation. In this paper, we address the short-term temporal dependency detection issue. We show the maximum likelihood estimation (MLE) for cross-impact from MHP has an error that cannot be eliminated but may be reduced by an order of magnitude, using heterogeneous intensity not of the target HP but of the interacting HP. 
We then propose a robust and computationally efficient method, modified from MLE, that does not rely on the prior estimation of the heterogeneous intensity and is thus applicable in a data-limited regime (e.g., few-shot, no repeated observations). Extensive experiments on various datasets show that our method outperforms existing ones by notable margins, with highlighted novel applications in neuroscience.	翻訳日:2023-05-31 21:22:47 公開日:2023-05-28
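The distinction between a constant baseline and a heterogeneous one is visible in the MHP intensity itself. The sketch below assumes an exponential impact kernel, $\lambda_i(t) = \mu_i(t) + \sum_j \sum_{t_k^j < t} \alpha_{ij}\,\beta\,e^{-\beta(t - t_k^j)}$, with a time-varying baseline $\mu_i(t)$. The kernel choice and all parameter values are illustrative assumptions, not the paper's estimator.

```python
import math


def hawkes_intensity(t, i, baseline, events, alpha, beta):
    """Intensity of dimension i at time t:
    lambda_i(t) = mu_i(t) + sum_j sum_{t_k in events[j], t_k < t}
                  alpha[i][j] * beta * exp(-beta * (t - t_k)).
    baseline[i] is a function of t, so the background rate can vary in time."""
    excitation = 0.0
    for j, times in enumerate(events):
        for tk in times:
            if tk < t:
                excitation += alpha[i][j] * beta * math.exp(-beta * (t - tk))
    return baseline[i](t) + excitation


# Dimension 0 has a fluctuating background; dimension 1 a constant one.
# A single event in dimension 0 at t=0 excites dimension 1 via alpha[1][0].
baseline = [lambda t: 1.0 + 0.5 * math.sin(t), lambda t: 0.2]
events = [[0.0], []]
alpha = [[0.0, 0.0], [0.5, 0.0]]
print(round(hawkes_intensity(1.0, 1, baseline, events, alpha, beta=1.0), 4))  # 0.3839
```

Separating the excitation term from a fluctuating $\mu_i(t)$ is the difficulty the abstract describes. Both terms raise the intensity, so a constant-baseline fit can misattribute background fluctuation to cross-excitation.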
# 拡散モデルを用いた認知型クロスモーダルデータ生成 Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models ( http://arxiv.org/abs/2305.18433v1 ) ライセンス: Link先を確認	Zizhao Hu, Mohammad Rostami	(参考訳) 拡散モデルに基づく既存のクロスモーダル生成法の多くは、異なるモダリティをまたいだ条件付き生成を可能にするために潜在空間の制御を提供するためのガイダンスを用いる。このような方法は、1つのモダリティのために個別に訓練されたモデルを通してガイダンスを提供することに焦点を当てている。その結果、これらの手法はクロスモーダル情報損失に悩まされ、一方向条件生成に限られる。マルチモーダル情報を取得し,モダリティ間の相関を学習する方法に着想を得て,チャネル毎のイメージコンディショニングを用いたマルチモーダル拡散モデルの学習とサンプリングスキームを,脳内の学習プロセスを模倣するためにトレーニングフェーズ中に学習する。実験の結果,すべての相関モダリティを条件としたデータ生成が可能となった。 Most existing cross-modal generative methods based on diffusion models use guidance to provide control over the latent space to enable conditional generation across different modalities. Such methods focus on providing guidance through separately-trained models, each for one modality. As a result, these methods suffer from cross-modal information loss and are limited to unidirectional conditional generation. Inspired by how humans synchronously acquire multi-modal information and learn the correlation between modalities, we explore a multi-modal diffusion model training and sampling scheme that uses channel-wise image conditioning to learn cross-modality correlation during the training phase to better mimic the learning process in the brain. Our empirical results demonstrate that our approach can achieve data generation conditioned on all correlated modalities.	翻訳日:2023-05-31 21:15:06 公開日:2023-05-28
# 説明可能なモデリングのための完全可視化による対話型決定木作成と拡張 Interactive Decision Tree Creation and Enhancement with Complete Visualization for Explainable Modeling ( http://arxiv.org/abs/2305.18432v1 ) ライセンス: Link先を確認	Boris Kovalerchuk, Andrew Dunn, Alex Worland, Sridevi Wagle	(参考訳) 機械学習(ML)モデルの解釈可能性と予測精度を高めるため、MLモデルの可視化はMLプロセスの重要な部分である。決定木(DT)は、ディープラーニングモデルを含む多くのブラックボックスMLモデルを理解するために使用されるため、機械学習(ML)において不可欠である。本研究では,決定木を理解可能なモデルとして完全可視化する2つの新しい手法を提案する。これらの手法は、GLC(General Line Coordinates)とBC(Bended Coordinates)とSPC(Shifted Paired Coordinates)の2つのバージョンを使用する。曲げ座標は線座標の集合であり、各座標は各DTノードのしきい値点に曲げられる。 spcでは、各 n-d 点を 2-次元デカルト座標のシフト対を有向グラフとして可視化する。これらの新しいメソッドは、DTモデルをより完全に視覚化する既存のメソッドの機能を拡張し、補完する。これらの機能は,(1)属性間の関係,(2)DT構造に対する個々のケース,(3)DT内のデータフロー,(4)DTノード内の各分割しきい値の感度,(5)N-D空間の一部のケースの密度,の観測と解析を可能にする。これらの機能は、DTモデルの過剰な一般化や過度な適合を防ぐのに役立つため、ドメインの専門家やエンドユーザによるDTモデルのパフォーマンス評価と改善に不可欠である。この手法の利点は、実世界のベンチマークデータセットのケーススタディで説明される。この論文は、異なる一般線座標における決定木の可視化のためにそれらを一般化する方法も示している。 To increase the interpretability and prediction accuracy of the Machine Learning (ML) models, visualization of ML models is a key part of the ML process. Decision Trees (DTs) are essential in ML because they are used to understand many black box ML models including Deep Learning models. In this research, two new methods are proposed for creating and enhancing Decision Trees as understandable models with complete visualization. These methods use two versions of General Line Coordinates (GLC): Bended Coordinates (BC) and Shifted Paired Coordinates (SPC). The Bended Coordinates are a set of line coordinates, where each coordinate is bent at the threshold point of the respective DT node. In SPC, each n-D point is visualized in a set of shifted pairs of 2-D Cartesian coordinates as a directed graph. These new methods expand and complement the capabilities of existing methods to visualize DT models more completely. 
These capabilities allow us to observe and analyze: (1) relations between attributes, (2) individual cases relative to the DT structure, (3) data flow in the DT, (4) sensitivity of each split threshold in the DT nodes, and (5) density of cases in parts of the n-D space. These features are critical for DT models' performance evaluation and improvement by domain experts and end users as they help to prevent overgeneralization and overfitting of the models. The advantages of this methodology are illustrated in the case studies on benchmark real-world datasets. The paper also demonstrates how to generalize them for decision tree visualizations in different General Line Coordinates.	翻訳日:2023-05-31 21:14:50 公開日:2023-05-28
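The Shifted Paired Coordinates idea from the abstract, an n-D point drawn as a directed graph over shifted pairs of 2-D Cartesian coordinates, can be sketched as a mapping from attributes to polyline nodes. The sketch assumes consecutive attribute pairs, an even number of attributes, and user-supplied per-pair `shifts`. These are illustrative choices, not the paper's exact construction.

```python
def shifted_paired_coordinates(point, shifts):
    """Map an n-D point (n even) to a directed polyline of 2-D nodes.
    Pair k = (x[2k], x[2k+1]) is drawn in its own copy of the Cartesian
    plane, offset by shifts[k]; consecutive nodes are joined by edges."""
    if len(point) % 2:
        raise ValueError("this sketch needs an even number of attributes")
    nodes = []
    for k in range(len(point) // 2):
        dx, dy = shifts[k]
        nodes.append((point[2 * k] + dx, point[2 * k + 1] + dy))
    edges = list(zip(nodes, nodes[1:]))  # directed: node k -> node k+1
    return nodes, edges


nodes, edges = shifted_paired_coordinates(
    [0.25, 0.5, 0.75, 0.125, 0.5, 0.25],
    shifts=[(0, 0), (1, 0), (2, 0)],
)
print(nodes)  # [(0.25, 0.5), (1.75, 0.125), (2.5, 0.25)]
print(len(edges))  # 2
```

Because every attribute value appears as a node coordinate, the mapping is lossless. The original point can be read back by subtracting the shifts.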
# マルチタスク学習によるAirbnb検索ジャーニーの最適化 Optimizing Airbnb Search Journey with Multi-task Learning ( http://arxiv.org/abs/2305.18431v1 ) ライセンス: Link先を確認	Chun How Tan, Austin Chan, Malay Haldar, Jie Tang, Xin Liu, Mustafa Abdool, Huiji Gao, Liwei He, Sanjeev Katariya	(参考訳) 宿泊や体験のためのオンラインマーケットプレイスであるairbnbでは、宿泊客は予約リクエストが終わるまで数週間かけて複数のアイテムを探索し比較する。各予約要求は、チェックイン前にホストによって拒否またはキャンセルされる可能性がある。検索の旅路の長くて探索的な性質と、ゲストとホストの好みのバランスをとる必要性は、airbnbの検索ランキングにユニークな課題をもたらす。本稿では、これらの課題に対処する、新しいマルチタスクディープラーニングモデルアーキテクチャである journey ranker について述べる。 journey rankerは、中間のゲストアクションをポジティブとネガティブの両方のマイルストーンとして活用し、ゲストの予約を成功に導く。また、ゲスト状態や検索クエリなどのコンテキスト情報を使用して、ゲストとホストの好みのバランスをとる。モジュールで拡張可能な設計で、懸念を明確に分離した4つのモジュールで構成されており、Airbnbの検索ランキングコンテキストを超えたケースを簡単に使用できる。 Journey Rankerのオフラインおよびオンラインテストを実施して、4つのAirbnb製品に本番環境でのデプロイに成功した。 At Airbnb, an online marketplace for stays and experiences, guests often spend weeks exploring and comparing multiple items before making a final reservation request. Each reservation request may then potentially be rejected or cancelled by the host prior to check-in. The long and exploratory nature of the search journey, as well as the need to balance both guest and host preferences, present unique challenges for Airbnb search ranking. In this paper, we present Journey Ranker, a new multi-task deep learning model architecture that addresses these challenges. Journey Ranker leverages intermediate guest actions as milestones, both positive and negative, to better progress the guest towards a successful booking. It also uses contextual information such as guest state and search query to balance guest and host preferences. Its modular and extensible design, consisting of four modules with clear separation of concerns, allows for easy application to use cases beyond the Airbnb search ranking context. We conducted offline and online testing of the Journey Ranker and successfully deployed it in production to four different Airbnb products with significant business metrics improvements.	
翻訳日:2023-05-31 21:14:24 公開日:2023-05-28
# スケーラブルで弱められた銀行取引分類 Scalable and Weakly Supervised Bank Transaction Classification ( http://arxiv.org/abs/2305.18430v1 ) ライセンス: Link先を確認	Liam Toran, Cory Van Der Walt, Alan Sammarone, Alex Keller (Flowcast.ai)	(参考訳) 本稿では,弱い監督,自然言語処理,ディープニューラルネットワーク技術を用いて,銀行取引を分類することを目的とする。我々の手法は、ヒューリスティックスとドメイン知識を活用して正確なトランザクション分類器を訓練することで、高価で入手が難しい手動アノテーションへの依存を最小限に抑える。本稿では,データプリプロセッシング,トランザクションテキスト埋め込み,アンカー,ラベル生成,識別型ニューラルネットワークトレーニング,システムアーキテクチャの概要など,効果的でスケーラブルなエンドツーエンドデータパイプラインを提案する。本手法は,既存の市場主導型ソリューションよりも優れており,正確な分類が可能であり,新規および複合的なユースケースに素早く拡張できることを示す。これにより、金融健康報告や信用リスク評価など、多くの金融応用を解き放つことができる。 This paper aims to categorize bank transactions using weak supervision, natural language processing, and deep neural network techniques. Our approach minimizes the reliance on expensive and difficult-to-obtain manual annotations by leveraging heuristics and domain knowledge to train accurate transaction classifiers. We present an effective and scalable end-to-end data pipeline, including data preprocessing, transaction text embedding, anchoring, label generation, discriminative neural network training, and an overview of the system architecture. We demonstrate the effectiveness of our method by showing it outperforms existing market-leading solutions, achieves accurate categorization, and can be quickly extended to novel and composite use cases. This can in turn unlock many financial applications such as financial health reporting and credit risk assessment.	翻訳日:2023-05-31 21:14:06 公開日:2023-05-28
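One common way to turn heuristics and domain knowledge into training labels is a set of keyword labeling functions combined by majority vote, with abstentions ignored. The sketch below shows that generic weak-supervision pattern. It is not the authors' pipeline, which also includes text embeddings, anchoring, and a discriminative network. Every keyword and category name is hypothetical.

```python
from collections import Counter

ABSTAIN = None


# Hypothetical keyword heuristics; each returns a category or ABSTAIN.
def lf_groceries(desc):
    return "groceries" if any(k in desc for k in ("MARKET", "GROCER")) else ABSTAIN


def lf_transport(desc):
    return "transport" if any(k in desc for k in ("UBER", "TRANSIT", "FUEL")) else ABSTAIN


def lf_income(desc):
    return "income" if "PAYROLL" in desc else ABSTAIN


LABELING_FUNCTIONS = [lf_groceries, lf_transport, lf_income]


def weak_label(description):
    """Majority vote over non-abstaining heuristics; None if all abstain."""
    votes = [lf(description.upper()) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


print(weak_label("Uber trip 1234"))   # transport
print(weak_label("ACME PAYROLL"))     # income
print(weak_label("misc transfer"))    # None
```

Labels produced this way are noisy. In a weakly supervised setup they typically serve as training targets for a downstream classifier rather than as final predictions.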
# 一般線座標を用いた視覚知識発見 Visual Knowledge Discovery with General Line Coordinates ( http://arxiv.org/abs/2305.18429v1 ) ライセンス: Link先を確認	Lincoln Huber, Boris Kovalerchuk, Charles Recaido	(参考訳) 多次元データによるブラックボックス機械学習手法の理解は、機械学習の重要な課題である。多くの強力な機械学習手法がすでに存在するが、これらの手法はしばしば説明がつかないか、複雑なデータでは性能が悪い。本稿では,ロスレス一般線座標を用いた視覚知識発見手法を提案する。これらは、説明規則で非線形分類器を生成、説明、視覚化するために、以前に導入された一般直線座標と動的足場座標の拡張である。これらの非線形モデルとルールの正確性を保証するため、ラインコーディネート・リニアは最悪の検証分割を見つけるためのインタラクティブな視覚知識発見アルゴリズムも開発した。これらの拡張は、非線形、インタラクティブな規則、ハイパーブロックルール、最悪のケースリニアである。複数のベンチマークデータセットにまたがる実験により、この視覚知識探索法は他の視覚的および計算的機械学習アルゴリズムと競合し、線形および非線形分類における解釈可能性と精度の両方を改善した。これらの拡張の主な利点は、ハイパーブロックから正確で高度に解釈可能なモデルやルールを構築する能力、モデルの解釈可能性の弱さを分析する能力、対話的で人間主導の視覚知識発見手法による専門家知識の入力などである。 Understanding black-box Machine Learning methods on multidimensional data is a key challenge in Machine Learning. While many powerful Machine Learning methods already exist, these methods are often unexplainable or perform poorly on complex data. This paper proposes visual knowledge discovery approaches based on several forms of lossless General Line Coordinates. These are an expansion of the previously introduced General Line Coordinates Linear and Dynamic Scaffolding Coordinates to produce, explain, and visualize non-linear classifiers with explanation rules. To ensure that these non-linear models and rules are accurate, new interactive visual knowledge discovery algorithms for finding worst-case validation splits were also developed for General Line Coordinates Linear. These expansions are General Line Coordinates non-linear, interactive rules linear, hyperblock rules linear, and worst-case linear. Experiments across multiple benchmark datasets show that this visual knowledge discovery method can compete with other visual and computational Machine Learning algorithms while improving both interpretability and accuracy in linear and non-linear classifications. 
The major benefits of these expansions include the ability to build accurate and highly interpretable models and rules from hyperblocks, the ability to analyze interpretability weaknesses in a model, and the input of expert knowledge through interactive and human-guided visual knowledge discovery methods.	翻訳日:2023-05-31 21:13:53 公開日:2023-05-28
# GRD:強化学習における解釈可能な再分配のための生成的アプローチ GRD: A Generative Approach for Interpretable Reward Redistribution in Reinforcement Learning ( http://arxiv.org/abs/2305.18427v1 ) ライセンス: Link先を確認	Yudi Zhang, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang, Meng Fang, Mykola Pechenizkiy	(参考訳) 強化学習における大きな課題は、将来の報酬にどの状態-作用ペアが責任を持つかを決定することである。 Return Decompositionは、ポリシーの不変性を保ちながら、観測されたシーケンスから報酬を再分配するソリューションを提供する。現在行われているほとんどのアプローチは、報酬の再分配を解釈不能な方法で構築するが、因果的観点から状態と行動の寄与を明示的にモデル化し、解釈可能な戻り分解をもたらす。本稿では,マルコフ報酬の生成と軌道回りの長期リターンを特徴付けることによる回帰分解における因果生成モデルの役割を考察し,遅延報酬シナリオにおける政策最適化のための生成回帰分解(grd)と呼ばれる枠組みを提案する。具体的には、GRDはまず、生成過程における観測不可能なマルコフ報酬と因果関係を識別する。そして、GRDは同定された因果生成モデルを用いて、エージェントの状態空間の最も好ましい部分空間上のポリシーを訓練するためのコンパクトな表現を形成する。理論的には、観測不能なマルコフ報酬関数は、基礎となる因果構造や因果モデルと同様に識別可能である。実験結果から,本手法は最先端の手法よりも優れており,その可視化によりさらに解釈性が示された。 A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Return Decomposition offers a solution by redistributing rewards from observed sequences while preserving policy invariance. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable return decomposition. In this paper, we start by studying the role of causal generative models in return decomposition by characterizing the generation of Markovian rewards and trajectory-wise long-term return and further propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed reward scenarios. Specifically, GRD first identifies the unobservable Markovian rewards and causal relations in the generative process. Then, GRD makes use of the identified causal generative model to form a compact representation to train policy over the most favorable subspace of the state space of the agent. 
Theoretically, we show that the unobservable Markovian reward function is identifiable, as well as the underlying causal structure and causal models. Experimental results show that our method outperforms state-of-the-art methods and the provided visualization further demonstrates the interpretability of our method.	翻訳日:2023-05-31 21:13:31 公開日:2023-05-28
# 付加製造試料の入力変数と引張強度の相関解析における説明可能な人工知能(XAI)手法の適用 Employing Explainable Artificial Intelligence (XAI) Methodologies to Analyze the Correlation between Input Variables and Tensile Strength in Additively Manufactured Samples ( http://arxiv.org/abs/2305.18426v1 ) ライセンス: Link先を確認	Akshansh Mishra, Vijaykumar S Jatti	(参考訳) 本研究では, インフィルパーセンテージ, 層高さ, 押出温度, 印刷速度などの入力パラメータが, 添加物製造による引張強度に及ぼす影響について検討した。本研究の目的は, 入力パラメータと引張強度の相関関係の理解を深めることと, 添加物製造プロセスの性能に影響を与える要因を明らかにすることである。この目的を達成するために,説明可能な人工知能(xai)技術を初めて活用し,データを分析し,システムの振る舞いに関する貴重な洞察を得ることができた。具体的には、機械学習モデル予測を解釈するための広く採用されているフレームワークであるSHAP(SHapley Additive exPlanations)を用いて、データに基づいてトレーニングされた機械学習モデルの振る舞いを説明する。その結果, インフィル率と押出温度は引張強度に最も大きな影響を与えるが, 層の高さや印刷速度の影響は比較的小さいことがわかった。さらに,入力パラメータと引張強度の関係は複雑で非線形であり,単純な線形モデルを用いて正確に記述することは困難であることがわかった。 This research paper explores the impact of various input parameters, including Infill percentage, Layer Height, Extrusion Temperature, and Print Speed, on the resulting Tensile Strength in objects produced through additive manufacturing. The main objective of this study is to enhance our understanding of the correlation between the input parameters and Tensile Strength, as well as to identify the key factors influencing the performance of the additive manufacturing process. To achieve this objective, we introduced the utilization of Explainable Artificial Intelligence (XAI) techniques for the first time, which allowed us to analyze the data and gain valuable insights into the system's behavior. Specifically, we employed SHAP (SHapley Additive exPlanations), a widely adopted framework for interpreting machine learning model predictions, to provide explanations for the behavior of a machine learning model trained on the data. Our findings reveal that the Infill percentage and Extrusion Temperature have the most significant influence on Tensile Strength, while the impact of Layer Height and Print Speed is relatively minor. 
Furthermore, we discovered that the relationship between the input parameters and Tensile Strength is highly intricate and nonlinear, making it difficult to accurately describe using simple linear models.	翻訳日:2023-05-31 21:13:07 公開日:2023-05-28
# 重み残差の低ランク近似による微調整モデルの効率的な保存 Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals ( http://arxiv.org/abs/2305.18425v1 ) ライセンス: Link先を確認	Simo Ryu, Seunghyun Seo, Jaejun Yoo	(参考訳) 本稿では, 重み残差の低ランク性を活かして, 微調整モデルを効率的に保存する手法を提案する。我々の重要な観察は、過パラメータ化された大規模モデルの重み残差がより一層強い低ランク性を示すことである。この知見に基づき, 重み残差を低ランク近似することで微調整モデルの重みを効率的に保存する新しい手法, Efficient Residual Encoding(ERE)を提案する。さらに, 重み残差のロバスト性を分析し, 追加の量子化と層ごとのランク割り当てを利用して, ストレージ効率の限界を押し広げる。実験の結果, 本手法は様々なタスクやモダリティにおいて性能を保ちながらメモリフットプリントを大幅に削減できることがわかった。コードを公開する。 In this paper, we present an efficient method for storing fine-tuned models by leveraging the low-rank properties of weight residuals. Our key observation is that weight residuals in large overparameterized models exhibit even stronger low-rank characteristics. Based on this insight, we propose Efficient Residual Encoding (ERE), a novel approach that achieves efficient storage of fine-tuned model weights by approximating the low-rank weight residuals. Furthermore, we analyze the robustness of weight residuals and push the limit of storage efficiency by utilizing additional quantization and layer-wise rank allocation. Our experimental results demonstrate that our method significantly reduces memory footprint while preserving performance in various tasks and modalities. We release our code.	翻訳日:2023-05-31 21:12:47 公開日:2023-05-28
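The core idea of storing only a low-rank approximation of the weight residual (fine-tuned weights minus base weights) can be sketched as follows. This is a minimal illustration that assumes a plain truncated SVD per weight matrix and a fixed, hand-picked rank. The helper names `encode_residual` and `decode_residual` are hypothetical, and the paper's additional quantization and layer-wise rank allocation are not shown.

```python
import numpy as np


def encode_residual(w_base: np.ndarray, w_ft: np.ndarray, rank: int):
    """Approximate the residual w_ft - w_base by two rank-`rank` factors (truncated SVD)."""
    residual = w_ft - w_base
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    # Keep only the top-`rank` singular triplets; fold singular values into the left factor.
    a = u[:, :rank] * s[:rank]  # shape (m, rank)
    b = vt[:rank, :]            # shape (rank, n)
    return a, b


def decode_residual(w_base: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Rebuild the fine-tuned weights from the shared base weights and the stored factors."""
    return w_base + a @ b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_base = rng.normal(size=(512, 512))
    # Toy "fine-tuning" update that is exactly rank 8 (an assumption for illustration).
    w_ft = w_base + 0.01 * (rng.normal(size=(512, 8)) @ rng.normal(size=(8, 512)))
    a, b = encode_residual(w_base, w_ft, rank=8)
    print("stored floats:", a.size + b.size, "vs full matrix:", w_ft.size)
    print("max abs error:", np.abs(decode_residual(w_base, a, b) - w_ft).max())
```

Only `a` and `b` need to be stored per fine-tuned model, since the base weights are shared. Real residuals are only approximately low-rank, so `rank` trades storage against reconstruction error.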
# 学習の精度到達時間を最小化するための繰り返しランダムサンプリング Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning ( http://arxiv.org/abs/2305.18424v1 ) ライセンス: Link先を確認	Patrik Okanovic, Roger Waleffe, Vasilis Mageirakos, Konstantinos E. Nikolakakis, Amin Karbasi, Dionysis Kalogerias, Nezihe Merve Gürel, Theodoros Rekatsinas	(参考訳) 学習に用いる少量の訓練データを慎重に選択または生成する手法, すなわちデータプルーニング、コアセット選択、データ蒸留は、増大し続けるニューラルネットワークの訓練コストを削減するのに有効であることが示されている。この成功の背後には、大規模なデータセットから有益な訓練例を特定するために厳密に設計された戦略がある。しかし、これらの戦略は訓練開始前のサブセット選択やデータ蒸留に伴う追加の計算コストを要し、さらに、高圧縮領域では多くの手法がランダムサンプリングにさえ劣ることが示されている。そのため、多くのデータプルーニング、コアセット選択、蒸留の手法は、大規模データセット上でディープニューラルネットワークを訓練する際の重要な効率指標となっている「精度到達時間(time-to-accuracy)」を削減できない可能性がある。本研究では, これらの課題に対処するために, 強力でありながら見過ごされてきたランダムサンプリング戦略を再検討し, モデル訓練のエポックごとに訓練データのサブセットをランダムにサンプリングする手法, Repeated Sampling of Random Subsets(RSRSまたはRS2)を導入する。ImageNetを含む4つのデータセットにおいて、30の最先端のデータプルーニングおよびデータ蒸留手法とRS2を比較した。その結果, RS2は既存の手法に比べて精度到達時間を大幅に短縮することがわかった。例えば、高圧縮領域(各エポックでデータセットの10%未満を使用)でImageNetを訓練すると、RS2は競合するプルーニング手法と比較して最大29%の精度向上を達成し、実行時間を7分の1に削減する。上記のメタスタディに加えて、RS2の収束解析を行い、その汎化能力について論じる。本研究の主な目標は、効率的な訓練を目的とした将来のデータ選択や蒸留手法に対する競合ベースラインとしてRS2を確立することである。 Methods for carefully selecting or generating a small set of training data to learn from, i.e., data pruning, coreset selection, and data distillation, have been shown to be effective in reducing the ever-increasing cost of training neural networks. Behind this success are rigorously designed strategies for identifying informative training examples out of large datasets. However, these strategies come with additional computational costs associated with subset selection or data distillation before training begins, and furthermore, many have been shown to underperform even random sampling in high data compression regimes. As such, many data pruning, coreset selection, or distillation methods may not reduce 'time-to-accuracy', which has become a critical efficiency measure of training deep neural networks over large datasets. 
In this work, we revisit a powerful yet overlooked random sampling strategy to address these challenges and introduce an approach called Repeated Sampling of Random Subsets (RSRS or RS2), where we randomly sample the subset of training data for each epoch of model training. We test RS2 against thirty state-of-the-art data pruning and data distillation methods across four datasets including ImageNet. Our results demonstrate that RS2 significantly reduces time-to-accuracy compared to existing techniques. For example, when training on ImageNet in the high-compression regime (using less than 10% of the dataset each epoch), RS2 yields accuracy improvements up to 29% compared to competing pruning methods while offering a runtime reduction of 7x. Beyond the above meta-study, we provide a convergence analysis for RS2 and discuss its generalization capability. The primary goal of our work is to establish RS2 as a competitive baseline for future data selection or distillation techniques aimed at efficient training.	翻訳日:2023-05-31 21:12:33 公開日:2023-05-28
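The per-epoch resampling that distinguishes RS2 from a one-off pruned subset can be sketched in a few lines. This sketch assumes each epoch draws its subset independently and uniformly (without replacement within the epoch); the function name `rs2_epoch_indices` is hypothetical, and the paper's exact sampling variant may differ. Note that no scoring or selection model is needed, so there is no up-front selection cost.

```python
import random
from typing import Iterator, List


def rs2_epoch_indices(dataset_size: int, fraction: float, num_epochs: int,
                      seed: int = 0) -> Iterator[List[int]]:
    """Yield a fresh random subset of example indices for every training epoch."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    subset_size = max(1, int(fraction * dataset_size))
    for _ in range(num_epochs):
        # Unlike a static pruned subset, a new subset is drawn every epoch,
        # so across many epochs the model can visit most of the full dataset.
        yield rng.sample(range(dataset_size), subset_size)


if __name__ == "__main__":
    for epoch, idx in enumerate(rs2_epoch_indices(dataset_size=50_000, fraction=0.1, num_epochs=3)):
        print(epoch, len(idx), idx[:5])
```

In a training loop, each yielded index list would feed the data loader for that epoch, so per-epoch cost scales with `fraction` while coverage of the full dataset grows over epochs.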
# リカレントニューラルネットワーク学習のサンプル複雑性におけるノイズの役割について--長い系列に対する指数的ギャップ On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences ( http://arxiv.org/abs/2305.18423v1 ) ライセンス: Link先を確認	Alireza Fathollah Pour and Hassan Ashtiani	(参考訳) 我々は, 長さ$T$の系列を分類するための, $w$個の(非有界な)重みを持つノイズ付き多層シグモイド型リカレントニューラルネットワークのクラスを考える。ここでネットワーク内の各ニューロンの出力には$\mathcal{N}(0,\sigma^2)$に従う独立なノイズが加えられる。主な結果として、このクラスのPAC学習のサンプル複雑性が$O(w\log(T/\sigma))$で抑えられることを示す。同じクラスのノイズなし版(すなわち$\sigma=0$)に対しては、サンプル複雑性の下界$\Omega(wT)$を証明する。以上の結果は、ノイズありネットワークとノイズなしネットワークとで、サンプル複雑性の$T$への依存性に指数関数的な差があることを示している。さらに、上界の$1/\sigma$への依存が緩やかな対数的依存であることから、このギャップは数値的に無視できるほど小さな$\sigma$の値でも維持される。 We consider the class of noisy multi-layered sigmoid recurrent neural networks with $w$ (unbounded) weights for classification of sequences of length $T$, where independent noise distributed according to $\mathcal{N}(0,\sigma^2)$ is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by $O (w\log(T/\sigma))$. For the non-noisy version of the same class (i.e., $\sigma=0$), we prove a lower bound of $\Omega (wT)$ for the sample complexity. Our results indicate an exponential gap in the dependence of sample complexity on $T$ for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on $1/\sigma$, this gap still holds even for numerically negligible values of $\sigma$.	翻訳日:2023-05-31 21:12:00 公開日:2023-05-28
# 解釈可能な機械学習モデル発見のための並列座標 Parallel Coordinates for Discovery of Interpretable Machine Learning Models ( http://arxiv.org/abs/2305.18434v1 ) ライセンス: Link先を確認	Dustin Hayes, Boris Kovalerchuk	(参考訳) 本研究は、並列座標における視覚的知識発見を用いて、解釈可能な機械学習の手法を前進させる。並列座標によるグラフィカルなデータ表現により、ハイパーキューブとハイパーブロック(HB)の概念がエンドユーザにとって理解しやすくなった。提案するデータ分類アルゴリズムHyperでは、混合ハイパーブロックと純粋ハイパーブロックを用いることを提案する。Hyperのモデルは決定木を一般化することが示される。アルゴリズムはいくつかの設定とオプションで提示され、重なりのある、または重なりのないハイパーブロックを対話的または自動的に発見する。さらに、視覚パターンの言語記述と組み合わせたハイパーブロックの利用を示す。Hyperアルゴリズムの評価にはUCI MLリポジトリのベンチマークデータを用いた。これにより、10分割交差検証で評価された混合HBおよび純粋HBを発見できた。ハイパーブロック、次元削減、可視化の間の関係が確立された。エンドユーザがハイパーブロックを見つけて観察できること、そして並べて表示する可視化によってパターンが明確になることは、ハイパーブロック技術とHyperアルゴリズムの大きな利点である。従来の並列座標ではサポートされていない、欠損値を含む不完全なn次元データを可視化する新しい手法を提案する。決定木と比べて、データの過度な一般化と過適合の両方をよりよく防止できることも、ハイパーブロックの利点として示される。Hyper技術を実装したソフトウェアツールVisCanvas 2.0の特徴を紹介する。 This work uses visual knowledge discovery in parallel coordinates to advance methods of interpretable machine learning. The graphic data representation in parallel coordinates made the concepts of hypercubes and hyperblocks (HBs) simple to understand for end users. It is suggested to use mixed and pure hyperblocks in the proposed data classifier algorithm Hyper. It is shown that Hyper models generalize decision trees. The algorithm is presented in several settings and options to discover interactively or automatically overlapping or non-overlapping hyperblocks. Additionally, the use of hyperblocks in conjunction with language descriptions of visual patterns is demonstrated. The benchmark data from the UCI ML repository were used to evaluate the Hyper algorithm. It enabled the discovery of mixed and pure HBs evaluated using 10-fold cross validation. Connections among hyperblocks, dimension reduction and visualization have been established. 
The capability of end users to find and observe hyperblocks, as well as the ability of side-by-side visualizations to make patterns evident, are among the major advantages of hyperblock technology and the Hyper algorithm. A new method is proposed to visualize incomplete n-D data with missing values, which traditional parallel coordinates do not support. The ability of HBs to prevent both overgeneralization and overfitting of data better than decision trees is demonstrated as another benefit of hyperblocks. The features of the VisCanvas 2.0 software tool, which implements the Hyper technology, are presented.	翻訳日:2023-05-31 21:03:07 公開日:2023-05-28
# Key-Value Transformer Key-Value Transformer ( http://arxiv.org/abs/2305.19129v1 ) ライセンス: Link先を確認	Ali Borji	(参考訳) トランスフォーマーは、コンピュータビジョンや自然言語処理など、さまざまなAIタスクにおける標準的な解法として広く用いられるようになった。その際、広く採用されているクエリ・キー・バリューの定式化(QKV)が重要な役割を果たしてきた。しかし、これら3つの要素がトランスフォーマーの性能にとって本当に不可欠かどうかを調べた研究はない。そこで我々は、対称な注意マップを生成するキー・バリュー定式化(KV)と、注意行列に2次元位置符号化を組み込んだ非対称版を評価した。注目すべきことに、このトランスフォーマーは元のものよりも少ないパラメータと計算量で済む。合成タスク(リストの反転やソートなど)、視覚(MNISTやCIFARの分類)、NLP(文字生成と翻訳)の3種類のタスクを含む実験を通して、KVトランスフォーマーが時にQKVトランスフォーマーを上回ることがわかった。しかし、QKVより性能が劣る場合も見られ、決定的な結論を出すことは難しい。それでも我々は、報告した結果は有望であり、将来のより効率的なトランスフォーマーへの道を開く可能性があると考えている。 Transformers have emerged as the prevailing standard solution for various AI tasks, including computer vision and natural language processing. The widely adopted Query, Key, and Value formulation (QKV) has played a significant role in this. Nevertheless, no research has examined whether all three components are essential for transformer performance. Therefore, we conducted an evaluation of the key-value formulation (KV), which generates symmetric attention maps, along with an asymmetric version that incorporates a 2D positional encoding into the attention matrix. Remarkably, this transformer requires fewer parameters and computation than the original one. Through experiments encompassing three task types -- synthetics (such as reversing or sorting a list), vision (MNIST or CIFAR classification), and NLP (character generation and translation) -- we discovered that the KV transformer occasionally outperforms the QKV transformer. However, it also exhibits instances of underperformance compared to QKV, making it challenging to draw a definitive conclusion. Nonetheless, we consider the reported results to be encouraging and anticipate that they may pave the way for more efficient transformers in the future.	翻訳日:2023-05-31 15:36:51 公開日:2023-05-28
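A single-head sketch of the key-value formulation shows where the parameter savings come from: the separate query projection is dropped and keys are compared with keys, so the pre-softmax score matrix is symmetric by construction. This is an illustrative NumPy sketch under that assumption, with hypothetical function names. Multi-head attention, masking, and the asymmetric variant with a 2D positional encoding are omitted.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def kv_attention(x: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Attention without a query projection: keys are used in place of queries."""
    k = x @ w_k                              # (seq_len, d_k)
    v = x @ w_v                              # (seq_len, d_v)
    scores = k @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len), symmetric
    weights = softmax(scores, axis=-1)       # row-normalized attention map
    return weights @ v, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 16))
    out, scores = kv_attention(x, rng.normal(size=(16, 8)), rng.normal(size=(16, 8)))
    print(out.shape, np.allclose(scores, scores.T))
```

Compared with standard QKV attention, this removes one `d_model x d_k` projection per head. Because softmax normalizes each row separately, the final attention weights are generally not exactly symmetric even though the raw scores are.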
# 文脈内学習におけるラベルバイアスの軽減 Mitigating Label Biases for In-context Learning ( http://arxiv.org/abs/2305.19148v1 ) ライセンス: Link先を確認	Yu Fei, Yifan Hou, Zeming Chen, Antoine Bosselut	(参考訳) インコンテキスト学習(ICL)の様々な設計設定、例えばインコンテキスト例の選択や順序は、モデルの予測にバイアスをもたらしうる。多くの研究がこれらの設計選択について論じているが、それらを分類し、その影響を緩和する体系的な調査はほとんど行われていない。本研究では、テキスト分類におけるICLのラベルバイアスについて、バニララベルバイアス、コンテキストラベルバイアス、ドメインラベルバイアス(我々が初めて概念化し検出したもの)の3種類からなる類型を定義する。分析の結果、既存のラベルバイアス校正法では3種類すべてのバイアスに対処できないことがわかった。特に、ドメインラベルバイアスは、インコンテキスト例の選択によらず、多くのタスクでLLMの性能をランダムレベルにとどめてしまう。これらのバイアスの影響を緩和するために、タスクコーパスからランダムに選んだドメイン内単語を用いて言語モデルのラベルバイアスを推定する、簡易なバイアス校正法を提案する。予測時にこの推定バイアスを補正することで、我々の新しいドメインコンテキスト校正は、幅広いタスクにおいてGPT-JとGPT-3のICL性能を大幅に向上させる。改善幅はドメインラベルバイアスが大きいタスクで特に大きい(マクロF1で最大37%)。さらに、この結果はスケール、事前学習手法、人手で設計したタスク指示が異なるモデルにも一般化し、ICLにおけるラベルバイアスが広く存在することを示している。 Various design settings for in-context learning (ICL), such as the choice and order of the in-context examples, can bias the model's predictions. While many studies discuss these design choices, there have been few systematic investigations into categorizing them and mitigating their impact. In this work, we define a typology for three types of label biases in ICL for text classification: vanilla-label bias, context-label bias, and domain-label bias (which we conceptualize and detect for the first time). Our analysis demonstrates that prior label bias calibration methods fall short of addressing all three types of biases. Specifically, domain-label bias restricts LLMs to random-level performance on many tasks regardless of the choice of in-context examples. To mitigate the effect of these biases, we propose a simple bias calibration method that estimates a language model's label bias using random in-domain words from the task corpus. After controlling for this estimated bias when making predictions, our novel domain-context calibration significantly improves the ICL performance of GPT-J and GPT-3 on a wide range of tasks. 
The gain is substantial on tasks with large domain-label bias (up to 37% in Macro-F1). Furthermore, our results generalize to models with different scales, pretraining methods, and manually-designed task instructions, showing the prevalence of label biases in ICL.	翻訳日:2023-05-31 15:25:34 公開日:2023-05-28
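The calibration step described above can be sketched as follows: build several pseudo-inputs from random in-domain words, average the model's label probabilities on them as a bias estimate, then divide each test prediction by that estimate and renormalize. This is a minimal sketch that assumes this divide-and-renormalize rule (in the style of contextual calibration). `label_probs` stands in for a hypothetical function that queries the language model with the in-context prompt and returns one probability per label, and the sample count and pseudo-input length are arbitrary choices.

```python
import random
from typing import Callable, List, Sequence


def estimate_domain_bias(corpus_words: Sequence[str],
                         label_probs: Callable[[str], Sequence[float]],
                         num_samples: int = 20, length: int = 10,
                         seed: int = 0) -> List[float]:
    """Average label probabilities over pseudo-inputs made of random in-domain words."""
    rng = random.Random(seed)
    totals: List[float] = []
    for _ in range(num_samples):
        text = " ".join(rng.choices(corpus_words, k=length))
        probs = label_probs(text)
        totals = list(probs) if not totals else [t + p for t, p in zip(totals, probs)]
    return [t / num_samples for t in totals]


def calibrate(probs: Sequence[float], bias: Sequence[float]) -> List[float]:
    """Divide out the estimated bias and renormalize to a probability distribution."""
    scores = [p / b for p, b in zip(probs, bias)]
    z = sum(scores)
    return [s / z for s in scores]


if __name__ == "__main__":
    def biased_model(text: str) -> List[float]:
        # Toy stand-in for an LLM that favors label 0 on any in-domain text.
        return [0.7, 0.3]

    bias = estimate_domain_bias(["stock", "market", "rally", "shares"], biased_model)
    print(calibrate([0.6, 0.4], bias))  # label 1 wins once the bias is divided out
```

Using in-domain words rather than a single content-free token is what lets the estimate capture the domain-label bias that the abstract highlights.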
# 無限次元におけるベイズ推論のための条件付きスコアベース拡散モデル Conditional score-based diffusion models for Bayesian inference in infinite dimensions ( http://arxiv.org/abs/2305.19147v1 ) ライセンス: Link先を確認	Lorenzo Baldassari, Ali Siahkoohi, Josselin Garnier, Knut Solna, Maarten V. de Hoop	(参考訳) スコアベース拡散モデル(SDM)は、最初に導入されて以来、事後分布を効率的に近似できることから、有限次元ベクトル空間における様々な線形逆問題の解法に適用され成功を収めてきた。しかし、無限次元関数空間における逆問題へのSDMの利用は最近になって扱われ始めたばかりで、それも無条件スコアの学習によるものである。このアプローチにはいくつかの利点があるが、対象とする逆問題によっては、条件付き分布からサンプリングするために観測データの情報を近接最適化ステップで取り込む必要があり、最適化問題を何度も解かなければならない。これは、順問題の作用素の計算コストが高い逆問題では現実的でない場合がある。これらの制約に対処するため、本研究では、償却型の条件付きSDMを用いて、無限次元ベイズ線形逆問題における事後分布を学習する手法を提案する。特に、条件付きデノイジング推定量が、無限次元において条件付きスコアの一致推定量であることを証明する。SDMを条件付き設定に拡張するにはある程度の注意が必要であることを示す。これは、無条件スコアとは異なり、条件付きスコアは一般に小さい時刻で発散するためである。また、観測の摂動に対する学習分布のロバスト性についても論じる。最後に、本手法を検証し、さらなる知見を与える数値例を示す。 Since their first introduction, score-based diffusion models (SDMs) have been successfully applied to solve a variety of linear inverse problems in finite-dimensional vector spaces due to their ability to efficiently approximate the posterior distribution. However, using SDMs for inverse problems in infinite-dimensional function spaces has only been addressed recently and by learning the unconditional score. While this approach has some advantages, depending on the specific inverse problem at hand, in order to sample from the conditional distribution it needs to incorporate the information from the observed data with a proximal optimization step, solving an optimization problem numerous times. This may not be feasible in inverse problems with computationally costly forward operators. To address these limitations, in this work we propose a method to learn the posterior distribution in infinite-dimensional Bayesian linear inverse problems using amortized conditional SDMs. In particular, we prove that the conditional denoising estimator is a consistent estimator of the conditional score in infinite dimensions. 
We show that the extension of SDMs to the conditional setting requires some care because the conditional score typically blows up for small times, unlike the unconditional score. We also discuss the robustness of the learned distribution against perturbations of the observations. We conclude by presenting numerical examples that validate our approach and provide additional insights.	翻訳日:2023-05-31 15:25:10 公開日:2023-05-28
# ASU-CNN:画像分類と特徴可視化のための効率的なディープアーキテクチャ ASU-CNN: An Efficient Deep Architecture for Image Classification and Feature Visualizations ( http://arxiv.org/abs/2305.19146v1 ) ライセンス: Link先を確認	Jamshaid Ul Rahman, Faiza Makhdoom, Dianchen Lu	(参考訳) 活性化関数は、ニューラルネットワークが入力データに内在する非線形性を捉えることを可能にするため、ディープニューラルネットワークの能力を決定する上で決定的な役割を果たす。活性化関数に関する従来の研究は主に単調または非振動的な関数の有用性に焦点を当てていたが、Growing Cosine Unitが多くの応用でそのタブーを破った。本稿では、最近設計された活性化関数ASUを全層で利用する畳み込みニューラルネットワークモデルASU-CNNを提案する。この非単調かつ振動的な関数の効果を、異なる畳み込み層の特徴マップの可視化を通して検証する。提案ネットワークの最適化には、学習率を微調整したAdamを用いる。ネットワークはCIFAR-10の分類において、訓練データとテストデータの両方で有望な結果を得た。実験結果は、コンピュータビジョン分野のタスクを実行するための提案モデルの計算上の実現可能性と有効性を裏付けている。 Activation functions play a decisive role in determining the capacity of Deep Neural Networks as they enable neural networks to capture inherent nonlinearities present in data fed to them. Prior research on activation functions primarily focused on the utility of monotonic or non-oscillatory functions, until the Growing Cosine Unit broke this taboo for a number of applications. In this paper, a Convolutional Neural Network model named ASU-CNN is proposed, which utilizes the recently designed activation function ASU across its layers. The effect of this non-monotonic and oscillatory function is inspected through feature map visualizations from different convolutional layers. The proposed network is optimized with Adam using a fine-tuned learning rate. The network achieved promising results on both training and testing data for the classification of CIFAR-10. The experimental results affirm the computational feasibility and efficacy of the proposed model for performing tasks related to the field of computer vision.	翻訳日:2023-05-31 15:24:51 公開日:2023-05-28
# オンライン学習の現代的紹介 A Modern Introduction to Online Learning ( http://arxiv.org/abs/1912.13213v6 ) ライセンス: Link先を確認	Francesco Orabona	(参考訳) 本書では、オンライン凸最適化の現代的な視点を通して、オンライン学習の基本概念を紹介する。ここでオンライン学習とは、最悪ケースの仮定の下でのリグレット最小化の枠組みを指す。ユークリッドおよび非ユークリッドの設定において、凸損失を伴うオンライン学習のための1次および2次アルゴリズムを示す。すべてのアルゴリズムは、オンラインミラー降下法(Online Mirror Descent)またはFollow-The-Regularized-Leaderとその変種のインスタンスとして明確に提示される。特に、適応型およびパラメータフリーのオンライン学習アルゴリズムを通じて、アルゴリズムのパラメータのチューニングと非有界領域での学習の問題に注目する。非凸損失は、凸な代理損失とランダム化によって扱う。バンディット設定についても簡単に議論し、敵対的および確率的多腕バンディット問題に触れる。これらのノートは凸解析の予備知識を必要とせず、必要な数学的ツールはすべて厳密に説明されている。さらに、含まれるすべての証明は、可能な限り単純かつ短くなるよう慎重に選ばれている。 In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiations of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt with through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the included proofs have been carefully chosen to be as simple and as short as possible.	翻訳日:2023-05-31 05:33:17 公開日:2023-05-28
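As a concrete member of the Online Mirror Descent family, projected online gradient descent is Online Mirror Descent with the squared Euclidean norm as the regularizer. The sketch below plays a point, receives the gradient of the revealed loss at that point, and takes a projected step. The function names and the L2-ball feasible set are assumptions chosen for illustration; the regret helper is specific to linear losses on that ball.

```python
from typing import Callable, List, Sequence

import numpy as np


def project_onto_ball(x: np.ndarray, radius: float) -> np.ndarray:
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def online_gradient_descent(loss_grads: Sequence[Callable[[np.ndarray], np.ndarray]],
                            dim: int, radius: float, lr: float) -> List[np.ndarray]:
    """Projected OGD: play x_t, receive the gradient of loss t at x_t, then update."""
    x = np.zeros(dim)
    plays = []
    for grad in loss_grads:
        plays.append(x.copy())
        x = project_onto_ball(x - lr * grad(x), radius)
    return plays


def regret_linear(plays: List[np.ndarray], grads: List[np.ndarray], radius: float) -> float:
    """Regret against the best fixed point in the ball for linear losses <g_t, x>."""
    cumulative = sum(float(g @ x) for g, x in zip(grads, plays))
    g_sum = np.sum(np.stack(grads), axis=0)
    best = -radius * float(np.linalg.norm(g_sum))  # minimizer is -radius * g_sum / ||g_sum||
    return cumulative - best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gs = [rng.normal(size=3) for _ in range(100)]
    plays = online_gradient_descent([lambda x, g=g: g for g in gs], dim=3, radius=1.0, lr=0.1)
    print("regret after 100 rounds:", regret_linear(plays, gs, radius=1.0))
```

Swapping the squared Euclidean norm for another strongly convex regularizer (and its Bregman projection) gives other OMD instances, such as exponentiated gradient on the simplex.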
# FairCanary: 迅速な継続的説明可能なフェアネス FairCanary: Rapid Continuous Explainable Fairness ( http://arxiv.org/abs/2106.07057v4 ) ライセンス: Link先を確認	Avijit Ghosh, Aalok Shanbhag, Christo Wilson	(参考訳) 継続的なモデル監視を提供するシステムは、(1)デプロイされた機械学習(ML)および人工知能(AI)モデルの失敗が数多く報告されていること、(2)これらのモデルに影響を与える新たな規制要件、に対応して登場した。既存の監視システムは、デプロイされたMLモデルの性能を継続的に追跡し、各予測に対する特徴量の重要度(説明)を計算して、開発者が新たに生じたモデル性能問題の根本原因を特定するのを支援する。本稿では、分位数ビニングを用いてサブグループ間の予測分布全体の差を測定する新しいモデルバイアス定量化指標、Quantile Demographic Drift(QDD)を提案する。QDDは継続的な監視シナリオに最適であり、従来のしきい値ベースのバイアス指標の統計的な限界に悩まされず、(実行時には利用できない場合がある)結果ラベルを必要としない。QDDをFairCanaryと呼ばれる継続的モデル監視システムに組み込み、個々の予測ごとに計算済みの説明を再利用して、QDDバイアス指標の説明を高速に計算する。この最適化により、FairCanaryは、特徴量レベルのバイアス説明の生成を試みた先行研究よりも1桁高速になる。 Systems that offer continuous model monitoring have emerged in response to (1) well-documented failures of deployed Machine Learning (ML) and Artificial Intelligence (AI) models and (2) new regulatory requirements impacting these models. Existing monitoring systems continuously track the performance of deployed ML models and compute feature importance (a.k.a. explanations) for each prediction to help developers identify the root causes of emergent model performance problems. We present Quantile Demographic Drift (QDD), a novel model bias quantification metric that uses quantile binning to measure differences in the overall prediction distributions over subgroups. QDD is ideal for continuous monitoring scenarios, does not suffer from the statistical limitations of conventional threshold-based bias metrics, and does not require outcome labels (which may not be available at runtime). We incorporate QDD into a continuous model monitoring system, called FairCanary, that reuses existing explanations computed for each individual prediction to quickly compute explanations for the QDD bias metrics. This optimization makes FairCanary an order of magnitude faster than previous work that has tried to generate feature-level bias explanations.	翻訳日:2023-05-31 05:00:11 公開日:2023-05-28
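One plausible reading of using quantile binning to compare two subgroups' prediction distributions is sketched below: sort each group's model scores, split them into the same number of equal-count quantile bins, and compare bin-wise mean scores. This is an illustration under that assumption, not FairCanary's exact definition of QDD, and the function names are hypothetical. Like QDD as described, it needs only model outputs, not outcome labels.

```python
import numpy as np


def quantile_bin_means(scores, n_bins: int) -> np.ndarray:
    """Sort scores and return the mean score of each equal-count quantile bin."""
    s = np.sort(np.asarray(scores, dtype=float))
    if s.size < n_bins:
        raise ValueError("need at least n_bins scores per group")
    return np.array([chunk.mean() for chunk in np.array_split(s, n_bins)])


def quantile_drift(scores_group_a, scores_group_b, n_bins: int = 10) -> np.ndarray:
    """Bin-wise difference in mean predictions between two subgroups (no labels needed)."""
    return quantile_bin_means(scores_group_a, n_bins) - quantile_bin_means(scores_group_b, n_bins)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group_a = rng.beta(2, 5, size=5_000)         # toy model scores for subgroup A
    group_b = rng.beta(2, 5, size=3_000) + 0.05  # subgroup B, shifted upward
    print(np.round(quantile_drift(group_a, group_b, n_bins=5), 3))
```

Comparing whole binned distributions, rather than positive rates at one decision threshold, is what avoids depending on a particular threshold; in a monitoring setting this would be recomputed over a sliding window of recent predictions.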
# 2層ワイドニューラルネットワークによる平均二乗誤差回帰に対する勾配降下法の暗黙的バイアス Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks ( http://arxiv.org/abs/2006.07356v5 ) ライセンス: Link先を確認	Hui Jin, Guido Montúfar	(参考訳) 幅の広いニューラルネットワークの勾配降下法による訓練と、それに対応する関数空間における暗黙的バイアスについて検討する。単変量回帰の場合、幅$n$の浅いReLUネットワークを訓練して得られる解は、訓練データに適合し、かつ初期関数との差の二階導関数の2-ノルム(ネットワークパラメータの初期化に用いる確率分布に依存する曲率ペナルティで重み付けたもの)が最小となる関数から$n^{-1/2}$以内にあることを示す。様々な一般的な初期化手順について、曲率ペナルティ関数を明示的に計算する。例えば、一様分布を用いた非対称初期化は一定の曲率ペナルティをもたらし、したがって解関数は訓練データの自然3次スプライン補間となる。確率的勾配降下法でも同じ暗黙的バイアスの結果が得られる。異なる活性化関数に対しても同様の結果が得られる。多変量回帰に対しては類似の結果を示し、そこでは二階導関数が分数ラプラシアンのラドン変換に置き換えられる。一定のペナルティ関数をもたらす初期化スキームに対しては、解は多重調和スプラインとなる。さらに、訓練の軌跡は、正則化強度を減少させていく平滑化スプラインの軌跡によって捉えられることを示す。 We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{-1/2}$ of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution that is used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolation of the training data. For stochastic gradient descent we obtain the same implicit bias result. We obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. 
Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.	翻訳日:2023-05-31 04:56:34 公開日:2023-05-28
# 量子安全な非可鍛抽出器 Quantum secure non-malleable-extractors ( http://arxiv.org/abs/2109.03097v4 ) ライセンス: Link先を確認	Naresh Goud Boddu, Rahul Jain, Upendra Kapshikar	(参考訳) いくつかの明示的な量子安全な非可鍛抽出器を構成する。我々が構成する量子安全な非可鍛抽出器はすべて、Chattopadhyay, Goyal, Li [2015] および Cohen [2015] による構成に基づいている。 1) (ソースの)最小エントロピー $k \geq \textsf{poly}\left(\log \left( \frac{n}{\epsilon} \right)\right)$ ($n$ はソースの長さ、$\epsilon$ は誤差パラメータ)に対する、初の明示的な量子安全な非可鍛抽出器を構成する。以前に Aggarwal, Chung, Lin, Vidick [2019] は、Li [2012] が提案した内積ベースの非可鍛抽出器が量子安全であることを示したが、それは($n$ に関して)線形の最小エントロピーとシード長を必要とした。非可鍛抽出器とプライバシ増幅との関係(量子設定では Cohen と Vidick [2017] により初めて確立された)を用いて、通信量 $\textsf{poly}\left(\log \left( \frac{n}{\epsilon} \right)\right)$ で能動的な量子敵対者に対して安全な $2$ ラウンドのプライバシ増幅プロトコルを得る。これは [2019] のプロトコルが必要とする線形の通信量を指数関数的に改善するものである。 2) 最小エントロピー $k \geq n- n^{\Omega(1)}$ に対して、出力長 $n^{\Omega(1)}$、誤差 $2^{- n^{\Omega(1)}}$ の明示的な量子安全な $2$ ソース非可鍛抽出器を構成する。 3) 入力の改ざんが $t$ 回行われる場合の自然な拡張についても検討する。シード付きの場合($t=d^{\Omega(1)}$)と $2$ ソースの場合($t=n^{\Omega(1)}$)の両方について、明示的な量子安全な $t$-非可鍛抽出器を構成する。 We construct several explicit quantum secure non-malleable-extractors. All the quantum secure non-malleable-extractors we construct are based on the constructions by Chattopadhyay, Goyal and Li [2015] and Cohen [2015]. 1) We construct the first explicit quantum secure non-malleable-extractor for (source) min-entropy $k \geq \textsf{poly}\left(\log \left( \frac{n}{\epsilon} \right)\right)$ ($n$ is the length of the source and $\epsilon$ is the error parameter). Previously, Aggarwal, Chung, Lin, and Vidick [2019] showed that the inner-product based non-malleable-extractor proposed by Li [2012] is quantum secure; however, it requires linear (in $n$) min-entropy and seed length. 
Using the connection between non-malleable-extractors and privacy amplification (established first in the quantum setting by Cohen and Vidick [2017]), we get a $2$-round privacy amplification protocol that is secure against active quantum adversaries with communication $\textsf{poly}\left(\log \left( \frac{n}{\epsilon} \right)\right)$, exponentially improving upon the linear communication required by the protocol due to [2019]. 2) We construct an explicit quantum secure $2$-source non-malleable-extractor for min-entropy $k \geq n- n^{\Omega(1)}$, with an output of size $n^{\Omega(1)}$ and error $2^{- n^{\Omega(1)}}$. 3) We also study their natural extensions when the tampering of the inputs is performed $t$-times. We construct explicit quantum secure $t$-non-malleable-extractors for both seeded ($t=d^{\Omega(1)}$) as well as $2$-source case ($t=n^{\Omega(1)}$).	翻訳日:2023-05-31 04:48:40 公開日:2023-05-28
# pvCNN:プライバシ保護と検証が可能な畳み込みニューラルネットワークテスト pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing ( http://arxiv.org/abs/2201.09186v3 ) ライセンス: Link先を確認	Jiasi Weng and Jian Weng and Gui Tang and Anjia Yang and Ming Li and Jia-Nan Liu	(参考訳) 本稿では, プライバシ保護と検証が可能な畳み込みニューラルネットワーク(CNN)テストのための新しいアプローチを提案する。これにより, CNNモデル開発者は, モデルのプライバシを守りつつ, 複数のテスターが持つ非公開データ上での真のCNN性能をユーザに納得させることができる。セキュリティと効率の両立を図るため, 準同型暗号(HE)とゼロ知識簡潔非対話型知識論証(zk-SNARK)のプリミティブをCNNテストと適切に統合することで, 3つの新たな取り組みを行う。第一に, テスト対象のCNNモデルを, モデル開発者がローカルに保持するプライベート部分と, 外部サーバにアウトソースされるパブリック部分に戦略的に分割する。そして, プライベート部分はテスターが送信したHEで保護されたテストデータ上で動作し, その出力をパブリック部分へ送信して, CNNテストの後続の計算を行わせる。第二に, 上記のCNNテストの正しさは, zk-SNARKベースの証明を生成することで保証される。その際, 証明生成における性能上のボトルネックとなる2次元(2-D)畳み込み演算の証明オーバーヘッドの最適化に重点を置く。具体的には, 複数のフィルタと入力の間の2次元畳み込み演算をバッチ方式で表現する, 乗算ゲートを1つだけ持つ二次行列プログラム(QMPs)ベースの新しい算術回路を提案する。第三に, 同一のCNNモデルに関する, テスターごとに異なるテストデータ(すなわち異なるステートメント)に対する複数の証明を1つの証明に集約し, 集約された証明の妥当性が元の複数の証明の妥当性を含意することを保証する。最後に, 実験結果により, 高次元の行列乗算において, 我々のQMPsベースのzk-SNARKが既存のQAPsベースのzk-SNARKに比べて証明時間で約13.9倍, Setup時間で17.6倍高速であることを示す。 This paper proposes a new approach for privacy-preserving and verifiable convolutional neural network (CNN) testing, enabling a CNN model developer to convince a user of the truthful CNN performance over non-public data from multiple testers, while respecting model privacy. To balance the security and efficiency issues, three new efforts are made by appropriately integrating homomorphic encryption (HE) and zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) primitives with the CNN testing. First, a CNN model to be tested is strategically partitioned into a private part kept locally by the model developer, and a public part outsourced to an outside server. Then, the private part runs over HE-protected test data sent by a tester and transmits its outputs to the public part for accomplishing subsequent computations of the CNN testing. 
Second, the correctness of the above CNN testing is enforced by generating zk-SNARK based proofs, with an emphasis on optimizing proving overhead for two-dimensional (2-D) convolution operations, since these operations are the main performance bottleneck when generating proofs. We specifically present a new quadratic matrix programs (QMPs)-based arithmetic circuit with a single multiplication gate for expressing 2-D convolution operations between multiple filters and inputs in a batch manner. Third, we aggregate multiple proofs with respect to the same CNN model but different testers' test data (i.e., different statements) into one proof, and ensure that the validity of the aggregated proof implies the validity of the original multiple proofs. Lastly, our experimental results demonstrate that our QMPs-based zk-SNARK performs nearly 13.9$\times$ faster than the existing QAPs-based zk-SNARK in proving time, and 17.6$\times$ faster in Setup time, for high-dimensional matrix multiplication.	翻訳日:2023-05-31 04:40:21 公開日:2023-05-28
# VHR画像道路抽出のための強力なコンテキストエンコーダとなるSwin TransformerとCNNの結合 Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR Image Road Extraction ( http://arxiv.org/abs/2201.03178v2 ) ライセンス: Link先を確認	Tao Chen, Yiran Liu, Haoyu Jiang, Ruirui Li	(参考訳) 道路の正確なセグメンテーションは、クラス内の大きな変動、不明瞭なクラス間の差異、影・樹木・建物による遮蔽のために困難である。これらの課題に対処するためには、重要なテクスチャの詳細への注意と、大域的な幾何学的文脈情報の知覚が不可欠である。近年の研究では、CNN-Transformerハイブリッド構造がCNNまたはTransformerを単独で用いるよりも優れていることが示されている。CNNは局所的な詳細特徴の抽出に優れる一方、Transformerは大域的な文脈情報を自然に知覚する。本稿では、道路抽出タスクのためにResNetとSwinTransformerを組み合わせたデュアルブランチのネットワークブロックConSwinを提案する。このConSwinブロックは両アプローチの長所を活かし、詳細特徴と大域的特徴をよりよく抽出する。ConSwinに基づいて砂時計型の道路抽出ネットワークを構築し、テクスチャと構造の詳細情報をデコーダによりよく伝達するための2つの新しい接続構造を導入する。提案手法は、MassachusettsおよびCHN6-CUGデータセットの両方において、全体精度、IoU、F1の各指標で最先端の手法を上回る。追加の実験により提案モジュールの有効性を検証し、可視化結果はより良い道路表現を得られることを示している。 Accurately segmenting roads is challenging due to substantial intra-class variations, indistinct inter-class distinctions, and occlusions caused by shadows, trees, and buildings. To address these challenges, attention to important texture details and perception of global geometric contextual information are essential. Recent research has shown that CNN-Transformer hybrid structures outperform either a CNN or a Transformer used alone. While CNNs excel at extracting local detail features, Transformers naturally perceive global contextual information. In this paper, we propose a dual-branch network block named ConSwin that combines ResNet and SwinTransformers for road extraction tasks. This ConSwin block harnesses the strengths of both approaches to better extract detailed and global features. Based on ConSwin, we construct an hourglass-shaped road extraction network and introduce two novel connection structures to better transmit texture and structural detail information to the decoder. Our proposed method outperforms state-of-the-art methods on both the Massachusetts and CHN6-CUG datasets in terms of overall accuracy, IoU, and F1 metrics. 
Additional experiments validate the effectiveness of our proposed module, while visualization results demonstrate its ability to obtain better road representations.	翻訳日:2023-05-31 04:39:08 公開日:2023-05-28
# HeterPS:異種環境における強化学習に基づくスケジューリングによる分散ディープラーニング HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments ( http://arxiv.org/abs/2111.10635v3 ) ライセンス: Link先を確認	Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou	(参考訳) ディープニューラルネットワーク(DNN)は多くのレイヤと多数のパラメータを利用して優れたパフォーマンスを実現する。DNNモデルの訓練プロセスは一般に、多くのスパースな特徴を持つ大規模な入力データを扱うため高い入出力(IO)コストが発生し、一方で一部の層は計算集約的である。訓練プロセスは一般に分散コンピューティングリソースを利用して訓練時間を短縮する。さらに、分散訓練プロセスには、CPUや複数種類のGPUなどの異種コンピューティングリソースが利用できる。したがって、訓練プロセスにおいて、複数のレイヤを多様なコンピューティングリソースにスケジューリングすることが重要となる。異種計算資源を用いてDNNモデルを効率的に訓練するために、分散アーキテクチャと強化学習(RL)に基づくスケジューリング手法からなる分散フレームワークPaddle-Heterogeneous Parameter Server(Paddle-HeterPS)を提案する。既存のフレームワークと比べたPaddle-HeterPSの利点は3つある。第一に、Paddle-HeterPSは異種コンピューティングリソースを用いた多様なワークロードの効率的な訓練を可能にする。第二に、Paddle-HeterPSはRLベースの手法を利用して、スループットの制約を満たしながらコストを最小化するよう、各レイヤのワークロードを適切な計算リソースに効率的にスケジューリングする。第三に、Paddle-HeterPSは分散コンピューティングリソース間のデータストレージとデータ通信を管理する。広範な実験により、Paddle-HeterPSがスループット(14.5倍)と金銭的コスト(312.3%削減)の点で最先端のアプローチを大幅に上回ることを示す。フレームワークのコードは https://github.com/PaddlePaddle/Paddle で公開されている。 Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high Input/Output (IO) cost, while some layers are compute-intensive. The training process generally exploits distributed computing resources to reduce training time. In addition, heterogeneous computing resources, e.g., CPUs, GPUs of multiple types, are available for the distributed training process. Thus, the scheduling of multiple layers to diverse computing resources is critical for the training process. 
To efficiently train a DNN model using the heterogeneous computing resources, we propose a distributed framework, i.e., Paddle-Heterogeneous Parameter Server (Paddle-HeterPS), composed of a distributed architecture and a Reinforcement Learning (RL)-based scheduling method. The advantages of Paddle-HeterPS are three-fold compared with existing frameworks. First, Paddle-HeterPS enables efficient training process of diverse workloads with heterogeneous computing resources. Second, Paddle-HeterPS exploits an RL-based method to efficiently schedule the workload of each layer to appropriate computing resources to minimize the cost while satisfying throughput constraints. Third, Paddle-HeterPS manages data storage and data communication among distributed computing resources. We carry out extensive experiments to show that Paddle-HeterPS significantly outperforms state-of-the-art approaches in terms of throughput (14.5 times higher) and monetary cost (312.3% smaller). The codes of the framework are publicly available at: https://github.com/PaddlePaddle/Paddle.	翻訳日:2023-05-31 04:37:21 公開日:2023-05-28
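The scheduling objective above (minimize monetary cost while meeting a throughput constraint) can be made concrete with a toy search. This is a minimal sketch under stated assumptions: exhaustive search stands in for the paper's RL-based scheduler, the per-layer profile numbers are invented, and pipeline throughput is assumed to be bounded by the slowest layer.

```python
from itertools import product

# Hypothetical per-layer profiles: resource -> (throughput in samples/s, cost per hour).
# These numbers are invented for illustration, not taken from the paper.
PROFILE = {
    "embedding": {"cpu": (900.0, 1.0), "gpu": (1200.0, 4.0)},
    "dense": {"cpu": (150.0, 1.0), "gpu": (1000.0, 4.0)},
    "output": {"cpu": (600.0, 1.0), "gpu": (1100.0, 4.0)},
}


def schedule(profile, min_throughput):
    """Exhaustive stand-in for an RL scheduler: return the cheapest
    layer->resource assignment whose pipeline throughput meets the constraint."""
    layers = list(profile)
    best = None
    for choice in product(*(profile[layer] for layer in layers)):
        stats = [profile[layer][res] for layer, res in zip(layers, choice)]
        throughput = min(t for t, _ in stats)  # assumed: slowest stage bounds the pipeline
        cost = sum(c for _, c in stats)
        if throughput >= min_throughput and (best is None or cost < best[0]):
            best = (cost, dict(zip(layers, choice)), throughput)
    return best  # None if no assignment satisfies the constraint


print(schedule(PROFILE, 500.0))
```

With a 500 samples/s target, only the compute-intensive "dense" layer needs the GPU. The search space grows as resources^layers, which is one reason a learned scheduler becomes attractive for real models.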
# 視覚言語事前学習モデルは合成可能な原始概念を学ぶか? Do Vision-Language Pretrained Models Learn Composable Primitive Concepts? ( http://arxiv.org/abs/2203.17271v3 ) ライセンス: Link先を確認	Tian Yun, Usha Bhalla, Ellie Pavlick, Chen Sun	(参考訳) 視覚言語(VL)事前学習モデルは、マルチモーダル推論とゼロショット認識タスクにおいて印象的な性能を達成した。これらのVLモデルの多くは、インターネット上のラベルなしの画像とキャプションのペアで事前学習されている。本稿では,プリミティブな概念の表現 – 色や形状,物体部位の属性など – が,これらの事前学習されたVLモデルの中に自動的に現れるかを検討する。そこで本研究では,合成概念マッピング(CompMap)という2段階の枠組みを提案する。 CompMapはまず、テキストプロンプトを用いてプリミティブな概念アクティベーションを生成するようVLモデルに求め、続いて、プリミティブな概念アクティベーション(例えば、黒い尾や赤い翼の尤度)を複合的な概念(例えば、ハゴロモガラス)にマッピングする合成モデルを構築することを学習する。合成モデルは正解のプリミティブ概念から確実に学習できることを示す。したがって、プリミティブな概念が実際にVL事前学習モデルに現れるなら、そのプリミティブな概念アクティベーションを用いて、専門家が設計したものに類似した合成モデルを学習できるという仮説を立てる。類似度を測定するための定量的指標を提案し,その指標を解釈可能性指標と呼ぶ。また,プリミティブ概念アクティベーションと学習した合成モデルを用いて複合概念を予測した場合の分類精度を測定し,これを有用性指標と呼ぶ。本研究は,最先端のVL事前学習モデルが,CUBデータセットでのきめ細かな視覚認識や,MIT-Statesデータセットでの合成的汎化タスクに非常に有用なプリミティブ概念を学習することを明らかにする。しかし,学習された合成モデルは定性分析において解釈可能性が低いことが観察された。本結果は,既存のVLモデルの限界と,プリミティブな概念の獲得を促す事前学習目的の必要性を明らかにする。 Vision-language (VL) pretrained models have achieved impressive performance on multimodal reasoning and zero-shot recognition tasks. Many of these VL models are pretrained on unlabeled image and caption pairs from the internet. In this paper, we study whether representations of primitive concepts--such as colors, shapes, or the attributes of object parts--emerge automatically within these pretrained VL models. We propose a two-step framework, Compositional Concept Mapping (CompMap), to investigate this. CompMap first asks a VL model to generate primitive concept activations with text prompts, and then learns to construct a composition model that maps the primitive concept activations (e.g. the likelihood of black tail or red wing) to composite concepts (e.g. a red-winged blackbird). We show that a composition model can be reliably learned from ground truth primitive concepts. 
We thus hypothesize that if primitive concepts indeed emerge in a VL pretrained model, its primitive concept activations can be used to learn a composition model similar to the one designed by experts. We propose a quantitative metric to measure the degree of similarity, and refer to the metric as the interpretability metric. We also measure the classification accuracy when using the primitive concept activations and the learned composition model to predict the composite concepts, and refer to it as the usefulness metric. Our study reveals that state-of-the-art VL pretrained models learn primitive concepts that are highly useful for fine-grained visual recognition on the CUB dataset, and compositional generalization tasks on the MIT-States dataset. However, we observe that the learned composition models have low interpretability in our qualitative analyses. Our results reveal the limitations of existing VL models, and the necessity of pretraining objectives that encourage the acquisition of primitive concepts.	翻訳日:2023-05-31 04:29:45 公開日:2023-05-28
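The "usefulness" metric described above is the classification accuracy obtained when primitive-concept activations are fed through a composition model. A minimal sketch, assuming a hand-written, expert-style composition model (CompMap instead learns this mapping) and invented activation values:

```python
# Hypothetical expert-style composition model: each composite class is defined
# by a set of primitive concepts. CompMap learns such a mapping rather than fixing it.
CLASS_ATTRIBUTES = {
    "red-winged blackbird": {"black tail", "red wing"},
    "cardinal": {"red body", "red wing"},
}


def predict(activations, class_attributes):
    """Score each composite class by summing the activations of its primitive concepts."""
    scores = {
        cls: sum(activations.get(attr, 0.0) for attr in attrs)
        for cls, attrs in class_attributes.items()
    }
    return max(scores, key=scores.get)


def usefulness(samples, class_attributes):
    """Classification accuracy of composite-concept predictions."""
    correct = sum(predict(acts, class_attributes) == label for acts, label in samples)
    return correct / len(samples)


# Invented primitive-concept activations, standing in for VL-model outputs.
samples = [
    ({"black tail": 0.9, "red wing": 0.8, "red body": 0.1}, "red-winged blackbird"),
    ({"black tail": 0.1, "red wing": 0.7, "red body": 0.9}, "cardinal"),
]
print(usefulness(samples, CLASS_ATTRIBUTES))
```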
# 構音障害者・高齢者の音声認識におけるオンザフライ特徴に基づく高速話者適応 On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition ( http://arxiv.org/abs/2203.14593v3 ) ライセンス: Link先を確認	Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu	(参考訳) 構音障害者と高齢者の発話の正確な認識は、いまだに難しい課題である。アクセントや性別に起因する話者レベルの不均質性は、年齢や言語障害と重なることで、これらの話者の間に大きな多様性を生み出す。話者レベルのデータの不足は、データ集約型のモデルベース話者適応手法の実用化を制限する。そこで本研究では、分散正則化スペクトル基底埋め込み(SVR)とスペクトル特徴駆動f-LHUC変換という、データ効率の高い2つの新しい特徴ベースのオンザフライ話者適応手法を提案する。 UASpeech構音障害音声コーパスとDementiaBank Pitt高齢者音声コーパスを用いた実験から、提案するオンザフライ話者適応手法は、ベースラインのiVector適応ハイブリッドDNN/TDNNおよびE2E Conformerシステムを統計的に有意なWER削減(絶対値2.48%-2.85%、相対値7.92%-8.06%)で、またオフラインのモデルベースLHUC適応を絶対値1.82%(相対値5.63%)で、それぞれ一貫して上回ることが示唆された。 Accurate recognition of dysarthric and elderly speech remains a challenging task to date. Speaker-level heterogeneity attributed to accent or gender, when aggregated with age and speech impairment, creates large diversity among these speakers. Scarcity of speaker-level data limits the practical use of data-intensive, model-based speaker adaptation methods. To this end, this paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods: variance-regularized spectral basis embedding (SVR) and spectral feature driven f-LHUC transforms. Experiments conducted on UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest the proposed on-the-fly speaker adaptation approaches consistently outperform baseline iVector adapted hybrid DNN/TDNN and E2E Conformer systems by statistically significant WER reduction of 2.48%-2.85% absolute (7.92%-8.06% relative), and offline model based LHUC adaptation by 1.82% absolute (5.63% relative) respectively.	翻訳日:2023-05-31 04:28:15 公開日:2023-05-28
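Absolute and relative WER reductions are easy to conflate: relative reduction = absolute reduction / baseline WER. Back-calculating from the LHUC comparison (1.82% absolute, 5.63% relative) implies a baseline WER of roughly 32.3%. This is derived arithmetic on rounded figures, not a number the abstract reports.

```python
def relative_reduction(abs_reduction, baseline):
    """Relative WER reduction (%) from an absolute reduction and a baseline WER (both in %)."""
    return 100.0 * abs_reduction / baseline


def implied_baseline(abs_reduction, rel_reduction):
    """Back-calculate the baseline WER (%) from absolute and relative reductions (%)."""
    return 100.0 * abs_reduction / rel_reduction


# Offline LHUC comparison quoted in the abstract: 1.82% absolute, 5.63% relative.
baseline = implied_baseline(1.82, 5.63)
print(round(baseline, 2))
```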
# 密検索のためのテスト時クエリ表現の最適化 Optimizing Test-Time Query Representations for Dense Retrieval ( http://arxiv.org/abs/2205.12680v3 ) ライセンス: Link先を確認	Mujeen Sung, Jungsoo Park, Jaewoo Kang, Danqi Chen, Jinhyuk Lee	(参考訳) 密検索の最近の進展は,事前学習されたクエリエンコーダとコンテキストエンコーダから得られる,クエリとコンテキストの高品質な表現に依存している。本稿では,テスト時の検索結果から得られる信号に導かれて,インスタンスレベルのクエリ表現をさらに最適化する TOUR (Test-Time Optimization of Query Representations) を提案する。クロスエンコーダ再ランク付け器を利用して検索結果にきめ細かい擬似ラベルを付与し,勾配降下によってクエリ表現を反復的に最適化する。理論的解析により,TOURは疑似関連性フィードバックのための古典的なRocchioアルゴリズムの一般化と見なせることを示し,擬似ラベルをハードな二値ラベルまたはソフトな連続ラベルとして活用する2つの変種を提示する。まず,提案する句再ランク付け器を用いて句検索にTOURを適用し,さらに既製の再ランク付け器を用いてパッセージ検索での有効性も評価した。 TOURは、エンドツーエンドのオープンドメイン質問応答精度を大幅に向上させ、パッセージ検索性能も向上させる。さらにTOURは、効率的な実装により1.3-2.4倍高速に動作しながら、直接的な再ランク付けを一貫して最大2.0%改善する。 Recent developments in dense retrieval rely on high-quality representations of queries and contexts from pre-trained query and context encoders. In this paper, we introduce TOUR (Test-Time Optimization of Query Representations), which further optimizes instance-level query representations guided by signals from test-time retrieval results. We leverage a cross-encoder re-ranker to provide fine-grained pseudo labels over retrieval results and iteratively optimize query representations with gradient descent. Our theoretical analysis reveals that TOUR can be viewed as a generalization of the classical Rocchio algorithm for pseudo relevance feedback, and we present two variants that leverage pseudo-labels as hard binary or soft continuous labels. We first apply TOUR to phrase retrieval with our proposed phrase re-ranker, and also evaluate its effectiveness on passage retrieval with an off-the-shelf re-ranker. TOUR greatly improves end-to-end open-domain question answering accuracy, as well as passage retrieval performance. TOUR also consistently improves direct re-ranking by up to 2.0% while running 1.3-2.4x faster with an efficient implementation.	翻訳日:2023-05-31 04:19:21 公開日:2023-05-28
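The abstract describes TOUR as a generalization of the classical Rocchio algorithm. For reference, here is a minimal sketch of classical Rocchio with hard (0/1) pseudo-relevance labels: the query moves toward the centroid of pseudo-relevant documents and away from the centroid of non-relevant ones. This is not TOUR's gradient-based procedure; the weights alpha, beta, gamma are conventional textbook defaults, and the vectors and labels are invented.

```python
def rocchio(query, docs, labels, alpha=1.0, beta=0.75, gamma=0.15):
    """Classical Rocchio update with hard (0/1) pseudo-relevance labels."""
    dim = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    relevant = centroid([d for d, y in zip(docs, labels) if y == 1])
    non_relevant = centroid([d for d, y in zip(docs, labels) if y == 0])
    return [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, relevant, non_relevant)
    ]


# In TOUR the pseudo labels come from a cross-encoder re-ranker; here they are made up.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
labels = [1, 1, 0]
print(rocchio(query, docs, labels))
```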
# MVP: 自然言語生成のためのマルチタスク教師付き事前学習 MVP: Multi-task Supervised Pre-training for Natural Language Generation ( http://arxiv.org/abs/2206.12131v3 ) ライセンス: Link先を確認	Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen	(参考訳) 事前学習済み言語モデル(PLM)は自然言語生成(NLG)タスクにおいて顕著な成功を収めた。現在までのところ、ほとんどのNLG指向のPLMは、大規模な汎用コーパスを用いて教師なしで事前学習されている。一方で、ラベル付きデータで事前学習されたモデル(すなわち「教師付き事前学習」)が増えており、教師なし事前学習モデルよりも優れた性能を示している。教師付き事前学習の成功に触発され,自然言語生成のためのマルチタスク教師付き事前学習(MVP)を提案する。11の多様なNLGタスクにわたる77のデータセットから、大規模な自然言語生成コーパスMVPCorpusを収集する。次に、これらの例を汎用的なテキスト・トゥ・テキスト形式に統一し、テキスト生成モデルMVPを教師付きで事前学習する。さらに各タスクについて、特定のタスクを実行するモデルの能力を引き出すためのタスク固有のソフトプロンプトを事前学習する。我々のMVPモデルは、比較的小さなPLMに最近の命令チューニングを適用する実践と見なすことができる。広範な実験により、多数のNLGタスクにおけるMVPモデルの有効性と汎用性を実証し、17のデータセットのうち13で最先端の性能を達成し、BARTを9.3%、Flan-T5を5.8%上回った。 Pre-trained language models (PLMs) have achieved remarkable success in natural language generation (NLG) tasks. Up to now, most NLG-oriented PLMs have been pre-trained in an unsupervised manner using large-scale general corpora. Meanwhile, an increasing number of models pre-trained with labeled data (i.e. "supervised pre-training") showcase superior performance compared to unsupervised pre-trained models. Motivated by the success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language generation. We collect a large-scale natural language generation corpus, MVPCorpus, from 77 datasets over 11 diverse NLG tasks. Then we unify these examples into a general text-to-text format to pre-train the text generation model MVP in a supervised manner. For each task, we further pre-train specific soft prompts to stimulate the model's capacity to perform a specific task. Our MVP model can be seen as a practice that utilizes recent instruction tuning on relatively small PLMs. 
Extensive experiments have demonstrated the effectiveness and generality of our MVP model in a number of NLG tasks, where it achieves state-of-the-art performance on 13 out of 17 datasets, outperforming BART by 9.3% and Flan-T5 by 5.8%.	翻訳日:2023-05-31 04:10:47 公開日:2023-05-28
# 小脳セグメンテーションのための信頼度誘導型教師なしドメイン適応 Confidence-Guided Unsupervised Domain Adaptation for Cerebellum Segmentation ( http://arxiv.org/abs/2206.10357v2 ) ライセンス: Link先を確認	Xuan Li, Paule-J Toussaint, Alan Evans, and Xue Liu	(参考訳) 小脳の包括的な高解像度アトラスが存在しないことが、正常な脳機能や疾患における小脳の関与の研究を妨げてきた。小脳皮質の密に折り畳まれた葉状構造をうまく表現することは、表面が非常に複雑であることと、手作業での輪郭描出に要する時間のために困難である。手動セグメンテーションの品質は人間の専門家の判断に左右され、自動ラベリングは既存のセグメンテーションアルゴリズムの頑健性の低さによって制約される。20µm等方性のBigBrainデータセットは、磁気共鳴画像法で得られる1000µm(1mm)の解像度と比べて、セマンティックセグメンテーションのための前例のない高解像度の枠組みを提供する。手動アノテーションを不要にするため,切片間の染色や間隔の違いを考慮しつつ,Allen Brain Human Brain Atlasの小脳のアノテーションをBigBrainへ教師なしで適応的に転移するモデルを学習することを提案する。Allen BrainとBigBrainの明確な視覚的差異のために既存の手法は有意義なセグメンテーションマスクを生成できず、さらにBigBrainデータの切片作成や組織切片の作製に起因するアーティファクトが追加の課題となる。これらの問題に対処するために,まずAllen Brainの小脳をBigBrainと視覚的類似性を共有する空間へ転移する2段階の枠組みを提案する。次に,信頼度マップを用いた自己学習戦略を導入し,ノイズを含む擬似ラベルからのモデル学習を反復的に導く。定性的な結果により本手法の有効性が確認され,定量的実験により,他の手法と比較して2.6%以上の損失低減を達成できることが明らかになった。 The lack of a comprehensive high-resolution atlas of the cerebellum has hampered studies of cerebellar involvement in normal brain function and disease. A good representation of the tightly foliated aspect of the cerebellar cortex is difficult to achieve because of the highly convoluted surface and the time it would take for manual delineation. The quality of manual segmentation is influenced by human expert judgment, and automatic labelling is constrained by the limited robustness of existing segmentation algorithms. The 20 µm isotropic BigBrain dataset provides an unprecedented high-resolution framework for semantic segmentation compared to the 1000 µm (1 mm) resolution afforded by magnetic resonance imaging. To dispense with the manual annotation requirement, we propose to train a model to adaptively transfer the annotation from the cerebellum on the Allen Brain Human Brain Atlas to the BigBrain in an unsupervised manner, taking into account the different staining and spacing between sections. 
The distinct visual discrepancy between the Allen Brain and BigBrain prevents existing approaches from providing meaningful segmentation masks, and artifacts caused by sectioning and histological slice preparation in the BigBrain data pose an extra challenge. To address these problems, we propose a two-stage framework where we first transfer the Allen Brain cerebellum to a space sharing visual similarity with the BigBrain. We then introduce a self-training strategy with a confidence map to guide the model learning from the noisy pseudo labels iteratively. Qualitative results validate the effectiveness of our approach, and quantitative experiments reveal that our method can achieve over 2.6% loss reduction compared with other approaches.	翻訳日:2023-05-31 04:09:45 公開日:2023-05-28
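Confidence-guided self-training typically keeps only the pseudo labels the model is confident about. A minimal sketch, assuming a simple hard threshold on the top per-pixel probability (the paper's confidence map may weight pixels rather than drop them; the 0.9 threshold and the probabilities are arbitrary choices for illustration):

```python
IGNORE = -1  # label value to be excluded from the training loss


def confident_pseudo_labels(prob_maps, threshold=0.9):
    """Turn per-pixel class probabilities into pseudo labels,
    marking pixels whose top probability is below `threshold` as IGNORE."""
    labels = []
    for probs in prob_maps:
        confidence = max(probs)
        labels.append(probs.index(confidence) if confidence >= threshold else IGNORE)
    return labels


# Invented two-class probabilities for three pixels.
pixels = [[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]]
print(confident_pseudo_labels(pixels))
```

Re-running this filter after each training round, with the updated model's predictions, gives the iterative loop the abstract describes.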
# 測定依存性と隠蔽性のトレードオフ関係としての緩和ベル不等式 Relaxed Bell inequalities as a trade-off relation between measurement dependence and hiddenness ( http://arxiv.org/abs/2206.06196v3 ) ライセンス: Link先を確認	Gen Kimura, Yugo Susuki and Kei Morisue	(参考訳) ベルの不等式に反する量子相関は、任意の(測定独立な)局所隠れ変数理論では説明できない。しかし、この違反は、実在性、局所性、測定独立性という基礎となる仮定が両立しないことを示すだけであり、各仮定が定量的にどの程度破られているかは扱わない。対照的に、Hall (2010, 2011) はそれぞれの仮定を定量化し、基礎となる仮定の間のトレードオフ関係を与えるようにベル-CHSH不等式を一般化した。本稿では,隠れ変数の定量化(隠蔽性)を導入し,任意の局所隠れ変数理論に対して成り立つ,隠蔽性と測定依存性との間の新たなトレードオフ関係を導出する。 Quantum correlations that violate the Bell inequality cannot be explained by any (measurement independent) local hidden variable theory. However, the violation only implies incompatibility of the underlying assumptions of reality, locality, and measurement independence, and does not address the extent to which each assumption is violated quantitatively. In contrast, Hall (2010, 2011) gave a quantification of each assumption and generalized the Bell-CHSH inequality to one that gives a trade-off relationship between the underlying assumptions. In this paper, we introduce a quantification of hidden variables (hiddenness) and derive a new trade-off relation between the hiddenness and the measurement dependency that holds for any local hidden variable theory.	翻訳日:2023-05-31 04:09:17 公開日:2023-05-28
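For context, the standard (measurement-independent) Bell-CHSH inequality that Hall generalizes bounds S = E(a,b) - E(a,b') + E(a',b) + E(a',b') by |S| <= 2 for local hidden variable theories, while a singlet state reaches 2√2. The sketch below checks both numbers; it does not implement Hall's relaxed bound or the hiddenness trade-off. Under measurement independence, every local model is a mixture of deterministic ±1 strategies, so the maximum over those strategies gives the local bound.

```python
import math
from itertools import product


def singlet_correlation(a, b):
    """Quantum correlation E(a, b) for spin measurements at angles a, b on a singlet."""
    return -math.cos(a - b)


def chsh(E, a, a2, b, b2):
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)


def max_local_deterministic():
    """Largest |S| over deterministic local strategies with outcomes in {-1, +1}."""
    return max(
        abs(A * B - A * B2 + A2 * B + A2 * B2)
        for A, A2, B, B2 in product((-1, 1), repeat=4)
    )


s_quantum = chsh(singlet_correlation, 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
print(abs(s_quantum), max_local_deterministic())
```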
# 単一の正ラベルからのマルチラベルサンプルのマイニング Mining Multi-Label Samples from Single Positive Labels ( http://arxiv.org/abs/2206.05764v4 ) ライセンス: Link先を確認	Youngin Cho, Daejin Kim, Mohammad Azam Khan, Jaegul Choo	(参考訳) cGANs (conditional generative adversarial networks) はクラス条件付き生成タスクにおいて優れた結果を示している。複数の条件を同時に制御するために、cGANは各データインスタンスに複数のラベルを割り当てられるマルチラベルの学習データセットを必要とする。しかし、膨大なアノテーションコストが、実世界のシナリオにおけるマルチラベルデータセットの入手を制限している。そこで本研究では,各データインスタンスに明示的な負のラベルなしで1つの正のラベルのみがアノテートされる,単一正ラベル設定という実践的な設定について検討する。単一正ラベル設定でマルチラベルデータを生成するために,マルコフ連鎖モンテカルロ法に基づく,シングル・トゥ・マルチラベル(S2M)サンプリングと呼ばれる新しいサンプリング手法を提案する。広く適用可能な「アドオン」手法として,提案するS2Mサンプリング手法により,既存の無条件および条件付きGANが最小限のアノテーションコストで高品質なマルチラベルデータを生成できる。実画像データセットに対する大規模な実験により、完全にアノテートされたデータセットで学習したモデルと比較しても、本手法の有効性と正確性が検証された。 Conditional generative adversarial networks (cGANs) have shown superior results in class-conditional generation tasks. To simultaneously control multiple conditions, cGANs require multi-label training datasets, where multiple labels can be assigned to each data instance. Nevertheless, the tremendous annotation cost limits the accessibility of multi-label datasets in real-world scenarios. Therefore, in this study we explore the practical setting called the single positive setting, where each data instance is annotated by only one positive label with no explicit negative labels. To generate multi-label data in the single positive setting, we propose a novel sampling approach called single-to-multi-label (S2M) sampling, based on the Markov chain Monte Carlo method. As a widely applicable "add-on" method, our proposed S2M sampling method enables existing unconditional and conditional GANs to draw high-quality multi-label data with a minimal annotation cost. Extensive experiments on real image datasets verify the effectiveness and correctness of our method, even when compared to a model trained with fully annotated datasets.	翻訳日:2023-05-31 04:08:35 公開日:2023-05-28
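S2M sampling is described as a Markov chain Monte Carlo method. As a generic illustration of that family only, here is a Metropolis-Hastings sampler over discrete label combinations with a uniform independent proposal. The target weights are invented and stand in for whatever GAN- or classifier-derived density S2M actually uses, which this sketch does not reproduce.

```python
import random


def metropolis_hastings(weights, n_steps, seed=0):
    """Sample label combinations with probability proportional to `weights`
    (all weights must be positive) using a uniform independent proposal."""
    rng = random.Random(seed)
    states = list(weights)
    current = rng.choice(states)
    samples = []
    for _ in range(n_steps):
        proposal = rng.choice(states)
        # Symmetric proposal, so the acceptance ratio is just the weight ratio.
        accept = min(1.0, weights[proposal] / weights[current])
        if rng.random() < accept:
            current = proposal
        samples.append(current)
    return samples


# Hypothetical unnormalized weights over (label_A, label_B) combinations.
weights = {(1, 0): 1.0, (1, 1): 3.0}
draws = metropolis_hastings(weights, 20000)
print(draws.count((1, 1)) / len(draws))  # should be close to 3 / (1 + 3) = 0.75
```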
# 超解像と逆トーンマッピングの同時実現:特徴分解集約ネットワークと新しいベンチマーク Joint Super-Resolution and Inverse Tone-Mapping: A Feature Decomposition Aggregation Network and A New Benchmark ( http://arxiv.org/abs/2207.03367v2 ) ライセンス: Link先を確認	Gang Xu (1), Yu-chen Yang (1), Liang Wang (2), Jun Xu (1), Xian-Tong Zhen (3) ((1) Nankai University, (2) Institute of Automation, CAS, (3) Guangdong University of Petrochemical Technology)	(参考訳) 超解像と逆トーンマッピングの同時実現(joint SR-ITM)は,低解像度かつ標準ダイナミックレンジの画像の解像度とダイナミックレンジを向上させることを目的としている。最近のネットワークは主に,複雑なマルチブランチアーキテクチャによる画像分解技術に依存している。しかし、固定的な分解技術は多様な画像に対する能力を大きく制限してしまう。本稿では,分解機構の潜在的な力を活用するために,それを画像領域からより広い特徴領域へ一般化する。そのために,軽量な特徴分解集約ネットワーク(FDAN)を提案する。特に,詳細特徴マップとベース特徴マップの学習可能な分離を実現する特徴分解ブロック(FDB)を設計し,FDBをカスケードして強力な多段階特徴分解のための階層的特徴分解グループを構築する。さらに、比較手法をよりよく評価するために、ロバストなモデル学習と評価のための多様なシナリオを提供する、joint SR-ITM用の大規模データセットSRITM-4Kを収集する。 2つのベンチマークデータセットによる実験結果から、FDANは効率的であり、joint SR-ITMにおいて最先端手法を上回ることが示された。 FDANとSRITM-4Kデータセットのコードはhttps://github.com/CS-GangXu/FDANで公開されている。 Joint Super-Resolution and Inverse Tone-Mapping (joint SR-ITM) aims to increase the resolution and dynamic range of low-resolution and standard dynamic range images. Recent networks mainly resort to image decomposition techniques with complex multi-branch architectures. However, fixed decomposition techniques largely restrict their power on diverse images. To exploit the potential power of the decomposition mechanism, in this paper, we generalize it from the image domain to the broader feature domain. To this end, we propose a lightweight Feature Decomposition Aggregation Network (FDAN). In particular, we design a Feature Decomposition Block (FDB) to achieve learnable separation of detail and base feature maps, and develop a Hierarchical Feature Decomposition Group by cascading FDBs for powerful multi-level feature decomposition. Moreover, to better evaluate the comparison methods, we collect a large-scale dataset for joint SR-ITM, i.e., SRITM-4K, which provides versatile scenarios for robust model training and evaluation. 
Experimental results on two benchmark datasets demonstrate that our FDAN is efficient and outperforms state-of-the-art methods on joint SR-ITM. The code of our FDAN and the SRITM-4K dataset are available at https://github.com/CS-GangXu/FDAN.	翻訳日:2023-05-31 04:01:33 公開日:2023-05-28
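The abstract contrasts FDAN's learnable feature decomposition with fixed decomposition techniques. To show what a fixed base/detail split looks like, here is a minimal sketch on a 1D signal: base = moving average (low-pass), detail = residual, so base + detail reconstructs the input exactly. This is the hand-designed kind of decomposition FDAN generalizes, not an FDB; the window radius and signal are arbitrary.

```python
def decompose(signal, radius=1):
    """Fixed base/detail split: base = moving average, detail = residual."""
    n = len(signal)
    base = []
    for i in range(n):
        window = signal[max(0, i - radius): min(n, i + radius + 1)]
        base.append(sum(window) / len(window))
    detail = [s - b for s, b in zip(signal, base)]
    return base, detail


base, detail = decompose([0.0, 0.0, 9.0, 0.0, 0.0])
print(base, detail)
```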
# パラメトリック方程式発見のための深層学習と記号回帰 Deep Learning and Symbolic Regression for Discovering Parametric Equations ( http://arxiv.org/abs/2207.00529v2 ) ライセンス: Link先を確認	Michael Zhang, Samuel Kim, Peter Y. Lu, Marin Soljačić	(参考訳) 記号回帰(symbolic regression)は、データを支配する式を学習できる機械学習技術であり、科学的発見を変革する可能性を持つ。しかし、記号回帰は、解析できるシステムの複雑さと次元においてまだ制限がある。一方、ディープラーニングは、非常に複雑で高次元のデータセットを解析する能力によって機械学習を変革した。本稿では,一部の係数は変化するが基礎となる支配方程式の構造は一定であるパラメトリックシステムへ記号回帰を拡張するニューラルネットワークアーキテクチャを提案する。係数が変化する様々な解析式,ODE,PDEで本手法を実証し,学習領域の外へもよく外挿できることを示す。このニューラルネットワークベースのアーキテクチャは他のディープラーニングアーキテクチャとも統合でき、エンドツーエンドで学習しながら高次元データを解析できる。そのために、本アーキテクチャを畳み込みニューラルネットワークと統合し、変化するばね系の1次元画像を解析する。 Symbolic regression is a machine learning technique that can learn the governing formulas of data and thus has the potential to transform scientific discovery. However, symbolic regression is still limited in the complexity and dimensionality of the systems that it can analyze. Deep learning, on the other hand, has transformed machine learning with its ability to analyze extremely complex and high-dimensional datasets. We propose a neural network architecture to extend symbolic regression to parametric systems where some coefficients may vary but the structure of the underlying governing equation remains constant. We demonstrate our method on various analytic expressions, ODEs, and PDEs with varying coefficients and show that it extrapolates well outside of the training domain. The neural network-based architecture can also integrate with other deep learning architectures so that it can analyze high-dimensional data while being trained end-to-end. To this end, we integrate our architecture with convolutional neural networks to analyze 1D images of varying spring systems.	翻訳日:2023-05-31 03:59:55 公開日:2023-05-28
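The "parametric system" setting means the equation's structure is shared while its coefficients vary between systems. A minimal sketch of that idea, assuming the structure y = c·sin(x) is already known (the paper's network discovers the structure symbolically; here only the per-system coefficient is fitted, by closed-form least squares, and the sin basis is an invented example):

```python
import math


def fit_coefficient(xs, ys, basis):
    """Least-squares c minimizing sum((y - c * basis(x))**2) for a fixed basis."""
    numerator = sum(basis(x) * y for x, y in zip(xs, ys))
    denominator = sum(basis(x) ** 2 for x in xs)
    return numerator / denominator


xs = [0.1 * i for i in range(1, 50)]
# Two "systems" that share the structure y = c * sin(x) but differ in c.
for true_c in (0.5, 2.5):
    ys = [true_c * math.sin(x) for x in xs]
    print(true_c, fit_coefficient(xs, ys, math.sin))
```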
# 自然言語生成のためのジョイントジェネレータ・ランカー学習 Joint Generator-Ranker Learning for Natural Language Generation ( http://arxiv.org/abs/2206.13974v3 ) ライセンス: Link先を確認	Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen	(参考訳) Generate-then-rankはテキスト生成のために広く使われている仕組みであり、ジェネレータが複数のテキスト候補を生成し、ランカーがその中から最良のものを選択する。しかし、既存の手法は通常ジェネレータとランカーを個別に訓練しており、生成品質をさらに高めうる相互フィードバックを無視している。この制限に対処するために,ジェネレータとランカーを単一のフレームワークに統合した新しい共同学習アルゴリズムであるJGRを提案する。 JGRは、データ尤度とランカー報酬を組み合わせたハイブリッド目的でジェネレータを最適化し、ジェネレータの出力を比較する対照損失でランカーを訓練する。ジェネレータとランカーを反復的に更新することにより、JGRは両者の学習を効果的に調和させ、共同で品質を高めることができる。各種テキスト生成タスクでJGRを評価し,3つの一般的な生成シナリオにおける4つの公開データセットで既存手法を上回ることを示す。私たちのコードとモデルはhttps://github.com/microsoft/ProphetNet/tree/master/JGRで公開されています。 Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among the text candidates. However, existing methods usually train the generator and the ranker individually, neglecting the mutual feedback that could further enhance the generation quality. To tackle this limitation, we propose JGR, a novel joint training algorithm that integrates the generator and the ranker in a single framework. JGR optimizes the generator with a hybrid objective that combines data likelihood and ranker reward, and trains the ranker with a contrastive loss that compares the generator outputs. By iteratively updating the generator and the ranker, JGR can effectively harmonize their learning and enhance their quality jointly. We evaluate JGR on various text generation tasks and demonstrate that it surpasses existing methods on four public datasets across three common generation scenarios. Our code and models are publicly available at https://github.com/microsoft/ProphetNet/tree/master/JGR.	翻訳日:2023-05-31 03:59:16 公開日:2023-05-28
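The abstract says the ranker is trained "with a contrastive loss that compares the generator outputs" but does not give its form. One plausible instance, shown here as an assumption rather than JGR's actual loss, is an InfoNCE-style cross-entropy over the ranker's candidate scores, treating the highest-quality candidate (e.g. by a reference-based metric) as the positive:

```python
import math


def contrastive_ranker_loss(scores, quality):
    """Cross-entropy over ranker scores for generator candidates, treating the
    highest-quality candidate as the positive (assumed InfoNCE-style form)."""
    positive = max(range(len(quality)), key=quality.__getitem__)
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_partition = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_partition - scores[positive]


print(contrastive_ranker_loss([0.0, 0.0], [0.1, 0.9]))  # log(2) for tied scores
print(contrastive_ranker_loss([0.0, 5.0], [0.1, 0.9]))  # small: positive already ranked first
```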
# 制約付き差分プライベート連合組合せバンディット Differentially Private Federated Combinatorial Bandits with Constraints ( http://arxiv.org/abs/2206.13192v2 ) ライセンス: Link先を確認	Sambhav Solanki, Samhita Kanaparthy, Sankarshan Damle, Sujit Gujar	(参考訳) オンライン学習の設定において,協調学習パラダイム,すなわち連合学習(FL)が急速に広がっている。ほとんどのFL設定とは異なり、エージェント同士が競合する状況も多い。各エージェントは他者から学びたいと考えているが、他者が学ぶために共有する情報の一部は機密である可能性があるため、プライバシーを保ちたいと考える。本研究は,品質制約を維持しつつ,類似の組合せバンディット問題を同時に解くエージェント群について検討する。これらのエージェントは、差分プライバシーを用いて機密情報を秘匿しながら、集合的に学習できるだろうか? 通信によってリグレットを減らせることを観察する。しかし、機密情報を保護するための差分プライバシー技術はデータにノイズを加えるため、リグレットの改善に役立つどころか悪化させる可能性がある。したがって、リグレットとプライバシーの間で実用的なバランスを取るには、いつ通信し、どの共有データから学習するかを決めることが不可欠である。このような連合組合せMAB設定のために、プライバシー保護型連合組合せバンディットアルゴリズムP-FCBを提案する。シミュレーションによりP-FCBの有効性を示す。さらに,本アルゴリズムは,品質のしきい値と有意義なプライバシー保証を保ちながら,リグレットの点で改善をもたらすことを示す。 There is a rapid increase in the cooperative learning paradigm in online learning settings, i.e., federated learning (FL). Unlike most FL settings, there are many situations where the agents are competitive. Each agent would like to learn from others, but the part of the information it shares for others to learn from could be sensitive; thus, it wants that information to remain private. This work investigates a group of agents working concurrently to solve similar combinatorial bandit problems while maintaining quality constraints. Can these agents collectively learn while keeping their sensitive information confidential by employing differential privacy? We observe that communicating can reduce the regret. However, differential privacy techniques for protecting sensitive information make the data noisy and may worsen the regret rather than improve it. Hence, we note that it is essential to decide when to communicate and which shared data to learn from, to strike a functional balance between regret and privacy. For such a federated combinatorial MAB setting, we propose a Privacy-preserving Federated Combinatorial Bandit algorithm, P-FCB. We illustrate the efficacy of P-FCB through simulations. 
We further show that our algorithm provides an improvement in terms of regret while upholding the quality threshold and meaningful privacy guarantees.	翻訳日:2023-05-31 03:58:58 公開日:2023-05-28
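The tension between privacy and regret comes from the noise that differential privacy adds to shared statistics. The abstract does not specify P-FCB's mechanism; as a generic illustration, here is the standard Laplace mechanism, where noise scale = sensitivity / epsilon, so a stricter privacy level (smaller epsilon) yields noisier shared values. The shared value, sensitivity, and epsilon values are hypothetical.

```python
import random
import statistics


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release `value` with Laplace noise of scale sensitivity / epsilon,
    which gives epsilon-differential privacy for a query with that L1 sensitivity."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


rng = random.Random(0)
loose = [laplace_mechanism(10.0, 1.0, 2.0, rng) for _ in range(20000)]   # weaker privacy
strict = [laplace_mechanism(10.0, 1.0, 0.2, rng) for _ in range(20000)]  # stronger privacy
print(statistics.mean(loose), statistics.pstdev(loose), statistics.pstdev(strict))
```

The released values stay unbiased, but their spread grows tenfold when epsilon shrinks tenfold, which is why an agent must decide whether noisy shared data still helps its own estimates.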
# LR-Net:低解像度画像分類のためのブロックベース畳み込みニューラルネットワーク LR-Net: A Block-based Convolutional Neural Network for Low-Resolution Image Classification ( http://arxiv.org/abs/2207.09531v5 ) ライセンス: Link先を確認	Ashkan Ganj, Mohsen Ebadpour, Mahdi Darvish, Hamid Bahador	(参考訳) 特徴の学習と抽出におけるCNNベースのアーキテクチャの画像分類での成功により,CNNは近年広く普及したが,最先端のモデルをノイズの多い低品質画像の分類に適用すると,画像分類のタスクはより困難になる。この種の画像は低解像度で有意義なグローバル特徴を欠くため,モデルがそこから有意義な特徴を抽出することは依然として難しい。さらに、高解像度画像の学習にはより多くの層が必要であり、より多くの時間と計算能力を要する。また,本手法は,深層ニューラルネットワークの層が深くなるにつれて生じる勾配消失の問題にも対処する。これらの問題すべてに対処するため,ぼやけたノイズの多い低解像度画像から低レベル特徴とグローバル特徴の両方を学習するよう設計されたブロックで構成される,新しい画像分類アーキテクチャを開発した。ブロックの設計は,性能向上とパラメータサイズ削減のために,Residual ConnectionとInceptionモジュールの影響を強く受けている。また,MNISTファミリのデータセットを用いて本手法を評価し,特に,画像が低品質でノイズが多いため分類が最も難しいOracle-MNISTデータセットに重点を置いた。詳細なテストにより,提案アーキテクチャが既存の最先端畳み込みニューラルネットワークよりも高速かつ高精度であることを実証した。さらに,モデルの独自の特性により,より少ないパラメータでより良い結果が得られる。 The success of CNN-based architectures in learning and extracting features for image classification has made them very popular, but image classification becomes more challenging when state-of-the-art models are applied to noisy, low-quality images. It is still difficult for models to extract meaningful features from such images due to their low resolution and lack of meaningful global features. Moreover, high-resolution images need more layers to train, which means they take more time and computational power to train. Our method also addresses the problem of vanishing gradients that arises as deep neural networks become deeper. To address all these issues, we developed a novel image classification architecture composed of blocks designed to learn both low-level and global features from blurred and noisy low-resolution images. Our design of the blocks was heavily influenced by Residual Connections and Inception modules in order to increase performance and reduce parameter sizes. 
We also assess our work using the MNIST family datasets, with a particular emphasis on the Oracle-MNIST dataset, which is the most difficult to classify due to its low-quality and noisy images. We have performed in-depth tests that demonstrate the presented architecture is faster and more accurate than existing cutting-edge convolutional neural networks. Furthermore, due to the unique properties of our model, it can produce a better result with fewer parameters.	翻訳日:2023-05-31 03:49:12 公開日:2023-05-28
# MGG:マルチGPUプラットフォーム上での細粒度カーネル内通信・計算パイプライニングによるグラフニューラルネットワークの高速化 MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms ( http://arxiv.org/abs/2209.06800v2 ) ライセンス: Link先を確認	Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, and Yufei Ding	(参考訳) グラフニューラルネットワーク(GNN)の入力グラフサイズの増加は、マルチGPUプラットフォームの使用需要を浮き彫りにしている。しかし,既存のマルチGPU GNNシステムは,密なDNNをスケーリングする従来の手法に基づいて,計算と通信を個別に最適化している。不規則にスパースで細粒度なGNNワークロードに対して、そのようなソリューションは、高性能化のために計算と通信の操作を共同でスケジュール・最適化する機会を逃している。そこで本研究では,マルチGPUプラットフォーム上でのフルグラフGNNを高速化する新しいシステム設計であるMGGを提案する。 MGGの中核は、GPUカーネル内での細粒度な計算と通信のオーバーラップを可能にする、新しい動的ソフトウェアパイプラインである。具体的にはMGGは、ワークロードのバランシングと演算のオーバーラップを容易にするために、GNNに特化したパイプライン構築とGPUを意識したパイプラインマッピングを導入している。 MGGはまた、解析モデリングと最適化ヒューリスティックを備えたインテリジェントなランタイム設計を取り入れ、実行性能を動的に改善する。広範な評価により、MGGは様々な設定において最先端のフルグラフGNNシステムを上回り、DGL、MGG-UVM、ROCよりもそれぞれ平均4.41倍、4.81倍、10.83倍高速であることが明らかになった。 The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the computation and communication operations for high-performance delivery. To this end, we propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve the execution performance. 
Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41X, 4.81X, and 10.83X faster than DGL, MGG-UVM, and ROC, respectively.	翻訳日:2023-05-31 03:42:36 公開日:2023-05-28
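Why overlapping communication with computation helps can be seen with a simple analytical model. This sketch assumes data is processed in chunks with double buffering: fetching chunk i starts when computing chunk i-1 starts, so each steady-state step costs the maximum of the two rather than their sum. It is a generic illustration, not MGG's intra-kernel pipeline or its runtime model, and the per-chunk times are invented.

```python
def sequential_time(comm, comp):
    """Each chunk is fetched, then computed, with no overlap."""
    return sum(comm) + sum(comp)


def pipelined_time(comm, comp):
    """Double buffering: fetching chunk i starts when computing chunk i-1 starts."""
    total = comm[0]  # the first fetch cannot be hidden
    for i in range(1, len(comm)):
        total += max(comm[i], comp[i - 1])
    return total + comp[-1]  # the last computation cannot be hidden


comm = [2.0, 2.0, 2.0, 2.0]  # hypothetical per-chunk communication times
comp = [3.0, 3.0, 3.0, 3.0]  # hypothetical per-chunk computation times
print(sequential_time(comm, comp), pipelined_time(comm, comp))
```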
# PaLI: 共同スケーリングされた多言語言語画像モデル PaLI: A Jointly-Scaled Multilingual Language-Image Model ( http://arxiv.org/abs/2209.06794v3 ) ライセンス: Link先を確認	Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut	(参考訳) 効率的なスケーリングと柔軟なタスクインターフェースにより、大規模言語モデルは多くのタスクで優れた性能を発揮する。本稿では,このアプローチを言語と視覚の共同モデリングに拡張するPaLI(Pathways Language and Image model)を提案する。 PaLIは視覚とテキストの入力に基づいてテキストを生成し、このインターフェイスにより多くの言語で多くの視覚、言語、マルチモーダルタスクを実行する。 PaLIの学習には、大規模な事前学習済みエンコーダ・デコーダ言語モデルとVision Transformer(ViT)を利用する。これにより、それらの既存の能力を活用し、その学習に費やされた多大なコストを生かすことができる。視覚と言語のコンポーネントを共同でスケーリングすることが重要であることがわかった。既存の言語用Transformerは視覚モデルよりもはるかに大きいため、40億パラメータの大規模なViT(ViT-e)を学習し、さらに大容量の視覚モデルの利点を定量化する。 PaLIを学習するために、100以上の言語による100億の画像とテキストを含む新しい画像テキスト学習セットに基づいて、事前学習タスクの大規模な多言語混合を作成する。 PaLIは、複数の視覚・言語タスク(キャプション生成、視覚的質問応答、シーンテキスト理解など)において最先端の性能を達成しつつ、シンプルでモジュール化されたスケーラブルな設計を維持している。 Effective scaling and a flexible task interface enable large language models to excel at many tasks. We present PaLI (Pathways Language and Image model), a model that extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaLI, we make use of large pre-trained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and leverage the substantial cost of training them. We find that joint scaling of the vision and language components is important. 
Since existing Transformers for language are much larger than their vision counterparts, we train a large, 4-billion parameter ViT (ViT-e) to quantify the benefits from even larger-capacity vision models. To train PaLI, we create a large multilingual mix of pretraining tasks, based on a new image-text training set containing 10B images and texts in over 100 languages. PaLI achieves state-of-the-art in multiple vision and language tasks (such as captioning, visual question-answering, scene-text understanding), while retaining a simple, modular, and scalable design.	翻訳日:2023-05-31 03:42:14 公開日:2023-05-28
# コンテクスト化ハイブリッドモデルによるランキングとキャリブレーションの共同最適化 Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model ( http://arxiv.org/abs/2208.06164v2 ) ライセンス: Link先を確認	Xiang-Rong Sheng, Jingyue Gao, Yueyao Cheng, Siran Yang, Shuguang Han, Hongbo Deng, Yuning Jiang, Jian Xu, Bo Zheng	(参考訳) ランキング最適化手法の発展にもかかわらず、クリック率予測ではポイントワイズ損失が依然として主流のアプローチである。これは、予測をクリック確率と見なせるという、ポイントワイズ損失のキャリブレーション能力によるものと考えられる。実際には、CTR予測モデルはランキング能力によっても一般的に評価される。ランキング能力を最適化するために、通常ポイントワイズ損失よりも優れたランキングを達成できるランキング損失(例えば、ペアワイズ損失やリストワイズ損失)を採用することができる。これまでの研究では、両方の損失の利点を得るために2つの損失を直接組み合わせる実験を行い、性能の向上を観察した。しかし、これらの研究では出力ロジットをクリック率とみなす解釈が崩れており、それが準最適な解につながる可能性がある。この問題に対処するため,ランキング能力とキャリブレーション能力を共同で最適化する手法(略してJRC)を提案する。 JRCは、異なるラベルを持つサンプルのロジット値を対比することでランキング能力を向上させ、予測確率をロジットの差の関数となるよう制約する。さらに,JRCがロジットの解釈を強固にし,ロジットが同時分布をモデル化していることを示す。この解釈に基づき、JRCが文脈化されたハイブリッド識別・生成目的を近似的に最適化することを証明する。公開データセットと産業データセットでの実験およびオンラインA/Bテストにより,本手法がランキングとキャリブレーションの両方の能力を改善することが示された。 2022年5月以降、JRCはAlibabaのディスプレイ広告プラットフォームに導入され、大幅な性能向上を実現している。 Despite the development of ranking optimization techniques, pointwise loss remains the dominant approach for click-through rate prediction. This can be attributed to the calibration ability of the pointwise loss, since the prediction can be viewed as the click probability. In practice, a CTR prediction model is also commonly assessed by its ranking ability. To optimize the ranking ability, ranking losses (e.g., pairwise or listwise losses) can be adopted, as they usually achieve better rankings than pointwise loss. Previous studies have experimented with a direct combination of the two losses to obtain the benefit of both and observed improved performance. However, these studies break the interpretation of the output logit as the click-through rate, which may lead to sub-optimal solutions. To address this issue, we propose an approach that can Jointly optimize the Ranking and Calibration abilities (JRC for short). 
JRC improves the ranking ability by contrasting the logit value for the sample with different labels and constrains the predicted probability to be a function of the logit subtraction. We further show that JRC consolidates the interpretation of logits, where the logits model the joint distribution. With such an interpretation, we prove that JRC approximately optimizes the contextualized hybrid discriminative-generative objective. Experiments on public and industrial datasets and online A/B testing show that our approach improves both ranking and calibration abilities. Since May 2022, JRC has been deployed on the display advertising platform of Alibaba and has obtained significant performance improvements.	翻訳日:2023-05-31 03:40:54 公開日:2023-05-28
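As a rough illustration of how one loss can combine a pointwise calibration term with a ranking term that contrasts logits across samples, here is a minimal Python sketch. It follows our own reading of the abstract only: each sample carries two logits (no click, click), the calibration term is a two-way softmax cross-entropy, and the ranking term applies a softmax across the samples of one context group. The function name `jrc_style_loss`, the grouping, and the weight `alpha` are assumptions for illustration, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

def log_softmax(values: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax.
    m = max(values)
    log_z = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_z for v in values]

def jrc_style_loss(logits: Sequence[Tuple[float, float]],
                   labels: Sequence[int],
                   alpha: float = 0.5) -> float:
    """Hypothetical hybrid loss for one context group (e.g. one page view).

    logits[i] = (z0, z1): 'no click' and 'click' logits of sample i.
    labels[i] in {0, 1}.
    """
    n = len(logits)
    # Calibration term: pointwise two-way softmax cross-entropy, so that
    # softmax((z0, z1))[1] can still be read as a click probability.
    calib = -sum(log_softmax(z)[y] for z, y in zip(logits, labels)) / n
    # Ranking term: contrast each sample's logit for its own label with the
    # same logit of the other samples in the group (softmax across samples).
    rank = -sum(log_softmax([z[y] for z in logits])[i]
                for i, y in enumerate(labels)) / n
    return alpha * calib + (1.0 - alpha) * rank

logits = [(0.0, 2.0), (2.0, 0.0)]
print(jrc_style_loss(logits, [1, 0]) < jrc_style_loss(logits, [0, 1]))  # True
```

In this sketch, setting `alpha = 1.0` recovers a plain pointwise loss, while smaller values shift weight toward the within-group ranking term.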
# Single-qubit gate teleportation provides a quantum advantage ( http://arxiv.org/abs/2209.14158v2 )	Libor Caha, Xavier Coiteux-Roy, Robert Koenig	Gate-teleportation circuits are arguably among the most basic examples of computations believed to provide a quantum computational advantage: in seminal work [Quantum Inf. Comput., 4(2):134--145], Terhal and DiVincenzo showed that these circuits elude simulation by efficient classical algorithms under plausible complexity-theoretic assumptions. Here we consider possibilistic simulation [Phys. Rev. A 106, 062430 (2022)], a particularly weak form of this task where the goal is to output any string appearing with non-zero probability in the output distribution of the circuit. We show that even for single-qubit Clifford-gate-teleportation circuits, this simulation problem cannot be solved by constant-depth classical circuits with bounded fan-in gates. Our results are unconditional and are obtained by a reduction to the problem of computing parity, a well-studied problem in classical circuit complexity.	Translated: 2023-05-31 03:32:16 Published: 2023-05-28
# A Spatial-channel-temporal-fused Attention for Spiking Neural Networks ( http://arxiv.org/abs/2209.10837v3 )	Wuque Cai, Hongze Sun, Rui Liu, Yan Cui, Jun Wang, Yang Xia, Dezhong Yao, and Daqing Guo	Spiking neural networks (SNNs) mimic brain computational strategies and exhibit substantial capabilities in spatiotemporal information processing. As an essential factor for human perception, visual attention refers to the dynamic process of selecting salient regions in biological vision systems. Although visual attention mechanisms have achieved great success in computer vision applications, they are rarely introduced into SNNs. Inspired by experimental observations on predictive attentional remapping, in the present study we propose a new spatial-channel-temporal-fused attention (SCTFA) module that can guide SNNs to efficiently capture underlying target regions by utilizing accumulated historical spatial-channel information. Through a systematic evaluation on three event stream datasets (DVS Gesture, SL-Animals-DVS and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and two other SNN models with degenerated attention modules, but also achieves accuracy competitive with existing state-of-the-art methods.
Additionally, our detailed analysis shows that the proposed SCTFA-SNN model has strong robustness to noise and outstanding stability when faced with incomplete data, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that incorporating appropriate cognitive mechanisms of the brain may provide a promising approach to elevating the capabilities of SNNs.	Translated: 2023-05-31 03:30:58 Published: 2023-05-28
# Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths ( http://arxiv.org/abs/2210.07621v2 )	Xiangqing Shen, Siwei Wu, and Rui Xia	ATOMIC is a large-scale commonsense knowledge graph (CSKG) containing everyday if-then knowledge triplets, i.e., {head event, relation, tail event}. The one-hop annotation manner made ATOMIC a set of independent bipartite graphs, which ignores the numerous links between events in different bipartite graphs and consequently causes shortages in knowledge coverage and multi-hop paths. In this work, we aim to construct Dense-ATOMIC with high knowledge coverage and massive multi-hop paths. First, the events in ATOMIC are normalized to a consistent pattern. We then propose a CSKG completion method called Rel-CSKGC, which predicts the relation given the head event and the tail event of a triplet, and train a CSKG completion model on the existing triplets in ATOMIC. Finally, we use the model to complete the missing links in ATOMIC and thereby construct Dense-ATOMIC. Both automatic and human evaluation on an annotated subgraph of ATOMIC demonstrate the advantage of Rel-CSKGC over strong baselines.
We further conduct extensive evaluations of Dense-ATOMIC in terms of statistics, human evaluation, and simple downstream tasks, all of which demonstrate Dense-ATOMIC's advantages in knowledge coverage and multi-hop paths. The source code of Rel-CSKGC and Dense-ATOMIC itself are both publicly available at https://github.com/NUSTM/Dense-ATOMIC.	Translated: 2023-05-31 03:24:31 Published: 2023-05-28
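The point that one-hop annotation leaves ATOMIC as disconnected bipartite graphs, and that completing missing links creates multi-hop paths, can be illustrated with a toy count of two-hop paths. In this sketch the events, the relation name, and the stand-in `toy_predictor` are all hypothetical; in the paper, that role is played by the trained Rel-CSKGC model.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triplet = Tuple[str, str, str]  # (head event, relation, tail event)

def count_two_hop_paths(triplets: List[Triplet]) -> int:
    """Count paths a -> b -> c obtained by chaining two triplets."""
    out_edges: Dict[str, Set[str]] = {}
    for head, _, tail in triplets:
        out_edges.setdefault(head, set()).add(tail)
    return sum(len(out_edges.get(mid, set()))
               for tails in out_edges.values() for mid in tails)

def complete_links(triplets: List[Triplet],
                   candidates: List[Tuple[str, str]],
                   predict_relation: Callable[[str, str], Optional[str]]) -> List[Triplet]:
    """Add a (head, relation, tail) triplet for every candidate pair the predictor links."""
    completed = list(triplets)
    for head, tail in candidates:
        relation = predict_relation(head, tail)
        if relation is not None:
            completed.append((head, relation, tail))
    return completed

# Toy data: two one-hop annotations that share no event (hypothetical).
atomic = [
    ("PersonX buys a gift", "xIntent", "PersonX wants to please PersonY"),
    ("PersonX wraps the gift", "xEffect", "PersonX feels happy"),
]

def toy_predictor(head: str, tail: str) -> Optional[str]:
    # Hypothetical stand-in for a trained relation predictor.
    if (head, tail) == ("PersonX wants to please PersonY", "PersonX wraps the gift"):
        return "xWant"
    return None

dense = complete_links(atomic,
                       [("PersonX wants to please PersonY", "PersonX wraps the gift"),
                        ("PersonX buys a gift", "PersonX feels happy")],
                       toy_predictor)
print(count_two_hop_paths(atomic), count_two_hop_paths(dense))  # 0 2
```

A single predicted link between a tail event of one bipartite graph and a head event of another is enough to create paths that did not exist before.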
# VIMA: General Robot Manipulation with Multimodal Prompts ( http://arxiv.org/abs/2210.03094v2 )	Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan	Prompt-based learning has emerged as a successful paradigm in natural language processing, where a single general-purpose language model can be instructed to perform any task specified by input prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot demonstrations, following language instructions, and reaching visual goals. These are often considered different tasks and tackled by specialized models. We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts that interleave textual and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands of procedurally generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization. We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. VIMA features a recipe that achieves strong model scalability and data efficiency.
It outperforms alternative designs in the hardest zero-shot generalization setting, achieving up to $2.9\times$ higher task success rate given the same training data. With $10\times$ less training data, VIMA still performs $2.7\times$ better than the best competing variant. Code and video demos are available at https://vimalabs.github.io/	Translated: 2023-05-31 03:22:30 Published: 2023-05-28
# Nonparametric Decoding for Generative Retrieval ( http://arxiv.org/abs/2210.02068v3 )	Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vlad Karpukhin, Yi Lu, Minjoon Seo	A generative retrieval model depends solely on the information encoded in its model parameters without external memory, so its information capacity is limited and fixed. To overcome this limitation, we propose Nonparametric Decoding (Np Decoding), which can be applied to existing generative retrieval models. Np Decoding uses nonparametric contextualized vocab embeddings (external memory) rather than vanilla vocab embeddings as decoder vocab embeddings. By leveraging the contextualized vocab embeddings, the generative retrieval model is able to utilize both the parametric and nonparametric space. Evaluation over 9 datasets (8 single-hop and 1 multi-hop) in the document retrieval task shows that applying Np Decoding to generative retrieval models significantly improves their performance. We also show that Np Decoding is data- and parameter-efficient, and performs well in the zero-shot setting.	Translated: 2023-05-31 03:22:11 Published: 2023-05-28
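To make the difference between vanilla and contextualized vocab embeddings concrete, here is a minimal Python sketch of one greedy decoding step in which the decoder state is scored against an external memory of contextualized token embeddings instead of a fixed one-row-per-token vocabulary matrix. The memory layout (several embeddings per token, one per context) and taking the best-matching occurrence as the token score are assumptions made for illustration, not the paper's exact design; all names and vectors are hypothetical.

```python
from typing import Dict, List, Sequence

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def np_decoding_step(hidden: Sequence[float],
                     memory: Dict[str, List[Sequence[float]]]) -> str:
    """Greedy next-token choice against a nonparametric embedding memory.

    memory maps each token to the contextualized embeddings of its occurrences
    in the corpus; in this sketch a token's score is its best-matching occurrence.
    """
    scores = {tok: max(dot(hidden, e) for e in embs) for tok, embs in memory.items()}
    return max(scores, key=scores.get)

# Hypothetical 2-d embeddings: "paris" occurs in two different contexts.
memory = {"paris": [[1.0, 0.0], [0.2, 0.9]], "london": [[0.0, 1.0]]}
print(np_decoding_step([1.0, 0.0], memory))  # paris
print(np_decoding_step([0.1, 1.0], memory))  # london
```

With a vanilla vocab matrix, each token would have exactly one row; here the memory can hold many context-dependent rows per token, which is the capacity argument the abstract makes.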
# Perturbation Analysis of Neural Collapse ( http://arxiv.org/abs/2210.16658v2 )	Tom Tirer, Haoxiang Huang, Jonathan Niles-Weed	Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point. In this phase of training, a "neural collapse" behavior has been observed: the variability of features (outputs of the penultimate layer) of within-class samples decreases, and the mean features of different classes approach a certain tight frame structure. Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse. However, with practical networks and datasets, the features typically do not reach exact collapse, e.g., because deep layers cannot arbitrarily modify intermediate features that are far from being collapsed. In this paper, we propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix (e.g., intermediate features). We explore the model in the small-vicinity case via perturbation analysis and establish results that cannot be obtained by the previously studied models.
For example, we prove a reduction in the within-class variability of the optimized features compared to the predefined input features (by analyzing gradient flow on the "central path" with minimal assumptions), analyze the minimizers in the near-collapse regime, and provide insights into the effect of regularization hyperparameters on the closeness to collapse. We support our theory with experiments in practical deep learning settings.	Translated: 2023-05-31 03:15:13 Published: 2023-05-28
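Within-class variability is the quantity that shrinks under neural collapse. As a simple proxy, and not the exact metric used in the paper, the sketch below divides the average squared distance of features to their class mean by the spread of the class means around the global mean; values closer to zero mean the features are closer to collapse. The function names and toy features are hypothetical.

```python
from typing import Dict, List, Sequence

def mean(vectors: List[Sequence[float]]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))

def within_to_between_ratio(features: Dict[int, List[Sequence[float]]]) -> float:
    """features maps a class label to the penultimate-layer features of its samples."""
    class_means = {c: mean(f) for c, f in features.items()}
    global_mean = mean(list(class_means.values()))
    n_total = sum(len(f) for f in features.values())
    # Average squared distance of each sample to its own class mean.
    within = sum(sq_dist(x, class_means[c])
                 for c, f in features.items() for x in f) / n_total
    # Average squared distance of the class means to the global mean.
    between = sum(sq_dist(m, global_mean) for m in class_means.values()) / len(class_means)
    return within / between

spread = {0: [[1.0, 0.2], [1.0, -0.2]], 1: [[-1.0, 0.2], [-1.0, -0.2]]}
collapsed = {0: [[1.0, 0.0], [1.0, 0.0]], 1: [[-1.0, 0.0], [-1.0, 0.0]]}
print(within_to_between_ratio(spread))     # approximately 0.04
print(within_to_between_ratio(collapsed))  # 0.0
```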
# Solving Math Word Problems via Cooperative Reasoning induced Language Models ( http://arxiv.org/abs/2210.16257v4 )	Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Yongfeng Huang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang	Large-scale pre-trained language models (PLMs) bring new opportunities to challenging problems, especially those that need high-level intelligence, such as math word problems (MWPs). However, directly applying existing PLMs to MWPs can fail, as the generation process lacks sufficient supervision and thus lacks the fast adaptivity that humans show. We notice that human reasoning has a dual reasoning framework that consists of an immediate reaction system (system 1) and a delicate reasoning system (system 2), where the entire reasoning is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for generating reasoning paths, and the verifiers are used to supervise the evaluation in order to obtain reliable feedback for the generator. We evaluate our CoRe framework on several mathematical reasoning datasets and achieve decent improvement over state-of-the-art methods, with up to a 9.6% increase over the best baselines.
Our code is available at https://github.com/TianHongZXY/CoRe	Translated: 2023-05-31 03:14:33 Published: 2023-05-28
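The generator-verifier split can be sketched as a sample-then-rerank loop: a fast generator (system 1) proposes several reasoning paths, and a verifier (system 2) scores them so that the best one is kept. This is a simplified illustration of the cooperation idea only, assuming a single round of generation; the paper's training and search procedure is more involved, and `toy_generate` and `toy_verify` are hypothetical stand-ins for the two models.

```python
import operator
from typing import Callable, List, Tuple

def cooperative_solve(question: str,
                      generate: Callable[[str, int], List[str]],
                      verify: Callable[[str, str], float],
                      num_samples: int = 3) -> Tuple[str, float]:
    """System 1 proposes reasoning paths; system 2 scores them; keep the best."""
    candidates = generate(question, num_samples)
    scored = [(path, verify(question, path)) for path in candidates]
    return max(scored, key=lambda item: item[1])

# --- Toy stand-ins (hypothetical, for illustration only) ---
_OPS = {"+": operator.add, "*": operator.mul}

def toy_generate(question: str, n: int) -> List[str]:
    # Pretend these were sampled from a language model.
    return ["3 + 4 = 7, 7 * 2 = 14", "3 + 4 = 8, 8 * 2 = 16", "3 * 4 = 13"][:n]

def toy_verify(question: str, path: str) -> float:
    # Fraction of steps of the form "a op b = c" that are arithmetically correct.
    steps = path.split(", ")
    correct = 0
    for step in steps:
        lhs, rhs = step.split(" = ")
        a, op, b = lhs.split()
        correct += _OPS[op](int(a), int(b)) == int(rhs)
    return correct / len(steps)

best = cooperative_solve("Add 3 and 4, then double the result.", toy_generate, toy_verify)
print(best)  # ('3 + 4 = 7, 7 * 2 = 14', 1.0)
```

The verifier scores could also be fed back to the generator as a training signal, which is the feedback role the abstract assigns to system 2.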
# WaveBound: Dynamic Error Bounds for Stable Time Series Forecasting ( http://arxiv.org/abs/2210.14303v2 )	Youngin Cho, Daejin Kim, Dongmin Kim, Mohammad Azam Khan, Jaegul Choo	Time series forecasting has become a critical task due to its high practicality in real-world applications such as traffic, energy consumption, economics and finance, and disease analysis. Recent deep-learning-based approaches have shown remarkable success in time series forecasting. Nonetheless, due to the dynamics of time series data, deep networks still suffer from unstable training and overfitting. Inconsistent patterns appearing in real-world data bias the model toward particular patterns, thus limiting generalization. In this work, we introduce dynamic error bounds on the training loss to address the overfitting issue in time series forecasting. To this end, we propose a regularization method called WaveBound, which estimates adequate error bounds of the training loss for each time step and feature at each iteration. By allowing the model to focus less on unpredictable data, WaveBound stabilizes the training process, thus significantly improving generalization. Through extensive experiments, we show that WaveBound consistently improves upon existing models, including the state-of-the-art model, by large margins.	Translated: 2023-05-31 03:13:53 Published: 2023-05-28
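An error bound on the training loss can be sketched in the style of flooding: each per-entry loss (one time step, one feature) is compared with a lower bound, here taken as the loss of an exponential-moving-average (EMA) copy of the model minus a small margin `eps`. Above the bound the loss is unchanged; below it, the expression reflects the loss about the bound, so its gradient reverses and the model is discouraged from fitting that entry further. This is our simplified reading, with the EMA target and `eps` as assumptions, not the reference implementation; the function name is hypothetical.

```python
from typing import Sequence

def wavebound_style_loss(source_losses: Sequence[Sequence[float]],
                         target_losses: Sequence[Sequence[float]],
                         eps: float = 0.01) -> float:
    """Average bounded loss over a (time step x feature) grid.

    source_losses[t][f]: loss of the trained model at time step t, feature f.
    target_losses[t][f]: loss of an EMA copy of the model, treated as a constant.
    """
    total, count = 0.0, 0
    for src_row, tgt_row in zip(source_losses, target_losses):
        for src, tgt in zip(src_row, tgt_row):
            bound = tgt - eps
            # |src - bound| + bound equals src when src >= bound,
            # and 2 * bound - src otherwise (gradient ascent on that entry).
            total += abs(src - bound) + bound
            count += 1
    return total / count

print(wavebound_style_loss([[0.5, 0.1]], [[0.3, 0.3]], eps=0.1))  # approximately 0.4
```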
# Correlating sparse sensing for large-scale traffic speed estimation: A Laplacian-enhanced low-rank tensor kriging approach ( http://arxiv.org/abs/2210.11780v3 )	Tong Nie, Guoyang Qin, Yunpeng Wang, Jian Sun	Traffic speed is central to characterizing the fluidity of the road network. Many transportation applications rely on it, such as real-time navigation, dynamic route planning, and congestion management. Rapid advances in sensing and communication techniques make traffic speed detection easier than ever. However, due to the sparse deployment of static sensors or the low penetration of mobile sensors, the detected speeds are incomplete and far from usable network-wide. In addition, sensors are prone to errors or missing data for various reasons, so the speeds they report can be highly noisy. These drawbacks call for effective techniques to recover credible estimates from the incomplete data. In this work, we first identify the issue as a spatiotemporal kriging problem and propose a Laplacian-enhanced low-rank tensor completion (LETC) framework featuring both low-rankness and multi-dimensional correlations for large-scale traffic speed kriging under limited observations.
Specifically, three types of speed correlation, namely temporal continuity, temporal periodicity, and spatial proximity, are carefully chosen and simultaneously modeled by three different forms of graph Laplacian: the temporal graph Fourier transform, generalized temporal consistency regularization, and diffusion graph regularization. We then design an efficient solution algorithm via several effective numeric techniques to scale up the proposed model to network-wide kriging. Experiments on two public million-level traffic speed datasets show that the proposed LETC achieves state-of-the-art kriging performance even under low observation rates, while saving more than half of the computing time compared with baseline methods. Some insights into spatiotemporal traffic data modeling and kriging at the network level are provided as well.	Translated: 2023-05-31 03:12:30 Published: 2023-05-28
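Graph Laplacian regularization penalizes differences between values on connected nodes: for a symmetric adjacency matrix $A$ with degree matrix $D$, the Laplacian is $L = D - A$ and $x^\top L x = \sum_{i<j} A_{ij}(x_i - x_j)^2$. The Python sketch below checks this identity on a toy three-sensor road graph with hypothetical speeds, to show how a spatial-proximity term rewards smooth estimates across neighboring sensors. It illustrates only the generic Laplacian penalty, not the paper's specific regularizers or solver.

```python
from typing import List, Sequence

def laplacian(adj: Sequence[Sequence[float]]) -> List[List[float]]:
    # L = D - A, with D the diagonal degree matrix of the symmetric adjacency A.
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def quadratic_form(mat: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    # x^T M x
    n = len(x)
    return sum(x[i] * mat[i][j] * x[j] for i in range(n) for j in range(n))

def edge_penalty(adj: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    # Sum over edges of A_ij * (x_i - x_j)^2.
    n = len(x)
    return sum(adj[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(i + 1, n))

# Toy road segment with three sensors in a chain 0 - 1 - 2 (hypothetical speeds in km/h).
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
smooth = [60.0, 58.0, 57.0]
rough = [60.0, 20.0, 57.0]
print(quadratic_form(laplacian(adj), smooth), edge_penalty(adj, smooth))  # 5.0 5.0
print(quadratic_form(laplacian(adj), rough), edge_penalty(adj, rough))    # 2969.0 2969.0
```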
# SketchySGD: Reliable Stochastic Optimization via Randomized Curvature Estimates ( http://arxiv.org/abs/2211.08597v4 )	Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell	SketchySGD improves upon existing stochastic gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian and by introducing an automated stepsize that works well across a wide range of convex machine learning problems. We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum. Further, in the ill-conditioned setting, we show that SketchySGD converges at a faster rate than SGD for least-squares problems. We validate this improvement empirically with ridge regression experiments on real data. Numerical experiments on both ridge and logistic regression problems with dense and sparse data show that SketchySGD equipped with its default hyperparameters can achieve comparable or better results than popular stochastic gradient methods, even when those methods have been tuned to yield their best performance. In particular, SketchySGD is able to solve an ill-conditioned logistic regression problem with a data matrix that takes more than $840$GB of RAM to store, while its competitors, even when tuned, are unable to make any progress.
SketchySGD's ability to work out of the box with its default hyperparameters and to excel on ill-conditioned problems is an advantage over other stochastic gradient methods, most of which require careful hyperparameter tuning (especially of the learning rate) to obtain good performance and degrade in the presence of ill-conditioning.	Translated: 2023-05-31 03:06:10 Published: 2023-05-28
# mOKB6: A Multilingual Open Knowledge Base Completion Benchmark ( http://arxiv.org/abs/2211.06959v2 )	Shubham Mittal, Keshav Kolluru, Soumen Chakrabarti, Mausam	Automated completion of open knowledge bases (Open KBs), which are constructed from triples of the form (subject phrase, relation phrase, object phrase) obtained via an open information extraction (Open IE) system, is useful for discovering novel facts that may not be directly present in the text. However, research in Open KB completion (Open KBC) has so far been limited to resource-rich languages like English. Using the latest advances in multilingual Open IE, we construct the first multilingual Open KBC dataset, called mOKB6, containing facts from Wikipedia in six languages (including English). By improving the previous Open KB construction pipeline with multilingual coreference resolution and keeping only entity-linked triples, we create a dense Open KB. We experiment with several models for the task and observe a consistent benefit from combining languages with the help of a shared embedding space as well as translations of facts. We also observe that current multilingual models struggle to remember facts seen in languages of different scripts.	Translated: 2023-05-31 03:05:20 Published: 2023-05-28
# Diffusion Model Based Posterior Sampling for Noisy Linear Inverse Problems ( http://arxiv.org/abs/2211.12343v2 )	Xiangming Meng and Yoshiyuki Kabashima	We consider the ubiquitous linear inverse problems with additive Gaussian noise and propose an unsupervised sampling approach called diffusion model based posterior sampling (DMPS) to reconstruct the unknown signal from noisy linear measurements. Specifically, when one diffusion model (DM) is used as an implicit prior, the fundamental difficulty in performing posterior sampling is that the noise-perturbed likelihood score, i.e., the gradient of an annealed likelihood function, is intractable. To circumvent this problem, we introduce a simple yet effective closed-form approximation of it using an uninformative prior assumption. Extensive experiments are conducted on a variety of noisy linear inverse problems such as noisy super-resolution, denoising, deblurring, and colorization. In all tasks, the proposed DMPS demonstrates highly competitive or even better performance while being 3 times faster than the state-of-the-art competitor, diffusion posterior sampling (DPS). The code to reproduce the results is available at https://github.com/mengxiangming/dmps.	Translated: 2023-05-31 02:54:55 Published: 2023-05-28
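For intuition about what a closed-form approximation of the noise-perturbed likelihood score can look like, consider a scalar case under assumptions of our own: a variance-preserving forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, a measurement $y = a x_0 + n$ with $n \sim \mathcal{N}(0, \sigma^2)$, and a flat (uninformative) prior on $x_0$. Then $x_0 \mid x_t \sim \mathcal{N}(x_t/\sqrt{\bar\alpha_t}, (1-\bar\alpha_t)/\bar\alpha_t)$, so $y \mid x_t$ is Gaussian and $\nabla_{x_t}\log p(y \mid x_t) = \frac{a}{\sqrt{\bar\alpha_t}}\,\frac{y - a x_t/\sqrt{\bar\alpha_t}}{\sigma^2 + a^2(1-\bar\alpha_t)/\bar\alpha_t}$. The paper treats the general matrix case; this sketch, with a hypothetical function name, only illustrates the shape of such an approximation.

```python
import math

def approx_likelihood_score(x_t: float, y: float, a: float,
                            sigma: float, alpha_bar: float) -> float:
    """d/dx_t log p(y | x_t) for y = a * x0 + n under a flat prior on x0 (scalar case)."""
    scale = math.sqrt(alpha_bar)
    # Variance of y given x_t: measurement noise plus propagated uncertainty about x0.
    variance = sigma ** 2 + a ** 2 * (1.0 - alpha_bar) / alpha_bar
    residual = y - a * x_t / scale
    return (a / scale) * residual / variance

# The score vanishes where x_t / sqrt(alpha_bar) explains y exactly.
x_consistent = math.sqrt(0.5) / 2.0
print(approx_likelihood_score(x_consistent, y=1.0, a=2.0, sigma=0.1, alpha_bar=0.5))  # approximately 0
```

In a sampler, a score like this would be added to the diffusion model's prior score at each reverse step to steer samples toward the measurement.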
# Learning Heterogeneous Agent Cooperation via Multiagent League Training ( http://arxiv.org/abs/2211.11616v2 )	Qingxu Fu, Xiaolin Ai, Jianqiang Yi, Tenghai Qiu, Wanmai Yuan, Zhiqiang Pu	Many multiagent systems in the real world include multiple types of agents with different abilities and functionality. Such heterogeneous multiagent systems have significant practical advantages. However, compared with homogeneous systems, they also pose challenges for multiagent reinforcement learning, such as the non-stationarity problem and the policy version iteration issue. This work proposes a general-purpose reinforcement learning algorithm named Heterogeneous League Training (HLT) to address heterogeneous multiagent problems. HLT keeps track of a pool of policies that agents have explored during training, gathering a league of heterogeneous policies to facilitate future policy optimization. Moreover, a hyper-network is introduced to increase the diversity of agent behaviors when collaborating with teammates having different levels of cooperation skills. We use heterogeneous benchmark tasks to demonstrate that (1) HLT improves the success rate in cooperative heterogeneous tasks; (2) HLT is an effective approach to solving the policy version iteration problem; and (3) HLT provides a practical way to assess the difficulty of learning each role in a heterogeneous team.	Translated: 2023-05-31 02:54:21 Published: 2023-05-28
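The league idea, keeping frozen snapshots of the policies explored during training and drawing teammates from them, can be sketched with a small pool data structure. The class name `PolicyLeague`, the bounded pool size, and uniform sampling are hypothetical choices for illustration; the abstract does not say how HLT samples from its league, and the hyper-network component is omitted.

```python
import copy
import random
from typing import Any, Dict, List, Optional

class PolicyLeague:
    """Hypothetical pool of frozen policy snapshots, grouped by agent type."""

    def __init__(self, max_size: int = 10, seed: Optional[int] = None):
        self.max_size = max_size
        self.pool: Dict[str, List[Any]] = {}
        self.rng = random.Random(seed)

    def add_snapshot(self, agent_type: str, policy: Any) -> None:
        # Store a frozen copy so later training does not change the snapshot.
        snapshots = self.pool.setdefault(agent_type, [])
        snapshots.append(copy.deepcopy(policy))
        if len(snapshots) > self.max_size:
            snapshots.pop(0)  # drop the oldest snapshot

    def sample_teammate(self, agent_type: str) -> Any:
        # Teammates may come from any stored version, so the learner has to
        # cooperate with partners of varying skill, not only the latest policy.
        return self.rng.choice(self.pool[agent_type])

league = PolicyLeague(max_size=2, seed=0)
for version in range(3):
    league.add_snapshot("scout", {"version": version})
print([p["version"] for p in league.pool["scout"]])  # [1, 2]
```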
# Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss ( http://arxiv.org/abs/2211.10906v3 )	Lefan Zhang, Zhang-Hao Tian, Wujun Zhou, Wei Wang	The success of deep learning depends on large-scale and well-curated training data, while data in real-world applications are commonly long-tailed and noisy. Many methods have been proposed to deal with long-tailed data or noisy data, but only a few have been developed to tackle long-tailed noisy data. To solve this, we propose a robust method for learning from long-tailed noisy data with sample selection and balanced loss. Specifically, we separate the noisy training data into a clean labeled set and an unlabeled set with sample selection, and train the deep neural network in a semi-supervised manner with a balanced loss based on model bias. Extensive experiments on benchmarks demonstrate that our method outperforms existing state-of-the-art methods.	Translated: 2023-05-31 02:53:35 Published: 2023-05-28
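Sample selection for noisy labels is often done with a small-loss criterion. Under a long-tailed distribution, a single global threshold tends to discard tail-class samples, and one common remedy is to select per class. The sketch below keeps the lowest-loss fraction of each class as the clean labeled set and moves the rest to the unlabeled set. The per-class small-loss rule, `keep_ratio`, and the function name are assumptions for illustration; the paper's actual selection criterion and its balanced loss are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def split_clean_unlabeled(labels: Sequence[int], losses: Sequence[float],
                          keep_ratio: float = 0.5) -> Tuple[List[int], List[int]]:
    """Return (clean indices, unlabeled indices) using a per-class small-loss rule."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    clean: List[int] = []
    for indices in by_class.values():
        indices.sort(key=lambda i: losses[i])
        n_keep = max(1, round(keep_ratio * len(indices)))  # keep at least one per class
        clean.extend(indices[:n_keep])
    clean_set = set(clean)
    unlabeled = [i for i in range(len(labels)) if i not in clean_set]
    return sorted(clean), unlabeled

# Hypothetical per-sample losses: class 0 is the head class, class 1 the tail class.
labels = [0, 0, 0, 0, 1, 1]
losses = [0.1, 2.5, 0.3, 0.2, 0.4, 3.0]
print(split_clean_unlabeled(labels, losses))  # ([0, 3, 4], [1, 2, 5])
```

The unlabeled indices would then be used without their (possibly wrong) labels in the semi-supervised stage.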
# Modeling Instance Interactions for Joint Information Extraction with Neural High-Order Conditional Random Field ( http://arxiv.org/abs/2212.08929v2 )	Zixia Jia, Zhaohui Yan, Wenjuan Han, Zilong Zheng, Kewei Tu	Prior works on joint Information Extraction (IE) typically model interactions between instances (e.g., event triggers, entities, roles, relations) by representation enhancement, type dependency scoring, or global decoding. We find that previous models generally consider binary type dependency scoring of a pair of instances, and leverage local search such as beam search to approximate global solutions. To better integrate cross-instance interactions, in this work we introduce a joint IE framework (CRFIE) that formulates joint IE as a high-order Conditional Random Field. Specifically, we design binary factors and ternary factors to directly model interactions not only between pairs of instances but also among triplets. These factors are then used to jointly predict the labels of all instances. To address the intractability of exact high-order inference, we incorporate a high-order neural decoder that is unfolded from a mean-field variational inference method, which achieves consistent learning and inference. The experimental results show that our approach achieves consistent improvements on three IE tasks compared with our baseline and prior work.	Translated: 2023-05-31 02:47:18 Published: 2023-05-28
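Mean-field variational inference replaces the intractable joint distribution over instance labels with independent per-instance distributions, each updated iteratively from its unary scores plus expected scores from the factors it shares with other instances. The sketch below shows that iteration for unary and pairwise (binary) factors only; ternary factors, learned scores, and the unfolding of iterations into network layers are omitted, and all names and scores are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_field(unary: List[List[float]],
               pairwise: Dict[Tuple[int, int], List[List[float]]],
               iterations: int = 3) -> List[List[float]]:
    """unary[i][y]: score of label y for instance i.
    pairwise[(i, j)][y][y2]: score of labels (y, y2) for instances i and j.
    Returns approximate marginals q[i][y]."""
    q = [softmax(u) for u in unary]
    for _ in range(iterations):  # each iteration would be one layer when unfolded
        new_q = []
        for i, u in enumerate(unary):
            scores = list(u)
            for (a, b), table in pairwise.items():
                if a == i:  # expected pairwise score under the neighbour's marginal
                    for y in range(len(u)):
                        scores[y] += sum(q[b][y2] * table[y][y2] for y2 in range(len(q[b])))
                elif b == i:
                    for y in range(len(u)):
                        scores[y] += sum(q[a][y1] * table[y1][y] for y1 in range(len(q[a])))
            new_q.append(softmax(scores))
        q = new_q  # synchronous update of all instances
    return q

unary = [[2.0, 0.0], [0.0, 0.0]]               # instance 0 leans to label 0; instance 1 is undecided
pairwise = {(0, 1): [[2.0, 0.0], [0.0, 2.0]]}  # bonus when both instances take the same label
q = mean_field(unary, pairwise)
print(q[1][0] > 0.5)  # True
```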
# 離散ウェーブレット変換と敵対的生成ネットワークに基づくカラー文書画像の3段階二値化 Three-stage binarization of color document images based on discrete wavelet transform and generative adversarial networks ( http://arxiv.org/abs/2211.16098v3 ) ライセンス: Link先を確認	Yu-Shian Lin, Rui-Yang Ju, Chih-Chia Chen, Ting-Yu Lin, Jen-Shiun Chiang	(参考訳) 劣化したカラー文書画像において前景のテキスト情報を背景から効率的にセグメンテーションすることは、活発な研究課題である。古文書の保存が長期にわたって不完全なため、染色、黄化、インクの浸出など様々な種類の劣化が画像二値化の結果に深刻な影響を与えている。本稿では, 離散ウェーブレット変換 (DWT) とGAN (Generative Adversarial Network) を用いて, 劣化したカラー文書画像の画像強調と二値化を行う3段階手法を提案する。ステージ1では、DWTを用いてLLサブバンド画像を保持し、画像強調を実現する。ステージ2では、元の入力画像を4つのシングルチャネル画像(赤、緑、青、灰色)に分割し、それぞれを用いて独立した敵対的ネットワークを訓練する。訓練された敵対的ネットワークモデルを用いて、画像から色前景情報を抽出する。ステージ3では、グローバルな特徴とローカルな特徴を組み合わせるために、ステージ2からの出力画像と元の入力画像を用いて、文書二値化のための独立した敵対的ネットワークを訓練する。実験の結果,提案手法は文書画像二値化コンテスト(DIBCO)データセットにおいて,多くの古典的手法や従来のSOTA法よりも優れていた。私たちは実装コードをhttps://github.com/abcpp12383/ThreeStageBinarizationでリリースします。 Efficient segmentation of foreground text from the background in degraded color document images is an active research topic. Because ancient documents are imperfectly preserved over long periods of time, various types of degradation, including staining, yellowing, and ink seepage, seriously affect the results of image binarization. In this paper, a three-stage method is proposed for image enhancement and binarization of degraded color document images using the discrete wavelet transform (DWT) and generative adversarial networks (GANs). In Stage-1, we apply the DWT and retain the LL subband images to enhance the image. In Stage-2, the original input image is split into four (Red, Green, Blue and Gray) single-channel images, each of which is used to train an independent adversarial network. The trained adversarial network models are used to extract the color foreground information from the images. In Stage-3, in order to combine global and local features, the output image from Stage-2 and the original input image are used to train an independent adversarial network for document binarization. The experimental results demonstrate that our proposed method outperforms many classical and state-of-the-art (SOTA) methods on the Document Image Binarization Contest (DIBCO) dataset. We release our implementation code at https://github.com/abcpp12383/ThreeStageBinarization.	翻訳日:2023-05-31 02:45:50 公開日:2023-05-28
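Stage-1 keeps only the LL subband of the DWT. The abstract does not say which wavelet or how many decomposition levels are used, so the following is a minimal sketch assuming a single-level 2D Haar transform on a grayscale image stored as a list of rows; it is not the authors' implementation.

```python
def haar_ll(image):
    """LL (approximation) subband of a single-level 2D Haar DWT.

    `image` is a list of equal-length rows with even height and width.
    With the orthonormal Haar filter each LL coefficient equals
    (a + b + c + d) / 2 for the 2x2 block [[a, b], [c, d]].
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    return [
        [
            (image[i][j] + image[i][j + 1] + image[i + 1][j] + image[i + 1][j + 1]) / 2
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


# A toy 4x4 grayscale patch: a dark corner, a bright corner, and a flat region.
patch = [
    [10, 12, 200, 202],
    [14, 16, 198, 196],
    [50, 50, 0, 0],
    [50, 50, 0, 0],
]
print(haar_ll(patch))  # [[26.0, 398.0], [100.0, 0.0]]
```

Each LL coefficient is a scaled 2x2 average, so fine-grained variation such as small stains is attenuated, at the cost of halving the resolution in each direction.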
# 敵対的ロバスト性が精度格差に及ぼす影響の理解 Understanding the Impact of Adversarial Robustness on Accuracy Disparity ( http://arxiv.org/abs/2211.15762v2 ) ライセンス: Link先を確認	Yuzheng Hu, Fan Wu, Hongyang Zhang, Han Zhao	(参考訳) 敵対的ロバスト性は標準的な精度に反する可能性があり、異なるクラスにさらに異なる影響を与える可能性があることは、長い間経験的に観察されてきたが、そのような観察がどの程度成り立つのか、またクラスの不均衡がそこでどのような役割を果たすのかについては、未解決の問題である。本稿では,ガウス混合モデルの下で線形分類器を詳しく検討することにより,この精度格差の問題を解明しようとする。本研究は, 敵対的ロバスト性の影響を, 頑健性制約による全クラスにおける標準精度を低下させる固有の効果と, 標準トレーニングと比較して精度の相違を増大させるクラス不均衡比によって引き起こされる影響の2つに分解する。さらに,データモデルを安定分布の一般族に一般化することにより,そのような効果がガウス混合モデルを超えて広がることを示す。より具体的には、敵対的ロバスト性の制約はバランスのとれたクラス設定の標準的精度を一貫して低下させるが、クラス不均衡比は、安定分布の裾の重さのために、精度格差においてガウスの場合とは根本的に異なる役割を果たすことを示す。さらに,合成データと実世界のデータの両方について実験を行い,理論的な知見を裏付ける。また,実世界のデータセット上での非線形モデルにも影響が及ぶ可能性が示唆された。私たちのコードはGitHubでhttps://github.com/Accuracy-Disparity/AT-on-ADで公開されています。 While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and what role class imbalance plays. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear classifiers under a Gaussian mixture model. We decompose the impact of adversarial robustness into two parts: an inherent effect that will degrade the standard accuracy on all classes due to the robustness constraint, and another effect, caused by the class imbalance ratio, that will increase the accuracy disparity compared to standard training. Furthermore, we show that such effects extend beyond the Gaussian mixture model by generalizing our data model to the general family of stable distributions. More specifically, we demonstrate that while the constraint of adversarial robustness consistently degrades the standard accuracy in the balanced class setting, the class imbalance ratio plays a fundamentally different role in accuracy disparity compared to the Gaussian case, due to the heavy tail of the stable distribution. We additionally perform experiments on both synthetic and real-world datasets to corroborate our theoretical findings. Our empirical results also suggest that the implications may extend to nonlinear models over real-world datasets. Our code is publicly available on GitHub at https://github.com/Accuracy-Disparity/AT-on-AD.	翻訳日:2023-05-31 02:45:26 公開日:2023-05-28
# 宇宙デブリのための量子重力センサ Quantum Gravitational Sensor for Space Debris ( http://arxiv.org/abs/2211.15695v2 ) ライセンス: Link先を確認	Meng-Zhi Wu, Marko Toroš, Sougato Bose, Anupam Mazumdar	(参考訳) 物質波干渉計は、等価原理や重力の量子性をテストするなど、重力実験の基本的な応用がある。さらに、物質波干渉計を量子センサとして使用して、外部の巨大な移動物体による局所重力加速度を測定することで、技術応用に役立てることができる。本稿では,外部移動物体からの重力勾配信号を記述するための3次元モデルを構築し,Stern-Gerlach セットアップに基づく物質波干渉計による達成可能な感度を理論的に検討する。応用として、メソスコピック干渉(MIMAC)と重力波検出法(New J. Phys. 22 083012 (2020))について検討し、周波数空間解析を用いて重力勾配に対する感度を定量化する。我々は,地上実験の近くにある物体と衛星近傍の宇宙デブリを考察し,その距離,速度,方向の関数として物体の最小検出可能な質量を推定する。 Matter-wave interferometers have fundamental applications for gravity experiments such as testing the equivalence principle and the quantum nature of gravity. In addition, matter-wave interferometers can be used as quantum sensors to measure the local gravitational acceleration caused by external massive moving objects, which lends them to technological applications. In this paper, we will establish a three-dimensional model to describe the gravity gradient signal from an external moving object, and theoretically investigate the achievable sensitivities using the matter-wave interferometer based on the Stern-Gerlach set-up. As an application we will consider the Mesoscopic Interference for Metric and Curvature (MIMAC) and Gravitational wave detection scheme [New J. Phys. 22, 083012 (2020)] and quantify its sensitivity to gravity gradients using frequency-space analysis. We will consider objects near Earth-based experiments and space debris in proximity of satellites and estimate the minimum detectable mass of the object as a function of its distance, velocity, and orientation.	翻訳日:2023-05-31 02:44:59 公開日:2023-05-28
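To get a feel for why the minimum detectable mass depends so strongly on distance, here is an order-of-magnitude sketch. It is an assumption-laden simplification, not the paper's model: it treats the object as a static Newtonian point mass (radial gradient $2GM/r^3$) and ignores velocity, orientation, and the frequency-space analysis; the sensitivity value is hypothetical.

```python
G = 6.674e-11  # Newtonian gravitational constant, m^3 kg^-1 s^-2


def radial_gradient(mass_kg: float, distance_m: float) -> float:
    """Radial gravity-gradient component 2GM/r^3 (units s^-2) of a point mass."""
    return 2 * G * mass_kg / distance_m ** 3


def min_detectable_mass(gradient_sensitivity: float, distance_m: float) -> float:
    """Smallest static point mass whose radial gradient equals the sensitivity."""
    return gradient_sensitivity * distance_m ** 3 / (2 * G)


# Hypothetical gradient sensitivity, chosen only for illustration.
sensitivity = 1e-12  # s^-2
for r in (10.0, 100.0, 1000.0):
    print(f"r = {r:7.1f} m -> M_min ~ {min_detectable_mass(sensitivity, r):.3e} kg")
```

Because of the $r^3$ factor, every tenfold increase in distance raises the detectable mass a thousandfold in this static estimate.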
# perturb初期特徴:半教師付きノード分類のためのスパース特徴に基づくニューラルネットワークの一般化 Perturb Initial Features: Generalization of Neural Networks Under Sparse Features for Semi-supervised Node Classification ( http://arxiv.org/abs/2211.15081v7 ) ライセンス: Link先を確認	Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim	(参考訳) グラフニューラルネットワーク(GNN)は、半教師付き設定で一般的に使用される。これまでの研究は主に、ホモ親和性グラフとヘテロ親和性グラフの両方でよく機能する適切なグラフフィルタ(例えばアグリゲーション法)の発見に重点を置いてきた。これらの手法は有効であるが、初期データがゼロでない要素をほとんど含まないという、ノード特徴量のスパース性に悩まされることがある。これは、トレーニングサンプルがグラフフィルタ(超平面)の全範囲をカバーしていないため、最初の射影行列の特定の次元で過度に適合する可能性がある。そこで本研究では,新しいデータ拡張戦略を提案する。具体的には、初期特徴と超平面の両方を反転させることで、学習可能なパラメータをより正確に更新し、推論中に目に見えない特徴の堅牢性を向上する訓練スペースを構築する。私たちの知る限りでは、これは初期特徴によって引き起こされる過学習を軽減する最初の試みです。実世界のデータセットに対する大規模な実験により,提案手法によりノード分類精度が最大46.5%(相対値)向上した。 Graph neural networks (GNNs) are commonly used in semi-supervised settings. Previous research has primarily focused on finding appropriate graph filters (e.g., aggregation methods) to perform well on both homophilic and heterophilic graphs. While these methods are effective, they can still suffer from the sparsity of node features, where the initial data contain few non-zero elements. This can lead to overfitting in certain dimensions of the first projection matrix, as training samples may not cover the entire range of graph filters (hyperplanes). To address this, we propose a novel data augmentation strategy. Specifically, by flipping both the initial features and hyperplane, we create additional space for training, which leads to more precise updates of the learnable parameters and improved robustness for unseen features during inference. To the best of our knowledge, this is the first attempt to mitigate the overfitting caused by the initial features. Extensive experiments on real-world datasets show that our proposed technique increases node classification accuracy by up to 46.5% (relative).	翻訳日:2023-05-31 02:44:41 公開日:2023-05-28
# 信頼できない言語モデル:パラメトリックおよび非パラメトリック記憶の有効性と限界を探る When Not to Trust Language Models: Investigating Effectiveness and Limitations of Parametric and Non-Parametric Memories ( http://arxiv.org/abs/2212.10511v2 ) ライセンス: Link先を確認	Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi	(参考訳) 大きな言語モデル(LM)は、多種多様なタスクにおける印象的なパフォーマンスにもかかわらず、豊かな世界の知識を必要とするタスクに苦戦し、豊富な世界の知識を符号化するためにパラメータのみに依存するという制限を暗示している。本稿では,10モデルと4つの拡張手法を用いた大規模知識探索実験をPopQA上で実施することにより,事実知識の記憶におけるLMの強みと限界を理解することを目的とする。 LMは、あまり一般的でない事実知識に苦しむとともに、長期にわたる事実知識の記憶の改善に失敗する。そして, 検索拡張されたLMは, 大容量のLMよりもはるかに優れており, 高人気エンティティに関する問題では, LMの非支援が競争力を維持していることを示す。これらの結果に基づき,非パラメトリック記憶を必要時にのみ検索できる,強力かつ効率的な検索型lms法を考案した。実験結果から,モデルの性能が大幅に向上し,推論コストが低減された。 Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments of 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail. We then show that retrieval-augmented LMs largely outperform orders of magnitude larger LMs, while unassisted LMs remain competitive in questions about high-popularity entities. Based on those findings, we devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing the inference costs.	翻訳日:2023-05-31 02:35:57 公開日:2023-05-28
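The abstract says the proposed method retrieves non-parametric memories only when necessary, and that unassisted LMs remain competitive on high-popularity entities. A minimal sketch of such a gate, assuming the decision is a popularity threshold (the abstract does not give the exact criterion) and with all callables as hypothetical stand-ins for an LM and a retriever:

```python
from typing import Callable, List


def adaptive_answer(
    question: str,
    popularity: int,
    threshold: int,
    lm_answer: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    lm_answer_with_docs: Callable[[str, List[str]], str],
) -> str:
    """Answer from parametric memory for popular entities; retrieve otherwise.

    `popularity` is a proxy for how well-known the question's subject entity
    is (for example, page views); `threshold` is a tuned cut-off. The three
    callables are hypothetical placeholders, not a real library API.
    """
    if popularity >= threshold:
        return lm_answer(question)  # unassisted LM is competitive here
    docs = retrieve(question)  # long-tail entity: consult non-parametric memory
    return lm_answer_with_docs(question, docs)
```

Skipping retrieval for popular entities is what saves inference cost; the threshold would in practice be chosen on held-out data.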
# Naamapadam: インド諸語のための大規模固有表現アノテーション付きデータ Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages ( http://arxiv.org/abs/2212.10168v2 ) ライセンス: Link先を確認	Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M. Khapra, Pratyush Kumar, Rudra Murthy V, Anoop Kunchukuttan	(参考訳) 本稿では、2つの語族に属するインドの主要11言語を対象とした、公開されているものとしては最大の固有表現認識(NER)データセットであるNaamapadamを紹介する。このデータセットには、11言語中9言語について、3つの標準エンティティカテゴリ(Person、Location、Organization)の合計10万以上のエンティティが注釈付けされた40万以上の文が含まれている。トレーニングデータセットは、英語文から対応するインド語翻訳に自動的にタグ付けされたエンティティを投影することにより、Samanantar並列コーパスから自動的に作成される。また、9言語用に手動でアノテーション付きのテストセットを作成します。 Naamapadam-testデータセット上で得られたデータセットの有用性を示す。また、Naamapadamトレーニングセットで微調整した多言語IndicBERTモデルであるIndicNERを公開する。 IndicNERは、9つのテスト言語のうち7つで80以上のF1スコアを達成している。データセットとモデルは、https://ai4bharat.iitm.ac.in/naamapadamでオープンソースライセンスで利用できる。 We present Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. The dataset contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization) for 9 out of the 11 languages. The training dataset has been automatically created from the Samanantar parallel corpus by projecting automatically tagged entities from an English sentence to the corresponding Indian language translation. We also create manually annotated test sets for 9 languages. We demonstrate the utility of the obtained dataset on the Naamapadam-test dataset. We also release IndicNER, a multilingual IndicBERT model fine-tuned on the Naamapadam training set. IndicNER achieves an F1 score of more than 80 for 7 out of 9 test languages. The dataset and models are available under open-source licences at https://ai4bharat.iitm.ac.in/naamapadam.	翻訳日:2023-05-31 02:35:38 公開日:2023-05-28
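The training set is built by projecting automatically tagged English entities onto the Indian-language translation. The abstract does not describe the projection step itself; a minimal sketch, assuming word alignments are already available and using a simple leftmost-to-rightmost span heuristic (our assumption, not necessarily the authors' procedure), is:

```python
def bio_spans(tags):
    """Return (start, end, type) spans, end inclusive, from a BIO tag list."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        # Close the open span unless this tag continues it.
        if start is not None and not (tag.startswith("I-") and tag[2:] == etype):
            spans.append((start, i - 1, etype))
            start = None
        # Open a new span on B-, or on a stray I- that follows no span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def project_tags(src_tags, alignment, tgt_len):
    """Project source-side BIO tags onto target tokens via word alignments.

    alignment: (source_index, target_index) pairs, e.g. from a word aligner.
    Each source entity becomes the contiguous target span from its leftmost
    to its rightmost aligned token; entities with no aligned token are dropped.
    """
    tgt_tags = ["O"] * tgt_len
    for start, end, etype in bio_spans(src_tags):
        targets = sorted(t for s, t in alignment if start <= s <= end)
        if not targets:
            continue
        lo, hi = targets[0], targets[-1]
        tgt_tags[lo] = "B-" + etype
        for k in range(lo + 1, hi + 1):
            tgt_tags[k] = "I-" + etype
    return tgt_tags


# English "Sachin lives in Mumbai" -> a 5-token Hindi translation in which
# the person is token 0 and the city is token 1 (alignments are hand-made).
src = ["B-PER", "O", "O", "B-LOC"]
align = [(0, 0), (3, 1), (2, 2), (1, 3), (1, 4)]
print(project_tags(src, align, 5))  # ['B-PER', 'B-LOC', 'O', 'O', 'O']
```

If two projected spans overlap, the later one overwrites the earlier one here; a real pipeline would need a policy for such conflicts and for noisy alignments.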
# マルチアスペクト制御可能なテキスト生成のための拡張可能なプラグアンドプレイ法 An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation ( http://arxiv.org/abs/2212.09387v2 ) ライセンス: Link先を確認	Xuancheng Huang, Zijun Liu, Peng Li, Tao Li, Maosong Sun, Yang Liu	(参考訳) 近年、複数の側面(感情、話題、キーワードなど)で生成されたテキストを制御するマルチアスペクト制御可能なテキスト生成が注目されている。プレフィックスチューニングのようなパラメータ効率のよいチューニングに基づく手法は、プラグ・アンド・プレイ方式でマルチアスペクト制御を実現することができるが、複数のプレフィックスの相互干渉は、制約を著しく劣化させ、トレーニング時に見られなかったアスペクトの組み合わせへの拡張性を制限する。本研究では, 干渉の理論的下界を示し, プレフィックスが挿入される層数に応じて干渉が増加することを実証的に見出した。これらの分析に基づいて,増大する干渉を抑えるために,トレーニング可能なゲートを用いてプレフィックスの介入を正規化することを提案する。その結果、対応するプラグインを単に連結するだけで、トレーニング時に見られなかったアスペクトの組み合わせを制御でき、新しい制約を低コストで追加できる。さらに,カテゴリ的制約と自由形式制約の両方を統一的に処理する方法を提案する。テキスト生成と機械翻訳の実験により、制約精度、テキスト品質、拡張性においてベースラインに対する我々のアプローチの優位性が示された。 Recently, multi-aspect controllable text generation that controls the generated text in multiple aspects (e.g., sentiment, topic, and keywords) has attracted increasing attention. Although methods based on parameter-efficient tuning, such as prefix-tuning, can achieve multi-aspect control in a plug-and-play way, the mutual interference of multiple prefixes leads to significant degeneration of constraints and limits their extensibility to aspect combinations unseen during training. In this work, we provide a theoretical lower bound for the interference and empirically find that the interference grows with the number of layers where prefixes are inserted. Based on these analyses, we propose using trainable gates to normalize the intervention of prefixes to restrain the growing interference. As a result, aspect combinations unseen during training can be controlled by simply concatenating the corresponding plugins, so that new constraints can be added at a lower cost. In addition, we propose a unified way to process both categorical and free-form constraints. Experiments on text generation and machine translation demonstrate the superiority of our approach over baselines in constraint accuracy, text quality, and extensibility.	翻訳日:2023-05-31 02:34:12 公開日:2023-05-28
# 有限結果空間上の相対確率:その公理化、性質および応用に関する体系的検討 Relative Probability on Finite Outcome Spaces: A Systematic Examination of its Axiomatization, Properties, and Applications ( http://arxiv.org/abs/2212.14555v3 ) ライセンス: Link先を確認	Max Sklar	(参考訳) この研究は、確率を絶対測度ではなく相対測度として捉えることを提案する。この概念を実証するために, 有限結果空間に着目し, 相対確率関数の要件を定める3つの基本公理を考案する。次に、これらの関数の例のライブラリとそれらを構成するシステムを提供します。さらに、ベイズ推論の相対版とそのデジタル実装について議論する。最後に、相対確率空間の位相閉包を証明し、限界の下で情報を保存する能力を強調した。 This work proposes a view of probability as a relative measure rather than an absolute one. To demonstrate this concept, we focus on finite outcome spaces and develop three fundamental axioms that establish requirements for relative probability functions. We then provide a library of examples of these functions and a system for composing them. Additionally, we discuss a relative version of Bayesian inference and its digital implementation. Finally, we prove the topological closure of the relative probability space, highlighting its ability to preserve information under limits.	翻訳日:2023-05-31 02:27:16 公開日:2023-05-28
# スピン-1/2等方性ハイゼンベルククラスター中の量子傷 Quantum scars in spin-1/2 isotropic Heisenberg clusters ( http://arxiv.org/abs/2212.12362v2 ) ライセンス: Link先を確認	G. Zhang and Z. Song	(参考訳) 鎖、梯子、正方格子、三角格子を含むスピン1/2等方性ハイゼンベルククラスターにおいて、エネルギー準位の統計と固有状態の塔に及ぼす外部場の影響を調べる。一方向の一様場が存在する場合、系のSU(2)対称性により、スペクトルのほぼ全体が同一の準位間隔を持つ多数の塔から構成される。有限クラスターでの厳密対角化により、他の2方向のランダムな横磁場が準位統計をポアソン分布からウィグナー・ダイソン分布へと変化させ(平均準位間隔比の値は異なる)、可積分性から非可積分性への遷移を示すことがわかる。しかし、3種類のクラスターにおいて、対称性が破れても最大の塔は近似的に保たれ、量子傷跡が生じることが判明した。顕著なことに、非熱化状態はグリーンベルガー・ホーン・ツァイリンガー(GHZ)状態とW状態を含み、これらは動的過程においてリバイバルの特徴を保つ一方、ネール状態は急速に減衰する。また, 実験的検出のためのいくつかの動的スキームも提案する。我々の発見は、有限サイズの量子スピンクラスターにおいて熱化の影響を受けない量子情報処理の可能性を明らかにする。 We investigate the influence of external fields on the statistics of energy levels and towers of eigenstates in spin-1/2 isotropic Heisenberg clusters, including chain, ladder, square and triangular lattices. In the presence of a uniform field in one direction, the SU(2) symmetry of the system means that almost the whole spectrum consists of a large number of towers with identical level spacing. Exact diagonalization on finite clusters shows that random transverse fields in the other two directions drive the level statistics from Poisson to Wigner-Dyson distributions with different values of the mean level spacing ratio, indicating the transition from integrability to non-integrability. However, for the three types of clusters, we find that the largest tower still holds approximately even when the symmetry is broken, resulting in a quantum scar. Remarkably, the non-thermalized states include the Greenberger-Horne-Zeilinger and W states, which retain revival behavior while a Néel state decays quickly in the dynamic processes. In addition, some dynamic schemes for experimental detection are proposed. Our finding reveals the possibility of quantum information processing that is immune to thermalization in finite-size quantum spin clusters.	翻訳日:2023-05-31 02:26:07 公開日:2023-05-28
# Parsel: 分解による言語モデルとのアルゴリズム推論 Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions ( http://arxiv.org/abs/2212.10561v3 ) ライセンス: Link先を確認	Eric Zelikman, Qian Huang, Gabriel Poesia, Noah D. Goodman, Nick Haber	(参考訳) 最近の大規模言語モデル(LLM)による推論の成功にもかかわらず、LLMは複雑なプログラムの生成のような階層的な多段階推論タスクに苦戦する。これらのタスクでは、人間はしばしば高レベルなアルゴリズム設計から始めて、各部分を徐々に実装する。コードLLMによる複雑なアルゴリズムの自動実装と検証を可能にするフレームワークであるParselを紹介する。 Parselでは、アルゴリズムタスクを階層的な自然言語関数記述に自動的に分解し、テストを使って可能な関数実装の組み合わせを探索する。プログラム合成やロボット計画など,階層的推論を必要とする領域でParselを使用できることを示す。 Parselを使用することで、LLMはAPPSデータセットの競技レベルの問題をより多く解き、AlphaCodeとCodexから直接サンプリングした従来の結果よりもパス率が75%以上高くなり、しかも多くの場合より小さいサンプル予算で済むことが分かった。さらに、自動生成されたテストを用いると、ParselはHumanEvalにおける最先端のpass@1性能を67%から85%に改善できる。また, Parselを用いてLLMが生成したロボット計画は, 直接生成した計画に比べて正確と判断される可能性が2倍以上高いことがわかった。最後に、ParselがLLMの制限にどう対処するかを検討し、Parselが人間のプログラマにとってどのように役立つかについて議論する。コードをhttps://github.com/ezelikman/parselでリリースします。 Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs. With Parsel, we automatically decompose algorithmic tasks into hierarchical natural language function descriptions and then search over combinations of possible function implementations using tests. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis and robotic planning. We find that, using Parsel, LLMs solve more competition-level problems in the APPS dataset, resulting in pass rates over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. Moreover, with automatically generated tests, we find that Parsel can improve the state-of-the-art pass@1 performance on HumanEval from 67% to 85%. We also find that LLM-generated robotic plans using Parsel are more than twice as likely to be considered accurate as directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers. We release our code at https://github.com/ezelikman/parsel.	翻訳日:2023-05-31 02:25:06 公開日:2023-05-28
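Parsel searches over combinations of candidate function implementations and keeps one that passes the tests. A minimal sketch of that idea, assuming a plain exhaustive product search (we do not claim this matches how Parsel orders or prunes its search; the candidates and tests are hypothetical):

```python
from itertools import product
from typing import Callable, Dict, List, Optional


def search_implementations(
    candidates: Dict[str, List[Callable]],
    tests: List[Callable[[Dict[str, Callable]], bool]],
) -> Optional[Dict[str, Callable]]:
    """Return the first combination of implementations passing every test.

    candidates: function name -> candidate implementations (e.g. several
        LLM samples generated from one natural-language description).
    tests: predicates that receive a {name: implementation} mapping.
    """
    names = list(candidates)
    for combo in product(*(candidates[name] for name in names)):
        impls = dict(zip(names, combo))
        try:
            if all(test(impls) for test in tests):
                return impls
        except Exception:
            continue  # a crashing candidate simply counts as a failure
    return None


# Hypothetical example: two functions, two sampled implementations each.
candidates = {
    "square": [lambda x: x + x, lambda x: x * x],
    "total": [max, sum],
}
tests = [lambda f: f["total"]([f["square"](x) for x in [1, 2, 3]]) == 14]
best = search_implementations(candidates, tests)
print(best["square"](4), best["total"]([1, 2]))  # prints: 16 3
```

The number of combinations grows multiplicatively with the number of candidates per function, which is why testing functions as a composed whole, rather than in isolation, quickly becomes the expensive part.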
# 数保存型散逸量子状態生成の反応拡散ダイナミクス Reaction-diffusive dynamics of number-conserving dissipative quantum state preparation ( http://arxiv.org/abs/2301.05258v3 ) ライセンス: Link先を確認	P. A. Nosov, D. S. Shapiro, M. Goldstein, I. S. Burmistrov	(参考訳) 非自明な量子多体相関状態を制御して生成するために散逸を用いることは、基礎的にも実用的にも大きな関心事である。閉じた系では拡散的な広がりをもたらす粒子数保存を課すと、どのような結果になるのか? 本研究では,一方のバンドを空にして他方のバンドを占有させることを目指す散逸ダイナミクスをもつ2バンド系の典型的なモデル(以前にトポロジカル状態の散逸安定化のために導入されたもの)について,この問いを検討する。散逸ダイナミクスの平均場的取り扱いを超えて, 中間的な長さ・時間スケールで粒子とホールの密度モードに拡散的な領域が現れ, 興味深いことにそれが外部場に対する非線形応答でのみ励起されうることを示す。また,最長の長さ・時間スケールにおいてこのモードの拡散的振る舞いを制限する過程も同定する。驚くべきことに、これらの過程はフィッシャー-コルモゴロフ-ペトロフスキー-ピスクノフ方程式によって支配される反応拡散ダイナミクスをもたらし、設計された暗状態が有限の粒子密度とホール密度を持つ状態に向かって不安定になることがわかった。 The use of dissipation for the controlled creation of nontrivial quantum many-body correlated states is of great fundamental and practical interest. What is the result of imposing number conservation, which, in closed systems, gives rise to diffusive spreading? We investigate this question for a paradigmatic model of a two-band system, with dissipative dynamics aiming to empty one band and to populate the other, which was previously introduced for the dissipative stabilization of topological states. Going beyond the mean-field treatment of the dissipative dynamics, we demonstrate the emergence of a diffusive regime for the particle and hole density modes at intermediate length- and time-scales, which, interestingly, can only be excited in nonlinear response to external fields. We also identify processes that limit the diffusive behavior of this mode at the longest length- and time-scales. Strikingly, we find that these processes lead to reaction-diffusion dynamics governed by the Fisher-Kolmogorov-Petrovsky-Piskunov equation, making the designed dark state unstable towards a state with a finite particle and hole density.	翻訳日:2023-05-31 02:15:30 公開日:2023-05-28
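The Fisher-KPP equation named in the abstract has the textbook 1D form $\partial_t n = D\,\partial_x^2 n + r\,n(1-n)$, in which the state $n = 0$ is unstable and localized initial data develop a front that asymptotically invades it at speed $2\sqrt{Dr}$. The paper's actual equation for particle and hole densities, and its coefficients, are not given in the abstract; the sketch below only integrates this generic form with placeholder values $D = r = 1$ to show the kind of front such dynamics produce.

```python
def simulate_fkpp(n0, D=1.0, r=1.0, dx=0.5, dt=0.05, steps=200):
    """Integrate dn/dt = D d2n/dx2 + r n (1 - n) on a 1D grid.

    Explicit Euler in time, central differences in space, zero-flux
    boundaries implemented by reusing the edge value as the ghost cell.
    """
    if D * dt / dx ** 2 > 0.5:
        raise ValueError("explicit scheme unstable: need D*dt/dx**2 <= 0.5")
    n = list(n0)
    size = len(n)
    for _ in range(steps):
        lap = [
            (n[max(i - 1, 0)] - 2 * n[i] + n[min(i + 1, size - 1)]) / dx ** 2
            for i in range(size)
        ]
        n = [n[i] + dt * (D * lap[i] + r * n[i] * (1 - n[i])) for i in range(size)]
    return n


# A filled region (n = 1) on the left invades the empty, unstable state n = 0.
initial = [1.0] * 5 + [0.0] * 95
final = simulate_fkpp(initial)
print(sum(v > 0.5 for v in final))  # grid cells already taken over by the front
```

With these placeholder values the front moves at roughly two length units per unit time, so after $t = 10$ it has advanced several tens of grid cells while the far right of the domain is still essentially empty.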
# 命令生成モデルのためのタスク指向認知能力の定義、評価、改善 Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models ( http://arxiv.org/abs/2301.05149v2 ) ライセンス: Link先を確認	Lingjun Zhao and Khanh Nguyen and Hal Daumé III	(参考訳) 最近の研究は、人間向けに設計された心理テストを通して言語モデルの認知能力を研究している。これらの研究は、これらのモデルの一般的な能力を理解するのに役立つが、テストに合格するのに十分な能力を持つモデルが、実際のタスクを実行する際にその能力を実際に使用するという保証はない。本研究では,言語モデルがタスクの実行に活用する人間に似た認知能力である,タスク指向認知能力を定式化する。これらの能力は、(i)優れた候補発話を迅速に生成する能力(探索能力)と、(ii)聞き手がそれらの発話をどのように解釈するかを予測し、最も適切なものを選ぶ能力(語用論的能力)である。言語モデルのこれらの能力を人間のものと比較するための評価スキームを設計する。ナビゲーション指示生成問題において様々なモデルを調べるためにこのスキームを適用すると,それらの語用論的能力が著しく不足していることが分かる。この知見に基づき、より優れた聞き手モデルでそれらを強化することで、実際の人間を誘導する成功率が11%大きく向上した。我々の研究は、言語モデルを人間と整合させるための原理的な手続きを提唱する。その手続きは、(i)タスク指向能力を定式化すること、(ii)その不足を定量化する方法を考案すること、(iii)反復的に改善すること、からなる。 Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability) and (ii) the ability to predict how a listener interprets those utterances and choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener, which yields a significant boost of 11% in success rate in guiding real humans. Our work advocates for having a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.	翻訳日:2023-05-31 02:15:11 公開日:2023-05-28
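The two capabilities split naturally into proposing candidate utterances (search) and reranking them with a model of the listener (pragmatics). A minimal sketch of the reranking step, assuming the candidates are already generated and `listener_prob` is a hypothetical listener model returning the probability that a listener who hears an instruction reaches the intended target:

```python
from typing import Callable, Sequence


def choose_instruction(
    candidates: Sequence[str],
    listener_prob: Callable[[str, str], float],
    intended_target: str,
) -> str:
    """Pick the candidate a listener model is most likely to follow correctly.

    candidates: utterances proposed by the speaker model (search capability).
    listener_prob(u, t): estimated probability that a listener who hears `u`
        ends up at target `t` (hypothetical listener model; pragmatic capability).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda u: listener_prob(u, intended_target))


# Toy listener model given as a lookup table of made-up probabilities.
table = {
    ("turn left", "kitchen"): 0.2,
    ("turn left at the red door", "kitchen"): 0.9,
    ("go", "kitchen"): 0.1,
}
pick = choose_instruction(list({u for u, _ in table}), lambda u, t: table[(u, t)], "kitchen")
print(pick)  # turn left at the red door
```

In this framing, "augmenting with better models of the listener" amounts to improving `listener_prob`, while the candidate generator can stay unchanged.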
# 量子軌道に沿った幾何学的位相 Geometric phases along quantum trajectories ( http://arxiv.org/abs/2301.04222v4 ) ライセンス: Link先を確認	Ludmila Viotti, Ana Laura Gramajo, Paula I. Villar, Fernando C. Lombardo, Rosario Fazio	(参考訳) ハミルトニアンを支配するパラメータの循環的発展を受ける監視量子系は、系がその発展においてたどる量子軌道に依存する幾何学的位相を蓄積する。位相の値は、ユニタリダイナミクスと、系と環境の相互作用の両方によって決定される。したがって、幾何学的位相はランダムな量子ジャンプの発生により確率的性質を帯びる。本稿では,監視量子系における幾何学的位相の分布関数を調べ,開量子系における幾何学的位相を測定するために提案された様々な量が,どのような場合にその分布を代表するかについて議論する。また,監視されたエコープロトコルを考え,実験で抽出される干渉パターンの分布が幾何学的位相と関連するのはどのような場合かを議論する。さらに, 量子ジャンプを示さない単一軌道について, 1サイクル後に獲得される位相にトポロジカル転移が現れることを明らかにし, この臨界挙動がエコープロトコルでどのように観測できるかを示す。同じパラメータに対して、密度行列は特異性を示さない。外部環境の存在下で時間変化する磁場に浸されたスピン1/2という典型例を考察して,主要な結果をすべて示す。しかしながら、我々の解析の主な結果は非常に一般的であり、その定性的特徴は研究したモデルの選択に依存しない。 A monitored quantum system undergoing a cyclic evolution of the parameters governing its Hamiltonian accumulates a geometric phase that depends on the quantum trajectory followed by the system during its evolution. The phase value is determined both by the unitary dynamics and by the interaction of the system with the environment. Consequently, the geometric phase acquires a stochastic character due to the occurrence of random quantum jumps. Here we study the distribution function of geometric phases in monitored quantum systems and discuss whether and when different quantities, proposed to measure geometric phases in open quantum systems, are representative of the distribution. We also consider a monitored echo protocol and discuss in which cases the distribution of the interference pattern extracted in the experiment is linked to the geometric phase. Furthermore, we unveil, for the single trajectory exhibiting no quantum jumps, a topological transition in the phase acquired after a cycle and show how this critical behavior can be observed in an echo protocol. For the same parameters, the density matrix does not show any singularity. We illustrate all our main results by considering a paradigmatic case, a spin-1/2 immersed in a time-varying magnetic field in the presence of an external environment. The major outcomes of our analysis are, however, quite general and do not depend, in their qualitative features, on the choice of the model studied.	翻訳日:2023-05-31 02:14:50 公開日:2023-05-28
# 最大最適性マージン:文脈線形計画法と逆線形計画法の統一的アプローチ Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming ( http://arxiv.org/abs/2301.11260v2 ) ライセンス: Link先を確認	Chunlin Sun, Shang Liu, Xiaocheng Li	(参考訳) 本稿では,機械学習による予測タスクの出力を下流の最適化問題の入力(例えば線形計画問題の目的係数ベクトル)として用いる,予測してから最適化する(predict-then-optimize)問題を研究する。この問題は予測分析や文脈線形計画法としても知られている。既存のアプローチの多くは、(i)最適化の難しさ(非凸目的関数)や統計的非効率性(準最適な汎化上界)、あるいは (ii)制約がないことや損失校正などの強い条件を必要とすること、のいずれかに悩まされている。我々は、下流最適化の最適性条件によって機械学習の損失関数を設計する「最大最適性マージン(maximum optimality margin)」と呼ばれる新しいアプローチを開発する。このmax-margin定式化は、計算効率と学習手順の良好な理論的性質の両方を備える。さらに重要なことに,本手法は目的関数ではなく学習データにおける最適解の観測のみを必要とするため,文脈あり・文脈なし両方の設定における逆線形計画問題に対する新しく自然なアプローチとなる。また,オフライン・オンライン両方の設定で提案手法を解析し,数値実験によりその性能を示す。 In this paper, we study the predict-then-optimize problem where the output of a machine learning prediction task is used as the input of some downstream optimization problem, e.g., the objective coefficient vector of a linear program. The problem is also known as predictive analytics or contextual linear programming. The existing approaches largely suffer from either (i) optimization intractability (a non-convex objective function)/statistical inefficiency (a suboptimal generalization bound), or (ii) requiring strong conditions such as no constraints or loss calibration. We develop a new approach to the problem called maximum optimality margin, which designs the machine learning loss function by the optimality condition of the downstream optimization. The max-margin formulation enjoys both computational efficiency and good theoretical properties for the learning procedure. More importantly, our new approach only needs observations of the optimal solution in the training data rather than the objective function, which makes it a new and natural approach to the inverse linear programming problem under both contextual and context-free settings; we also analyze the proposed method under both offline and online settings, and demonstrate its performance using numerical experiments.	翻訳日:2023-05-31 02:08:14 公開日:2023-05-28
# DIFFormer:エネルギー制約拡散によるスケーラブル(グラフ)トランスフォーマー DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion ( http://arxiv.org/abs/2301.09474v4 ) ライセンス: Link先を確認	Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf, Junchi Yan	(参考訳) 現実世界のデータ生成には、しばしばインスタンス間の複雑な相互依存があり、標準的な学習パラダイムのIIDデータ仮説に反し、望ましいインスタンス表現を学習するための幾何学的構造を明らかにする上での課題となる。この目的のために、データセットからのインスタンスのバッチを、相互作用を通じて他のインスタンスの情報を段階的に取り込む進化状態へとエンコードする、エネルギー制約拡散モデルを導入する。拡散過程は、潜在構造上のインスタンス表現の大域的一貫性を特徴づける原理的なエネルギー関数に関する降下条件によって制約される。我々は、任意のインスタンスペア間の対ごとの拡散強度について閉形式の最適推定を導く厳密な理論を示し、これによりDIFFormer (diffusion-based Transformers)と呼ばれる新しいクラスのニューラルエンコーダが得られる。DIFFormerには、インスタンス数が膨大な場合に向けた線形計算量の単純版と、複雑な構造を学習するための高度版の2つの具体化がある。実験では,大規模グラフのノード分類,半教師付き画像/テキスト分類,時空間ダイナミクス予測など,様々なタスクにおいて優れた性能を持つ汎用エンコーダバックボーンとして,本モデルの幅広い適用性が示された。 Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures for learning desired instance representations. To this end, we introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information through their interactions. The diffusion process is constrained by descent criteria with respect to a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitively large numbers of instances, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction.	翻訳日:2023-05-31 02:06:09 公開日:2023-05-28
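The abstract mentions a simple DIFFormer variant with linear complexity. A minimal sketch, assuming the pairwise diffusion strength takes the form $s_{ij} = 1 + \hat z_i^\top \hat z_j$ with L2-normalized states $\hat z$ (our assumption about the closed form; see the paper for the exact expression), shows why the update $\sum_j s_{ij} v_j / \sum_j s_{ij}$ need not be computed pair by pair: the sums over $j$ can be precomputed once and shared by every $i$.

```python
import math


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]  # assumes a non-zero vector


def diffusion_quadratic(z, v):
    """Reference O(N^2) update: out_i = sum_j s_ij v_j / sum_j s_ij."""
    zh = [_normalize(row) for row in z]
    out = []
    for i in range(len(z)):
        s = [1 + sum(a * b for a, b in zip(zh[i], zh[j])) for j in range(len(z))]
        total = sum(s)  # >= 2 because s_ii = 2 and every s_ij >= 0
        out.append([sum(s[j] * v[j][k] for j in range(len(z))) / total
                    for k in range(len(v[0]))])
    return out


def diffusion_linear(z, v):
    """Same update in O(N) time by precomputing the sums over j once."""
    zh = [_normalize(row) for row in z]
    n, d, dv = len(z), len(z[0]), len(v[0])
    v_sum = [sum(v[j][k] for j in range(n)) for k in range(dv)]
    z_sum = [sum(zh[j][a] for j in range(n)) for a in range(d)]
    # kv[a][k] = sum_j zh[j][a] * v[j][k], a d-by-dv matrix shared by all i
    kv = [[sum(zh[j][a] * v[j][k] for j in range(n)) for k in range(dv)]
          for a in range(d)]
    out = []
    for i in range(n):
        denom = n + sum(zh[i][a] * z_sum[a] for a in range(d))
        num = [v_sum[k] + sum(zh[i][a] * kv[a][k] for a in range(d))
               for k in range(dv)]
        out.append([x / denom for x in num])
    return out
```

Only a d-by-dv matrix and two vectors aggregate over all N instances, so for fixed feature sizes the cost grows linearly in N instead of quadratically.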
# 1次元長距離量子球面モデルにおける絡み合いギャップ Entanglement gap in 1D long-range quantum spherical models ( http://arxiv.org/abs/2301.09143v2 ) ライセンス: Link先を確認	Sascha Wald, Raul Arias, Vincenzo Alba	(参考訳) 本研究では1次元長距離量子球面モデル(QSM)における絡み合いギャップの有限サイズスケーリングについて検討する。熱力学的極限が明確に定義される弱い長距離QSMに焦点をあてる。このモデルは連続相転移を示し、常磁性相と強磁性相を分離する。遷移の普遍性クラスは長距離指数$\alpha$に依存する。熱力学的極限では、絡み合いギャップは常磁性相では有限であり、強磁性相では消滅することを示す。強磁性相では、絡み合いギャップは標準的な磁気相関関数によって理解される。エンタングルメントギャップは$\delta\xi\simeq C_\alpha L^{-(1/2-\alpha/4)}$で減衰し、定数$C_\alpha$はモデルの低エネルギー特性に依存する。これは、分散の下部が長距離物理学の影響を受けていることを反映する。最後に、高次元の場合とは対照的に、エンタングルメントギャップのスケーリングには乗法的な対数補正が現れない。 We investigate the finite-size scaling of the entanglement gap in the one-dimensional long-range quantum spherical model (QSM). We focus on the weak long-range QSM, for which the thermodynamic limit is well-defined. This model exhibits a continuous phase transition separating a paramagnetic phase from a ferromagnetic phase. The universality class of the transition depends on the long-range exponent $\alpha$. We show that in the thermodynamic limit the entanglement gap is finite in the paramagnetic phase, and it vanishes in the ferromagnetic phase. In the ferromagnetic phase the entanglement gap is understood in terms of standard magnetic correlation functions. The entanglement gap decays as $\delta\xi\simeq C_\alpha L^{-(1/2-\alpha/4)}$, where the constant $C_\alpha$ depends on the low-energy properties of the model. This reflects the fact that the lower part of the dispersion is affected by the long-range physics. Finally, multiplicative logarithmic corrections are absent in the scaling of the entanglement gap, in contrast with the higher-dimensional case.	翻訳日:2023-05-31 02:05:30 公開日:2023-05-28
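The quoted law $\delta\xi\simeq C_\alpha L^{-(1/2-\alpha/4)}$ is a pure power law, since the abstract states that no multiplicative logarithmic corrections appear. A small sketch (with a placeholder $C_\alpha = 1$) shows how the exponent follows from $\alpha$ and how it would be read off as a log-log slope from two system sizes; with numerically computed gaps, a drift of that slope with $L$ would signal corrections to scaling.

```python
import math


def gap_exponent(alpha):
    """Exponent p in delta_xi ~ C_alpha * L**(-p), with p = 1/2 - alpha/4."""
    return 0.5 - alpha / 4


def entanglement_gap(L, alpha, c_alpha=1.0):
    """Leading finite-size form; c_alpha = 1.0 is a placeholder constant."""
    return c_alpha * L ** (-gap_exponent(alpha))


def fitted_exponent(L1, gap1, L2, gap2):
    """Exponent recovered from two system sizes via the log-log slope."""
    return -math.log(gap2 / gap1) / math.log(L2 / L1)


alpha = 1.5
g_small, g_large = entanglement_gap(64, alpha), entanglement_gap(1024, alpha)
print(gap_exponent(alpha), fitted_exponent(64, g_small, 1024, g_large))
```

For $\alpha < 2$ the exponent is positive, so the gap closes as $L$ grows, consistent with the vanishing gap in the ferromagnetic phase.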
# 階層的蒸留による事前学習言語モデルからCifに基づく音声認識への知識伝達 Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation ( http://arxiv.org/abs/2301.13003v2 ) ライセンス: Link先を確認	Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu	(参考訳) 大規模事前学習型言語モデル(PLM)は自然言語処理タスクにおいて大きな可能性を示している。 PLMの能力を活用して自動音声認識(ASR)システムを強化することも有望な研究方向として現れている。しかし, 従来の研究は PLM の非曲げ構造と PLM の不十分な利用によって制限される可能性がある。これらの問題を緩和するため,CIFモデルに基づく階層的知識蒸留(HKD)を提案する。 plmからasrモデルに知識を移すため、hkdは音響レベルでの対照的な損失を伴うクロスモーダル知識蒸留と、言語レベルでの回帰損失を伴う知識蒸留を用いる。従来のCIFモデルと比較すると,AISHELL-1 と LibriSpeech の相対誤差率の 15% と 9% の削減を実現している。 Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks. Leveraging the capabilities of PLMs to enhance automatic speech recognition (ASR) systems has also emerged as a promising research direction. However, previous works may be limited by the inflexible structures of PLMs and the insufficient utilization of PLMs. To alleviate these problems, we propose the hierarchical knowledge distillation (HKD) on the continuous integrate-and-fire (CIF) based ASR models. To transfer knowledge from PLMs to the ASR models, HKD employs cross-modal knowledge distillation with contrastive loss at the acoustic level and knowledge distillation with regression loss at the linguistic level. Compared with the original CIF-based model, our method achieves 15% and 9% relative error rate reduction on the AISHELL-1 and LibriSpeech datasets, respectively.	翻訳日:2023-05-31 01:57:46 公開日:2023-05-28
# テーマ駆動型キーフレーズ抽出によるソーシャルメディア談話の分析 Theme-driven Keyphrase Extraction to Analyze Social Media Discourse ( http://arxiv.org/abs/2301.11508v2 ) ライセンス: Link先を確認	William Romano, Omar Sharif, Madhusudan Basak, Joseph Gatto, and Sarah Preum	(参考訳) ソーシャルメディアプラットフォームは、自己報告された健康体験を共有する上で重要なリソースであり、さまざまな健康トピックに関する豊富なデータを提供する。大規模ソーシャルメディアデータ分析を可能にする自然言語処理(nlp)の進歩にもかかわらず、健康関連コンテンツにキーフレーズ抽出を適用することにはギャップがある。キーワード抽出は、定義済みのエンティティクラスに制約されることなく、ソーシャルメディアの会話における健全な概念を特定するために使用される。本稿では,ユーザが生成した健康テキストから臨床に関連のあるキーフレーズを捉えるための先駆的アプローチとして,ソーシャルメディア用にカスタマイズされたテーマ駆動キーフレーズ抽出フレームワークを提案する。テーマは抽出タスクの目的によって決定される広いカテゴリとして定義される。テーマ駆動型キーフレーズ抽出の新たな課題を定式化し,オピオイド使用障害の治療にソーシャルメディアテキストを効率的にマイニングする可能性を示す。本稿では,ソーシャルメディアデータから実行可能な洞察を抽出し,最小教師付きNLPモデルを用いてキーフレーズを効率的に抽出する可能性を示す。我々の貢献は、テーマ駆動型キーフレーズ抽出のための新しいデータ収集とキュレーションフレームワークの開発と、Redditコミュニティから人間注釈付きキーフレーズからなるMOUD-キーフレーズの作成である。また、ソーシャルメディアデータからキーフレーズを効率的に抽出するための最小教師付きNLPモデルのスコープも同定する。最後に,大規模言語モデル(chatgpt)が教師なしキーフレーズ抽出モデルよりも優れており,その効果を評価した。 Social media platforms are vital resources for sharing self-reported health experiences, offering rich data on various health topics. Despite advancements in Natural Language Processing (NLP) enabling large-scale social media data analysis, a gap remains in applying keyphrase extraction to health-related content. Keyphrase extraction is used to identify salient concepts in social media discourse without being constrained by predefined entity classes. This paper introduces a theme-driven keyphrase extraction framework tailored for social media, a pioneering approach designed to capture clinically relevant keyphrases from user-generated health texts. Themes are defined as broad categories determined by the objectives of the extraction task. We formulate this novel task of theme-driven keyphrase extraction and demonstrate its potential for efficiently mining social media text for the use case of treatment for opioid use disorder. 
This paper leverages qualitative and quantitative analysis to demonstrate the feasibility of extracting actionable insights from social media data and efficiently extracting keyphrases using minimally supervised NLP models. Our contributions include the development of a novel data collection and curation framework for theme-driven keyphrase extraction and the creation of MOUD-Keyphrase, the first dataset of its kind comprising human-annotated keyphrases from a Reddit community. We also identify the scope of minimally supervised NLP models to extract keyphrases from social media data efficiently. Lastly, we found that a large language model (ChatGPT) outperforms unsupervised keyphrase extraction models, and we evaluate its efficacy in this task.	翻訳日:2023-05-31 01:55:16 公開日:2023-05-28
# 経済深層学習モデルを用いたIoTボットネットの検出 IoT Botnet Detection Using an Economic Deep Learning Model ( http://arxiv.org/abs/2302.02013v4 ) ライセンス: Link先を確認	Nelly Elsayed, Zag ElSayed, Magdy Bayoumi	(参考訳) 技術の革新と流通の急速な進歩は、この10年間で増加している。世界中のIoT(Internet of Things)システムの急速な成長は、悪意のあるサードパーティが生み出したネットワークセキュリティ上の課題を増大させている。したがって、セキュリティ上の懸念やIoTシステムの制限を考慮に入れた、信頼性の高い侵入検知とネットワークフォサイシクスシステムは、そのようなシステムを保護する上で不可欠である。 IoTボットネット攻撃は企業や個人にとって重要な脅威のひとつだ。そこで本稿では,IoTボットネット攻撃を検知する経済的深層学習モデルを提案する。提案手法は, 実装予算を小さくし, 訓練および検出プロセスを高速化することで, 最先端検出モデルよりも高い精度を達成した。 Technology innovation, usage, and distribution have progressed rapidly over the last decade. The rapid growth of Internet of Things (IoT) systems worldwide has increased the network security challenges posed by malicious third parties. Thus, reliable intrusion detection and network forensics systems that account for both security concerns and the limitations of IoT systems are essential to protect such systems. IoT botnet attacks are a significant threat to enterprises and individuals. This paper therefore proposes an economic deep learning-based model for detecting IoT botnet attacks as well as other types of attacks. The proposed model achieves higher accuracy than state-of-the-art detection models while using a smaller implementation budget and accelerating the training and detection processes.	翻訳日:2023-05-31 01:50:23 公開日:2023-05-28
# 逆摂動に対するランダム化アンサンブルのロバスト性について On the Robustness of Randomized Ensembles to Adversarial Perturbations ( http://arxiv.org/abs/2302.01375v3 ) ライセンス: Link先を確認	Hassan Dbouk, Naresh R. Shanbhag	(参考訳) 1つの分類器が推論中にランダムに選択されるランダム化アンサンブル分類器(recs)は、計算要件が限定された可逆的ロバスト分類器を実現する伝統的な意味付け手法の魅力的な代替として登場した。しかし、最近の研究は、RECの構築方法が当初主張していたよりも脆弱であることを示し、「RECはいつ有用か?」「限界は何か?」「どのようにトレーニングするのか?」といった根本的な疑問を提起している。本研究では,recsの理論的限界,有用であるために必要な条件等に関する基礎的な結果が導出され,まずrecsを非神秘化する。この新たな理解を活用して、ロバストなRECをトレーニングするための新しいブースティングアルゴリズム(BARRE)を提案し、さまざまなネットワークアーキテクチャやデータセットにまたがる強い$\ell_\infty$ノルムバウンドな敵に対する防御効果を実証的に実証する。私たちのコードはhttps://github.com/hsndbk4/BARREで参照できます。 Randomized ensemble classifiers (RECs), where one classifier is randomly selected during inference, have emerged as an attractive alternative to traditional ensembling methods for realizing adversarially robust classifiers with limited compute requirements. However, recent works have shown that existing methods for constructing RECs are more vulnerable than initially claimed, casting major doubts on their efficacy and prompting fundamental questions such as: "When are RECs useful?", "What are their limits?", and "How do we train them?". In this work, we first demystify RECs as we derive fundamental results regarding their theoretical limits, necessary and sufficient conditions for them to be useful, and more. Leveraging this new understanding, we propose a new boosting algorithm (BARRE) for training robust RECs, and empirically demonstrate its effectiveness at defending against strong $\ell_\infty$ norm-bounded adversaries across various network architectures and datasets. Our code can be found at https://github.com/hsndbk4/BARRE.	翻訳日:2023-05-31 01:49:50 公開日:2023-05-28
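In a REC, inference draws one member according to a probability vector. By linearity, the expected clean accuracy is therefore the probability-weighted average of the members' accuracies. The sketch below uses hypothetical toy classifiers to illustrate only this sampling rule. It does not implement BARRE or any adversarial attack.

```python
import numpy as np

def rec_predict(members, alpha, x, rng):
    """Randomized ensemble inference: draw member i with probability alpha[i] and use only it."""
    i = rng.choice(len(members), p=alpha)
    return members[i](x)

def expected_accuracy(members, alpha, xs, ys):
    """Exact accuracy averaged over the random member choice (no Monte Carlo needed)."""
    per_member = np.array([np.mean([m(x) == y for x, y in zip(xs, ys)]) for m in members])
    return float(np.dot(alpha, per_member))

# Hypothetical toy task: the label is the parity of an integer input.
good = lambda x: x % 2        # always correct
bad = lambda x: 1 - x % 2     # always wrong
xs = list(range(10))
ys = [x % 2 for x in range(10)]
alpha = [0.7, 0.3]

rng = np.random.default_rng(0)
mc = np.mean([rec_predict([good, bad], alpha, x, rng) == y
              for _ in range(2000) for x, y in zip(xs, ys)])
print(expected_accuracy([good, bad], alpha, xs, ys), mc)
```

The Monte Carlo estimate `mc` should hover around the exact value of 0.7. Per the abstract, questions about when RECs actually help concern the adversarial setting, where this simple averaging argument alone does not settle robustness.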
# 二重等分散による新しいノードと新しい関係型のインダクティブリンク予測 Inductive Link Prediction for Both New Nodes and New Relation Types via Double Equivariance ( http://arxiv.org/abs/2302.01313v5 ) ライセンス: Link先を確認	Jianfei Gao, Yangze Zhou, Jincheng Zhou, Bruno Ribeiro	(参考訳) 近年のリレーショナルラーニングの進歩にもかかわらず、新しいノードとテストにおける新しい関係型を持つ離散属性多重グラフにおける帰納的リンク予測の課題は未解決の問題である。本研究は,ノードの同一性とエッジ関係の両方の置換に同値な,二重交換性の概念とそれに関連する二重置換同変グラフニューラルネットワークを定義することで,この問題に取り組む。我々のニューラルネットワークは、訓練ノードと関係から任意に新しいテストノードと関係へと誘導的に一般化できる関係の構造的表現を課し、適応や再訓練を必要とせず、関係学習における新たな方向性を可能にする。最後に、このような二重同値表現に対する一般的な青写真を導入し、既存の作品が正確に実行できない2つの実世界のベンチマークで実証的にその能力を示す。 Despite recent advances in relational learning, the task of inductive link prediction in discrete attributed multigraphs with both new nodes and new relation types at test time remains an open problem. In this work, we tackle this task by defining the concept of double exchangeability and its associated double-permutation equivariant graph neural network, which is equivariant to permutations of both node identities and edge relations. Our neural architecture imposes a structural representation of relations that can inductively generalize from training nodes and relations to arbitrarily new test nodes and relations, without the need for adaptation or retraining, thus enabling a new direction in relational learning. Finally, we introduce a general blueprint for such double equivariant representations and empirically showcase its capability on two proposed real-world benchmarks on which no existing method performs accurately.	翻訳日:2023-05-31 01:49:28 公開日:2023-05-28
# 実演による専門知識の理解: オフライン逆強化学習のための最大可能性フレームワーク Understanding Expertise through Demonstrations: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning ( http://arxiv.org/abs/2302.07457v2 ) ライセンス: Link先を確認	Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong	(参考訳) オフライン逆強化学習(オフラインirl)は、専門家エージェントによる固定された有限のデモンストレーションで観察された動作を裏付ける報酬と環境ダイナミクスの構造を回復することを目的としている。タスクの実行に関する専門知識の正確なモデルは、臨床意思決定や自動運転といった安全性に敏感な応用に応用できる。しかし、観察された行動において暗黙的な専門家の選好の構造は、専門家の環境力学のモデル(すなわち「世界」)と密接に関連している。したがって、限られた範囲の有限データから得られた世界の不正確なモデルは、推定報酬において不正確を複雑にする可能性がある。この問題に対処するため,我々は,専門家の政策(下位レベル)の保守的モデルに基づいて上層レベルが最大化されるような推定タスクの2レベル最適化手法を提案する。政策モデルは、世界の推定モデルの不確実性の増大するペナルティの対象となる報酬を最大化するという点で保守的である。本稿では,二段階最適化問題の定式化を解いた新しいアルゴリズムフレームワークを提案し,関連する報酬推定器の性能の統計的および計算的保証を提供する。最後に、提案アルゴリズムは、MuJoCoの連続制御タスクとD4RLベンチマークの異なるデータセットに対して、最先端のオフラインIRLと模倣学習ベンチマークを大きなマージンで上回ることを示す。 Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving. However, the structure of an expert's preferences implicit in observed actions is closely linked to the expert's model of the environment dynamics (i.e., the "world"). Thus, inaccurate models of the world obtained from finite data with limited coverage could compound inaccuracy in estimated rewards. To address this issue, we propose a bi-level optimization formulation of the estimation task wherein the upper level is likelihood maximization based upon a conservative model of the expert's policy (lower level). The policy model is conservative in that it maximizes reward subject to a penalty that increases with the uncertainty of the estimated model of the world. 
We propose a new algorithmic framework to solve the bi-level optimization problem formulation and provide statistical and computational guarantees of performance for the associated reward estimator. Finally, we demonstrate that the proposed algorithm outperforms the state-of-the-art offline IRL and imitation learning benchmarks by a large margin, over the continuous control tasks in MuJoCo and different datasets in the D4RL benchmark.	翻訳日:2023-05-31 01:38:23 公開日:2023-05-28
# 外乱認識対象検出のための正規化フローベース特徴合成 Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection ( http://arxiv.org/abs/2302.07106v3 ) ライセンス: Link先を確認	Nishant Kumar, Siniša Šegvić, Abouzar Eslami, Stefan Gumhold	(参考訳) 自律運転のようなアプリケーションには、信頼性の高いオブジェクト検出器の現実的な展開が不可欠である。しかし、Faster R-CNNのような汎用オブジェクト検出器は、不整形物体の過信予測を提供する傾向にある。最近の異常物体検出手法は, クラス条件ガウシアンによるインスタンスワイド特徴の密度を推定し, 低様領域から合成外乱特徴を訓練する。しかし、この戦略は、合成された外層特徴が他のクラス条件ガウス多様体に従えば低い確率を持つことを保証しない。そこで本研究では,すべてのイリアークラスの合同データ分布を可逆正規化フローで学習することにより,イリアーとイリアーオブジェクトを区別する,新しい外れ値認識型オブジェクト検出フレームワークを提案する。フローモデルの適切なサンプリングは、合成されたアウトリアーが全てのオブジェクトクラスのインリアーよりも低い可能性を持つことを保証するため、インリアーとアウトリアーの間のより良い決定境界をモデル化する。提案手法は,画像データとビデオデータの両方において,外部認識オブジェクト検出の最先端性を大幅に向上させる。コードはhttps://github.com/nish03/ffsで利用可能 Real-world deployment of reliable object detectors is crucial for applications such as autonomous driving. However, general-purpose object detectors like Faster R-CNN are prone to providing overconfident predictions for outlier objects. Recent outlier-aware object detection approaches estimate the density of instance-wide features with class-conditional Gaussians and train on synthesized outlier features from their low-likelihood regions. However, this strategy does not guarantee that the synthesized outlier features will have a low likelihood according to the other class-conditional Gaussians. We propose a novel outlier-aware object detection framework that distinguishes outliers from inlier objects by learning the joint data distribution of all inlier classes with an invertible normalizing flow. The appropriate sampling of the flow model ensures that the synthesized outliers have a lower likelihood than inliers of all object classes, thereby modeling a better decision boundary between inlier and outlier objects. Our approach significantly outperforms the state of the art for outlier-aware object detection on both image and video datasets. Code is available at https://github.com/nish03/FFS.	翻訳日:2023-05-31 01:38:00 公開日:2023-05-28
# LiDAR点雲における変化検出のための最適輸送 Optimal Transport for Change Detection on LiDAR Point Clouds ( http://arxiv.org/abs/2302.07025v2 ) ライセンス: Link先を確認	Marco Fiorucci, Peter Naylor, Makoto Yamada	(参考訳) 多時期リモートセンシングデータにおける変化の検出は、災害、森林破壊、都市計画といった実際の生活の様々な側面を監視する上で重要な役割を果たす。後者の文脈では、景観や市マネジャーが持続可能な開発を促進するためには、新しく建設された建物と取り壊された建物の両方を特定することが不可欠である。大気中のLiDAR点雲の使用は都市の変化検出において広く行われているが、最も一般的なアプローチは、点雲を補間された高さ測定の正規格子、すなわちデジタル標高モデル(DEM)に変換することである。しかし、DEMの補間ステップは、オブジェクトの高さに関連する情報損失を引き起こし、3次元のLiDAR点雲の高分解能が最も有益となるような建物変更の検出能力に影響を与える。距離ベース計算法とセマンティックセグメンテーション前処理法のいずれかを用いて点雲上で直接変化を検出する最近の試みにもかかわらず、都市計画において最重要となる正と負の両方の変化を識別できるのはM3C2距離計算法のみである。先行する議論に動機づけられ, 最適な輸送に基づく変更検出パイプラインを導入し, 新しく建設された建物(ポジティブな変化)と解体された建物(ネガティブな変化)を区別する。本研究では,リダ点雲の双時間対で発生する建物変化に関連する質量の生成と破壊に対処するために,不均衡な最適輸送の利用を提案する。我々は,M3C2とNicolas CourtyらによるこれまでのIGARSS 2016で提示した最適輸送方式よりも優れた性能を示すことで,変更検出のために利用可能な唯一のLiDARデータセットに対するアプローチの有効性を実証した。 The detection of changes occurring in multi-temporal remote sensing data plays a crucial role in monitoring several aspects of real life, such as disasters, deforestation, and urban planning. In the latter context, identifying both newly built and demolished buildings is essential to help landscape and city managers to promote sustainable development. While the use of airborne LiDAR point clouds has become widespread in urban change detection, the most common approaches require the transformation of a point cloud into a regular grid of interpolated height measurements, i.e. Digital Elevation Model (DEM). However, the DEM's interpolation step causes an information loss related to the height of the objects, affecting the detection capability of building changes, where the high resolution of LiDAR point clouds in the third dimension would be the most beneficial. 
Notwithstanding recent attempts to detect changes directly on point clouds using either a distance-based computation method or a semantic segmentation pre-processing step, only the M3C2 distance computation-based approach can identify both positive and negative changes, which is of paramount importance in urban planning. Motivated by these considerations, we introduce a principled change detection pipeline, based on optimal transport, capable of distinguishing between newly built buildings (positive changes) and demolished ones (negative changes). In this work, we propose to use unbalanced optimal transport to cope with the creation and destruction of mass related to building changes occurring in a bi-temporal pair of LiDAR point clouds. We demonstrate the efficacy of our approach on the only publicly available airborne LiDAR dataset for change detection by showing superior performance over M3C2 and the previous optimal transport-based method presented by Nicolas Courty et al. at IGARSS 2016.	翻訳日:2023-05-31 01:37:41 公開日:2023-05-28
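The abstract does not detail the pipeline. As a generic illustration of how unbalanced OT can separate created mass from destroyed mass, the sketch below runs the standard generalized Sinkhorn scaling iterations for entropic OT with KL-relaxed marginals on a hypothetical 2D toy scene. It then reads off how much mass the transport plan leaves unmatched at each point. The point sets, unit masses, `eps`, `rho`, and the deficit-based change score are illustrative assumptions. This is not the authors' pipeline, and it uses toy 2D points rather than real LiDAR clouds.

```python
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps=0.05, rho=1.0, n_iter=500):
    """Entropic OT with KL-penalized marginals via scaling iterations.

    a, b: nonnegative masses on source/target points; C: cost matrix.
    eps: entropic regularization; rho: strength of the marginal KL penalties.
    """
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    power = rho / (rho + eps)  # exponent < 1 lets marginals deviate from a and b
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]

def change_scores(P, a, b):
    """Mass left unmatched at each source point (removed) and target point (created)."""
    return a - P.sum(axis=1), b - P.sum(axis=0)

# Hypothetical toy scene: 4 points at epoch 1; the same 4 plus a new 2-point cluster at epoch 2.
src = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]])
tgt = np.vstack([src, [[1.5, 1.5], [1.6, 1.5]]])
C = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
a, b = np.ones(len(src)), np.ones(len(tgt))

P = unbalanced_sinkhorn(a, b, C)
removed, created = change_scores(P, a, b)
print(np.round(created, 3))  # large values flag target points with no source counterpart
```

Balanced OT would be forced to ship mass to the new cluster. The KL relaxation instead lets the plan leave it unmatched, so the deficit is a natural per-point signal for a positive change. Removed mass at the source side would analogously signal a negative change.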
# ニューラルネットワーク関数空間距離の効率的なパラメトリック近似 Efficient Parametric Approximations of Neural Network Function Space Distance ( http://arxiv.org/abs/2302.03519v2 ) ライセンス: Link先を確認	Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse	(参考訳) モデルパラメータとトレーニングデータの重要な特性をコンパクトに要約して、データセット全体の保存と/または反復することなく、後で使用できるようにすることがしばしば有用である。具体的には、トレーニングセット上の関数空間距離(fsd)、すなわち2つのニューラルネットワークの出力間の平均不一致を推定することを検討する。本稿では,線形化アクティベーション関数トリック(laftr)を提案し,reluニューラルネットワークに対するfsdの効率的な近似を導出する。鍵となるアイデアは、統計的ゲーティングを伴う線形ネットワークとしてアーキテクチャを近似することである。ネットワーク単位あたりのパラメータは1つしかないが、より大きなメモリ要件を持つ他のパラメトリック近似よりも優れている。連続学習に適用すると、パラメトリック近似は最先端の非パラメトリック近似と競合し、多くのトレーニング例を格納する必要がある。さらに,影響関数を精度良く推定し,データセット全体にわたるコストのかかる反復を伴わない誤記例の検出に有効性を示す。 It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.	翻訳日:2023-05-31 01:35:48 公開日:2023-05-28
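For reference, the quantity that LAFTR approximates can be written down directly. The sketch below computes the exact empirical FSD between two small ReLU networks. It assumes the squared Euclidean distance between outputs as the discrepancy, since the abstract only says "average discrepancy". Note that it needs the whole dataset `X` at evaluation time, which is exactly the cost a compact parametric approximation avoids. Network sizes and helper names are hypothetical, and the LAFTR approximation itself is not implemented here.

```python
import numpy as np

def relu_mlp(params, X):
    """Two-layer ReLU network; params = ((W1, b1), (W2, b2))."""
    (W1, b1), (W2, b2) = params
    return np.maximum(0.0, X @ W1 + b1) @ W2 + b2

def function_space_distance(params_a, params_b, X):
    """Exact empirical FSD: mean squared output difference over the dataset X."""
    diff = relu_mlp(params_a, X) - relu_mlp(params_b, X)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def init_params(rng, d_in=5, d_hidden=16, d_out=3):
    return ((rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden)),
            (rng.normal(size=(d_hidden, d_out)), np.zeros(d_out)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # stands in for the training set that must be stored
theta = init_params(rng)
theta_perturbed = tuple((W + 0.01 * rng.normal(size=W.shape), b) for W, b in theta)
print(function_space_distance(theta, theta_perturbed, X))
```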
# 臨床データを用いたX線拡散による異常検出のためのMDF-Net MDF-Net for Abnormality Detection by Fusing X-Rays with Clinical Data ( http://arxiv.org/abs/2302.13390v2 ) ライセンス: Link先を確認	Chihcheng Hsieh and Isabel Blanco Nobre and Sandra Costa Sousa and Chun Ouyang and Margot Brereton and Jacinto C. Nascimento and Joaquim Jorge and Catarina Moreira	(参考訳) 本研究は,胸部x線画像における深層学習(dl)分類器の性能に及ぼす患者の臨床情報を含む影響について検討した。現在の分類器は胸部X線画像のみを用いて高い性能を示すが, 臨床データは画像の解釈や適切な診断に不可欠であると考えられた。本研究では,患者の臨床データ(構造化データ)と胸部X線(画像データ)を同時に処理できる2つの融合法からなる新しいアーキテクチャを提案する。これらのデータモダリティは異なる次元空間にあるため,マスクr-cnnモデルにおけるマルチモーダル学習プロセスを容易にする空間配置戦略,空間化を提案する。 MIMIC-CXR(ケストX線画像)、MIMIC IV-ED(患者の臨床データ)、REFLACX(胸部X線における疾患部位の注釈)の3つの指標からなるデータセットであるMIMIC-Eyeを用いて広範囲にわたる実験的評価を行った。その結果, 患者の臨床データをDLモデルに組み込むことで, 胸部X線のみを用いた標準的なMask R-CNNと比較して, 胸部X線像の病変局在を12倍に向上させることがわかった。さらにアブレーション研究は、多モードDLアーキテクチャの重要性と、疾患の局所化における患者の臨床データの取り込みも強調している。本研究で提案するアーキテクチャは,研究の科学的再現性を促進するために公開されている(https://github.com/chihchenghsieh/multimodal-abnormalities-detection)。 This study investigates the effects of including patients' clinical information on the performance of deep learning (DL) classifiers for disease location in chest X-ray images. Although current classifiers achieve high performance using chest X-ray images alone, our interviews with radiologists indicate that clinical data is highly informative and essential for interpreting images and making proper diagnoses. In this work, we propose a novel architecture consisting of two fusion methods that enable the model to simultaneously process patients' clinical data (structured data) and chest X-rays (image data). Since these data modalities are in different dimensional spaces, we propose a spatial arrangement strategy, spatialization, to facilitate the multimodal learning process in a Mask R-CNN model. 
We performed an extensive experimental evaluation using MIMIC-Eye, a dataset comprising three modalities: MIMIC-CXR (chest X-ray images), MIMIC IV-ED (patients' clinical data), and REFLACX (annotations of disease locations in chest X-rays). Results show that incorporating patients' clinical data in a DL model together with the proposed fusion methods improves disease localization in chest X-rays by 12% in terms of Average Precision compared to a standard Mask R-CNN using only chest X-rays. Further ablation studies also emphasize the importance of multimodal DL architectures and the incorporation of patients' clinical data in disease localization. The architecture proposed in this work is publicly available to promote the scientific reproducibility of our study (https://github.com/ChihchengHsieh/multimodal-abnormalities-detection).	翻訳日:2023-05-31 01:29:16 公開日:2023-05-28
# マルチスケールリモートセンシングオブジェクト検出のためのTucker Bilinear Attention Network Tucker Bilinear Attention Network for Multi-scale Remote Sensing Object Detection ( http://arxiv.org/abs/2303.05329v2 ) ライセンス: Link先を確認	Tao Chen, Ruirui Li, Jiafeng Fu, and Daguang Jiang	(参考訳) vhrリモートセンシング画像における物体検出は,都市計画,土地資源管理,救助活動などにおいて重要な役割を担っている。リモートセンシング対象の大規模変動は、VHRリモートセンシング対象検出における大きな課題の1つである。既存の手法では,特徴ピラミッドの構造を改善し,異なる注意モジュールを採用することで,高解像度リモートセンシング物体の検出精度を向上させる。しかし、小さなターゲットでは、重要な詳細機能が失われているため、検出が著しく欠落している。マルチスケールの機能融合とバランスの方法にはまだ改善の余地があります。本稿では, 早期核融合の段階と後期核融合の段階にそれぞれ適用可能な2つの新しいモジュール, Guided Attention と Tucker Bilinear Attention を提案する。前者はクリーンなキーの詳細機能を効果的に保持でき、後者はセマンティックレベルの相関マイニングによって特徴のバランスを改善することができる。 2つのモジュールに基づいて、我々は新しいマルチスケールリモートセンシングオブジェクト検出フレームワークを構築した。鐘も笛もない。提案手法は小型オブジェクトの平均精度を大幅に向上させ,dota,dior,nwpu vhr-10.codeの9つの最先端手法と比較して,平均精度が最も高い。 Object detection on very-high-resolution (VHR) remote sensing images plays a vital role in applications such as urban planning, land resource management, and rescue missions. The large variation in the scale of remote sensing targets is one of the main challenges in VHR remote sensing object detection. Existing methods improve the detection accuracy of high-resolution remote sensing objects by improving the structure of feature pyramids and adopting different attention modules. However, small targets are still frequently missed because key detail features are lost, and multi-scale feature fusion and balancing still leave room for improvement. To address these issues, this paper proposes two novel modules, Guided Attention and Tucker Bilinear Attention, which are applied at the early fusion and late fusion stages, respectively. The former effectively retains clean key detail features, and the latter better balances features through semantic-level correlation mining. Based on these two modules, we build a new multi-scale remote sensing object detection framework without extra bells and whistles. 
The proposed method substantially improves the average precision on small objects and achieves the highest mean average precision among 9 state-of-the-art methods on DOTA, DIOR, and NWPU VHR-10. Code and models are available at https://github.com/Shinichict/GTNet.	翻訳日:2023-05-31 01:19:30 公開日:2023-05-28
# 改良した戦略カードゲーム(ハースストーン) Mastering Strategy Card Game (Hearthstone) with Improved Techniques ( http://arxiv.org/abs/2303.05197v2 ) ライセンス: Link先を確認	Changnan Xiao, Yongxin Zhang, Xuefeng Huang, Qinhan Huang, Jie Chen, Peng Sun	(参考訳) 戦略カードゲームは知的なゲームプレイを要求される有名なジャンルであり、AIにとって理想的なテストベンチになり得る。これまでの作品は、エンド・ツー・エンドのポリシー機能と楽観的なスムーズな架空のプレイを組み合わせることで、戦略カードゲーム『Regend of Code and Magic』で有望なパフォーマンスを示している。本研究では,このアルゴリズムを,ゲームルールや機構においてより複雑な,有名な商用ゲームであるhearthstoneに適用する。我々はさらに,いくつかの改良手法を提案し,その結果,著しい進歩を遂げた。マシンvsヒューマンテストでは、中国のオフィシャルリーグの上位10位にランクインしたハートストーンストリーマーを招待します。私たちのモデルは、全試合(デッキビルディングとバトルの両方を含む)のベスト5のトーナメントで人間プレイヤーを倒し、意思決定の強い能力を示します。 Strategy card games are a well-known genre that demands intelligent gameplay and can serve as an ideal test bench for AI. Previous work combined an end-to-end policy function with optimistic smooth fictitious play and showed promising performance on the strategy card game Legend of Code and Magic. In this work, we apply such algorithms to Hearthstone, a famous commercial game with more complicated rules and mechanisms. We further propose several improved techniques and consequently achieve significant progress. For a machine-vs-human test, we invited a Hearthstone streamer whose best rank was in the top 10 of the official league in the China region, which is estimated to have millions of players. Our models defeat the human player in all Best-of-5 tournaments of full games (including both deck building and battle), showing a strong decision-making capability.	翻訳日:2023-05-31 01:19:07 公開日:2023-05-28
# 確率的ツールボックスユーザガイド --xSPDE3:確率的常微分方程式と偏微分方程式のための拡張可能なソフトウェア The Stochastic Toolbox User's Guide -- xSPDE3: extensible software for stochastic ordinary and partial differential equations ( http://arxiv.org/abs/2303.04448v2 ) ライセンス: Link先を確認	Simon Kiesewetter, Ria R. Joseph, Peter D. Drummond	(参考訳) xspdeツールボックスは、生物学、化学、工学、医学、物理学、量子技術への応用を含む、確率的偏微分方程式と常微分方程式を扱う。時間ステップやサンプリングエラー推定を含む統計平均を計算する。 xSPDE は高次収束、フーリエスペクトル、確率密度を提供する。ツールボックスにはグラフィカルな出力と$\chi^{2}$統計、重み付け、投影、フォワードバックワードの方程式がある。入出力量子スペクトルを生成することができる。すべての方程式は、任意の次元、任意のベクトル場成分、および任意の区間の両端において、独立周期、ディリクレ、ノイマンあるいはロビン境界条件を持つことができる。 The xSPDE toolbox treats stochastic partial and ordinary differential equations, with applications in biology, chemistry, engineering, medicine, physics and quantum technologies. It computes statistical averages, including time-step and/or sampling error estimation. xSPDE can provide higher order convergence, Fourier spectra and probability densities. The toolbox has graphical output and $\chi^{2}$ statistics, as well as weighted, projected, or forward-backward equations. It can generate input-output quantum spectra. All equations may have independent periodic, Dirichlet, and Neumann or Robin boundary conditions in any dimension, for any vector field component, and at either end of any interval.	翻訳日:2023-05-31 01:18:51 公開日:2023-05-28
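xSPDE itself is a MATLAB toolbox, and nothing below uses its interface. The abstract mentions two quantities that this example illustrates: an ensemble average with a sampling-error estimate, and a time-step error estimate obtained by comparing step sizes. It is a minimal Euler-Maruyama sketch in Python for the Ornstein-Uhlenbeck SDE dx = -γx dt + σ dW, whose exact mean x0·exp(-γT) provides a check. The equation, parameters, and function name are illustrative assumptions.

```python
import numpy as np

def ou_mean_euler_maruyama(x0, gamma, sigma, T, n_steps, n_samples, rng):
    """Ensemble mean of x(T) for dx = -gamma*x dt + sigma dW, plus its sampling error."""
    dt = T / n_steps
    x = np.full(n_samples, x0, dtype=float)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_samples)  # Wiener increments
        x += -gamma * x * dt + sigma * dW
    sampling_error = x.std(ddof=1) / np.sqrt(n_samples)    # standard error of the mean
    return x.mean(), sampling_error

rng = np.random.default_rng(0)
params = dict(x0=1.0, gamma=1.0, sigma=0.5, T=1.0, n_samples=20000, rng=rng)
coarse, _ = ou_mean_euler_maruyama(n_steps=100, **params)
fine, err = ou_mean_euler_maruyama(n_steps=200, **params)
# Crude time-step error estimate; with independent noise it also contains sampling noise.
step_error = abs(fine - coarse)
exact = np.exp(-1.0)  # x0 * exp(-gamma * T)
print(fine, err, step_error, exact)
```

A more careful time-step estimate would reuse the same Brownian paths for both runs, summing pairs of fine increments for the coarse run, so that only the discretization difference remains.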
# バックドアフェデレーション学習への学習 Learning to Backdoor Federated Learning ( http://arxiv.org/abs/2303.03320v3 ) ライセンス: Link先を確認	Henger Li, Chen Wu, Sencun Zhu, Zizhan Zheng	(参考訳) フェデレーション学習(fl)システムでは、悪意のある参加者は、モデルのメインタスクのパフォーマンスを維持しながら、簡単にバックドアを集約モデルに埋め込むことができる。近年,訓練段階の集約型防御や訓練後の緩和防衛など,様々な防御が提案されている。これらの防御は、主にヒューリスティックスに基づく既存のバックドア攻撃に対して合理的な性能を得るが、より先進的な攻撃に直面すると不十分であることを示す。特に,攻撃者がまずローカルデータとFLシステムの共通知識をベースとしたシミュレータを用いて(非明視的)攻撃ポリシーを訓練し,実際のFL訓練中に適用できる汎用強化学習ベースのバックドア攻撃フレームワークを提案する。我々の攻撃フレームワークは適応的かつ柔軟であり、最先端の防御の下でも強力な攻撃性能と耐久性を実現する。 In a federated learning (FL) system, malicious participants can easily embed backdoors into the aggregated model while maintaining the model's performance on the main task. To this end, various defenses, including training stage aggregation-based defenses and post-training mitigation defenses, have been proposed recently. While these defenses obtain reasonable performance against existing backdoor attacks, which are mainly heuristics based, we show that they are insufficient in the face of more advanced attacks. In particular, we propose a general reinforcement learning-based backdoor attack framework where the attacker first trains a (non-myopic) attack policy using a simulator built upon its local data and common knowledge on the FL system, which is then applied during actual FL training. Our attack framework is both adaptive and flexible and achieves strong attack performance and durability even under state-of-the-art defenses.	翻訳日:2023-05-31 01:18:20 公開日:2023-05-28
# 量子コンピュータによる分子電子構造計算 Molecular Electronic Structure Calculation via a Quantum Computer ( http://arxiv.org/abs/2303.09911v3 ) ライセンス: Link先を確認	Hamid Reza Naeij, Erfan Mahmoudi, Hossein Davoodi Yeganeh and Mohsen Akbari	(参考訳) 量子コンピュータは電子構造を計算し、多電子分子系の基底状態エネルギーを推定するために用いられる。本研究では,量子ビット数が増加傾向にあるh3+,oh-,hf,bh3などの分子の基底状態エネルギーを計算するハイブリッド量子古典アルゴリズムとして,変分量子固有ソルバ(vqe)アルゴリズムを実装した。我々はFermionのパリティ変換をqubitエンコーディングに、Unitary Coupled Cluster for Single and Double Excitations (UCCSD) を用いてアンサッツを構築する。量子シミュレーションの結果とフルコンフィグレーション相互作用 (fci) をベンチマークエネルギーとして,unrestricted hartree-fock (uhf) を一般的な計算手法として計算化学手法と比較した。以上の結果から,vqeとfciから得られる分子基底状態エネルギーは良好な一致を示した。さらに,VQEから得られた基底状態エネルギーの精度は,これまでに報告した値よりも高い。 Quantum computers can be used to calculate the electronic structure and estimate the ground state energy of many-electron molecular systems. In the present study, we implement the Variational Quantum Eigensolver (VQE), a hybrid quantum-classical algorithm, to calculate the ground state energies of molecules such as H3+, OH-, HF, and BH3, which require progressively more qubits. We use the parity transformation for fermion-to-qubit encoding and the Unitary Coupled Cluster for Single and Double excitations (UCCSD) to construct the ansatz. We compare our quantum simulation results with computational chemistry approaches, including Full Configuration Interaction (FCI) as the benchmark energy and Unrestricted Hartree-Fock (UHF) as a common computational method. Our results show good agreement between the molecular ground state energies obtained from VQE and FCI. Moreover, the ground state energies obtained from VQE in our work are more accurate than previously reported values.	翻訳日:2023-05-31 01:08:46 公開日:2023-05-28
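The VQE loop can be illustrated on a deliberately tiny problem. This minimal sketch minimizes ⟨ψ(θ)|H|ψ(θ)⟩ for a hypothetical one-qubit Hamiltonian H = 0.5 Z + 0.3 X with a single-parameter Ry ansatz. It uses parameter-shift gradients and compares the result with exact diagonalization, which here plays the benchmark role that FCI plays in the paper. It does not use the parity mapping, UCCSD, or any molecular Hamiltonian, and it computes the expectation value exactly rather than estimating it from measurement shots.

```python
import numpy as np

PAULI_X = np.array([[0.0, 1.0], [1.0, 0.0]])
PAULI_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = 0.5 * PAULI_Z + 0.3 * PAULI_X  # hypothetical 1-qubit Hamiltonian, NOT a molecular one

def ansatz(theta):
    """Ry(theta)|0> = [cos(theta/2), sin(theta/2)] (real amplitudes)."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def energy(theta):
    psi = ansatz(theta)
    return float(psi @ H @ psi)  # <psi|H|psi>; hardware would estimate this by sampling

def vqe(theta0=0.1, lr=0.2, n_iter=300):
    """Classical optimizer loop: gradient descent on the variational energy."""
    theta = theta0
    for _ in range(n_iter):
        # Parameter-shift rule: exact gradient for an Ry rotation.
        grad = 0.5 * (energy(theta + np.pi / 2) - energy(theta - np.pi / 2))
        theta -= lr * grad
    return theta, energy(theta)

theta_opt, e_vqe = vqe()
e_exact = np.linalg.eigvalsh(H)[0]  # lowest eigenvalue by exact diagonalization
print(e_vqe, e_exact)
```

Because this one-parameter ansatz can reach the true ground state of any real 2×2 Hamiltonian, VQE matches the exact value here. For molecules, the gap to FCI depends on how expressive the chosen ansatz (for example, UCCSD) is.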
# 表面電子のリドバーグ状態に基づく制御なしゲート Controlled-NOT gate based on the Rydberg states of surface electrons ( http://arxiv.org/abs/2303.08650v3 ) ライセンス: Link先を確認	Jun Wang, Wan-Ting He, Cong-Wei Lu, Yang-Yang Wang, Qing Ai, Hai-Bo Wang	(参考訳) 長いコヒーレンス時間と効率的な操作のため、表面電子(se)は量子計算と量子シミュレーションのための完全な2次元プラットフォームを提供する。本研究では,制御NOT(CNOT)ゲートを実現するための理論スキームを提案し,SEの4レベルRydberg構造上に2量子系を符号化する。状態伝達は中間レベルを持つ3レベル構造によって達成される。 2つの外部電磁界でSEを同時に駆動することにより、電磁誘導透過(EIT)効果の暗黒状態を利用して、最も散逸した状態の人口を抑制し、散逸に対する堅牢性を高める。このスキームの忠実性は、実験的に達成可能なパラメータで 0.9989 である。 Due to the long coherence time and efficient manipulation, the surface electron (SE) provides a perfect two-dimensional platform for quantum computation and quantum simulation. In this work, a theoretical scheme to realize the controlled-NOT (CNOT) gate is proposed, where the two-qubit system is encoded on the four-level Rydberg structure of SE. The state transfer is achieved by a three-level structure with an intermediate level. By simultaneously driving the SE with two external electromagnetic fields, the dark state in the electromagnetically induced transparency (EIT) effect is exploited to suppress the population of the most dissipative state and increase the robustness against dissipation. The fidelity of the scheme is 0.9989 with experimentally achievable parameters.	翻訳日:2023-05-31 01:08:09 公開日:2023-05-28
# 超音波トモグラフィインバージョンのためのニューラルオペレータ学習 Neural Operator Learning for Ultrasound Tomography Inversion ( http://arxiv.org/abs/2304.03297v2 ) ライセンス: Link先を確認	Haocheng Dai, Michael Penwarden, Robert M. Kirby, Sarang Joshi	(参考訳) 複雑な関数空間間のマッピング手段としてのニューラル演算子学習は、計算科学と工学(CS&E)の分野で大きな注目を集めている。本稿では,時空超音波CT(USCT)問題に対するニューラル演算子学習を適用した。我々は、フルウェーブ・ソルバを用いて、飛行時間(TOF)データと異種音速場のマッピングを学習し、トレーニングデータを生成する。演算子学習のこの新しい応用は、計算集約的な反復逆問題を解く必要性を回避している。オペレータは非線形マッピングをオフラインで学習し、モデルを通過する単一のフォワードパスで異種音場を予測する。超音波断層撮影におけるオペレーターの学習はこれが初めてであり、ビーストイメージングにおける腫瘍の同定のための軟組織分布のリアルタイム予測の第一歩である。 Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field, using a full-wave solver to generate the training data. This novel application of operator learning circumvents the need to solve the computationally intensive iterative inverse problem. The operator learns the nonlinear mapping offline and predicts the heterogeneous sound speed field with a single forward pass through the model. This is the first time operator learning has been used for ultrasound tomography, and it is the first step toward potential real-time prediction of soft tissue distribution for tumor identification in breast imaging.	翻訳日:2023-05-31 00:59:31 公開日:2023-05-28
# Re-IQA: 野生の画像品質評価のための教師なし学習 Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild ( http://arxiv.org/abs/2304.00451v2 ) ライセンス: Link先を確認	Avinab Saha, Sandeep Mishra, Alan C. Bovik	(参考訳) 自動知覚画像品質評価は、何十億ものインターネットとソーシャルメディアユーザーに影響を与える難しい問題である。そこで本研究では, 2つの異なるエンコーダを訓練し, 教師なし設定で高レベルコンテンツと低レベル画像品質特徴を学習する, 専門家の混合手法を提案する。このアプローチのユニークな特徴は、画像コンテンツを表すハイレベルな特徴を補完する低レベルの画像品質表現を生成する能力である。 2つのエンコーダをトレーニングするフレームワークをRe-IQAと呼ぶ。野生の画質評価のために、re-iqaフレームワークから得られた補完的な低レベルおよび高レベル画像表現をデプロイして、画像表現を地上の真理品質スコアにマッピングするために使用される線形回帰モデルをトレーニングします。提案手法は,複数の大規模画像品質評価データベースにおいて,実歪みと合成歪みの両方を含む最先端のニューラルネットワークを教師なし環境でトレーニングし,知覚に関連のある表現を生成する方法を示す。得られた低レベル・高レベルの特徴は相補的であり,線形回帰器の性能に肯定的な影響を及ぼす。この作業に関連するすべてのコードのパブリックリリースは、githubで公開されている。 Automatic perceptual image quality assessment is a challenging problem that affects billions of internet and social media users daily. To advance research in this field, we propose a Mixture of Experts approach to train two separate encoders to learn high-level content and low-level image quality features in an unsupervised setting. The key novelty of our approach is its ability to generate low-level representations of image quality that are complementary to high-level features representing image content. We refer to the framework used to train the two encoders as Re-IQA. For image quality assessment in the wild, we use the complementary low- and high-level image representations obtained from the Re-IQA framework to train a linear regression model that maps the image representations to ground-truth quality scores (see Figure 1). Our method achieves state-of-the-art performance on multiple large-scale image quality assessment databases containing both real and synthetic distortions, demonstrating how deep neural networks can be trained in an unsupervised setting to produce perceptually relevant representations. 
We conclude from our experiments that the low- and high-level features obtained are indeed complementary and positively impact the performance of the linear regressor. All code associated with this work will be publicly released on GitHub.	翻訳日:2023-05-31 00:59:03 公開日:2023-05-28
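The final regression stage of Re-IQA can be sketched as follows. This assumes the two encoders' outputs are simply concatenated and fitted with ordinary least squares. The feature dimensions and synthetic data are hypothetical stand-ins for Re-IQA's encoder outputs and human quality scores, and the paper may use a different (for example, regularized) regressor.

```python
import numpy as np

def design_matrix(low_feats, high_feats):
    """Concatenate low-level (quality) and high-level (content) features plus a bias column."""
    return np.hstack([low_feats, high_feats, np.ones((len(low_feats), 1))])

def fit_quality_regressor(low_feats, high_feats, scores):
    """Ordinary least squares from concatenated features to quality scores."""
    w, *_ = np.linalg.lstsq(design_matrix(low_feats, high_feats), scores, rcond=None)
    return w

def predict_quality(w, low_feats, high_feats):
    return design_matrix(low_feats, high_feats) @ w

rng = np.random.default_rng(0)
low = rng.normal(size=(100, 4))    # stand-in for low-level quality encoder outputs
high = rng.normal(size=(100, 6))   # stand-in for high-level content encoder outputs
true_w = rng.normal(size=11)
scores = design_matrix(low, high) @ true_w  # synthetic, noiseless "ground truth" scores
w = fit_quality_regressor(low, high, scores)
print(np.abs(w - true_w).max())
```

Keeping the regressor linear means any gain from adding the second feature set reflects information in the representations themselves. That matches the abstract's claim that the two feature types are complementary.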
# Diffusion Schrödinger Bridge Matching Diffusion Schrödinger Bridge Matching ( http://arxiv.org/abs/2303.16852v2 ) ライセンス: Link先を確認	Yuyang Shi, Valentin De Bortoli, Andrew Campbell, Arnaud Doucet	(参考訳) 輸送問題の解決、すなわちある分布を別の分布に輸送する地図を見つけることは、機械学習に多くの応用がある。生成的モデルに動機づけられた新しい質量移動法が最近提案されており、例えば、分極拡散モデル(ddms)とフローマッチングモデル(fmms)は、そのような移動を確率微分方程式(sde)または常微分方程式(ode)で実装している。しかし、多くの応用において、魅力的な特性を持つ決定論的動的最適輸送(OT)マップを近似することが望ましいが、DDMとFMMはOTマップに近い輸送を提供することが保証されていない。対照的に、Schrödinger bridges (SBs) は OT のエントロピー規則化されたバージョンを復元する確率的動的写像を計算する。残念なことに、SBを近似する既存の数値法は、次元のスケールが低かったり、繰り返しにまたがってエラーを蓄積する。本稿では,SB問題を解決するための新しい手法であるIterative Markovian Fitting (IMF)と,IMFの反復計算のための新しい数値アルゴリズムであるDiffusion Schrödinger Bridge Matching (DSBM)を紹介する。 DSBMは従来のSB数値よりも大幅に改善され、様々な最近の輸送方法の特殊な/制限ケースとして回復する。様々な問題についてDSBMの性能を実証する。 Solving transport problems, i.e., finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed; e.g., Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation (SDE) or an Ordinary Differential Equation (ODE). However, while it is desirable in many applications to approximate the deterministic dynamic Optimal Transport (OT) map, which admits attractive properties, DDMs and FMMs are not guaranteed to provide transports close to the OT map. In contrast, Schrödinger bridges (SBs) compute stochastic dynamic mappings that recover entropy-regularized versions of OT. Unfortunately, existing numerical methods approximating SBs either scale poorly with dimension or accumulate errors across iterations. In this work, we introduce Iterative Markovian Fitting (IMF), a new methodology for solving SB problems, and Diffusion Schrödinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. 
DSBM significantly improves over previous SB numerics and recovers various recent transport methods as special/limiting cases. We demonstrate the performance of DSBM on a variety of problems.	翻訳日:2023-05-31 00:57:27 公開日:2023-05-28
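To make the "bridge" idea in the DSBM abstract concrete: bridge-matching schemes of this kind repeatedly sample an intermediate point on a Brownian bridge pinned at a coupled pair (x0, x1) and regress a drift network onto the bridge's drift. The sketch below shows only that sampling step in one dimension, assuming a reference process dX = sigma dW; the function names are hypothetical and this is not the authors' implementation. Setting sigma to 0 collapses the bridge to straight-line interpolation, which is one way to see how such schemes relate to flow matching.

```python
import math
import random


def sample_brownian_bridge(x0, x1, t, sigma, rng):
    """Sample X_t from a Brownian bridge pinned at X_0 = x0 and X_1 = x1.

    For a reference process dX = sigma dW, the bridge marginal at time t is
    Gaussian with mean (1 - t) * x0 + t * x1 and variance sigma**2 * t * (1 - t).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * math.sqrt(t * (1.0 - t))
    return mean + std * rng.gauss(0.0, 1.0)


def bridge_drift_target(x_t, x1, t):
    """Drift (x1 - x_t) / (1 - t) of the pinned bridge; a drift network would regress onto this."""
    return (x1 - x_t) / (1.0 - t)


rng = random.Random(0)
samples = [sample_brownian_bridge(0.0, 2.0, 0.5, 1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# Expect mean near 1.0 and variance near sigma**2 * t * (1 - t) = 0.25.
print(mean, var)
```

In a full method the pairs (x0, x1) would come from the current coupling and the regression would be done by a neural network; both are outside this sketch.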
# 自然言語生成対話サービスのチャット体験を予測する要因は何か? Which Factors Predict the Chat Experience of a Natural Language Generation Dialogue Service? ( http://arxiv.org/abs/2304.10785v2 ) ライセンス: Link先を確認	Eason Chen	(参考訳) 本稿では,自然言語生成対話システムにおけるチャット体験を予測するための概念モデルを提案する。120人の参加者を対象に部分最小二乗構造方程式モデリング (PLS-SEM) を用いてモデルを評価し, 決定係数 (R2) 0.541を得た。モデルは、生成に使用するプロンプト、会話におけるコヒーレンス、感情、類似性、ユーザが認識する対話エージェントの好感度など、さまざまな要因を考慮する。次に,提案モデルのサブセットの有効性をさらに検討する。その結果,ユーザの好感度と,対話におけるコヒーレンス,感情,類似性は,ユーザのチャット体験の正の予測因子であることがわかった。さらに,外向性,開放性,誠実性,協調性,非神経症傾向といった特性を持つ対話エージェントをユーザが好む可能性が示唆された。本研究に基づき,適応型対話システムは,収集したデータを用いてモデル内の要因を推定し,これらの要因からユーザのチャット体験を予測し,プロンプトを調整して最適化できる可能性がある。 In this paper, we propose a conceptual model to predict the chat experience in a natural language generation dialog system. We evaluated the model with 120 participants using Partial Least Squares Structural Equation Modeling (PLS-SEM) and obtained an R-squared (R2) of 0.541. The model considers various factors, including the prompts used for generation; coherence, sentiment, and similarity in the conversation; and users' perceived favorability of the dialog agents. We then further explore the effectiveness of a subset of our proposed model. The results showed that users' favorability and coherence, sentiment, and similarity in the dialogue are positive predictors of users' chat experience. Moreover, we found that users may prefer dialog agents with characteristics of Extroversion, Openness, Conscientiousness, Agreeableness, and Non-Neuroticism. Based on our findings, an adaptive dialog system could use collected data to infer the factors in our model, predict users' chat experience from these factors, and optimize it by adjusting prompts.	翻訳日:2023-05-31 00:50:47 公開日:2023-05-28
# 動的シーン理解のための教師なしオブジェクト中心ボクセル化 Unsupervised Object-Centric Voxelization for Dynamic Scene Understanding ( http://arxiv.org/abs/2305.00393v2 ) ライセンス: Link先を確認	Siyu Gao, Yanpeng Zhao, Yunbo Wang, Xiaokang Yang	(参考訳) 教師なしの3Dシナリオで世界の構成的ダイナミクスを理解することは難しい。既存のアプローチは、時間的手がかりを効果的に利用できないか、シーン分解のマルチビュー一貫性を無視している。本稿では,複数の実体(オブジェクトなど)を持つ動的シーンの時間変化する体積表現の学習に関するパイロットスタディとして,逆ニューラルレンダリングフレームワークであるDynaVolを提案する。主な貢献は2つある。第一に、時間依存の3Dグリッドを維持し、空間的位置を異なる実体に動的かつ柔軟に結び付けることで、表現レベルでの情報の分離を促進する。第二に, グリッドレベルの局所ダイナミクス, オブジェクトレベルの大域ダイナミクス, 構成的ニューラル放射場をエンドツーエンドのアーキテクチャで同時に学習することにより, オブジェクト中心のシーンボクセル化の時空間的一貫性を向上させる。DynaVolの2段階の学習スキームを提示し,複数オブジェクト,多様なダイナミクス,実世界の形状とテクスチャを含む様々なベンチマークでその有効性を検証する。可視化結果は https://sites.google.com/view/dynavol-visual で公開している。 Understanding the compositional dynamics of the world in unsupervised 3D scenarios is challenging. Existing approaches either fail to make effective use of time cues or ignore the multi-view consistency of scene decomposition. In this paper, we propose DynaVol, an inverse neural rendering framework that provides a pilot study for learning time-varying volumetric representations for dynamic scenes with multiple entities (like objects). It has two main contributions. First, it maintains a time-dependent 3D grid, which dynamically and flexibly binds the spatial locations to different entities, thus encouraging the separation of information at a representational level. Second, our approach jointly learns grid-level local dynamics, object-level global dynamics, and the compositional neural radiance fields in an end-to-end architecture, thereby enhancing the spatiotemporal consistency of object-centric scene voxelization. We present a two-stage training scheme for DynaVol and validate its effectiveness on various benchmarks with multiple objects, diverse dynamics, and real-world shapes and textures. Visualizations are available at https://sites.google.com/view/dynavol-visual.	翻訳日:2023-05-31 00:41:24 公開日:2023-05-28
# FedVS: 分割モデルのためのストラグラー耐性とプライバシ保護を備えた垂直連合学習 FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models ( http://arxiv.org/abs/2304.13407v2 ) ライセンス: Link先を確認	Songze Li, Duanyi Yao, Jin Liu	(参考訳) 中央サーバと多数の分散クライアントからなる垂直連合学習(VFL)システムでは、トレーニングデータは垂直に分割され、異なる特徴が異なるクライアントにプライベートに格納される。分割VFLの問題は、サーバとクライアントの間で分割されたモデルを学習することである。本稿では,分割VFLにおける2つの主要な課題に対処することを目的とする。 1) 学習中の遅延クライアント(ストラグラー)による性能低下 2) クライアントがアップロードしたデータ埋め込みからのデータとモデルのプライバシ漏洩。我々はこれら2つの課題に同時に対処するFedVSを提案する。 FedVSの鍵となるアイデアは、ローカルデータとモデルの秘密分散スキームを設計することであり、これにより、共謀するクライアントと好奇心旺盛な(curious)サーバに対する情報理論的プライバシーが保証され、かつ、全クライアントの埋め込みの集約が、ストラグラーでないクライアントからの計算シェアを復号することで損失なく再構成される。様々な種類のVFLデータセット(表形式、CV、マルチビューを含む)に対する大規模な実験は、ベースラインプロトコルに対するストラグラー緩和とプライバシ保護におけるFedVSの普遍的な優位性を示している。 In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients' uploaded data embeddings. We propose FedVS to simultaneously address these two challenges. The key idea of FedVS is to design secret sharing schemes for the local data and models, such that information-theoretic privacy against colluding clients and a curious server is guaranteed, and the aggregation of all clients' embeddings is reconstructed losslessly by decoding the computation shares received from the non-straggling clients. Extensive experiments on various types of VFL datasets (including tabular, CV, and multi-view) demonstrate the universal advantages of FedVS in straggler mitigation and privacy protection over baseline protocols.	翻訳日:2023-05-31 00:38:52 公開日:2023-05-28
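The abstract does not spell out FedVS's coding scheme. As a stand-in, the sketch below uses Shamir's (t, n) threshold secret sharing over a prime field to illustrate the two properties the abstract relies on: shares can be combined locally (a sum of shares is a share of the sum), and any t shares suffice to reconstruct, so stragglers can simply be ignored. With Shamir sharing, any t - 1 colluding holders learn nothing about the secret. The field size and helper names are illustrative assumptions, not the paper's construction.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo this field size


def make_shares(secret, threshold, n_parties, rng):
    """Split `secret` into n_parties Shamir shares; any `threshold` of them reconstruct it."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_parties + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule: evaluate the random polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


rng = random.Random(42)
a_shares = make_shares(1234, threshold=3, n_parties=5, rng=rng)
b_shares = make_shares(4321, threshold=3, n_parties=5, rng=rng)
# Each party adds its two shares locally; the result is a share of 1234 + 4321.
sum_shares = [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a_shares, b_shares)]
# Parties 2 and 4 straggle; the three remaining shares still suffice.
survivors = [sum_shares[0], sum_shares[2], sum_shares[4]]
print(reconstruct(survivors))  # 5555
```

`pow(den, -1, PRIME)` computes a modular inverse and needs Python 3.8 or later.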
# GPT-NAS:生成事前学習モデルによる進化的ニューラルアーキテクチャ探索 GPT-NAS: Evolutionary Neural Architecture Search with the Generative Pre-Trained Model ( http://arxiv.org/abs/2305.05351v2 ) ライセンス: Link先を確認	Caiyang Yu, Xianggen Liu, Wentao Feng, Chenwei Tang, Jiancheng Lv	(参考訳) ニューラルアーキテクチャ探索(NAS)は、最適なニューラルネットワークアーキテクチャを自動設計する有効な方法の1つとして登場した。ニューラルアーキテクチャはいくつかのタスクで人間レベルの性能を達成しているが、NAS法によって得られたものはほとんどない。主な理由は、ニューラルアーキテクチャの探索空間が巨大であり、NASアルゴリズムが非効率になることである。本研究では、進化的アルゴリズム(EA)を探索戦略とし、生成事前学習(GPT)モデルを用いてニューラルアーキテクチャを最適化する、GPT-NASと呼ばれる新しいアーキテクチャ探索アルゴリズムを提案する。 GPT-NASでは、大規模コーパスで事前学習した生成モデルが、ニューラルアーキテクチャ構築の基本法則を学習できると仮定する。したがって、GPT-NAS は GPT モデルを利用して、基本的なアーキテクチャ要素から妥当な構成要素を提案し、EA を用いて最適解を探索する。このようなアプローチは、探索プロセスに事前知識を導入することで、探索空間を大幅に削減することができる。 GPT-NAS法は,手作業で設計した7つのニューラルアーキテクチャと,競合するNAS法によって提供される13のアーキテクチャより有意に優れていた。さらに,提案アルゴリズムは,GPTを用いない場合と比べて,微調整されたニューラルアーキテクチャの性能を最大約12%向上させ,ニューラルアーキテクチャ探索における有効性をさらに示している。 Neural Architecture Search (NAS) has emerged as one of the effective methods for automatically designing the optimal neural network architecture. Although neural architectures have achieved human-level performance in several tasks, few of them were obtained by NAS methods. The main reason is the huge search space of neural architectures, which makes NAS algorithms inefficient. This work presents a novel architecture search algorithm, called GPT-NAS, that optimizes neural architectures using a Generative Pre-Trained (GPT) model with an evolutionary algorithm (EA) as the search strategy. In GPT-NAS, we assume that a generative model pre-trained on a large-scale corpus can learn the fundamental rules of building neural architectures. Therefore, GPT-NAS leverages the GPT model to propose reasonable architecture components given the basic ones and then uses EAs to search for the optimal solution. Such an approach can greatly reduce the search space by introducing prior knowledge into the search process. 
Extensive experimental results show that our GPT-NAS method significantly outperforms seven manually designed neural architectures and thirteen architectures provided by competing NAS methods. Our experiments also indicate that the proposed algorithm improves the performance of fine-tuned neural architectures by up to about 12% compared to those without GPT, further demonstrating its effectiveness in searching neural architectures.	翻訳日:2023-05-31 00:31:56 公開日:2023-05-28
# FACTIFY-5WQA: 質問応答による5Wアスペクトベースの事実検証 FACTIFY-5WQA: 5W Aspect-based Fact Verification through Question Answering ( http://arxiv.org/abs/2305.04329v2 ) ライセンス: Link先を確認	Anku Rani, S.M Towhidul Islam Tonmoy, Dwip Dalal, Shreya Gautam, Megha Chakraborty, Aman Chadha, Amit Sheth, Amitava Das	(参考訳) 自動事実検証は近年大きな注目を集めている。現代の自動ファクトチェックシステムは、人間が解釈できない数値スコアを用いて真実性を推定することに焦点を当てている。人間のファクトチェッカーは一般に、もっともらしい主張を検証し、それが真実か単なる見せかけかを結論づけるために、いくつかの論理的なステップに従う。人気のあるファクトチェックWebサイトは、半分真実、半分虚偽、虚偽、真っ赤な嘘(pants on fire)など、事実分類のための共通の構造に従っている。したがって, 事実に関連する質問を行う際に人間のファクトチェッカーを支援し, それらを個別に検証して最終判断に到達できるようにする, アスペクトベース(どの部分が真でどの部分が偽かを示す)で説明可能なシステムが必要である。本稿では,質問応答に基づく事実の説明可能性のための5Wフレームワーク(who, what, when, where, why)を提案する。そのために,391,041の事実と関連する5W QAからなる,FACTIFY-5WQAという半自動生成データセットを提示する。これが本論文の主要な貢献である。意味役割付与システムを用いて5Wを特定し、マスク付き言語モデルを用いて主張のQAペアを生成する。また,これらの回答を証拠文書から自動的に特定するベースラインQAシステムを報告する。最後に,言い換えられた主張を自動検証する頑健な事実検証システムを提案する。データセットとベースラインモデルは https://github.com/ankuranii/acl-5W-QA で利用可能である。 Automatic fact verification has received significant attention recently. Contemporary automatic fact-checking systems focus on estimating truthfulness using numerical scores which are not human-interpretable. A human fact-checker generally follows several logical steps to verify a verisimilitude claim and conclude whether it is truthful or a mere masquerade. Popular fact-checking websites follow a common structure for fact categorization such as half true, half false, false, pants on fire, etc. Therefore, it is necessary to have an aspect-based (delineating which part(s) are true and which are false) explainable system that can assist human fact-checkers in asking relevant questions related to a fact, which can then be validated separately to reach a final verdict. In this paper, we propose a 5W framework (who, what, when, where, and why) for question-answer-based fact explainability. 
To that end, we present a semi-automatically generated dataset called FACTIFY-5WQA, which consists of 391,041 facts along with relevant 5W QAs, underscoring our major contribution in this paper. We use a semantic role labeling system to locate the 5Ws and a masked language model to generate QA pairs for claims. We also report a baseline QA system to automatically locate those answers from evidence documents, which can serve as a baseline for future research in the field. Lastly, we propose a robust fact verification system that takes paraphrased claims and automatically validates them. The dataset and the baseline model are available at https://github.com/ankuranii/acl-5W-QA.	翻訳日:2023-05-31 00:30:55 公開日:2023-05-28
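As an illustration of the 5W step, the sketch below maps PropBank-style semantic roles to W-questions and masks each answer span in the claim, producing a cloze-style input that a question generator (such as a masked language model) could then turn into a question. The role-to-W mapping and the output format are assumptions, not the dataset's actual pipeline, and the SRL frame is written by hand rather than produced by a real SRL model.

```python
# Hypothetical mapping from PropBank-style SRL roles to the 5Ws.
ROLE_TO_W = {
    "ARG0": "who",        # agent
    "ARG1": "what",       # patient / theme
    "ARGM-TMP": "when",   # temporal modifier
    "ARGM-LOC": "where",  # locative modifier
    "ARGM-CAU": "why",    # cause
    "ARGM-PRP": "why",    # purpose
}


def five_w_cloze(claim, srl_frame):
    """Return (w, masked_claim, answer) triples for every role that maps to a W."""
    triples = []
    for role, span in srl_frame.items():
        w = ROLE_TO_W.get(role)
        if w is not None and span in claim:
            triples.append((w, claim.replace(span, "[MASK]", 1), span))
    return triples


claim = "The governor signed the bill in Austin on Monday."
frame = {
    "ARG0": "The governor",
    "V": "signed",
    "ARG1": "the bill",
    "ARGM-LOC": "in Austin",
    "ARGM-TMP": "on Monday",
}
for triple in five_w_cloze(claim, frame):
    print(triple)
```

Each answer span is kept alongside its masked claim so that a downstream QA model can later be checked against it.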
# バイアスノイズ量子ビットに対するスケーラブルなノイズ量子回路 Scalable noisy quantum circuits for biased-noise qubits ( http://arxiv.org/abs/2305.02045v3 ) ライセンス: Link先を確認	Marco Fellous-Asiani, Moein Naseri, Chandan Datta, Alexander Streltsov, Michał Oszmaniec	(参考訳) 量子誤り軽減は、量子アルゴリズムに対するノイズの影響を低減することができる。しかし、回路サイズに対して指数関数的に増大するリソースを必要とするため、スケーラブルではない。本研究では,安定化された猫量子ビットの既存システムに動機づけられ,ビットフリップ誤差のみの影響を受けるバイアスノイズ量子ビットを考察する。この特性により、アルゴリズムの繰り返し回数について多項式オーバーヘッドのみで確実に実行できる、エンタングリングゲートと特定の非クリフォードゲートを含むノイズのあるアダマールテストのクラスを設計できる。その一方で,我々の特定のアダマールテストの変種を効率的にシミュレートできる古典アルゴリズムも見出した。このアルゴリズムを,大規模かつ複雑な量子回路の規模でノイズのバイアスを測る単純なベンチマークとして用いることを提案する。我々の回路の強いノイズ耐性は,非常に特殊ではあるがノイズのある回路で量子計算上の優位性を達成できるかどうかを探る,さらなる研究の動機となりうる。 Quantum error mitigation can reduce the impact of noise on quantum algorithms. Yet, it is not scalable, as it requires resources that scale exponentially with the circuit size. In this work, we consider biased-noise qubits affected only by bit-flip errors, which is motivated by existing systems of stabilized cat qubits. This property allows us to design a class of noisy Hadamard tests involving entangling and certain non-Clifford gates, which can be conducted reliably with only a polynomial overhead in algorithm repetitions. On the other hand, we also found a classical algorithm that can efficiently simulate our specific variants of the Hadamard test. We propose to use this algorithm as a simple benchmark of the bias of the noise at the scale of large and complicated quantum circuits. The strong noise resilience of our circuits could motivate further research into whether a quantum computational advantage could be reached for highly specific, yet noisy, circuits.	翻訳日:2023-05-31 00:29:46 公開日:2023-05-28
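A noiseless Hadamard test estimates Re<psi|U|psi> from the ancilla statistics via P(0) = (1 + Re<psi|U|psi>) / 2. The toy single-qubit sketch below assumes a single bit flip of known probability p on the ancilla just before measurement (a simplification, not the paper's noise model) to show why such errors are benign: the signal is only rescaled by (1 - 2p), so undoing it costs a constant factor of roughly 1 / (1 - 2p)^2 in repetitions rather than an exponential one.

```python
import math


def expectation(state, unitary):
    """<psi|U|psi> for a state vector and a unitary given as nested lists."""
    n = len(state)
    u_psi = [sum(unitary[r][c] * state[c] for c in range(n)) for r in range(n)]
    return sum(state[r].conjugate() * u_psi[r] for r in range(n))


def hadamard_test_p0(state, unitary, flip_prob=0.0):
    """Probability of reading the ancilla as 0.

    Ideal value: (1 + Re<psi|U|psi>) / 2. A bit flip on the ancilla right before
    measurement, with probability flip_prob, swaps the 0 and 1 outcomes.
    """
    p0 = (1.0 + expectation(state, unitary).real) / 2.0
    return (1.0 - flip_prob) * p0 + flip_prob * (1.0 - p0)


def estimate_re(p0_observed, flip_prob=0.0):
    """Undo the bit-flip channel: Re<psi|U|psi> = (2 P(0) - 1) / (1 - 2 p)."""
    return (2.0 * p0_observed - 1.0) / (1.0 - 2.0 * flip_prob)


plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
s_gate = [[1, 0], [0, 1j]]  # phase gate S, so <+|S|+> = (1 + i) / 2
noisy_p0 = hadamard_test_p0(plus, s_gate, flip_prob=0.1)
print(round(noisy_p0, 6), round(estimate_re(noisy_p0, flip_prob=0.1), 6))  # 0.7 0.5
```

In practice P(0) would be estimated from a finite number of shots, and the 1 / (1 - 2p) rescaling amplifies that sampling error accordingly.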
# Vision Meets Definitions: 語釈情報を取り入れた教師なし視覚的語義曖昧性解消 Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information ( http://arxiv.org/abs/2305.01788v2 ) ライセンス: Link先を確認	Sunjae Kwon, Rishabh Garodia, Minhwa Lee, Zhichao Yang, Hong Yu	(参考訳) Visual Word Sense Disambiguation (VWSD) は、与えられた文脈における対象単語の正しい語義を最も正確に描写した画像を見つけるタスクである。従来、画像テキストマッチングモデルは多義語の認識に苦労することが多かった。本稿では,外部語彙知識ベースの語釈(gloss)情報,特に語義定義を用いた教師なしVWSD手法を提案する。具体的には,正解の語義情報が与えられない場合に,ベイズ推論を用いて語義定義を取り入れることを提案する。さらに,辞書外(OOD)問題を改善するために,GPT-3を用いた文脈を考慮した定義生成を提案する。実験の結果,ベイズ推論に基づく手法によりVWSDの性能が大幅に向上した。さらに,文脈を考慮した定義生成は,OOD事例において既存の定義生成手法を上回る顕著な性能向上を達成した。ソースコードはできるだけ早く公開する予定である。 Visual Word Sense Disambiguation (VWSD) is the task of finding the image that most accurately depicts the correct sense of the target word in the given context. Previously, image-text matching models often struggled to recognize polysemous words. This paper introduces an unsupervised VWSD approach that uses the gloss information of an external lexical knowledge base, especially the sense definitions. Specifically, we suggest employing Bayesian inference to incorporate the sense definitions when the sense information of the answer is not provided. In addition, to ameliorate the out-of-dictionary (OOD) issue, we propose a context-aware definition generation with GPT-3. Experimental results show that the VWSD performance increased significantly with our Bayesian inference-based approach. In addition, our context-aware definition generation achieved a prominent performance improvement on OOD examples, outperforming the existing definition generation method. We will publish the source code as soon as possible.	翻訳日:2023-05-31 00:29:07 公開日:2023-05-28
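One way to read "Bayesian inference to incorporate the sense definitions" is to treat the unknown sense d as a latent variable and marginalize it out: P(image | context) = sum over d of P(image | d) P(d | context). The sketch below assumes both factors are already available as probabilities (for example, softmax-normalized image-text and text-text similarity scores); the numbers in the usage example are made up for illustration and do not come from the paper.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def image_posterior(sense_given_context, image_given_sense):
    """P(image | context) = sum_d P(image | d) * P(d | context).

    sense_given_context[d] is P(d | context); image_given_sense[d][i] is P(image_i | d).
    """
    n_images = len(image_given_sense[0])
    return [
        sum(p_d * image_given_sense[d][i] for d, p_d in enumerate(sense_given_context))
        for i in range(n_images)
    ]


# Hypothetical example: "bank" in the context "river bank", two senses, three candidate images.
sense_given_context = [0.2, 0.8]  # financial institution, sloping land beside water
image_given_sense = [
    [0.7, 0.2, 0.1],  # images scored against the "financial institution" gloss
    [0.1, 0.8, 0.1],  # images scored against the "land beside water" gloss
]
posterior = image_posterior(sense_given_context, image_given_sense)
best = max(range(len(posterior)), key=posterior.__getitem__)
print(best)  # 1
```

When the sense of the answer is known, the sum collapses to a single term, which recovers plain definition-conditioned matching.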
# 人が語り、AIが聞く: EHRノートにおけるスティグマを与える言語がAIの性能に与える影響 People Talking and AI Listening: How Stigmatizing Language in EHR Notes Affect AI Performance ( http://arxiv.org/abs/2305.10201v2 ) ライセンス: Link先を確認	Yizhi Liu, Weiguang Wang, Guodong Gordon Gao, Ritu Agarwal	(参考訳) 電子カルテ(EHR)は、医療における人工知能(AI)主導の変革に不可欠なデータソースとして機能する。しかし、EHRノートに反映された臨床医のバイアスは、AIモデルがこれらのバイアスを継承・増幅し、健康格差を永続させることにつながりうる。本研究では,Transformerベースの深層学習モデルと説明可能なAI(XAI)技術を用いて,EHRノートにおけるスティグマを与える言語(SL)が死亡予測に与える影響を調べた。その結果,臨床医が書いたSLは,特に黒人患者においてAIの性能に悪影響を及ぼし,SLがAIモデル開発における人種格差の一因であることが明らかとなった。 SLの影響を緩和する運用上効率的な方法を探るため,臨床医の協働ネットワークを通じてSLの生成パターンを調査し,中心的な臨床医がAIモデルの人種格差により強い影響を与えることを特定した。中心的な臨床医が書いたSLを除去することは,データコーパス全体のSLをすべて除去するよりも効率的なバイアス低減戦略であることがわかった。本研究は,責任あるAI開発のための実用的な知見を提供し,医療における臨床医の行動とEHRノートの記述の理解に貢献する。 Electronic health records (EHRs) serve as an essential data source for the envisioned artificial intelligence (AI)-driven transformation in healthcare. However, clinician biases reflected in EHR notes can lead to AI models inheriting and amplifying these biases, perpetuating health disparities. This study investigates the impact of stigmatizing language (SL) in EHR notes on mortality prediction using a Transformer-based deep learning model and explainable AI (XAI) techniques. Our findings demonstrate that SL written by clinicians adversely affects AI performance, particularly for Black patients, highlighting SL as a source of racial disparity in AI model development. To explore an operationally efficient way to mitigate SL's impact, we investigate patterns in the generation of SL through a clinicians' collaborative network, identifying central clinicians as having a stronger impact on racial disparity in the AI model. We find that removing SL written by central clinicians is a more efficient bias reduction strategy than eliminating all SL in the entire corpus of data. 
This study provides actionable insights for responsible AI development and contributes to understanding clinician behavior and EHR note writing in healthcare.	翻訳日:2023-05-31 00:12:41 公開日:2023-05-28
# グラフ上のロングテールカテゴリの特徴付け Characterizing Long-Tail Categories on Graphs ( http://arxiv.org/abs/2305.09938v2 ) ライセンス: Link先を確認	Haohui Wang, Baoyu Jing, Kaize Ding, Yada Zhu, Dawei Zhou	(参考訳) ロングテールのデータ分布は、金融取引ネットワーク、eコマースネットワーク、共同研究ネットワークなど、多くの現実世界のネットワークで一般的である。近年の進展は成功を収めているものの、既存の研究は主にグラフ拡張や目的関数の再重み付けによる機械学習モデルのバイアス除去に焦点を当てている。しかし、グラフ上のロングテールカテゴリの挙動を特徴づけ、実際のシナリオにおける汎化性能を理解するための理論的ツールを提供する文献は限られている。このギャップを埋めるために,問題をマルチタスク学習の形で定式化し(各タスクは特定の1カテゴリの予測に対応する),グラフ上のロングテール分類に対する最初の汎化誤差上界を提案する。理論的結果は、ロングテール分類の汎化性能が、全タスクにわたる損失の範囲とタスクの総数に支配されることを示している。この理論的知見に基づいて,グラフ上のロングテールカテゴリの性能を向上させるための新しい汎用フレームワークTail2Learnを提案する。具体的には,ラベルが限られたクラスが他のクラスと共有する関連情報の恩恵を受けられるようにする階層的タスクグループ化モジュールから始め,さらにヘッドクラスとテールクラスの勾配寄与のバランスをとるために,バランス型対照学習モジュールを設計する。最後に、様々な実世界のデータセットに関する広範な実験により、グラフ上のロングテールカテゴリを捉えるTail2Learnの有効性が示された。 Long-tail data distributions are prevalent in many real-world networks, including financial transaction networks, e-commerce networks, and collaboration networks. Despite the success of recent developments, existing work mainly focuses on debiasing machine learning models via graph augmentation or objective reweighting. However, there is limited literature that provides a theoretical tool to characterize the behaviors of long-tail categories on graphs and understand the generalization performance in real scenarios. To bridge this gap, we propose the first generalization bound for long-tail classification on graphs by formulating the problem in the fashion of multi-task learning, i.e., each task corresponds to the prediction of one particular category. Our theoretical results show that the generalization performance of long-tail classification is dominated by the range of losses across all tasks and the total number of tasks. Building upon the theoretical findings, we propose a novel generic framework, Tail2Learn, to improve the performance of long-tail categories on graphs. 
In particular, we start with a hierarchical task grouping module that allows label-limited classes to benefit from the relevant information shared by other classes; then, we further design a balanced contrastive learning module to balance the gradient contributions of head and tail classes. Finally, extensive experiments on various real-world datasets demonstrate the effectiveness of Tail2Learn in capturing long-tail categories on graphs.	翻訳日:2023-05-31 00:12:19 公開日:2023-05-28
# グラフセグメントトレーニングによる大規模グラフ特性予測の学習 Learning Large Graph Property Prediction via Graph Segment Training ( http://arxiv.org/abs/2305.12322v2 ) ライセンス: Link先を確認	Kaidi Cao, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi	(参考訳) 各予測にはグラフ全体の知識が必要である一方、学習中に利用可能なメモリ量は限られているため、大きなグラフの特性の予測を学習することは困難である。本稿では,一定のメモリフットプリントで大規模グラフの特性予測を学習するために,分割統治アプローチを利用する汎用フレームワークであるグラフセグメントトレーニング(GST)を提案する。 GSTは、まず大きなグラフをセグメントに分割し、学習の各反復でサンプリングされた少数のセグメントのみを通して逆伝播する。逆伝播のためにサンプリングされなかったセグメントの埋め込みを効率的に得るために,履歴埋め込みテーブルを導入してGSTパラダイムを改良する。履歴埋め込みの陳腐化(staleness)を軽減するため,2つの新しい手法を設計する。第一に,入力分布シフトを補正するために予測ヘッドを微調整する。第二に,学習中に古い埋め込みの一部をドロップしてバイアスを減らすStale Embedding Dropoutを導入する。我々は、MalNetとTpuGraphsという2つの大規模グラフ特性予測ベンチマーク上で、完全な手法GST-EFD(すべての手法を併用)を評価する。実験の結果,GST-EFDはメモリ効率が高く高速であると同時に,一般的な全グラフ学習方式よりもテスト精度がわずかに向上することがわかった。 Learning to predict properties of large graphs is challenging because each prediction requires knowledge of the entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that uses a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first divides a large graph into segments and then backpropagates through only a few segments sampled per training iteration. We refine the GST paradigm by introducing a historical embedding table to efficiently obtain embeddings for segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques. First, we finetune the prediction head to fix the input distribution shift. Second, we introduce Stale Embedding Dropout to drop some stale embeddings during training to reduce bias. We evaluate our complete method GST-EFD (with all the techniques together) on two large graph property prediction benchmarks: MalNet and TpuGraphs. 
Our experiments show that GST-EFD is both memory-efficient and fast, while offering a slight boost in test accuracy over a typical full graph training regime.	翻訳日:2023-05-31 00:01:50 公開日:2023-05-28
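The GST training step described above can be sketched with scalar node features and mean pooling (both simplifying assumptions; a real model would run a GNN encoder and pool vectors). Only the k sampled segments are re-encoded, the others are read from a historical embedding table, and each stale entry is dropped with probability `drop_prob`, mirroring Stale Embedding Dropout. Gradient flow and the prediction-head fine-tuning stage are not modeled, and all names are hypothetical.

```python
import random


def segment_embedding(segment):
    """Stand-in encoder: mean of scalar node features (a real model would run a GNN)."""
    return sum(segment) / len(segment)


def gst_forward(segments, table, k, drop_prob, rng):
    """One forward pass in the spirit of Graph Segment Training.

    - k randomly sampled segments are re-encoded (these would receive gradients)
      and their entries in the historical embedding table are refreshed;
    - every other segment reuses its cached, possibly stale, embedding;
    - each stale embedding is dropped with probability drop_prob.
    Returns the mean-pooled graph embedding.
    """
    sampled = set(rng.sample(range(len(segments)), k))
    kept = []
    for idx, seg in enumerate(segments):
        if idx in sampled:
            emb = segment_embedding(seg)
            table[idx] = emb
            kept.append(emb)
        elif idx in table and rng.random() >= drop_prob:
            kept.append(table[idx])
    return sum(kept) / len(kept)


rng = random.Random(0)
table = {}
segments = [[1.0, 3.0], [5.0], [2.0, 2.0, 2.0]]
print(gst_forward(segments, table, k=3, drop_prob=0.0, rng=rng))  # 3.0 (all segments fresh)
print(gst_forward(segments, table, k=1, drop_prob=0.5, rng=rng))  # mixes fresh and cached values
```

Because at least one segment is always re-encoded, the pooled embedding is never empty even when every stale entry is dropped.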
# 確率的アンサンブルニューラルネットワークダイナミクスを用いた能動的探索と不確実性を考慮した展開の橋渡し Bridging Active Exploration and Uncertainty-Aware Deployment Using Probabilistic Ensemble Neural Network Dynamics ( http://arxiv.org/abs/2305.12240v2 ) ライセンス: Link先を確認	Taekyung Kim, Jungwi Mun, Junwon Seo, Beomsu Kim, Seongil Hong	(参考訳) 近年,ロボット工学における学習ベースの制御は,実環境における複雑なタスクに対処できることから大きな注目を集めている。機械学習アルゴリズムと計算能力の進歩により、未知または部分的に既知のロボットのダイナミクスを学習することで困難な制御問題を解くこのアプローチは、ますます重要になっている。ロボットが最大の情報利得をもたらす状態へ自らを導く能動的探索は、効率的なデータ収集と人間による監督の最小化に不可欠である。同様に、学習モデルに基づく不確実な行動は不安定な動作や失敗につながりうるため、不確実性を考慮した展開はロボット制御においてますます重要な関心事となっている。しかし、能動的探索と不確実性を考慮した展開はそれぞれ独立に研究されており、両者をシームレスに統合した文献は限られている。本稿では,ロボット制御領域におけるこれら2つのタスクを橋渡しする統一的なモデルベース強化学習フレームワークを提案する。本フレームワークは,ダイナミクス学習に確率的アンサンブルニューラルネットワークを用い,Jensen-Renyiダイバージェンスによる認識的不確実性の定量化を可能にする。探索と展開という相反する2つのタスクは、最先端のサンプリングベースMPCによって最適化され、その結果、学習データの効率的な収集と、不確実な状態行動空間の回避に成功した。自動運転車と車輪型ロボットの両方で実験を行い、探索と展開の両方で有望な結果を示す。 In recent years, learning-based control in robotics has gained significant attention due to its capability to address complex tasks in real-world environments. With the advances in machine learning algorithms and computational capabilities, this approach is becoming increasingly important for solving challenging control problems in robotics by learning unknown or partially known robot dynamics. Active exploration, in which a robot directs itself to states that yield the highest information gain, is essential for efficient data collection and minimizing human supervision. Similarly, uncertainty-aware deployment has been a growing concern in robotic control, as uncertain actions informed by the learned model can lead to unstable motions or failure. However, active exploration and uncertainty-aware deployment have been studied independently, and there is limited literature that seamlessly integrates them. This paper presents a unified model-based reinforcement learning framework that bridges these two tasks in the robotics control domain. 
Our framework uses a probabilistic ensemble neural network for dynamics learning, allowing the quantification of epistemic uncertainty via Jensen-Renyi Divergence. The two opposing tasks of exploration and deployment are optimized through state-of-the-art sampling-based MPC, resulting in efficient collection of training data and successful avoidance of uncertain state-action spaces. We conduct experiments on both autonomous vehicles and wheeled robots, showing promising results for both exploration and deployment.	翻訳日:2023-05-31 00:01:31 公開日:2023-05-28
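For an ensemble whose members each predict a Gaussian, the order-2 (quadratic) Jensen-Rényi divergence has a closed form, because the integral of a product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means. The one-dimensional sketch below uses that identity; the choice of order 2, uniform weights, and scalar outputs are assumptions made for brevity, and the paper's exact estimator may differ. The divergence is zero when all members agree and grows as their means spread apart, which is the kind of signal an exploration objective would reward and a deployment controller would penalize.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def renyi2_entropy(means, variances, weights):
    """Quadratic Renyi entropy -log(integral of p(x)^2 dx) of a 1-D Gaussian mixture.

    Uses the identity: integral of N(x; m_i, v_i) N(x; m_j, v_j) dx = N(m_i; m_j, v_i + v_j).
    """
    total = 0.0
    for wi, mi, vi in zip(weights, means, variances):
        for wj, mj, vj in zip(weights, means, variances):
            total += wi * wj * gauss_pdf(mi, mj, vi + vj)
    return -math.log(total)


def jensen_renyi_divergence(means, variances, weights=None):
    """H2 of the ensemble mixture minus the weighted average of the members' H2."""
    if weights is None:
        weights = [1.0 / len(means)] * len(means)
    mixture = renyi2_entropy(means, variances, weights)
    members = sum(
        w * renyi2_entropy([m], [v], [1.0]) for w, m, v in zip(weights, means, variances)
    )
    return mixture - members


# Five ensemble members predicting the next state for one state-action pair.
agree = jensen_renyi_divergence([0.0, 0.0, 0.0, 0.0, 0.0], [0.1] * 5)
disagree = jensen_renyi_divergence([0.0, 0.5, -0.4, 1.2, 0.3], [0.1] * 5)
print(agree < disagree)  # True
```

For multi-dimensional states the same identity holds with multivariate Gaussian densities and summed covariances.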
# 不確定入力を用いたガウス過程回帰に対するベイズ的アプローチ Bayesian approach to Gaussian process regression with uncertain inputs ( http://arxiv.org/abs/2305.11586v2 ) ライセンス: Link先を確認	Dongwei Ye, Mengwu Guo	(参考訳) 従来のガウス過程回帰は、モデル観測の出力データにのみノイズが存在することを前提としている。しかし、多くの科学・工学の応用では、観測データの入力位置も、モデリングの仮定や測定誤差などによる不確実性で損なわれる可能性がある。本研究では,入力データの変動性をガウス過程回帰に統合するベイズ法を提案する。 2種類の観測量、すなわち固定入力を持つノイズで汚染された出力と、事前分布で定義された不確実な入力を持つ出力を考え、ベイズの枠組みによって事後分布を推定して不確実なデータ位置を推論する。その後、このように定量化された入力の不確実性を周辺化によってガウス過程の予測に組み込む。この新しい回帰手法の有効性はいくつかの数値例を通して実証され、一貫して良好な汎化性能が観察されるとともに、不確実な入力のベイズ推論によって予測の不確実性が大幅に低減される。 Conventional Gaussian process regression assumes that noise is present only in the output data of model observations. In many scientific and engineering applications, however, the input locations of observational data may also be corrupted by uncertainties owing to modeling assumptions, measurement errors, etc. In this work, we propose a Bayesian method that integrates the variability of input data into Gaussian process regression. Considering two types of observables (noise-corrupted outputs with fixed inputs, and those with prior-distribution-defined uncertain inputs), we estimate a posterior distribution via a Bayesian framework to infer the uncertain data locations. Thereafter, such quantified uncertainties of inputs are incorporated into Gaussian process predictions by means of marginalization. The effectiveness of this new regression technique is demonstrated through several numerical examples, in which consistently good generalization performance is observed, while a substantial reduction in the predictive uncertainties is achieved by the Bayesian inference of uncertain inputs.	翻訳日:2023-05-31 00:00:00 公開日:2023-05-28
# 言語モデルのポストホック説明は言語モデルを改善することができる Post Hoc Explanations of Language Models Can Improve Language Models ( http://arxiv.org/abs/2305.11426v2 ) ライセンス: Link先を確認	Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju	(参考訳) 大規模言語モデル(LLM)は複雑なタスクの実行において顕著な能力を示してきた。さらに、最近の研究では、文脈内学習において人間が注釈した根拠(例えば、Chain-of-Thoughtプロンプト)を取り入れることで、特に推論能力を必要とするタスクにおいて、これらのモデルの性能が大幅に向上することが示されている。しかし、このような根拠を取り入れるには多くの人手が必要であるため、スケーラビリティの面で課題がある。そこで本研究では, 根拠生成のプロセスを自動化することで上記の課題に対処する新しいフレームワーク, AMPLIFY(Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations)を提案する。この目的のために,各入力特徴がモデル予測に与える影響を捉えた帰属スコア(説明)を出力する事後説明手法を利用する。より具体的には、事後説明から得られた洞察を埋め込んだ自然言語の根拠を自動構築し、LLMに補正信号を提供する。実世界のデータセットを用いた大規模な実験により、AMPLIFYは、Chain-of-Thoughtプロンプトのような人間が注釈した根拠に依存する従来手法が不十分なタスクを含む幅広いタスクにおいて、約10～25%の予測精度の向上をもたらすことが示された。本研究は,LLMの有効性を高める貴重なツールとしての事後説明の可能性を強調した最初の試みの一つである。さらに、AMPLIFYの各構成要素の影響を示すために追加の実証分析とアブレーション研究を行い、文脈内学習を改良するための重要な洞察を導く。 Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning capabilities. However, incorporating such rationales poses challenges in terms of scalability as this requires a high degree of human involvement. In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses the aforementioned challenges by automating the process of rationale generation. To this end, we leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions. 
More specifically, we construct automated natural language rationales that embed insights from post hoc explanations to provide corrective signals to LLMs. Extensive experimentation with real-world datasets demonstrates that our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks, including those where prior approaches which rely on human-annotated rationales such as Chain-of-Thought prompting fall short. Our work makes one of the first attempts at highlighting the potential of post hoc explanations as valuable tools for enhancing the effectiveness of LLMs. Furthermore, we conduct additional empirical analyses and ablation studies to demonstrate the impact of each of the components of AMPLIFY, which, in turn, leads to critical insights for refining in-context learning.	翻訳日:2023-05-30 23:59:44 公開日:2023-05-28
# VisorGPT: 生成的事前学習による視覚的事前分布の学習 VisorGPT: Learning Visual Prior via Generative Pre-Training ( http://arxiv.org/abs/2305.13777v3 ) ライセンス: Link先を確認	Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou	(参考訳) 視覚データ内の様々なモノ(stuff と things)は特定の特徴を持ち、それらはディープニューラルネットワークによって学習され、モデル内で物体の位置や形状などの視覚的事前分布として暗黙的に表現される。このような事前分布は多くの視覚タスクに影響を与えうる。例えば、条件付き画像合成では、事前分布に従わない空間条件は視覚的に不正確な合成結果をもたらしうる。本研究は、視覚的事前分布を明示的に学習し、サンプリングのカスタマイズを可能にすることを目的とする。言語モデリングの進歩に触発され、生成的事前学習によって視覚的事前分布を学習する手法VisorGPTを提案する。物体の視覚的位置(バウンディングボックス、人間のポーズ、インスタンスマスクなど)を系列へと離散化することで、VisorGPTは尤度最大化によって視覚的事前分布をモデル化できる。さらに、様々な視覚的位置を統一し、学習した事前分布からの系列出力のサンプリングをカスタマイズできるようにするためのプロンプトエンジニアリングを検討する。実験の結果、VisorGPTは視覚的事前分布を効果的にモデル化でき、例えばControlNetのような条件付き画像合成モデルのために正確な人間のポーズをカスタマイズするなど、多くの視覚タスクに利用できることが示された。コードは https://github.com/Sierkinhane/VisorGPT で公開予定である。 Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such a prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions failing to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn the visual prior via Generative Pre-Training, dubbed VisorGPT. By discretizing visual locations of objects, e.g., bounding boxes, human pose, and instance masks, into sequences, VisorGPT can model the visual prior through likelihood maximization. In addition, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. 
Experimental results demonstrate that VisorGPT can effectively model the visual prior, which can be employed for many vision tasks, such as customizing accurate human pose for conditional image synthesis models like ControlNet. Code will be released at https://github.com/Sierkinhane/VisorGPT.	翻訳日:2023-05-30 23:53:08 公開日:2023-05-28
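A minimal sketch of the kind of serialization the VisorGPT abstract describes: continuous box coordinates are quantized into a fixed number of bins and emitted as discrete tokens that a GPT-style model can learn with a likelihood objective. The token vocabulary (`[box]`, `[person]`, `<k>`) and the bin count are hypothetical, not the paper's actual prompt format.

```python
def quantize(value, size, num_bins):
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    return min(num_bins - 1, int(value / size * num_bins))


def boxes_to_sequence(category, boxes, width, height, num_bins=512):
    """Serialize boxes (x0, y0, x1, y1) into a flat token sequence (hypothetical format)."""
    tokens = ["[box]", f"[{category}]"]
    for x0, y0, x1, y1 in boxes:
        tokens += [
            f"<{quantize(x0, width, num_bins)}>",
            f"<{quantize(y0, height, num_bins)}>",
            f"<{quantize(x1, width, num_bins)}>",
            f"<{quantize(y1, height, num_bins)}>",
        ]
    return tokens


def sequence_to_boxes(tokens, width, height, num_bins=512):
    """Invert the serialization, returning bin-center coordinates."""
    coords = [int(t[1:-1]) for t in tokens[2:]]
    scale = [width, height, width, height]
    boxes = []
    for i in range(0, len(coords), 4):
        boxes.append(tuple((coords[i + k] + 0.5) / num_bins * scale[k] for k in range(4)))
    return boxes


seq = boxes_to_sequence("person", [(100, 50, 300, 200)], 640, 480)
print(seq)  # ['[box]', '[person]', '<80>', '<53>', '<240>', '<213>']
print(sequence_to_boxes(seq, 640, 480))
```

Decoding returns bin centers, so the round-trip error per coordinate is at most one bin width (here 640 / 512 pixels horizontally); a sampled sequence from a trained model would be decoded the same way.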
# DiffProtect: 顔のプライバシー保護のための拡散モデルを用いた敵対的サンプルの生成 DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection ( http://arxiv.org/abs/2305.13625v2 ) ライセンス: Link先を確認	Jiang Liu, Chun Pong Lau, Rama Chellappa	(参考訳) ますます普及する顔認識(FR)システムは、個人のプライバシー、特にソーシャルメディアで写真を公開している何十億ものユーザーのプライバシーに対する深刻な懸念を引き起こしている。敵対的攻撃を利用して暗号化された顔画像を生成することで、許可されていないFRシステムによって個人が識別されるのを防ぐ試みがいくつかなされてきた。しかし、既存の手法は視覚品質の低さや攻撃成功率の低さに悩まされており、実用性が制限されている。近年,拡散モデルは画像生成において大きな成功を収めている。本研究では次の問いを立てる: 拡散モデルを用いて敵対的サンプルを生成し、視覚品質と攻撃性能の両方を改善できるだろうか? 本稿では拡散オートエンコーダを用いてFRシステムに対して意味的に意味のある摂動を生成するDiffProtectを提案する。大規模な実験により、DiffProtectは最先端の手法よりも自然に見える暗号化画像を生成しつつ、攻撃成功率を大幅に向上させること(例えば、CelebA-HQとFFHQのデータセットでそれぞれ24.5%と25.1%の絶対的な改善)が示された。 Increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. Several attempts have been made to protect individuals from being identified by unauthorized FR systems by using adversarial attacks to generate encrypted face images. However, existing methods suffer from poor visual quality or low attack success rates, which limit their utility. Recently, diffusion models have achieved tremendous success in image generation. In this work, we ask: can diffusion models be used to generate adversarial examples that improve both visual quality and attack performance? We propose DiffProtect, which utilizes a diffusion autoencoder to generate semantically meaningful perturbations on FR systems. Extensive experiments demonstrate that DiffProtect produces more natural-looking encrypted images than state-of-the-art methods while achieving significantly higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the CelebA-HQ and FFHQ datasets, respectively.	翻訳日:2023-05-30 23:52:46 公開日:2023-05-28
# ケイ酸塩の導電率予測のための非線形方程式の開発 Development of Non-Linear Equations for Predicting Electrical Conductivity in Silicates ( http://arxiv.org/abs/2305.13519v2 ) ライセンス: Link先を確認	Patrick dos Anjos, Lucas A. Quaresma, Marcelo L. P. Machado	(参考訳) 電気伝導率は電気アーク炉(EAF)において基本的に重要であり、この現象とプロセススラグとの相互作用はエネルギー損失と最適化の低下をもたらす。数学的モデリングは現象の挙動の理解に役立つため、人工ニューラルネットワークによってEAFスラグの電気伝導率を予測するために用いられた。最良の人工ニューラルネットワークは、隠れ層に100個のニューロンを持ち、6つの予測変数と1つの被予測変数(電気伝導率)を持つ。平均絶対誤差と絶対誤差の標準偏差を算出し,各予測変数が被予測変数に与える影響を関連付けるために感度解析を行った。 Electrical conductivity is of fundamental importance in electric arc furnaces (EAFs), and the interaction of this phenomenon with the process slag results in energy losses and poor optimization. Because mathematical modeling helps in understanding the behavior of such phenomena, it was used here to predict the electrical conductivity of EAF slags through artificial neural networks. The best artificial neural network had 100 neurons in the hidden layer, 6 predictor variables, and one predicted variable, electrical conductivity. The mean absolute error and the standard deviation of the absolute error were calculated, and a sensitivity analysis was performed to correlate the effect of each predictor variable with the predicted variable.	翻訳日:2023-05-30 23:52:24 公開日:2023-05-28
# LLMを用いたLLM推論パイプラインの応答長知覚とシーケンススケジューリング Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline ( http://arxiv.org/abs/2305.13144v2 ) ライセンス: Link先を確認	Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You	(参考訳) 大規模言語モデル(LLM)はAIの分野に革命をもたらし、様々なタスクで前例のない能力を示している。しかし、LLMの推論プロセスにはかなりの計算コストが伴う。本稿では,LLMのパワーを利用する効率的なLLM推論パイプラインを提案する。我々のアプローチは、LLMのポテンシャルをタップして、最小限のオーバーヘッドで応答長を正確に知覚し、予測することから始まります。この情報を活用することで、類似の応答長を持つクエリをマイクロバッチにグループ化する効率的なシーケンススケジューリング手法を導入する。 LLaMAモデルを用いて実世界の命令データセットに対するアプローチを評価し,提案手法の有効性を損なうことなく,推論スループットが86%向上したことを示す。特に,本手法は他の推論高速化手法と直交しており,LLM推論のための多くの既存のツールキット(例えば,FlashAttention, Quantization)に付加価値がある。 Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. However, the inference process for LLMs comes with significant computational costs. In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs. Our approach begins by tapping into the potential of LLMs to accurately perceive and predict the response length with minimal overhead. By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches. We evaluate our approach on real-world instruction datasets using the LLaMA-based model, and our results demonstrate an impressive 86% improvement in inference throughput without compromising effectiveness. Notably, our method is orthogonal to other inference acceleration techniques, making it a valuable addition to many existing toolkits (e.g., FlashAttention, Quantization) for LLM inference.	翻訳日:2023-05-30 23:51:59 公開日:2023-05-28
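The scheduling idea in the abstract above groups queries with similar predicted response lengths into micro-batches. The minimal Python sketch below illustrates it under stated assumptions. The predicted lengths are taken as given integers; in the paper they come from the LLM itself. The helper names `schedule_by_length`, `fifo_batches` and `padded_tokens` are hypothetical. Waste is counted as generation slots, assuming every query in a batch keeps decoding until the longest response in that batch finishes.

```python
def schedule_by_length(query_ids, predicted_len, batch_size):
    """Sort queries by predicted response length, then cut into micro-batches."""
    ordered = sorted(query_ids, key=lambda q: predicted_len[q])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def fifo_batches(query_ids, batch_size):
    """Baseline: batch queries in arrival order."""
    return [query_ids[i:i + batch_size] for i in range(0, len(query_ids), batch_size)]


def padded_tokens(batches, true_len):
    """Generation slots used if each batch runs until its longest response ends."""
    return sum(len(b) * max(true_len[q] for q in b) for b in batches)


if __name__ == "__main__":
    lengths = {"q0": 10, "q1": 200, "q2": 12, "q3": 190}  # toy lengths
    ids = list(lengths)
    print(padded_tokens(fifo_batches(ids, 2), lengths))                  # 780
    print(padded_tokens(schedule_by_length(ids, lengths, 2), lengths))  # 424
```

Pairing the two short queries and the two long ones cuts the padded slots from 780 to 424 in this toy case. How much a real system gains depends on how accurate the length predictions are.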
# コード言語モデルを用いたテキストからSQLへの誤り訂正 Text-to-SQL Error Correction with Language Models of Code ( http://arxiv.org/abs/2305.13073v2 ) ライセンス: Link先を確認	Ziru Chen, Shijie Chen, Michael White, Raymond Mooney, Ali Payani, Jayanth Srinivasa, Yu Su, Huan Sun	(参考訳) テキストからSQLへの構文解析の最近の進歩にもかかわらず、現在のセマンティックパーサは実用上十分正確ではない。本稿では,テキストからSQLへの自動誤り訂正モデルの構築方法について検討する。トークンレベルの編集は文脈外であり、時には曖昧であることに気付き、代わりに節レベルの編集モデルを構築することを提案する。また、ほとんどのコードの言語モデルはSQL用に事前訓練されていないが、一般的なデータ構造とPythonのようなプログラミング言語での操作を知っている。そこで本研究では,コードの言語モデルの事前学習コーパスにより近い,SQLクエリとその編集のための新しい表現を提案する。誤り訂正モデルは、異なるパーサーの完全集合一致(exact set match)精度を2.4-6.5ポイント改善し、2つの強いベースラインに対して最大4.3ポイントの絶対改善を得る。コードとデータは https://github.com/OSU-NLP-Group/Auto-SQL-Correction で公開されている。 Despite recent progress in text-to-SQL parsing, current semantic parsers are still not accurate enough for practical use. In this paper, we investigate how to build automatic text-to-SQL error correction models. Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead. Besides, while most language models of code are not specifically pre-trained for SQL, they know common data structures and their operations in programming languages such as Python. Thus, we propose a novel representation for SQL queries and their edits that adheres more closely to the pre-training corpora of language models of code. Our error correction model improves the exact set match accuracy of different parsers by 2.4-6.5 points and obtains up to 4.3 point absolute improvement over two strong baselines. Our code and data are available at https://github.com/OSU-NLP-Group/Auto-SQL-Correction.	翻訳日:2023-05-30 23:51:42 公開日:2023-05-28
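The contrast between token-level and clause-level edits can be illustrated with a toy Python sketch. It splits a flat SQL query into its top-level clauses and diffs the queries clause by clause. This is not the paper's representation; the abstract only says that representation stays closer to code pre-training data. The splitter ignores subqueries and string literals, and the names `split_clauses` and `clause_edits` are hypothetical.

```python
import re

# Top-level clause keywords handled by this toy splitter (an assumption; real SQL has more).
CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]
_PATTERN = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSES) + r")\b",
    flags=re.IGNORECASE,
)


def split_clauses(sql):
    """Map each top-level clause keyword to its body (no subquery handling)."""
    parts = _PATTERN.split(sql)  # capturing group keeps the keywords in the result
    keywords = [" ".join(k.upper().split()) for k in parts[1::2]]
    bodies = [b.strip() for b in parts[2::2]]
    return dict(zip(keywords, bodies))


def clause_edits(wrong_sql, fixed_sql):
    """List clause-level edits that turn wrong_sql into fixed_sql."""
    old, new = split_clauses(wrong_sql), split_clauses(fixed_sql)
    edits = []
    for clause in CLAUSES:
        before, after = old.get(clause), new.get(clause)
        if before == after:
            continue
        if before is None:
            edits.append(("add", clause, after))
        elif after is None:
            edits.append(("delete", clause, before))
        else:
            edits.append(("replace", clause, after))
    return edits


if __name__ == "__main__":
    print(clause_edits("SELECT name FROM singer WHERE age > 20",
                       "SELECT name FROM singer WHERE age > 30 ORDER BY name"))
```

Two clause edits fix the example query: replace the `WHERE` body and add an `ORDER BY`. A token-level diff would instead show isolated changes such as `20 -> 30` with no clause context.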
# 深層強化学習によるスラムの道路計画 Road Planning for Slums via Deep Reinforcement Learning ( http://arxiv.org/abs/2305.13060v2 ) ライセンス: Link先を確認	Yu Zheng, Hongyuan Su, Jingtao Ding, Depeng Jin, Yong Li	(参考訳) 何百万人ものスラム住民がスラム内の不適切な道路インフラのために都市サービスへのアクセシビリティが低下しており、スラムの道路計画が都市の持続可能な発展に不可欠である。既存の再ブロック手法やヒューリスティックな手法は、時間がかかり異なるスラムに一般化できないか、アクセシビリティや建設コストの観点から最適以下の道路計画しか得られない。本稿では,スラムの道路配置を自動的に行うための深層強化学習手法を提案する。本研究では,スラムのトポロジー構造を捉える汎用グラフモデルを提案し,計画道路の場所を選択するための新しいグラフニューラルネットワークを考案する。マスキングポリシー最適化により,スラム内の場所を最小限の建設コストで接続する道路計画を作成することができる。異なる国における実世界のスラムに関する広範囲な実験により、モデルの有効性が検証され、既存のベースライン手法に対してアクセシビリティが14.3%向上した。異なるタスク間での転移に関するさらなる調査は、我々のモデルが単純なシナリオで道路計画スキルを習得し、より複雑なシナリオに適応できることを示し、我々のモデルを現実世界のスラム改善に適用する可能性を示している。コードとデータは https://github.com/tsinghua-fib-lab/road-planning-for-slums で入手できる。 Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming and unable to generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically lay out roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which can significantly improve accessibility by 14.3% over existing baseline methods. 
Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading. The code and data are available at https://github.com/tsinghua-fib-lab/road-planning-for-slums.	翻訳日:2023-05-30 23:51:29 公開日:2023-05-28
# Poisson から Gaussian ユニタリアンサンブル統計への移行のための Rosenzweig-Porter モデルの実験的検討 Experimental test of the Rosenzweig-Porter model for the transition from Poisson to Gaussian unitary ensemble statistics ( http://arxiv.org/abs/2305.12840v2 ) ライセンス: Link先を確認	Xiaodong Zhang, Weihua Zhang, Jiongning Che, and Barbara Dietz	(参考訳) 本稿では、積分可能な古典力学を持つ量子系から、時間反転(T)不変性が破れ、カオス的な古典対応系を持つ量子系への遷移に関する実験的研究について報告する。円形の平板超伝導マイクロ波共振器を用いて高精度な実験を行い、その中心に置いたフェライトディスクを磁化することにより、T不変性の破れとカオス力学を誘導する。約1000個の固有周波数の完全な列を決定し、エルゴード相、フラクタル相、局在相を示すことから多体量子カオスの文脈で現在集中的に研究されているRosenzweig-Porter(RP)モデルのスペクトル特性に関する解析的予測を検証する。さらに、このRPモデルとハイデルベルク・アプローチに基づいて、対応する開量子系の散乱(S)行列に対するランダム行列モデルを導入し、それがマイクロ波共振器の測定したS行列のゆらぎ特性を完璧に再現することを示す。 We report on an experimental investigation of the transition of a quantum system with integrable classical dynamics to one with violated time-reversal (T) invariance and a chaotic classical counterpart. High-precision experiments are performed with a flat superconducting microwave resonator with circular shape in which T-invariance violation and chaotic dynamics are induced by magnetizing a ferrite disk placed at its center. We determine a complete sequence of ≈1000 eigenfrequencies and verify analytical predictions for the spectral properties of the Rosenzweig-Porter (RP) model, which is currently under intensive study in the context of many-body quantum chaos as it exhibits ergodic, fractal and localized phases. Furthermore, based on this RP model and the Heidelberg approach, we introduce a random-matrix model for the scattering (S) matrix of the corresponding open quantum system and show that it perfectly reproduces the fluctuation properties of the measured S matrix of the microwave resonator.	翻訳日:2023-05-30 23:50:27 公開日:2023-05-28
# GPUに基づく並列アルゴリズムによるグラフ解析:量子クラスタリング Graph Analysis Using a GPU-based Parallel Algorithm: Quantum Clustering ( http://arxiv.org/abs/2305.14641v2 ) ライセンス: Link先を確認	Zhe Wang, ZhiJie He, Ding Liu	(参考訳) 本稿では、グラフ構造に量子クラスタリングを適用する新しい方法を紹介する。量子クラスタリング(Quantum Clustering, QC)は、ポテンシャル関数を構築してクラスター中心を決定する、新しい密度に基づく教師なし学習手法である。本手法では,グラフ勾配降下アルゴリズムを用いてクラスタの中心を探索する。 GPU並列化はポテンシャル値の計算に利用される。また,広く使用されている5つのデータセットについて実験を行い,4つの指標を用いて評価した。その結果,提案手法の性能が向上した。最後に,実験結果に対する$\sigma$の影響について考察する。 The article introduces a new method for applying Quantum Clustering to graph structures. Quantum Clustering (QC) is a novel density-based unsupervised learning method that determines cluster centers by constructing a potential function. In this method, we use the Graph Gradient Descent algorithm to find the centers of clusters. GPU parallelization is utilized for computing potential values. We also conducted experiments on five widely used datasets and evaluated using four indicators. The results show superior performance of the method. Finally, we discuss the influence of $\sigma$ on the experimental results.	翻訳日:2023-05-30 23:42:39 公開日:2023-05-28
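The Quantum Clustering potential and a descent over graph neighbours can be sketched in Python as follows. Several assumptions apply and none are confirmed by the abstract. Graph hop distance stands in for Euclidean distance in the Horn–Gottlieb quantum-clustering potential, with additive constants dropped. "Graph gradient descent" is read here as repeatedly moving to the lowest-potential neighbour. Ties are broken by node id. All function names are hypothetical, and the loop over nodes is sequential. Each node's potential is independent of the others, which is the property a GPU implementation would exploit.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def qc_potential(adj, sigma):
    """Quantum-clustering potential per node, up to an additive constant.

    V(x) ~ sum_i d_i^2 w_i / (2 sigma^2 sum_i w_i), with w_i = exp(-d_i^2 / (2 sigma^2))
    and d_i the hop distance from x to node i (assumed stand-in for Euclidean distance).
    """
    potential = {}
    for x in adj:
        num = den = 0.0
        for d in hop_distances(adj, x).values():
            w = math.exp(-d * d / (2 * sigma ** 2))
            num += d * d * w
            den += w
        potential[x] = num / (2 * sigma ** 2 * den)
    return potential


def graph_descent_clusters(adj, potential):
    """Follow the lowest-potential neighbour; nodes that share a sink form a cluster.

    Ties are broken by node id so flat plateaus collapse to one sink (ids must be comparable).
    """
    key = lambda n: (potential[n], n)
    labels = {}
    for start in adj:
        cur = start
        while True:
            best = min(adj[cur], key=key, default=cur)
            if key(best) >= key(cur):
                break
            cur = best
        labels[start] = cur
    return labels
```

In a toy graph of two 4-node cliques joined by a single edge, the two bridge nodes sit at higher potential than the clique interiors. Each clique therefore drains into its own sink. $\sigma$ controls how far the Gaussian weights reach, which matches the abstract's remark that $\sigma$ influences the results.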
# WinDB: HMDフリーで歪みのないパノラマビデオ固定学習 WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning ( http://arxiv.org/abs/2305.13901v2 ) ライセンス: Link先を確認	Guotao Wang, Chenglizhao Chen, Aimin Hao, Hong Qin, Deng-Ping Fan	(参考訳) これまで、パンオプティカルビデオで固定コレクションを行う方法は、hmdを装着しながら参加者の固定を収集し、与えられたパンオプティカルシーンを自由に探索するヘッドマウントディスプレイ(hmd)に基づいている。しかし、この広範に使用されているデータ収集手法は、間欠的な有意なイベントを含む場合、与えられたパノプティクス内のどの領域が最も重要であるかを正確に予測する深層モデルの訓練には不十分である。主な理由は、参加者が常にパン光学シーン全体を探索するために頭を回転させ続けることができないため、HMDを使用して固定を収集する際、常に「盲ズーム」が存在するからである。その結果、収集された固定は一部のローカルビューに閉じ込められがちであり、残りの領域は「盲ズーム」である。したがって、局所的なビューを蓄積するHMDベースの手法を用いて収集した固定データは、複雑なパノラマシーンの全体的重要性を正確に表すことはできない。本稿では,HMDを必要とせず,失明を伴わないパンオプティカルビデオに対して,動的ブラリング(WinDB)による補助窓を提案する。したがって、収集された固定は地域的重要性の度合いをよく反映することができる。 WinDBアプローチを使用して、225以上のカテゴリをカバーする300のパノプティクスクリップを含む、新しいPanopticVideo-300データセットをリリースしました。さらに,我々はpanopticvideo-300をフル活用し,ブラインドブルームフリー属性による固定シフト問題に対処するためのシンプルなベースライン設計を提案した。 To date, the widely-adopted way to perform fixation collection in panoptic video is based on a head-mounted display (HMD), where participants' fixations are collected while wearing an HMD to explore the given panoptic scene freely. However, this widely-used data collection method is insufficient for training deep models to accurately predict which regions in a given panoptic are most important when it contains intermittent salient events. The main reason is that there always exist "blind zooms" when using HMD to collect fixations since the participants cannot keep spinning their heads to explore the entire panoptic scene all the time. Consequently, the collected fixations tend to be trapped in some local views, leaving the remaining areas to be the "blind zooms". Therefore, fixation data collected using HMD-based methods that accumulate local views cannot accurately represent the overall global importance of complex panoramic scenes. 
This paper introduces the auxiliary Window with a Dynamic Blurring (WinDB) fixation collection approach for panoptic video, which doesn't need HMD and is blind-zoom-free. Thus, the collected fixations can well reflect the regional-wise importance degree. Using our WinDB approach, we have released a new PanopticVideo-300 dataset, containing 300 panoptic clips covering over 225 categories. Besides, we have presented a simple baseline design to take full advantage of PanopticVideo-300 to handle the blind-zoom-free attribute-induced fixation shifting problem.	翻訳日:2023-05-30 23:40:05 公開日:2023-05-28
# AI Coach Assist: エージェントコーチングのためのコンタクトセンターにおけるコールレコメンデーション自動化アプローチ AI Coach Assist: An Automated Approach for Call Recommendation in Contact Centers for Agent Coaching ( http://arxiv.org/abs/2305.17619v1 ) ライセンス: Link先を確認	Md Tahmid Rahman Laskar, Cheng Chen, Xue-Yong Fu, Mahsa Azizi, Shashi Bhushan, Simon Corston-Oliver	(参考訳) 近年,コンタクトセンター産業における人工知能(AI)の利用が増加している。 aiが大きな影響を与える領域の1つは、コンタクトセンターエージェントのコーチングである。自然言語処理(NLP)技術を用いてコールトランスクリプトを解析することにより、コーチング目的に最も関係のある呼び出しを素早く判断することができる。本稿では,予め学習したトランスフォーマーベースの言語モデルを用いて,コンタクトセンタマネージャや管理者が求めた品質保証(qa)質問に基づいて,あるコールがコーチ可能かどうかを判断するai coach assistを提案する。このシステムは、現実世界のコンタクトセンタから収集された大規模なデータセットでトレーニングと評価が行われ、コーチ可能なモーメントを含む可能性が高いコンタクトセンタマネージャへのコールを推奨する効果的な方法を提供する。実験の結果,AIコーチ支援がコーチングプロセスを改善する可能性を示し,コンタクトセンターエージェントの性能を高めることができた。 In recent years, the utilization of Artificial Intelligence (AI) in the contact center industry is on the rise. One area where AI can have a significant impact is in the coaching of contact center agents. By analyzing call transcripts using Natural Language Processing (NLP) techniques, it would be possible to quickly determine which calls are most relevant for coaching purposes. In this paper, we present AI Coach Assist, which leverages the pre-trained transformer-based language models to determine whether a given call is coachable or not based on the quality assurance (QA) questions asked by the contact center managers or supervisors. The system was trained and evaluated on a large dataset collected from real-world contact centers and provides an effective way to recommend calls to the contact center managers that are more likely to contain coachable moments. Our experimental findings demonstrate the potential of AI Coach Assist to improve the coaching process, resulting in enhancing the performance of contact center agents.	翻訳日:2023-05-30 17:59:01 公開日:2023-05-28
# ビジュアルクエリを2次元でローカライズするベイズ決定法 Bayesian Decision Making to Localize Visual Queries in 2D ( http://arxiv.org/abs/2305.17611v1 ) ライセンス: Link先を確認	Syed Asjad, Aniket Gupta, Hanumant Singh	(参考訳) 本稿では,EGO4D 2023 Visual Query 2D Localization Challengeに対する我々のアプローチについて述べる。本手法は,ベースラインの領域提案ネットワーク(RPN)が提案する境界ボックスと視覚クロップ(visual crop)との類似性が高いために生じる偽陽性(FP)の数を削減することを目的としている。提案手法は,より高次元での類似性を決定するためにトランスフォーマを用い,これを事前信念とする。その結果を,測定として働くSiameseヘッドによる低次元での類似度と組み合わせて事後分布を生成し,それを用いて視覚クロップと提案された境界ボックスとの最終的な類似度を決定する。コードは https://github.com/s-m-asjad/EGO4D_VQ2D で公開されている。 This report describes our approach for the EGO4D 2023 Visual Query 2D Localization Challenge. Our method aims to reduce the number of False Positives (FP) that occur because of high similarity between the visual crop and the proposed bounding boxes from the baseline's Region Proposal Network (RPN). Our method uses a transformer to determine similarity in higher dimensions which is used as our prior belief. The results are then combined together with the similarity in lower dimensions from the Siamese Head, acting as our measurement, to generate a posterior which is then used to determine the final similarity of the visual crop with the proposed bounding box. Our code is publicly available at https://github.com/s-m-asjad/EGO4D_VQ2D.	翻訳日:2023-05-30 17:58:45 公開日:2023-05-28
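The prior/measurement/posterior combination described above can be illustrated with a generic Bayes-rule sketch. It assumes both similarity scores can be treated as unnormalised probabilities over candidate boxes and fuses them by a normalised product. The abstract does not give the paper's actual fusion rule, and `fuse_similarities` is a hypothetical name.

```python
def fuse_similarities(prior, measurement, eps=1e-12):
    """Posterior over candidate boxes: normalised product of prior and measurement.

    prior: higher-dimensional (transformer) similarity per box.
    measurement: lower-dimensional (Siamese-head) similarity per box.
    Treating both as unnormalised probabilities is an assumption.
    """
    unnorm = [p * m for p, m in zip(prior, measurement)]
    total = sum(unnorm) + eps  # eps guards against an all-zero product
    return [u / total for u in unnorm]


if __name__ == "__main__":
    # Box 0 looks similar to the crop at low dimension but has a weak prior
    # (a likely false positive); box 1 has a strong prior.
    print(fuse_similarities([0.1, 0.9], [0.9, 0.6]))
```

In this toy example box 1 ends up with the higher posterior (about 0.86) even though box 0 had the higher Siamese score. That is the kind of false-positive suppression the abstract aims for.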
# スピノルマター波の精密ラマン制御のための複合偏回転 Composite Biased Rotations for Precise Raman Control of Spinor Matterwaves ( http://arxiv.org/abs/2305.17610v1 ) ライセンス: Link先を確認	Liyang Qiu, Haidong Yuan and Saijun Wu	(参考訳) ラマン励起による超微粒子の精密制御は、原子ベースの量子テクノロジーのクラスに寄与する。我々は,ラマン励起電力効率と制御速度,励起状態断熱除去,自発的放出抑制条件のバランスを選択できる単光子デチューニング中間状態におけるアルカリ原子のラマンスピノル制御手法について検討した。ラマン結合による原子スピノルの回転は、実質的な光シフトによってバイアスを受ける。固定バイアス角を利用して、超微細な基底状態とレーザー照射が強い不均一な場合にも、複合偏光回転を最適化して、ナノ秒内で正確なエンサンブルスピノルマター波制御を可能にすることを示す。本手法は光パルス原子干渉計の技術的ギャップを埋め、中程度のレーザーパワーで高速ラマンスピノル物質波制御を実現する。 Precise control of hyperfine matterwaves via Raman excitations is instrumental to a class of atom-based quantum technology. We investigate the Raman spinor control technique for alkaline atoms in an intermediate regime of single-photon detuning where a choice can be made to balance the Raman excitation power efficiency with the control speed, excited-state adiabatic elimination, and spontaneous emission suppression requirements. Within the regime, rotations of atomic spinors by the Raman coupling are biased by substantial light shifts. Taking advantage of the fixed bias angle, we show that composite biased rotations can be optimized to enable precise ensemble spinor matterwave control within nanoseconds, even for multiple Zeeman pseudo-spins defined on the hyperfine ground states and when the laser illumination is strongly inhomogeneous. Our scheme fills a technical gap in light pulse atom interferometry, for achieving high speed Raman spinor matterwave control with moderate laser power.	翻訳日:2023-05-30 17:58:32 公開日:2023-05-28
# 大規模言語モデルのアラインメントにおける報酬崩壊 Reward Collapse in Aligning Large Language Models ( http://arxiv.org/abs/2305.17608v1 ) ライセンス: Link先を確認	Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su	(参考訳) ChatGPTやGPT-4のような大規模言語モデル(LLM)の並外れた能力は、プロンプトに対する応答のランキングとして表現されることの多い人間の選好に基づいて訓練された報酬モデルと整合させることによって、部分的に引き出されている。本稿では、一般的なランキングに基づくアプローチが、トレーニングの終盤においてプロンプトに関係なく同一の報酬分布をもたらすという経験的観察である「報酬崩壊」(reward collapse)の現象について述べる。この結果は望ましくない。例えば『あなたの親友について短い物語を書いて』のようなオープンエンドなプロンプトは、その応答に対して連続的な範囲の報酬を与えるべきであり、『ニュージーランドの首都はどこか』のような特定的なプロンプトは、高いか低いかのどちらかの報酬を生成するべきだからである。我々の理論的調査により、報酬崩壊は主として、ランキングに基づく目的関数が最適化中にプロンプト関連の情報を取り込めないことに起因することが明らかとなった。この洞察により、漸近的な領域において、効用関数の集合に付随する報酬分布の閉形式表現を導出できる。報酬崩壊を克服するため、プロンプトを考慮した最適化スキームを導入し、これが補間領域においてプロンプトに依存する報酬分布を許容することを証明する。実験結果は、提案するプロンプトを考慮した効用関数が、報酬モデルのトレーニング中の報酬崩壊を大幅に軽減することを示唆している。 The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences, which are often represented as rankings of responses to prompts. In this paper, we document the phenomenon of reward collapse, an empirical observation where the prevailing ranking-based approach results in an identical reward distribution regardless of the prompts during the terminal phase of training. This outcome is undesirable as open-ended prompts like "write a short story about your best friend" should yield a continuous range of rewards for their completions, while specific prompts like "what is the capital of New Zealand" should generate either high or low rewards. Our theoretical investigation reveals that reward collapse is primarily due to the insufficiency of the ranking-based objective function to incorporate prompt-related information during optimization. This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic regime. 
To overcome reward collapse, we introduce a prompt-aware optimization scheme that provably admits a prompt-dependent reward distribution within the interpolating regime. Our experimental results suggest that our proposed prompt-aware utility functions significantly alleviate reward collapse during the training of reward models.	翻訳日:2023-05-30 17:58:15 公開日:2023-05-28
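For context, a widely used ranking-based reward-model objective is the pairwise log-sigmoid (Bradley–Terry style) loss over completions ranked for one prompt. The sketch below shows that generic objective only. It is not the paper's prompt-aware utility functions, and treating it as the specific objective the authors analyse is an assumption. The objective only sees reward differences among completions of the same prompt, so nothing in it refers to the prompt itself.

```python
import math
from itertools import combinations


def _neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


def ranking_loss(rewards_best_to_worst):
    """Pairwise ranking loss for one prompt's completions, sorted best to worst.

    Every pair (i, j) with i ranked above j contributes -log sigmoid(r_i - r_j).
    """
    return sum(_neg_log_sigmoid(r_hi - r_lo)
               for r_hi, r_lo in combinations(rewards_best_to_worst, 2))


if __name__ == "__main__":
    print(ranking_loss([2.0, 1.0, 0.0]))  # rewards agree with the ranking: low loss
    print(ranking_loss([0.0, 1.0, 2.0]))  # rewards contradict the ranking: high loss
```

Adding a constant to every reward of a prompt leaves the loss unchanged, because only differences enter. This is why, under this kind of objective, the spread of rewards is shaped by the ranking structure rather than by what the prompt asks.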
# More than Classification: イベント時間関係抽出のための統一フレームワーク More than Classification: A Unified Framework for Event Temporal Relation Extraction ( http://arxiv.org/abs/2305.17607v1 ) ライセンス: Link先を確認	Quzhe Huang, Yutong Hu, Shengqi Zhu, Yansong Feng, Chang Liu, Dongyan Zhao	(参考訳) イベント時間関係抽出(ETRE)は、通常マルチラベル分類タスクとして定式化され、各タイプの関係は単にワンホットラベルとして扱われる。この定式化は関係の意味を無視し、関係間の固有の依存関係を消してしまう。様々なETREタスクにおける関係定義を調べた結果、イベントの開始時点と終了時点を用いてすべての関係を解釈できることがわかった。例えば、関係 Includes は、イベント1がイベント2と同時かそれより前に始まり、イベント2と同時かそれより後に終わると解釈できる。本稿では、時間関係を時点の論理式に変換し、特定の時点対の間の関係を予測することでETREを行う統一イベント時間関係抽出フレームワークを提案する。TB-DenseとMATRESの実験では、強いベースラインよりも大幅に改善され、両方のデータセットで最先端モデルを0.3%上回った。すべての関係を統一されたフレームワークで表現することにより、十分なデータを持つ関係を利用して他の関係の学習を支援し、低データシナリオにおいて安定した改善を実現できる。関係定義が変更された場合でも、時点を新しいイベント関係に対応付ける論理式を変更するだけで、新しい定義に素早く適応できる。コードは https://github.com/AndrewZhe/A-Unified-Framework-for-ETRE で公開されている。 Event temporal relation extraction (ETRE) is usually formulated as a multi-label classification task, where each type of relation is simply treated as a one-hot label. This formulation ignores the meaning of relations and wipes out their intrinsic dependency. After examining the relation definitions in various ETRE tasks, we observe that all relations can be interpreted using the start and end time points of events. For example, relation Includes could be interpreted as event 1 starting no later than event 2 and ending no earlier than event 2. In this paper, we propose a unified event temporal relation extraction framework, which transforms temporal relations into logical expressions of time points and completes the ETRE by predicting the relations between certain time point pairs. Experiments on TB-Dense and MATRES show significant improvements over a strong baseline and outperform the state-of-the-art model by 0.3% on both datasets. 
By representing all relations in a unified framework, we can leverage the relations with sufficient data to assist the learning of other relations, thus achieving stable improvement in low-data scenarios. When the relation definitions are changed, our method can quickly adapt to the new ones by simply modifying the logic expressions that map time points to new event relations. The code is released at https://github.com/AndrewZhe/A-Unified-Framework-for-ETRE.	翻訳日:2023-05-30 17:57:48 公開日:2023-05-28
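The time-point view above can be made concrete with the one relation the abstract defines, Includes. The sketch first predicts (here: computes) the relation between each pair of time points, then evaluates a logical expression over those point relations. The `Event` type and the helper names are hypothetical. The inverse relation `is_included` is an assumed definition; the abstract does not state it.

```python
from typing import NamedTuple


class Event(NamedTuple):
    start: float
    end: float


def point_relation(a, b):
    """Relation between two time points: '<', '=' or '>'."""
    return "<" if a < b else ">" if a > b else "="


def includes_from_points(start_rel, end_rel):
    """Includes as a logical expression over time-point relations:
    start1 no later than start2 AND end1 no earlier than end2."""
    return start_rel in ("<", "=") and end_rel in (">", "=")


def includes(e1, e2):
    return includes_from_points(point_relation(e1.start, e2.start),
                                point_relation(e1.end, e2.end))


def is_included(e1, e2):
    """Assumed inverse relation: e2 includes e1."""
    return includes(e2, e1)
```

In this decomposition a model only has to classify the start-start and end-end point pairs. Changing a relation definition then means editing `includes_from_points` rather than retraining a new label, which mirrors the adaptability claim in the abstract.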
# 機械設計による光格子原子干渉計 A Machine-Designed Optical Lattice Atom Interferometer ( http://arxiv.org/abs/2305.17603v1 ) ライセンス: Link先を確認	Catie LeDesma, Kendall Mehling, Jieqiu Shao, John Drew Wilson, Penina Axelrad, Marco M. Nicotra, Murray Holland, and Dana Z. Anderson	(参考訳) 光の定常波によって形成される光学格子における干渉測定は、原子を光学ポテンシャルによって閉じ込めて操作できるため、自由空間における同等の手法に対して潜在的に有利である。このような干渉計を1次元格子で実証し、その周期中の多くの段階で波動関数をイメージングし再構成することで原子を制御する能力を示す。加速度信号を印加したところ、得られた性能は、量子理論によれば囲まれた時空面積に対して可能な最適値に近いことがわかった。われわれの機械設計の手法は、センサーをオンザフライで再構成可能とし、スケールアップすれば、さまざまな潜在的な応用が可能な最先端の慣性・重力センサーを作れる可能性がある。 Performing interferometry in an optical lattice formed by standing waves of light offers potential advantages over its free-space equivalents since the atoms can be confined and manipulated by the optical potential. We demonstrate such an interferometer in a one dimensional lattice and show the ability to control the atoms by imaging and reconstructing the wavefunction at many stages during its cycle. An acceleration signal is applied and the resulting performance is seen to be close to the optimum possible for the time-space area enclosed according to quantum theory. Our methodology of machine design enables the sensor to be reconfigurable on the fly, and when scaled up, offers the potential to make state-of-the-art inertial and gravitational sensors that will have a wide range of potential applications.	翻訳日:2023-05-30 17:57:25 公開日:2023-05-28
# 適切なスコアリングルールによる正直なパフォーマンス予測の動機付け Incentivizing honest performative predictions with proper scoring rules ( http://arxiv.org/abs/2305.17601v1 ) ライセンス: Link先を確認	Caspar Oesterheld, Johannes Treutlein, Emery Cooper, Rubi Hudson	(参考訳) 適切なスコアリングルールは、予測が結果に影響を及ぼさないと仮定して、専門家に信念を正確に報告するインセンティブを与える。この仮定を緩和し、予測が実行可能である場合、すなわち株式市場に関する公開予測を行う場合など、予測の結果に影響を与える場合のインセンティブを調査します。予測は、その予測がなされた後の専門家の信念を正確に反映するならば、不動点であると言える。この設定では、期待スコアを最大化するレポートは専門家の信念を反映せず、そのようなレポートの正確性に限界を与える。二項予測に対して、専門家の予測が結果に与える影響が限定されている場合、最適なレポートが任意に固定点に近づくスコアリングルールを定義することができる。しかし、これは2つ以上の結果に対する予測では不可能である。また、おもちゃの設定で数値シミュレーションを行い、いくつかの状況では境界がきついこと、予測誤差がかなり大きいこと(5～10%以上)を示しました。最後に,最適性の代替概念について検討し,不動点の報告にインセンティブを与えることを示す。 Proper scoring rules incentivize experts to accurately report beliefs, assuming predictions cannot influence outcomes. We relax this assumption and investigate incentives when predictions are performative, i.e., when they can influence the outcome of the prediction, such as when making public predictions about the stock market. We say a prediction is a fixed point if it accurately reflects the expert's beliefs after that prediction has been made. We show that in this setting, reports maximizing expected score generally do not reflect an expert's beliefs, and we give bounds on the inaccuracy of such reports. We show that, for binary predictions, if the influence of the expert's prediction on outcomes is bounded, it is possible to define scoring rules under which optimal reports are arbitrarily close to fixed points. However, this is impossible for predictions over more than two outcomes. We also perform numerical simulations in a toy setting, showing that our bounds are tight in some situations and that prediction error is often substantial (greater than 5-10%). Lastly, we discuss alternative notions of optimality, including performative stability, and show that they incentivize reporting fixed points.	翻訳日:2023-05-30 17:57:12 公開日:2023-05-28
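The gap between score-maximising reports and fixed points can be shown with a small numerical sketch. It assumes a hypothetical performative world in which the probability of a binary event depends linearly on the public report, $q(p) = 0.2 + 0.5p$, and the expert is scored with the logarithmic scoring rule. Neither the influence model nor these numbers come from the paper's simulations. The fixed point solves $p = q(p)$, giving $p = 0.4$.

```python
import math


def outcome_prob(p, a=0.2, b=0.5):
    """Hypothetical performative world: P(event) depends linearly on the report p."""
    return a + b * p


def expected_log_score(p):
    """Expected log score of report p when the outcome probability is q(p)."""
    q = outcome_prob(p)
    return q * math.log(p) + (1 - q) * math.log(1 - p)


grid = [i / 1000 for i in range(1, 1000)]  # reports strictly inside (0, 1)
best_report = max(grid, key=expected_log_score)
fixed_point = 0.2 / (1 - 0.5)  # solves p = 0.2 + 0.5 * p

if __name__ == "__main__":
    print(best_report, fixed_point)  # best_report is roughly 0.18; fixed_point is 0.4
```

The score-maximising report lands well below the fixed point. After that report the event's probability is about 0.29, so the report no longer matches the expert's resulting belief. This illustrates the abstract's point that maximising expected score need not yield a fixed point.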
# GAME-UP: 軌道予測のためのゲームアウェアモード列挙と理解 GAME-UP: Game-Aware Mode Enumeration and Understanding for Trajectory Prediction ( http://arxiv.org/abs/2305.17600v1 ) ライセンス: Link先を確認	Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman	(参考訳) 道路エージェント間の相互作用は、特に複数のエージェントを含む場合において、軌道予測において重要な課題となる。既存の多様性を考慮した予測器はマルチエージェント予測のインタラクティブな性質を考慮しないため、これらの重要な相互作用の結果を見逃す可能性がある。本稿では,ゲーム理論の逆強化学習を活用し,マルチモーダル予測のカバレッジを向上させるための軌道予測フレームワークであるGAME-UPを提案する。我々は,エージェントの行動の分類を仮定することなく,訓練時のゲーム理論的数値解析を補助的損失として使用し,カバレッジと精度を改善した。Waymo Open Motion Datasetのインタラクティブなサブセットに対して,対話性の高いシナリオを含む3つのサブセットを含めて,本アプローチを実証する。実験の結果、予測器はベースラインモデルに比べて2倍の相互作用をカバーしつつ、正確な予測を行うことがわかった。 Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose GAME-UP, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic numerical analysis as an auxiliary loss, resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive subset of the Waymo Open Motion Dataset, including three subsets involving scenarios with high interaction complexity. Experiment results show that our predictor produces accurate predictions while covering twice as many possible interactions versus a baseline model.	翻訳日:2023-05-30 17:56:53 公開日:2023-05-28
# キャタピラーを使って小規模画像をかじる Using Caterpillar to Nibble Small-Scale Images ( http://arxiv.org/abs/2305.17644v1 ) ライセンス: Link先を確認	Jin Sun, Xiaoshuang Shi, Zhiyuan Weng, Kaidi Xu, Heng Tao Shen and Xiaofeng Zhu	(参考訳) 近年、MLPベースのモデルは人気を博し、中規模のデータセット(例えば ImageNet-1k)で大きなパフォーマンスを達成した。しかし、小規模な画像への直接の応用は限られている。この問題に対処するため,我々は,局所性の帰納的バイアスを生かしたShifted-Pillars-Concatenation (SPC) のキーモジュールを提案することで,新たなMLPベースのネットワークであるCaterpillarを設計する。SPCは、(1) 画像内のすべての柱(ピラー)を異なる方向に沿ってシフトさせてコピーを生成するPillars-Shiftと、(2) シフトされたコピーの離散的なシフト近傍から局所情報を捉えるPillars-Concatenationの2つのプロセスからなる。大規模な実験により、人気のある小規模データセット上での高いスケーラビリティと優れたパフォーマンス、およびImageNet-1Kにおける最近の最先端手法に匹敵する性能が実証されている。 Recently, MLP-based models have become popular and attained significant performance on medium-scale datasets (e.g., ImageNet-1k). However, their direct applications to small-scale images remain limited. To address this issue, we design a new MLP-based network, namely Caterpillar, by proposing a key module of Shifted-Pillars-Concatenation (SPC) for exploiting the inductive bias of locality. SPC consists of two processes: (1) Pillars-Shift, which is to shift all pillars within an image along different directions to generate copies, and (2) Pillars-Concatenation, which is to capture the local information from discrete shift neighborhoods of the shifted copies. Extensive experiments demonstrate its strong scalability and superior performance on popular small-scale datasets, and the competitive performance on ImageNet-1K to recent state-of-the-art methods.	翻訳日:2023-05-30 17:48:15 公開日:2023-05-28
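The two SPC steps described above can be sketched on a nested-list image of shape H x W x C, where a "pillar" is the channel vector at one spatial position. The assumptions are: a shift of one pillar, zero filling of vacated positions, and four directions. The abstract does not specify these, and any channel mixing applied after concatenation is omitted. The function names are hypothetical.

```python
def shift_pillars(img, dy, dx):
    """Shift an H x W x C image (nested lists) by (dy, dx) pillars; vacated pillars become zeros."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pillar that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = list(img[sy][sx])
    return out


def shifted_pillars_concat(img, step=1):
    """Pillars-Shift in four directions, then concatenate the copies channel-wise."""
    directions = [(-step, 0), (step, 0), (0, -step), (0, step)]  # up, down, left, right
    copies = [shift_pillars(img, dy, dx) for dy, dx in directions]
    h, w = len(img), len(img[0])
    return [[sum((copy[y][x] for copy in copies), []) for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    img = [[[1.0], [2.0]], [[3.0], [4.0]]]  # 2 x 2 image, one channel
    print(shifted_pillars_concat(img)[0][0])  # neighbours below (3.0) and right (2.0)
```

After concatenation each position carries its four shifted neighbours' channels, so a later per-pillar MLP sees local context without convolution. This is the locality bias the abstract refers to.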
# 演算子のエンタングリング能力 Entangling capacity of operators ( http://arxiv.org/abs/2305.17636v1 ) ライセンス: Link先を確認	Manas K Patra	(参考訳) 複合量子系に作用するユニタリ演算子$U$が与えられたとき、$U$の絡み合い容量はどれだけか? この問題を幾何学的アプローチで検討する。ユニタリ群上の計量によって定義される絡み合い容量は minimax 問題に帰着する。その双対である maximin 問題も並行して研究され、よく知られたいくつかの絡み合い尺度が得られる。一般化制御作用素と呼ばれる絡み合い作用素のクラスを定義し、この作用素のクラスの絡み合い容量とその他の性質を研究する。 Given a unitary operator $U$ acting on a composite quantum system, what is the entangling capacity of $U$? This question is investigated using a geometric approach. The entangling capacity, defined via metrics on the unitary groups, leads to a minimax problem. The dual, a maximin problem, is investigated in parallel and yields some familiar entanglement measures. A class of entangling operators, called generalized control operators, is defined. The entangling capacities and other properties of this class of operators are studied.	翻訳日:2023-05-30 17:47:58 公開日:2023-05-28
# DPFormer: 長期データによる個人差分変換器の学習 DPFormer: Learning Differentially Private Transformer on Long-Tailed Data ( http://arxiv.org/abs/2305.17633v1 ) ライセンス: Link先を確認	Youlong Ding, Xueyang Wu, Hao Wang and Weike Pan	(参考訳) Transformerは幅広いアプリケーションを持つ汎用的で効果的なアーキテクチャとして登場した。しかし、高ユーティリティのTransformerモデルを異なるプライバシー保証で効率的にトレーニングする方法は、まだ未解決のままである。本稿では,差分秘密変換器の学習における2つの重要な課題,すなわち,サンプルごとの勾配切り抜きや注意機構内の意図しない注意散らしによる計算オーバーヘッドについて述べる。そこで我々は,これらの課題に対処するため,Phantom ClippingとRe-Attention Mechanismを備えたDPFormerを提案する。我々の理論的分析は,DPFormerが勾配クリッピングの際の計算コストを低減し,注意散逸を効果的に軽減できることを示唆している(これはトレーニング過程を阻害し,特に長期データの存在下では顕著な性能低下につながる可能性がある)。このような分析は、2つの実世界のデータセットに対する実験結果によってさらに裏付けられ、提案したDPFormerの有効性と有効性を示す。 The Transformer has emerged as a versatile and effective architecture with broad applications. However, it still remains an open problem how to efficiently train a Transformer model of high utility with differential privacy guarantees. In this paper, we identify two key challenges in learning differentially private Transformers, i.e., heavy computation overhead due to per-sample gradient clipping and unintentional attention distraction within the attention mechanism. In response, we propose DPFormer, equipped with Phantom Clipping and Re-Attention Mechanism, to address these challenges. Our theoretical analysis shows that DPFormer can reduce computational costs during gradient clipping and effectively mitigate attention distraction (which could obstruct the training process and lead to a significant performance drop, especially in the presence of long-tailed data). Such analysis is further corroborated by empirical results on two real-world datasets, demonstrating the efficiency and effectiveness of the proposed DPFormer.	翻訳日:2023-05-30 17:47:49 公開日:2023-05-28
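The per-sample gradient clipping whose cost the abstract mentions is the standard DP-SGD aggregation step. The sketch shows that plain version: clip each sample's gradient to norm C, sum, add Gaussian noise with standard deviation σC, then average. It does not implement Phantom Clipping, whose details the abstract does not give. Gradients are plain lists, and the function names are hypothetical.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """DP-SGD style aggregate: clip each sample, sum, add Gaussian noise, average."""
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    std = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # the first sample's norm (5.0) exceeds max_norm
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping needs every sample's gradient norm before aggregation. In a Transformer that means materialising or otherwise computing per-sample gradients for large weight matrices, which is the computational overhead the abstract targets.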
# 高スケーラブルユニバーサルユニタリのためのプログラム可能なフォトニック時間回路 Programmable photonic time circuits for highly scalable universal unitaries ( http://arxiv.org/abs/2305.17632v1 ) ライセンス: Link先を確認	Xianji Piao, Sunkyu Yu, and Namkyoo Park	(参考訳) プログラマブルフォトニック回路 (Programmable Photonic circuits, PPC) は、ディープラーニング加速と普遍量子計算の実現に多大な関心を集めている。 PPCを用いたフォトニック計算は、超高速な演算、エネルギー効率のマトリックス計算、室温量子状態などの重要な利点があるが、そのスケーラビリティの低さは産業アプリケーションに必要な統合を妨げている。この課題は、従来のPPCにおける伝搬光を用いた一時的ワンショット操作から生じ、デバイスフットプリントの光速増加につながる。本稿では,フォン・ノイマンアーキテクチャと量子計算におけるゲートサイクリングに類似した時間サイクル計算を用いた,プログラマブルフォトニック時間回路の概念を提案する。ビルディングブロックとして、波長可変共振を持つ2つの共振器からなる再構成可能なsu(2)タイムゲートを開発し、時間符号化されたデュアルチャネルゲージフィールドを介して結合する。我々はSU(2)時間ゲートの系統的な組立を用いて高忠実度なU(N)演算を実証し、フットプリントとゲート数の両方においてO(N^2)からO(N)へのスケーラビリティの向上を実現した。これにより、産業レベルのPPC実装を大規模に統合する道が開ける。 Programmable photonic circuits (PPCs) have garnered substantial interest in achieving deep learning accelerations and universal quantum computations. Although photonic computation using PPCs offers critical advantages, including ultrafast operation, energy-efficient matrix calculation and room-temperature quantum states, its poor scalability impedes the integration required for industrial applications. This challenge arises from the temporally one-shot operation using propagating light in conventional PPCs, which leads to the light-speed increase of device footprints. Here we propose a concept of programmable photonic time circuits, which employ time-cycle-based computations analogous to the gate cycling in the von Neumann architecture and quantum computation. As a building block, we develop a reconfigurable SU(2) time gate composed of two resonators, which have tunable resonances and are coupled through time-coded dual-channel gauge fields. We demonstrate universal U(N) operations with high fidelity using the systematic assembly of the SU(2) time gates, achieving improved scalability from O(N^2) to O(N) in both the footprint and gate number. 
This result opens a pathway to industrial-level PPC implementation in very large-scale integration.	翻訳日:2023-05-30 17:47:32 公開日:2023-05-28
# 射影演算子に基づくニュートンステップを用いた量子最適制御問題の解法 How to solve Quantum Optimal Control Problems using Projection Operator-based Newton Steps ( http://arxiv.org/abs/2305.17630v1 ) ライセンス: Link先を確認	Jieqiu Shao, Mantas Naris, John Hauser and Marco M. Nicotra	(参考訳) 軌道最適化のための量子射影演算子ニュートン法(Quantum PRojection Operator-based Newton method for Trajectory Optimization、別名Q-PRONTO)は、量子最適制御問題を解くための数値解法である。本稿では,各反復で解の推定値を安定化させるレギュレータを導入することにより,先行バージョンの量子射影演算子を大幅に改善する。この修正はアルゴリズムの収束率を向上させるだけでなく、レギュレータのない場合と比較してソルバをより良い局所最小解へと導くことが示されている。数値例により、Q-PRONTOが、時間変化するコストと過渡期に避けるべき望ましくない占有数(population)を含む多入力の量子最適制御問題の解決に使用できることを示す。 The Quantum PRojection Operator-based Newton method for Trajectory Optimization, a.k.a. Q-PRONTO, is a numerical method for solving quantum optimal control problems. This paper significantly improves prior versions of the quantum projection operator by introducing a regulator that stabilizes the solution estimate at every iteration. This modification is shown to not only improve the convergence rate of the algorithm, but also steer the solver towards better local minima compared to the un-regulated case. Numerical examples showcase that Q-PRONTO can be used to solve multi-input quantum optimal control problems featuring time-varying costs and undesirable populations that ought to be avoided during the transient.	翻訳日:2023-05-30 17:47:11 公開日:2023-05-28
# Robust Natural Language Understanding with Residual Attention Debiasing ( http://arxiv.org/abs/2305.17627v1 ) License: see link	Fei Wang, James Y. Huang, Tianyi Yan, Wenxuan Zhou, Muhao Chen	Natural language understanding (NLU) models often suffer from unintended dataset biases. Among bias mitigation methods, ensemble-based debiasing methods, especially product-of-experts (PoE), have stood out for their impressive empirical success. However, previous ensemble-based debiasing methods typically apply debiasing on top-level logits without directly addressing biased attention patterns. Attention serves as the main medium of feature interaction and aggregation in pre-trained language models (PLMs) and plays a crucial role in providing robust predictions. In this paper, we propose REsidual Attention Debiasing (READ), an end-to-end debiasing method that mitigates unintended biases from attention. Experiments on three NLU tasks show that READ significantly improves the performance of BERT-based models on out-of-distribution (OOD) data with shortcuts removed, including +12.9% accuracy on HANS, +11.0% accuracy on FEVER-Symmetric, and +2.7% F1 on PAWS. Detailed analyses demonstrate the crucial role of unbiased attention in robust NLU models and that READ effectively mitigates biases in attention. Code is available at https://github.com/luka-group/READ.	Translation date: 2023-05-30 17:46:59 Publication date: 2023-05-28
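For context, the logit-level product-of-experts baseline that READ contrasts itself with can be sketched as follows. A frozen bias-only model's log-probabilities are added to the main model's, the sum is renormalized, and cross-entropy is taken on the combined distribution. This is the standard PoE formulation, not READ's residual attention debiasing, and the toy logits are invented for illustration.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, labels):
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()

def poe_loss(main_logits, bias_logits, labels):
    # log p_poe = log p_main + log p_bias, renormalized over classes.
    # The bias-only model is frozen; in training, only the main model would be updated.
    combined = log_softmax(log_softmax(main_logits) + log_softmax(bias_logits))
    return -combined[np.arange(len(labels)), labels].mean()

main = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
labels = np.array([0, 2])
biased = np.array([[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]])  # bias model confident on example 0 only
print(cross_entropy(main, labels), poe_loss(main, biased, labels))
```

When the bias model already predicts the gold label, the combined loss on that example shrinks, so the main model receives less pressure to learn the shortcut. A uniform bias model leaves the loss unchanged.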
# In-Context Analogical Reasoning with Pre-Trained Language Models ( http://arxiv.org/abs/2305.17626v1 ) License: see link	Xiaoyang Hu, Shane Storks, Richard L. Lewis, Joyce Chai	Analogical reasoning is a fundamental capacity of human cognition that allows us to reason abstractly about novel situations by relating them to past experiences. While it is thought to be essential for robust reasoning in AI systems, conventional approaches require significant training and/or hard-coding of domain knowledge to be applied to benchmark tasks. Inspired by cognitive science research that has found connections between human language and analogy-making, we explore the use of intuitive language-based abstractions to support analogy in AI systems. Specifically, we apply large pre-trained language models (PLMs) to visual Raven's Progressive Matrices (RPM), a common relational reasoning test. By simply encoding the perceptual features of the problem into language form, we find that PLMs exhibit a striking capacity for zero-shot relational reasoning, exceeding human performance and nearing supervised vision-based methods. We explore different encodings that vary the level of abstraction over task features, finding that higher-level abstractions further strengthen PLMs' analogical reasoning. Our detailed analysis reveals insights into the role of model complexity, in-context learning, and prior knowledge in solving RPM tasks.	Translation date: 2023-05-30 17:46:34 Publication date: 2023-05-28
# Cross-Domain Policy Adaptation via Value-Guided Data Filtering ( http://arxiv.org/abs/2305.17625v1 ) License: see link	Kang Xu, Chenjia Bai, Xiaoteng Ma, Dong Wang, Bin Zhao, Zhen Wang, Xuelong Li, Wei Li	Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning. For example, a robot may learn a policy in a simulator, but when it is deployed in the real world, the dynamics of the environment may be different. Given source and target domains with a dynamics mismatch, we consider the online dynamics adaptation problem, in which the agent can access sufficient source domain data while online interactions with the target domain are limited. Existing research has attempted to solve the problem from the dynamics discrepancy perspective. In this work, we reveal the limitations of these methods and explore the problem from the value difference perspective via a novel insight into value consistency across domains. Specifically, we present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets across the two domains. Empirical results on various environments with kinematic and morphology shifts demonstrate that our method achieves superior performance compared to prior approaches.	Translation date: 2023-05-30 17:46:13 Publication date: 2023-05-28
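The filtering idea of keeping only source transitions whose value targets agree across domains can be illustrated with a hypothetical top-fraction rule. This is a minimal sketch under stated assumptions. The inputs `v_target_pred` (a value target computed with a target-domain model) and `v_source` (a value target from the real source next state) are hypothetical arrays, and `keep_fraction` is an invented parameter. How VGDF actually estimates the paired targets and sets its threshold is described in the paper, not here.

```python
import numpy as np

def filter_source_transitions(v_target_pred, v_source, keep_fraction=0.25):
    """Hypothetical rule: keep the source transitions whose paired value targets are closest.

    v_target_pred, v_source: per-transition value targets for the two domains (assumed given).
    Returns a boolean mask over the source batch.
    """
    gap = np.abs(np.asarray(v_target_pred) - np.asarray(v_source))
    k = max(1, int(np.ceil(keep_fraction * len(gap))))
    keep = np.argsort(gap)[:k]  # smallest value discrepancies first
    mask = np.zeros(len(gap), dtype=bool)
    mask[keep] = True
    return mask

mask = filter_source_transitions([1.0, 3.0, 0.5, 2.0], [0.9, 1.0, 0.45, 1.0], keep_fraction=0.5)
print(mask)
```

Filtering on value agreement rather than on raw next-state differences means that transitions whose dynamics differ, but in ways that do not change the predicted return, can still be shared.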
# SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network ( http://arxiv.org/abs/2305.17624v1 ) License: see link	Chuong Huynh, Yuqian Zhou, Zhe Lin, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Abhinav Shrivastava	In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject. However, manually selecting and removing these small and dense distracting regions can be a laborious and time-consuming task. In this paper, we propose an interactive distractor selection method that is optimized to achieve the task with just a single click. Our method surpasses the precision and recall achieved by the traditional method of running panoptic segmentation and then selecting the segments containing the clicks. We also showcase how a transformer-based module can be used to identify more distracting regions similar to the user's click position. Our experiments demonstrate that the model can effectively and accurately segment unknown distracting objects interactively and in groups. By significantly simplifying the photo cleaning and retouching process, our proposed model provides inspiration for exploring rare object segmentation and group selection with a single click.	Translation date: 2023-05-30 17:45:54 Publication date: 2023-05-28
# On the Value of Myopic Behavior in Policy Reuse ( http://arxiv.org/abs/2305.17623v1 ) License: see link	Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li	Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.	Translation date: 2023-05-30 17:45:35 Publication date: 2023-05-28
# Lexical Retrieval Hypothesis in Multimodal Context ( http://arxiv.org/abs/2305.17663v1 ) License: see link	Po-Ya Angela Wang, Pin-Er Chen, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh	Multimodal corpora have become an essential language resource for language science and grounded natural language processing (NLP) systems due to the growing need to understand and interpret human communication across various channels. In this paper, we first present our efforts in building the first Multimodal Corpus for Languages in Taiwan (MultiMoco). Based on the corpus, we conduct a case study investigating the Lexical Retrieval Hypothesis (LRH), specifically examining whether the hand gestures co-occurring with speech constants facilitate lexical retrieval or serve other discourse functions. With detailed annotations on eight parliamentary interpellations in Taiwan Mandarin, we explore the co-occurrence between speech constants and non-verbal features (i.e., head movement, face movement, hand gesture, and function of hand gesture). Our findings suggest that while hand gestures do serve as facilitators for lexical retrieval in some cases, they also serve the purpose of information emphasis. This study highlights the potential of the MultiMoco Corpus to provide an important resource for in-depth analysis and further research in multimodal communication studies.	Translation date: 2023-05-30 17:40:03 Publication date: 2023-05-28
# Plug-and-Play Document Modules for Pre-trained Models ( http://arxiv.org/abs/2305.17660v1 ) License: see link	Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun	Large-scale pre-trained models (PTMs) have been widely used in document-oriented NLP tasks, such as question answering. However, the encoding-task coupling requirement results in the repeated encoding of the same documents for different tasks and queries, which is highly computationally inefficient. To this end, we aim to decouple document encoding from downstream tasks, and propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD). By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document one time to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders. Extensive experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios. In particular, PlugD can save 69% of computational costs while achieving comparable performance to state-of-the-art encoding-task coupling methods. Additionally, we show that PlugD can serve as an effective post-processing way to inject knowledge into task-specific models, improving model performance without any additional model training.	Translation date: 2023-05-30 17:39:42 Publication date: 2023-05-28
# Dynamical critical quantum sensing with a single parametrically-driven nonlinear resonator ( http://arxiv.org/abs/2305.17656v1 ) License: see link	Ken Chen, Jia-Hao Lü, Xin Zhu, Hao-Long Zhang, Wen Ning, Zhen-Biao Yang, and Shi-Biao Zheng	Critical phenomena of quantum systems are useful for enhancing quantum sensing. Here we investigate the performance of a sensing scheme in which the signal is encoded in the dynamically evolving state of an oscillator, featuring a competition between the Kerr nonlinearity and parametric driving. We calculate the quantum Fisher information and perform a simulation, which confirms the criticality-enabled enhancement. We further detail the response of one of the quadratures to the variation of the control parameter. The numerical results reveal that its inverted variance exhibits a diverging behavior at the critical point.	Translation date: 2023-05-30 17:39:18 Publication date: 2023-05-28
# MixDehazeNet : Mix Structure Block For Image Dehazing Network ( http://arxiv.org/abs/2305.17654v1 ) License: see link	LiPing Lu, Qian Xiong, DuanFeng Chu, BingRong Xu	Image dehazing is a typical task in the low-level vision field. Previous studies verified the effectiveness of the large convolutional kernel and attention mechanism in dehazing. However, there are two drawbacks: the multi-scale properties of an image are readily ignored when a large convolutional kernel is introduced, and the standard series connection of an attention module does not sufficiently consider an uneven hazy distribution. In this paper, we propose a novel framework named Mix Structure Image Dehazing Network (MixDehazeNet), which solves the two issues mentioned above. Specifically, it mainly consists of two parts: the multi-scale parallel large convolution kernel module and the enhanced parallel attention module. Compared with a single large kernel, multi-scale parallel large kernels are more capable of taking partial texture into account during the dehazing phase. In addition, an enhanced parallel attention module is developed, in which parallel connections of attention perform better at dehazing uneven hazy distributions. Extensive experiments on three benchmarks demonstrate the effectiveness of our proposed methods. For example, compared with the previous state-of-the-art methods, MixDehazeNet achieves a significant improvement (42.62dB PSNR) on the SOTS indoor dataset. The code is released at https://github.com/AmeryXiong/MixDehazeNet.	Translation date: 2023-05-30 17:39:09 Publication date: 2023-05-28
# Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks ( http://arxiv.org/abs/2305.17653v1 ) License: see link	Zhicheng Guo, Sijie Cheng, Yile Wang, Peng Li, Yang Liu	Retrieval-augmented methods have received increasing attention as a way to support downstream tasks by leveraging useful information from external resources. Recent studies mainly focus on exploring retrieval to solve knowledge-intensive (KI) tasks. However, the potential of retrieval for most non-knowledge-intensive (NKI) tasks remains under-explored. There are two main challenges to leveraging retrieval-augmented methods for NKI tasks: 1) the demand for diverse relevance score functions and 2) the dilemma between training cost and task performance. To address these challenges, we propose a two-stage framework for NKI tasks, named PGRA. In the first stage, we adopt a task-agnostic retriever to build a shared static index and select candidate evidence efficiently. In the second stage, we design a prompt-guided reranker to rerank the nearest evidence according to task-specific relevance for the reader. Experimental results show that PGRA outperforms other state-of-the-art retrieval-augmented methods. Our analyses further investigate the factors that influence model performance and demonstrate the generality of PGRA. Code is available at https://github.com/THUNLP-MT/PGRA.	Translation date: 2023-05-30 17:38:47 Publication date: 2023-05-28
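The two-stage retrieve-then-rerank pattern described above can be sketched as follows. Stage 1 runs cosine-similarity search over a shared, task-agnostic index. Stage 2 reorders the candidates with a task-specific score. The function and parameter names are hypothetical, and `rerank_score` is a placeholder callable. In PGRA, that role is played by a prompt-guided reranker built on a language model, which is not reproduced here.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def two_stage_retrieve(query_vec, index_vecs, rerank_score, k_candidates=10, k_final=3):
    """Stage 1: task-agnostic dense retrieval over a shared static index.
    Stage 2: rerank the candidates with a task-specific relevance function (placeholder)."""
    sims = normalize(index_vecs) @ normalize(query_vec)
    candidates = np.argsort(-sims)[:k_candidates]
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return [int(i) for i in reranked[:k_final]]

# Toy index of 4 documents; the placeholder reranker simply prefers higher document ids
print(two_stage_retrieve(np.array([1.0, 1.0, 0.0, 0.0]), np.eye(4), lambda i: i,
                         k_candidates=2, k_final=1))
```

Only the cheap first stage touches the whole index. This is why the index can be built once and shared across tasks, while the more expensive task-specific scoring runs only over the few candidates.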
# ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval ( http://arxiv.org/abs/2305.17652v1 ) License: see link	Jiapeng Wang, Chengyu Wang, Xiaodan Wang, Jun Huang, Lianwen Jin	Large-scale pre-trained text-image models with dual-encoder architectures (such as CLIP) are typically adopted for various vision-language applications, including text-image retrieval. However, these models are still less practical on edge devices or for real-time situations, due to the substantial indexing and inference time and the large consumption of computational resources. Although knowledge distillation techniques have been widely utilized for uni-modal model compression, how to expand them to the situation when the numbers of modalities and teachers/students are doubled has been rarely studied. In this paper, we conduct comprehensive experiments on this topic and propose the fully-Connected knowledge interaction graph (Cona) technique for cross-modal pre-training distillation. Based on our findings, the resulting ConaCLIP achieves SOTA performances on the widely-used Flickr30K and MSCOCO benchmarks under the lightweight setting. An industrial application of our method on an e-commerce platform further demonstrates the significant effectiveness of ConaCLIP.	Translation date: 2023-05-30 17:38:32 Publication date: 2023-05-28
# DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models ( http://arxiv.org/abs/2305.17651v1 ) License: see link	Yifan Peng, Yui Sudo, Shakeel Muhammad, Shinji Watanabe	Self-supervised learning (SSL) has achieved notable success in many speech processing tasks, but the large model size and heavy computational cost hinder the deployment. Knowledge distillation trains a small student model to mimic the behavior of a large teacher model. However, the student architecture usually needs to be manually designed and will remain fixed during training, which requires prior knowledge and can lead to suboptimal performance. Inspired by recent success of task-specific structured pruning, we propose DPHuBERT, a novel task-agnostic compression method for speech SSL based on joint distillation and pruning. Experiments on SUPERB show that DPHuBERT outperforms pure distillation methods in almost all tasks. Moreover, DPHuBERT requires little training time and performs well with limited training data, making it suitable for resource-constrained applications. Our method can also be applied to various speech SSL models. Our code and models will be publicly available.	Translation date: 2023-05-30 17:38:14 Publication date: 2023-05-28
# Evolving Connectivity for Recurrent Spiking Neural Networks ( http://arxiv.org/abs/2305.17650v1 ) License: see link	Guan Wang, Yuhao Sun, Sijie Cheng, Sen Song	Recurrent spiking neural networks (RSNNs) hold great potential for advancing artificial general intelligence, as they draw inspiration from the biological nervous system and show promise in modeling complex dynamics. However, the widely-used surrogate gradient-based training methods for RSNNs are inherently inaccurate and unfriendly to neuromorphic hardware. To address these limitations, we propose the evolving connectivity (EC) framework, an inference-only method for training RSNNs. The EC framework reformulates weight-tuning as a search into parameterized connection probability distributions, and employs Natural Evolution Strategies (NES) for optimizing these distributions. Our EC framework circumvents the need for gradients and features hardware-friendly characteristics, including sparse boolean connections and high scalability. We evaluate EC on a series of standard robotic locomotion tasks, where it achieves comparable performance with deep neural networks and outperforms gradient-trained RSNNs, even solving the complex 17-DoF humanoid task. Additionally, the EC framework demonstrates a two- to threefold speedup in efficiency compared to directly evolving parameters. By providing a performant and hardware-friendly alternative, the EC framework lays the groundwork for further energy-efficient applications of RSNNs and advances the development of neuromorphic devices.	Translation date: 2023-05-30 17:37:58 Publication date: 2023-05-28
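The core reformulation is to search over connection probabilities p, sample boolean connection masks z ~ Bernoulli(p), and update p from fitness evaluations alone. A minimal NES sketch of this on a toy problem follows. It uses the fact that, for a Bernoulli distribution, the natural gradient of expected fitness with respect to p is E[f (z - p)]. The fitness standardization, learning rate, clipping range, and target-mask fitness are assumptions for illustration, not the paper's settings, and no spiking network is simulated.

```python
import numpy as np

def evolve_connectivity(fitness, n_conn, pop_size=64, lr=0.1, iters=200, seed=0):
    """NES over Bernoulli connection probabilities (inference only, no network gradients)."""
    rng = np.random.default_rng(seed)
    p = np.full(n_conn, 0.5)
    for _ in range(iters):
        z = rng.random((pop_size, n_conn)) < p          # sample boolean connection masks
        f = np.array([fitness(zi) for zi in z])
        f = (f - f.mean()) / (f.std() + 1e-8)           # fitness shaping: standardize
        # Natural gradient for a Bernoulli: the Fisher factor 1/(p(1-p)) cancels the score's
        # denominator, leaving E[f * (z - p)]
        grad = (f[:, None] * (z - p)).mean(axis=0)
        p = np.clip(p + lr * grad, 0.01, 0.99)
    return p

# Toy fitness: negative Hamming distance to a hidden target connectivity pattern
target = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)
p = evolve_connectivity(lambda z: -np.count_nonzero(z != target), n_conn=8)
print(np.round(p, 2))
```

Each candidate is just a boolean mask evaluated forward, which is the property that makes this style of search attractive for neuromorphic hardware.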
# Z-GMOT: Zero-shot Generic Multiple Object Tracking ( http://arxiv.org/abs/2305.17648v1 ) License: see link	Kim Hoang Tran, Tien-Phat Nguyen, Anh Duy Le Dinh, Pha Nguyen, Thinh Phan, Khoa Luu, Donald Adjeroh, Ngan Hoang Le	Despite the significant progress made in recent years, Multi-Object Tracking (MOT) approaches still suffer from several limitations, including their reliance on prior knowledge of tracking targets, which necessitates the costly annotation of large labeled datasets. As a result, existing MOT methods are limited to a small set of predefined categories, and they struggle with unseen objects in the real world. To address these issues, Generic Multiple Object Tracking (GMOT) has been proposed, which requires less prior information about the targets. However, all existing GMOT approaches follow a one-shot paradigm, relying mainly on the initial bounding box and thus struggling to handle variations in, e.g., viewpoint, lighting, occlusion, and scale. In this paper, we introduce a novel approach to address the limitations of existing MOT and GMOT methods. Specifically, we propose a zero-shot GMOT (Z-GMOT) algorithm that can track never-seen object categories with zero training examples, without the need for predefined categories or an initial bounding box. To achieve this, we propose iGLIP, an improved version of Grounded Language-Image Pre-training (GLIP), which can detect unseen objects while minimizing false positives. We evaluate Z-GMOT thoroughly on the GMOT-40 dataset, the AnimalTrack test set, and the DanceTrack test set. The results of these evaluations demonstrate a significant improvement over existing methods. For instance, on the GMOT-40 dataset, Z-GMOT outperforms one-shot GMOT with OC-SORT by 27.79 HOTA points and 44.37 MOTA points. On the AnimalTrack dataset, it surpasses fully-supervised methods with DeepSORT by 12.55 HOTA points and 8.97 MOTA points. To facilitate further research, we will make our code and models publicly available upon acceptance of this paper.	Translation date: 2023-05-30 17:37:36 Publication date: 2023-05-28
# The fate of the "vacuum point" and of grey solitons in dispersive quantum shock waves in a one-dimensional Bose gas ( http://arxiv.org/abs/2305.17647v1 ) License: see link	S. A. Simmons, J. C. Pillay, and K. V. Kheruntsyan	We continue the study of dispersive quantum shock waves in a one-dimensional Bose gas beyond the mean-field approximation. In a recent work by Simmons et al. [Phys. Rev. Lett. 125, 180401 (2020)], the oscillatory shock wave train developing in this system from an initial localized density bump on a uniform background was interpreted as a result of quantum mechanical self-interference, wherein the interference contrast would diminish with the loss of matter-wave phase coherence. Such loss of coherence, relative to the mean-field Gross-Pitaevskii description, occurs due to either quantum or thermal fluctuations, as well as in the strongly interacting regime. In this work, we extend the analysis of dispersive quantum shock waves in this context to other dynamical scenarios. More specifically, the scenarios studied include evolution of a sufficiently high density bump, known to lead to the so-called "vacuum point" in the mean-field description, and evolution of an initial density dip, known to shed a train of grey solitons in the same mean-field approximation. We study the fate of these nonlinear wave structures in the presence of quantum and thermal fluctuations, as well as at intermediate and strong interactions, and show that both the vacuum point and grey solitons cease to manifest themselves beyond the mean-field approach. On the other hand, we find that a vacuum point can occur in an ideal (noninteracting) Bose gas evolving from a ground state of a localized dimple potential. Due to the ubiquity of dispersive shock waves in nature, our results should provide useful insights and perspectives for a variety of other physical systems known to display nonlinear wave phenomena.	Translation date: 2023-05-30 17:37:06 Publication date: 2023-05-28
# Kondo regime of the impurity spectral function and the current noise spectrum in the double impurity Anderson model ( http://arxiv.org/abs/2305.17686v1 ) License: see link	Zi-Hao Chen and YiJing Yan	The dissipaton equations of motion (DEOM) method is one of the most popular methods for simulating quantum impurity systems. In this article, we use DEOM theory to deal with the Kondo problem of the double quantum dot (DQD) impurity system. We focus on the impurity spectral function and the total noise spectral function; these two functions are used to describe the Kondo effect of this system. The influence of the interaction, the hopping, and the difference in chemical potential between the two dots on the Kondo effect of the system is studied. We find that the interaction between the two dots can strongly influence the Kondo effect of the system.	Translation date: 2023-05-30 17:28:37 Publication date: 2023-05-28
# General treatment of Gaussian trusted noise in continuous variable quantum key distribution ( http://arxiv.org/abs/2305.17684v1 ) License: see link	Shinichiro Yamano, Takaya Matsuura, Yui Kuramochi, Toshihiko Sasaki, Masato Koashi	Continuous Variable (CV) quantum key distribution (QKD) is a promising candidate for practical implementations due to its compatibility with the existing communication technology. A trusted device scenario assuming that an adversary has no access to imperfections such as electronic noises in the detector is expected to provide significant improvement in the key rate, but so far such efforts have been made separately for specific protocols and for specific proof techniques. Here, we develop a simple and general treatment that can incorporate the effects of Gaussian trusted noises for any protocol that uses homodyne/heterodyne measurements. In our method, a rescaling of the outcome of a noisy homodyne/heterodyne detector renders it equivalent to the outcome of a noiseless detector with a tiny additional loss, thanks to a noise-loss equivalence well-known in quantum optics. Since this method is independent of protocols and security proofs, it is applicable to Gaussian-modulation and discrete-modulation protocols, to the finite-size regime, and to any proof technique, whether already developed or yet to be discovered.	Translation date: 2023-05-30 17:28:27 Publication date: 2023-05-28
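The noise-loss equivalence behind the rescaling can be checked at the level of second moments. The following worked equation uses simplifying assumptions and symbols of our own, not the paper's: a single quadrature in shot-noise units, input quadrature variance V, a pure-loss channel of transmittance η, and trusted Gaussian electronic noise of variance v_el added at the detector.

```latex
\begin{aligned}
x &= \sqrt{\eta}\,x_{\mathrm{in}} + \sqrt{1-\eta}\,x_{\mathrm{vac}} + n_{\mathrm{el}},
  \qquad \operatorname{Var}(x) = \eta V + (1-\eta) + v_{\mathrm{el}}, \\
x' &= \frac{x}{\sqrt{1+v_{\mathrm{el}}}}, \qquad \eta' = \frac{\eta}{1+v_{\mathrm{el}}}, \\
\operatorname{Var}(x') &= \frac{\eta V + 1 - \eta + v_{\mathrm{el}}}{1+v_{\mathrm{el}}}
  = \eta' V + 1 - \eta', \qquad
\langle x' \rangle = \sqrt{\eta'}\,\langle x_{\mathrm{in}} \rangle .
\end{aligned}
```

Both the mean and the variance of the rescaled noisy outcome match those of a noiseless detector behind a slightly lossier channel (η' < η). For small v_el, the additional loss is roughly a fraction v_el, which is the "tiny additional loss" referred to above.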
# One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning ( http://arxiv.org/abs/2305.17682v1 ) License: see link	Guangtao Zeng, Peiyuan Zhang, Wei Lu	Fine-tuning pre-trained language models for multiple tasks tends to be expensive in terms of storage. Parameter-efficient transfer learning (PETL) methods have been proposed to mitigate this, but they still require a significant number of parameters and storage when applied to broader ranges of tasks. To achieve even greater storage reduction, we propose PROPETL, a novel method that lets a single PETL module (e.g., adapter, LoRA, or prefix-tuning), which we call the prototype network, be shared efficiently across layers and tasks. We then learn binary masks to select different sub-networks from the shared prototype network and apply them as PETL modules in different layers. We find that the binary masks can identify crucial information in the network, which previous studies often ignore. Our work can also be seen as a type of pruning method, as we find that overparameterization exists even in the seemingly small PETL modules. We evaluate PROPETL on various downstream tasks and show that it can outperform other PETL methods while using approximately 10% of the parameter storage they require.	Translated: 2023-05-30 17:28:07 Published: 2023-05-28
# Evaluating GPT-3 Generated Explanations for Hateful Content Moderation ( http://arxiv.org/abs/2305.17680v1 ) License: see link	Han Wang, Ming Shan Hee, Md Rabiul Awal, Kenny Tsu Wei Choo, Roy Ka-Wei Lee	Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, the effectiveness and potential limitations of these generated explanations remain poorly understood. A key concern is that these LLM-generated explanations may lead both users and content moderators to erroneous judgments about the nature of flagged content. For instance, an LLM-generated explanation might wrongly convince a content moderator that a benign piece of content is hateful. In light of this, we propose an analytical framework for examining hate speech explanations and conduct an extensive survey to evaluate such explanations. Specifically, we prompted GPT-3 to generate explanations for both hateful and non-hateful content, and conducted a survey with 2,400 unique respondents to evaluate the generated explanations.
Our findings reveal three things. (1) Human evaluators rated the GPT-generated explanations as high quality in terms of linguistic fluency, informativeness, persuasiveness, and logical soundness. (2) The persuasiveness of these explanations, however, varied depending on the prompting strategy employed. (3) This persuasiveness may result in incorrect judgments about the hatefulness of the content. Our study underscores the need for caution when applying LLM-generated explanations to content moderation. Code and results are available at https://github.com/Social-AI-Studio/GPT3-HateEval.	Translated: 2023-05-30 17:27:48 Published: 2023-05-28
# RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts ( http://arxiv.org/abs/2305.17679v1 ) License: see link	Anton Golubev, Nicolay Rusnachenko, Natalia Loukachevitch	The paper describes the RuSentNE-2023 evaluation devoted to targeted sentiment analysis in Russian news texts. The task is to predict sentiment towards a named entity in a single sentence. The dataset for the RuSentNE-2023 evaluation is based on the Russian news corpus RuSentNE, which has rich sentiment-related annotation. The corpus is annotated with named entities and sentiments towards these entities, along with related effects and emotional states. The evaluation was organized using the CodaLab competition framework. The main evaluation measure was the macro-averaged F-measure over the positive and negative classes. The best result achieved was a macro F-measure (positive + negative classes) of 66%. We also tested ChatGPT on the test set from our evaluation and found that its zero-shot answers reached an F-measure of 60%. That score corresponds to 4th place in the evaluation and can be considered quite high for a zero-shot application. ChatGPT also provided detailed explanations of its conclusions.	Translated: 2023-05-30 17:27:25 Published: 2023-05-28
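The RuSentNE-2023 entry ranks systems by a macro-averaged F-measure over the positive and negative classes. The sketch below computes that metric under two assumptions:

- Each class's F1 is the standard per-class F1.
- The neutral class is left out of the average, but a neutral prediction or gold label still counts as an error when it is confused with a target class.

The official scorer may differ in details. The function name `macro_f1` and the label strings are illustrative.

```python
def macro_f1(gold, pred, classes=("positive", "negative")):
    """Macro-averaged F1 computed only over `classes`.

    Labels outside `classes` (e.g. "neutral") are not averaged themselves,
    but confusing them with a target class still lowers that class's F1.
    """
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

For example, suppose a system gets one positive right, mislabels a negative as neutral, and mislabels a positive as negative. It then scores F1 = 2/3 on the positive class and 0 on the negative class, for a macro score of 1/3.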
# Decoding the Underlying Meaning of Multimodal Hateful Memes ( http://arxiv.org/abs/2305.17678v1 ) License: see link	Ming Shan Hee, Wen-Haw Chong and Roy Ka-Wei Lee	Recent studies have proposed models that yield promising performance on the hateful meme classification task. Nevertheless, these models do not generate interpretable explanations that uncover the underlying meaning and support the classification output. A major reason for the lack of explainable hateful meme methods is the absence of a hateful meme dataset that contains ground-truth explanations for benchmarking or training. Intuitively, such explanations could educate and assist content moderators in interpreting and removing flagged hateful memes. This paper addresses this research gap by introducing the Hateful meme with Reasons Dataset (HatReD), a new multimodal hateful meme dataset annotated with the underlying hateful contextual reasons. We also define a new conditional generation task that aims to automatically generate the underlying reasons that explain hateful memes. On this task, we establish the baseline performance of state-of-the-art pre-trained language models. We further demonstrate the usefulness of HatReD by analyzing the challenges of the new conditional generation task in explaining memes in seen and unseen domains.
The dataset and benchmark models are available at https://github.com/Social-AI-Studio/HatRed	Translated: 2023-05-30 17:27:11 Published: 2023-05-28
# OSPC: Online Sequential Photometric Calibration ( http://arxiv.org/abs/2305.17673v1 ) License: see link	Jawad Haidar, Douaa Khalil, Daniel Asmar	Photometric calibration is essential to many computer vision applications. One of its key benefits is enhancing the performance of Visual SLAM, especially when it depends on a direct method for tracking, such as the standard KLT algorithm. Another advantage is retrieving the sensor irradiance values from measured intensities, as a pre-processing step for some vision algorithms, such as shape-from-shading. Current photometric calibration systems rely on a joint optimization problem and encounter an ambiguity in the estimates, which can only be resolved using ground truth information. We propose a novel method that solves for photometric parameters using a sequential estimation approach. Our proposed method achieves high accuracy in estimating all parameters; furthermore, the formulations are linear and convex, which makes the solution fast and suitable for online applications. Experiments on a Visual Odometry system validate the proposed method and demonstrate its advantages.	Translated: 2023-05-30 17:26:49 Published: 2023-05-28
# Stochastic Bridges as Effective Regularizers for Parameter-Efficient Tuning ( http://arxiv.org/abs/2305.17670v1 ) License: see link	Weize Chen, Xu Han, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie Zhou	Parameter-efficient tuning methods (PETs) have achieved promising results in tuning large pre-trained language models (PLMs). By formalizing frozen PLMs and additional tunable parameters as systems and controls respectively, PETs can be theoretically grounded in optimal control. They can then be viewed as optimizing the terminal cost and running cost in the optimal control literature. Despite the elegance of this theoretical grounding, existing PETs in practice often ignore the running cost and only optimize the terminal cost. That is, they focus on the loss function of the output state and disregard the running cost, which depends on the intermediate states. Since it is non-trivial to directly model the intermediate states and design a running cost function, we propose to use latent stochastic bridges to regularize the intermediate states and to use this regularization as the running cost of PETs. As the first work to propose regularized PETs that use stochastic bridges as the regularizers (running costs) for the intermediate states, we show the effectiveness and generality of this regularization across different tasks, PLMs and PETs.
Given the great potential and capacity of this approach, we believe more sophisticated regularizers can be designed for PETs and better performance can be achieved in the future. The code is released at https://github.com/thunlp/stochastic-bridge-pet/tree/main.	Translated: 2023-05-30 17:26:30 Published: 2023-05-28
# Choose your Data Wisely: A Framework for Semantic Counterfactuals ( http://arxiv.org/abs/2305.17667v1 ) License: see link	Edmund Dervakos, Konstantinos Thomas, Giorgos Filandrianos, Giorgos Stamou	Counterfactual explanations have been argued to be one of the most intuitive forms of explanation. They are typically defined as a minimal set of edits on a given data sample that, when applied, changes the output of a model on that sample. However, a minimal set of edits is not always clear and understandable to an end-user. For instance, it could constitute an adversarial example, which an end-user cannot distinguish from the original data sample. Instead, recent work suggests that the notion of minimality in the context of counterfactuals should refer to the semantics of the data sample, and not to the feature space. In this work, we build on these ideas and propose a framework that provides counterfactual explanations in terms of knowledge graphs. We provide an algorithm for computing such explanations (given some assumptions about the underlying knowledge), and quantitatively evaluate the framework with a user study.	Translated: 2023-05-30 17:26:04 Published: 2023-05-28
# Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality ( http://arxiv.org/abs/2305.17665v1 ) License: see link	Kejie Tang, Weidong Liu and Yichen Zhang	Stochastic gradient descent with momentum (SGDM) has been widely used in many machine learning and statistical applications. Despite the observed empirical benefits of SGDM over traditional SGD, the theoretical understanding of the role of momentum for different learning rates in the optimization process remains largely open. We analyze the finite-sample convergence rate of SGDM under strongly convex settings and show that, with a large batch size, mini-batch SGDM converges faster than mini-batch SGD to a neighborhood of the optimal value. Furthermore, we analyze the Polyak-averaging version of the SGDM estimator, establish its asymptotic normality, and justify its asymptotic equivalence to averaged SGD.	Translated: 2023-05-30 17:25:49 Published: 2023-05-28
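The SGDM entry analyzes heavy-ball momentum combined with Polyak averaging of the iterates. The sketch below illustrates both on the strongly convex quadratic $f(x) = \tfrac12 x^\top A x - b^\top x$. It is not the authors' estimator or experimental setup. Stochastic gradients are simulated as the exact gradient plus Gaussian noise, standing in for mini-batch sampling. The learning rate, momentum, and noise level are arbitrary illustrative values, and the function name `sgdm_polyak` is hypothetical.

```python
import numpy as np

def sgdm_polyak(A, b, lr=0.05, beta=0.9, noise=0.1, steps=5000, seed=0):
    """Heavy-ball SGD with momentum on f(x) = 0.5 x^T A x - b^T x, with Polyak averaging.

    Returns (last iterate, running average of all iterates).
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)      # current iterate
    v = np.zeros_like(b)      # momentum buffer
    x_bar = np.zeros_like(b)  # Polyak average of x_1, ..., x_t
    for t in range(1, steps + 1):
        # Simulated stochastic gradient: exact gradient plus Gaussian noise.
        g = A @ x - b + noise * rng.standard_normal(b.shape)
        v = beta * v + g          # accumulate momentum
        x = x - lr * v            # parameter update
        x_bar += (x - x_bar) / t  # incremental mean of the iterates
    return x, x_bar
```

Take `A = np.diag([1.0, 2.0])` and `b = np.array([1.0, 2.0])`, so the optimum is `[1, 1]`. The averaged iterate typically lands much closer to the optimum than the last iterate. The last iterate keeps fluctuating at a level set by the noise and the step size.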
# Towards Autonomous and Safe Last-mile Deliveries with AI-augmented Self-driving Delivery Robots ( http://arxiv.org/abs/2305.17705v1 ) License: see link	Eyad Shaklab, Areg Karapetyan, Arjun Sharma, Murad Mebrahtu, Mustofa Basri, Mohamed Nagy, Majid Khonji, and Jorge Dias	In addition to its crucial impact on customer satisfaction, last-mile delivery (LMD) is notorious for being the most time-consuming and costly stage of the shipping process. Pressing environmental concerns combined with the recent surge of e-commerce sales have sparked renewed interest in automation and electrification of last-mile logistics. To address the hurdles faced by existing robotic couriers, this paper introduces a customer-centric and safety-conscious LMD system for small urban communities based on AI-assisted autonomous delivery robots. The presented framework enables end-to-end automation and optimization of the logistic process. It also accounts for real-world operational uncertainties, clients' preferred time schedules, and the safety of pedestrians.
To this end, the integrated optimization component is modeled as a robust variant of the Cumulative Capacitated Vehicle Routing Problem with Time Windows. Routes are constructed under uncertain travel times with the objective of minimizing the total latency of deliveries, i.e., the overall waiting time of customers, which can negatively affect their satisfaction. We demonstrate the proposed LMD system's utility through real-world trials on a university campus with a single robotic courier. Implementation aspects as well as the findings and practical insights gained from the deployment are discussed in detail. Lastly, we round out the contributions with numerical simulations that investigate the scalability of the developed mathematical formulation with respect to the number of robotic vehicles and customers.	Translated: 2023-05-30 17:18:33 Published: 2023-05-28
# KoSBI: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application ( http://arxiv.org/abs/2305.17701v1 ) License: see link	Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Gunhee Kim and Jung-Woo Ha	Large language models (LLMs) learn from real-world data not only natural text generation abilities but also social biases against different demographic groups. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to differences in language and culture, both of which significantly affect the biases and the targeted demographic groups. This limitation calls for localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we present KoSBI, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories. We find that through filtering-based moderation, social biases in generated content can be reduced by 16.47 percentage points on average for HyperCLOVA (30B and 82B) and GPT-3.	Translated: 2023-05-30 17:18:07 Published: 2023-05-28
# Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery ( http://arxiv.org/abs/2305.17699v1 ) License: see link	Yutao Mou, Xiaoshuai Song, Keqing He, Chen Zeng, Pei Wang, Jingang Wang, Yunsen Xian and Weiran Xu	Generalized intent discovery aims to extend a closed-set in-domain intent classifier to an open-world intent set including in-domain and out-of-domain intents. The key challenges lie in pseudo label disambiguation and representation learning. Previous methods suffer from a coupling of pseudo label disambiguation and representation learning: the reliability of pseudo labels relies on representation learning, and representation learning is in turn restricted by pseudo labels. In this paper, we propose a decoupled prototype learning framework (DPL) to decouple pseudo label disambiguation and representation learning. Specifically, we first introduce prototypical contrastive representation learning (PCL) to obtain discriminative representations, and then adopt a prototype-based label disambiguation method (PLD) to obtain pseudo labels. We theoretically prove that PCL and PLD work in a collaborative fashion and facilitate pseudo label disambiguation. Experiments and analysis on three benchmark datasets show the effectiveness of our method.	Translated: 2023-05-30 17:17:55 Published: 2023-05-28
# Neural Machine Translation with Dynamic Graph Convolutional Decoder ( http://arxiv.org/abs/2305.17698v1 ) License: see link	Lei Li, Kai Fan, Lingyu Yang, Hongjia Li, Chun Yuan	Existing work demonstrates the significance of syntactic knowledge for improving neural machine translation models. However, most previous works merely focus on leveraging the source syntax in the well-known encoder-decoder framework. In sharp contrast, this paper proposes an end-to-end translation architecture from (graph & sequence) structural inputs to (graph & sequence) outputs, where the target translation and its corresponding syntactic graph are jointly modeled and generated. We propose a customized Dynamic Spatial-Temporal Graph Convolutional Decoder (Dyn-STGCD). It is designed to consume source feature representations and their syntactic graph, and to generate the target syntactic graph and tokens simultaneously in an auto-regressive manner. We conduct extensive experiments on five widely acknowledged translation benchmarks, verifying that our proposal achieves consistent improvements over baselines and other syntax-aware variants.	Translated: 2023-05-30 17:17:40 Published: 2023-05-28
# SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration ( http://arxiv.org/abs/2305.17696v1 ) License: see link	Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Meeyoung Cha, Yejin Choi, Byoung Pil Kim, Gunhee Kim, Eun-Ju Lee, Yong Lim, Alice Oh, Sangchul Park and Jung-Woo Ha	The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are rising steeply. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed by leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation improves significantly for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.	Translated: 2023-05-30 17:17:22 Published: 2023-05-28
# k-NNN: Nearest Neighbors of Neighbors for Anomaly Detection ( http://arxiv.org/abs/2305.17695v1 ) License: see link	Ori Nizan, Ayellet Tal	Anomaly detection aims at identifying images that deviate significantly from the norm. We focus on algorithms that embed the normal training examples in space and, given a test image, detect anomalies based on the feature distance to the k nearest training neighbors. We propose a new operator that takes into account the varying structure and importance of the features in the embedding space. Interestingly, this is done by taking into account not only the nearest neighbors, but also the neighbors of these neighbors (k-NNN). We show that simply replacing the nearest-neighbor component of existing algorithms with our k-NNN operator, while leaving the rest of each algorithm untouched, improves each algorithm's own results. This holds both for common homogeneous datasets, such as flowers or nuts of a specific type, and for more diverse datasets.	Translated: 2023-05-30 17:16:55 Published: 2023-05-28
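The k-NNN entry starts from the plain k-nearest-neighbor anomaly score that it proposes to replace. The abstract does not specify the k-NNN operator itself, so this sketch covers only that baseline: each test embedding is scored by its mean Euclidean distance to its k nearest training embeddings. The function name, the Euclidean distance, and the mean aggregation are all assumptions.

```python
import numpy as np

def knn_anomaly_scores(train_feats, test_feats, k=3):
    """Baseline k-NN anomaly score: mean distance to the k nearest training embeddings.

    Larger scores mean a test embedding lies farther from the normal data.
    """
    # Pairwise Euclidean distances, shape (n_test, n_train).
    d = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    nearest = np.sort(d, axis=1)[:, :k]  # k smallest distances per test point
    return nearest.mean(axis=1)
```

According to the abstract, k-NNN would keep the rest of such a pipeline unchanged and swap out only this nearest-neighbor step.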
# Communication Over Entanglement-Breaking Channels With Unreliable Entanglement Assistance ( http://arxiv.org/abs/2305.17692v1 ) License: see link	Uzi Pereg	Entanglement assistance can improve communication rates significantly. Yet, its generation can easily fail. The recently introduced model of unreliable assistance accounts for those challenges. Previous work provided an asymptotic formula for the tradeoff between the unassisted rate and the excess rate from entanglement assistance. We derive a full characterization for entanglement-breaking channels, and show that combining entanglement-assisted and unassisted coding is suboptimal. From a networking perspective, this finding is nontrivial and highlights a quantum behavior arising from superposition.	Translated: 2023-05-30 17:16:28 Published: 2023-05-28
# Plug-and-Play Knowledge Injection for Pre-trained Language Models ( http://arxiv.org/abs/2305.17691v1 ) License: see link	Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou	Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks. However, massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks. In this work, we are the first to study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models. To this end, we explore a new paradigm, plug-and-play knowledge injection, where knowledge bases are injected into frozen existing downstream models by a knowledge plugin. Correspondingly, we propose a plug-and-play injection method, map-tuning. It trains a mapping of knowledge embeddings and uses the mapped embeddings to enrich model inputs while keeping model parameters frozen. Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models. Moreover, we show that a frozen downstream model can be well adapted to different domains with different mapping networks of domain knowledge.
Our code and models are available at https://github.com/THUNLP/Knowledge-Plugin.	Translated: 2023-05-30 17:16:09 Published: 2023-05-28
# HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language ( http://arxiv.org/abs/2305.17690v1 ) License: see link	Shantipriya Parida, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, Aneesh Bose, Guneet Singh Kohli, Ibrahim Said Ahmad, Ketan Kotwal, Sayan Deb Sarkar, Ondřej Bojar, Habeebah Adamu Kakudi	This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold-standard English-Hausa parallel sentences, translated in a way that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, text-only and multimodal machine translation.	Translated: 2023-05-30 17:15:29 Published: 2023-05-28
# Amplification trojan network: Attack deep neural networks by amplifying their inherent weakness ( http://arxiv.org/abs/2305.17688v1 ) License: see link	Zhanhao Hu, Jun Zhu, Bo Zhang, Xiaolin Hu	Recent works found that deep neural networks (DNNs) can be fooled by adversarial examples, which are crafted by adding adversarial noise to clean inputs. The accuracy of DNNs on adversarial examples decreases as the magnitude of the adversarial noise increases. In this study, we show that under certain circumstances DNNs can also be fooled when the noise is very small. We call this new type of attack the Amplification Trojan Attack (ATAttack). Specifically, we use a trojan network to transform the inputs before sending them to the target DNN. This trojan network serves as an amplifier that magnifies the inherent weakness of the target DNN. Once infected by the trojan network, the target DNN performs normally on clean data but becomes more vulnerable to adversarial examples. Since it only transforms the inputs, the trojan network can hide in DNN-based pipelines. For example, it can infect the pre-processing step applied to the inputs before they are sent to the DNNs. This new type of threat should be considered when developing safe DNNs.	Translated: 2023-05-30 17:15:17 Published: 2023-05-28
# Predictability and Fairness in Load Aggregation with Deadband ( http://arxiv.org/abs/2305.17725v1 ) License: see link	F. V. Difonzo and M. Roubalik and J. Marecek	Virtual power plants and load aggregation are becoming increasingly common. In both, one regulates the aggregate power output of an ensemble of distributed energy resources (DERs). Marecek et al. [Automatica, Volume 147, January 2023, 110743, arXiv:2110.03001] recently suggested that long-term averages of the prices or incentives offered should exist and be independent of the initial states of the operators of the DERs, the aggregator, and the power grid. This can be seen as predictability, which underlies fairness. Unfortunately, the existence of such averages cannot be guaranteed with many traditional regulators, including the proportional-integral (PI) regulator with or without deadband. Here, we consider the effects of losses in the alternating-current model and of the deadband in the controller. This yields a non-linear dynamical system (due to the non-linear losses) exhibiting discontinuities (due to the deadband). We show that Filippov invariant measures enable reasoning about predictability and fairness while accounting for both the non-linearity of the alternating-current model and the deadband.	Translated: 2023-05-30 17:07:54 Published: 2023-05-28
# 中国語スペル訂正のためのマスク言語モデリングの再考 Rethinking Masked Language Modeling for Chinese Spelling Correction ( http://arxiv.org/abs/2305.17721v1 ) ライセンス: Link先を確認	Hongqiu Wu and Shaohua Zhang and Yuchen Zhang and Hai Zhao	(参考訳) 本稿では、中国語スペル訂正(CSC)を、言語モデルと誤りモデルという2つの独立したモデルによる共同決定として検討する。実証的な分析により、BERTのファインチューニングは誤りモデルに過剰適合する一方で言語モデルには過小適合する傾向があり、その結果、分布外の誤りパターンへの汎化が不十分になることがわかった。BERTがほとんどのCSCモデルのバックボーンであることを考えると、この現象は大きな負の影響を及ぼす。この問題に対処するため、既存のベンチマークよりも高品質で多様性の高いマルチドメインベンチマークLEMONを公開し、CSCモデルのオープンドメインへの汎化を包括的に評価できるようにする。次に、ファインチューニング中に入力シーケンスの誤りでないトークンの20%をランダムにマスキングするという非常に単純な戦略だけで、誤りモデルを犠牲にすることなく、はるかに優れた言語モデルを学習できることを示す。この手法はどのようなモデルアーキテクチャにも適用でき、SIGHAN、ECSpell、LEMONで最先端の結果を達成する。 In this paper, we study Chinese Spelling Correction (CSC) as a joint decision made by two separate models: a language model and an error model. Through empirical analysis, we find that fine-tuning BERT tends to overfit the error model while underfitting the language model, resulting in poor generalization to out-of-distribution error patterns. Given that BERT is the backbone of most CSC models, this phenomenon has a significant negative impact. To address this issue, we are releasing a multi-domain benchmark LEMON, with higher quality and diversity than existing benchmarks, to allow a comprehensive assessment of the open domain generalization of CSC models. Then, we demonstrate that a very simple strategy, randomly masking 20% of the non-error tokens from the input sequence during fine-tuning, is sufficient for learning a much better language model without sacrificing the error model. This technique can be applied to any model architecture and achieves new state-of-the-art results on SIGHAN, ECSpell, and LEMON.	翻訳日:2023-05-30 17:07:33 公開日:2023-05-28
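The masking strategy in this abstract is simple enough to sketch. The snippet below is a hypothetical helper, not the authors' code. It makes these assumptions:

- source/target pairs are character-level and length-aligned, as is usual in CSC;
- a position counts as a non-error token when source and target agree;
- exactly `round(0.2 * n)` of those positions are masked (the paper may instead sample each token independently).

Error positions are never masked.

```python
import random


def mask_non_error_tokens(src, tgt, mask_token="[MASK]", rate=0.2, rng=None):
    """Mask a random `rate` fraction of the non-error positions in `src`.

    src, tgt: equal-length token lists (misspelled input, corrected target).
    Positions where src and tgt differ (the spelling errors) are left untouched.
    """
    if len(src) != len(tgt):
        raise ValueError("CSC pairs are assumed to be length-aligned")
    rng = rng or random.Random()
    non_error = [i for i, (s, t) in enumerate(zip(src, tgt)) if s == t]
    chosen = set(rng.sample(non_error, round(rate * len(non_error))))
    return [mask_token if i in chosen else tok for i, tok in enumerate(src)]


if __name__ == "__main__":
    src = list("我今天很高性")  # last character is a spelling error
    tgt = list("我今天很高兴")
    print(mask_non_error_tokens(src, tgt, rng=random.Random(0)))
```

In this sketch, only the model input is masked and the training target would remain `tgt`. Whether the original method handles labels at masked positions the same way is not stated in the abstract.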
# FuseCap: 大規模言語モデルを活用して視覚データを融合し、リッチな画像キャプションを生成する FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions ( http://arxiv.org/abs/2305.17718v1 ) ライセンス: Link先を確認	Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel	(参考訳) 画像キャプション生成はコンピュータビジョンにおける中心的な課題であり、視覚言語事前学習技術の登場以降、大きな進歩を遂げてきた。本稿では、キャプションモデルが意味的に重要な要素をしばしば捉え損なうという、見落とされがちな制限に焦点を当てる。この欠点はテキスト画像データセットに起因する。そのキャプションは通常、画像内容の一般的な描写を提供するが、顕著な詳細を省略することが多い。この制限を緩和するために、物体検出器、属性認識器、光学文字認識器(OCR)などの視覚エキスパートから得られた追加の視覚情報によってキャプションを充実させる新しい手法FuseCapを提案する。提案手法は、大規模言語モデル(LLM)を用いて視覚エキスパートの出力を元のキャプションと融合し、包括的な画像記述を示す強化キャプションを生成する。定量的および定性的な分析により、提案するキャプション強化手法の有効性を検証する。次に、提案手法を用いてBLIPベースのキャプションモデルの学習セットをキュレートする。このモデルは、大幅に少ないパラメータと学習データを用いながら、正確で詳細なキャプションの生成において現在の最先端手法を上回る。さらに、1,200万組の画像と強化キャプションのペアからなるデータセットを提供し、提案手法が画像テキスト検索を大きく改善することを示す。 Image captioning is a central task in computer vision which has experienced substantial progress following the advent of vision-language pre-training techniques. In this paper, we highlight a frequently overlooked limitation of captioning models that often fail to capture semantically significant elements. This drawback can be traced back to the text-image datasets; while their captions typically offer a general depiction of image content, they frequently omit salient details. To mitigate this limitation, we propose FuseCap, a novel method for enriching captions with additional visual information, obtained from vision experts, such as object detectors, attribute recognizers, and Optical Character Recognizers (OCR). Our approach fuses the outputs of such vision experts with the original caption using a large language model (LLM), yielding enriched captions that present a comprehensive image description. We validate the effectiveness of the proposed caption enrichment method through both quantitative and qualitative analysis. 
Our method is then used to curate the training set of a captioning model based on BLIP which surpasses current state-of-the-art approaches in generating accurate and detailed captions while using significantly fewer parameters and training data. As additional contributions, we provide a dataset comprising 12M image-enriched caption pairs and show that the proposed method largely improves image-text retrieval.	翻訳日:2023-05-30 17:07:16 公開日:2023-05-28
# InDL: 視覚錯視に基づく図内論理解釈のための新しいデータセットとベンチマーク InDL: A New Dataset and Benchmark for In-Diagram Logic Interpreting based on Visual Illusion ( http://arxiv.org/abs/2305.17716v1 ) ライセンス: Link先を確認	Haobo Yang, Wenyu Wang, Ze Cao, Zhekai Duan, Xuchen Liu	(参考訳) 本稿では、深層学習モデルの図内論理解釈能力を評価するための新しい手法を提案する。視覚錯視という興味深い領域を活用し、これらのモデルを厳密にテストしベンチマークするために設計された独自のデータセットInDLを構築する。深層学習はコンピュータビジョンや自然言語処理といった領域で顕著な進歩を遂げてきた。しかし、モデルは意思決定過程を不透明にする固有の「ブラックボックス」特性のために、論理的推論を必要とするタスクでしばしばつまずく。本研究は、知覚と論理の複雑な相互作用である視覚錯視をモデルがどう扱うかに着目することで、これらのモデルをよりよく理解するための新しい視点を提示する。6つの古典的な幾何学的錯視を用いて、人間と機械の視覚知覚を比較する枠組みを構築する。この方法論は、モデルをランク付けするための定量的な尺度を提供し、潜在的な弱点を明らかにし、モデル改善のための実用的な知見を与える。実験結果は本ベンチマーク戦略の有効性を裏付け、論理解釈能力に基づいてモデルを効果的にランク付けできることを示す。再現可能な研究へのコミットメントの一環として、ソースコードとデータセットを以下で公開する予定である: (TODO GitHub repo)。 This paper introduces a novel approach to evaluating deep learning models' capacity for in-diagram logic interpretation. Leveraging the intriguing realm of visual illusions, we establish a unique dataset, InDL, designed to rigorously test and benchmark these models. Deep learning has witnessed remarkable progress in domains such as computer vision and natural language processing. However, models often stumble in tasks requiring logical reasoning due to their inherent 'black box' characteristics, which obscure the decision-making process. Our work presents a new lens to understand these models better by focusing on their handling of visual illusions, a complex interplay of perception and logic. We utilize six classic geometric optical illusions to create a comparative framework between human and machine visual perception. This methodology offers a quantifiable measure to rank models, elucidating potential weaknesses and providing actionable insights for model improvements. Our experimental results affirm the efficacy of our benchmarking strategy, demonstrating its ability to effectively rank models based on their logic interpretation ability. 
As part of our commitment to reproducible research, the source code and datasets will be made publicly available here: (TODO GitHub repo).	翻訳日:2023-05-30 17:06:44 公開日:2023-05-28
# 音声言語から手話への翻訳のためのオープンソースのグロスベースライン An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation ( http://arxiv.org/abs/2305.17714v1 ) ライセンス: Link先を確認	Amit Moryossef, Mathias Müller, Anne Göhring, Zifan Jiang, Yoav Goldberg, and Sarah Ebling	(参考訳) 手話翻訳システムは複雑で、多くのコンポーネントを必要とする。その結果、出版物間で手法を比較することは非常に難しい。本稿では、テキストからグロス、ポーズ、ビデオへと変換するパイプライン方式のオープンソース実装を提示し、ドイツ語からスイスドイツ語手話、フランス語からスイスのフランス語手話、イタリア語からスイスのイタリア語手話への変換を示す。テキストからグロスへの翻訳には、レマタイザ、ルールに基づく単語の並べ替えと削除を行うコンポーネント、ニューラル機械翻訳システムという3つの異なるコンポーネントを提案する。グロスからポーズへの変換は、3つの異なる手話の辞書データを用いて行い、骨格のポーズはビデオから抽出する。文を生成するには、まずテキストからグロスへのシステムを実行し、得られた手話単語のポーズ表現をつなぎ合わせる。 Sign language translation systems are complex and require many components. As a result, it is very hard to compare methods across publications. We present an open-source implementation of a text-to-gloss-to-pose-to-video pipeline approach, demonstrating conversion from German to Swiss German Sign Language, French to French Sign Language of Switzerland, and Italian to Italian Sign Language of Switzerland. We propose three different components for the text-to-gloss translation: a lemmatizer, a rule-based word reordering and dropping component, and a neural machine translation system. Gloss-to-pose conversion occurs using data from a lexicon for three different signed languages, with skeletal poses extracted from videos. To generate a sentence, the text-to-gloss system is first run, and the pose representations of the resulting signs are stitched together.	翻訳日:2023-05-30 17:06:25 公開日:2023-05-28
# Gibbs状態生成のための変分量子アルゴリズム Variational Quantum Algorithms for Gibbs State Preparation ( http://arxiv.org/abs/2305.17713v1 ) ライセンス: Link先を確認	Mirko Consiglio	(参考訳) ノイズの多い中規模量子(NISQ)デバイス上で相互作用する量子多体系のGibbs状態を準備することは、量子領域における熱力学的性質を探索するための重要な課題である。これには、熱化や非平衡熱力学といったプロトコルの理解が含まれる。さらに、忠実に準備されたGibbs状態からのサンプリングは、量子アルゴリズムに有用なリソースを提供する道を開く可能性がある。変分量子アルゴリズム(VQA)はGibbs状態を効率的に準備するうえで最も有望であるが、NISQコンピュータ上でGibbs状態を効果的に決定し準備するために適用できるアプローチは多数存在する。本稿では、系と環境の結合に対する合同ハミルトニアン時間発展、量子虚時間発展、ヘルムホルツ自由エネルギーをコスト関数として用いる最新のVQAなど、Gibbs状態を準備できるアルゴリズムの簡潔な概要を示す。さらに、Consiglioら (arXiv:2303.11276) が開発した最新の変分Gibbs状態準備アルゴリズムの一つを、スピン1/2の1次元$XY$モデルに適用してベンチマークを行う。 Preparing the Gibbs state of an interacting quantum many-body system on noisy intermediate-scale quantum (NISQ) devices is a crucial task for exploring the thermodynamic properties in the quantum regime. It encompasses understanding protocols such as thermalization and out-of-equilibrium thermodynamics, and sampling from faithfully prepared Gibbs states could pave the way to providing useful resources for quantum algorithms. Variational quantum algorithms (VQAs) show the most promise in efficiently preparing Gibbs states; however, there are many different approaches that could be applied to effectively determine and prepare Gibbs states on a NISQ computer. In this paper, we provide a concise overview of the algorithms capable of preparing Gibbs states, including joint Hamiltonian evolution of a system-environment coupling, quantum imaginary time evolution, and modern VQAs utilizing the Helmholtz free energy as a cost function, among others. Furthermore, we perform a benchmark of one of the latest variational Gibbs state preparation algorithms, developed by Consiglio et al. (arXiv:2303.11276), by applying it to the spin 1/2 one-dimensional $XY$ model.	翻訳日:2023-05-30 17:06:09 公開日:2023-05-28
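Several of the surveyed VQAs use the Helmholtz free energy $F(\rho) = \mathrm{Tr}(\rho H) - T\,S(\rho)$ as a cost function. The Gibbs state $\rho = e^{-H/T}/Z$ is its unique minimizer, with $F = -T \ln Z$. The classical, brute-force check below is neither a variational algorithm nor the Consiglio et al. method. It evaluates this cost for a two-site $XY$ Hamiltonian. The couplings `J` and `h`, the field term, and the temperature are assumptions chosen for illustration, with $k_B = 1$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def xy_hamiltonian(J=1.0, h=0.5):
    """Two-site XY model with a field along z (assumed form for illustration)."""
    return -J * (np.kron(X, X) + np.kron(Y, Y)) - h * (np.kron(Z, I2) + np.kron(I2, Z))


def free_energy(rho, H, T):
    """F(rho) = Tr(rho H) - T * S(rho), with S the von Neumann entropy."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]  # drop zero eigenvalues: 0 * ln 0 = 0
    entropy = -np.sum(p * np.log(p))
    return np.real(np.trace(rho @ H)) - T * entropy


def gibbs_state(H, T):
    """rho = exp(-H/T) / Z via the eigendecomposition of the Hermitian H."""
    w, V = np.linalg.eigh(H)
    p = np.exp(-(w - w.min()) / T)  # shift by the ground energy for stability
    p /= p.sum()
    return (V * p) @ V.conj().T


if __name__ == "__main__":
    H, T = xy_hamiltonian(), 0.7
    rho = gibbs_state(H, T)
    w = np.linalg.eigvalsh(H)
    print(free_energy(rho, H, T), -T * np.log(np.sum(np.exp(-w / T))))
```

A VQA would replace `gibbs_state` with a parameterized circuit and minimize `free_energy` over its parameters. Any other density matrix gives a cost at least as large as the Gibbs value.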
# OccCasNet:光深度推定のためのオクルージョン対応カスケードコストボリューム OccCasNet: Occlusion-aware Cascade Cost Volume for Light Field Depth Estimation ( http://arxiv.org/abs/2305.17710v1 ) ライセンス: Link先を確認	Wentao Chao, Fuqing Duan, Xuechun Wang, Yingqian Wang, Guanghui Wang	(参考訳) 光場(LF)深度推定は,多くの実用応用において重要な課題である。しかし、マルチビューステレオ(MVS)に基づく主流の手法は、より細かいコストのボリュームを構築する必要があるため、リソース集約的で時間を要する。この問題に対処し,精度と効率のトレードオフを改善するために,LF深度推定のためのオクルージョン対応カスケードコストボリュームを提案する。提案手法は,細かなコストボリュームの構築時にサンプリング間隔を一定に保ちながらサンプリング数を削減する。また,オクルージョン対応のコスト容積を構築する際の精度を高めるために,オクルージョンマップを導入する。具体的には,まず粗異性推定ネットワークを通して粗異性マップを得る。そして、初期差分マップに基づいて、サイドビューのサブアパーチャ画像(SAI)をセンタービューにワープする。次に、歪んだSAIと中央SAIとの間の光一貫性制約を提案し、各SAIに対して閉塞マップを生成する。最後に, 粗分散マップとオクルージョンマップを導入し, オクルージョン・アウェア・コストボリュームの構築を行い, 洗練された不一致推定ネットワークによりより正確な不一致マップが得られるようにした。広範な実験により本手法の有効性が実証された。本手法は最先端の手法と比較して精度と効率のバランスが良く,HCI 4D ベンチマークで発表された手法のうち,MSE と Q25 の指標が第一位である。提案手法のコードとモデルはhttps://github.com/chaowentao/occcasnetで入手できる。 Light field (LF) depth estimation is a crucial task with numerous practical applications. However, mainstream methods based on the multi-view stereo (MVS) are resource-intensive and time-consuming as they need to construct a finer cost volume. To address this issue and achieve a better trade-off between accuracy and efficiency, we propose an occlusion-aware cascade cost volume for LF depth (disparity) estimation. Our cascaded strategy reduces the sampling number while keeping the sampling interval constant during the construction of a finer cost volume. We also introduce occlusion maps to enhance accuracy in constructing the occlusion-aware cost volume. Specifically, we first obtain the coarse disparity map through the coarse disparity estimation network. Then, the sub-aperture images (SAIs) of side views are warped to the center view based on the initial disparity map. Next, we propose photo-consistency constraints between the warped SAIs and the center SAI to generate occlusion maps for each SAI. 
Finally, we introduce the coarse disparity map and occlusion maps to construct an occlusion-aware refined cost volume, enabling the refined disparity estimation network to yield a more precise disparity map. Extensive experiments demonstrate the effectiveness of our method. Compared with state-of-the-art methods, our method achieves a superior balance between accuracy and efficiency and ranks first in terms of MSE and Q25 metrics among published methods on the HCI 4D benchmark. The code and model of the proposed method are available at https://github.com/chaowentao/OccCasNet.	翻訳日:2023-05-30 17:05:50 公開日:2023-05-28
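The cascade idea in the OccCasNet abstract is to use fewer disparity samples in the refined stage while keeping the sampling interval fixed. It can be illustrated with disparity-candidate generation alone. This is a simplified sketch, not the OccCasNet implementation. The disparity range, interval, and number of refined samples are hypothetical values, and warping, occlusion maps, and the matching cost itself are omitted.

```python
import numpy as np


def dense_candidates(d_min=-4.0, d_max=4.0, interval=0.1):
    """Single-stage sweep: every pixel tests the whole disparity range."""
    n = int(round((d_max - d_min) / interval)) + 1
    return d_min + interval * np.arange(n)


def cascade_candidates(coarse_disp, num_samples=9, interval=0.1):
    """Refined stage: per-pixel candidates centred on the coarse disparity.

    The interval matches the dense sweep, but only `num_samples` hypotheses
    are evaluated per pixel, so the refined cost volume is much smaller.
    """
    offsets = interval * (np.arange(num_samples) - (num_samples - 1) / 2)
    return coarse_disp[..., None] + offsets  # shape (H, W, num_samples)


if __name__ == "__main__":
    coarse = np.full((2, 3), 1.25)  # stand-in for a coarse disparity map
    print(dense_candidates().size, cascade_candidates(coarse).shape)
```

Here each pixel evaluates 9 hypotheses instead of 81. The trade-off is that the refined stage can only correct coarse errors within the window of half-width `(num_samples - 1) / 2 * interval`.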
# 並列データはニューラルなエンティティ共参照解析に役立つ Parallel Data Helps Neural Entity Coreference Resolution ( http://arxiv.org/abs/2305.17709v1 ) ライセンス: Link先を確認	Gongbo Tang, Christian Hardmeier	(参考訳) 共参照解析(coreference resolution)とは、テキスト内で同じエンティティを参照する表現を見つけるタスクである。共参照モデルは一般に単言語のアノテーション付きデータで訓練されるが、共参照のアノテーションは高コストかつ困難である。Hardmeierら (2013) は、並列データが潜在的な照応知識を含むことを示したが、エンドツーエンドのニューラルモデルではまだ検討されていない。本稿では、並列データから共参照知識を活用するための、シンプルかつ効果的なモデルを提案する。アノテーションから共参照を学習する従来のモジュールに加えて、言語横断的な共参照知識を捉える教師なしモジュールを導入する。提案する言語横断モデルは、9つの異なる合成並列データセットを用いて、OntoNotes 5.0英語データセットで最大1.74ポイントの一貫した改善を達成する。これらの実験結果は、並列データが共参照解析タスクに有益な追加の共参照知識を提供できることを裏付けている。 Coreference resolution is the task of finding expressions that refer to the same entity in a text. Coreference models are generally trained on monolingual annotated data, but annotating coreference is expensive and challenging. Hardmeier et al. (2013) have shown that parallel data contains latent anaphoric knowledge, but it has not been explored in end-to-end neural models yet. In this paper, we propose a simple yet effective model to exploit coreference knowledge from parallel data. In addition to the conventional modules learning coreference from annotations, we introduce an unsupervised module to capture cross-lingual coreference knowledge. Our proposed cross-lingual model achieves consistent improvements, up to 1.74 percentage points, on the OntoNotes 5.0 English dataset using 9 different synthetic parallel datasets. These experimental results confirm that parallel data can provide additional coreference knowledge which is beneficial to coreference resolution tasks.	翻訳日:2023-05-30 17:05:25 公開日:2023-05-28
# 量子・古典多重カーネル学習 Quantum-Classical Multiple Kernel Learning ( http://arxiv.org/abs/2305.17707v1 ) ライセンス: Link先を確認	Ara Ghukasyan and Jack S. Baker and Oktay Goktas and Juan Carrasquilla and Santosh Kumar Radha	(参考訳) 量子コンピュータが実用的になるにつれて、量子計算を用いて従来のアルゴリズムを改善する見込みも高まっている。機械学習におけるカーネル法は、近い将来にそのような改善が実現しうる分野の一つである。サポートベクターマシンのようなカーネル法と組み合わせることで、小規模でノイズの多い量子コンピュータでも、データの類似性に関する独自の概念を捉える、古典的に計算困難な量子カーネルを評価できる。古典的機械学習の手法に着想を得て、本研究では多重カーネル学習(MKL)の文脈でシミュレーションされた量子カーネルを調査する。古典-古典、量子-量子、量子-古典の複数のカーネルのペアごとの組み合わせについて、サポートベクターマシンによる分類性能を実証的に調べる。また、任意のカーネルパラメータとともにベースカーネルの重みを最適化する、QCC-net(quantum-classical-convex neural network)と呼ぶ新しいアプローチを導入する。このアプローチがMKL設定における様々な性能指標の向上に有効であることを示す。特徴数が増加する(最大13次元)データを調べると、いくつかの組み合わせではカーネルの重み付けを成功させるうえでパラメータ学習が重要であることがわかった。最適なカーネル重みを相対的な有用性の指標として用いると、特徴数が増えるにつれて、量子-古典カーネルの組み合わせにおいて学習可能な量子カーネルの寄与が大きくなることがわかる。一方、より単純な非パラメトリック量子カーネルを含む組み合わせでは逆の傾向が観察される。 As quantum computers become increasingly practical, so does the prospect of using quantum computation to improve upon traditional algorithms. Kernel methods in machine learning are one area where such improvements could be realized in the near future. Paired with kernel methods like support-vector machines, small and noisy quantum computers can evaluate classically-hard quantum kernels that capture unique notions of similarity in data. Taking inspiration from techniques in classical machine learning, this work investigates simulated quantum kernels in the context of multiple kernel learning (MKL). We consider pairwise combinations of several classical-classical, quantum-quantum, and quantum-classical kernels in an empirical investigation of their classification performance with support-vector machines. We also introduce a novel approach, which we call QCC-net (quantum-classical-convex neural network), for optimizing the weights of base kernels together with any kernel parameters. We show this approach to be effective for enhancing various performance metrics in an MKL setting. 
Looking at data with an increasing number of features (up to 13 dimensions), we find parameter training to be important for successfully weighting kernels in some combinations. Using the optimal kernel weights as indicators of relative utility, we find growing contributions from trainable quantum kernels in quantum-classical kernel combinations as the number of features increases. We observe the opposite trend for combinations containing simpler, non-parametric quantum kernels.	翻訳日:2023-05-30 17:05:10 公開日:2023-05-28
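As a rough illustration of combining classical and quantum kernels, the sketch below forms a convex combination of base Gram matrices. A softmax maps unconstrained weight parameters onto the simplex. It is not QCC-net:

- the weights here are fixed rather than learned;
- the "quantum" kernel is a classically simulated fidelity kernel for a hypothetical angle encoding $|\psi(x)\rangle = \bigotimes_i R_Y(x_i)|0\rangle$, for which $|\langle\psi(x)|\psi(x')\rangle|^2 = \prod_i \cos^2((x_i - x'_i)/2)$.

A convex combination of positive semidefinite kernels is itself positive semidefinite.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def linear_kernel(A, B):
    return A @ B.T


def angle_fidelity_kernel(A, B):
    """|<psi(x)|psi(x')>|^2 for a product of R_Y(x_i)|0> states (classically simulated)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.prod(np.cos(diff / 2) ** 2, axis=-1)


def combine_kernels(kernel_matrices, raw_weights):
    """Convex combination of Gram matrices; softmax keeps weights on the simplex."""
    w = np.exp(raw_weights - np.max(raw_weights))
    w /= w.sum()
    combined = sum(wi * K for wi, K in zip(w, kernel_matrices))
    return combined, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-np.pi, np.pi, size=(20, 3))
    grams = [rbf_kernel(X, X), linear_kernel(X, X), angle_fidelity_kernel(X, X)]
    K, w = combine_kernels(grams, np.array([0.1, -0.3, 0.5]))
    print(w, np.linalg.eigvalsh(K).min())
```

The combined matrix can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Learning `raw_weights` and any kernel parameters jointly is the step the paper's QCC-net addresses.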
# 非常に雑音の多い混合音声からのキーワードスポッティング Spot keywords from very noisy and mixed speech ( http://arxiv.org/abs/2305.17706v1 ) ライセンス: Link先を確認	Ying Shi, Dong Wang, Lantian Li, Jiqing Han and Shi Yin	(参考訳) 既存のキーワードスポッティング研究のほとんどは、雑音がわずかまたは中程度の条件に焦点を当てている。本稿では、より困難なタスク、すなわち強い干渉音声(振幅がキーワードの10倍)の下に埋もれ、さらに悪いことに他のキーワードと混ざり合ったキーワードの検出に取り組む。雑音の多い混合音声から低エネルギーのキーワードを発見するようモデルを促す、新しい混合訓練(Mix Training, MT)戦略を提案する。実験は、バニラCNNと2つのEfficientNet(B0/B2)アーキテクチャで行った。Google Speech Commandsデータセットで評価した結果、提案する混合訓練手法は非常に効果的であり、標準的なデータ拡張やmixup訓練を上回ることが示された。 Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude), and even worse, mixed with other keywords. We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech. Experiments were conducted with a vanilla CNN and two EfficientNet (B0/B2) architectures. The results evaluated with the Google Speech Commands dataset demonstrated that the proposed mix training approach is highly effective and outperforms standard data augmentation and mixup training.	翻訳日:2023-05-30 17:04:47 公開日:2023-05-28
# 短文ストリームにおける信頼性と解釈可能なドリフト検出 Reliable and Interpretable Drift Detection in Streams of Short Texts ( http://arxiv.org/abs/2305.17750v1 ) ライセンス: Link先を確認	Ella Rabinovich, Matan Vetzler, Samuel Ackerman, Ateret Anaby-Tavor	(参考訳) データドリフトはモデル入力データの変化であり、機械学習モデルの性能劣化につながる重要な要因の1つである。ドリフトのモニタリングはこれらの問題を検知し、有害な結果を防ぐのに役立つ。意味のあるドリフト解釈は、モデルの効果的な再訓練に向けた基本的なステップである。本研究では,大規模タスク指向ダイアログシステムにおいて,信頼性の高いモデル非依存な変更点検出と解釈のためのエンドツーエンドフレームワークを提案する。当社のアプローチを評価し,顧客要求をダイアログシステムにシミュレートする意図分類学習データセットの新たな変種を用いて,そのメリットを実証する。データを公開しています。 Data drift is the change in model input data that is one of the key factors leading to machine learning models performance degradation over time. Monitoring drift helps detecting these issues and preventing their harmful consequences. Meaningful drift interpretation is a fundamental step towards effective re-training of the model. In this study we propose an end-to-end framework for reliable model-agnostic change-point detection and interpretation in large task-oriented dialog systems, proven effective in multiple customer deployments. We evaluate our approach and demonstrate its benefits with a novel variant of intent classification training dataset, simulating customer requests to a dialog system. We make the data publicly available.	翻訳日:2023-05-30 16:58:31 公開日:2023-05-28
# 音波伝搬のベイズ推論とニューラル推定 Bayesian inference and neural estimation of acoustic wave propagation ( http://arxiv.org/abs/2305.17749v1 ) ライセンス: Link先を確認	Yongchao Huang, Yuhang He, Hong Ge	(参考訳) 本研究では、物理と機械学習の手法を組み合わせて音響信号を解析する新しい枠組みを提案する。この課題に対して3つの手法を開発した。スペクトル音響特性を推定するためのベイズ推論手法、ニューラルネットワークに順方向および逆方向の物理的損失を組み込むニューラル物理モデル、そしてベンチマークとして用いる非線形最小二乗法である。推定された伝搬係数からは室内インパルス応答(RIR)が導かれ、これは不確実性を伴う再定位(relocalisation)に利用できる。この枠組みの単純さと効率性は、シミュレーションデータ上で実証的に検証される。 In this work, we introduce a novel framework which combines physics and machine learning methods to analyse acoustic signals. Three methods are developed for this task: a Bayesian inference approach for inferring the spectral acoustics characteristics, a neural-physical model which equips a neural network with forward and backward physical losses, and the non-linear least squares approach which serves as benchmark. The inferred propagation coefficient leads to the room impulse response (RIR) quantity which can be used for relocalisation with uncertainty. The simplicity and efficiency of this framework are empirically validated on simulated data.	翻訳日:2023-05-30 16:58:21 公開日:2023-05-28
# 改ざん検出のための画像ハッシュ最小化 Image Hash Minimization for Tamper Detection ( http://arxiv.org/abs/2305.17748v1 ) ライセンス: Link先を確認	Subhajit Maity, Ram Kumar Karsh	(参考訳) 画像ハッシュを用いた改ざん検出は、現代において非常に一般的な問題である。この問題に対処するため、すでにいくつもの研究と進展がなされている。しかし、既存の手法の多くは、改ざん領域が小さい場合に改ざん検出の精度が低く、また長い画像ハッシュを必要とする。本論文では、改ざん領域が小さい場合の性能を向上させつつ、ハッシュ長を最小化することを目的とした新しい手法を提案する。 Tamper detection using image hash is a very common problem today. Several research efforts and advancements have already been made to address this problem. However, most of the existing methods lack accuracy in tamper detection when the tampered area is small, and they also require long image hashes. In this paper, we propose a novel method that aims to minimize the hash length while enhancing the performance when the tampered area is small.	翻訳日:2023-05-30 16:58:11 公開日:2023-05-28
# ホワイトニングに基づく文埋め込みのコントラスト学習 Whitening-based Contrastive Learning of Sentence Embeddings ( http://arxiv.org/abs/2305.17746v1 ) ライセンス: Link先を確認	Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang	(参考訳) 本稿では、新しいシャッフルグループホワイトニングとコントラスト学習を組み合わせた、文埋め込み学習のためのホワイトニングベースのコントラスト学習手法(WhitenedCSE)を提案する。一般に、コントラスト学習は単一サンプルの歪み(すなわち正例)を近づけ、負例を遠ざけることで、特徴空間におけるアラインメントと一様性を促進する。「押し出す」操作の一般的な代替手段は特徴空間のホワイトニングであり、すべてのサンプルを一様に散らばらせる。ホワイトニングとコントラスト学習は一様性に関して大きな冗長性を持つため、通常は別々に用いられ、容易には併用できない。本論文は初めて、ホワイトニングをコントラスト学習の枠組みに統合し、2つの利点をもたらす。 1) 一様性の向上。これら2つのアプローチは完全に冗長ではなく、一様性のメカニズムが異なるため、実際にはある程度の相補性を持つことがわかった。 2) アラインメントの向上。特徴をチャネル軸に沿ってランダムに複数のグループに分割し、各グループ内で独立にホワイトニングを行う。グループ分割をシャッフルすることで、単一サンプルから複数の歪みを導き、正例の多様性を高める。その結果、多様性が高まった複数の正例を用いることで、アラインメントが向上し、コントラスト学習がさらに改善される。7つの意味的テキスト類似度タスクに関する広範な実験により、本手法がコントラスト学習のベースラインに対して一貫した改善を達成し、新たな最先端性能(例えば、BERT-baseに基づくSTSタスクでスピアマン相関78.78%、+2.53%)を達成することが示された。 This paper presents a whitening-based contrastive learning method for sentence embedding learning (WhitenedCSE), which combines contrastive learning with a novel shuffled group whitening. Generally, contrastive learning pulls distortions of a single sample (i.e., positive samples) close and pushes negative samples far away, correspondingly facilitating the alignment and uniformity in the feature space. A popular alternative to the "pushing" operation is whitening the feature space, which scatters all the samples for uniformity. Since the whitening and the contrastive learning have large redundancy w.r.t. the uniformity, they are usually used separately and do not easily work together. For the first time, this paper integrates whitening into the contrastive learning scheme and brings two benefits. 1) Better uniformity. We find that these two approaches are not totally redundant but actually have some complementarity due to different uniformity mechanisms. 2) Better alignment. We randomly divide the feature into multiple groups along the channel axis and perform whitening independently within each group. 
By shuffling the group division, we derive multiple distortions of a single sample and thus increase the positive sample diversity. Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment. Extensive experiments on seven semantic textual similarity tasks show our method achieves consistent improvement over the contrastive learning baseline and sets a new state of the art, e.g., 78.78% Spearman correlation on STS tasks (+2.53% based on BERT-base).	翻訳日:2023-05-30 16:58:04 公開日:2023-05-28
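Below is a minimal sketch of shuffled group whitening. It assumes ZCA whitening computed over a batch of embeddings; the abstract does not specify the exact whitening transform or where it sits in WhitenedCSE. The channels are randomly permuted, split into groups, and each group is whitened independently. Calling the function twice with different random permutations yields two differently whitened views of the same batch. This is the mechanism the abstract credits for higher positive-sample diversity.

```python
import numpy as np


def zca_whiten(x, eps=1e-5):
    """ZCA-whiten a (batch, channels) array: Sigma^{-1/2} (x - mean)."""
    xc = x - x.mean(axis=0, keepdims=True)
    cov = xc.T @ xc / x.shape[0]
    w, V = np.linalg.eigh(cov)
    return xc @ V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T


def shuffled_group_whitening(x, num_groups, rng):
    """Whiten random channel groups independently; channel order is preserved."""
    perm = rng.permutation(x.shape[1])
    out = np.empty_like(x)
    for idx in np.array_split(perm, num_groups):
        out[:, idx] = zca_whiten(x[:, idx])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 16))
    view_a = shuffled_group_whitening(emb, 4, np.random.default_rng(1))
    view_b = shuffled_group_whitening(emb, 4, np.random.default_rng(2))
    print(np.abs(view_a - view_b).mean())  # different groupings give different views
```

Writing each group back to its original channel indices (`out[:, idx]`) undoes the permutation without a separate inverse step.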
# LEAPで言語の壁を破る: 多言語LLMのための学習戦略 Breaking Language Barriers with a LEAP: Learning Strategies for Polyglot LLMs ( http://arxiv.org/abs/2305.17740v1 ) ライセンス: Link先を確認	Akshay Nambi, Vaibhav Balloli, Mercy Ranjit, Tanuja Ganu, Kabir Ahuja, Sunayana Sitaram, Kalika Bali	(参考訳) 大規模言語モデル(LLM)は、世界中の多くの領域を変革する最前線にある。しかし、非ラテン文字の言語や低リソース言語に対しては、その包摂性と有効性は依然として限られている。本稿では、LLMの多言語性能を向上させるという喫緊の課題に取り組み、特に生成モデルに焦点を当てる。一般的な質問応答(QA)データセットを用いた多様な言語の体系的な調査と評価を通じて、多言語環境におけるLLMの真の可能性を引き出す新しい手法を提示する。本アプローチは、多言語習熟度を大幅に向上させる3つの主要な戦略からなる。第一に、多言語LLM向けにプロンプトを綿密に最適化することで潜在能力を引き出し、言語を問わず大幅な性能向上を実現する。第二に、GPTによる生成と多言語埋め込みを組み合わせる新しいハイブリッド手法を導入し、QAや検索といった重要なタスクで多言語性能を大きく改善する。最後に、多言語LLMの性能をさらに高めるため、クエリごとに最適なプロンプト戦略、LLMモデル、埋め込みを動的に選択する新しい学習アルゴリズムを導入する。この動的適応により言語をまたいでLLMの有効性が最大化され、最良の静的戦略やランダム戦略を上回る。結果は、多様な言語にわたる多言語理解と生成の大きな進歩を示している。 Large language models (LLMs) are at the forefront of transforming numerous domains globally. However, their inclusivity and effectiveness remain limited for non-Latin scripts and low-resource languages. This paper tackles the imperative challenge of enhancing the multilingual performance of LLMs, specifically focusing on generative models. Through systematic investigation and evaluation of diverse languages using popular question-answering (QA) datasets, we present novel techniques that unlock the true potential of LLMs in a polyglot landscape. Our approach encompasses three key strategies that yield remarkable improvements in multilingual proficiency. First, by meticulously optimizing prompts tailored for polyglot LLMs, we unlock their latent capabilities, resulting in substantial performance boosts across languages. Second, we introduce a new hybrid approach that synergizes GPT generation with multilingual embeddings and achieves significant multilingual performance improvement on critical tasks like QA and retrieval. 
Finally, to further propel the performance of polyglot LLMs, we introduce a novel learning algorithm that dynamically selects the optimal prompt strategy, LLM model, and embeddings per query. This dynamic adaptation maximizes the efficacy of LLMs across languages, outperforming best static and random strategies. Our results show substantial advancements in multilingual understanding and generation across a diverse range of languages.	翻訳日:2023-05-30 16:57:34 公開日:2023-05-28
# スプーフ局在化のための範囲ベース等誤り率 Range-Based Equal Error Rate for Spoof Localization ( http://arxiv.org/abs/2305.17739v1 ) ライセンス: Link先を確認	Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi	(参考訳) スプーフ局在化(spoof localization)はセグメントレベル検出(segment-level detection)とも呼ばれ、部分的にスプーフされた音声の中からスプーフ箇所を特定することを目的とする重要なタスクである。このような生体認証のシナリオでは、性能の測定に等誤り率(EER)が広く用いられている。EERはしきい値に依存しない唯一の指標であるが、通常は、あらかじめ定めた時間分解能のスコアと正解ラベルを用いて誤分類されたセグメントの数を数える、点ベースの方法で計算される。このような点ベースの測定はこの分解能に過度に依存し、誤分類された範囲を正確に測定できない可能性がある。誤分類された範囲を適切に測定し、スプーフ局在化の性能をより適切に評価するために、点ベースEERを範囲ベースEERに拡張する。そして、範囲ベースEERを計算するために二分探索アルゴリズムを適応させ、従来の点ベースEERと比較する。分析の結果、範囲ベースEER、または適切な時間分解能を持つ点ベースEERのいずれかを用いることで、スプーフ局在化の性能を公平かつ適切に評価できることが示唆された。 Spoof localization, also called segment-level detection, is a crucial task that aims to locate spoofs in partially spoofed audio. The equal error rate (EER) is widely used to measure performance for such biometric scenarios. Although EER is the only threshold-free metric, it is usually calculated in a point-based way that uses scores and references with a pre-defined temporal resolution and counts the number of misclassified segments. Such point-based measurement overly relies on this resolution and may not accurately measure misclassified ranges. To properly measure misclassified ranges and better evaluate spoof localization performance, we upgrade point-based EER to range-based EER. Then, we adapt the binary search algorithm for calculating range-based EER and compare it with the classical point-based EER. Our analyses suggest that utilizing either range-based EER or point-based EER with a proper temporal resolution can fairly and properly evaluate the performance of spoof localization.	翻訳日:2023-05-30 16:57:13 公開日:2023-05-28
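The sketch below illustrates the range-based idea. False-alarm and miss rates are measured as durations of overlap between predicted and reference ranges, not as counts of fixed-resolution segments. A bisection search then finds the threshold where the two rates cross. This is an illustrative reading of the abstract, not the authors' algorithm. Three details are assumptions:

- the interval representation of scores and labels;
- the convention that a higher score means spoof;
- the tie handling at a jump (rates are reported just above the crossing threshold).

```python
def _overlap(a0, a1, b0, b1):
    """Length of the intersection of [a0, a1) and [b0, b1)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def range_error_rates(score_ranges, label_ranges, threshold):
    """Duration-based (false-alarm rate, miss rate) at one threshold.

    score_ranges: (start, end, score) tuples; higher score = more likely spoof.
    label_ranges: (start, end, is_spoof) tuples covering the reference.
    A region counts as predicted spoof when its score >= threshold.
    """
    fa = miss = bona_total = spoof_total = 0.0
    for l0, l1, is_spoof in label_ranges:
        if is_spoof:
            spoof_total += l1 - l0
        else:
            bona_total += l1 - l0
        for s0, s1, score in score_ranges:
            ov = _overlap(l0, l1, s0, s1)
            if is_spoof and score < threshold:
                miss += ov
            elif not is_spoof and score >= threshold:
                fa += ov
    return fa / bona_total, miss / spoof_total


def range_based_eer(score_ranges, label_ranges, tol=1e-9):
    """Bisect for the threshold where the false-alarm rate stops exceeding the miss rate."""
    lo = min(s for _, _, s in score_ranges)        # everything flagged: miss = 0
    hi = max(s for _, _, s in score_ranges) + 1.0  # nothing flagged: fa = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fa, miss = range_error_rates(score_ranges, label_ranges, mid)
        if fa > miss:
            lo = mid
        else:
            hi = mid
    fa, miss = range_error_rates(score_ranges, label_ranges, hi)
    return (fa + miss) / 2
```

Bisection works because raising the threshold can only shrink the flagged duration. The false-alarm rate is therefore non-increasing and the miss rate non-decreasing in the threshold. Score ranges can have arbitrary boundaries, so no fixed temporal resolution is involved.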
# メモリ効果の有無が異なる重力波バーストプロファイルに対する絡み合い収穫 Entanglement harvesting for different gravitational wave burst profiles with and without memory ( http://arxiv.org/abs/2305.17735v1 ) ライセンス: Link先を確認	Subhajit Barman, Indranil Chakraborty, Sajal Mukherjee	(参考訳) 絡み合い収穫の可能性は興味深い現象であり、背景時空の幾何や検出器の運動などの影響を受ける。本稿では、線形化重力において、漸近的なメモリ効果を持つもの・持たないものを含む異なる重力波(GW)バーストプロファイルが、2つの静的なUnruh-DeWitt検出器間の収穫にどのように影響するかを調べる。この目的のために、ガウス型、sech二乗型、tanh型のバーストプロファイルを検討する。これらのうち、最初の2つのバーストはメモリ効果を含まず、最後のものはゼロでないメモリ効果を持つ。いずれの場合も絡み合いの収穫は可能であり、検出器間の距離が大きくなるにつれて減少することがわかった。さらに、収穫はメモリ効果の有無によって定性的に異なる。メモリ効果のない2つのバーストプロファイルでは、検出器の遷移エネルギーが低い領域ではバーストが長いほど収穫が大きく、遷移エネルギーが大きい場合にはこの特性が逆転する。一方、メモリ効果を持つtanh型プロファイルでは、バーストが短いほど収穫は常に大きい。これらの発見がもたらす帰結についても簡単に議論する。 The possibility of entanglement harvesting is a fascinating phenomenon, which is affected by the background geometry, the motion of detectors, etc. In the present article, we study how different gravitational wave (GW) burst profiles in linearized gravity, with and without the asymptotic memory, may influence the harvesting between two static Unruh-DeWitt detectors. To this end, we investigate the following burst profiles: Gaussian, sech-squared, and tanh. Out of these, the first two bursts contain no memory, while the last consists of a non-vanishing memory effect. We found that in all of these cases, entanglement harvesting is possible, and it decreases with the increasing distance between detectors. Moreover, the harvesting differs qualitatively based on the presence or absence of the memory. For the two burst profiles without memory, longer bursts correspond to greater harvesting in the low detector transition energy regime, and this characteristic is reversed for larger transition energy. Meanwhile, for the tanh type profile with memory, harvesting is always greater for shorter bursts. We briefly discuss some of the consequences of our findings.	翻訳日:2023-05-30 16:56:54 公開日:2023-05-28
# 低リソース条件における事前学習済み音声エンコーダの検討 Investigating Pre-trained Audio Encoders in the Low-Resource Condition ( http://arxiv.org/abs/2305.17733v1 ) ライセンス: Link先を確認	Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi	(参考訳) 事前学習された音声エンコーダは、様々な音声理解・生成タスクで最先端の結果を押し上げる中心的存在となっている。しかし、低リソース環境におけるこれらのエンコーダの能力はまだ十分に調べられていない。そこで本研究では、代表的な3つの最先端エンコーダ(Wav2vec2、WavLM、Whisper)を用いて、7つの音声理解・生成タスクにわたり低リソース環境で包括的な実験を行う。エンコーダのタスク性能、収束速度、表現特性について、様々な定量的・定性的分析を提供する。これらのエンコーダの事前学習プロトコルと、内部層での情報の捉え方との間に関連があることを観察する。特に、Whisperエンコーダは、性能と収束速度の両面で、コンテンツ駆動型タスクにおいて最も高い低リソース能力を示す。 Pre-trained speech encoders have been central to pushing state-of-the-art results across various speech understanding and generation tasks. Nonetheless, the capabilities of these encoders in low-resource settings are yet to be thoroughly explored. To address this, we conduct a comprehensive set of experiments using a representative set of 3 state-of-the-art encoders (Wav2vec2, WavLM, Whisper) in the low-resource setting across 7 speech understanding and generation tasks. We provide various quantitative and qualitative analyses on task performance, convergence speed, and representational properties of the encoders. We observe a connection between the pre-training protocols of these encoders and the way in which they capture information in their internal layers. In particular, we observe the Whisper encoder exhibits the greatest low-resource capabilities on content-driven tasks in terms of performance and convergence speed.	翻訳日:2023-05-30 16:56:36 公開日:2023-05-28
# マルチターン会話データセットのための3レベル統合自然言語理解 Tri-level Joint Natural Language Understanding for Multi-turn Conversational Datasets ( http://arxiv.org/abs/2305.17729v1 ) ライセンス: Link先を確認	Henry Weld, Sijia Hu, Siqu Long, Josiah Poon, Soyeon Caren Han	(参考訳) 自然言語理解では通常、単一の発話を、文レベルのインテントと単語レベルのスロットラベルからなる2レベルの意味フレームに対応付ける。最高性能のモデルは、インテント検出とスロットフィリングの間に明示的な相互作用を強制する。本稿では、ドメインを追加し、すべてのレベル間で意味情報を明示的に交換する、新しい3レベル統合自然言語理解手法を提案する。このアプローチにより、単一発話よりも自然な会話環境であるマルチターンデータセットを利用できる。2つのマルチターンデータセットでモデルを評価する。これらのデータセットでスロットフィリングとインテント検出を統合的に行うのは我々が初めてである。本モデルは、マルチターンデータセットにおけるスロットフィリングとインテント検出で最先端の統合モデルを上回る。層間の明示的な相互作用の位置について分析を示す。ドメイン情報を含めることでモデルの性能が向上すると結論づける。 Natural language understanding typically maps single utterances to a dual-level semantic frame: sentence-level intent and word-level slot labels. The best performing models force explicit interaction between intent detection and slot filling. We present a novel tri-level joint natural language understanding approach that adds domain and explicitly exchanges semantic information between all levels. This approach enables the use of multi-turn datasets, which are a more natural conversational environment than single utterances. We evaluate our model on two multi-turn datasets for which we are the first to conduct joint slot-filling and intent detection. Our model outperforms state-of-the-art joint models in slot filling and intent detection on multi-turn data sets. We provide an analysis of explicit interaction locations between the layers. We conclude that including domain information improves model performance.	翻訳日:2023-05-30 16:56:19 公開日:2023-05-28
# Learning a Structural Causal Model for Intuition Reasoning in Conversation ( http://arxiv.org/abs/2305.17727v1 ) License: see link	Hang Chen, Bingyu Liao, Jing Luo, Wenjing Zhu, Xinyu Yang	Reasoning, a crucial aspect of NLP research, has not been adequately addressed by prevailing models, including large language models. Conversation reasoning, as a critical component of it, remains largely unexplored due to the absence of a well-designed cognitive model. In this paper, inspired by intuition theory on conversation cognition, we develop a conversation cognitive model (CCM) that explains how each utterance receives and activates channels of information recursively. Furthermore, we algebraically transform the CCM into a structural causal model (SCM) under some mild assumptions, rendering it compatible with various causal discovery methods. We further propose a probabilistic implementation of the SCM for utterance-level relation reasoning. By leveraging variational inference, it explores substitutes for implicit causes, addresses the issue of their unobservability, and reconstructs the causal representations of utterances through the evidence lower bound. Moreover, we construct synthetic and simulated datasets that incorporate implicit causes and complete cause labels, alleviating the current situation in which all available datasets are agnostic to implicit causes. Extensive experiments demonstrate that our proposed method significantly outperforms existing methods on synthetic, simulated, and real-world datasets. Finally, we analyze the performance of the CCM under latent confounders and propose theoretical ideas for addressing this currently unresolved issue.	Translated: 2023-05-30 16:56:07 Published: 2023-05-28
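The abstract says the model reconstructs utterance representations through the evidence lower bound. For background, the standard bound maximized in variational inference is shown below. It is the generic form, not necessarily this paper's objective. The symbols are assumptions for illustration: x is an observed utterance representation, z is a latent variable standing in for an unobserved implicit cause, q_φ(z | x) is the approximate posterior, and p(z) is the prior.

```latex
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\middle\|\,p(z)\right)
```

Maximizing the right-hand side trains the posterior that proposes substitutes for the unobserved cause. It also trains the decoder that reconstructs x from that cause.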
# ConvGenVisMo: Evaluation of Conversational Generative Vision Models ( http://arxiv.org/abs/2305.17784v1 ) License: see link	Narjes Nikzad Khasmakhi, Meysam Asgari-Chenaghlu, Nabiha Asghar, Philipp Schaer, Dietlind Zühlke	Conversational generative vision models (CGVMs) like Visual ChatGPT (Wu et al., 2023) have recently emerged from the synthesis of computer vision and natural language processing techniques. These models enable more natural and interactive communication between humans and machines, because they can understand verbal inputs from users and generate responses in natural language along with visual outputs. To make informed decisions about the usage and deployment of these models, it is important to analyze their performance through a suitable evaluation framework on realistic datasets. In this paper, we present ConvGenVisMo, a framework for the novel task of evaluating CGVMs. ConvGenVisMo introduces a new benchmark evaluation dataset for this task and also provides a suite of existing and new automated evaluation metrics to evaluate the outputs. All ConvGenVisMo assets, including the dataset and the evaluation code, will be made publicly available on GitHub.	Translated: 2023-05-30 16:48:01 Published: 2023-05-28
# Visual Affordance Prediction for Guiding Robot Exploration ( http://arxiv.org/abs/2305.17783v1 ) License: see link	Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani	Motivated by the intuitive understanding humans have about the space of possible interactions, and the ease with which they can generalize this understanding to previously unseen scenes, we develop an approach for learning visual affordances for guiding robot exploration. Given an input image of a scene, we infer a distribution over plausible future states that can be achieved via interactions with it. We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE and show that these models can be trained using large-scale and diverse passive data, and that the learned models exhibit compositional generalization to diverse objects beyond the training distribution. We show how the trained affordance model can be used to guide exploration by acting as a goal-sampling distribution during visual goal-conditioned policy learning in robotic manipulation.	Translated: 2023-05-30 16:47:48 Published: 2023-05-28
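The abstract places the learned distribution in the latent embedding space of a VQ-VAE. The sketch below shows only the standard vector-quantization step that turns continuous encoder outputs into discrete codes, which a Transformer prior can then model. It is not the authors' code: the codebook size, dimensionality, and values are toy assumptions.

```python
# Minimal sketch of VQ-VAE vector quantization (not the paper's implementation):
# each continuous latent vector is replaced by its nearest codebook entry, and the
# resulting index sequence is what an autoregressive prior would learn to model.
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: List[Sequence[float]], codebook: List[Sequence[float]]
) -> Tuple[List[int], List[Sequence[float]]]:
    indices: List[int] = []
    quantized: List[Sequence[float]] = []
    for z in latents:
        # Nearest-neighbour lookup in the codebook (Euclidean distance).
        k = min(range(len(codebook)), key=lambda i: squared_distance(z, codebook[i]))
        indices.append(k)
        quantized.append(codebook[k])
    return indices, quantized


codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]  # toy 3-entry, 2-D codebook
latents = [(0.9, 1.2), (-0.8, 1.7), (0.1, -0.2)]
idx, q = quantize(latents, codebook)
print(idx)  # [1, 2, 0]
```

In a full model, a conditional Transformer would predict these indices given the current scene's codes. Sampling indices and decoding them yields candidate future states.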
# RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition ( http://arxiv.org/abs/2305.17782v1 ) License: see link	Wei Zhou, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney	Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but only rather simple decoding support, limited to open-vocabulary scenarios. For closed-vocabulary scenarios, public tools that support lexical-constrained decoding are usually limited to classical ASR or do not support all S2S models. To eliminate this restriction on research possibilities, such as the choice of modeling unit, we present RASR2, a research-oriented generic S2S decoder implemented in C++. It offers strong flexibility and compatibility for various S2S models, language models, label units/topologies, and neural network architectures. It provides efficient decoding for both open- and closed-vocabulary scenarios based on a generalized search framework with rich support for different search modes and settings. We evaluate RASR2 with a wide range of experiments on both the Switchboard and LibriSpeech corpora. Our source code is publicly available online.	Translated: 2023-05-30 16:47:33 Published: 2023-05-28
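Lexical-constrained decoding for closed-vocabulary scenarios is commonly implemented with a prefix tree (trie) over the lexicon. During search, the trie limits which label unit may extend a partial word hypothesis. The sketch below illustrates that general idea only; it is not RASR2's C++ implementation. The class name, the character-level units, and the `<eow>` end-of-word marker are all hypothetical.

```python
# Hypothetical sketch of a lexicon prefix tree for closed-vocabulary decoding:
# at each search step, only units that continue some lexicon word are allowed.
# Units here are characters; real systems would use phonemes or subword labels.
from typing import Dict, Iterable, Set

END_OF_WORD = "<eow>"  # hypothetical marker for "a complete word ends here"


class LexiconTrie:
    def __init__(self, words: Iterable[str]) -> None:
        self.root: Dict[str, dict] = {}
        for word in words:
            node = self.root
            for unit in word:
                node = node.setdefault(unit, {})
            node[END_OF_WORD] = {}

    def allowed_next(self, prefix: str) -> Set[str]:
        node = self.root
        for unit in prefix:
            if unit not in node:
                return set()  # prefix is not the start of any lexicon word
            node = node[unit]
        return set(node.keys())


lexicon = LexiconTrie(["cat", "car", "cart"])
print(sorted(lexicon.allowed_next("ca")))   # ['r', 't']
print(sorted(lexicon.allowed_next("car")))  # ['<eow>', 't']
```

A beam search would prune any hypothesis whose next label is not in `allowed_next`. This is how the closed vocabulary is enforced regardless of which S2S model scores the labels.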
# Generating EDU Extracts for Plan-Guided Summary Re-Ranking ( http://arxiv.org/abs/2305.17779v1 ) License: see link	Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Kathleen McKeown, Noémie Elhadad	Two-step approaches, in which summary candidates are generated and then re-ranked to return a single summary, can improve ROUGE scores over the standard single-step approach. Yet standard decoding methods (i.e., beam search, nucleus sampling, and diverse beam search) produce candidates with redundant, and often low-quality, content. In this paper, we design a novel method to generate candidates for re-ranking that addresses these issues. We ground each candidate abstract on its own unique content plan and generate distinct plan-guided abstracts using a model's top beam. More concretely, a standard language model (a BART LM) auto-regressively generates elemental discourse unit (EDU) content plans with an extractive copy mechanism. The top K beams from the content plan generator are then used to guide a separate LM, which produces a single abstractive candidate for each distinct plan. We apply an existing re-ranker (BRIO) to abstractive candidates generated by our method, as well as to those from baseline decoding methods. We show large relevance improvements over previously published methods on widely used single-document news article corpora, with ROUGE-2 F1 gains of 0.88, 2.01, and 0.38 on CNN/DailyMail, NYT, and XSum, respectively. A human evaluation on CNN/DM validates these results. Similarly, on 1k samples from CNN/DM, we show that prompting GPT-3 to follow EDU plans outperforms sampling-based methods by 1.05 ROUGE-2 F1 points. Code to generate and realize plans is available at https://github.com/griff4692/edu-sum.	Translated: 2023-05-30 16:47:13 Published: 2023-05-28
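The gains in this entry are reported in ROUGE-2 F1 points, that is, 100 times the score computed below. The following is a simplified version of the metric: clipped bigram overlap between a candidate and a reference, combined into an F1 score. It assumes lowercasing and whitespace tokenization. Official ROUGE tooling applies its own tokenization and optional stemming, so its scores can differ.

```python
# Simplified ROUGE-2 F1 (an illustration, not the official scorer): bigram counts
# are intersected with clipping, then combined into precision, recall, and F1.
from collections import Counter
from typing import List


def bigrams(tokens: List[str]) -> Counter:
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate: str, reference: str) -> float:
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge2_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))  # 0.6
```

Here 3 of 5 bigrams match in each direction, so precision = recall = F1 = 0.6. A re-ranker such as the one in this entry would score or order many such candidates.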
# Analyzing Geospatial Distribution in Blockchains ( http://arxiv.org/abs/2305.17771v1 ) License: see link	Shashank Motepalli and Hans-Arno Jacobsen	Blockchains are decentralized, but are they genuinely so? We analyze an often-overlooked but quantifiable dimension of blockchain decentralization: the geospatial distribution of transaction processing. Blockchains bring with them the potential for geospatially distributed transaction processing. They enable validators from geospatially distant locations to partake in consensus protocols; we refer to them as minority validators. Based on our observations, in practice most validators are geographically concentrated in close proximity. Furthermore, we observed that minority validators tend not to meet the performance requirements and are often misidentified as crash failures. Consequently, they are subject to punishment by jailing (removal from the validator set) and/or slashing (penalty in native tokens). Our emulations, under controlled conditions, demonstrate the same results, raising serious concerns about the potential for the geospatial centralization of validators. To address this, we developed a solution that easily integrates with consensus protocols, and we demonstrated its effectiveness.	Translated: 2023-05-30 16:46:45 Published: 2023-05-28
# Point-PC: Point Cloud Completion Guided by Prior Knowledge via Causal Inference ( http://arxiv.org/abs/2305.17770v1 ) License: see link	Weizhi Nie, Chuanqi Jiao, Ruidong Chen, Weijie Wang, Bruno Lepri, Nicu Sebe and Anan Liu	Point cloud completion aims to recover complete point clouds from the partial observations captured by scanners, which are incomplete because of occlusion and limited view angles. Many approaches utilize a partial-complete paradigm in which missing parts are directly predicted by a global feature learned from partial inputs. This makes it hard to recover details, because the global feature is unlikely to capture the full details of all missing parts. In this paper, we propose a novel approach to point cloud completion called Point-PC, which uses a memory network to retrieve shape priors and designs an effective causal inference model to choose missing shape information as additional geometric information to aid point cloud completion. Specifically, we propose a memory operating mechanism in which the complete shape features and the corresponding shapes are stored in the form of "key-value" pairs. To retrieve similar shapes from the partial input, we also apply a contrastive learning-based pre-training scheme to transfer features of incomplete shapes into the domain of complete shape features. Moreover, we use backdoor adjustment to get rid of the confounder, which is a part of the shape prior that has the same semantic structure as the partial input. Experimental results on the ShapeNet-55, PCN, and KITTI datasets demonstrate that Point-PC performs favorably against state-of-the-art methods.	Translated: 2023-05-30 16:46:30 Published: 2023-05-28
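The memory mechanism in this entry stores complete-shape features as keys and the corresponding shapes as values. A partial input's feature then retrieves the most similar entries. The sketch below shows such a key-value lookup under stated assumptions. Cosine similarity, top-k retrieval, the 3-D feature vectors, and the string "shapes" are hypothetical stand-ins; the abstract does not specify Point-PC's feature extractor or similarity function.

```python
# Hypothetical key-value shape memory: keys are complete-shape feature vectors,
# values stand in for the stored shapes. A query feature (from the partial input)
# retrieves the top-k entries by cosine similarity. Illustration only.
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ShapeMemory:
    def __init__(self) -> None:
        self.keys: List[Sequence[float]] = []
        self.values: List[str] = []

    def write(self, key: Sequence[float], value: str) -> None:
        self.keys.append(key)
        self.values.append(value)

    def read(self, query: Sequence[float], k: int = 1) -> List[Tuple[str, float]]:
        scored = [(value, cosine(query, key)) for key, value in zip(self.keys, self.values)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


memory = ShapeMemory()
memory.write([1.0, 0.0, 0.0], "chair_prior")
memory.write([0.0, 1.0, 0.0], "table_prior")
memory.write([0.7, 0.7, 0.0], "stool_prior")
print(memory.read([0.9, 0.1, 0.0], k=2))
```

The abstract's contrastive pre-training aims to make features of partial shapes comparable to the stored complete-shape keys. A similarity lookup like this one relies on that alignment.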
# AIMS: All-Inclusive Multi-Level Segmentation ( http://arxiv.org/abs/2305.17768v1 ) License: see link	Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang	Despite the progress of image segmentation toward accurate visual entity segmentation, meeting the diverse requirements of image editing applications for region-of-interest selection at different levels remains unsolved. In this paper, we propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation (two entities with some semantic relationship). We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation. Specifically, we propose task complementarity, association, and a prompt mask encoder for three-level predictions. Extensive experiments demonstrate the effectiveness and generalization capacity of our method compared to other state-of-the-art methods on a single dataset, and to concurrent work on segmenting anything. We will make our code and trained model publicly available.	Translated: 2023-05-30 16:46:06 Published: 2023-05-28
# NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization ( http://arxiv.org/abs/2305.17763v1 ) License: see link	Zhixiang Min, Bingbing Zhuang, Samuel Schulter, Buyu Liu, Enrique Dunn, Manmohan Chandraker	Monocular 3D object localization in driving scenes is a crucial task, but it is challenging due to its ill-posed nature. Estimating 3D coordinates for each pixel on the object surface holds great potential, as it provides dense 2D-3D geometric constraints for the underlying PnP problem. However, high-quality ground-truth supervision is not available in driving scenes due to the sparsity and various artifacts of Lidar data, as well as the practical infeasibility of collecting per-instance CAD models. In this work, we present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering, which further serves as supervision for learning dense object coordinates. Our approach rests on insights into learning a category-level shape prior directly from real driving scenes, while properly handling single-view ambiguities. Furthermore, we study and make critical design choices to learn object coordinates more effectively from an object-centric view. Altogether, our framework leads to a new state of the art in monocular 3D localization, ranking 1st on the KITTI-Object benchmark among published monocular methods.	Translated: 2023-05-30 16:45:51 Published: 2023-05-28
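Dense object coordinates give one 2D-3D correspondence (u_i, X_i) per foreground pixel. Here X_i is the predicted 3D point in the object frame and u_i is its pixel location. Given these correspondences, a PnP solver recovers the object pose (R, t) by minimizing reprojection error. Below is the textbook objective, with camera intrinsics K and perspective division π. It is offered as background and is not necessarily the exact solver or loss used in NeurOCS.

```latex
(\hat{R}, \hat{t}) \;=\; \arg\min_{R \in SO(3),\; t \in \mathbb{R}^{3}} \;\sum_{i} \left\| \pi\!\left(K\,(R X_i + t)\right) - u_i \right\|_2^2,
\qquad
\pi\!\left([x,\, y,\, z]^{\top}\right) = [x / z,\; y / z]^{\top}
```

Because every object pixel contributes a term, the problem is heavily overdetermined. This is why dense coordinates are attractive compared with a handful of sparse keypoints.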
# Towards Confidential Computing: A Secure Cloud Architecture for Big Data Analytics and AI ( http://arxiv.org/abs/2305.17761v1 ) License: see link	Naweiluo Zhou, Florent Dufour, Vinzent Bode, Peter Zinterhof, Nicolay J Hammer, Dieter Kranzlmüller	Cloud computing provisions computer resources in a cost-effective way based on demand. It has therefore become a viable solution for big data analytics and artificial intelligence, which have been widely adopted in various domain sciences. Data security in certain fields, such as biomedical research, remains a major concern when moving workflows to the cloud, because cloud environments are generally outsourced and therefore more exposed to risks. We present a secure cloud architecture and describe how it enables workflow packaging and scheduling while keeping data, logic, and computation secure in transit, in use, and at rest.	Translated: 2023-05-30 16:45:28 Published: 2023-05-28
# Language Models are Pragmatic Speakers ( http://arxiv.org/abs/2305.17760v1 ) License: see link	Khanh Nguyen	How do language models "think"? This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker, which can characterize the operation of different variants of language models. In particular, we show that large language models fine-tuned with reinforcement learning from human feedback (Ouyang et al., 2022) implement a model of thought that conceptually resembles a fast-and-slow model (Kahneman, 2011). We discuss the limitations of reinforcement learning from human feedback as a fast-and-slow model of thought and propose directions for extending this framework. Overall, our work demonstrates that viewing language models through the lens of cognitive probabilistic modeling can offer valuable insights for understanding, evaluating, and developing them.	Translated: 2023-05-30 16:45:18 Published: 2023-05-28
# Tab-CoT: Zero-shot Tabular Chain of Thought ( http://arxiv.org/abs/2305.17812v1 ) License: see link	Ziqi Jin and Wei Lu	Chain-of-thought (CoT) prompting methods have been successful in various natural language processing (NLP) tasks thanks to their ability to unveil the underlying complex reasoning processes. Such reasoning processes typically exhibit implicitly structured steps. Recent efforts have also started investigating methods that encourage more explicitly structured reasoning procedures. In this work, we propose Tab-CoT, a novel tabular-format CoT prompting method, which allows the complex reasoning process to be explicitly modelled in a highly structured manner. Despite its simplicity, we show that our approach is capable of performing reasoning across multiple dimensions (i.e., both rows and columns). We demonstrate our approach's strong zero-shot and few-shot capabilities through extensive experiments on a range of reasoning tasks.	Translated: 2023-05-30 16:38:48 Published: 2023-05-28
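The tabular format can be sketched as a prompt that ends with a table header instead of a free-text trigger phrase. The model then continues by writing one row per reasoning step. The column names, the example question, and the model completion below are illustrative assumptions, not the paper's exact prompt or outputs.

```python
# Sketch of a zero-shot tabular chain-of-thought prompt plus a parser for the
# model's reply. Column names and the example completion are assumptions.
from typing import List, Sequence

COLUMNS = ("step", "subquestion", "process", "result")


def build_tab_cot_prompt(question: str, columns: Sequence[str] = COLUMNS) -> str:
    # The prompt ends with a Markdown-style table header; the language model is
    # expected to continue the table, one row per reasoning step.
    header = "|" + "|".join(columns) + "|"
    return f"{question}\n{header}\n"


def parse_rows(completion: str) -> List[List[str]]:
    # Keep only lines that look like table rows and split them into cells.
    rows = []
    for line in completion.strip().splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    return rows


prompt = build_tab_cot_prompt("A shop sells 3 pens for $2. How much do 12 pens cost?")
completion = "|1|How many groups of 3 pens?|12 / 3|4|\n|2|Total cost?|4 * 2|8|"
rows = parse_rows(completion)
print(rows[-1][-1])  # 8
```

Reading the last cell of the last row gives a candidate answer. This reflects the row-by-column structure the abstract describes: steps run down the rows, and each step's parts run across the columns.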
# On Entanglement Measures: Discrete Phase Space and Inverter-Chain Link Viewpoint ( http://arxiv.org/abs/2305.17806v1 ) License: see link	Felix A. Buot	In contrast to abstract statistical analyses in the literature, we present a concrete physical diagrammatic model of entanglement characterization and measure, together with its underlying discrete phase-space physics. This paper serves as a pedagogical treatment of the complex subject of entanglement measures. We review the important inherent concurrence property of entangled qubits and underscore its emergent qubit behavior. From the discrete phase-space point of view, concurrence translates to translation symmetry of entangled binary systems in some quantitative measure of entanglement. Although the focus is on bipartite systems, the notion is readily extendable to multipartite systems of qubits, as can easily be deduced from the physical inverter-chain link model. A diagrammatic analysis of the entanglement of formation for any multipartite qubit system is given.	Translated: 2023-05-30 16:38:36 Published: 2023-05-28
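For comparison with the diagrammatic view in this entry, the sketch below computes the standard Hilbert-space quantities the abstract refers to. It handles a pure two-qubit state a|00> + b|01> + c|10> + d|11>, where the concurrence is C = 2|ad - bc|. The entanglement of formation then follows from Wootters' formula via the binary entropy. The sketch covers pure states only; mixed states require Wootters' general spin-flip construction, which is not implemented here.

```python
# Concurrence and entanglement of formation for a *pure* two-qubit state
# a|00> + b|01> + c|10> + d|11> (standard formulas; mixed states not handled).
import math


def concurrence(a: complex, b: complex, c: complex, d: complex) -> float:
    # Normalise the amplitudes, then apply the pure-state formula C = 2|ad - bc|.
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2)
    a, b, c, d = (x / norm for x in (a, b, c, d))
    return 2 * abs(a * d - b * c)


def binary_entropy(p: float) -> float:
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entanglement_of_formation(conc: float) -> float:
    # E_F = h((1 + sqrt(1 - C^2)) / 2); max() guards against tiny negative rounding.
    return binary_entropy((1 + math.sqrt(max(0.0, 1 - conc * conc))) / 2)


s = 1 / math.sqrt(2)
bell = concurrence(s, 0, 0, s)        # (|00> + |11>) / sqrt(2), maximally entangled
product = concurrence(1, 0, 0, 0)     # |00>, a product state
print(bell, entanglement_of_formation(bell))        # both approximately 1.0
print(product, entanglement_of_formation(product))  # 0.0 0.0
```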
# The Computational Complexity of Single-Player Imperfect-Recall Games ( http://arxiv.org/abs/2305.17805v1 ) License: see link	Emanuel Tewolde, Caspar Oesterheld, Vincent Conitzer, Paul W. Goldberg	We study single-player extensive-form games with imperfect recall, such as the Sleeping Beauty problem or the Absentminded Driver game. For such games, two natural equilibrium concepts have been proposed as alternative solution concepts to ex-ante optimality. One equilibrium concept uses generalized double halving (GDH) as a belief system and evidential decision theory (EDT), and another uses generalized thirding (GT) as a belief system and causal decision theory (CDT). Our findings relate these three solution concepts of a game to solution concepts of a polynomial maximization problem: global optima, optimal points with respect to subsets of variables, and Karush-Kuhn-Tucker (KKT) points. Based on these correspondences, we are able to settle various complexity-theoretic questions on the computation of such strategies. For ex-ante optimality and (EDT,GDH)-equilibria, we obtain NP-hardness and inapproximability, and for (CDT,GT)-equilibria we obtain CLS-completeness results.	Translated: 2023-05-30 16:38:25 Published: 2023-05-28
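The link between ex-ante optimality and polynomial maximization can be seen in the classic Absentminded Driver game. The payoffs below come from the standard formulation by Piccione and Rubinstein, not from this paper: exiting at the first intersection pays 0, exiting at the second pays 4, and continuing past both pays 1. The driver cannot tell the intersections apart, so a single probability p of continuing is used at both. The ex-ante expected utility is therefore the polynomial 4p - 3p^2, which is maximized at p = 2/3.

```python
# Ex-ante optimality in the classic Absentminded Driver game (standard payoffs,
# used here only as an illustration): expected utility is a polynomial in the
# single behavioural-strategy parameter p = P(continue).
from fractions import Fraction


def expected_utility(p: Fraction) -> Fraction:
    exit_first = (1 - p) * 0          # exit at the first intersection
    exit_second = p * (1 - p) * 4     # continue once, then exit
    continue_both = p * p * 1         # continue at both intersections
    return exit_first + exit_second + continue_both  # equals 4p - 3p^2


# Exact maximisation over a rational grid; calculus gives p* = 2/3 and EU* = 4/3.
grid = [Fraction(i, 300) for i in range(301)]
best_p = max(grid, key=expected_utility)
print(best_p, expected_utility(best_p))  # 2/3 4/3
```

With one information set the polynomial has a single variable, so brute force suffices. The hardness results in the abstract concern games whose strategy polynomials have many variables.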
# Targeted Data Generation: Finding and Fixing Model Weaknesses ( http://arxiv.org/abs/2305.17804v1 ) License: see link	Zexue He, Marco Tulio Ribeiro, Fereshte Khani	Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data, resulting in unfair outcomes and eroding user trust. Additional data collection may not help in addressing these weaknesses, as such challenging subgroups may be unknown to users and underrepresented in the existing and new data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups and generates new data for those subgroups using large language models (LLMs) with a human in the loop. TDG estimates the expected benefit and potential harm of data augmentation for each subgroup, and selects the ones most likely to improve within-group performance without hurting overall performance. In our experiments, TDG significantly improves the accuracy on challenging subgroups for state-of-the-art sentiment analysis and natural language inference models, while also improving overall test accuracy.	Translated: 2023-05-30 16:38:02 Published: 2023-05-28
# Universal shortcuts to adiabaticity of finite-time and weak processes ( http://arxiv.org/abs/2305.17802v1 ) License: see link	Pierre Nazé	The analytical expression for shortcuts to adiabaticity for any switching time of finite-time and weak processes is presented. It is based on the universal solution for the optimal protocols of weak processes, where the extension to adiabatic processes was made by means of the concept of waiting time. Two examples are solved in order to find such shortcuts: the typical case of an oscillatory relaxation function and the transverse-field quantum Ising chain. Finally, the limitations of the applicability of these shortcuts in quantum annealing are discussed.	Translated: 2023-05-30 16:37:45 Published: 2023-05-28
# T2FNorm: Extremely Simple Scaled Train-time Feature Normalization for OOD Detection ( http://arxiv.org/abs/2305.17797v1 ) License: see link	Sudarshan Regmi, Bibek Panthi, Sakar Dotel, Prashnna K. Gyawali, Danail Stoynov, Binod Bhattarai	Neural networks are notorious for being overconfident predictors, posing a significant challenge to their safe deployment in real-world applications. While feature normalization has garnered considerable attention within the deep learning literature, current train-time regularization methods for out-of-distribution (OOD) detection have yet to fully exploit this potential. Indeed, the naive incorporation of feature normalization within neural networks does not guarantee an improvement in OOD detection performance. In this work, we introduce T2FNorm, a novel approach to training neural networks that transforms features to hyperspherical space through normalization, while employing the non-transformed space for OOD-scoring purposes. This method yields a surprising enhancement in OOD detection capabilities without compromising model accuracy on in-distribution (ID) data. Our investigation demonstrates that the proposed technique substantially diminishes the norm of the features of all samples, more so for out-of-distribution samples, thereby addressing the prevalent concern of overconfidence in neural networks. The proposed method also significantly improves various post-hoc OOD detection methods.	Translated: 2023-05-30 16:37:37 Published: 2023-05-28
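The abstract describes two paths: features are projected onto a hypersphere during training, while the raw, non-transformed features are used for OOD scoring. The sketch below illustrates that split. It assumes the training-time transform is h / (tau * ||h||) with a hypothetical tau = 0.1, and it uses a max-logit OOD score. Both choices are illustrative assumptions, not the paper's confirmed settings; the abstract says the method is combined with various post-hoc detectors.

```python
# Hedged sketch of a train-time normalization / raw-feature scoring split in the
# spirit of T2FNorm. tau and the max-logit score are illustrative assumptions.
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def scaled_unit_features(h: Sequence[float], tau: float) -> List[float]:
    # Project h onto the unit hypersphere, then divide by the scale tau.
    norm = math.sqrt(sum(x * x for x in h)) or 1.0  # guard against a zero vector
    return [x / (tau * norm) for x in h]


def linear(weights: Matrix, h: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, h)) for row in weights]


def train_logits(weights: Matrix, h: Sequence[float], tau: float = 0.1) -> List[float]:
    # Training-time path: the classifier sees normalized, scaled features.
    return linear(weights, scaled_unit_features(h, tau))


def ood_score(weights: Matrix, h: Sequence[float]) -> float:
    # Scoring-time path: max logit on the raw feature (higher = more in-distribution).
    return max(linear(weights, h))


W = [[1.0, 0.0], [0.0, 1.0]]
h = [3.0, 4.0]
print(train_logits(W, h))  # approximately [6.0, 8.0]
print(ood_score(W, h))     # 4.0
```

Because the training loss only ever sees unit-norm features, the raw feature norm is left free to differ between ID and OOD inputs. The abstract reports that this norm shrinks more for OOD samples, and a score computed on raw features can exploit that gap.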
# Kohn-Sham computation and the bivariate view of density functional theory ( http://arxiv.org/abs/2305.17795v1 ) License: see link	Paul E. Lammert	Informed by an abstraction of Kohn-Sham computation called a KS machine, a functional-analytic perspective is developed on mathematical aspects of density functional theory. A natural semantics for the machine is bivariate, consisting of a sequence of potentials paired with a ground density. Although the question of when the KS machine can converge to a solution (where the potential component matches a designated target) is not resolved here, a number of related ones are. For instance: Can the machine progress toward a solution? Barring presumably exceptional circumstances, yes, in an energetic sense, but using a potential-mixing scheme rather than the usual density-mixing variety. Are energetic and function-space distance notions of proximity-to-solution commensurate? Yes, to a significant degree. If the potential components of a sequence of ground pairs converge to a target density, do the density components cluster on ground densities thereof? Yes, barring particle number drifting to infinity.	Translated: 2023-05-30 16:37:18 Published: 2023-05-28
# LowDINO -- A Low Parameter Self Supervised Learning Model ( http://arxiv.org/abs/2305.17791v1 ) License: see link	Sai Krishna Prathapaneni, Shvejan Shashank and Srikar Reddy K	This research aims to explore the possibility of designing a neural network architecture that allows small networks to adopt the properties of huge networks, which have shown success in self-supervised learning (SSL), for downstream tasks such as image classification and segmentation. Previous studies have shown that using convolutional neural networks (ConvNets) can provide an inherent inductive bias, which is crucial for learning representations in deep learning models. To reduce the number of parameters, attention mechanisms are utilized through MobileViT blocks, resulting in a model with fewer than 5 million parameters. The model is trained using self-distillation with a momentum encoder, and a student-teacher architecture is also employed, in which the teacher weights use vision transformers (ViTs) from recent SOTA SSL models. The model is trained on the ImageNet1k dataset. This research provides an approach for designing smaller, more efficient neural network architectures that can perform SSL tasks comparably to heavy models.	Translated: 2023-05-30 16:37:02 Published: 2023-05-28
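In DINO-style self-distillation, the momentum encoder is a teacher whose weights track an exponential moving average (EMA) of the student's weights. The sketch below shows that generic update on plain lists of floats; the momentum value is an assumption. The abstract does not say how LowDINO combines the momentum encoder with its pretrained ViT teacher weights, so this shows only the standard mechanism.

```python
# Generic momentum-encoder (EMA) teacher update from DINO-style self-distillation:
# after each student optimisation step, teacher <- m * teacher + (1 - m) * student.
# Parameters are plain lists of floats here; the momentum value is an assumption.
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, momentum: float = 0.996) -> Params:
    return {
        name: [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


teacher = {"w": [1.0, 0.0]}
student = {"w": [0.0, 1.0]}
teacher = ema_update(teacher, student, momentum=0.9)
print(teacher)  # approximately {'w': [0.9, 0.1]}
```

A momentum close to 1 makes the teacher a slowly moving average of past students. Such a teacher gives more stable targets than copying the student directly.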
# リアルタイムオブジェクト検出:PyTorchにおけるYOLOv1再実装 Real-time Object Detection: YOLOv1 Re-Implementation in PyTorch ( http://arxiv.org/abs/2305.17786v1 ) ライセンス: Link先を確認	Michael Shenoda	(参考訳) リアルタイム物体検出は、検出結果に基づいて適切な判断をタイムリーに行う必要があるコンピュータビジョンシステムにおいて、解決すべき重要な問題である。私はPyTorchフレームワークを使ってYOLO v1アーキテクチャを実装することにした。物体検出パイプライン全体に習熟することを目的として、結果を改善するために元のアーキテクチャを修正するさまざまな手法を試した。最後に、私の実装の評価指標をオリジナルのものと比較する。 Real-time object detection is a crucial problem to solve when it comes to computer vision systems that need to make appropriate decisions based on detections in a timely manner. I have chosen to implement the YOLO v1 architecture using the PyTorch framework. With the goal of becoming familiar with the entire object detection pipeline, I attempted different techniques to modify the original architecture to improve the results. Finally, I compare the metrics of my implementation to the original.	翻訳日:2023-05-30 16:36:44 公開日:2023-05-28
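As a reference point for the re-implementation above, the sketch below shows two pieces that every YOLO v1 pipeline needs. The first is the size of the output tensor: an S × S grid in which each cell predicts B boxes with five values each, plus C class scores. The second is intersection over union (IoU). YOLO v1 uses IoU in its confidence target (Pr(Object) × IoU), and mAP evaluation relies on it. The defaults S = 7, B = 2, C = 20 are the Pascal VOC settings of the original YOLO paper. The helper names are hypothetical and are not taken from the author's code.

```python
def yolo_v1_output_size(S=7, B=2, C=20):
    """Length of the YOLO v1 output: an S x S grid, each cell predicting
    B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return S * S * (B * 5 + C)


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle, clamped to zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

With the defaults, the network emits 7 × 7 × 30 = 1470 values per image.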
# YOLOv5に基づく照明・回転不変のリアルタイム車輪検出器 Lighting and Rotation Invariant Real-time Vehicle Wheel Detector based on YOLOv5 ( http://arxiv.org/abs/2305.17785v1 ) ライセンス: Link先を確認	Michael Shenoda	(参考訳) コンピュータビジョンにおいて物体検出器を作成する際、最初に畳み込みニューラルネットワーク(CNN)アーキテクチャに基づいて開発する場合には、いくつかの共通の課題がある。これらの課題は、さまざまなカメラの向き、照明条件、環境変化のもとで撮影された画像に適応する必要があるモデルを作成する場合により顕著になる。これらの条件をすべて網羅する初期トレーニングサンプルを用意することは、時間とコストの負担を伴う大きな課題となりうる。この問題はどの種類の物体検出器を作成する場合にも存在しうるが、一部の物体の種類はあまり一般的ではなく、公開されたラベル付き画像データセットが存在しない。まれな物体の種類については、公開データセットが信頼性や網羅性に欠ける場合もある。車両の車輪は、YOLOv5アーキテクチャに基づく照明・回転不変のリアルタイム検出器を作成するアプローチを示すために選ばれた例の一つである。目的は、他の種類のリアルタイム物体検出器を開発する際の参考として使用できるシンプルなアプローチを提供することである。 Creating an object detector in computer vision presents some common challenges when it is initially developed based on a Convolutional Neural Network (CNN) architecture. These challenges are more apparent when creating a model that needs to adapt to images captured under various camera orientations, lighting conditions, and environmental changes. Obtaining initial training samples that cover all these conditions can be an enormous challenge, with a time and cost burden. While the problem can arise when creating any type of object detector, some object types are less common and have no publicly available pre-labeled image datasets. Sometimes public datasets are neither reliable nor comprehensive for a rare object type. The vehicle wheel is one such example, chosen to demonstrate the approach of creating a lighting- and rotation-invariant real-time detector based on the YOLOv5 architecture. The objective is to provide a simple approach that could be used as a reference for developing other types of real-time object detectors.	翻訳日:2023-05-30 16:36:37 公開日:2023-05-28
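One common way to pursue rotation invariance when labeled data is scarce is rotation augmentation. This technique has to transform the YOLO-format labels along with the image. The abstract does not say whether the author uses it, so the sketch below is only an illustration. It assumes YOLO's normalized (x_center, y_center, width, height) labels, with the origin at the top-left corner and y pointing down. Both function names are hypothetical.

```python
def rotate_grid_cw(grid):
    """Rotate an image stored as a list of pixel rows by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_yolo_box_cw(xc, yc, w, h):
    """Rotate a normalized YOLO box together with its image by 90 degrees clockwise.

    Inputs and outputs are (x_center, y_center, width, height) in [0, 1], origin
    at the top-left corner, y pointing down. Width and height swap because the
    image's width and height swap.
    """
    return (1.0 - yc, xc, h, w)
```

Applying `rotate_yolo_box_cw` four times returns the original box, which is a cheap sanity check when you add such an augmentation to a training pipeline.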
# 単一物体ポーズ追跡のための反仮説粒子フィルタ Counter-Hypothetical Particle Filters for Single Object Pose Tracking ( http://arxiv.org/abs/2305.17828v1 ) ライセンス: Link先を確認	Elizabeth A. Olson, Jana Pavlasek, Jasmine A. Berry, Odest Chadwicke Jenkins	(参考訳) 粒子フィルタリングは、物体のポーズに関する信念を扱いやすく表現できることから、6自由度(6D)のポーズ推定に広く用いられる手法である。しかし、6Dポーズは高次元であるため、粒子フィルタは粒子枯渇に陥りやすい。粒子枯渇が発生すると、重要度サンプリングの過程で信念分布のモード崩壊を引き起こす可能性がある。真の状態の周囲の領域がモード崩壊に見舞われると、その領域はもはや粒子が形成する確率質量では表現されないため、その信念を回復することは困難である。従来の手法は、信念分布の中で粒子をランダム化・再設定することでこの問題を緩和するが、再活性化の頻度の決定は手作業で調整された抽象的なヒューリスティックに依存してきた。本稿では、標準の尤度と並行して用いる反仮説(Counter-Hypothetical)尤度関数を導入することで、各時間ステップで必要な再活性化率を推定する。証拠推論(Evidential Reasoning)における妥当性と非妥当性の概念に着想を得た反仮説尤度関数は、各粒子に疑念の度合いを割り当てる。粒子集合全体にわたる確信と疑念の競合する累積値を用いてフィルタ内の失敗の度合いを推定し、再活性化する粒子の割合を決定する。剛体物体の6Dポーズ追跡タスクにおいて本手法の有効性を実証する。 Particle filtering is a common technique for six-degree-of-freedom (6D) pose estimation due to its ability to tractably represent belief over object pose. However, the particle filter is prone to particle deprivation due to the high-dimensional nature of 6D pose. When particle deprivation occurs, it can cause mode collapse of the underlying belief distribution during importance sampling. If the region surrounding the true state suffers from mode collapse, recovering its belief is challenging since the area is no longer represented in the probability mass formed by the particles. Previous methods mitigate this problem by randomizing and resetting particles in the belief distribution, but determining the frequency of reinvigoration has relied on hand-tuning abstract heuristics. In this paper, we estimate the necessary reinvigoration rate at each time step by introducing a Counter-Hypothetical likelihood function, which is used alongside the standard likelihood. Inspired by the notions of plausibility and implausibility from Evidential Reasoning, the addition of our Counter-Hypothetical likelihood function assigns a level of doubt to each particle. The competing cumulative values of confidence and doubt across the particle set are used to estimate the level of failure within the filter, in order to determine the portion of particles to be reinvigorated. We demonstrate the effectiveness of our method on the rigid-body object 6D pose tracking task.	翻訳日:2023-05-30 16:29:12 公開日:2023-05-28
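The resample-then-reinvigorate step described above can be sketched generically. The paper estimates the reinvigoration fraction from the confidence and doubt accumulated under its Counter-Hypothetical likelihood. That formula is not given in the abstract, so this sketch takes the fraction as an input (`reinvig_fraction`, a hypothetical parameter) and uses standard systematic resampling. `sample_random_particle` stands in for whatever pose prior a real tracker would draw from.

```python
import random


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    step = total / n
    u = rng.uniform(0.0, step)
    indices, i, cumulative = [], 0, weights[0]
    for k in range(n):
        target = u + k * step
        # Advance to the first particle whose cumulative weight exceeds the target.
        while cumulative <= target and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def resample_with_reinvigoration(particles, weights, reinvig_fraction,
                                 sample_random_particle, rng):
    """Resample by weight, then overwrite a fraction of particles with fresh random draws."""
    n = len(particles)
    resampled = [particles[i] for i in systematic_resample(weights, rng)]
    n_new = int(round(reinvig_fraction * n))
    for j in rng.sample(range(n), n_new):
        resampled[j] = sample_random_particle(rng)
    return resampled
```

The fresh draws give the filter a chance to put probability mass back near the true state after mode collapse. The open question the paper addresses is how large that fraction should be at each step.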
# NOTABLE: プロンプトベースNLPモデルに対するトランスファー可能なバックドア攻撃 NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models ( http://arxiv.org/abs/2305.17826v1 ) ライセンス: Link先を確認	Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma	(参考訳) プロンプトベースの学習はバックドア攻撃に対して脆弱である。プロンプトベースのモデルに対する既存のバックドア攻撃は、埋め込み層全体や単語埋め込みベクトルにバックドアを注入することを想定している。このような攻撃は、下流タスクでの再学習や異なるプロンプト戦略によって容易に影響を受け、バックドア攻撃の転移可能性が制限される。本研究では、下流タスクやプロンプト戦略から独立した、プロンプトベースモデルに対する転移可能なバックドア攻撃NOTABLEを提案する。具体的には、NOTABLEは適応型バーバライザを用いてトリガーを特定の単語(すなわちアンカー)に結び付けることで、PLMのエンコーダにバックドアを注入する。トリガーを付与した入力によって攻撃者が意図したアンカーへ誘導することでバックドアを起動し、下流タスクやプロンプト戦略からの独立性を実現する。6つのNLPタスク、3つの一般的なモデル、3つのプロンプト戦略で実験を行った。実験の結果、NOTABLEは優れた攻撃性能(すなわち、すべてのデータセットで90%以上の攻撃成功率)を達成し、2つの最先端ベースラインを上回った。3つの防御手法による評価は、NOTABLEの頑健性を示している。コードはhttps://github.com/RU-System-Software-and-Security/Notableで公開されている。 Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks. In this work, we propose a transferable backdoor attack against prompt-based models, called NOTABLE, which is independent of downstream tasks and prompting strategies. Specifically, NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words (i.e., anchors). It activates the backdoor by pasting triggers into inputs to reach adversary-desired anchors, achieving independence from downstream tasks and prompting strategies. We conduct experiments on six NLP tasks, three popular models, and three prompting strategies. Empirical results show that NOTABLE achieves superior attack performance (i.e., attack success rate over 90% on all the datasets), and outperforms two state-of-the-art baselines. Evaluations on three defenses show the robustness of NOTABLE. Our code can be found at https://github.com/RU-System-Software-and-Security/Notable.	翻訳日:2023-05-30 16:28:50 公開日:2023-05-28
# 多項ロジスティック回帰: 高次元におけるヌル共変量に対する漸近正規性 Multinomial Logistic Regression: Asymptotic Normality on Null Covariates in High-Dimensions ( http://arxiv.org/abs/2305.17825v1 ) ライセンス: Link先を確認	Kai Tan and Pierre C. Bellec	(参考訳) 本稿では、次元と標本サイズが同程度のオーダーである高次元の状況において、多項ロジスティックモデルにおける最尤推定量(MLE)の漸近分布を調べる。古典的な大標本理論はある条件下でMLEの漸近正規性を与えるが、そのような古典的結果は、SurとCandès [2019]の先駆的研究で二値ロジスティックの場合について示されたように、高次元では成り立たないと予想される。本稿では、3つ以上のクラスを持つ分類問題において、ヌル共変量に対する多項ロジスティックMLE(交差エントロピー最小化器とも呼ばれる)の漸近正規性および漸近カイ二乗の結果を導くことで、この問題に取り組む。本理論は、与えられた特徴の有意性を検定する新しい方法論につながる。合成データを用いた広範なシミュレーション研究により、これらの漸近的結果が裏付けられ、与えられた特徴の有意性を検定するために提案したp値の妥当性が確認された。 This paper investigates the asymptotic distribution of the maximum-likelihood estimate (MLE) in multinomial logistic models in the high-dimensional regime where dimension and sample size are of the same order. While classical large-sample theory provides asymptotic normality of the MLE under certain conditions, such classical results are expected to fail in high dimensions as documented for the binary logistic case in the seminal work of Sur and Candès [2019]. We address this issue in classification problems with 3 or more classes, by developing asymptotic normality and asymptotic chi-square results for the multinomial logistic MLE (also known as cross-entropy minimizer) on null covariates. Our theory leads to a new methodology to test the significance of a given feature. Extensive simulation studies on synthetic data corroborate these asymptotic results and confirm the validity of proposed p-values for testing the significance of a given feature.	翻訳日:2023-05-30 16:28:28 公開日:2023-05-28
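The abstract notes that the multinomial logistic MLE is also the cross-entropy minimizer. This holds because the model's average negative log-likelihood is exactly the cross-entropy between one-hot labels and softmax probabilities. A minimal sketch follows. It assumes no intercept and one coefficient row per class. That parameterization is identified only up to a common shift of all rows, so analyses usually fix a reference class or add a constraint, and the abstract does not say which convention the paper uses.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def multinomial_nll(X, y, W):
    """Average negative log-likelihood of a multinomial logistic model.

    X: list of feature vectors (length p); y: labels in {0, ..., K-1};
    W: K x p coefficient matrix (list of K rows). Minimizing this over W gives
    the MLE, which is the same as minimizing the cross-entropy loss.
    """
    total = 0.0
    for x, label in zip(X, y):
        logits = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
        total -= log_softmax(logits)[label]
    return total / len(X)
```

With all coefficients at zero, every class gets probability 1/K, so the loss equals log K. This gives a quick check that the implementation is normalized correctly.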
# エッジ検出器のROC解析 Analysis of ROC for Edge Detectors ( http://arxiv.org/abs/2305.17820v1 ) ライセンス: Link先を確認	Kai Yi Ji	(参考訳) 本稿では,BIPEDデータセットを用いた受信者動作特性(ROC)解析によるエッジ検出器の評価を行う。本研究は,この手法をMatlabに適用する際の利点と欠点について検討する。ROC解析は特定のエッジフィルタには適しているが,ラプラシアン,ガウシアンのラプラシアン,Cannyなどのフィルタでは,ROC指標を用いてその性能を正確に測定することが難しいことを確認した。この問題に対処するため,これらのフィルタの性能を高め,より正確な評価を可能にするカスタマイズ手法を導入する。このカスタマイズにより改善された結果が得られ,最終的にエッジ検出器の包括的な評価が可能になった。 This paper presents an evaluation of edge detectors using receiver operating characteristic (ROC) analysis on the BIPED dataset. Our study examines the benefits and drawbacks of applying this technique in Matlab. We observed that while ROC analysis is suitable for certain edge filters, for filters such as Laplacian, Laplacian of Gaussian, and Canny, it presents challenges when accurately measuring their performance using ROC metrics. To address this issue, we introduce customization techniques to enhance the performance of these filters, enabling more accurate evaluation. Through our customization efforts, we achieved improved results, ultimately facilitating a comprehensive assessment of the edge detectors.	翻訳日:2023-05-30 16:28:09 公開日:2023-05-28
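For reference, an ROC curve for an edge detector can be traced by thresholding an edge-strength map and comparing it pixel by pixel with the ground truth. This is a minimal sketch, not the paper's Matlab procedure. It uses strict pixel-wise matching, whereas edge benchmarks often tolerate small localization offsets, and it assumes flattened lists of scores and 0/1 labels. For a detector whose output is already binary, each parameter setting yields a single point rather than a curve.

```python
def roc_points(scores, labels, thresholds):
    """(FPR, TPR) pairs for a thresholded edge-strength map.

    scores: per-pixel edge strengths; labels: 1 for ground-truth edge pixels,
    0 otherwise. A pixel is predicted as an edge when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both edge and non-edge pixels")
    points = []
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under an ROC curve given as (FPR, TPR) points, via the trapezoid rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```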
# 大規模言語モデル, 科学的知識, 事実性: 抗生物質発見の体系的分析 Large Language Models, scientific knowledge and factuality: A systematic analysis in antibiotic discovery ( http://arxiv.org/abs/2305.17819v1 ) ライセンス: Link先を確認	Magdalena Wysocka, Oskar Wysocki, Maxime Delmas, Vincent Mutel, Andre Freitas	(参考訳) 科学文献の大規模コーパスで学習された大規模言語モデル(LLM)から情報を推論・抽出することは、生物医学研究の新しい時代を切り開き、既存の医学的エビデンスへのアクセスの障壁を下げる可能性がある。本研究では、抗生物質発見の文脈を動機付けの例として用い、生物医学的背景知識との対話におけるLLMの可能性を検討する。天然物からの生物医学的発見の文脈では、生物、それに関連する化学物質、およびその抗生物質としての性質の間の関係的な証拠を理解する必要がある。我々は、これらの関係を符号化し表現するLLMの能力を体系的に評価し、生成された応答の流暢性、プロンプトとの整合性、意味的一貫性、事実的知識、特異性を検証する。化学化合物の定義生成と化学化合物-真菌の関係決定という2つのプロンプトベースのタスクにおいて、9つの最先端モデル(ChatGPTとGPT-4を含む)に体系的分析を適用した。その結果、最近のモデルは流暢性が向上しているものの、事実の正確性は依然として低く、モデルは過剰に出現するエンティティに偏っていることがわかった。LLMが生物医学的知識ベースとして機能する能力には疑問が呈され、追加の体系的評価フレームワークの必要性が浮き彫りになった。最も性能の高いGPT-4は化合物の70%について事実に即した定義を、真菌との関係の43.6%について事実に即した関係を生成したのに対し、最良のオープンソースモデルであるBioGPT-largeは、最も性能の高いプロンプトで化合物の30%、関係の30%にとどまった。結果は、LLMは現時点では生物医学的な事実知識ベースとしての利用には適していないものの、モデルがドメインに特化し、規模や人間のフィードバックの水準が高まるにつれて、事実性に向けた有望な創発的性質があることを示している。 Inferring over and extracting information from Large Language Models (LLMs) trained on a large corpus of scientific literature can potentially drive a new era in biomedical research, reducing the barriers to accessing existing medical evidence. This work examines the potential of LLMs for dialoguing with biomedical background knowledge, using the context of antibiotic discovery as an exemplar motivational scenario. The context of biomedical discovery from natural products entails understanding the relational evidence between an organism, an associated chemical and its associated antibiotic properties. We provide a systematic assessment of the ability of LLMs to encode and express these relations, verifying for fluency, prompt-alignment, semantic coherence, factual knowledge and specificity of generated responses. The systematic analysis is applied to nine state-of-the-art models (including ChatGPT and GPT-4) in two prompting-based tasks: chemical compound definition generation and chemical compound-fungus relation determination. Results show that while recent models have improved in fluency, factual accuracy is still low and models are biased towards over-represented entities. The ability of LLMs to serve as biomedical knowledge bases is questioned, and the need for additional systematic evaluation frameworks is highlighted. The best-performing GPT-4 produced a factual definition for 70% of chemical compounds and factual relations to fungi for 43.6%, whereas the best open-source model, BioGPT-large, did so for 30% of the compounds and 30% of the relations with the best-performing prompt. The results show that while LLMs are currently not fit for purpose to be used as biomedical factual knowledge bases, there is a promising emerging property in the direction of factuality as the models become domain-specialised, scale up in size, and receive more human feedback.	翻訳日:2023-05-30 16:27:50 公開日:2023-05-28
# 限られた学習データを用いた停電検出タスクのための転移学習 Transfer Learning for Power Outage Detection Task with Limited Training Data ( http://arxiv.org/abs/2305.17817v1 ) ライセンス: Link先を確認	Olukunle Owolabi	(参考訳) 停電の早期検出は、信頼性の高い配電システムを維持するうえで不可欠である。本研究では、ラベル付きデータが限られた状況での停電検出における転移学習と言語モデルの利用を検討する。事前学習と転移学習を活用することで、モデルは未知のクラスに汎化できる。停電に関連するソーシャルメディアのツイートを集めたバランスの取れたデータセットを用いて、ゼロショット学習と少数ショット学習の実験を行った。我々の仮説は、限られたデータで事前学習された言語モデルが、停電検出タスクにおいてベースラインモデルより高い性能を達成できるというものである。結果は、古典的なモデルがゼロショットの言語モデルを上回る一方で、少数ショットの微調整によって言語モデルの性能が大幅に向上することを示している。例えば、10%の微調整で、BERTは精度81.3%(+15.3%)、GPTは精度74.5%(+8.5%)を達成する。これは、利用できるデータが限られた状況で停電を分析し、その発生箇所を特定するうえで実用的な意義を持つ。本評価は、停電検出における言語モデルの少数ショット微調整の可能性についての知見を与え、その強みと限界を明らかにする。本研究は、重要インフラの管理に高度な自然言語処理技術を活用するための知識基盤に貢献する。 Early detection of power outages is crucial for maintaining a reliable power distribution system. This research investigates the use of transfer learning and language models in detecting outages with limited labeled data. By leveraging pretraining and transfer learning, models can generalize to unseen classes. Using a curated balanced dataset of social media tweets related to power outages, we conducted experiments using zero-shot and few-shot learning. Our hypothesis is that Language Models pretrained with limited data could achieve high performance in outage detection tasks over baseline models. Results show that while classical models outperform zero-shot Language Models, few-shot fine-tuning significantly improves their performance. For example, with 10% fine-tuning, BERT achieves 81.3% accuracy (+15.3%), and GPT achieves 74.5% accuracy (+8.5%). This has practical implications for analyzing and localizing outages in scenarios with limited data availability. Our evaluation provides insights into the potential of few-shot fine-tuning with Language Models for power outage detection, highlighting their strengths and limitations. This research contributes to the knowledge base of leveraging advanced natural language processing techniques for managing critical infrastructure.	翻訳日:2023-05-30 16:26:49 公開日:2023-05-28
# チェビシェフ利得プロファイルと高い飽和電力を有するジョセフソンパラメトリック増幅器 Josephson parametric amplifier with Chebyshev gain profile and high saturation ( http://arxiv.org/abs/2305.17816v1 ) ライセンス: Link先を確認	Ryan Kaufman, Theodore White, Mark I. Dykman, Andrea Iorio, George Stirling, Sabrina Hong, Alex Opremcak, Andreas Bengtsson, Lara Faoro, Joseph C. Bardin, Tim Burger, Robert Gasca, Ofer Naaman	(参考訳) 3次チェビシェフ原型に基づく帯域通過インピーダンス整合回路を備えたジョセフソンパラメトリック増幅器の設計を実証する。4.6 GHzで動作する8台の増幅器を測定し、利得リップル1 dB未満、最大500 MHzの帯域幅で20 dBの利得を示した。さらに、非線形素子としてrf-SQUIDアレイを用いることで、-73 dBm程度の高い出力飽和電力を実現した。Sycamoreプロセッサを用いてシステムの読み出し効率と飽和付近の信号対雑音比を評価し、データが増幅器のほぼ量子限界の雑音性能と整合することを確認した。加えて、2トーン実験において入力電力とトーン間デチューニングの関数として増幅器の相互変調歪みを測定し、小さなデチューニングで過剰な歪みを観測した。この歪みは信号電力の関数として顕著なディップを示し、これを電力依存の誘電損失の観点から解釈する。 We demonstrate a Josephson parametric amplifier design with a band-pass impedance matching network based on a third-order Chebyshev prototype. We measured eight amplifiers operating at 4.6 GHz that exhibit gains of 20 dB with less than 1 dB gain ripple and up to 500 MHz bandwidth. The amplifiers further achieve high output saturation powers around -73 dBm based on the use of rf-SQUID arrays as their nonlinear element. We characterize the system readout efficiency and its signal-to-noise ratio near saturation using a Sycamore processor, finding the data consistent with near quantum limited noise performance of the amplifiers. In addition, we measure the amplifiers' intermodulation distortion in two-tone experiments as a function of input power and inter-tone detuning, and observe excess distortion at small detuning with a pronounced dip as a function of signal power, which we interpret in terms of power-dependent dielectric losses.	翻訳日:2023-05-30 16:26:20 公開日:2023-05-28
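The third-order Chebyshev prototype mentioned above is a normalized low-pass ladder whose element values follow textbook closed-form formulas, as tabulated in standard microwave-filter references. The sketch computes those g-values. The 0.5 dB passband ripple is an assumed example, not the paper's design value. The sketch also omits the band-pass transformation and impedance scaling that turn the prototype into the amplifier's matching network.

```python
import math


def chebyshev_prototype(n, ripple_db):
    """Element values g_0 .. g_{n+1} of an n-th order Chebyshev low-pass prototype
    (g_0 = 1, cutoff 1 rad/s), from the standard closed-form design formulas."""
    # beta = ln(coth(L_Ar / 17.37)); 17.37 is 40 / ln(10).
    beta = math.log(1.0 / math.tanh(ripple_db / (40.0 / math.log(10.0))))
    gamma = math.sinh(beta / (2 * n))
    a = [math.sin((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]
    b = [gamma ** 2 + math.sin(k * math.pi / n) ** 2 for k in range(1, n + 1)]
    g = [1.0, 2 * a[0] / gamma]
    for k in range(2, n + 1):
        g.append(4 * a[k - 2] * a[k - 1] / (b[k - 2] * g[k - 1]))
    # Load element: 1 for odd n, coth^2(beta / 4) for even n.
    g.append(1.0 if n % 2 == 1 else 1.0 / math.tanh(beta / 4) ** 2)
    return g


g = chebyshev_prototype(3, 0.5)
```

For odd order the prototype is symmetric (g_1 = g_3) and terminates in a unit load. For even order the load differs from the source, which is one practical reason odd orders are often preferred for matching networks.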
# ナノスケールにおける効率的な量子仕事貯蔵庫 Efficient Quantum Work Reservoirs at the Nanoscale ( http://arxiv.org/abs/2305.17815v1 ) ライセンス: Link先を確認	Jinghao Lyu and Alexander B. Boyd and James P. Crutchfield	(参考訳) 資源理論として再定式化すると、熱力学はシングルショット領域における系の振る舞いを解析できる。この領域では、状態遷移を実現するのに必要な仕事はα-レニー・ダイバージェンスによって制限されるため、効率的な操作の特定の仕方が確率熱力学とは異なる。したがって、確率熱力学と資源理論的熱力学の違いを詳細に理解する必要がある。そこで本研究では、シングルショット領域における可逆性を調べ、そこで用いられる2準位の仕事貯蔵庫を多準位の仕事貯蔵庫へと一般化する。これにより、シングルショット領域におけるあらゆる遷移で可逆性が達成される。これを基に、触媒の有無それぞれについて、非散逸領域における多準位仕事貯蔵庫を体系的に探究する。資源理論的な結果は、2準位の仕事貯蔵庫がランダウアー限界に届かず、計算中にエネルギーが散逸しているかのような誤解を招くことを示している。対照的に、多準位の仕事貯蔵庫はランダウアー限界を達成し、エントロピー生成がゼロであることを示す。 When reformulated as a resource theory, thermodynamics can analyze system behaviors in the single-shot regime. In this, the work required to implement state transitions is bounded by alpha-Renyi divergences and so differs in identifying efficient operations compared to stochastic thermodynamics. Thus, a detailed understanding of the difference between stochastic thermodynamics and resource-theoretic thermodynamics is needed. To this end, we study reversibility in the single-shot regime, generalizing the two-level work reservoirs used there to multi-level work reservoirs. This achieves reversibility in any transition in the single-shot regime. Building on this, we systematically explore multi-level work reservoirs in the nondissipation regime with and without catalysts. The resource-theoretic results show that two-level work reservoirs undershoot Landauer's bound, misleadingly implying energy dissipation during computation. In contrast, we demonstrate that multi-level work reservoirs achieve Landauer's bound and produce zero entropy.	翻訳日:2023-05-30 16:26:03 公開日:2023-05-28
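Single-shot resource-theoretic bounds on work are stated through Rényi divergences between a state and the thermal state. Landauer's bound fixes the minimum work, k_B T ln 2, needed to erase one bit. The sketch below computes both quantities for classical (diagonal) distributions only. It illustrates the quantities the abstract refers to and is not the paper's multi-level work-reservoir construction. The function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact in the 2019 SI


def renyi_divergence(p, q, alpha):
    """Classical alpha-Renyi divergence D_alpha(p || q) in nats, for alpha >= 0.

    alpha = 1 gives the Kullback-Leibler divergence; alpha = math.inf gives the
    max-divergence log max_i p_i / q_i; alpha = 0 gives -log sum_{p_i > 0} q_i.
    """
    support = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    if alpha >= 1 and any(qi == 0 for _, qi in support):
        return math.inf  # p is not absolutely continuous w.r.t. q
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in support)
    if math.isinf(alpha):
        return math.log(max(pi / qi for pi, qi in support))
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in support)
    if s == 0:
        return math.inf
    return math.log(s) / (alpha - 1)


def landauer_bound(temperature_k):
    """Minimum work in joules to erase one bit at the given temperature: k_B T ln 2."""
    return BOLTZMANN * temperature_k * math.log(2)
```

For a pure bit measured against a uniform reference, every order of the divergence equals ln 2. This is the simplest case in which the different single-shot bounds coincide.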
# ナレッジデザイン:ナレッジリファインメントによるタンパク質設計の限界を押し上げる Knowledge-Design: Pushing the Limit of Protein Design via Knowledge Refinement ( http://arxiv.org/abs/2305.15151v3 ) ライセンス: Link先を確認	Zhangyang Gao, Cheng Tan, Stan Z. Li	(参考訳) 近年の研究では、所望の構造に折り畳まれるアミノ酸配列を見つけることを目的としたタンパク質設計において、競争力のある性能が示されている。しかし、その多くは予測の信頼度の重要性を無視し、広大なタンパク質空間をカバーできず、一般的なタンパク質の知識を取り入れていない。多様なタンパク質関連タスクにおける事前学習モデルの大きな成功と、回復率が信頼度と強く相関しているという事実を踏まえ、こうした知識によってタンパク質設計の限界をさらに押し広げられるのではないかと考えた。その解決策として、低品質の残基を洗練する知識認識モジュールを提案する。また、トレーニング時間を50%以上削減するメモリ検索機構も導入する。提案手法をCATH、TS50、TS500データセットで広範に評価した結果、Knowledge-Design法はCATHデータセットにおいて従来のPiFold法を約9%上回った。具体的には、Knowledge-DesignはCATH、TS50、TS500ベンチマークで60%以上の回復率を達成した最初の手法である。また、提案手法の有効性を示すための追加の分析も行う。コードは公開される予定である。 Recent studies have shown competitive performance in protein design that aims to find the amino acid sequence folding into the desired structure. However, most of them disregard the importance of predictive confidence, fail to cover the vast protein space, and do not incorporate common protein knowledge. After witnessing the great success of pretrained models on diverse protein-related tasks and the fact that recovery is highly correlated with confidence, we wonder whether this knowledge can push the limits of protein design further. As a solution, we propose a knowledge-aware module that refines low-quality residues. We also introduce a memory-retrieval mechanism to save more than 50% of the training time. We extensively evaluate our proposed method on the CATH, TS50, and TS500 datasets and our results show that our Knowledge-Design method outperforms the previous PiFold method by approximately 9% on the CATH dataset. Specifically, Knowledge-Design is the first method that achieves over 60% recovery on the CATH, TS50 and TS500 benchmarks. We also provide additional analysis to demonstrate the effectiveness of our proposed method. The code will be publicly available.	翻訳日:2023-05-30 11:18:06 公開日:2023-05-28
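Sequence recovery, the metric behind the 60% figure, is the fraction of positions at which a designed sequence matches the native one. The sketch computes it. Because the abstract ties recovery to predictive confidence, the sketch also flags positions whose top predicted probability is low. The helper names and the 0.5 threshold are assumptions for illustration, not the paper's refinement criterion. How per-protein values are aggregated (mean or median) also varies between papers.

```python
def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def low_confidence_positions(probabilities, threshold=0.5):
    """Indices whose top predicted amino-acid probability falls below `threshold`.

    probabilities: one list of per-residue class probabilities per position.
    """
    return [i for i, row in enumerate(probabilities) if max(row) < threshold]
```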
# 幾何学的多グラフニューラルネットワークを用いた多状態RNA設計 Multi-State RNA Design with Geometric Multi-Graph Neural Networks ( http://arxiv.org/abs/2305.14749v3 ) ライセンス: Link先を確認	Chaitanya K. Joshi, Arian R. Jamasb, Ramon Viñas, Charles Harris, Simon Mathis, Pietro Liò	(参考訳) 計算によるRNA設計は、合成生物学や治療法開発に幅広く応用されている。RNAの多様な生物学的機能の根底にあるのはコンフォメーションの柔軟性であり、これにより単一の配列がさまざまな異なる3D状態をとることができる。現在、計算による生体分子設計タスクは逆問題として定式化されることが多く、配列は単一の望ましい構造コンフォメーションをとることに基づいて設計される。本研究では、3次元RNAバックボーン構造の集合を入力として動作し、RNAのコンフォメーションの多様性を設計において明示的に考慮・反映する幾何学的RNA設計パイプラインgRNAdeを提案する。新しい大規模3D RNA設計データセットにおいて、特に多状態かつ構造的に多様なRNAについて、単一状態アプローチと比べたネイティブ配列回復率の向上におけるgRNAdeの有用性を示す。コードはhttps://github.com/chaitjo/geometric-rna-designで公開されている。 Computational RNA design has broad applications across synthetic biology and therapeutic development. Fundamental to the diverse biological functions of RNA is its conformational flexibility, enabling single sequences to adopt a variety of distinct 3D states. Currently, computational biomolecule design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired structural conformation. In this work, we propose gRNAde, a geometric RNA design pipeline that operates on sets of 3D RNA backbone structures to explicitly account for and reflect RNA conformational diversity in its designs. We demonstrate the utility of gRNAde for improving native sequence recovery over single-state approaches on a new large-scale 3D RNA design dataset, especially for multi-state and structurally diverse RNAs. Our code is available at https://github.com/chaitjo/geometric-rna-design	翻訳日:2023-05-30 11:17:47 公開日:2023-05-28
# 非常に大きなグラフのための高速オンラインノードラベリング Fast Online Node Labeling for Very Large Graphs ( http://arxiv.org/abs/2305.16257v2 ) ライセンス: Link先を確認	Baojian Zhou, Yifan Sun, Reza Babanezhad	(参考訳) 本稿では、トランスダクティブ学習の設定下でのオンラインノード分類問題を研究する。現在の手法は、$\mathcal{O}(n^3)$の実行時間と$\mathcal{O}(n^2)$の空間計算量でグラフカーネル行列の逆行列を求めるか、大量のランダム全域木をサンプリングするため、大規模グラフへの拡張が難しい。本研究では、一連の研究(Rakhlin et al., 2012; Rakhlin and Sridharan, 2015; 2017)によって導入されたオンライン緩和(online relaxation)の手法に基づく改良を提案する。まず、適切にパラメータ化されたグラフカーネルを選んだときに有効なリグレット$\mathcal{O}(\sqrt{n^{1+\gamma}})$を証明し、次にこの緩和に基づいて$\mathcal{O}(k\sqrt{n^{1+\gamma}})$のリグレットを満たす近似アルゴリズムFastONLを提案する。FastONLの鍵は一般化局所プッシュ(generalized local push)法であり、逆行列の列を効果的に近似し、一連の一般的なカーネルに適用できる。さらに、予測1回あたりのコストは$\mathcal{O}(\text{vol}({\mathcal{S}})\log 1/\epsilon)$であり、グラフに局所的に依存し、メモリコストは線形である。実験により、提案するスケーラブルな手法が局所的一貫性と大域的一貫性のより良いトレードオフを実現することが示された。 This paper studies the online node classification problem under a transductive learning setting. Current methods either invert a graph kernel matrix with $\mathcal{O}(n^3)$ runtime and $\mathcal{O}(n^2)$ space complexity or sample a large volume of random spanning trees, and are thus difficult to scale to large graphs. In this work, we propose an improvement based on the online relaxation technique introduced by a series of works (Rakhlin et al., 2012; Rakhlin and Sridharan, 2015; 2017). We first prove an effective regret $\mathcal{O}(\sqrt{n^{1+\gamma}})$ when suitable parameterized graph kernels are chosen, then propose an approximate algorithm FastONL enjoying $\mathcal{O}(k\sqrt{n^{1+\gamma}})$ regret based on this relaxation. The key to FastONL is a generalized local push method that effectively approximates inverse matrix columns and applies to a series of popular kernels. Furthermore, the per-prediction cost is $\mathcal{O}(\text{vol}({\mathcal{S}})\log 1/\epsilon)$ locally dependent on the graph with linear memory cost. Experiments show that our scalable method enjoys a better tradeoff between local and global consistency.	翻訳日:2023-05-30 11:09:39 公開日:2023-05-28
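The generalized local push at the core of FastONL builds on the classic local push for personalized PageRank. That push touches only nodes near the source, so its cost scales with the local volume explored rather than with the whole graph. The sketch below is the standard push, not the authors' generalized variant. It assumes an undirected graph given as an adjacency dict without isolated nodes or self-loops, a teleport probability `alpha`, and a residual tolerance `eps`.

```python
from collections import defaultdict


def ppr_push(adj, source, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `source` with the local push method.

    adj: dict mapping each node to a list of neighbours (undirected graph).
    Returns (p, r): PPR estimates and leftover residuals. On exit every node
    satisfies r[u] <= eps * deg(u), and sum(p) + sum(r) == 1 up to rounding.
    """
    p = defaultdict(float)
    r = defaultdict(float)
    r[source] = 1.0
    stack = [source]
    while stack:
        u = stack.pop()
        deg_u = len(adj[u])
        if r[u] <= eps * deg_u:
            continue  # stale entry: residual already below threshold
        mass = r[u]
        p[u] += alpha * mass          # keep an alpha fraction at u
        r[u] = 0.0
        share = (1 - alpha) * mass / deg_u
        for v in adj[u]:              # spread the rest evenly to neighbours
            r[v] += share
            if r[v] > eps * len(adj[v]):
                stack.append(v)
    return dict(p), dict(r)
```

Each push moves mass without creating or destroying it, so the estimates plus the residuals always sum to one. The leftover residuals bound how far `p` is from the exact vector.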
# ASRと感情音声: 音声認識と感情認識の相互影響に関する単語レベルでの検討 ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition ( http://arxiv.org/abs/2305.16065v2 ) ライセンス: Link先を確認	Yuanchao Li, Zeyu Zhao, Ondrej Klejch, Peter Bell, Catherine Lai	(参考訳) 音声感情認識(SER)では、その固有の変動性に対処するために、音声信号とともにテキストデータがしばしば使用される。しかし、ほとんどの研究で人手で注釈付けされたテキストに依存していることが、実用的なSERシステムの開発を妨げている。この課題を克服するため、感情コーパスにおける自動音声認識(ASR)の性能を分析し、ASR書き起こし中の単語誤りと信頼度スコアの分布を調べることで、感情がASRにどのように影響するかを把握し、ASRが感情音声に対してどのように機能するかを調査する。汎化性を確保するため、Kaldi ASR、wav2vec2、Conformer、Whisperの4つのASRシステムと、IEMOCAP、MOSI、MELDの3つのコーパスを用いる。さらに、単語誤り率が増加していくASR書き起こしに対してテキストベースのSERを行い、ASRがSERにどのように影響するかを調べる。本研究の目的は、ASRとSERの関係と相互影響を明らかにし、感情音声へのASRの適応と実世界におけるSERの利用を促進することである。 In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to address their inherent variability. However, the reliance on human-annotated text in most research hinders the development of practical SER systems. To overcome this challenge, we investigate how Automatic Speech Recognition (ASR) performs on emotional speech by analyzing the ASR performance on emotion corpora and examining the distribution of word errors and confidence scores in ASR transcripts to gain insight into how emotion affects ASR. We utilize four ASR systems, namely Kaldi ASR, wav2vec2, Conformer, and Whisper, and three corpora: IEMOCAP, MOSI, and MELD to ensure generalizability. Additionally, we conduct text-based SER on ASR transcripts with increasing word error rates to investigate how ASR affects SER. The objective of this study is to uncover the relationship and mutual impact of ASR and SER, in order to facilitate ASR adaptation to emotional speech and the use of SER in the real world.	翻訳日:2023-05-30 11:08:43 公開日:2023-05-28
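The word error rate used to grade the ASR transcripts above is the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. A minimal sketch follows. It assumes whitespace tokenization and no text normalization, whereas real evaluations usually lowercase and strip punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            cost = 0 if r_word == h_word else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.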
# 効率良く解釈可能な自己回帰型Transformerのための動的コンテキストプルーニング Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers ( http://arxiv.org/abs/2305.15805v2 ) ライセンス: Link先を確認	Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann	(参考訳) 大規模言語モデル(LLM)で採用されている自己回帰型Transformerは、長い系列へのスケーリングが難しい。計算コストを削減しようとする研究がいくつかあるにもかかわらず、ほとんどのLLMは依然として系列内のすべてのトークン対の間で注意層を用いており、2次のコストが生じる。本研究では、モデルの表現力を維持しながら文脈情報を動的に刈り込み、推論時のメモリと計算の要件を削減する新しい手法を提案する。本手法では、生成過程の任意の時点で、どの情報量の少ないトークンを文脈から削除できるかを決定する学習可能な機構を用いる。これにより、本アプローチは性能上の懸念に対処するだけでなく解釈可能性も高め、モデルの意思決定プロセスに対する貴重な洞察を提供する。本手法は簡単な微調整によって既存の事前学習済みモデルに適用でき、刈り込みの強さはスパース性パラメータで指定できる。特に、実験結果から、下流タスクで大きな性能低下を起こすことなく文脈の最大80%を効果的に刈り込めることが示され、推論コストを削減するための有用なツールとなる。参照実装では、推論スループットが最大2倍向上し、メモリ削減効果はさらに大きい。 Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational requirements during inference. Our method employs a learnable mechanism that determines which uninformative tokens can be dropped from the context at any point across the generation process. By doing so, our approach not only addresses performance concerns but also enhances interpretability, providing valuable insight into the model's decision-making process. Our technique can be applied to existing pre-trained models through a straightforward fine-tuning process, and the pruning strength can be specified by a sparsity parameter. Notably, our empirical findings demonstrate that we can effectively prune up to 80% of the context without significant performance degradation on downstream tasks, offering a valuable tool for mitigating inference costs. Our reference implementation achieves up to a $2\times$ increase in inference throughput and even greater memory savings.	翻訳日:2023-05-30 11:06:48 公開日:2023-05-28
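To make the idea of dropping tokens from the context concrete, the sketch below masks pruned tokens out of single-head causal attention so that later queries cannot attend to them. It is a simplification under stated assumptions. The `dropped` mask is given rather than learned, and a dropped token is hidden from every later query, whereas the paper decides when to drop during generation. Each query always sees itself. A real implementation would also evict dropped entries from the key-value cache to obtain the memory savings, which masking alone does not provide.

```python
import numpy as np


def causal_attention_with_pruning(q, k, v, dropped):
    """Single-head causal attention in which dropped tokens are hidden from later queries.

    q, k, v: float arrays of shape (T, d); dropped: boolean array of shape (T,).
    Returns the attention output (T, d) and the attention weights (T, T).
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    causal = np.tril(np.ones((T, T), dtype=bool))
    keep = causal & ~dropped[None, :]   # later queries cannot see dropped keys
    np.fill_diagonal(keep, True)        # every query still attends to itself
    scores = np.where(keep, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```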

# RefBERT: 自動リネームリファクタリングのための2段階の事前トレーニングフレームワーク

RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring ( http://arxiv.org/abs/2305.17708v1 )

ライセンス: Link先を確認

Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong Ji

(参考訳) リファクタリングは、ソフトウェア進化においてソースコードの品質と保守性を向上させるために欠かせない実践である。リネームリファクタリングは最も頻繁に行われるリファクタリングであり、識別子の命名が不適切な場合に可読性を高めるための新しい名前を提案する。しかし、既存研究の多くはソースコードの2つのバージョン間のリネーム活動を特定するだけであり、新しい名前をどのように提案するかを扱った研究はほとんどない。本稿では、変数名に対する自動リネームリファクタリングを研究する。これは他のリネームリファクタリング活動よりも難しいと考えられている。まず、リネームリファクタリングとさまざまな主流の学習パラダイムとの関連、および自然言語処理における一般的なテキスト生成とリネームリファクタリングとの違いを指摘する。これらの観察に基づき、変数名のリネームリファクタリングのための2段階の事前学習フレームワークRefBERTを提案する。RefBERTはまず新しい名前のサブトークン数を予測し、次にそれに従ってサブトークンを生成する。制約付きマスク言語モデリング、対照学習、bag-of-tokens損失などのいくつかの手法をRefBERTに組み込み、変数名の自動リネームリファクタリングに適合させている。構築したリファクタリングデータセットでの広範な実験により、RefBERTが生成する変数名は既存手法のものより正確で意味のあるものであることを示す。

Refactoring is an indispensable practice for improving the quality and maintainability of source code in software evolution. Rename refactoring is the most frequently performed refactoring that suggests a new name for an identifier to enhance readability when the identifier is poorly named. However, most existing works only identify renaming activities between two versions of source code, while few works address how to suggest a new name. In this paper, we study automatic rename refactoring on variable names, which is considered more challenging than other rename refactoring activities. We first point out the connections between rename refactoring and various prevalent learning paradigms and the difference between rename refactoring and general text generation in natural language processing. Based on our observations, we propose RefBERT, a two-stage pre-trained framework for rename refactoring on variable names. RefBERT first predicts the number of sub-tokens in the new name and then generates sub-tokens accordingly. Several techniques, including constrained masked language modeling, contrastive learning, and the bag-of-tokens loss, are incorporated into RefBERT to tailor it for automatic rename refactoring on variable names. Through extensive experiments on our constructed refactoring datasets, we show that the generated variable names of RefBERT are more accurate and meaningful than those produced by the existing method.

翻訳日:2023-10-24 05:17:34 公開日:2023-05-28
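RefBERT first predicts how many sub-tokens the new name has. The sketch shows one common way to decompose a variable name into sub-tokens, which yields that target count. It splits on underscores, camelCase boundaries and digit runs. These splitting rules are an assumption, and RefBERT's own tokenization may differ.

```python
import re

# Upper-case runs not followed by a lower-case letter (acronyms such as "HTTP"),
# capitalized or lower-case words, and digit runs.
_SUBTOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Split a variable name into lower-cased sub-tokens."""
    tokens = []
    for part in name.split("_"):
        tokens.extend(t.lower() for t in _SUBTOKEN_RE.findall(part))
    return tokens
```

For example, `maxRetryCount` splits into three sub-tokens, so a model predicting its replacement would first predict a length of 3.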

# ソフトウェアエコシステムの生と死

The Life and Death of Software Ecosystems ( http://arxiv.org/abs/2306.10020v1 )

ライセンス: Link先を確認

Raula Gaikovina Kula and Gregorio Robles

(参考訳) ソフトウェアエコシステムは近年、大きな注目を集めている。産業界と開発者は技術の周りに集まり、その発展のために協働する。こうした取り組みの範囲が一定数のプロジェクトを超えると、Free/Libre and Open Source Software (FLOSS) エコシステムが現れる。本章では、健全なエコシステムに寄与する2つの側面、すなわちエコシステムへの誘引（と離反）およびエコシステムの死について探究する。エコシステムが機能し存続するためには、人々を惹きつけ、オンボーディングし、定着させる必要がある。第1節では、FLOSSエコシステムの生命線である貢献者（とユーザ）の誘引と離反について、挑発的なリサーチクエスチョンを通じて可能性を探る。続く第2節では、システムの死に焦点を当て、死んだと推定されるいくつかのシステムとその死後の状態を探る。

Software ecosystems have gained a lot of attention in recent times. Industry and developers gather around technologies and collaborate on their advancement; when the boundaries of such an effort go beyond a certain number of projects, we witness the appearance of Free/Libre and Open Source Software (FLOSS) ecosystems. In this chapter, we explore two aspects that contribute to a healthy ecosystem: the attraction (and detraction) of people to ecosystems, and the death of ecosystems. To function and survive, ecosystems need to attract people, get them on-boarded, and retain them. In Section One, we use provocative research questions to explore possibilities for attracting and detracting contributors (and users), the lifeblood of FLOSS ecosystems. Then, in Section Two, we focus on the death of systems, exploring some systems presumed to be dead and their state in the afterlife.

翻訳日:2023-10-23 19:25:19 公開日:2023-05-28

# JutePestDetect: ファインチューニングした転移学習を用いたジュート害虫識別のためのインテリジェントアプローチ

JutePestDetect: An Intelligent Approach for Jute Pest Identification Using Fine-Tuned Transfer Learning ( http://arxiv.org/abs/2308.05179v1 )

ライセンス: Link先を確認

Md. Simul Hasan Talukder, Mohammad Raziuddin Chowdhury, Md Sakib Ullah Sourav, Abdullah Al Rakin, Shabbir Ahmed Shuvo, Rejwan Bin Sulaiman, Musarrat Saberin Nipun, Muntarin Islam, Mst Rumpa Islam, Md Aminul Islam, Zubaer Haque

(参考訳) アジアの一部の国では、ジュートは農業部門の収入と国内総生産(GDP)の主要な源の一つである。他の多くの作物と同様に、ジュートは害虫の被害を受けやすく、バングラデシュ、インド、ミャンマー、中国などの国では、その識別は通常目視で行われている。しかし、この方法は時間がかかり、難しく、やや不正確であるため、大きな経済的リスクをもたらす。この問題に対処するため、本研究では、ジュート害虫を早期に識別する、高性能かつ頑健な転移学習(TL)ベースのJutePestDetectモデルを提案する。まず、17クラス、各クラス約380枚の写真からなるジュート害虫データセットを作成し、背景除去やリサイズなどの手動および自動の前処理・クリーニングを施したうえで評価に用いた。次に、先行研究から、DenseNet201、InceptionV3、MobileNetV2、VGG19、ResNet50の5つの代表的な事前学習済みモデルを選び、JutePestDetectモデルを設計した。各モデルは、分類層をグローバル平均プーリング層に置き換え、正則化のためのドロップアウト層を組み込むことで改修した。モデルの性能評価には、適合率、再現率、F1スコア、ROC曲線、混同行列などの指標を用いた。これらの分析は、モデルの有効性を判断するための追加の知見をもたらした。中でも、正則化を施したDenseNet201ベースのJutePestDetectモデルが他のモデルを上回り、99%の正解率を達成した。その結果、提案手法と戦略は、ジュートにおける害虫識別の改良されたアプローチを提供し、世界中の農家に大きな恩恵をもたらしうる。

In certain Asian countries, jute is one of the primary sources of income and Gross Domestic Product (GDP) for the agricultural sector. Like many other crops, jute is prone to pest infestations, which are typically identified visually in countries like Bangladesh, India, Myanmar, and China. This method is time-consuming, challenging, and somewhat imprecise, which poses a substantial financial risk. To address this issue, the study proposes a high-performing and resilient transfer learning (TL) based JutePestDetect model to identify jute pests at an early stage. First, we prepared a jute pest dataset containing 17 classes and around 380 photos per pest class, which were evaluated after manual and automatic pre-processing and cleaning, such as background removal and resizing. Subsequently, five prominent pre-trained models (DenseNet201, InceptionV3, MobileNetV2, VGG19, and ResNet50) were selected from a previous study to design the JutePestDetect model. Each model was revised by replacing the classification layer with a global average pooling layer and incorporating a dropout layer for regularization. To evaluate the models' performance, metrics such as precision, recall, F1 score, ROC curve, and confusion matrix were employed. These analyses provided additional insights for determining the efficacy of the models. Among them, the proposed JutePestDetect model based on a customized, regularized DenseNet201 outperformed the others, achieving an accuracy of 99%. As a result, our proposed method and strategy offer an enhanced approach to jute pest identification, which can significantly benefit farmers worldwide.

翻訳日:2023-10-23 14:52:33 公開日:2023-05-28
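
The JutePestDetect abstract evaluates models with precision, recall, F1 score, and a confusion matrix. The plain-Python sketch below is a hedged illustration of how those per-class numbers follow from a confusion matrix. The function name `classification_metrics` and the toy 2x2 matrix in the usage note are hypothetical and do not come from the paper.

```python
def classification_metrics(confusion):
    """Compute accuracy and per-class (precision, recall, F1).

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j (rows = truth, columns = prediction).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    per_class = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(confusion[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    return accuracy, per_class
```

With the toy matrix `[[8, 2], [1, 9]]`, accuracy is 0.85, and class 0 has precision 8/9 and recall 0.8. Averaging the per-class F1 values gives a macro-F1, one common summary for a 17-class problem. This aggregation is an assumption, not something the abstract reports.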

# GIMLET: 指示に基づく分子ゼロショット学習のための統一グラフ・テキストモデル

GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning ( http://arxiv.org/abs/2306.13089v1 )

ライセンス: Link先を確認

Haiteng Zhao, Shengchao Liu, Chang Ma, Hannan Xu, Jie Fu, Zhi-Hong Deng, Lingpeng Kong, Qi Liu

(参考訳) 近年、分子特性予測が大きな注目を集めている。主なボトルネックは、高価な実験に起因するラベル不足である。本研究では、この問題を緩和し、タスクに関するテキスト知識をより活用するために、自然言語の指示を用いてゼロショット設定で分子関連タスクを達成する可能性を検討する。既存の分子・テキストモデルは、指示の扱いが不十分であることやグラフに対する表現能力が限られていることから、この設定では性能が低いことがわかった。これらの問題を解決するため、グラフデータとテキストデータの両方に対して言語モデルを統一するGIMLETを提案する。一般化された位置埋め込みを採用することで、追加のグラフ符号化モジュールなしに、グラフ構造と指示テキストの両方を符号化できるようにモデルを拡張する。GIMLETはまた、アテンション機構においてグラフの符号化をタスク指示から切り離し、新しいタスクに対するグラフ特徴の汎化を高める。タスク記述から導出した指示を伴う、2,000以上の分子タスクからなるデータセットを構築する。GIMLETを指示付きの分子タスクで事前学習し、幅広いタスクへ効果的に転移できるようにする。実験の結果、GIMLETは指示に基づくゼロショット学習において分子・テキストのベースラインを大きく上回り、ToxCastやMUVなどのタスクでは教師ありGNNモデルに迫る結果さえ達成した。

Molecule property prediction has gained significant attention in recent years. The main bottleneck is the label insufficiency caused by expensive lab experiments. To alleviate this issue and to better leverage textual knowledge for tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples the encoding of the graph from the task instructions in the attention mechanism, enhancing the generalization of graph features across novel tasks. We construct a dataset consisting of more than two thousand molecule tasks with corresponding instructions derived from task descriptions. We pretrain GIMLET on the molecule tasks along with their instructions, enabling the model to transfer effectively to a broad range of tasks. Experimental results demonstrate that GIMLET significantly outperforms molecule-text baselines in instruction-based zero-shot learning, even achieving results close to those of supervised GNN models on tasks such as ToxCast and MUV.

翻訳日:2023-06-26 01:11:56 公開日:2023-05-28
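
The GIMLET abstract says the graph encoding is decoupled from the task instructions inside the attention mechanism. One plausible reading is sketched below as an assumption, not the paper's exact rule: graph-node tokens attend only to graph-node tokens, while instruction tokens attend to everything. Under that mask, a graph's representation does not change when the instruction changes. The function `decoupled_attention_mask` and the token layout (graph nodes first, then instruction tokens, with bidirectional attention) are hypothetical.

```python
def decoupled_attention_mask(num_graph: int, num_text: int) -> list[list[bool]]:
    """Return mask[q][k] == True when query token q may attend to key token k.

    Assumed layout: tokens 0..num_graph-1 are graph nodes and the remaining
    num_text tokens are instruction tokens; attention is bidirectional.
    """
    n = num_graph + num_text
    mask = []
    for q in range(n):
        q_is_graph = q < num_graph
        row = []
        for k in range(n):
            k_is_graph = k < num_graph
            # Graph queries ignore instruction keys, so the graph encoding
            # is identical whatever instruction follows it; instruction
            # queries may read both the graph and the other instruction tokens.
            row.append(k_is_graph if q_is_graph else True)
        mask.append(row)
    return mask
```

In a real model this boolean mask would be converted into additive `-inf` scores before the softmax. The abstract's generalized position embedding, which would assign graph-aware relative positions, is not modelled here.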

# 統合的舗装・維持管理戦略のための技術主導型適応的意思決定支援システム(TDADSS-IPM)に向けて: 気候変動適応のためのリスク評価フレームワークに着目して

Towards a Technology-Driven Adaptive Decision Support System for Integrated Pavement and Maintenance strategies (TDADSS-IPM): focus on risk assessment framework for climate change adaptation ( http://arxiv.org/abs/2306.01769v1 )

ライセンス: Link先を確認

Shahrzad Pour, Amir Masoumi, Niels Skov Dujardin

(参考訳) 舗装および維持管理戦略のための意思決定支援システム(DSS)は、従来サイロとして設計されてきたため、局所最適なシステムにとどまっていた。また、今日のインダストリー4.0によるビッグデータの活用は当時存在しなかったため、DSSはもともと不確実性の要因に適応するようには設計されておらず、硬直的な意思決定につながっていた。気候現象に対する道路資産の脆弱性に着目し、本論文では、TDADSS-IPMと呼ぶ、統合的な舗装・維持管理活動のための技術主導型適応的意思決定支援システムの導入に向けた先見的な一歩を踏み出す。このDSSの一部として、ベイジアン信念ネットワーク(BBN)によるボトムアップのリスク評価モデルを構築し、気象条件によるデンマークの道路の実際の状態を把握する。このモデルは知識領域のギャップを埋め、時間とともに学習でき、実際の事象にリアルタイムで適用可能なプラットフォームを構築する。

Decision Support Systems (DSSs) for pavement and maintenance strategies have traditionally been designed as silos, which has led to locally optimal systems. Moreover, because the big-data usage enabled today by Industry 4.0 did not yet exist, DSSs were not originally designed to adapt to sources of uncertainty, which led to rigid decisions. Motivated by the vulnerability of road assets to climate phenomena, this paper takes a visionary step towards introducing a Technology-Driven Adaptive Decision Support System for Integrated Pavement and Maintenance activities called TDADSS-IPM. As part of such a DSS, a bottom-up risk assessment model is developed using Bayesian Belief Networks (BBN) to assess the actual condition of Danish roads under weather conditions. Such a model fills gaps in the knowledge domain and provides a platform that can be trained over time and applied in real time to actual events.

翻訳日:2023-06-11 13:58:39 公開日:2023-05-28
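
The TDADSS-IPM abstract builds its risk model as a Bayesian Belief Network. The sketch below shows exact inference by enumeration on a toy three-node chain, HeavyRain -> Waterlogging -> Damage. The variables, the network structure, and every probability are hypothetical placeholders, not values from the Danish road study.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_RAIN = {True: 0.3, False: 0.7}                  # P(HeavyRain)
P_WATER_GIVEN_RAIN = {True: 0.6, False: 0.1}      # P(Waterlogging=True | HeavyRain)
P_DAMAGE_GIVEN_WATER = {True: 0.5, False: 0.05}   # P(Damage=True | Waterlogging)


def joint(rain: bool, water: bool, damage: bool) -> float:
    """Joint probability factorised along the chain Rain -> Water -> Damage."""
    p_w = P_WATER_GIVEN_RAIN[rain] if water else 1 - P_WATER_GIVEN_RAIN[rain]
    p_d = P_DAMAGE_GIVEN_WATER[water] if damage else 1 - P_DAMAGE_GIVEN_WATER[water]
    return P_RAIN[rain] * p_w * p_d


def prob_damage() -> float:
    """Marginal P(Damage=True), summing out HeavyRain and Waterlogging."""
    return sum(joint(r, w, True) for r, w in product([True, False], repeat=2))


def prob_rain_given_damage() -> float:
    """Diagnostic query P(HeavyRain=True | Damage=True) via Bayes' rule."""
    return sum(joint(True, w, True) for w in (True, False)) / prob_damage()
```

With these placeholder tables, `prob_damage()` is 0.1625 and observing damage raises the probability of heavy rain from 0.3 to about 0.59. Enumeration is exponential in the number of variables, so a realistic network would need a dedicated inference library or variable elimination.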

# 言語モデル効率研究の定量的考察

A Quantitative Review on Language Model Efficiency Research ( http://arxiv.org/abs/2306.01768v1 )

ライセンス: Link先を確認

Meng Jiang, Hy Dang, Lingbo Tong

(参考訳) 言語モデル(LM)は大規模化し、強力になりつつある。その効率の向上は、ニューラル情報処理システムにおける中核的な研究テーマの一つである。Tay et al. (2022) は、NLP分野で不可欠な存在となった効率的なTransformerについて包括的な概観を示した。しかし、「On Evaluation」の節では、「どの基本的な効率的Transformerを検討すべきか」という未解決の問いを残し、「多くの研究論文が独自のベンチマークを選んでいる」ため「いまだ謎である」と答えている。残念ながら、いずれのベンチマークにおいてもTransformerの性能に関する定量的分析は行われていなかった。さらに、状態空間モデル(SSM)は、アテンションを用いない機構で長距離系列をモデル化する能力を示しているが、これは先行レビューでは議論されていなかった。本稿では、効率的なTransformerに関する一連の論文とSSMに関する論文の結果についてメタ分析を行う。LM効率研究の定量的なレビューを提供し、今後の研究への提言を行う。

Language models (LMs) are being scaled up and becoming more powerful. Improving their efficiency is one of the core research topics in neural information processing systems. Tay et al. (2022) provided a comprehensive overview of efficient Transformers, which have become an indispensable staple in the field of NLP. However, in their section "On Evaluation", they left open the question of "which fundamental efficient Transformer one should consider," answering that it is "still a mystery" because "many research papers select their own benchmarks." Unfortunately, there was no quantitative analysis of the performance of Transformers on any benchmark. Moreover, state space models (SSMs) have demonstrated the ability to model long-range sequences with non-attention mechanisms, which the prior review did not discuss. This article presents a meta-analysis of the results reported in a set of papers on efficient Transformers as well as papers on SSMs. It provides a quantitative review of LM efficiency research and gives suggestions for future research.

翻訳日:2023-06-11 13:58:20 公開日:2023-05-28

# 自己相似性に基づく心雑音の検出法

A Method for Detecting Murmurous Heart Sounds based on Self-similar Properties ( http://arxiv.org/abs/2306.05283v1 )

ライセンス: Link先を確認

Dixon Vimalajeewa, Chihoon Lee, Brani Vidakovic

(参考訳) 心雑音とは、心臓内を流れる血流によって生じる非定型的な音である。重篤な心疾患の徴候となりうるため、心雑音の検出は心血管疾患の特定と管理に重要である。しかし、現在の心雑音の識別法は、心音信号の固有の性質を探ることで得られる有益な知見を十分に活用していない。そこで本研究では、ウェーブレット領域で導出した心音の自己相似性と複雑性の性質に基づく、新たな識別的マルチスケール特徴量のセットを提案する。自己相似性はフラクタル的な挙動を評価することで特徴づけ、複雑性はウェーブレットエントロピーを計算することで調べる。標準的な分類器を用いて、心雑音の検出における提案特徴量の診断性能を評価した。公開されている心音データセットに適用したところ、提案するウェーブレットベースのマルチスケール特徴量は、より少ない特徴量で既存手法と同等の性能を達成した。これは、心音における自己相似性と複雑性の性質が、心雑音検出の精度を向上させる潜在的なバイオマーカーとなりうることを示唆している。

A heart murmur is an atypical sound produced by the flow of blood through the heart. It can be a sign of a serious heart condition, so detecting heart murmurs is critical for identifying and managing cardiovascular diseases. However, current methods for identifying murmurous heart sounds do not fully utilize the valuable insights that can be gained by exploring intrinsic properties of heart sound signals. To address this issue, this study proposes a new discriminative set of multiscale features based on the self-similarity and complexity properties of heart sounds, as derived in the wavelet domain. Self-similarity is characterized by assessing fractal behaviors, while complexity is explored by calculating wavelet entropy. We evaluated the diagnostic performance of these proposed features for detecting murmurs using a set of standard classifiers. When applied to a publicly available heart sound dataset, our proposed wavelet-based multiscale features achieved performance comparable to existing methods while using fewer features. This suggests that self-similarity and complexity properties in heart sounds could be potential biomarkers for improving the accuracy of murmur detection.

翻訳日:2023-06-11 13:16:49 公開日:2023-05-28
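
The murmur-detection abstract characterizes self-similarity through fractal behavior and complexity through wavelet entropy. The minimal sketch below assumes a Haar wavelet and two common textbook definitions:

- Wavelet entropy is the Shannon entropy of the relative detail energy per level.
- The slope of log2(mean squared detail) versus level serves as a self-similarity indicator.

The abstract does not give the paper's actual wavelet, number of levels, or feature formulas.

```python
import math


def haar_details(signal, levels):
    """Multilevel Haar DWT; returns detail coefficients per level (finest first).

    Assumes len(signal) is divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    s = 1 / math.sqrt(2)
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) * s for i in pairs])
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
    return details


def wavelet_entropy(details):
    """Shannon entropy of the relative detail energy across levels.

    Requires at least one non-zero detail coefficient.
    """
    energies = [sum(c * c for c in d) for d in details]
    total = sum(energies)
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)


def spectral_slope(details):
    """Least-squares slope of log2(mean squared detail) versus level.

    For self-similar (fractal) signals this slope is linked to the Hurst
    exponent. Requires at least two levels with non-zero energy.
    """
    xs, ys = [], []
    for j, d in enumerate(details, start=1):
        m = sum(c * c for c in d) / len(d)
        if m > 0:
            xs.append(j)
            ys.append(math.log2(m))
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den
```

For a linear ramp `list(range(16))` decomposed over 3 levels, the mean squared details are 0.5, 4, and 32, so the slope is exactly 3. A purely alternating signal puts all its energy in the finest level, so its wavelet entropy is 0. A per-recording feature vector could concatenate these values with a classifier of choice; this usage is an assumption, not the paper's pipeline.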

# 海面水温画像の再構成: 雲マスキングと再構成のためのマスク付きオートエンコーダアプローチ

Reconstructing Sea Surface Temperature Images: A Masked Autoencoder Approach for Cloud Masking and Reconstruction ( http://arxiv.org/abs/2306.00835v1 )

ライセンス: Link先を確認

Angelina Agabin (1) and J. Xavier Prochaska (1) ((1) University of California, Santa Cruz)

(参考訳) 本論文では、リモートセンシング技術で得られる海面水温(SST)データの解析において、雲によるマスキングの影響を緩和する新しいアルゴリズムを提示する。雲は、波長12ミクロン未満を用いるすべてのリモートセンシングデータの解析を妨げ、利用可能なデータ量を大きく制限するとともに、地理的分布に偏り(赤道域および沿岸域への偏り)を生じさせる。この問題に対処するため、マスク付きオートエンコーディングを用いたVision Transformerでマスクされたピクセルを再構成する、教師なし機械学習アルゴリズムEnkiを提案する。LLC4320と呼ばれる海洋大循環モデル(OGCM)の生成データセットで、マスク比(t)を10%、35%、50%、75%と変えた4種類のEnkiモデルを学習した。性能評価のため、4x4ピクセルの個々のパッチからなるランダムな「雲」で各画像のp=10%、20%、30%、40%、50%を欠損させたLLC4320のSST画像の検証セットを再構成した。すべてのpの水準において、平均RMSEが0.03K未満、すなわちVIIRSデータの推定センサ誤差より低い誤差で画像を再構成するモデルが1つ以上存在することが一貫して確認された。同様に、個々のパッチの水準では、再構成のRMSEはパッチ内の変動より8倍小さい。また、予想どおり、複雑さの高い画像ほど再構成誤差は大きくなる。解析により、画像の境界に沿ったパッチは系統的に再構成誤差が大きいことも明らかになったため、実運用ではこれらを無視することを推奨する。Enkiは、雲でマスクされたデータを再構成する手段としてインペインティングを上回る大きな可能性を示していると結論づける。今後の研究では、実世界のデータを再構成できるようにEnkiを発展させる。

This thesis presents a new algorithm to mitigate cloud masking in the analysis of sea surface temperature (SST) data generated by remote sensing technologies. Clouds interfere with the analysis of all remote sensing data using wavelengths shorter than 12 microns, significantly limiting the quantity of usable data and creating a biased geographical distribution (towards equatorial and coastal regions). To address this issue, we propose an unsupervised machine learning algorithm called Enki, which uses a Vision Transformer with Masked Autoencoding to reconstruct masked pixels. We train four different Enki models with mask ratios (t) of 10%, 35%, 50%, and 75% on the generated Ocean General Circulation Model (OGCM) dataset referred to as LLC4320. To evaluate performance, we reconstruct a validation set of LLC4320 SST images in which random "clouds" corrupt p = 10%, 20%, 30%, 40%, and 50% of each image in individual patches of 4x4 pixels. We consistently find that at every level of p, one or more models reconstruct the images with a mean RMSE of less than 0.03 K, i.e., lower than the estimated sensor error of VIIRS data. Similarly, at the individual patch level, the reconstructions have an RMSE 8x smaller than the fluctuations in the patch. As anticipated, reconstruction errors are larger for images with a higher degree of complexity. Our analysis also reveals that patches along the image border have systematically higher reconstruction error; we recommend ignoring these in production. We conclude that Enki shows great promise to surpass in-painting as a means of reconstructing cloud-masked data. Future research will develop Enki to reconstruct real-world data.

翻訳日:2023-06-02 14:46:57 公開日:2023-05-28
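
The Enki evaluation corrupts a fraction p of each image with random 4x4-pixel "cloud" patches and reports RMSE. The minimal sketch below covers both steps. It assumes non-overlapping patches on a regular grid, with RMSE scored only on the masked pixels; the abstract does not state either choice. The function names are hypothetical.

```python
import math
import random


def random_patch_mask(height, width, patch, p, rng):
    """Mask a fraction p of the non-overlapping patch x patch tiles (True = cloud)."""
    rows, cols = height // patch, width // patch
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    chosen = rng.sample(tiles, round(p * len(tiles)))
    mask = [[False] * width for _ in range(height)]
    for r, c in chosen:
        for i in range(r * patch, (r + 1) * patch):
            for j in range(c * patch, (c + 1) * patch):
                mask[i][j] = True
    return mask


def masked_rmse(truth, recon, mask):
    """RMSE computed only over the masked (reconstructed) pixels."""
    sq = [(t - r) ** 2
          for trow, rrow, mrow in zip(truth, recon, mask)
          for t, r, m in zip(trow, rrow, mrow) if m]
    return math.sqrt(sum(sq) / len(sq))
```

On a 16x16 image with p = 0.5, 8 of the 16 tiles are masked, which covers 128 pixels. A reconstruction that is uniformly 0.02 K too warm scores an RMSE of 0.02 K. Excluding border patches, as the abstract recommends, would be a further filter on `chosen`.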

# 厳密拡張ラグランジアンとランダム化反復スケッチによる制約付き最適化

Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching ( http://arxiv.org/abs/2305.18379v1 )

ライセンス: Link先を確認

Ilgee Hong, Sen Na, Michael W. Mahoney, Mladen Kolar

(参考訳) 等式制約付きの非線形・非凸最適化問題を解くことを考える。この種の問題は、制約付き深層ニューラルネットワークから最適制御、PDE制約付き最適化に至るまで、機械学習や工学の様々な応用に広く現れる。本研究では、この問題クラスに対する適応的な不正確ニュートン法を開発する。各反復では、ランダム化反復スケッチ解法を用いてラグランジアンのニュートン系を不正確に解き、厳密拡張ラグランジアンのメリット関数上で直線探索を行うことで適切なステップ幅を選択する。ランダム化解法は、適切なスケッチ行列を備えれば、反復あたりの浮動小数点演算量と記憶コストを大幅に削減でき、決定論的な線形方程式解法より有利である。本手法は、ランダム化解法の精度と厳密拡張ラグランジアンのペナルティパラメータを適応的に制御し、不正確なニュートン方向が厳密拡張ラグランジアンの降下方向となることを保証する。これにより、大域的なほぼ確実な収束を示すことができる。また、局所的に単位ステップ幅が許容されることを示し、本手法が局所線形収束を示すことを明らかにする。さらに、ランダム化解法に対する適応的な精度条件を徐々に厳しくすれば、線形収束を超線形収束に強化できることを証明する。CUTEstテストセットのベンチマーク非線形問題、LIBSVMのデータを用いた制約付きロジスティック回帰、およびPDE制約付き問題において、本手法の優れた性能を示す。

We consider solving equality-constrained nonlinear, nonconvex optimization problems. This class of problems appears widely in a variety of applications in machine learning and engineering, ranging from constrained deep neural networks to optimal control and PDE-constrained optimization. We develop an adaptive inexact Newton method for this problem class. In each iteration, we solve the Lagrangian Newton system inexactly via a randomized iterative sketching solver, and select a suitable stepsize by performing line search on an exact augmented Lagrangian merit function. When equipped with suitable sketching matrices, the randomized solvers have advantages over deterministic linear system solvers, significantly reducing per-iteration flop complexity and storage cost. Our method adaptively controls the accuracy of the randomized solver and the penalty parameters of the exact augmented Lagrangian to ensure that the inexact Newton direction is a descent direction of the exact augmented Lagrangian. This allows us to establish global almost-sure convergence. We also show that a unit stepsize is admissible locally, so that our method exhibits local linear convergence. Furthermore, we prove that the linear convergence can be strengthened to superlinear convergence if we gradually sharpen the adaptive accuracy condition on the randomized solver. We demonstrate the superior performance of our method on benchmark nonlinear problems from the CUTEst test set, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.

翻訳日:2023-05-31 22:04:56 公開日:2023-05-28
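
The abstract solves each Newton system with a randomized iterative sketching solver. One of the simplest members of that family is randomized Kaczmarz, where each iteration's sketch selects a single row. The sketch below applies it to a small consistent system. It illustrates the solver class only. It is not the paper's solver, which works on the Lagrangian Newton (KKT) system with adaptively controlled accuracy.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Solve a consistent linear system A x = b by randomized Kaczmarz.

    Each step samples one row i (a one-row "sketch") with probability
    proportional to its squared norm and projects the current iterate
    onto the hyperplane {x : A[i] . x = b[i]}.
    """
    rng = random.Random(seed)
    n = len(A[0])
    x = [0.0] * n
    row_norms = [sum(a * a for a in row) for row in A]
    for _ in range(iters):
        i = rng.choices(range(len(A)), weights=row_norms)[0]
        residual = b[i] - sum(a * xi for a, xi in zip(A[i], x))
        step = residual / row_norms[i]
        x = [xi + step * a for xi, a in zip(x, A[i])]
    return x
```

For `A = [[4, 1], [1, 3]]` and `b = [1, 2]`, the iterates approach the exact solution (1/11, 7/11). Each iteration touches only one row, which is the per-iteration cost advantage that the abstract attributes to sketching solvers. Stopping early gives the kind of inexact solve that an adaptive accuracy condition would control.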

# 潜在量子化によるディスエンタングルメント

Disentanglement via Latent Quantization ( http://arxiv.org/abs/2305.18378v1 )

ライセンス: Link先を確認

Kyle Hsu and Will Dorrell and James C. R. Whittington and Jiajun Wu and Chelsea Finn

(参考訳) ディスエンタングルされた表現学習では、モデルはデータセットの背後にある変動要因を分離し、それらを互いに独立に表現することが求められる。モデルにはこれらの要因に関する正解情報が与えられないため、ディスエンタングルメントを可能にするうえで帰納的バイアスが極めて重要な役割を果たす。本研究では、厳しい通信ボトルネックを課すことで、データを構成的に符号化・復号するための帰納的バイアスを構築する。具体的には、(i) 次元ごとに独立したスカラーのコードブックを用いて潜在空間を学習可能な離散コードへ量子化し、(ii) 異例なほど大きな重み減衰によって強いモデル正則化を適用する。直感的には、量子化によってエンコーダは多数のデータ点にわたって少数の潜在値を使うことを強いられ、その結果デコーダは各値に一貫した意味を割り当てられるようになる。正則化は、モデルをこの倹約的な戦略へと導く役割を果たす。基本的なデータ再構成型(バニラ・オートエンコーダ)と潜在再構成型(InfoGAN)の両方の生成モデルに本手法を追加することで、その幅広い適用可能性を示す。また、これらのモデルを確実に評価するために、情報理論に一貫して基づき、従来の指標の既知の欠点を解消するディスエンタングルメントの新たな指標InfoMECを提案する。正則化と組み合わせることで、潜在量子化は代表的なベンチマークデータセット群において、学習された表現のモジュール性と明示性を劇的に向上させる。特に、我々の量子化潜在オートエンコーダ(QLAE)は、データ再構成を損なうことなく、これらの主要なディスエンタングルメント特性において先行研究の強力な手法を一貫して上回る。

In disentangled representation learning, a model is asked to tease apart a dataset's underlying sources of variation and represent them independently of one another. Since the model is provided with no ground truth information about these sources, inductive biases take a paramount role in enabling disentanglement. In this work, we construct an inductive bias towards compositionally encoding and decoding data by enforcing a harsh communication bottleneck. Concretely, we do this by (i) quantizing the latent space into learnable discrete codes with a separate scalar codebook per dimension and (ii) applying strong model regularization via an unusually high weight decay. Intuitively, the quantization forces the encoder to use a small number of latent values across many datapoints, which in turn enables the decoder to assign a consistent meaning to each value. Regularization then serves to drive the model towards this parsimonious strategy. We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models. In order to reliably assess these models, we also propose InfoMEC, new metrics for disentanglement that are cohesively grounded in information theory and fix well-established shortcomings in previous metrics. Together with regularization, latent quantization dramatically improves the modularity and explicitness of learned representations on a representative suite of benchmark datasets. In particular, our quantized-latent autoencoder (QLAE) consistently outperforms strong methods from prior work in these key disentanglement properties without compromising data reconstruction.

翻訳日:2023-05-31 22:04:31 公開日:2023-05-28
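
The latent-quantization abstract snaps each latent dimension to a value from its own scalar codebook. The forward pass of that step is sketched below with a hypothetical function, `quantize_latent`. Several things are not shown: the learnable codebooks, the high weight decay, and the gradient path through the non-differentiable rounding. A straight-through estimator is a common choice for that gradient, but it is an assumption here and not stated in the abstract. The codebook values in the usage note are placeholders.

```python
def quantize_latent(z, codebooks):
    """Snap each latent dimension z[d] to the nearest scalar in codebooks[d].

    codebooks[d] is a separate list of scalar values for dimension d.
    Returns the quantized vector and the chosen code indices.
    """
    quantized, indices = [], []
    for value, book in zip(z, codebooks):
        # Nearest code by absolute distance (ties go to the lower index).
        idx = min(range(len(book)), key=lambda k: abs(book[k] - value))
        indices.append(idx)
        quantized.append(book[idx])
    return quantized, indices
```

With `z = [0.2, -0.9]` and codebooks `[[-1, 0, 1], [-1, -0.5, 0.5]]`, the result is `[0.0, -1.0]` with indices `[1, 0]`. Because each dimension has only a few admissible values, many inputs share the same code. This is the communication bottleneck the abstract describes.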

# BadLabel: ラベルノイズ学習の評価と改善に関するロバストな視点

BadLabel: A Robust Perspective on Evaluating and Enhancing Label-noise Learning ( http://arxiv.org/abs/2305.18377v1 )

ライセンス: Link先を確認

Jingfeng Zhang, Bo Song, Haohan Wang, Bo Han, Tongliang Liu, Lei Liu, Masashi Sugiyama

(参考訳) ラベルノイズ学習(LNL)は、ノイズを含むラベル付きの訓練データが与えられたときに、モデルの汎化性能を高めることを目的とする。実用的なLNLアルゴリズムを実現するため、研究者はクラス条件付きノイズからインスタンス依存ノイズまで、様々なラベルノイズの種類を提案してきた。本稿では、既存のLNLアルゴリズムの性能を大幅に低下させうる、BadLabelと呼ぶ新しいラベルノイズの種類を導入する。BadLabelは標準的な分類に対するラベル反転攻撃に基づいて作られ、特定のサンプルを選んでそのラベルを他のラベルに反転させることで、クリーンなラベルとノイズのあるラベルの損失値が区別できなくなるようにする。BadLabelがもたらす課題に対処するため、さらに、各エポックでラベルを敵対的に摂動させ、クリーンなラベルとノイズのあるラベルの損失値を再び区別可能にするロバストなLNL手法を提案する。(ほぼ)クリーンなラベル付きデータの小さな集合を選択できれば、半教師あり学習の手法を適用してモデルを精度よく学習できる。実験により、既存のLNLアルゴリズムが新たに導入したBadLabelノイズに対して脆弱であること、および提案するロバストなLNL手法が様々な種類のラベルノイズの下でモデルの汎化性能を効果的に向上できることを示す。ノイズラベルの新しいデータセットとロバストなLNLアルゴリズムのソースコードは https://github.com/zjfheart/BadLabels で公開されている。

Label-noise learning (LNL) aims to increase the model's generalization given training data with noisy labels. To facilitate practical LNL algorithms, researchers have proposed different label noise types, ranging from class-conditional to instance-dependent noises. In this paper, we introduce a novel label noise type called BadLabel, which can significantly degrade the performance of existing LNL algorithms by a large margin. BadLabel is crafted based on the label-flipping attack against standard classification, where specific samples are selected and their labels are flipped to other labels so that the loss values of clean and noisy labels become indistinguishable. To address the challenge posed by BadLabel, we further propose a robust LNL method that perturbs the labels in an adversarial manner at each epoch to make the loss values of clean and noisy labels again distinguishable. Once we select a small set of (mostly) clean labeled data, we can apply the techniques of semi-supervised learning to train the model accurately. Empirically, our experimental results demonstrate that existing LNL algorithms are vulnerable to the newly introduced BadLabel noise type, while our proposed robust LNL method can effectively improve the generalization performance of the model under various types of label noise. The new dataset of noisy labels and the source codes of robust LNL algorithms are available at https://github.com/zjfheart/BadLabels.

翻訳日:2023-05-31 22:04:08 公開日:2023-05-28

# 不規則なテンソルのための高速かつ正確なデュアルウェイストリーミングPARAFAC2 -アルゴリズムとその応用-

Fast and Accurate Dual-Way Streaming PARAFAC2 for Irregular Tensors -- Algorithm and Application ( http://arxiv.org/abs/2305.18376v1 )

ライセンス: Link先を確認

Jun-Gi Jang, Jeongyoung Lee, Yong-chan Park, U Kang

(参考訳) テンソルの2つの次元のサイズが時間とともに増加するデュアルウェイストリーミング設定において、不規則テンソルを効率的かつ正確に解析するにはどうすればよいか。また、デュアルウェイストリーミング設定にはどのような種類の異常が存在するか。不規則テンソルとは、列の長さが同じで行の長さが異なる行列の集まりである。デュアルウェイストリーミング設定では、既存の行列の新しい行と新しい行列の両方が時間とともに到着する。PARAFAC2分解は、不規則テンソルを解析するための重要なツールである。デュアルウェイストリーミングではリアルタイム解析が必要だが、静的なPARAFAC2分解手法は、新しいデータが到着するたびに蓄積されたテンソルに対してPARAFAC2分解を行うため、この設定では効率的に動作しない。既存のストリーミングPARAFAC2分解手法は限られた設定でしか動作せず、行列の新しい行を効率的に扱えない。本稿では、デュアルウェイストリーミング設定で動作する効率的かつ正確なPARAFAC2分解手法Dashを提案する。新しいデータが与えられると、Dashは古いデータと新しいデータに関する項を注意深く分割し、古いデータに関わる素朴な計算を避けることで、PARAFAC2分解を効率的に行う。さらに、忘却係数を適用することで、Dashは最近の動向に追従できる。広範な実験により、新たに到着したデータに対して、Dashは既存のPARAFAC2分解手法より最大14.0倍高速であることを示す。また、サブプライム住宅ローン危機やCOVID-19を含む実世界のデータセットにおける異常検出に関する発見も示す。

How can we efficiently and accurately analyze an irregular tensor in a dual-way streaming setting, where the sizes of two dimensions of the tensor increase over time? What types of anomalies occur in the dual-way streaming setting? An irregular tensor is a collection of matrices whose column lengths are the same while their row lengths differ. In a dual-way streaming setting, both new rows of existing matrices and new matrices arrive over time. PARAFAC2 decomposition is a crucial tool for analyzing irregular tensors. Although real-time analysis is necessary in dual-way streaming, static PARAFAC2 decomposition methods cannot work efficiently in this setting, since they perform PARAFAC2 decomposition on the accumulated tensor whenever new data arrive. Existing streaming PARAFAC2 decomposition methods work in a limited setting and cannot handle new rows of matrices efficiently. In this paper, we propose Dash, an efficient and accurate PARAFAC2 decomposition method that works in the dual-way streaming setting. When new data are given, Dash efficiently performs PARAFAC2 decomposition by carefully dividing the terms related to old and new data and avoiding naive computations involving old data. Furthermore, a forgetting factor lets Dash follow recent movements. Extensive experiments show that Dash is up to 14.0x faster than existing PARAFAC2 decomposition methods on newly arrived data. We also present discoveries from detecting anomalies in real-world datasets, including ones covering the Subprime Mortgage Crisis and COVID-19.

翻訳日:2023-05-31 22:03:41 公開日:2023-05-28

# ジャンプ学習: 生成モデリングのための潜在カウントの間引きと増加

Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling ( http://arxiv.org/abs/2305.18375v1 )

ライセンス: Link先を確認

Tianqi Chen and Mingyuan Zhou

(参考訳) ノイズ除去の学習は、自然画像のための最先端の深層生成モデルを設計する代表的なパラダイムとして台頭してきた。これを用いて連続的な実数値データとカテゴリカルデータの両方の分布をモデル化する方法は、近年提案された拡散モデルで詳しく研究されている。しかし本論文では、カウントデータや非負の連続データなど、しばしば高度にスパースで、歪んでいて、裾が重く、かつ/または過分散であるその他の種類のデータについては、そのモデル化能力が限られていることを見いだした。そこで本研究では、様々な種類のデータの生成モデリングのための汎用的なレシピとして、ジャンプの学習(learning to jump)を提案する。順方向のカウント間引き過程を用いて深層ニューラルネットワークを学習するための目的関数を構成し、逆方向のカウント増加過程によって、そのネットワークを通じて生成結果を反復的に洗練する。ジャンプの学習がノイズ除去の学習と同等の性能を示すと期待される場合と、それを上回ると期待される場合を示す。例えば、訓練データが非負であり、強いスパース性、歪度、裾の重さ、かつ/または異質性を示す場合には、ジャンプの学習が推奨される。

Learning to denoise has emerged as a prominent paradigm for designing state-of-the-art deep generative models for natural images. How to use it to model the distributions of both continuous real-valued data and categorical data has been well studied in recently proposed diffusion models. However, this paper finds that it has limited ability to model some other types of data, such as count and non-negative continuous data, which are often highly sparse, skewed, heavy-tailed, and/or overdispersed. To this end, we propose learning to jump as a general recipe for generative modeling of various types of data. It uses a forward count thinning process to construct learning objectives for training a deep neural network, and a reverse count thickening process to iteratively refine its generation through that network. We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better. For example, learning to jump is recommended when the training data is non-negative and exhibits strong sparsity, skewness, heavy-tailedness, and/or heterogeneity.

翻訳日:2023-05-31 22:03:17 公開日:2023-05-28
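
Learning to jump corrupts count data with a forward thinning process and generates data with a reverse thickening process. The sketch below assumes binomial thinning, x_t ~ Binomial(x_0, alpha_t). It also assumes a binomial-bridge thickening step that moves from thinning level alpha_t to a less-thinned level alpha_s > alpha_t, given an integer estimate `x0_hat`. In the paper, that estimate would come from the trained network. Both processes are illustrative assumptions, since the abstract does not specify them, and the function names are hypothetical.

```python
import random


def thin_counts(x0, alpha, rng):
    """Forward thinning: keep each of the x0 units independently with prob. alpha,
    i.e. draw x_t ~ Binomial(x0, alpha) elementwise."""
    return [sum(rng.random() < alpha for _ in range(n)) for n in x0]


def thicken_counts(xt, x0_hat, alpha_t, alpha_s, rng):
    """One reverse step from thinning level alpha_t (< 1) to alpha_s > alpha_t.

    If x_s ~ Binomial(x0, alpha_s) and x_t is a further thinning of x_s, then
    x_s - x_t given x_t is Binomial(x0 - x_t, (alpha_s - alpha_t) / (1 - alpha_t)).
    Here the unknown x0 is replaced by the estimate x0_hat.
    """
    q = (alpha_s - alpha_t) / (1 - alpha_t)
    return [x + sum(rng.random() < q for _ in range(max(h - x, 0)))
            for x, h in zip(xt, x0_hat)]
```

Thinning never increases a count and leaves zeros at zero, which suits sparse, non-negative data. Thickening with `alpha_s = 1.0` returns `x0_hat` exactly, so a sampler would walk `alpha` upward from near 0 to 1, re-predicting `x0_hat` at each step.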

# 純スペクトルグラフ埋め込み: Top-Nレコメンデーションのためのグラフ畳み込みの再解釈

Pure Spectral Graph Embeddings: Reinterpreting Graph Convolution for Top-N Recommendation ( http://arxiv.org/abs/2305.18374v1 )

ライセンス: Link先を確認

Edoardo D'Amico, Aonghus Lawlor, Neil Hurley

(参考訳) 推薦システムのアルゴリズム開発におけるグラフ畳み込みの利用は、近年、協調フィルタリング(CF)タスクで最先端の結果を達成している。グラフ畳み込み演算がグラフスペクトル領域におけるフィルタリング操作と結びついていることは示されているが、それがなぜ協調フィルタリング問題で高い性能につながるのかについての理論的根拠は未解明である。本研究の貢献は2つある。第一に、ユーザとアイテムの表現学習の過程全体でグラフ畳み込みを用いることの効果を調べ、学習される潜在特徴がフィルタリング操作によって、正規化隣接行列の最大固有値に対応する固有ベクトルが張る部分空間へと押しやられること、そしてこの部分空間上にあるベクトルが、訓練データ上の予測関数の総和に関する目的関数の最適解であることを示す。次に、グラフ畳み込みによって得られる解を模倣するために固有ベクトルを直接利用する手法を提案する。この手法は、時間のかかる勾配降下による学習手順を不要にしつつ、3つの実世界データセットでより高い性能を達成する。

The use of graph convolution in the development of recommender system algorithms has recently achieved state-of-the-art results in the collaborative filtering (CF) task. While it has been demonstrated that the graph convolution operation is connected to a filtering operation on the graph spectral domain, the theoretical rationale for why this leads to higher performance on the collaborative filtering problem remains unknown. The presented work makes two contributions. First, we investigate the effect of using graph convolution throughout the user and item representation learning processes, demonstrating how the filtering operation pushes the learned latent features into the subspace spanned by the eigenvectors associated with the highest eigenvalues of the normalised adjacency matrix, and how vectors lying in this subspace are the optimal solutions for an objective function related to the sum of the prediction function over the training data. Then, we present an approach that directly leverages the eigenvectors to emulate the solution obtained through graph convolution, eliminating the need for a time-consuming gradient descent training procedure while also delivering higher performance on three real-world datasets.

翻訳日:2023-05-31 22:02:54 公開日:2023-05-28

# KAFA:視覚言語モデルの知識付加的特徴適応による画像広告理解の再考

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models ( http://arxiv.org/abs/2305.18373v1 )

ライセンス: Link先を確認

Zhiwei Jia and Pradyumna Narayana and Arjun R. Akula and Garima Pruthi and Hao Su and Sugato Basu and Varun Jampani

(参考訳) 画像広告の理解は、幅広い実世界の応用をもつ重要なタスクである。多様な非定型シーンや実世界のエンティティが関わり、シーンテキストに対する推論も必要となるため極めて困難であるにもかかわらず、画像広告をどのように解釈するかは、特に優れた汎化性と適応性を備えた基盤的な視覚言語モデル(VLM)の時代において、比較的研究が進んでいない。本稿では、事前学習済みVLMの観点から、画像広告理解に関する初の実証研究を行う。これらのVLMを画像広告理解に適応させる際の実践的な課題をベンチマークし、明らかにする。画像広告のマルチモーダル情報を効果的に融合し、さらに実世界のエンティティに関する知識で強化する、シンプルな特徴適応戦略を提案する。本研究が、広告業界に広く関係する画像広告理解にさらなる注目を集めることを期待する。

Image ad understanding is a crucial task with wide real-world applications. Although it is highly challenging, involving diverse atypical scenes, real-world entities, and reasoning over scene texts, how to interpret image ads is relatively under-explored, especially in the era of foundational vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper, we perform the first empirical study of image ad understanding through the lens of pre-trained VLMs. We benchmark and reveal practical challenges in adapting these VLMs to image ad understanding. We propose a simple feature adaptation strategy to effectively fuse multimodal information for image ads and further empower it with knowledge of real-world entities. We hope our study draws more attention to image ad understanding, which is broadly relevant to the advertising industry.

翻訳日:2023-05-31 22:02:33 公開日:2023-05-28

# AnoRand:ランダムラベルによる半教師付きディープラーニング異常検出手法

AnoRand: A Semi Supervised Deep Learning Anomaly Detection Method by Random Labeling ( http://arxiv.org/abs/2305.18389v1 )

ライセンス: Link先を確認

Mansour Zoubeirou A Mayaki and Michel Riveill

(参考訳) 異常検出、より一般には外れ値検出は、理論的および応用的な機械学習において最も一般的かつ困難な課題の一つである。最大の課題は、一般にラベル付きデータがごくわずかしかないか、まったくないことである。本稿では、深層学習アーキテクチャとランダムな合成ラベル生成を組み合わせた、AnoRandと呼ぶ新しい半教師あり異常検出手法を提案する。提案アーキテクチャは、(1) フィードフォワード・パーセプトロンからなるノイズ検出(ND)ブロックと、(2) オートエンコーダ(AE)ブロックの2つの構成要素を持つ。この新しいアーキテクチャの主なアイデアは、潜在空間でデータを表現するオートエンコーダの能力と、データが極めて不均衡な場合に1つのクラスを学習するフィードフォワード・パーセプトロン(FFP)の能力を活用して、1つのクラス(例えば異常検出の場合の多数派クラス)をできる限りよく学習することである。まず、訓練セットから少数のサンプル(例えば2%)をランダムに撹乱(ノイズを付加)して合成異常を作成する。次に、正常サンプルと合成サンプルをモデルへの入力として用いる。提案手法の性能を、合成データセットおよび57の実世界データセットにおいて、17の最先端の教師なし異常検出手法と比較した。その結果、この新手法は概して多くの最先端手法を上回り、参照データセットの大多数で最高の性能(AUC ROCおよびAUC PR)を示した。また、実際のラベルを用いてモデルを学習させることで、教師ありの方法でも手法を評価した。その結果、多くの最先端の教師ありアルゴリズムと比べて非常に優れた性能を示した。

Anomaly detection, or more generally outlier detection, is one of the most popular and challenging subjects in theoretical and applied machine learning. The main challenge is that, in general, we have access to very few labeled data or no labels at all. In this paper, we present a new semi-supervised anomaly detection method called AnoRand, which combines a deep learning architecture with random synthetic label generation. The proposed architecture has two building blocks: (1) a noise detection (ND) block composed of a feed-forward perceptron and (2) an autoencoder (AE) block. The main idea of this new architecture is to learn one class (e.g., the majority class in the case of anomaly detection) as well as possible by taking advantage of the ability of autoencoders to represent data in a latent space and the ability of a Feed Forward Perceptron (FFP) to learn one class when the data is highly imbalanced. First, we create synthetic anomalies by randomly disturbing (adding noise to) a few samples (e.g., 2%) from the training set. Second, we use the normal and the synthetic samples as input to our model. We compared the performance of the proposed method to 17 state-of-the-art unsupervised anomaly detection methods on synthetic datasets and 57 real-world datasets. Our results show that this new method generally outperforms most of the state-of-the-art methods and has the best performance (AUC ROC and AUC PR) on the vast majority of reference datasets. We also tested our method in a supervised way by using the actual labels to train the model. The results show that it performs very well compared to most state-of-the-art supervised algorithms.

Translated: 2023-05-31 21:53:56 Published: 2023-05-28

# The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation ( http://arxiv.org/abs/2305.18388v1 )

License: see link

Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney

We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions about the full distribution of returns) may offer performance superior to approaches such as classical TD learning, which predict only the mean return, even in the tabular setting.
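For readers unfamiliar with QTD, a minimal tabular sketch of the quantile TD update follows. It is written from the standard formulation of QTD, not taken from this paper.

Each state stores m quantile estimates at the midpoint levels τ_i = (2i + 1)/(2m). Each TD target r + γ·θ(x', j) pushes estimate θ(x, i) up by τ_i if the target is at or above it, and down by 1 − τ_i otherwise. The mean-return estimate is the average of the m quantiles. The names `qtd_update` and `mean_estimate` are hypothetical.

```python
def qtd_update(theta, x, r, x_next, alpha=0.1, gamma=0.9, terminal=False):
    """One tabular QTD update of the m quantile estimates stored in theta[x]."""
    m = len(theta[x])
    taus = [(2 * i + 1) / (2 * m) for i in range(m)]  # quantile midpoints
    if terminal:
        targets = [r] * m
    else:
        targets = [r + gamma * q for q in theta[x_next]]
    updated = []
    for tau, q in zip(taus, theta[x]):
        # Quantile-regression step: +tau for targets >= q, -(1 - tau) for targets < q.
        step = sum(tau - (1.0 if t < q else 0.0) for t in targets) / len(targets)
        updated.append(q + alpha * step)
    theta[x] = updated

def mean_estimate(theta, x):
    """Value estimate used for policy evaluation: the average of the quantiles."""
    return sum(theta[x]) / len(theta[x])

# A single terminal transition with reward 1: the quantiles settle near 1.
theta = {"s": [0.0] * 4}
for _ in range(500):
    qtd_update(theta, "s", 1.0, None, terminal=True)
print(mean_estimate(theta, "s"))
```

Classical TD would instead store a single number per state and move it toward the mean target.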

Translated: 2023-05-31 21:53:27 Published: 2023-05-28

# Augmenting Character Designers Creativity Using Generative Adversarial Networks

Augmenting Character Designers Creativity Using Generative Adversarial Networks ( http://arxiv.org/abs/2305.18387v1 )

License: see link

Mohammad Lataifeh, Xavier Carrasco, Ashraf Elnagar, Naveed Ahmed

Recent advances in Generative Adversarial Networks (GANs) continue to attract the attention of researchers in different fields due to the wide range of applications devised to take advantage of their key features. Most recent GANs are focused on realism; however, generating hyper-realistic output is not a priority for some domains, as in the case of this work. The generated outcomes are used here as cognitive components to augment character designers' creativity while they conceptualize new characters for different multimedia projects. To select the best-suited GANs for such a creative context, we first present a comparison between different GAN architectures and their performance when trained from scratch on a new visual characters dataset using a single Graphics Processing Unit. We also explore alternative techniques, such as transfer learning and data augmentation, to overcome computational resource limitations, a challenge faced by many researchers in the domain. Additionally, mixed methods are used to evaluate the cognitive value of the generated visuals for character designers' agency in conceptualizing new characters. The results proved highly effective in this context, as demonstrated by early adaptations to the character design process. As an extension of this work, the presented approach will be further evaluated as a novel co-design process between humans and machines to investigate where and how the generated concepts interact with and influence the design process outcome.

Translated: 2023-05-31 21:53:17 Published: 2023-05-28

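# A Synergistic Framework Leveraging Autoencoders and Generative Adversarial Networks for the Synthesis of Computational Fluid Dynamics Results in Aerofoil Aerodynamics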

A Synergistic Framework Leveraging Autoencoders and Generative Adversarial Networks for the Synthesis of Computational Fluid Dynamics Results in Aerofoil Aerodynamics ( http://arxiv.org/abs/2305.18386v1 )

License: see link

Tanishk Nandal, Vaibhav Fulara, Raj Kumar Singh

In the realm of computational fluid dynamics (CFD), accurate prediction of aerodynamic behaviour plays a pivotal role in aerofoil design and optimization. This study proposes a novel approach that synergistically combines autoencoders and Generative Adversarial Networks (GANs) to generate CFD results. Our framework uses an autoencoder to encode aerofoil geometries into a compressed, informative 20-dimensional vector representation. A conditional GAN then translates this vector into precise pressure-distribution plots for fixed wind velocity, angle of attack, and turbulence level specifications. The training process uses a carefully curated dataset generated with the JavaFoil software, covering a comprehensive range of aerofoil geometries. The proposed approach shows strong potential for reducing the time and cost of aerodynamic prediction, enabling efficient evaluation of aerofoil performance. The findings contribute to the advancement of computational techniques in fluid dynamics and pave the way for enhanced design and optimization processes in aerodynamics.

Translated: 2023-05-31 21:52:54 Published: 2023-05-28

# Self-attention Dual Embedding for Graphs with Heterophily

Self-attention Dual Embedding for Graphs with Heterophily ( http://arxiv.org/abs/2305.18385v1 )

License: see link

Yurui Lai, Taiyan Zhang, Rui Fan

Graph Neural Networks (GNNs) have been highly successful for the node classification task. GNNs typically assume graphs are homophilic, i.e., neighboring nodes are likely to belong to the same class. However, many real-world graphs are heterophilic, and this leads to much lower classification accuracy using standard GNNs. In this work, we design a novel GNN which is effective for both heterophilic and homophilic graphs. Our work is based on three main observations. First, we show that node features and graph topology carry different amounts of information in different graphs, and should therefore be encoded independently and prioritized adaptively. Second, we show that allowing negative attention weights when propagating graph topology information improves accuracy. Finally, we show that asymmetric attention weights between nodes are helpful. We design a GNN which makes use of these observations through a novel self-attention mechanism. We evaluate our algorithm on real-world graphs containing thousands to millions of nodes and show that we achieve state-of-the-art results compared to existing GNNs. We also analyze the effectiveness of the main components of our design on different graphs.

Translated: 2023-05-31 21:52:34 Published: 2023-05-28

# Backdoor Attacks Against Incremental Learners: An Empirical Evaluation Study

Backdoor Attacks Against Incremental Learners: An Empirical Evaluation Study ( http://arxiv.org/abs/2305.18384v1 )

License: see link

Yiqi Zhong, Xianming Liu, Deming Zhai, Junjun Jiang, Xiangyang Ji

A large number of incremental learning (IL) algorithms have been proposed to alleviate the catastrophic forgetting that arises when dealing with sequential data over time. However, the adversarial robustness of incremental learners has not been widely verified, leaving potential security risks. Specifically, for poisoning-based backdoor attacks, we argue that the nature of streaming data in IL provides great convenience to the adversary by creating the possibility of distributed and cross-task attacks: an adversary can affect **any unknown** previous or subsequent task by data poisoning **at any time or time series**, with an extremely small number of backdoor samples injected (e.g., 0.1% based on our observations). To draw the research community's attention to this issue, we empirically reveal in this paper the high vulnerability of 11 typical incremental learners to poisoning-based backdoor attacks in 3 learning scenarios, especially the cross-task generalization effect of backdoor knowledge, with poison ratios ranging from 5% down to 0.1%. Finally, we find that a defense mechanism based on activation clustering is effective at detecting our trigger pattern and can mitigate the potential security risks.
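In the activation-clustering defense, the penultimate-layer activations of the training samples assigned to one class are split into two clusters. An unusually small cluster is flagged as likely poisoned. The sketch below is a simplified illustration, not the configuration evaluated in the paper. It omits the dimensionality-reduction step usually applied before clustering and uses plain 2-means with a deterministic initialization. The 0.35 relative-size threshold and the function names are assumptions.

```python
import math

def two_means(points, iters=20):
    """Plain 2-means; centres start at the points with the smallest and largest first coordinate."""
    centres = [list(min(points, key=lambda p: p[0])), list(max(points, key=lambda p: p[0]))]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1 for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def flag_suspicious(activations, max_fraction=0.35):
    """Indices in the smaller cluster if it is small enough to look poisoned, else []."""
    assign = two_means(activations)
    sizes = [assign.count(0), assign.count(1)]
    small = 0 if sizes[0] < sizes[1] else 1
    if sizes[small] / len(activations) <= max_fraction:
        return [i for i, a in enumerate(assign) if a == small]
    return []

clean = [[0.1 * i, 0.0] for i in range(8)]   # toy activations of clean samples
poisoned = [[5.0, 5.0], [5.2, 4.9]]          # toy activations of triggered samples
print(flag_suspicious(clean + poisoned))
```

In practice this check would be run once per class label. The flagged samples would then be removed or relabelled before retraining.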

Translated: 2023-05-31 21:52:16 Published: 2023-05-28

# A Three-regime Model of Network Pruning

A Three-regime Model of Network Pruning ( http://arxiv.org/abs/2305.18383v1 )

License: see link

Yefan Zhou, Yaoqing Yang, Arin Chang, Michael W. Mahoney

Recent work has highlighted the complex influence that training hyperparameters (e.g., the number of training epochs) can have on the prunability of machine learning models. Perhaps surprisingly, a systematic approach to predict precisely how adjusting a specific hyperparameter will affect prunability remains elusive. To address this gap, we introduce a phenomenological model grounded in the statistical mechanics of learning. Our approach uses temperature-like and load-like parameters to model the impact of neural network (NN) training hyperparameters on pruning performance. A key empirical result we identify is a sharp transition phenomenon: depending on the value of a load-like parameter in the pruned model, increasing the value of a temperature-like parameter in the pre-pruned model may either enhance or impair subsequent pruning performance. Based on this transition, we build a three-regime model by taxonomizing the global structure of the pruned NN loss landscape. Our model reveals that the dichotomous effect of high temperature is associated with transitions between distinct types of global structures in the post-pruned model. Based on our results, we present three case studies: 1) determining whether to increase or decrease a hyperparameter for improved pruning; 2) selecting the best model to prune from a family of models; and 3) tuning the hyperparameter of the Sharpness-Aware Minimization method for better pruning performance.
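The abstract does not specify the pruning method. As an assumption for illustration only, the sketch below uses global magnitude pruning, a common baseline, to show what pruning a trained model to a target sparsity means. Prunability would then be measured as the loss or accuracy of the pruned weights. `magnitude_prune` is a hypothetical helper operating on a flat list of weights.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

print(magnitude_prune([0.5, -0.1, 2.0, -0.05], 0.5))
```

In the paper's framing, the hyperparameters used to train `weights` (the temperature-like and load-like knobs) determine whether this step hurts or helps performance.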

Translated: 2023-05-31 21:51:52 Published: 2023-05-28

# Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers

Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers ( http://arxiv.org/abs/2305.18382v1 )

License: see link

Zahra Atashgahi, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu

Efficient time series forecasting has become critical for real-world applications, particularly with deep neural networks (DNNs). Efficiency in DNNs can be achieved through sparse connectivity and reducing the model size. However, finding the sparsity level automatically during training remains a challenging task due to the heterogeneity in the loss-sparsity tradeoffs across datasets. In this paper, we propose "Pruning with Adaptive Sparsity Level" (PALS) to automatically seek an optimal balance between loss and sparsity, without the need for a predefined sparsity level. PALS draws inspiration from both sparse training and during-training methods. It introduces a novel "expand" mechanism for training sparse neural networks, allowing the model to dynamically shrink, expand, or remain stable to find a proper sparsity level. In this paper, we focus on achieving efficiency in transformers, which are known for their excellent time series forecasting performance but high computational cost. Nevertheless, PALS can be applied directly to any DNN; we also demonstrate its effectiveness on the DLinear model. Experimental results on six benchmark datasets and five state-of-the-art transformer variants show that PALS substantially reduces model size while maintaining performance comparable to the dense model. More interestingly, PALS even outperforms the dense model in 12 and 14 of 30 cases in terms of MSE and MAE loss, respectively, while reducing the parameter count by 65% and FLOPs by 63% on average. Our code will be made publicly available upon acceptance of the paper.

Translated: 2023-05-31 21:51:30 Published: 2023-05-28

# Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection ( http://arxiv.org/abs/2305.18381v1 )

License: see link

Yue Xu, Yong-Lu Li, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

Data-efficient learning has drawn significant attention, especially given the current trend toward large multi-modal models, for which dataset distillation can be an effective solution. However, the dataset distillation process itself is still very inefficient. In this work, we model the distillation problem with reference to information theory. Observing that severe data redundancy exists in dataset distillation, we argue for placing more emphasis on the utility of the training samples. We propose a family of methods to exploit the most valuable samples, validated by our comprehensive analysis of optimal data selection. The new strategy significantly reduces the training cost and extends a variety of existing distillation algorithms to larger and more diversified datasets; e.g., in some cases only 0.04% of the training data is sufficient for comparable distillation performance. Moreover, our strategy consistently enhances performance, which may open up new analyses of the dynamics of distillation and networks. Our method is able to extend distillation algorithms to much larger-scale and more heterogeneous datasets, e.g., ImageNet-1K and Kinetics-400. Our code will be made publicly available.
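The abstract does not say how sample utility is scored. Purely as a hypothetical illustration of the select-then-distill pattern, the sketch below ranks samples by a per-sample score and keeps the top fraction. The score could be, for example, the loss under some proxy model. An existing distillation algorithm would then run on that subset only. `select_critical` and the choice of score are assumptions, not the authors' criterion.

```python
def select_critical(scores, keep_fraction):
    """Return sorted indices of the highest-scoring samples (at least one)."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, int(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical per-sample scores, e.g. losses under a proxy model.
losses = [0.1, 2.0, 0.5, 3.0]
print(select_critical(losses, 0.5))
```

Returning sorted indices keeps the subset in dataset order, which is convenient if the distillation code expects a stable ordering.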

Translated: 2023-05-31 21:50:56 Published: 2023-05-28

# Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles

Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles ( http://arxiv.org/abs/2305.18380v1 )

License: see link

Utku Ayvaz, Chih-Hong Cheng, Hao Shen

While autonomous vehicles (AVs) may perform remarkably well in generic real-life cases, their irrational actions in some unforeseen cases lead to critical safety concerns. This paper introduces the concept of collaborative reinforcement learning (RL) to generate challenging test cases for AV planning and decision-making modules. One of the critical challenges for collaborative RL is the credit assignment problem: properly assigning rewards to multiple agents interacting in a traffic scenario, considering all parameters and timing, turns out to be non-trivial. To address this challenge, we propose a novel potential-based reward-shaping approach, inspired by counterfactual analysis, for solving the credit-assignment problem. The evaluation in a simulated environment demonstrates the superiority of our proposed approach over other methods that use local and global rewards.
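Potential-based reward shaping, in its standard form (Ng, Harada and Russell, 1999), adds F(s, s') = γΦ(s') − Φ(s) to the environment reward. This form leaves the optimal policy unchanged. The sketch below shows the generic form and checks its telescoping property. The potential values Φ are placeholders, since the abstract does not describe the counterfactual-inspired potential the authors use. The function names are hypothetical.

```python
def shaped_reward(r, phi_s, phi_next, gamma=0.99, terminal=False):
    """Environment reward r plus the shaping term gamma * Phi(s') - Phi(s).

    Phi of a terminal state is taken to be 0, the usual convention."""
    next_value = 0.0 if terminal else gamma * phi_next
    return r + next_value - phi_s

def discounted_shaping(phis, gamma):
    """Discounted sum of shaping terms along a non-terminal trajectory s_0..s_T."""
    total = 0.0
    for t in range(len(phis) - 1):
        total += gamma ** t * (gamma * phis[t + 1] - phis[t])
    return total

# Telescoping: the sum collapses to gamma**T * Phi(s_T) - Phi(s_0).
phis = [1.0, 4.0, 2.0]
print(discounted_shaping(phis, 0.5))  # -0.5, i.e. 0.25 * 2.0 - 1.0
```

Because the shaping terms telescope, they change how credit is spread over time steps without changing which policy is optimal. This property makes potential-based shaping attractive for credit assignment.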

Translated: 2023-05-31 21:50:38 Published: 2023-05-28

# On the impact of activation and normalization in obtaining isometric embeddings at initialization

On the impact of activation and normalization in obtaining isometric embeddings at initialization ( http://arxiv.org/abs/2305.18399v1 )

License: see link

Amir Joudaki, Hadi Daneshmand, Francis Bach

In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs. In several architectures, it has been observed that this Gram matrix becomes degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, play a pivotal role in preventing this rank collapse. Despite promising advances, the existing theoretical results (i) do not extend to layer normalization, which is widely used in transformers, and (ii) cannot characterize the bias of normalization quantitatively at finite depth. To bridge this gap, we prove that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards isometry at an exponential rate with depth at initialization. We quantify this rate using the Hermite expansion of the activation function, highlighting the importance of higher-order ($\ge 2$) Hermite coefficients in the bias towards isometry.
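To make the objects concrete, the sketch below computes the Gram matrix G_ij = ⟨h(x_i), h(x_j)⟩ of a batch of representations after layer normalization. Layer normalization centers each vector and scales it to unit variance, so every diagonal entry is (up to eps) equal to the width d. Any departure from isometry therefore shows up in the off-diagonal entries, here reported as cosine similarities. This is an illustrative helper only. It does not reproduce the paper's MLP, its activation layers, or its isometry measure, and the function names are hypothetical.

```python
import math

def layer_norm(h, eps=1e-5):
    """Centre a vector and scale it to unit variance (no learned gain or bias)."""
    d = len(h)
    mu = sum(h) / d
    var = sum((v - mu) ** 2 for v in h) / d
    return [(v - mu) / math.sqrt(var + eps) for v in h]

def gram(batch):
    """G[i][j] = <batch[i], batch[j]>."""
    return [[sum(a * b for a, b in zip(u, v)) for v in batch] for u in batch]

def cosine_similarities(G):
    """Normalize a Gram matrix so that its diagonal is 1."""
    n = len(G)
    return [[G[i][j] / math.sqrt(G[i][i] * G[j][j]) for j in range(n)] for i in range(n)]

batch = [layer_norm([1.0, 2.0, 3.0, 4.0]), layer_norm([4.0, 1.0, 0.0, 3.0])]
G = gram(batch)
print(G[0][0], G[1][1])            # both close to d = 4
print(cosine_similarities(G)[0][1])
```

A degenerate (rank-collapsed) Gram matrix would show off-diagonal cosines close to 1 for every pair. An isometric one has all pairs equally and weakly correlated.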

Translated: 2023-05-31 21:44:34 Published: 2023-05-28

# Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness?

Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness? ( http://arxiv.org/abs/2305.18398v1 )

License: see link

Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting

Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the web, they also reproduce inappropriate human behavior. Specifically, we demonstrate inappropriate degeneration at a large scale for various generative text-to-image models, motivating the need to monitor and moderate them at deployment. To this end, we evaluate mitigation strategies at inference time to suppress the generation of inappropriate content. Our findings show that we can use models' representations of the world's ugliness to align them with human preferences.

Translated: 2023-05-31 21:44:16 Published: 2023-05-28

# Prediction of the 2023 Turkish Presidential Election Results Using Social Media Data

Prediction of the 2023 Turkish Presidential Election Results Using Social Media Data ( http://arxiv.org/abs/2305.18397v1 )

License: see link

Aysun Bozanta, Fuad Bayrak, Ayse Basar

Social media platforms influence the way political campaigns are run and therefore they have become an increasingly important tool for politicians to directly interact with citizens. Previous elections in various countries have shown that social media data may significantly impact election results. In this study, we aim to predict the vote shares of parties participating in the 2023 elections in Turkey by combining social media data from various platforms together with traditional polling data. Our approach is a volume-based approach that considers the number of social media interactions rather than content. We compare several prediction models across varying time windows. Our results show that for all time windows, the ARIMAX model outperforms the other algorithms.

Translated: 2023-05-31 21:44:03 Published: 2023-05-28

# LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers

LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers ( http://arxiv.org/abs/2305.18396v1 )

License: see link

Xuanqi Liu and Zhuotao Liu

Prior works have attempted to build private inference frameworks for transformer-based large language models (LLMs) in a server-client setting, where the server holds the model parameters and the client inputs the private data for inference. However, these frameworks impose significant overhead when the private inputs are forward propagated through the original LLMs. In this paper, we show that substituting the computation- and communication-heavy operators in the transformer architecture with privacy-computing-friendly approximations can greatly reduce the private inference costs with minor impact on model performance. Compared to the state-of-the-art Iron (NeurIPS 2022), our privacy-computing-friendly model inference pipeline achieves a 5× acceleration in computation and an 80% reduction in communication overhead, while retaining nearly identical accuracy.
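As one illustration of a privacy-computing-friendly approximation, the sketch below replaces the exponential in softmax with a shifted quadratic (x + c)^2. The quadratic needs only additions and multiplications, which are much cheaper than exp under secure computation. This substitute follows the spirit of the "2Quad" approximation from MPCFormer. We do not know which approximations this paper actually uses, and the constant c = 5 is our assumption. The approximation preserves the ranking of scores only when all scores exceed −c.

```python
import math

def softmax(scores):
    """Standard softmax; the exponentials are the expensive part under MPC/HE."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def quad_softmax(scores, c=5.0):
    """Softmax with exp(x) replaced by (x + c)**2: no exponentials, one normalising division."""
    quads = [(s + c) ** 2 for s in scores]
    total = sum(quads)
    return [q / total for q in quads]

scores = [1.0, 2.0, 0.5]
print(softmax(scores))
print(quad_softmax(scores))  # flatter than softmax, same argmax here
```

A substitution like this changes the model's function, which is why such approaches typically fine-tune or distill the approximated model to recover accuracy.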

Translated: 2023-05-31 21:43:53 Published: 2023-05-28

# Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks ( http://arxiv.org/abs/2305.18395v1 )

License: see link

Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deploying LLMs in real-world applications can be challenging due to their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small language models (LMs) by fine-tuning them with labeled data or distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks due to the limited capacity of small LMs to memorize the required knowledge. Motivated by our theoretical analysis of memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales with augmented knowledge retrieved from an external knowledge base. Moreover, we propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and Flan-T5 models on the challenging knowledge-intensive reasoning datasets MedQA-USMLE and StrategyQA. Notably, our method enables the 250M models to outperform fine-tuned 3B models, which have 12 times more parameters, on both the MedQA-USMLE and StrategyQA benchmarks.

Translated: 2023-05-31 21:43:34 Published: 2023-05-28

# On Optimal Regularization Parameters via Bilevel Learning

On Optimal Regularization Parameters via Bilevel Learning ( http://arxiv.org/abs/2305.18394v1 )

License: see link

Matthias J. Ehrhardt, Silvia Gazzola and Sebastian J. Scott (Department of Mathematical Sciences, University of Bath, Bath, UK)

Variational regularization is commonly used to solve linear inverse problems; it augments a data-fidelity term with a regularizer. The regularizer is used to promote a priori information and is weighted by a regularization parameter. Selecting an appropriate regularization parameter is critical, since different choices can lead to very different reconstructions. Existing strategies such as the discrepancy principle and the L-curve can be used to determine a suitable parameter value, but in recent years a supervised machine learning approach called bilevel learning has been employed. Bilevel learning is a powerful framework for determining optimal parameters and involves solving a nested optimization problem. While previous strategies enjoy various theoretical results, the well-posedness of bilevel learning in this setting is still an active area of development. One necessary property is positivity of the determined regularization parameter. In this work, we provide a new condition that characterizes positivity of optimal regularization parameters better than the existing theory. Numerical results verify and explore this new condition for both small- and large-dimensional problems.
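A toy instance helps show what bilevel learning of a regularization parameter means. The sketch assumes Tikhonov regularization with a diagonal forward operator. The lower-level problem min_x ||Ax − y||² + λ||x||² then has the closed form x_i = a_i y_i / (a_i² + λ). The upper level picks λ to minimize the squared error against a known ground truth. Real bilevel learning averages over many training pairs and optimizes λ with gradient-based methods. The grid search, the toy data and all names here are simplifications.

```python
def lower_level(a, y, lam):
    """Closed-form Tikhonov solution for a diagonal operator: x_i = a_i y_i / (a_i^2 + lam).

    Assumes a_i^2 + lam > 0 for every component."""
    return [ai * yi / (ai * ai + lam) for ai, yi in zip(a, y)]

def upper_loss(a, y, x_true, lam):
    """Upper-level objective: squared reconstruction error against the ground truth."""
    x = lower_level(a, y, lam)
    return sum((xi - ti) ** 2 for xi, ti in zip(x, x_true))

def best_lambda(a, y, x_true, grid):
    """Grid-search stand-in for the upper-level optimization over lam."""
    return min(grid, key=lambda lam: upper_loss(a, y, x_true, lam))

# Noisy observations y of x_true = [1, 1] through A = I.
grid = [k * 0.01 for k in range(201)]  # lam in [0, 2]
lam_star = best_lambda([1.0, 1.0], [1.5, 0.5], [1.0, 1.0], grid)
print(round(lam_star, 2))  # 0.25
```

Here the optimum can be checked by hand. With s = 1/(1 + λ), the loss (1.5s − 1)² + (0.5s − 1)² is minimized at s = 0.8, i.e. λ = 0.25 > 0. If the observations were noise-free, the minimizer would instead be λ = 0, which is why positivity of the learned parameter needs a condition.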

Translated: 2023-05-31 21:43:12 Published: 2023-05-28

# Training Private Models That Know What They Don't Know

Training Private Models That Know What They Don't Know ( http://arxiv.org/abs/2305.18393v1 )

License: see link

Stephan Rabanser, Anvith Thudi, Abhradeep Thakurta, Krishnamurthy Dvijotham, Nicolas Papernot

Training reliable deep learning models which avoid making overconfident but incorrect predictions is a longstanding challenge. This challenge is further exacerbated when learning has to be differentially private: protection provided to sensitive data comes at the price of injecting additional randomness into the learning process. In this work, we conduct a thorough empirical investigation of selective classifiers (which can abstain when they are unsure) under a differential privacy constraint. We find that several popular selective prediction approaches are ineffective in a differentially private setting because they increase the risk of privacy leakage. At the same time, we identify that a recent approach that only uses checkpoints produced by an off-the-shelf private learning algorithm stands out as particularly suitable under DP. Further, we show that differential privacy does not just harm utility but also degrades selective classification performance. To analyze this effect across privacy levels, we propose a novel evaluation mechanism which isolates selective prediction performance across model utility levels. Our experimental results show that recovering the performance level attainable by non-private models is possible but comes at a considerable coverage cost as the privacy budget decreases.
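To make coverage and selective accuracy concrete, the sketch below implements the simplest selective classifier, softmax-response thresholding. The model predicts only when its confidence is at least a threshold and abstains otherwise. This is a generic baseline, not the checkpoint-based method the abstract singles out. The function name is hypothetical.

```python
def selective_metrics(confidences, correct, threshold):
    """Coverage and selective accuracy of a classifier that abstains below `threshold`."""
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(confidences)
    selective_accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, selective_accuracy

conf = [0.9, 0.6, 0.8, 0.3]            # max softmax probability per example
correct = [True, False, True, False]    # whether the top prediction was right
print(selective_metrics(conf, correct, 0.7))
```

Raising the threshold trades coverage for selective accuracy. The "coverage cost" in the abstract refers to how much coverage must be given up to reach a target accuracy.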

Translated: 2023-05-31 21:42:57 Published: 2023-05-28

# Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification

Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification ( http://arxiv.org/abs/2305.18392v1 )

License: see link

Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung

This paper proposes an improved Goodness of Pronunciation (GoP) that utilizes Uncertainty Quantification (UQ) for automatic speech intelligibility assessment for dysarthric speech. Current GoP methods rely heavily on neural network-driven overconfident predictions, which is unsuitable for assessing dysarthric speech due to its significant acoustic differences from healthy speech. To alleviate the problem, UQ techniques were used on GoP by 1) normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin) and 2) modifying the scoring function (scaling, prior normalization). As a result, prior-normalized maxlogit GoP achieves the best performance, with a relative increase of 5.66%, 3.91%, and 23.65% compared to the baseline GoP for English, Korean, and Tamil, respectively. Furthermore, phoneme analysis is conducted to identify which phoneme scores significantly correlate with intelligibility scores in each language.
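The four normalizations named above (entropy, margin, maxlogit, logit-margin) are standard uncertainty scores computed from a frame's phoneme logits. The sketch below computes them for one logit vector. Classic GoP averages the log posterior of the canonical phoneme over its frames. How the authors combine these scores with GoP, and the scaling and prior-normalization steps, are not specified in the abstract and are not shown. The function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def uncertainty_scores(logits):
    """Standard per-frame uncertainty scores from a vector of phoneme logits."""
    p = softmax(logits)
    top_p = sorted(p, reverse=True)
    top_z = sorted(logits, reverse=True)
    return {
        "entropy": -sum(pi * math.log(pi) for pi in p if pi > 0),
        "margin": top_p[0] - top_p[1],          # gap between the two largest probabilities
        "maxlogit": top_z[0],                   # largest raw logit, no softmax
        "logit_margin": top_z[0] - top_z[1],    # gap between the two largest logits
    }

print(uncertainty_scores([2.0, 1.0, 0.0]))
```

The logit-based scores skip the softmax, which tends to saturate and produce the overconfident posteriors the abstract describes.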

Translated: 2023-05-31 21:42:39 Published: 2023-05-28

# MemeGraphs: Linking Memes to Knowledge Graphs

MemeGraphs: Linking Memes to Knowledge Graphs ( http://arxiv.org/abs/2305.18391v1 )

ライセンス: Link先を確認

Vasiliki Kougia, Simon Fetzel, Thomas Kirchmair, Erion Çano, Sina Moayed Baharlou, Sahand Sharifzadeh, Benjamin Roth

Memes are a popular form of communicating trends and ideas in social media and on the internet in general, combining the modalities of images and text. They can express humor and sarcasm but can also have offensive content. Analyzing and classifying memes automatically is challenging since their interpretation relies on the understanding of visual elements, language, and background knowledge. Thus, it is important to meaningfully represent these sources and the interaction between them in order to classify a meme as a whole. In this work, we propose to use scene graphs, which express images in terms of objects and their visual relations, and knowledge graphs as structured representations for meme classification with a Transformer-based architecture. We compare our approach with ImgBERT, a multimodal model that uses only learned (instead of structured) representations of the meme, and observe consistent improvements. We further provide a dataset with human graph annotations that we compare to automatically generated graphs and entity linking. Analysis shows that automatic methods link more entities than human annotators and that automatically generated graphs are better suited for hatefulness classification in memes.

Translated: 2023-05-31 21:42:22, Published: 2023-05-28

# Emergent Modularity in Pre-trained Transformers

(http://arxiv.org/abs/2305.18390v1)

License: see link

Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes; (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, with each module working for its corresponding function. Given the enormous number of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, in which neurons specialized in a certain function are clustered. Moreover, perturbing the activations of functional experts significantly affects the corresponding function. Finally, we study how modularity emerges during pre-training and find that the modular structure stabilizes at an early stage, faster than individual neurons stabilize. This suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis.

Translated: 2023-05-31 21:42:03, Published: 2023-05-28

# Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

(http://arxiv.org/abs/2305.18411v1)

License: see link

Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan

We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data not only have identical loss curves but also agree in their point-wise test predictions throughout training. For simple tasks such as CIFAR-5m, this holds throughout training for networks of realistic widths. We also show that structural properties of the models, including internal representations, preactivation distributions, edge of stability phenomena, and large learning rate effects, are consistent across large widths. This motivates the hypothesis that phenomena seen in realistic models can be captured by infinite-width, feature-learning limits. For harder tasks (such as ImageNet and language modeling) and later training times, finite-width deviations grow systematically. Two distinct effects cause these deviations across widths. First, the network output has an initialization-dependent variance that scales inversely with width, which can be removed by ensembling networks. We observe, however, that ensembles of narrower networks perform worse than a single wide network. We call this the bias of narrower width. We conclude with a spectral perspective on the origin of this finite-width bias.
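A back-of-the-envelope way to see why ensembling removes only the first effect is the following. We assume, as a simplification not taken from the paper, that the $K$ ensemble members of width $N$ have independent initializations and that each output's initialization variance is about $c(x)/N$:

$$
\bar f_K(x) = \frac{1}{K}\sum_{k=1}^{K} f_k(x), \qquad
\operatorname{Var}_{\mathrm{init}}\!\left[f_k(x)\right] \approx \frac{c(x)}{N}
\;\Longrightarrow\;
\operatorname{Var}_{\mathrm{init}}\!\left[\bar f_K(x)\right] \approx \frac{c(x)}{N K},
\qquad
\mathbb{E}_{\mathrm{init}}\!\left[\bar f_K(x)\right] = \mathbb{E}_{\mathrm{init}}\!\left[f_k(x)\right].
$$

Averaging shrinks the variance term as $1/K$ but leaves the mean output unchanged. Any systematic width-dependent shift in that mean therefore survives ensembling, which is consistent with the "bias of narrower width" described above.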

Translated: 2023-05-31 21:35:44, Published: 2023-05-28

# Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data

(http://arxiv.org/abs/2305.18410v1)

License: see link

Mugariya Farooq, Shahad Hardan, Aigerim Zhumbhayeva, Yujia Zheng, Preslav Nakov, Kun Zhang

The need for more usable and explainable machine learning models in healthcare increases the importance of developing and utilizing causal discovery algorithms, which aim to discover causal relations by analyzing observational data. Explainable approaches aid clinicians and biologists in predicting the prognosis of diseases and suggesting proper treatments. However, very little research has been conducted at the crossroads between causal discovery, genomics, and breast cancer, and we aim to bridge this gap. Moreover, evaluating causal discovery methods on real data is notoriously difficult because ground-truth causal relations are usually unknown; accordingly, in this paper we also propose to address the evaluation problem with large language models. In particular, we exploit suitable causal discovery algorithms to investigate how various perturbations in the genome can affect the survival of patients diagnosed with breast cancer. We used three main causal discovery algorithms: PC, Greedy Equivalence Search (GES), and a Generalized Precision Matrix-based one. We experiment with a subset of The Cancer Genome Atlas, which contains information about mutations, copy number variations, protein levels, and gene expressions for 705 breast cancer patients. Our findings reveal important factors related to the vital status of patients using causal discovery algorithms. However, the reliability of these results remains a concern in the medical domain. Accordingly, as another contribution of the work, the results are validated through language models trained on biomedical literature, such as BlueBERT, and other large language models trained on medical corpora. Our results support the appropriate use of causal discovery algorithms and language models for revealing reliable causal relations in clinical applications.

Translated: 2023-05-31 21:35:09, Published: 2023-05-28

# Direction-oriented Multi-objective Learning: Simple and Provable Stochastic Algorithms

(http://arxiv.org/abs/2305.18409v1)

License: see link

Peiyao Xiao, Hao Ban, Kaiyi Ji

Multi-objective optimization (MOO) has become an influential framework in many machine learning problems with multiple objectives, such as learning with multiple criteria and multi-task learning (MTL). In this paper, we propose a new direction-oriented multi-objective problem by regularizing the common descent direction within a neighborhood of a direction that optimizes a linear combination of objectives, such as the average loss in MTL. This formulation includes GD and MGDA as special cases, enjoys the direction-oriented benefit as in CAGrad, and facilitates the design of stochastic algorithms. To solve this problem, we propose Stochastic Direction-oriented Multi-objective Gradient descent (SDMGrad) with simple SGD-type updates, and its variant SDMGrad-OS with efficient objective sampling for the setting where the number of objectives is large. For a constant-level regularization parameter $\lambda$, we show that SDMGrad and SDMGrad-OS provably converge to a Pareto stationary point with improved complexities and milder assumptions. For an increasing $\lambda$, this convergent point reduces to a stationary point of the linear combination of objectives. We demonstrate the superior performance of the proposed methods in a series of tasks on multi-task supervised learning and reinforcement learning. Code is provided at https://github.com/ml-opt-lab/sdmgrad.
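The abstract notes that the formulation contains MGDA as a special case. For two objectives, MGDA's common direction has a well-known closed form: the minimum-norm point in the convex hull of the two gradients. The sketch below shows only that special case. It is not SDMGrad, which additionally pulls the direction toward a target (e.g., the average-loss gradient) via $\lambda$ and uses stochastic estimates; the function name is hypothetical.

```python
def mgda_two_task_direction(g1, g2):
    """Min-norm point in the convex hull of two gradients (two-task MGDA).

    Solves min_{a in [0, 1]} ||a*g1 + (1-a)*g2||^2 in closed form and returns
    (a, d), where d = a*g1 + (1-a)*g2. For the min-norm point, d . g_i >= ||d||^2,
    so a step along -d does not increase either objective to first order.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = sum(v * v for v in diff)
    if denom == 0.0:          # identical gradients: any weight gives the same d
        a = 0.5
    else:
        # unconstrained minimizer a* = (g2 - g1) . g2 / ||g1 - g2||^2, clipped to [0, 1]
        num = sum((y - x) * y for x, y in zip(g1, g2))
        a = min(1.0, max(0.0, num / denom))
    d = [a * x + (1.0 - a) * y for x, y in zip(g1, g2)]
    return a, d


print(mgda_two_task_direction([1.0, 0.0], [0.0, 1.0]))  # → (0.5, [0.5, 0.5])
```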

Translated: 2023-05-31 21:34:20, Published: 2023-05-28

# A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining

(http://arxiv.org/abs/2305.18407v1)

License: see link

Shengchao Liu, Weitao Du, Zhiming Ma, Hongyu Guo, Jian Tang

Molecule pretraining has quickly become the go-to scheme to boost the performance of AI-based drug discovery. Naturally, molecules can be represented as 2D topological graphs or 3D geometric point clouds. Although most existing pretraining methods focus on a single modality, recent research has shown that maximizing the mutual information (MI) between these two modalities enhances the molecule representation ability. Meanwhile, existing molecule multi-modal pretraining approaches approximate MI based on the representation space encoded from the topology and geometry, thus losing critical structural information of molecules. To address this issue, we propose MoleculeSDE. MoleculeSDE leverages group symmetric (e.g., SE(3)-equivariant and reflection-antisymmetric) stochastic differential equation models to generate the 3D geometries from 2D topologies, and vice versa, directly in the input space. It not only obtains a tighter MI bound but also enables more downstream tasks than previous work. By comparing with 17 pretraining baselines, we empirically verify that MoleculeSDE can learn an expressive representation with state-of-the-art performance on 26 out of 32 downstream tasks.

Translated: 2023-05-31 21:33:48, Published: 2023-05-28

# A machine learning approach to the prediction of heat-transfer coefficients in micro-channels

(http://arxiv.org/abs/2305.18406v1)

License: see link

Tullio Traverso, Francesco Coletti, Luca Magri, Tassos G. Karayiannis, Omar K. Matar

The accurate prediction of the two-phase heat transfer coefficient (HTC) as a function of working fluids, channel geometries and process conditions is key to the optimal design and operation of compact heat exchangers. Advances in artificial intelligence research have recently boosted the application of machine learning (ML) algorithms to obtain data-driven surrogate models for the HTC. For most supervised learning algorithms, the task is a nonlinear regression problem. Although these models have been shown to outperform traditional empirical correlations, they have key limitations such as overfitting the data, the lack of uncertainty estimation, and limited interpretability of the results. To address these limitations, in this paper, we use a multi-output Gaussian process regression (GPR) to estimate the HTC in microchannels as a function of the mass flow rate, heat flux, system pressure and channel diameter and length. The model is trained using the Brunel Two-Phase Flow database of high-fidelity experimental data. The advantages of GPR are data efficiency, the small number of hyperparameters to be trained (typically of the same order as the number of input dimensions), and the automatic trade-off between data fit and model complexity guaranteed by the maximization of the marginal likelihood (Bayesian approach). Our paper proposes research directions to improve the performance of the GPR-based model in extrapolation.
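To show where the uncertainty estimate in GPR comes from, here is a minimal single-output GP regression in pure Python. It uses a 1D input, a squared-exponential (RBF) kernel, and fixed hyperparameters. The paper's model is multi-output, takes five inputs, and fits its hyperparameters by maximizing the marginal likelihood, none of which is reproduced here; all names are illustrative.

```python
import math


def rbf(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x


def gp_predict(x_train, y_train, x_test, noise=1e-6, **kern):
    """Posterior mean and variance of a zero-mean GP at one test input."""
    K = [[rbf(a, b, **kern) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    k_star = [rbf(x_test, a, **kern) for a in x_train]
    alpha = solve(K, y_train)                       # (K + noise*I)^{-1} y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                            # (K + noise*I)^{-1} k_*
    var = rbf(x_test, x_test, **kern) - sum(k * w for k, w in zip(k_star, v))
    return mean, var
```

Near the training data the predictive variance collapses toward the noise level. Far away it reverts to the prior variance, which makes extrapolation visibly uncertain rather than silently overconfident.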

Translated: 2023-05-31 21:33:27, Published: 2023-05-28

# Dink-Net: Neural Clustering on Large Graphs

(http://arxiv.org/abs/2305.18405v1)

License: see link

Yue Liu, Ke Liang, Jun Xia, Sihang Zhou, Xihong Yang, Xinwang Liu, Stan Z. Li

Deep graph clustering, which aims to group the nodes of a graph into disjoint clusters with deep neural networks, has achieved promising progress in recent years. However, existing methods fail to scale to large graphs with millions of nodes. To solve this problem, a scalable deep graph clustering method (Dink-Net) is proposed based on the idea of dilation and shrink. First, representations are learned in a self-supervised manner by discriminating whether nodes have been corrupted by augmentations. Meanwhile, the cluster centres are initialized as learnable neural parameters. Subsequently, the clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss in an adversarial manner. With these settings, we unify the two-step clustering, i.e., representation learning and clustering optimization, into an end-to-end framework, guiding the network to learn clustering-friendly features. Besides, Dink-Net scales well to large graphs since the designed loss functions can optimize the clustering distribution on mini-batch data without performance drops. Both experimental results and theoretical analyses demonstrate the superiority of our method. Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges. The source code is released at https://github.com/yueliu1999/Dink-Net. In addition, a collection (papers, code, and datasets) of deep graph clustering is shared at https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering.

Translated: 2023-05-31 21:33:04, Published: 2023-05-28

# Conformal Prediction with Large Language Models for Multi-Choice Question Answering

(http://arxiv.org/abs/2305.18404v1)

License: see link

Bhawesh Kumar, Charlie Lu, Gauri Gupta, Anil Palepu, David Bellamy, Ramesh Raskar, Andrew Beam

As large language models continue to be widely developed, robust uncertainty quantification techniques will become crucial for their safe deployment in high-stakes scenarios. In this work, we explore how conformal prediction can be used to provide uncertainty quantification in language models for the specific task of multiple-choice question-answering. We find that the uncertainty estimates from conformal prediction are tightly correlated with prediction accuracy. This observation can be useful for downstream applications such as selective classification and filtering out low-quality predictions. We also investigate whether the exchangeability assumption required by conformal prediction holds for out-of-subject questions, which may be a more realistic scenario for many practical applications. Our work contributes towards more trustworthy and reliable usage of large language models in safety-critical situations, where robust guarantees on error rates are required.
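For readers new to the technique, here is a minimal sketch of standard split conformal prediction for multiple choice. It assumes the model's softmax probabilities over answer options are available, uses the nonconformity score 1 - p(true answer), and labels the options "ABCD". The paper's exact score function and calibration split may differ, and the function names are illustrative.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from held-out calibration questions.

    Nonconformity score: s = 1 - p(true answer). Returns the
    ceil((n + 1) * (1 - alpha))-th smallest score (inf if that rank exceeds n).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf       # too few calibration points for this alpha
    return scores[rank - 1]


def prediction_set(probs, qhat, options="ABCD"):
    """All answer options whose score 1 - p is within the threshold."""
    return [opt for opt, p in zip(options, probs) if 1.0 - p <= qhat]
```

Under exchangeability between calibration and test questions, the returned set contains the true answer with probability at least 1 - alpha. Larger sets signal higher uncertainty, which is why set size can serve as an uncertainty estimate. The exchangeability check the abstract mentions is exactly what breaks when calibration and test questions come from different subjects.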

Translated: 2023-05-31 21:32:39, Published: 2023-05-28

# Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

(http://arxiv.org/abs/2305.18403v1)

License: see link

Mingyang Zhang, Haozhen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang

Large pre-trained models (LPMs), such as LLaMA and ViT-G, have shown exceptional performance across various tasks. Although parameter-efficient fine-tuning (PEFT) has emerged as a way to cheaply fine-tune these large models on downstream tasks, their deployment is still hindered by the vast model scale and computational costs. Neural network pruning offers a solution for model compression by removing redundant parameters, but most existing methods rely on computing parameter gradients. However, obtaining these gradients is computationally prohibitive for LPMs, which necessitates the exploration of alternative approaches. To this end, we propose a unified framework for efficient fine-tuning and deployment of LPMs, termed LoRAPrune. We first design a PEFT-aware pruning criterion, which utilizes the values and gradients of Low-Rank Adaptation (LoRA), rather than the gradients of pre-trained parameters, for importance estimation. We then propose an iterative pruning procedure to remove redundant parameters while maximizing the advantages of PEFT. Thus, our LoRAPrune delivers an accurate, compact model for efficient inference in a highly cost-effective manner. Experimental results on various tasks demonstrate that our method achieves state-of-the-art results. For instance, on the VTAB-1k benchmark, LoRAPrune utilizes only 0.76% of the trainable parameters and outperforms magnitude and movement pruning methods by a significant margin, achieving a mean Top-1 accuracy that is 5.7% and 4.3% higher, respectively. Moreover, our approach achieves comparable performance to PEFT methods, highlighting its efficacy in delivering high-quality results while benefiting from the advantages of pruning.

Translated: 2023-05-31 21:32:26, Published: 2023-05-28

# Neural Sculpting: Uncovering hierarchically modular task structure through pruning and network analysis

(http://arxiv.org/abs/2305.18402v1)

License: see link

Shreyas Malakarjun Patil, Loizos Michael, Constantine Dovrolis

Natural target functions and tasks typically exhibit hierarchical modularity: they can be broken down into simpler sub-functions that are organized in a hierarchy. Such sub-functions have two important features: they have a distinct set of inputs (input-separability) and they are reused as inputs higher in the hierarchy (reusability). Previous studies have established that hierarchically modular neural networks, which are inherently sparse, offer benefits such as learning efficiency, generalization, multi-task learning, and transferability. However, identifying the underlying sub-functions and their hierarchical structure for a given task can be challenging. The high-level question in this work is: if we learn a task using a sufficiently deep neural network, how can we uncover the underlying hierarchy of sub-functions in that task? As a starting point, we examine the domain of Boolean functions, where it is easier to determine whether a task is hierarchically modular. We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference. Finally, we demonstrate that this method can uncover the hierarchical modularity of a wide range of Boolean functions and two vision tasks based on the MNIST digits dataset.

Translated: 2023-05-31 21:31:54, Published: 2023-05-28

# A Meta-learning Framework for Tuning Parameters of Protection Mechanisms in Trustworthy Federated Learning

(http://arxiv.org/abs/2305.18400v1)

License: see link

Xiaojin Zhang, Yan Kang, Lixin Fan, Kai Chen, Qiang Yang

Trustworthy Federated Learning (TFL) typically leverages protection mechanisms to guarantee privacy. However, protection mechanisms inevitably introduce utility loss or efficiency reduction while protecting data privacy. Therefore, protection mechanisms and their parameters should be carefully chosen to strike an optimal tradeoff between *privacy leakage*, *utility loss*, and *efficiency reduction*. To this end, federated learning practitioners need tools to measure the three factors and optimize the tradeoff between them to choose the protection mechanism that is most appropriate to the application at hand. Motivated by this requirement, we propose a framework that (1) formulates TFL as a problem of finding a protection mechanism to optimize the tradeoff between privacy leakage, utility loss, and efficiency reduction and (2) formally defines bounded measurements of the three factors. We then propose a meta-learning algorithm to approximate this optimization problem and find optimal protection parameters for representative protection mechanisms, including Randomization, Homomorphic Encryption, Secret Sharing, and Compression. We further design estimation algorithms to quantify the optimal protection parameters found in a practical horizontal federated learning setting and provide a theoretical analysis of the estimation error.

Translated: 2023-05-31 21:31:33, Published: 2023-05-28

# HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts

(http://arxiv.org/abs/2305.18421v1)

License: see link

Shaokun Zhang, Yiran Wu, Zhonghua Zheng, Qingyun Wu, Chi Wang

In this work, we propose a hyperparameter optimization method named *HyperTime* to find hyperparameters robust to potential temporal distribution shifts in the unseen test data. Our work is motivated by an important observation: in many cases, it is possible to achieve temporally robust predictive performance via hyperparameter optimization. Based on this observation, we leverage the "worst-case-oriented" philosophy from the robust optimization literature to help find such robust hyperparameter configurations. HyperTime imposes a lexicographic priority order on average validation loss and worst-case validation loss over chronological validation sets. We perform a theoretical analysis of the upper bound of the expected test loss, which reveals the unique advantages of our approach. We also demonstrate the strong empirical performance of the proposed method on multiple machine learning tasks with temporal distribution shifts.
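A lexicographic priority order over two losses can be sketched as a selection rule over already-evaluated configurations. The sketch assumes, for illustration only, that the average validation loss is the primary objective (kept within a `tolerance` of the best) and that the worst-case fold loss breaks the remaining near-ties. The abstract does not state HyperTime's exact order, tolerances, or search procedure, so swap the keys if the order is the other way round. The function name and data layout are hypothetical.

```python
def lexicographic_best(configs, tolerance=0.0):
    """Pick a configuration by a two-level lexicographic preference.

    `configs` is a list of (name, fold_losses), where fold_losses are
    validation losses on chronologically ordered folds.
    ASSUMED order: (1) average loss within `tolerance` of the best average,
    then (2) lowest worst-case (max) fold loss among those survivors.
    """
    scored = [(name, sum(ls) / len(ls), max(ls)) for name, ls in configs]
    best_avg = min(avg for _, avg, _ in scored)
    shortlist = [s for s in scored if s[1] <= best_avg + tolerance]
    # among near-ties on the primary objective, minimize worst-case, then average
    return min(shortlist, key=lambda s: (s[2], s[1]))[0]


configs = [("a", [0.10, 0.30]), ("b", [0.18, 0.24]), ("c", [0.50, 0.50])]
print(lexicographic_best(configs, tolerance=0.02))  # → b
```

With zero tolerance the rule reduces to plain average-loss selection ("a"). A small tolerance lets a configuration with a slightly worse average but a much better worst fold ("b") win, which illustrates the robustness trade-off the abstract describes.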

Translated: 2023-05-31 21:25:50, Published: 2023-05-28

# Sample Complexity of Variance-reduced Distributionally Robust Q-learning

(http://arxiv.org/abs/2305.18420v1)

License: see link

Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou

Dynamic decision making under distributional shifts is of fundamental interest in the theory and applications of reinforcement learning: the distribution of the environment on which the data is collected can differ from that of the environment on which the model is deployed. This paper presents two novel model-free algorithms, namely distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts. These algorithms are designed to efficiently approximate the $q$-function of an infinite-horizon $\gamma$-discounted robust Markov decision process with a Kullback-Leibler uncertainty set, to an entry-wise $\epsilon$-degree of precision. Further, the variance-reduced distributionally robust Q-learning combines synchronous Q-learning with variance-reduction techniques to enhance its performance. Consequently, we establish that it attains a minimax sample complexity upper bound of $\tilde O(|S||A|(1-\gamma)^{-4}\epsilon^{-2})$, where $S$ and $A$ denote the state and action spaces. This is the first complexity result that is independent of the uncertainty size $\delta$, thereby providing new complexity-theoretic insights. Additionally, a series of numerical experiments confirm the theoretical findings and the efficiency of the algorithms in handling distributional shifts.
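In a KL-robust MDP, each Bellman backup replaces the nominal expectation of next-state values with a worst case over a KL ball of radius $\delta$ around the nominal transition distribution. A standard way to compute that inner worst case is its one-dimensional dual, $\inf_{P:\,\mathrm{KL}(P\|P_0)\le\delta} \mathbb{E}_P[V] = \sup_{\beta>0} \{-\beta \log \mathbb{E}_{P_0}[e^{-V/\beta}] - \beta\delta\}$. This is a well-known identity, not specific to this paper. The sketch below evaluates it for a known discrete $P_0$. The paper's algorithms instead work from samples and add variance reduction, which is omitted here, and the function name is illustrative.

```python
import math


def kl_robust_expectation(values, probs, delta, iters=200):
    """Worst-case E_P[V] over {P : KL(P || P0) <= delta} via its scalar dual.

    Maximizes  g(beta) = -beta*log E_{P0}[exp(-V/beta)] - beta*delta  by ternary
    search over log(beta); g is concave in beta, hence unimodal in log(beta).
    """
    if delta <= 0.0:
        return sum(p * v for p, v in zip(probs, values))
    v_min = min(v for v, p in zip(values, probs) if p > 0.0)

    def dual(log_beta):
        beta = math.exp(log_beta)
        # log-mean-exp shifted by v_min so the exponentials cannot overflow
        s = sum(p * math.exp(-(v - v_min) / beta)
                for v, p in zip(values, probs) if p > 0.0)
        return v_min - beta * math.log(s) - beta * delta

    lo, hi = -10.0, 10.0          # search range for log(beta), an assumption
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return dual((lo + hi) / 2.0)
```

The result always lies between the smallest reachable value and the nominal mean, and it shrinks toward the former as $\delta$ grows. In a robust Q-learning update this quantity would stand in for the nominal $\mathbb{E}[\max_a Q(s', a)]$ term.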

Translated: 2023-05-31 21:25:35, Published: 2023-05-28

# Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

(http://arxiv.org/abs/2305.18419v1)

License: see link

W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath

We propose a method of segmenting long-form speech by separating semantically complete sentences within the utterance. This prevents the ASR decoder from needlessly processing faraway context while also preventing it from missing relevant context within the current sentence. Semantically complete sentence boundaries are typically demarcated by punctuation in written text, but unfortunately, spoken real-world utterances rarely contain punctuation. We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text. We compare our segmenter, which is distilled from the LM teacher, against a segmenter distilled from an acoustic-pause-based teacher used in other works, on a streaming ASR pipeline. The pipeline with our segmenter achieves a 3.2% relative WER gain along with a 60 ms median end-of-segment latency reduction on a YouTube captioning task.

翻訳日:2023-05-31 21:25:09 公開日:2023-05-28

# ビデオ連続学習のための時間情報の再検討

Just a Glimpse: Rethinking Temporal Information for Video Continual Learning ( http://arxiv.org/abs/2305.18418v1 )

ライセンス: Link先を確認

Lama Alssum, Juan Leon Alcazar, Merey Ramazanova, Chen Zhao, Bernard Ghanem

(参考訳) クラス増分学習は、実世界の応用シナリオによく似ているため、継続学習の研究において最も重要な設定の1つである。メモリサイズが制約されている場合、クラスやタスクの数が増えるにつれて壊滅的忘却が生じる。ビデオデータは大量のフレームを含み、リプレイメモリにより大きな負担をかけるため、ビデオ領域での継続学習はさらに多くの課題をもたらす。現在の一般的な手法は、ビデオストリームからフレームをサブサンプリングしてリプレイメモリに格納するものである。本稿では、個々の(単一の)フレームに基づく効果的なビデオ継続学習のための新しいリプレイ機構SMILEを提案する。広範な実験により、極端なメモリ制約の下では、ビデオの多様性が時間的情報よりも重要な役割を果たすことを示す。そこで本手法は、多数の固有のビデオを表す少数のフレームから学習することに焦点を当てる。3つの代表的なビデオデータセットKinetics、UCF101、ActivityNetにおいて、提案手法は最先端の性能を達成し、従来の最先端手法を最大21.49%上回った。

Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles real-world application scenarios. With constrained memory sizes, catastrophic forgetting arises as the number of classes/tasks increases. Studying continual learning in the video domain poses even more challenges, as video data contains a large number of frames, which places a higher burden on the replay memory. The current common practice is to sub-sample frames from the video stream and store them in the replay memory. In this paper, we propose SMILE, a novel replay mechanism for effective video continual learning based on individual/single frames. Through extensive experimentation, we show that under extreme memory constraints, video diversity plays a more significant role than temporal information. Therefore, our method focuses on learning from a small number of frames that represent a large number of unique videos. On three representative video datasets, Kinetics, UCF101, and ActivityNet, the proposed method achieves state-of-the-art performance, outperforming the previous state-of-the-art by up to 21.49%.
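A minimal sketch of the "many videos, few frames" idea under a fixed replay budget: keep one randomly chosen frame per video, and use reservoir sampling so that the stored videos remain a uniform sample of all videos seen so far. `SingleFrameReplay` is a hypothetical class; SMILE's actual frame selection and training procedure may differ.

```python
import random


class SingleFrameReplay:
    """Replay memory that keeps at most one frame per video, so a fixed
    frame budget covers as many distinct videos as possible."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = []   # list of (video_id, frame, label) tuples
        self.seen = 0      # number of videos offered so far
        self.rng = random.Random(seed)

    def add_video(self, video_id, frames, label):
        frame = self.rng.choice(frames)  # store a single random frame only
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append((video_id, frame, label))
        else:
            # Reservoir sampling: each video is kept with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = (video_id, frame, label)
```

Compared with storing several frames per video, the same budget here holds `capacity` distinct videos, which is the trade-off the abstract argues for under extreme memory constraints.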

翻訳日:2023-05-31 21:24:53 公開日:2023-05-28

# グリッドコード上の行列式点過程アテンションによる分布外汎化の支援

Determinantal Point Process Attention Over Grid Codes Supports Out of Distribution Generalization ( http://arxiv.org/abs/2305.18417v1 )

ライセンス: Link先を確認

Shanka Subhra Mondal, Steven Frankland, Taylor Webb, and Jonathan D. Cohen

(参考訳) ディープニューラルネットワークは、人間のような知能を模倣するうえで大きな進歩を遂げており、知能が依存する複雑な計算問題を脳がどのように解いているのかを理解する手段としても、ますます用いられている。しかし、これらのネットワークは人間が持つ強力な汎化能力にはまだ及ばず、それゆえ脳がそのような汎化をどのように支えているかについての洞察も与えられていない。そのようなケースの1つが分布外(OOD)汎化、すなわち訓練セットの分布の外にあるテスト例に対して良好な性能を示すことである。ここでは、この能力に寄与しうる脳内処理の特性を特定する。本稿では、神経計算の特定の特徴を利用してOOD汎化を実現する2部構成のアルゴリズムを述べ、2つの難しい認知タスクでの性能評価により概念実証を行う。第一に、哺乳類の脳がグリッド状の表現(例えば嗅内皮質における)を用いて計量空間を表現するという事実を利用する。これは、表現空間を覆う繰り返しモチーフとして組織された、関係構造の抽象的表現である。第二に、行列式点過程(DPP-A)を用いて、これらのグリッド表現上で動作する注意機構を提案する。これは、その空間の被覆において最大の疎性を保証する変換である。標準的なタスク最適化誤差とDPP-Aを組み合わせた損失関数は、グリッドコードの繰り返しモチーフを活用でき、一般的なアーキテクチャと統合することで、類推および算術タスクにおいて強力なOOD汎化性能を達成できることを示す。これは、哺乳類の脳におけるグリッドコードが汎化性能にどのように寄与しうるかの解釈を与えると同時に、人工ニューラルネットワークにおけるそのような能力を向上させる潜在的な手段も提供する。

Deep neural networks have made tremendous gains in emulating human-like intelligence, and have been used increasingly as ways of understanding how the brain may solve the complex computational problems on which this relies. However, these networks still fall short of the strong forms of generalization of which humans are capable, and therefore fail to provide insight into how the brain supports them. One such case is out-of-distribution (OOD) generalization: successful performance on test examples that lie outside the distribution of the training set. Here, we identify properties of processing in the brain that may contribute to this ability. We describe a two-part algorithm that draws on specific features of neural computation to achieve OOD generalization, and provide a proof of concept by evaluating performance on two challenging cognitive tasks. First, we draw on the fact that the mammalian brain represents metric spaces using grid-like representations (e.g., in entorhinal cortex): abstract representations of relational structure, organized in recurring motifs that cover the representational space. Second, we propose an attentional mechanism that operates over these grid representations using a determinantal point process (DPP-A), a transformation that ensures maximum sparseness in the coverage of that space. We show that a loss function that combines standard task-optimized error with DPP-A can exploit the recurring motifs in grid codes, and can be integrated with common architectures to achieve strong OOD generalization performance on analogy and arithmetic tasks. This provides both an interpretation of how grid codes in the mammalian brain may contribute to generalization performance, and at the same time a potential means for improving such capabilities in artificial neural networks.
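A determinantal point process scores a subset $S$ of items by $\det(L_S)$, the determinant of the kernel restricted to $S$, which is large when the selected items are mutually dissimilar; this is the diversity pressure the abstract relies on. The sketch below is a generic greedy MAP selection for a DPP with a linear kernel, in plain Python. It is not the paper's DPP-A attention over grid codes, and all names are hypothetical.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0  # singular: the chosen items are linearly dependent
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def linear_kernel(items):
    """PSD similarity kernel L[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(x, y)) for y in items] for x in items]


def greedy_dpp_map(kernel, k):
    """Greedily add the item that maximizes det(L_S) of the selected set S."""
    selected = []
    for _ in range(k):
        best, best_det = None, -1.0
        for i in range(len(kernel)):
            if i in selected:
                continue
            s = selected + [i]
            d = det([[kernel[a][b] for b in s] for a in s])
            if d > best_det:
                best, best_det = i, d
        selected.append(best)
    return selected
```

With two identical items and one orthogonal item, the greedy step skips the duplicate because adding it makes $\det(L_S) = 0$.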

翻訳日:2023-05-31 21:24:35 公開日:2023-05-28

# インメモリコンピューティングにおける多様なハードウェアノイズを軽減するバッチノルム最適化の役割と限界の検討

Examining the Role and Limits of Batchnorm Optimization to Mitigate Diverse Hardware-noise in In-memory Computing ( http://arxiv.org/abs/2305.18416v1 )

ライセンス: Link先を確認

Abhiroop Bhattacharjee, Abhishek Moitra, Youngeun Kim, Yeshwanth Venkatesha, and Priyadarshini Panda

(参考訳) アナログクロスバーなどのインメモリコンピューティング(IMC)プラットフォームは、低精度ディープニューラルネットワーク(DNN)を高い面積効率・計算効率で高速化できることから注目を集めている。しかし、クロスバーに固有の非理想性は、しばしば非決定的かつ非線形であり、デプロイされたDNNの性能を低下させる。量子化誤差に加えて、推論中に最も頻繁に遭遇する非理想性には、クロスバー回路レベルの寄生抵抗や、確率的な読み出しノイズや時間的ドリフトといったデバイスレベルの非理想性が含まれる。本研究の目的は、これらの非理想性がアナログクロスバーにおける内積演算にもたらす歪みを詳細に調べ、非理想性の影響を軽減するために、バッチノルムパラメータをクロスバーを考慮してリアルタイムに微調整する、ほぼ学習不要な解決策の実現可能性を探ることである。これにより、クロスバー上のDNN重みをIMCノイズを考慮して再学習する場合に比べて、メモリおよび学習エネルギーの観点からハードウェアコストを削減できる。

In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area- & compute-efficiencies. However, the intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs. In addition to quantization errors, the most frequently encountered non-idealities during inference include crossbar circuit-level parasitic resistances and device-level non-idealities such as stochastic read noise and temporal drift. In this work, our goal is to closely examine the distortions caused by these non-idealities on the dot-product operations in analog crossbars and explore the feasibility of a nearly training-less solution via crossbar-aware fine-tuning of batchnorm parameters in real-time to mitigate the impact of the non-idealities. This reduces the memory and training-energy costs that IMC noise-aware retraining of the DNN weights on crossbars would otherwise incur.
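A simplified illustration of why batchnorm parameters can absorb some hardware non-idealities: if a crossbar adds a gain error, an offset and read noise to a layer's pre-activations, re-estimating the normalization statistics on the hardware outputs restores zero mean and unit variance without touching the weights. The noise model below is hypothetical, and the paper fine-tunes batchnorm parameters rather than only re-estimating statistics as this sketch does.

```python
import random
import statistics


def batchnorm(xs, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Inference-mode batchnorm with fixed statistics and affine parameters."""
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in xs]


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Hypothetical crossbar non-idealities: gain error 0.8, offset 0.3, read noise.
noisy = [0.8 * x + 0.3 + rng.gauss(0.0, 0.1) for x in clean]

# Statistics estimated on clean (software) activations.
mu_clean, var_clean = statistics.fmean(clean), statistics.pvariance(clean)
# Statistics re-estimated on the noisy hardware outputs; weights stay untouched.
mu_hw, var_hw = statistics.fmean(noisy), statistics.pvariance(noisy)

stale = batchnorm(noisy, mu_clean, var_clean)  # keeps the ~0.3 offset
recal = batchnorm(noisy, mu_hw, var_hw)        # mean ~0, variance ~1 again
```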

翻訳日:2023-05-31 21:24:06 公開日:2023-05-28

# 幾何代数トランスフォーマー

Geometric Algebra Transformers ( http://arxiv.org/abs/2305.18415v1 )

ライセンス: Link先を確認

Johann Brehmer, Pim de Haan, Sönke Behrends, Taco Cohen

(参考訳) 幾何学的データを扱う問題は、コンピュータビジョン、ロボティクス、化学、物理学など様々な分野で生じる。このようなデータは点、方向ベクトル、平面、変換など多くの形式をとりうるが、これまでのところ、その対称性を尊重しつつ、このように多様な幾何学的タイプに適用できる単一のアーキテクチャは存在しない。本稿では、幾何学的データのための汎用アーキテクチャであるGeometric Algebra Transformer (GATr)を紹介する。GATrは入力、出力、隠れ状態を射影幾何代数で表現する。射影幾何代数は、一般的な幾何学的対象とそれらに作用する演算子の効率的な16次元ベクトル空間表現を提供する。GATrは、3次元ユークリッド空間の対称性群であるE(3)に対して同変である。トランスフォーマーとして、GATrはスケーラブルで表現力が高く、汎用性に富む。n体モデリングとロボット計画の実験において、GATrは非幾何学的ベースラインに対して大幅な改善を示した。

Problems involving geometric data arise in a variety of fields, including computer vision, robotics, chemistry, and physics. Such data can take numerous forms, such as points, direction vectors, planes, or transformations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric algebra, which offers an efficient 16-dimensional vector space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a transformer, GATr is scalable, expressive, and versatile. In experiments with n-body modeling and robotic planning, GATr shows strong improvements over non-geometric baselines.
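The 16 dimensions come from the projective geometric algebra $\mathbb{G}_{3,0,1}$, which has four generators ($e_0$ with $e_0^2 = 0$, and $e_1, e_2, e_3$ squaring to $+1$) and therefore $2^4 = 16$ basis blades. The short sketch below only enumerates those blades by grade; it does not implement GATr's equivariant layers, and the helper name is hypothetical.

```python
from itertools import combinations

# Generator indices of G(3,0,1): e0 is the degenerate direction (e0^2 = 0),
# while e1, e2, e3 square to +1.
GENERATORS = ("0", "1", "2", "3")


def blades_by_grade(generators=GENERATORS):
    """Every basis blade (product of distinct generators), grouped by grade."""
    by_grade = {}
    for grade in range(len(generators) + 1):
        by_grade[grade] = [
            "e" + "".join(combo) if combo else "1"
            for combo in combinations(generators, grade)
        ]
    return by_grade


blades = blades_by_grade()
# Flatten in grade order: scalar, 4 vectors, 6 bivectors, 4 trivectors, pseudoscalar.
basis = [b for grade in sorted(blades) for b in blades[grade]]
```

In the common plane-based reading of this algebra, grade-1 blades carry planes, grade-2 blades lines and grade-3 blades points, which is how a single 16-component multivector can hold any of these objects.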

翻訳日:2023-05-31 21:23:48 公開日:2023-05-28

# StEik: ニューラル符号付き距離関数の最適化の安定化とより精細な形状表現

StEik: Stabilizing the Optimization of Neural Signed Distance Functions and Finer Shape Representation ( http://arxiv.org/abs/2305.18414v1 )

ライセンス: Link先を確認

Huizong Yang, Yuxin Sun, Ganesh Sundaramoorthi, Anthony Yezzi

(参考訳) 形状の暗黙的ニューラル表現(INR)を学習するための新たな知見と新しいパラダイム(StEik)を提案する。特に、INRに符号付き距離関数の制約を課すためによく用いられるエイコナール損失に光を当てる。ネットワークの表現力が増すにつれて、最適化は連続極限において不安定な偏微分方程式(PDE)に近づくことを解析的に示す。この不安定性が既存のネットワーク最適化において現れ、再構成された表面の不規則性や最適でない局所最小値への収束を引き起こし、その結果、細かな幾何学的・位相的構造を捉えられなくなりうることを示す。現在文献で別の目的のために損失に追加されている他の項が、実際にはこれらの不安定性を除去できることを解析的に示す。しかし、そのような項は表面を過剰に正則化し、細かな形状の詳細の表現を妨げうる。連続極限に関する同様のPDE理論に基づき、エイコナール不安定性を打ち消しつつも過剰な正則化を行わない新しい正則化項を導入する。さらに、連続極限において安定性が保証されるようになったため、この安定化により、より細かな形状の詳細を表現できる新しいネットワーク構造も検討できるようになる。そのような構造として、二次層に基づくものを導入する。複数のベンチマークデータセットでの実験により、我々の新しい正則化とネットワークが、既存の最先端手法よりも精密な形状の詳細と正確なトポロジーを捉えられることが示された。

We present new insights and a novel paradigm (StEik) for learning implicit neural representations (INR) of shapes. In particular, we shed light on the popular eikonal loss used for imposing a signed distance function constraint in INR. We show analytically that as the representation power of the network increases, the optimization approaches a partial differential equation (PDE) in the continuum limit that is unstable. We show that this instability can manifest in existing network optimization, leading to irregularities in the reconstructed surface and/or convergence to sub-optimal local minima, and thus failing to capture fine geometric and topological structure. We show analytically how other terms added to the loss, currently used in the literature for other purposes, can actually eliminate these instabilities. However, such terms can over-regularize the surface, preventing the representation of fine shape detail. Based on a similar PDE theory for the continuum limit, we introduce a new regularization term that still counteracts the eikonal instability but without over-regularizing. Furthermore, since stability is now guaranteed in the continuum limit, this stabilization also allows for considering new network structures that are able to represent finer shape detail. We introduce such a structure based on quadratic layers. Experiments on multiple benchmark data sets show that our new regularization and network are able to capture more precise shape details and more accurate topology than existing state-of-the-art methods.
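For reference, the eikonal term penalizes $(\|\nabla f(x)\| - 1)^2$, which vanishes for an exact signed distance function. Below is a minimal sketch with central finite differences on an analytic sphere SDF; the sphere, the sampling box and the step size are our assumptions. StEik's additional regularizer and quadratic layers are not shown.

```python
import math
import random


def sphere_sdf(p, radius=1.0):
    """Exact signed distance to a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def numerical_grad(f, p, h=1e-5):
    """Central finite-difference gradient of a scalar field f at point p."""
    grad = []
    for i in range(len(p)):
        up = list(p)
        up[i] += h
        dn = list(p)
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad


def eikonal_loss(f, points):
    """Mean of (||grad f|| - 1)^2 over sample points; zero for an exact SDF."""
    total = 0.0
    for p in points:
        g = numerical_grad(f, p)
        total += (math.sqrt(sum(c * c for c in g)) - 1.0) ** 2
    return total / len(points)


rng = random.Random(0)
pts = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(200)]
exact = eikonal_loss(sphere_sdf, pts)                      # ~0
scaled = eikonal_loss(lambda p: 2.0 * sphere_sdf(p), pts)  # ~1: gradient norm is 2
```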

翻訳日:2023-05-31 21:23:34 公開日:2023-05-28

# APIから学ぶことを学ぶ: ブラックボックス・データフリーメタ学習

Learning to Learn from APIs: Black-Box Data-Free Meta-Learning ( http://arxiv.org/abs/2305.18413v1 )

ライセンス: Link先を確認

Zixuan Hu, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, Dacheng Tao

(参考訳) データフリーメタ学習(DFML)は、訓練データにアクセスすることなく、事前学習済みモデルの集合からメタ学習することで、新しいタスクの効率的な学習を可能にすることを目指す。既存のDFML研究は、(i)ホワイトボックスで、(ii)小規模な、(iii)同一アーキテクチャの事前学習済みモデルからしかメタ学習できず、ユーザーが任意のアーキテクチャと規模のモデルを内部に持つAPIへの推論アクセスしか持たないという、より実用的な設定を見過ごしている。この問題を解決するため、ブラックボックスAPIの集合から単一のメタモデルへ、より汎用的なメタ知識を転移するBi-level Data-free Meta Knowledge Distillation (BiDf-MKD)フレームワークを提案する。具体的には、APIへの問い合わせのみによって、ゼロ次勾配推定器を用いて各APIを反転させて訓練データを復元し、続いて新しい二段階メタ知識蒸留構造によってメタ学習を行う。その際、決定境界付近のより情報量の多いクエリセットを復元するための境界クエリセット復元手法を設計する。さらに、限られたAPI予算の設定下でより良い汎化を促すため、より多くの補間タスクを網羅することで基礎となるタスク分布を多様化するタスクメモリリプレイを提案する。様々な実世界シナリオにおける広範な実験により、BiDf-MKDフレームワークの優れた性能が示された。

Data-free meta-learning (DFML) aims to enable efficient learning of new tasks by meta-learning from a collection of pre-trained models without access to the training data. Existing DFML work can only meta-learn from pre-trained models that are (i) white-box, (ii) small-scale, and (iii) of the same architecture, neglecting the more practical setting where users only have inference access to APIs whose underlying models have arbitrary architectures and scales. To solve this issue, we propose a Bi-level Data-free Meta Knowledge Distillation (BiDf-MKD) framework to transfer more general meta knowledge from a collection of black-box APIs to one single meta model. Specifically, by just querying APIs, we invert each API to recover its training data via a zero-order gradient estimator and then perform meta-learning via a novel bi-level meta knowledge distillation structure, in which we design a boundary query set recovery technique to recover a more informative query set near the decision boundary. In addition, to encourage better generalization within the setting of limited API budgets, we propose task memory replay to diversify the underlying task distribution by covering more interpolated tasks. Extensive experiments in various real-world scenarios show the superior performance of our BiDf-MKD framework.
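Black-box APIs return outputs but no gradients, so the inversion step needs a zero-order estimate. Below is a minimal sketch of one common estimator, central differences along random Gaussian directions, $\hat g = \frac{1}{N}\sum_k \frac{f(x+\mu u_k)-f(x-\mu u_k)}{2\mu}\,u_k$ with $u_k \sim \mathcal{N}(0, I)$. The paper's exact estimator, sample count and smoothing radius may differ, and `zo_gradient` is a hypothetical name.

```python
import random


def zo_gradient(f, x, num_samples=2000, mu=1e-3, seed=0):
    """Estimate grad f(x) from function values only (black-box queries),
    averaging central differences along random Gaussian directions."""
    rng = random.Random(seed)
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        scale = (plus - minus) / (2 * mu)  # directional derivative along u
        for i in range(d):
            grad[i] += scale * u[i] / num_samples
    return grad
```

Each estimate costs `2 * num_samples` queries, which is why a limited API budget (as in the abstract) constrains how accurately the data can be recovered.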

翻訳日:2023-05-31 21:23:10 公開日:2023-05-28

# 不均一なイベントダイナミクス下におけるホークス過程を用いた短期的時間依存性の検出

Short-term Temporal Dependency Detection under Heterogeneous Event Dynamic with Hawkes Processes ( http://arxiv.org/abs/2305.18412v1 )

ライセンス: Link先を確認

Yu Chen, Fengpei Li, Anderson Schneider, Yuriy Nevmyvaka, Asohan Amarasingham, Henry Lam

(参考訳) 多くのイベント系列データは、相互に励起的あるいは抑制的なパターンを示す。このような時間依存性を信頼性高く検出することは、科学的調査において極めて重要である。事実上の標準モデルは多変量ホークス過程(MHP)であり、その影響関数はグレンジャー因果性の意味での因果構造を自然に符号化する。しかし、既存手法の大多数は、一定のベースラインを持つ標準的なMHP強度の直接的または非線形な変換を用いており、実世界のデータと整合しない。不規則で未知の不均一な強度の下では、相互作用の効果と強度の変動による効果を区別することが難しいため、時間依存性を捉えることは困難である。本稿では、短期的な時間依存性の検出問題に取り組む。MHPの交差影響に対する最尤推定(MLE)には除去できない誤差があるが、対象のHPではなく相互作用するHPの不均一強度を用いることで、その誤差を桁違いに低減できることを示す。次に、MLEを修正した頑健で計算効率の高い手法を提案する。この手法は不均一強度の事前推定に依存しないため、データが限られた状況(例えば少数ショット、繰り返し観測なし)にも適用できる。様々なデータセットでの広範な実験により、本手法が既存手法を顕著な差で上回ることが示され、神経科学における新たな応用も示された。

Many event sequence data exhibit mutually exciting or inhibiting patterns. Reliable detection of such temporal dependency is crucial for scientific investigation. The de facto model is the Multivariate Hawkes Process (MHP), whose impact function naturally encodes a causal structure in Granger causality. However, the vast majority of existing methods use direct or nonlinear transform of standard MHP intensity with constant baseline, inconsistent with real-world data. Under irregular and unknown heterogeneous intensity, capturing temporal dependency is hard as one struggles to distinguish the effect of mutual interaction from that of intensity fluctuation. In this paper, we address the short-term temporal dependency detection issue. We show that the maximum likelihood estimation (MLE) for cross-impact from MHP has an error that cannot be eliminated but may be reduced by an order of magnitude, using heterogeneous intensity not of the target HP but of the interacting HP. We then propose a robust and computationally-efficient method modified from MLE that does not rely on the prior estimation of the heterogeneous intensity and is thus applicable in a data-limited regime (e.g., few-shot, no repeated observations). Extensive experiments on various datasets show that our method outperforms existing ones by notable margins, with highlighted novel applications in neuroscience.
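For context, the constant-baseline model the abstract argues against is the standard Hawkes process. Below is a minimal univariate sketch of its log-likelihood with an exponential kernel, $\lambda(t) = \mu + \alpha\beta\sum_{t_j<t} e^{-\beta(t-t_j)}$, so that $\ell = \sum_i \log\lambda(t_i) - \mu T - \alpha\sum_i\bigl(1-e^{-\beta(T-t_i)}\bigr)$ on $[0, T]$. Maximizing it gives the plain MLE that the paper starts from; the paper's robust modification is not shown, and the function name is hypothetical.

```python
import math


def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood of a univariate Hawkes process on [0, T] with constant
    baseline mu and kernel alpha * beta * exp(-beta * s); times are sorted."""
    loglik = 0.0
    excitation = 0.0  # sum over past events t_j < t of exp(-beta * (t - t_j))
    prev = None
    for t in times:
        if prev is not None:
            # Recursive update: decay the old sum and add the previous event.
            excitation = math.exp(-beta * (t - prev)) * (excitation + 1.0)
        loglik += math.log(mu + alpha * beta * excitation)
        prev = t
    # Compensator: integral of the intensity over [0, T].
    compensator = mu * T + alpha * sum(1.0 - math.exp(-beta * (T - t)) for t in times)
    return loglik - compensator
```

The recursion keeps the cost linear in the number of events instead of quadratic.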

翻訳日:2023-05-31 21:22:47 公開日:2023-05-28

# 拡散モデルを用いた認知に着想を得たクロスモーダルデータ生成

Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models ( http://arxiv.org/abs/2305.18433v1 )

ライセンス: Link先を確認

Zizhao Hu, Mohammad Rostami

(参考訳) 拡散モデルに基づく既存のクロスモーダル生成手法の多くは、異なるモダリティ間での条件付き生成を可能にするため、ガイダンスを用いて潜在空間を制御する。このような手法は、モダリティごとに個別に学習したモデルを通じてガイダンスを与えることに重点を置いている。その結果、これらの手法はクロスモーダルな情報損失に悩まされ、一方向の条件付き生成に限定される。人間がマルチモーダル情報を同期的に獲得し、モダリティ間の相関を学習する仕組みに着想を得て、脳内の学習プロセスをよりよく模倣するため、学習段階でチャネル単位の画像条件付けを用いてモダリティ間の相関を学習する、マルチモーダル拡散モデルの学習・サンプリング手法を検討する。実験の結果、提案手法により、相関するすべてのモダリティを条件としたデータ生成が可能であることが示された。

Most existing cross-modal generative methods based on diffusion models use guidance to provide control over the latent space to enable conditional generation across different modalities. Such methods focus on providing guidance through separately-trained models, each for one modality. As a result, these methods suffer from cross-modal information loss and are limited to unidirectional conditional generation. Inspired by how humans synchronously acquire multi-modal information and learn the correlation between modalities, we explore a multi-modal diffusion model training and sampling scheme that uses channel-wise image conditioning to learn cross-modality correlation during the training phase to better mimic the learning process in the brain. Our empirical results demonstrate that our approach can achieve data generation conditioned on all correlated modalities.

翻訳日:2023-05-31 21:15:06 公開日:2023-05-28

# 説明可能なモデリングのための完全可視化による対話型決定木作成と拡張

Interactive Decision Tree Creation and Enhancement with Complete Visualization for Explainable Modeling ( http://arxiv.org/abs/2305.18432v1 )

ライセンス: Link先を確認

Boris Kovalerchuk, Andrew Dunn, Alex Worland, Sridevi Wagle

(参考訳) 機械学習(ML)モデルの解釈可能性と予測精度を高めるため、MLモデルの可視化はMLプロセスの重要な部分である。決定木(DT)は、深層学習モデルを含む多くのブラックボックスMLモデルを理解するために用いられるため、MLにおいて不可欠である。本研究では、理解可能なモデルとしての決定木を、完全な可視化を伴って作成・改良するための2つの新しい手法を提案する。これらの手法は、一般線座標(GLC)の2つのバージョンである曲げ座標(BC)とシフトペア座標(SPC)を用いる。曲げ座標は線座標の集合であり、各座標は対応するDTノードのしきい値点で曲げられる。SPCでは、各n次元点を、シフトされた2次元デカルト座標のペアの集合において有向グラフとして可視化する。これらの新しい手法は、DTモデルをより完全に可視化する既存手法の機能を拡張・補完する。これらの機能により、(1)属性間の関係、(2)DT構造に対する個々のケース、(3)DT内のデータの流れ、(4)DTノードにおける各分割しきい値の感度、(5)n次元空間の各部分におけるケースの密度、を観察・分析できる。これらの機能は、モデルの過度な一般化や過適合を防ぐのに役立つため、ドメインの専門家やエンドユーザーによるDTモデルの性能評価と改善にとって重要である。本手法の利点を、実世界のベンチマークデータセットを用いたケーススタディで示す。また、異なる一般線座標における決定木の可視化へとこれらを一般化する方法も示す。

To increase the interpretability and prediction accuracy of Machine Learning (ML) models, visualization of ML models is a key part of the ML process. Decision Trees (DTs) are essential in ML because they are used to understand many black-box ML models, including Deep Learning models. In this research, two new methods are suggested for creating and enhancing Decision Trees as understandable models with complete visualization. These methods use two versions of General Line Coordinates (GLC): Bended Coordinates (BC) and Shifted Paired Coordinates (SPC). The Bended Coordinates are a set of line coordinates, where each coordinate is bent at the threshold point of the respective DT node. In SPC, each n-D point is visualized as a directed graph in a set of shifted pairs of 2-D Cartesian coordinates. These new methods expand and complement the capabilities of existing methods to visualize DT models more completely. These capabilities allow us to observe and analyze: (1) relations between attributes, (2) individual cases relative to the DT structure, (3) data flow in the DT, (4) sensitivity of each split threshold in the DT nodes, and (5) density of cases in parts of the n-D space. These features are critical for DT models' performance evaluation and improvement by domain experts and end users, as they help to prevent overgeneralization and overfitting of the models. The advantages of this methodology are illustrated in case studies on real-world benchmark datasets. The paper also demonstrates how to generalize these methods to decision tree visualizations in different General Line Coordinates.
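A minimal sketch of the Shifted Paired Coordinates mapping: consecutive attribute pairs become 2-D points, each drawn in its own shifted copy of the Cartesian axes, and the points are chained into a directed path. The fixed horizontal shift is an assumption for illustration; the paper may choose shifts differently, and the function name is hypothetical.

```python
def shifted_paired_coordinates(point, shift=(3.0, 0.0)):
    """Map an n-D point (n even) to a directed polyline of n/2 2-D vertices.
    Pair k = (x[2k], x[2k+1]) is drawn in its own copy of the 2-D axes,
    offset by k * shift; consecutive vertices are joined by directed edges."""
    if len(point) % 2:
        raise ValueError("SPC needs an even number of attributes")
    vertices = []
    for k in range(len(point) // 2):
        x, y = point[2 * k], point[2 * k + 1]
        vertices.append((x + k * shift[0], y + k * shift[1]))
    edges = list(zip(vertices, vertices[1:]))  # (from, to) pairs
    return vertices, edges
```

Because the mapping is lossless, every attribute value can be read back from the drawing, which is what allows the DT thresholds to be shown as regions in these shifted axes.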

翻訳日:2023-05-31 21:14:50 公開日:2023-05-28

# マルチタスク学習によるAirbnb検索ジャーニーの最適化

Optimizing Airbnb Search Journey with Multi-task Learning ( http://arxiv.org/abs/2305.18431v1 )

ライセンス: Link先を確認

Chun How Tan, Austin Chan, Malay Haldar, Jie Tang, Xin Liu, Mustafa Abdool, Huiji Gao, Liwei He, Sanjeev Katariya

(参考訳) 宿泊や体験のためのオンラインマーケットプレイスであるAirbnbでは、ゲストは最終的な予約リクエストを行うまでに、何週間もかけて複数のアイテムを探索・比較することが多い。各予約リクエストは、その後チェックイン前にホストによって拒否またはキャンセルされる可能性がある。検索ジャーニーが長く探索的であることと、ゲストとホスト双方の好みのバランスをとる必要があることは、Airbnbの検索ランキングに特有の課題をもたらす。本稿では、これらの課題に対処する新しいマルチタスク深層学習モデルアーキテクチャであるJourney Rankerを紹介する。Journey Rankerは、中間的なゲストの行動をポジティブとネガティブ両方のマイルストーンとして活用し、ゲストを予約成立へとより良く導く。また、ゲストの状態や検索クエリなどの文脈情報を用いて、ゲストとホストの好みのバランスをとる。関心の分離が明確な4つのモジュールからなる、モジュール式で拡張可能な設計により、Airbnbの検索ランキング以外のユースケースにも容易に適用できる。Journey Rankerのオフラインおよびオンラインテストを実施し、4つの異なるAirbnb製品の本番環境へのデプロイに成功し、ビジネス指標を大幅に改善した。

At Airbnb, an online marketplace for stays and experiences, guests often spend weeks exploring and comparing multiple items before making a final reservation request. Each reservation request may then potentially be rejected or cancelled by the host prior to check-in. The long and exploratory nature of the search journey, as well as the need to balance both guest and host preferences, present unique challenges for Airbnb search ranking. In this paper, we present Journey Ranker, a new multi-task deep learning model architecture that addresses these challenges. Journey Ranker leverages intermediate guest actions as milestones, both positive and negative, to better progress the guest towards a successful booking. It also uses contextual information such as guest state and search query to balance guest and host preferences. Its modular and extensible design, consisting of four modules with clear separation of concerns, allows for easy application to use cases beyond the Airbnb search ranking context. We conducted offline and online testing of the Journey Ranker and successfully deployed it in production to four different Airbnb products with significant business metrics improvements.

翻訳日:2023-05-31 21:14:24 公開日:2023-05-28

# スケーラブルな弱教師あり銀行取引分類

Scalable and Weakly Supervised Bank Transaction Classification ( http://arxiv.org/abs/2305.18430v1 )

ライセンス: Link先を確認

Liam Toran, Cory Van Der Walt, Alan Sammarone, Alex Keller (Flowcast.ai)

(参考訳) 本稿では、弱教師あり学習、自然言語処理、深層ニューラルネットワーク技術を用いて銀行取引を分類することを目的とする。本手法は、ヒューリスティクスとドメイン知識を活用して正確な取引分類器を学習することで、高価で入手困難な手動アノテーションへの依存を最小限に抑える。データ前処理、取引テキストの埋め込み、アンカリング、ラベル生成、識別的ニューラルネットワークの学習を含む、効果的でスケーラブルなエンドツーエンドのデータパイプラインと、システムアーキテクチャの概要を示す。本手法が市場をリードする既存のソリューションを上回り、正確な分類を達成し、新規および複合的なユースケースへ迅速に拡張できることを示すことで、その有効性を実証する。これにより、金融健全性レポートや信用リスク評価など、多くの金融アプリケーションが可能になる。

This paper aims to categorize bank transactions using weak supervision, natural language processing, and deep neural network techniques. Our approach minimizes the reliance on expensive and difficult-to-obtain manual annotations by leveraging heuristics and domain knowledge to train accurate transaction classifiers. We present an effective and scalable end-to-end data pipeline, including data preprocessing, transaction text embedding, anchoring, label generation, discriminative neural network training, and an overview of the system architecture. We demonstrate the effectiveness of our method by showing it outperforms existing market-leading solutions, achieves accurate categorization, and can be quickly extended to novel and composite use cases. This can in turn unlock many financial applications such as financial health reporting and credit risk assessment.
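A hypothetical sketch of the weak-supervision idea: keyword heuristics act as labeling functions that either vote for a category or abstain, and their votes become noisy training labels. The keywords, category names and simple majority vote are illustrative assumptions; the paper's pipeline adds text embeddings, anchoring and a discriminative network on top of its labels.

```python
from collections import Counter

ABSTAIN = None


# Hypothetical keyword heuristics; each returns a category or ABSTAIN.
def lf_coffee(text):
    return "food_and_drink" if "COFFEE" in text or "STARBUCKS" in text else ABSTAIN


def lf_fuel(text):
    return "transport" if any(k in text for k in ("SHELL", "EXXON", "FUEL")) else ABSTAIN


def lf_payroll(text):
    return "income" if "PAYROLL" in text or "SALARY" in text else ABSTAIN


LABELING_FUNCTIONS = [lf_coffee, lf_fuel, lf_payroll]


def weak_label(description):
    """Majority vote over non-abstaining labeling functions (None if all abstain)."""
    votes = [lf(description.upper()) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

Transactions that every rule abstains on stay unlabeled; a classifier trained on the weakly labeled ones is what generalizes to them.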

翻訳日:2023-05-31 21:14:06 公開日:2023-05-28

# 一般線座標を用いた視覚知識発見

Visual Knowledge Discovery with General Line Coordinates ( http://arxiv.org/abs/2305.18429v1 )

ライセンス: Link先を確認

Lincoln Huber, Boris Kovalerchuk, Charles Recaido

(参考訳) 多次元データに対するブラックボックス機械学習手法を理解することは、機械学習における重要な課題である。強力な機械学習手法は既に数多く存在するが、それらはしばしば説明不可能であるか、複雑なデータに対して性能が低い。本稿では、情報損失のない一般線座標のいくつかの形式に基づく視覚的知識発見手法を提案する。これらは、以前に導入された一般線座標リニアおよび動的足場座標を拡張し、説明ルールを伴う非線形分類器を生成・説明・可視化するものである。これらの非線形モデルとルールの正確性を保証するため、一般線座標リニアにおいて最悪ケースの検証分割を見つけるための新しい対話型視覚的知識発見アルゴリズムも開発した。これらの拡張は、一般線座標非線形、対話型ルールリニア、ハイパーブロックルールリニア、最悪ケースリニアである。複数のベンチマークデータセットにわたる実験により、この視覚的知識発見手法は、他の視覚的および計算的な機械学習アルゴリズムと競合しつつ、線形および非線形分類の両方において解釈可能性と精度を向上させることが示された。これらの拡張の主な利点は、ハイパーブロックから正確で高度に解釈可能なモデルとルールを構築できること、モデルの解釈可能性の弱点を分析できること、そして対話的かつ人間主導の視覚的知識発見手法を通じて専門家の知識を取り入れられることである。

Understanding black-box Machine Learning methods on multidimensional data is a key challenge in Machine Learning. While many powerful Machine Learning methods already exist, these methods are often unexplainable or perform poorly on complex data. This paper proposes visual knowledge discovery approaches based on several forms of lossless General Line Coordinates. These are an expansion of the previously introduced General Line Coordinates Linear and Dynamic Scaffolding Coordinates to produce, explain, and visualize non-linear classifiers with explanation rules. To ensure these non-linear models and rules are accurate, we also developed new interactive visual knowledge discovery algorithms in General Line Coordinates Linear for finding worst-case validation splits. These expansions are General Line Coordinates non-linear, interactive rules linear, hyperblock rules linear, and worst-case linear. Experiments across multiple benchmark datasets show that this visual knowledge discovery method can compete with other visual and computational Machine Learning algorithms while improving both interpretability and accuracy in linear and non-linear classifications. The major benefits of these expansions include the ability to build accurate and highly interpretable models and rules from hyperblocks, the ability to analyze interpretability weaknesses in a model, and the ability to incorporate expert knowledge through interactive and human-guided visual knowledge discovery methods.
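A hyperblock is an axis-aligned box, that is, a conjunction of per-attribute interval conditions, which is what makes hyperblock rules readable. Below is a minimal sketch that fits the tightest box around one class, measures its purity, and prints it as a rule. The interactive and worst-case search algorithms from the paper are not shown, and the function names are hypothetical.

```python
def fit_hyperblock(points):
    """Tightest axis-aligned box (per-attribute [min, max]) containing the points."""
    dims = len(points[0])
    return [(min(p[d] for p in points), max(p[d] for p in points)) for d in range(dims)]


def in_block(block, x):
    """True if every attribute of x lies inside its interval."""
    return all(lo <= xi <= hi for (lo, hi), xi in zip(block, x))


def purity(block, points, labels, target):
    """Fraction of the points inside the block that belong to the target class."""
    inside = [lab for p, lab in zip(points, labels) if in_block(block, p)]
    return sum(lab == target for lab in inside) / len(inside) if inside else 0.0


def as_rule(block, names, label):
    """Render the block as a human-readable if-then rule."""
    conds = " and ".join(f"{lo} <= {n} <= {hi}" for (lo, hi), n in zip(block, names))
    return f"if {conds} then class {label}"
```

A pure block (purity 1.0) yields a rule with no training errors; a mixed block signals where the rule would need to be split or refined.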

翻訳日:2023-05-31 21:13:53 公開日:2023-05-28

# GRD: 強化学習における解釈可能な報酬再分配のための生成的アプローチ

GRD: A Generative Approach for Interpretable Reward Redistribution in Reinforcement Learning ( http://arxiv.org/abs/2305.18427v1 )

ライセンス: Link先を確認

Yudi Zhang, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang, Meng Fang, Mykola Pechenizkiy

(参考訳) 強化学習における大きな課題の1つは、遅延して与えられる将来の報酬に対して、どの状態-行動ペアが寄与しているかを決定することである。リターン分解は、方策の不変性を保ちつつ、観測された系列から報酬を再分配することで解決策を提供する。現在の手法の大半は報酬の再分配を解釈不能な方法で構築するが、我々は因果的観点から状態と行動の寄与を明示的にモデル化し、解釈可能なリターン分解を実現することを提案する。本稿では、まずマルコフ報酬の生成と軌道単位の長期リターンを特徴付けることで、リターン分解における因果生成モデルの役割を検討し、さらに遅延報酬シナリオにおける方策最適化のための生成的リターン分解(GRD)と呼ばれるフレームワークを提案する。具体的には、GRDはまず生成過程における観測不可能なマルコフ報酬と因果関係を同定する。次に、GRDは同定した因果生成モデルを用いてコンパクトな表現を構成し、エージェントの状態空間のうち最も有用な部分空間上で方策を学習する。理論的には、観測不可能なマルコフ報酬関数が、基礎となる因果構造および因果モデルとともに同定可能であることを示す。実験結果から、本手法は最先端の手法を上回り、提示した可視化により本手法の解釈可能性がさらに示された。

A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed. Return Decomposition offers a solution by redistributing rewards from observed sequences while preserving policy invariance. While the majority of current approaches construct the reward redistribution in an uninterpretable manner, we propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable return decomposition. In this paper, we start by studying the role of causal generative models in return decomposition by characterizing the generation of Markovian rewards and trajectory-wise long-term return and further propose a framework, called Generative Return Decomposition (GRD), for policy optimization in delayed reward scenarios. Specifically, GRD first identifies the unobservable Markovian rewards and causal relations in the generative process. Then, GRD makes use of the identified causal generative model to form a compact representation to train policy over the most favorable subspace of the state space of the agent. Theoretically, we show that the unobservable Markovian reward function is identifiable, as well as the underlying causal structure and causal models. Experimental results show that our method outperforms state-of-the-art methods and the provided visualization further demonstrates the interpretability of our method.

翻訳日:2023-05-31 21:13:31 公開日:2023-05-28

# 付加製造試料の入力変数と引張強度の相関解析における説明可能な人工知能(XAI)手法の適用

Employing Explainable Artificial Intelligence (XAI) Methodologies to Analyze the Correlation between Input Variables and Tensile Strength in Additively Manufactured Samples ( http://arxiv.org/abs/2305.18426v1 )

ライセンス: Link先を確認

Akshansh Mishra, Vijaykumar S Jatti

(参考訳) 本研究では、インフィル率、層高さ、押出温度、印刷速度といった様々な入力パラメータが、積層造形(付加製造)によって作製された物体の引張強度に与える影響を調べる。本研究の主な目的は、入力パラメータと引張強度の相関関係の理解を深めるとともに、積層造形プロセスの性能に影響を与える主要因を特定することである。この目的を達成するため、説明可能な人工知能(XAI)技術を初めて導入し、データを分析してシステムの挙動に関する有益な洞察を得た。具体的には、機械学習モデルの予測を解釈するために広く採用されているフレームワークであるSHAP(SHapley Additive exPlanations)を用いて、データで学習した機械学習モデルの挙動を説明した。その結果、インフィル率と押出温度が引張強度に最も大きな影響を与える一方、層高さと印刷速度の影響は比較的小さいことが明らかになった。さらに、入力パラメータと引張強度の関係は非常に複雑かつ非線形であり、単純な線形モデルで正確に記述することは難しいことがわかった。

This research paper explores the impact of various input parameters, including Infill percentage, Layer Height, Extrusion Temperature, and Print Speed, on the resulting Tensile Strength in objects produced through additive manufacturing. The main objective of this study is to enhance our understanding of the correlation between the input parameters and Tensile Strength, as well as to identify the key factors influencing the performance of the additive manufacturing process. To achieve this objective, we introduced the utilization of Explainable Artificial Intelligence (XAI) techniques for the first time, which allowed us to analyze the data and gain valuable insights into the system's behavior. Specifically, we employed SHAP (SHapley Additive exPlanations), a widely adopted framework for interpreting machine learning model predictions, to provide explanations for the behavior of a machine learning model trained on the data. Our findings reveal that the Infill percentage and Extrusion Temperature have the most significant influence on Tensile Strength, while the impact of Layer Height and Print Speed is relatively minor. Furthermore, we discovered that the relationship between the input parameters and Tensile Strength is highly intricate and nonlinear, making it difficult to accurately describe using simple linear models.
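SHAP attributes a prediction to input features via Shapley values. With only four printing parameters, these can even be computed exactly by enumerating feature coalitions, as in this sketch, which replaces absent features with a baseline value (one common convention, assumed here). This is not the SHAP library's implementation, and the function name is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their value; the rest take the baseline.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The values always sum to `model(x) - model(baseline)`, so a large share for a parameter such as infill percentage can be read directly as its contribution to the predicted tensile strength of that sample.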

翻訳日:2023-05-31 21:13:07 公開日:2023-05-28

# 重み残差の低ランク近似による微調整モデルの効率的な保存

Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals ( http://arxiv.org/abs/2305.18425v1 )

ライセンス: Link先を確認

Simo Ryu, Seunghyun Seo, Jaejun Yoo

(参考訳) 本稿では、重み残差の低ランク性を活用して、微調整済みモデルを効率的に保存する手法を提案する。我々の重要な観察は、大規模な過剰パラメータ化モデルにおける重み残差が、さらに強い低ランク特性を示すということである。この知見に基づき、低ランクの重み残差を近似することで微調整済みモデルの重みを効率的に保存する新しい手法、Efficient Residual Encoding (ERE)を提案する。さらに、重み残差の頑健性を分析し、追加の量子化と層ごとのランク割り当てを利用することで、ストレージ効率の限界を押し広げる。実験結果から、本手法は様々なタスクとモダリティにおいて性能を維持しつつ、メモリフットプリントを大幅に削減することが示された。コードを公開する。

In this paper, we present an efficient method for storing fine-tuned models by leveraging the low-rank properties of weight residuals. Our key observation is that weight residuals in large overparameterized models exhibit even stronger low-rank characteristics. Based on this insight, we propose Efficient Residual Encoding (ERE), a novel approach that achieves efficient storage of fine-tuned model weights by approximating the low-rank weight residuals. Furthermore, we analyze the robustness of weight residuals and push the limit of storage efficiency by utilizing additional quantization and layer-wise rank allocation. Our experimental results demonstrate that our method significantly reduces memory footprint while preserving performance in various tasks and modalities. We release our code.
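A minimal sketch of storing a fine-tuned layer as base weights plus a rank-$r$ residual: compute $R = W_{\text{ft}} - W_{\text{base}}$, keep its top singular triplets (found here by power iteration with deflation), and rebuild $W_{\text{base}} + \sum_k \sigma_k u_k v_k^\top$ on load. For an $m \times n$ layer this stores $r(m+n+1)$ numbers instead of $mn$. ERE's quantization and layer-wise rank allocation are not shown, and the helper names are hypothetical.

```python
import random


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def tmatvec(A, u):
    """Computes A^T u."""
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(v):
    return sum(x * x for x in v) ** 0.5


def low_rank_residual(w_base, w_ft, rank, iters=100, seed=0):
    """Top `rank` singular triplets (sigma, u, v) of R = w_ft - w_base,
    found by power iteration on R^T R with deflation."""
    rng = random.Random(seed)
    R = [[f - b for f, b in zip(rf, rb)] for rf, rb in zip(w_ft, w_base)]
    factors = []
    for _ in range(rank):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(R[0]))]
        for _ in range(iters):
            v = tmatvec(R, matvec(R, v))
            nv = norm(v)
            if nv == 0:
                break
            v = [x / nv for x in v]
        u = matvec(R, v)
        sigma = norm(u)
        if sigma == 0:
            break  # residual exhausted: nothing left to encode
        u = [x / sigma for x in u]
        factors.append((sigma, u, v))
        # Deflate: remove the captured component before finding the next one.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return factors


def reconstruct(w_base, factors):
    """Rebuild the fine-tuned weights from the base and the stored factors."""
    W = [row[:] for row in w_base]
    for sigma, u, v in factors:
        for i in range(len(u)):
            for j in range(len(v)):
                W[i][j] += sigma * u[i] * v[j]
    return W
```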

翻訳日:2023-05-31 21:12:47 公開日:2023-05-28

# 学習の目標精度到達時間(time-to-accuracy)を最小化するための反復ランダムサンプリング

Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning ( http://arxiv.org/abs/2305.18424v1 )

ライセンス: Link先を確認

Patrik Okanovic, Roger Waleffe, Vasilis Mageirakos, Konstantinos E. Nikolakakis, Amin Karbasi, Dionysis Kalogerias, Nezihe Merve G\"urel, Theodoros Rekatsinas

(参考訳) データプルーニング、コアセット選択、データ蒸留といった、学習に用いる少量の訓練データを慎重に選択または生成する手法は、増え続けるニューラルネットワークの学習コストを削減するうえで有効であることが示されている。この成功の背後には、大規模データセットから情報量の多い訓練例を特定するための、厳密に設計された戦略がある。しかし、これらの戦略は学習開始前のサブセット選択やデータ蒸留に伴う追加の計算コストを伴い、さらに多くの手法は高いデータ圧縮率の領域ではランダムサンプリングにさえ劣ることが示されている。そのため、多くのデータプルーニング、コアセット選択、蒸留手法は、大規模データセット上で深層ニューラルネットワークを学習する際の重要な効率指標となっている「目標精度到達時間(time-to-accuracy)」を削減できない可能性がある。本研究では、これらの課題に対処するため、強力でありながら見過ごされてきたランダムサンプリング戦略を再検討し、モデル学習の各エポックごとに訓練データのサブセットをランダムにサンプリングする、Repeated Sampling of Random Subsets (RSRSまたはRS2)と呼ぶ手法を導入する。ImageNetを含む4つのデータセットにおいて、RS2を30の最先端データプルーニングおよびデータ蒸留手法と比較した。その結果、RS2は既存の手法と比べて目標精度到達時間を大幅に短縮することが示された。例えば、高圧縮領域(各エポックでデータセットの10%未満を使用)でImageNetを学習する場合、RS2は競合するプルーニング手法と比べて最大29%の精度向上をもたらすとともに、実行時間を7分の1に短縮した。上記のメタ研究に加えて、RS2の収束解析を示し、その汎化能力について議論する。本研究の主な目的は、効率的な学習を目的とした今後のデータ選択・蒸留手法に対する競争力のあるベースラインとしてRS2を確立することである。

Methods for carefully selecting or generating a small set of training data to learn from, i.e., data pruning, coreset selection, and data distillation, have been shown to be effective in reducing the ever-increasing cost of training neural networks. Behind this success are rigorously designed strategies for identifying informative training examples out of large datasets. However, these strategies come with additional computational costs associated with subset selection or data distillation before training begins, and furthermore, many are shown to even under-perform random sampling in high data compression regimes. As such, many data pruning, coreset selection, or distillation methods may not reduce 'time-to-accuracy', which has become a critical efficiency measure of training deep neural networks over large datasets. In this work, we revisit a powerful yet overlooked random sampling strategy to address these challenges and introduce an approach called Repeated Sampling of Random Subsets (RSRS or RS2), where we randomly sample the subset of training data for each epoch of model training. We test RS2 against thirty state-of-the-art data pruning and data distillation methods across four datasets including ImageNet. Our results demonstrate that RS2 significantly reduces time-to-accuracy compared to existing techniques. For example, when training on ImageNet in the high-compression regime (using less than 10% of the dataset each epoch), RS2 yields accuracy improvements up to 29% compared to competing pruning methods while offering a runtime reduction of 7x. Beyond the above meta-study, we provide a convergence analysis for RS2 and discuss its generalization capability. The primary goal of our work is to establish RS2 as a competitive baseline for future data selection or distillation techniques aimed at efficient training.
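The core of RS2 is simple enough to sketch: draw a fresh random subset of example indices for every epoch. The minimal Python generator below assumes a without-replacement variant (walk through a shuffled permutation and reshuffle only once it is exhausted) and shows the sampling logic only; the per-epoch fraction and the function name are illustrative.

```python
import random


def rs2_epochs(num_examples, fraction, num_epochs, seed=0):
    """Yield one random subset of example indices per epoch.
    Without-replacement variant: consecutive epochs take disjoint chunks of a
    shuffled permutation, so the whole dataset is covered before any repeat."""
    rng = random.Random(seed)
    per_epoch = max(1, round(fraction * num_examples))
    order = list(range(num_examples))
    rng.shuffle(order)
    pos = 0
    for _ in range(num_epochs):
        if pos + per_epoch > num_examples:
            # Permutation exhausted: reshuffle and start a new pass.
            rng.shuffle(order)
            pos = 0
        yield order[pos:pos + per_epoch]
        pos += per_epoch
```

Unlike static pruning or coreset selection, no scoring pass runs before training, so the time-to-accuracy budget is spent entirely on gradient steps.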

Translated: 2023-05-31 21:12:33 Published: 2023-05-28

# On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences

On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences ( http://arxiv.org/abs/2305.18423v1 )

License: see link

Alireza Fathollah Pour and Hassan Ashtiani

We consider the class of noisy multi-layered sigmoid recurrent neural networks with $w$ (unbounded) weights for classification of sequences of length $T$, where independent noise distributed according to $\mathcal{N}(0,\sigma^2)$ is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by $O (w\log(T/\sigma))$. For the non-noisy version of the same class (i.e., $\sigma=0$), we prove a lower bound of $\Omega (wT)$ for the sample complexity. Our results indicate an exponential gap in the dependence of sample complexity on $T$ for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on $1/\sigma$, this gap still holds even for numerically negligible values of $\sigma$.
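
The sketch below makes the noise model concrete with a single noisy sigmoid recurrent layer. It assumes one hidden layer and a linear sign readout, whereas the paper treats multi-layer networks; all names are hypothetical. Independent $\mathcal{N}(0,\sigma^2)$ noise is added to every neuron output at every time step, and $\sigma=0$ recovers the non-noisy class.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def noisy_rnn_classify(xs, W_in, W_rec, b, w_out, sigma, rng):
    """Classify a length-T sequence xs with a one-layer sigmoid RNN.

    After each neuron's sigmoid, independent Gaussian noise N(0, sigma^2)
    is added; sigma = 0 gives the deterministic (non-noisy) network.
    Returns (label, final hidden state).
    """
    n_hidden = len(b)
    h = [0.0] * n_hidden
    for x in xs:  # one recurrent step per sequence element
        new_h = []
        for j in range(n_hidden):
            pre = b[j]
            pre += sum(W_in[j][k] * x[k] for k in range(len(x)))
            pre += sum(W_rec[j][k] * h[k] for k in range(n_hidden))
            noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
            new_h.append(sigmoid(pre) + noise)
        h = new_h
    score = sum(w * v for w, v in zip(w_out, h))
    return (1 if score > 0 else 0), h


if __name__ == "__main__":
    W_in = [[0.5], [-0.3]]
    W_rec = [[0.1, 0.2], [0.0, -0.4]]
    b = [0.0, 0.1]
    w_out = [1.0, -1.0]
    seq = [[1.0], [0.0], [1.0], [1.0]]  # T = 4, one input feature per step
    print(noisy_rnn_classify(seq, W_in, W_rec, b, w_out, 0.0, random.Random(0)))
    print(noisy_rnn_classify(seq, W_in, W_rec, b, w_out, 0.1, random.Random(0)))
```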

Translated: 2023-05-31 21:12:00 Published: 2023-05-28

# Parallel Coordinates for Discovery of Interpretable Machine Learning Models

Parallel Coordinates for Discovery of Interpretable Machine Learning Models ( http://arxiv.org/abs/2305.18434v1 )

License: see link

Dustin Hayes, Boris Kovalerchuk

This work uses visual knowledge discovery in parallel coordinates to advance methods of interpretable machine learning. Representing data graphically in parallel coordinates makes the concepts of hypercubes and hyperblocks (HBs) simple for end users to understand. The proposed data classification algorithm, Hyper, uses mixed and pure hyperblocks. It is shown that Hyper models generalize decision trees. The algorithm is presented in several settings, with options to discover overlapping or non-overlapping hyperblocks either interactively or automatically. Additionally, the use of hyperblocks in conjunction with language descriptions of visual patterns is demonstrated. Benchmark data from the UCI ML repository were used to evaluate the Hyper algorithm, enabling the discovery of mixed and pure HBs, which were evaluated using 10-fold cross-validation. Connections among hyperblocks, dimension reduction, and visualization are established. Major advantages of hyperblock technology and the Hyper algorithm include the ability of end users to find and observe hyperblocks and the ability of side-by-side visualizations to make patterns evident. A new method is proposed to visualize incomplete n-D data with missing values, which traditional parallel coordinates do not support. Another benefit of hyperblocks demonstrated here is that they prevent both overgeneralization and overfitting better than decision trees. The features of the VisCanvas 2.0 software tool, which implements the Hyper technology, are presented.
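
In parallel coordinates, a hyperblock is an axis-aligned box: one interval per attribute, drawn as a band on each axis. The sketch below assumes the simplest construction, the envelope of one class's points checked for purity. It is not the Hyper algorithm itself, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class HyperBlock:
    lower: List[float]  # per-attribute lower bound (bottom of the band on each axis)
    upper: List[float]  # per-attribute upper bound (top of the band on each axis)
    label: str

    def contains(self, x: Sequence[float]) -> bool:
        return all(lo <= v <= hi for lo, v, hi in zip(self.lower, x, self.upper))


def envelope_block(points, labels, target) -> Tuple[HyperBlock, bool]:
    """Smallest hyperblock covering every point of class `target`.

    Returns the block and whether it is pure (contains no point of another class).
    """
    own = [p for p, y in zip(points, labels) if y == target]
    dims = range(len(own[0]))
    block = HyperBlock([min(p[d] for p in own) for d in dims],
                       [max(p[d] for p in own) for d in dims],
                       target)
    pure = all(y == target for p, y in zip(points, labels) if block.contains(p))
    return block, pure


def classify(blocks, x) -> Optional[str]:
    """Label of the first block containing x, or None if x lies outside all blocks."""
    for block in blocks:
        if block.contains(x):
            return block.label
    return None


if __name__ == "__main__":
    pts = [(0.1, 0.2), (0.2, 0.3), (0.8, 0.9), (0.9, 0.7)]
    ys = ["a", "a", "b", "b"]
    blocks = [envelope_block(pts, ys, c)[0] for c in ("a", "b")]
    print(classify(blocks, (0.15, 0.25)), classify(blocks, (0.5, 0.5)))
```

Returning `None` for points outside every block leaves room for the "mixed" blocks and the other options the abstract mentions.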

Translated: 2023-05-31 21:03:07 Published: 2023-05-28

# Key-Value Transformer

Key-Value Transformer ( http://arxiv.org/abs/2305.19129v1 )

License: see link

Ali Borji

Transformers have emerged as the prevailing standard solution for various AI tasks, including computer vision and natural language processing. The widely adopted Query, Key, and Value (QKV) formulation has played a significant role in this. Nevertheless, no research has examined whether all three components are essential for transformer performance. We therefore evaluated a key-value (KV) formulation, which generates symmetric attention maps, along with an asymmetric version that incorporates a 2D positional encoding into the attention matrix. Notably, this transformer requires fewer parameters and less computation than the original one. Through experiments spanning three task types (synthetic tasks such as reversing or sorting a list, vision tasks such as MNIST or CIFAR classification, and NLP tasks such as character generation and translation), we found that the KV transformer occasionally outperforms the QKV transformer. However, it also underperforms QKV in some cases, making it difficult to draw a definitive conclusion. Nonetheless, we consider the reported results encouraging and anticipate that they may pave the way for more efficient transformers in the future.
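
The symmetry comes from letting the keys play the role of the queries. Below is a minimal single-head sketch in plain Python with no batching, masking, or positional term; all names are hypothetical. Because queries equal keys, the score matrix $KK^\top/\sqrt{d}$ is symmetric before the row-wise softmax. Dropping the separate query projection also removes one weight matrix.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax_rows(S):
    out = []
    for row in S:
        m = max(row)  # subtract the row max for numerical stability
        e = [math.exp(v - m) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out


def kv_attention(X, W_k, W_v):
    """Single-head key-value attention: the keys are reused as queries."""
    K = matmul(X, W_k)
    V = matmul(X, W_v)
    d = len(W_k[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(K, K_T)]  # symmetric
    weights = softmax_rows(scores)  # row-normalized, so not symmetric in general
    return scores, matmul(weights, V)


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three tokens, model width 2
    scores, out = kv_attention(X, [[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(scores)
    print(out)
```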

Translated: 2023-05-31 15:36:51 Published: 2023-05-28

# Mitigating Label Biases for In-context Learning

Mitigating Label Biases for In-context Learning ( http://arxiv.org/abs/2305.19148v1 )

License: see link

Yu Fei, Yifan Hou, Zeming Chen, Antoine Bosselut

Various design settings for in-context learning (ICL), such as the choice and order of the in-context examples, can bias the model's predictions. While many studies discuss these design choices, there have been few systematic investigations into categorizing them and mitigating their impact. In this work, we define a typology for three types of label biases in ICL for text classification: vanilla-label bias, context-label bias, and domain-label bias (which we conceptualize and detect for the first time). Our analysis demonstrates that prior label bias calibration methods fall short of addressing all three types of biases. Specifically, domain-label bias restricts LLMs to random-level performance on many tasks regardless of the choice of in-context examples. To mitigate the effect of these biases, we propose a simple bias calibration method that estimates a language model's label bias using random in-domain words from the task corpus. After controlling for this estimated bias when making predictions, our novel domain-context calibration significantly improves the ICL performance of GPT-J and GPT-3 on a wide range of tasks. The gain is substantial on tasks with large domain-label bias (up to 37% in Macro-F1). Furthermore, our results generalize to models with different scales, pretraining methods, and manually-designed task instructions, showing the prevalence of label biases in ICL.
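
The calibration step can be sketched as follows. The sketch assumes, in the style of contextual calibration, that the estimated bias is the model's average label distribution over several strings of random in-domain words. Predictions are then corrected by dividing by that bias and renormalizing. `label_probs` is a hypothetical stand-in for querying an LLM with the in-context prompt.

```python
import random


def random_in_domain_string(corpus_words, length, rng):
    """A content-free input built from random words of the task corpus."""
    return " ".join(rng.choice(corpus_words) for _ in range(length))


def estimate_label_bias(label_probs, corpus_words, n_samples=20, length=10, seed=0):
    """Average label distribution the model assigns to random in-domain text."""
    rng = random.Random(seed)
    totals = None
    for _ in range(n_samples):
        p = label_probs(random_in_domain_string(corpus_words, length, rng))
        totals = p if totals is None else [t + v for t, v in zip(totals, p)]
    return [t / n_samples for t in totals]


def calibrate(probs, bias):
    """Divide out the estimated bias and renormalize to a distribution."""
    scaled = [p / b for p, b in zip(probs, bias)]
    z = sum(scaled)
    return [s / z for s in scaled]


if __name__ == "__main__":
    # Hypothetical biased model: prefers label 0 no matter what the input is.
    def label_probs(text):
        return [0.7, 0.3]

    words = "the service was slow but the food arrived warm".split()
    bias = estimate_label_bias(label_probs, words)
    print(calibrate([0.6, 0.4], bias))  # raw argmax is 0; calibrated argmax is 1
```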

Translated: 2023-05-31 15:25:34 Published: 2023-05-28

# Conditional score-based diffusion models for Bayesian inference in infinite dimensions

Conditional score-based diffusion models for Bayesian inference in infinite dimensions ( http://arxiv.org/abs/2305.19147v1 )

License: see link

Lorenzo Baldassari, Ali Siahkoohi, Josselin Garnier, Knut Solna, Maarten V. de Hoop

Since their first introduction, score-based diffusion models (SDMs) have been successfully applied to solve a variety of linear inverse problems in finite-dimensional vector spaces, owing to their ability to efficiently approximate the posterior distribution. However, using SDMs for inverse problems in infinite-dimensional function spaces has only recently been addressed, and only by learning the unconditional score. While this approach has some advantages, depending on the specific inverse problem at hand, sampling from the conditional distribution requires incorporating the information from the observed data through a proximal optimization step, which means solving an optimization problem many times. This may not be feasible for inverse problems with computationally costly forward operators. To address these limitations, we propose a method to learn the posterior distribution in infinite-dimensional Bayesian linear inverse problems using amortized conditional SDMs. In particular, we prove that the conditional denoising estimator is a consistent estimator of the conditional score in infinite dimensions. We show that extending SDMs to the conditional setting requires some care, because the conditional score typically blows up for small times, unlike the unconditional score. We also discuss the robustness of the learned distribution against perturbations of the observations. We conclude by presenting numerical examples that validate our approach and provide additional insights.

Translated: 2023-05-31 15:25:10 Published: 2023-05-28

# ASU-CNN: An Efficient Deep Architecture for Image Classification and Feature Visualizations

ASU-CNN: An Efficient Deep Architecture for Image Classification and Feature Visualizations ( http://arxiv.org/abs/2305.19146v1 )

License: see link

Jamshaid Ul Rahman, Faiza Makhdoom, Dianchen Lu

Activation functions play a decisive role in determining the capacity of deep neural networks, as they enable networks to capture the nonlinearities inherent in the data fed to them. Prior research on activation functions focused primarily on monotonic or non-oscillatory functions, until the Growing Cosine Unit broke this taboo for a number of applications. In this paper, we propose a convolutional neural network model named ASU-CNN, which uses the recently designed activation function ASU across its layers. The effect of this non-monotonic, oscillatory function is inspected through feature-map visualizations from different convolutional layers. The proposed network is optimized with Adam, using a fine-tuned learning rate. The network achieved promising results on both the training and test data for CIFAR-10 classification. The experimental results affirm the computational feasibility and efficacy of the proposed model for computer vision tasks.
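
The sketch below illustrates what "non-monotonic and oscillatory" means by evaluating the Growing Cosine Unit, $z\cos z$, next to ASU. The ASU form used here, $z\sin z$, is an assumption recalled from the activation's original paper and is not stated in this abstract, so verify it before relying on it.

```python
import math


def gcu(z):
    """Growing Cosine Unit: z * cos(z)."""
    return z * math.cos(z)


def asu(z):
    """Amplifying Sine Unit, ASSUMED here to be z * sin(z)."""
    return z * math.sin(z)


def asu_grad(z):
    """Derivative of the assumed ASU form: sin(z) + z * cos(z)."""
    return math.sin(z) + z * math.cos(z)


if __name__ == "__main__":
    # Values rise, fall back to zero, then go negative: neither function is monotonic.
    for z in [0.0, math.pi / 2, math.pi, 3 * math.pi / 2, 2 * math.pi]:
        print(f"z={z:5.2f}  asu={asu(z):7.3f}  gcu={gcu(z):7.3f}")
```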

Translated: 2023-05-31 15:24:51 Published: 2023-05-28

# A Modern Introduction to Online Learning

A Modern Introduction to Online Learning ( http://arxiv.org/abs/1912.13213v6 )

License: see link

Francesco Orabona

In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiations of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to tuning the parameters of the algorithms and to learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt with through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis, and all the required mathematical tools are rigorously explained. Moreover, all the included proofs have been carefully chosen to be as simple and as short as possible.
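
As a small illustration of regret minimization, here is projected online gradient descent. This is what Online Mirror Descent reduces to with the squared Euclidean regularizer. It runs on the losses $\ell_t(x) = (x - z_t)^2$ over $[-R, R]$ with step size $\eta_t = \eta_0/\sqrt{t}$. The loss family and step schedule are illustrative choices, not taken from the monograph.

```python
import math


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def online_gradient_descent(targets, radius, eta0):
    """Projected OGD on l_t(x) = (x - z_t)^2 over [-radius, radius].

    At round t the learner commits to x_t, then sees z_t, suffers the loss and
    takes a gradient step with eta_t = eta0 / sqrt(t), projecting back onto the
    interval. Returns the plays and the regret against the best fixed point.
    """
    x = 0.0
    plays, total_loss = [], 0.0
    for t, z in enumerate(targets, start=1):
        plays.append(x)
        total_loss += (x - z) ** 2
        grad = 2.0 * (x - z)
        x = clip(x - eta0 / math.sqrt(t) * grad, -radius, radius)
    # For squared losses the best fixed point is the mean, projected onto the interval.
    best = clip(sum(targets) / len(targets), -radius, radius)
    best_loss = sum((best - z) ** 2 for z in targets)
    return plays, total_loss - best_loss


if __name__ == "__main__":
    targets = [0.5] * 1000
    plays, regret = online_gradient_descent(targets, radius=1.0, eta0=0.25)
    print(f"regret = {regret:.4f}, average regret = {regret / len(targets):.6f}")
```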

Translated: 2023-05-31 05:33:17 Published: 2023-05-28

# FairCanary: Rapid Continuous Explainable Fairness

FairCanary: Rapid Continuous Explainable Fairness ( http://arxiv.org/abs/2106.07057v4 )

License: see link

Avijit Ghosh, Aalok Shanbhag, Christo Wilson

Systems that offer continuous model monitoring have emerged in response to (1) well-documented failures of deployed Machine Learning (ML) and Artificial Intelligence (AI) models and (2) new regulatory requirements impacting these models. Existing monitoring systems continuously track the performance of deployed ML models and compute feature importance (a.k.a. explanations) for each prediction to help developers identify the root causes of emergent model performance problems. We present Quantile Demographic Drift (QDD), a novel model bias quantification metric that uses quantile binning to measure differences in the overall prediction distributions over subgroups. QDD is ideal for continuous monitoring scenarios, does not suffer from the statistical limitations of conventional threshold-based bias metrics, and does not require outcome labels (which may not be available at runtime). We incorporate QDD into a continuous model monitoring system, called FairCanary, that reuses existing explanations computed for each individual prediction to quickly compute explanations for the QDD bias metrics. This optimization makes FairCanary an order of magnitude faster than previous work that has tried to generate feature-level bias explanations.
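
Below is one plausible reading of quantile binning, sketched under stated assumptions. It sorts each subgroup's model scores, splits them into equal-count quantile bins, and compares the bin means across subgroups. No outcome labels are needed, which is why this kind of metric suits runtime monitoring. The exact QDD definition in the paper may differ, and the function names are hypothetical.

```python
def quantile_bin_means(scores, n_bins):
    """Mean score inside each equal-count quantile bin of one subgroup."""
    if len(scores) < n_bins:
        raise ValueError("need at least one score per bin")
    s = sorted(scores)
    n = len(s)
    means = []
    for k in range(n_bins):
        chunk = s[k * n // n_bins:(k + 1) * n // n_bins]
        means.append(sum(chunk) / len(chunk))
    return means


def quantile_drift(scores_a, scores_b, n_bins=4):
    """Per-bin difference between subgroup A's and subgroup B's quantile-bin means."""
    a = quantile_bin_means(scores_a, n_bins)
    b = quantile_bin_means(scores_b, n_bins)
    return [x - y for x, y in zip(a, b)]


if __name__ == "__main__":
    group_a = [0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
    group_b = [0.1, 0.2, 0.3, 0.3, 0.45, 0.5, 0.6, 0.75]
    print(quantile_drift(group_a, group_b))  # positive: group A scored higher in every bin
```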

Translated: 2023-05-31 05:00:11 Published: 2023-05-28

# Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks

Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks ( http://arxiv.org/abs/2006.07356v5 )

License: see link

Hui Jin, Guido Montúfar

We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{- 1/2}$ of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative, weighted by a curvature penalty that depends on the probability distribution used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolation of the training data. For stochastic gradient descent we obtain the same implicit bias result. We obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.
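
In the univariate, constant-penalty case, the limiting object is the natural cubic spline interpolant. The sketch below computes that interpolant directly rather than by training a network. It uses the second-derivative formulation with zero curvature at both ends and a tridiagonal solve, and it assumes at least two knots with strictly increasing `xs`.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs[i], ys[i]).

    Solves for the second derivatives M[i] at the knots with M[0] = M[n] = 0
    (the "natural" end conditions) using the Thomas tridiagonal algorithm.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)
    if n >= 2:
        # Row i (i = 1..n-1): h[i-1] M[i-1] + 2 (h[i-1] + h[i]) M[i] + h[i] M[i+1] = rhs_i
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        m = n - 1
        for k in range(1, m):  # forward elimination
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * m
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(m - 2, -1, -1):  # back substitution
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        M[1:n] = sol

    def evaluate(x):
        # Interval index i with xs[i] <= x <= xs[i+1]; clamped so the end knots work.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        left, right = x - xs[i], xs[i + 1] - x
        return ((M[i] * right ** 3 + M[i + 1] * left ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6.0) * right
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * left)

    return evaluate


if __name__ == "__main__":
    spline = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
    print([round(spline(x / 2), 4) for x in range(7)])
```

Linear data gives zero second derivatives, so the spline reduces to the line itself. This is consistent with the curvature penalty being minimized.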

Translated: 2023-05-31 04:56:34 Published: 2023-05-28

# Quantum secure non-malleable-extractors

Quantum secure non-malleable-extractors ( http://arxiv.org/abs/2109.03097v4 )

License: see link

Naresh Goud Boddu, Rahul Jain, Upendra Kapshikar

We construct several explicit quantum secure non-malleable-extractors. All the quantum secure non-malleable-extractors we construct are based on the constructions by Chattopadhyay, Goyal and Li [2015] and Cohen [2015]. 1) We construct the first explicit quantum secure non-malleable-extractor for (source) min-entropy $k \geq \textsf{poly}\left(\log \left( \frac{n}{\epsilon} \right)\right)$ ($n$ is the length of the source and $\epsilon$ is the error parameter). Previously, Aggarwal, Chung, Lin, and Vidick [2019] showed that the inner-product-based non-malleable-extractor proposed by Li [2012] is quantum secure; however, it requires linear (in $n$) min-entropy and seed length. Using the connection between non-malleable-extractors and privacy amplification (first established in the quantum setting by Cohen and Vidick [2017]), we obtain a $2$-round privacy amplification protocol that is secure against active quantum adversaries with communication $\textsf{poly}\left(\log \left( \frac{n}{\epsilon} \right)\right)$, exponentially improving upon the linear communication required by the protocol of [2019]. 2) We construct an explicit quantum secure $2$-source non-malleable-extractor for min-entropy $k \geq n- n^{\Omega(1)}$, with an output of size $n^{\Omega(1)}$ and error $2^{- n^{\Omega(1)}}$. 3) We also study their natural extensions when the tampering of the inputs is performed $t$ times. We construct explicit quantum secure $t$-non-malleable-extractors for both the seeded ($t=d^{\Omega(1)}$) and the $2$-source ($t=n^{\Omega(1)}$) cases.

Translated: 2023-05-31 04:48:40 Published: 2023-05-28

# pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing

pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing ( http://arxiv.org/abs/2201.09186v3 )

License: see link

Jiasi Weng and Jian Weng and Gui Tang and Anjia Yang and Ming Li and Jia-Nan Liu

This paper proposes a new approach for privacy-preserving and verifiable convolutional neural network (CNN) testing, enabling a CNN model developer to convince a user of the truthful CNN performance over non-public data from multiple testers, while respecting model privacy. To balance security and efficiency, we make three contributions by appropriately integrating homomorphic encryption (HE) and zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) primitives with CNN testing. First, the CNN model under test is strategically partitioned into a private part kept locally by the model developer and a public part outsourced to an external server. The private part runs over HE-protected test data sent by a tester and transmits its outputs to the public part, which completes the subsequent computations of the CNN testing. Second, the correctness of this CNN testing is enforced by generating zk-SNARK-based proofs, with an emphasis on optimizing the proving overhead for two-dimensional (2-D) convolution operations, since these operations dominate the performance bottleneck during proof generation. Specifically, we present a new quadratic matrix program (QMP)-based arithmetic circuit with a single multiplication gate for expressing 2-D convolution operations between multiple filters and inputs in a batch manner. Third, we aggregate multiple proofs with respect to the same CNN model but different testers' test data (i.e., different statements) into one proof, and ensure that the validity of the aggregated proof implies the validity of the original multiple proofs. Lastly, our experimental results demonstrate that our QMP-based zk-SNARK is nearly 13.9$\times$ faster than the existing QAP-based zk-SNARK in proving time, and 17.6$\times$ faster in Setup time, for high-dimension matrix multiplication.
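
Batching helps because many 2-D convolutions can be lowered to one matrix product. The sketch below shows the standard im2col lowering, assuming stride 1, no padding, and CNN-style cross-correlation. Each flattened filter is a row of $F$, each image patch is a column of $X$, and $Y = FX$ holds every filter-input result at once. It illustrates only the matrix view the abstract builds on, not the QMP circuit or any proof system; the names are hypothetical.

```python
def patches(image, k):
    """Every k x k patch of a 2-D image (stride 1, no padding), flattened row-major."""
    H, W = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(k) for j in range(k)]
            for r in range(H - k + 1) for c in range(W - k + 1)]


def conv2d_as_matmul(filters, images, k):
    """Y = F @ X for all filters and all images in a single product.

    Row f of Y holds filter f's outputs for image 0's patches, then image 1's, etc.
    """
    F = [[v for row in f for v in row] for f in filters]     # num_filters x k^2
    X_cols = [p for img in images for p in patches(img, k)]  # columns of X
    return [[sum(a * b for a, b in zip(f, col)) for col in X_cols] for f in F]


def conv2d_direct(image, filt):
    """Reference nested-loop cross-correlation, for checking the matrix form."""
    k = len(filt)
    H, W = len(image), len(image[0])
    return [sum(filt[i][j] * image[r + i][c + j] for i in range(k) for j in range(k))
            for r in range(H - k + 1) for c in range(W - k + 1)]


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    filters = [[[1, 0], [0, -1]], [[1, 1], [1, 1]]]
    print(conv2d_as_matmul(filters, [img], 2))  # [[-4, -4, -4, -4], [12, 16, 24, 28]]
```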

Translated: 2023-05-31 04:40:21 Published: 2023-05-28

# Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR Image Road Extraction

Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR Image Road Extraction ( http://arxiv.org/abs/2201.03178v2 )

License: see link

Tao Chen, Yiran Liu, Haoyu Jiang, Ruirui Li

Accurately segmenting roads is challenging due to substantial intra-class variations, indistinct inter-class distinctions, and occlusions caused by shadows, trees, and buildings. Addressing these challenges requires attention to important texture details and perception of global geometric context. Recent research has shown that CNN-Transformer hybrid structures outperform CNNs or Transformers used alone: CNNs excel at extracting local detail features, while Transformers naturally perceive global contextual information. In this paper, we propose a dual-branch network block named ConSwin that combines ResNet and Swin Transformers for road extraction tasks. The ConSwin block harnesses the strengths of both approaches to better extract detailed and global features. Based on ConSwin, we construct an hourglass-shaped road extraction network and introduce two novel connection structures to better transmit texture and structural detail information to the decoder. Our proposed method outperforms state-of-the-art methods on both the Massachusetts and CHN6-CUG datasets in terms of overall accuracy, IoU, and F1. Additional experiments validate the effectiveness of the proposed module, and visualization results demonstrate its ability to obtain better road representations.

Translated: 2023-05-31 04:39:08 Published: 2023-05-28

# HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments ( http://arxiv.org/abs/2111.10635v3 )

License: see link

Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou

Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. Training DNN models generally involves large-scale input data with many sparse features, which incurs high Input/Output (IO) cost, while some layers are compute-intensive. Training generally exploits distributed computing resources to reduce training time. In addition, heterogeneous computing resources, e.g., CPUs and GPUs of multiple types, are available for distributed training. Thus, scheduling multiple layers onto diverse computing resources is critical for the training process. To efficiently train a DNN model using heterogeneous computing resources, we propose a distributed framework, Paddle-Heterogeneous Parameter Server (Paddle-HeterPS), composed of a distributed architecture and a Reinforcement Learning (RL)-based scheduling method. Paddle-HeterPS has three advantages over existing frameworks. First, it enables efficient training of diverse workloads with heterogeneous computing resources. Second, it exploits an RL-based method to efficiently schedule the workload of each layer onto appropriate computing resources, minimizing cost while satisfying throughput constraints. Third, it manages data storage and data communication among distributed computing resources. We carry out extensive experiments showing that Paddle-HeterPS significantly outperforms state-of-the-art approaches in terms of throughput (14.5 times higher) and monetary cost (312.3% smaller). The code of the framework is publicly available at: https://github.com/PaddlePaddle/Paddle.

Translated: 2023-05-31 04:37:21 Published: 2023-05-28

# Do Vision-Language Pretrained Models Learn Composable Primitive Concepts?

Do Vision-Language Pretrained Models Learn Composable Primitive Concepts? ( http://arxiv.org/abs/2203.17271v3 )

License: see link

Tian Yun, Usha Bhalla, Ellie Pavlick, Chen Sun

Vision-language (VL) pretrained models have achieved impressive performance on multimodal reasoning and zero-shot recognition tasks. Many of these VL models are pretrained on unlabeled image and caption pairs from the internet. In this paper, we study whether representations of primitive concepts, such as colors, shapes, or the attributes of object parts, emerge automatically within these pretrained VL models. We propose a two-step framework, Compositional Concept Mapping (CompMap), to investigate this. CompMap first asks a VL model to generate primitive concept activations with text prompts, and then learns to construct a composition model that maps the primitive concept activations (e.g. the likelihood of black tail or red wing) to composite concepts (e.g. a red-winged blackbird). We show that a composition model can be reliably learned from ground-truth primitive concepts. We thus hypothesize that if primitive concepts indeed emerge in a VL pretrained model, its primitive concept activations can be used to learn a composition model similar to the one designed by experts. We propose a quantitative metric to measure the degree of similarity, and refer to it as the interpretability metric. We also measure the classification accuracy when using the primitive concept activations and the learned composition model to predict the composite concepts, and refer to it as the usefulness metric. Our study reveals that state-of-the-art VL pretrained models learn primitive concepts that are highly useful for fine-grained visual recognition on the CUB dataset and for compositional generalization tasks on the MIT-States dataset. However, we observe that the learned composition models have low interpretability in our qualitative analyses. Our results reveal the limitations of existing VL models and the necessity of pretraining objectives that encourage the acquisition of primitive concepts.
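
The composition model in CompMap's second step can be pictured as a small linear map from primitive-concept activations to composite-concept scores. The sketch below shows only the form of an expert-designed model, the reference that learned models are compared against. It assumes a binary class-by-attribute matrix and fits nothing; the primitive names, classes, and activations are all hypothetical.

```python
# Hypothetical primitives, in the order used by the activation vectors below.
PRIMITIVES = ["black tail", "red wing", "yellow breast"]

# Hypothetical expert composition model: which primitives define each composite concept.
EXPERT = {
    "red-winged blackbird": [1, 1, 0],
    "yellow warbler": [0, 0, 1],
}


def composite_scores(activations, class_attributes):
    """Score each composite concept as the weighted sum of primitive activations."""
    return {cls: sum(a * w for a, w in zip(activations, weights))
            for cls, weights in class_attributes.items()}


def predict(activations, class_attributes):
    """Composite concept with the highest score."""
    scores = composite_scores(activations, class_attributes)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    acts = [0.9, 0.8, 0.1]  # e.g. VL-model likelihoods for each primitive prompt
    print(dict(zip(PRIMITIVES, acts)), "->", predict(acts, EXPERT))
```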

Translated: 2023-05-31 04:29:45 Published: 2023-05-28

# On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition

On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition ( http://arxiv.org/abs/2203.14593v3 )

License: see link

Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu

Accurate recognition of dysarthric and elderly speech remains a challenging task to date. Speaker-level heterogeneity attributed to accent or gender, when aggregated with age and speech impairment, creates large diversity among these speakers. Scarcity of speaker-level data limits the practical use of data-intensive, model-based speaker adaptation methods. To this end, this paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation: variance-regularized spectral basis embedding (SVR) and spectral feature driven f-LHUC transforms. Experiments on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest that the proposed on-the-fly speaker adaptation approaches consistently outperform baseline iVector-adapted hybrid DNN/TDNN and E2E Conformer systems, with statistically significant WER reductions of 2.48%-2.85% absolute (7.92%-8.06% relative), and outperform offline model-based LHUC adaptation by 1.82% absolute (5.63% relative).
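
LHUC (learning hidden unit contributions) adapts a trained network by rescaling each hidden unit with an amplitude, commonly 2·sigmoid(r). In standard LHUC, r is estimated per speaker by backpropagation. The abstract does not give the f-LHUC formulation. The minimal sketch below assumes, hypothetically, that r is instead predicted from a per-utterance spectral feature vector by a fixed linear map, which would remove the per-speaker optimization step. `lhuc_scale` and `predict_lhuc_params` are illustrative names, not the authors' code.

```python
import math
from typing import List

def lhuc_scale(hidden: List[float], r: List[float]) -> List[float]:
    """Rescale each hidden activation by the LHUC amplitude 2 * sigmoid(r_i), which lies in (0, 2)."""
    return [2.0 / (1.0 + math.exp(-ri)) * hi for hi, ri in zip(hidden, r)]

def predict_lhuc_params(spectral_feat: List[float],
                        weights: List[List[float]],
                        bias: List[float]) -> List[float]:
    """Hypothetical feature-driven predictor: r = W x + b, one row of W per hidden unit.
    An assumption for illustration only; not the f-LHUC transform from the paper."""
    return [sum(w * x for w, x in zip(row, spectral_feat)) + b
            for row, b in zip(weights, bias)]

# Usage: adapt one hidden layer on the fly from an utterance-level feature vector.
hidden = [1.0, -0.5, 2.0]
r = predict_lhuc_params([0.2, -0.1], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0])
adapted = lhuc_scale(hidden, r)
```

With r = 0 the amplitude is exactly 1, so the unadapted network is recovered. This is why LHUC parameters are usually initialized at zero.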

Translated: 2023-05-31 04:28:15; Published: 2023-05-28

# Optimizing Test-Time Query Representations for Dense Retrieval

arXiv: http://arxiv.org/abs/2205.12680v3

License: see link

Mujeen Sung, Jungsoo Park, Jaewoo Kang, Danqi Chen, Jinhyuk Lee

Recent developments in dense retrieval rely on high-quality representations of queries and contexts from pre-trained query and context encoders. In this paper, we introduce TOUR (Test-Time Optimization of Query Representations), which further optimizes instance-level query representations guided by signals from test-time retrieval results. We leverage a cross-encoder re-ranker to provide fine-grained pseudo labels over retrieval results and iteratively optimize query representations with gradient descent. Our theoretical analysis reveals that TOUR can be viewed as a generalization of the classical Rocchio algorithm for pseudo relevance feedback, and we present two variants that use pseudo labels as hard binary or soft continuous labels. We first apply TOUR to phrase retrieval with our proposed phrase re-ranker, and also evaluate its effectiveness on passage retrieval with an off-the-shelf re-ranker. TOUR greatly improves end-to-end open-domain question answering accuracy, as well as passage retrieval performance. TOUR also consistently improves direct re-ranking by up to 2.0% while running 1.3-2.4x faster with an efficient implementation.
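
The abstract relates TOUR to the classical Rocchio algorithm for pseudo relevance feedback. The sketch below shows two things side by side:

- the textbook Rocchio update;
- one gradient-descent step on a query vector, assuming inner-product scores and a softmax cross-entropy against pseudo labels y that sum to 1.

The gradient of that loss is the sum over candidates of (p_i - y_i)·d_i. A step therefore moves the query toward the pseudo-label-weighted documents and away from the model's current expectation, which is Rocchio-like. This illustrates the connection only; it is not TOUR's actual objective or code. The defaults alpha=1.0, beta=0.75, gamma=0.15 are conventional textbook values.

```python
import math
from typing import List, Sequence

Vector = List[float]

def centroid(vectors: Sequence[Vector], dim: int) -> Vector:
    """Mean of a set of vectors (zero vector if the set is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query: Vector, relevant: Sequence[Vector], nonrelevant: Sequence[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Classical Rocchio: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(nonrelevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel_c, non_c)]

def query_gradient_step(query: Vector, docs: Sequence[Vector],
                        pseudo_labels: Sequence[float], lr: float = 0.1) -> Vector:
    """One gradient step on -sum_i y_i * log softmax(q . d_i), where y sums to 1."""
    scores = [sum(q * d for q, d in zip(query, doc)) for doc in docs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by max for numerical stability
    z = sum(exps)
    probs = [e / z for e in exps]
    # d loss / d q = sum_i (p_i - y_i) * d_i
    grad = [sum((p - y) * doc[k] for p, y, doc in zip(probs, pseudo_labels, docs))
            for k in range(len(query))]
    return [q - lr * g for q, g in zip(query, grad)]
```

With hard binary labels (a single positive), the step pulls the query toward that document. With soft labels from a re-ranker, it pulls toward a weighted mix of candidates.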

Translated: 2023-05-31 04:19:21; Published: 2023-05-28

# MVP: Multi-task Supervised Pre-training for Natural Language Generation

arXiv: http://arxiv.org/abs/2206.12131v3

License: see link

Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen

Pre-trained language models (PLMs) have achieved remarkable success in natural language generation (NLG) tasks. Up to now, most NLG-oriented PLMs have been pre-trained in an unsupervised manner on large-scale general corpora. Meanwhile, an increasing number of models pre-trained with labeled data (i.e., "supervised pre-training") showcase superior performance compared to unsupervised pre-trained models. Motivated by the success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language generation. We collect a large-scale natural language generation corpus, MVPCorpus, from 77 datasets over 11 diverse NLG tasks. Then we unify these examples into a general text-to-text format to pre-train the text generation model MVP in a supervised manner. For each task, we further pre-train specific soft prompts to stimulate the model's capacity to perform that task. Our MVP model can be seen as an application of recent instruction tuning to relatively small PLMs. Extensive experiments demonstrate the effectiveness and generality of our MVP model on a number of NLG tasks: it achieves state-of-the-art performance on 13 out of 17 datasets, outperforming BART by 9.3% and Flan-T5 by 5.8%.

Translated: 2023-05-31 04:10:47; Published: 2023-05-28

# Confidence-Guided Unsupervised Domain Adaptation for Cerebellum Segmentation

arXiv: http://arxiv.org/abs/2206.10357v2

License: see link

Xuan Li, Paule-J Toussaint, Alan Evans, and Xue Liu

The lack of a comprehensive high-resolution atlas of the cerebellum has hampered studies of cerebellar involvement in normal brain function and disease. A good representation of the tightly foliated aspect of the cerebellar cortex is difficult to achieve because of the highly convoluted surface and the time that manual delineation would take. The quality of manual segmentation is influenced by human expert judgment, and automatic labelling is constrained by the limited robustness of existing segmentation algorithms. The 20 µm isotropic BigBrain dataset provides an unprecedented high-resolution framework for semantic segmentation compared to the 1000 µm (1 mm) resolution afforded by magnetic resonance imaging. To dispense with the manual annotation requirement, we propose to train a model to adaptively transfer the annotation from the cerebellum in the Allen Brain Human Brain Atlas to the BigBrain in an unsupervised manner, taking into account the different staining and spacing between sections. The distinct visual discrepancy between the Allen Brain and the BigBrain prevents existing approaches from providing meaningful segmentation masks, and artifacts caused by sectioning and histological slice preparation in the BigBrain data pose an extra challenge. To address these problems, we propose a two-stage framework in which we first transfer the Allen Brain cerebellum to a space sharing visual similarity with the BigBrain. We then introduce a self-training strategy with a confidence map to iteratively guide model learning from the noisy pseudo labels. Qualitative results validate the effectiveness of our approach, and quantitative experiments reveal that our method can achieve over 2.6% loss reduction compared with other approaches.
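
Self-training with a confidence map typically trusts a pseudo label only where the model is confident. The abstract does not specify how the confidence map is computed or applied. The minimal sketch below assumes a hard threshold on the teacher's maximum softmax probability per pixel, and averages cross-entropy over the retained pixels. `threshold=0.9` is a hypothetical value; a soft per-pixel weighting would be an equally plausible reading.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # shift by max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]

def confidence_masked_loss(student_logits: Sequence[Sequence[float]],
                           teacher_logits: Sequence[Sequence[float]],
                           threshold: float = 0.9) -> Tuple[float, int]:
    """Mean cross-entropy of the student against the teacher's argmax pseudo labels,
    counting only pixels whose teacher confidence (max softmax probability) >= threshold.
    Returns (loss, number_of_pixels_kept)."""
    total, kept = 0.0, 0
    for s_logit, t_logit in zip(student_logits, teacher_logits):
        t_prob = softmax(t_logit)
        conf = max(t_prob)
        if conf < threshold:
            continue  # low-confidence pixel: its pseudo label is not trusted
        label = t_prob.index(conf)
        total -= math.log(softmax(s_logit)[label])
        kept += 1
    return (total / kept if kept else 0.0), kept
```

Iterating this loss, where each round's model becomes the next round's teacher, is one simple form of the iterative pseudo-label refinement the abstract describes.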

Translated: 2023-05-31 04:09:45; Published: 2023-05-28

# Relaxed Bell inequalities as a trade-off relation between measurement dependence and hiddenness

arXiv: http://arxiv.org/abs/2206.06196v3

License: see link

Gen Kimura, Yugo Susuki and Kei Morisue

Quantum correlations that violate the Bell inequality cannot be explained by any (measurement-independent) local hidden variable theory. However, the violation only implies the incompatibility of the underlying assumptions of reality, locality, and measurement independence, and does not quantify the extent to which each assumption is violated. In contrast, Hall (2010, 2011) quantified each assumption and generalized the Bell-CHSH inequality to give a trade-off relationship between the underlying assumptions. In this paper, we introduce a quantification of hidden variables (hiddenness) and derive a new trade-off relation between hiddenness and measurement dependence that holds for any local hidden variable theory.
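
For context, the Bell-CHSH inequality that Hall's relaxed inequalities generalize concerns S = E(a,b) - E(a,b') + E(a',b) + E(a',b'). It satisfies |S| ≤ 2 for any measurement-independent local hidden variable model, while the quantum singlet state reaches 2√2. The sketch below checks both numerically:

- It enumerates all deterministic ±1 outcome assignments. Every such local model is a mixture of these, so their maximum bounds all of them.
- It evaluates the standard singlet correlation E = -cos(θ_a - θ_b) at the usual optimal angles.

It does not implement the paper's hiddenness measure or trade-off relation.

```python
import itertools
import math
from typing import Callable

Correlation = Callable[[int, int], float]

def chsh(E: Correlation) -> float:
    """S = E(a,b) - E(a,b') + E(a',b) + E(a',b'), with setting 0 = unprimed and 1 = primed."""
    return E(0, 0) - E(0, 1) + E(1, 0) + E(1, 1)

def max_deterministic_lhv() -> float:
    """Largest |S| over all deterministic local assignments of +/-1 outcomes."""
    best = 0.0
    for a0, a1, b0, b1 in itertools.product((-1, 1), repeat=4):
        alice, bob = (a0, a1), (b0, b1)
        # Each party's outcome depends only on its own setting (locality).
        best = max(best, float(abs(chsh(lambda x, y: alice[x] * bob[y]))))
    return best

def singlet_chsh(angles_a=(0.0, math.pi / 2), angles_b=(math.pi / 4, 3 * math.pi / 4)) -> float:
    """|S| for the singlet state, whose correlation for measurement angles ta, tb is -cos(ta - tb)."""
    return abs(chsh(lambda x, y: -math.cos(angles_a[x] - angles_b[y])))
```

The deterministic bound follows from S = a0(b0 - b1) + a1(b0 + b1): one bracket is 0 and the other is ±2.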

Translated: 2023-05-31 04:09:17; Published: 2023-05-28

# Mining Multi-Label Samples from Single Positive Labels

arXiv: http://arxiv.org/abs/2206.05764v4

License: see link

Youngin Cho, Daejin Kim, Mohammad Azam Khan, Jaegul Choo

Conditional generative adversarial networks (cGANs) have shown superior results in class-conditional generation tasks. To simultaneously control multiple conditions, cGANs require multi-label training datasets, where multiple labels can be assigned to each data instance. However, the tremendous annotation cost limits the accessibility of multi-label datasets in real-world scenarios. Therefore, in this study, we explore a practical setting called the single positive setting, where each data instance is annotated by only one positive label with no explicit negative labels. To generate multi-label data in the single positive setting, we propose a novel sampling approach called single-to-multi-label (S2M) sampling, based on the Markov chain Monte Carlo method. As a widely applicable "add-on" method, our proposed S2M sampling method enables existing unconditional and conditional GANs to draw high-quality multi-label data with a minimal annotation cost. Extensive experiments on real image datasets verify the effectiveness and correctness of our method, even when compared to a model trained with fully annotated datasets.

Translated: 2023-05-31 04:08:35; Published: 2023-05-28

# Joint Super-Resolution and Inverse Tone-Mapping: A Feature Decomposition Aggregation Network and A New Benchmark

arXiv: http://arxiv.org/abs/2207.03367v2

License: see link

Gang Xu (1), Yu-chen Yang (1), Liang Wang (2), Jun Xu (1), Xian-Tong Zhen (3) ((1) Nankai University, (2) Institute of Automation, CAS, (3) Guangdong University of Petrochemical Technology)

Joint Super-Resolution and Inverse Tone-Mapping (joint SR-ITM) aims to increase the resolution and dynamic range of low-resolution and standard dynamic range images. Recent networks mainly resort to image decomposition techniques with complex multi-branch architectures. However, fixed decomposition techniques largely restrict their power on versatile images. To exploit the potential power of the decomposition mechanism, in this paper we generalize it from the image domain to the broader feature domain. To this end, we propose a lightweight Feature Decomposition Aggregation Network (FDAN). In particular, we design a Feature Decomposition Block (FDB) to achieve learnable separation of detail and base feature maps, and develop a Hierarchical Feature Decomposition Group by cascading FDBs for powerful multi-level feature decomposition. Moreover, to better evaluate the compared methods, we collect a large-scale dataset for joint SR-ITM, i.e., SRITM-4K, which provides versatile scenarios for robust model training and evaluation. Experimental results on two benchmark datasets demonstrate that our FDAN is efficient and outperforms state-of-the-art methods on joint SR-ITM. The code of our FDAN and the SRITM-4K dataset are available at https://github.com/CS-GangXu/FDAN.

Translated: 2023-05-31 04:01:33; Published: 2023-05-28

# Deep Learning and Symbolic Regression for Discovering Parametric Equations

arXiv: http://arxiv.org/abs/2207.00529v2

License: see link

Michael Zhang, Samuel Kim, Peter Y. Lu, Marin Soljačić

Symbolic regression is a machine learning technique that can learn the governing formulas of data and thus has the potential to transform scientific discovery. However, symbolic regression is still limited in the complexity and dimensionality of the systems that it can analyze. Deep learning, on the other hand, has transformed machine learning with its ability to analyze extremely complex and high-dimensional datasets. We propose a neural network architecture to extend symbolic regression to parametric systems where some coefficients may vary but the structure of the underlying governing equation remains constant. We demonstrate our method on various analytic expressions, ODEs, and PDEs with varying coefficients and show that it extrapolates well outside of the training domain. The neural network-based architecture can also integrate with other deep learning architectures so that it can analyze high-dimensional data while being trained end-to-end. To this end, we integrate our architecture with convolutional neural networks to analyze 1D images of varying spring systems.

Translated: 2023-05-31 03:59:55; Published: 2023-05-28

# Joint Generator-Ranker Learning for Natural Language Generation

arXiv: http://arxiv.org/abs/2206.13974v3

License: see link

Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen

Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among the text candidates. However, existing methods usually train the generator and the ranker individually, neglecting the mutual feedback that could further enhance the generation quality. To tackle this limitation, we propose JGR, a novel joint training algorithm that integrates the generator and the ranker in a single framework. JGR optimizes the generator with a hybrid objective that combines data likelihood and ranker reward, and trains the ranker with a contrastive loss that compares the generator outputs. By iteratively updating the generator and the ranker, JGR can effectively harmonize their learning and enhance their quality jointly. We evaluate JGR on various text generation tasks and demonstrate that it surpasses existing methods on four public datasets across three common generation scenarios. Our code and models are publicly available at https://github.com/microsoft/ProphetNet/tree/master/JGR.
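
The generate-then-rank mechanism described above has two pieces: inference, where the ranker picks the top-scoring candidate, and a contrastive objective for training the ranker. The sketch below covers both. As an assumption, it uses a softmax cross-entropy that treats the candidate with the highest quality score (for example, similarity to the reference) as the positive. JGR's exact ranker loss and the generator's hybrid likelihood-plus-reward objective are not reproduced here.

```python
import math
from typing import Callable, Sequence

def select_best(candidates: Sequence[str], ranker_score: Callable[[str], float]) -> str:
    """Generate-then-rank inference: keep the candidate the ranker scores highest."""
    return max(candidates, key=ranker_score)

def contrastive_ranker_loss(scores: Sequence[float], quality: Sequence[float]) -> float:
    """-log softmax(scores)[best], where best is the candidate with the highest quality.
    Minimizing it pushes the ranker to score that candidate above the other generator outputs."""
    best = max(range(len(quality)), key=lambda i: quality[i])
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log-sum-exp
    return log_z - scores[best]
```

In a joint scheme, the ranker's scores would also serve as a reward signal for the generator, which is the mutual feedback the abstract says separate training neglects.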

Translated: 2023-05-31 03:59:16; Published: 2023-05-28

# Differentially Private Federated Combinatorial Bandits with Constraints

arXiv: http://arxiv.org/abs/2206.13192v2

License: see link

Sambhav Solanki, Samhita Kanaparthy, Sankarshan Damle, Sujit Gujar

There is a rapid increase in the use of the cooperative learning paradigm in online learning settings, i.e., federated learning (FL). Unlike in most FL settings, there are many situations where the agents are competitive. Each agent would like to learn from others, but the part of the information it shares for others to learn from could be sensitive; thus, it desires privacy. This work investigates a group of agents working concurrently to solve similar combinatorial bandit problems while maintaining quality constraints. Can these agents collectively learn while keeping their sensitive information confidential by employing differential privacy? We observe that communicating can reduce the regret. However, differential privacy techniques for protecting sensitive information make the data noisy and may worsen rather than improve regret. Hence, we note that it is essential to decide when to communicate and what shared data to learn from in order to strike a functional balance between regret and privacy. For such a federated combinatorial MAB setting, we propose a Privacy-preserving Federated Combinatorial Bandit algorithm, P-FCB. We illustrate the efficacy of P-FCB through simulations. We further show that our algorithm improves regret while upholding the quality threshold and meaningful privacy guarantees.
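
The abstract does not say which differential-privacy mechanism P-FCB uses. As an assumption for illustration, the sketch below shows the standard Laplace mechanism, which an agent could apply to a statistic before sharing it (for example, an arm's empirical reward sum). It makes the regret/privacy tension concrete: a smaller epsilon means larger noise in what other agents learn from. The sensitivity and epsilon values in the usage line are hypothetical.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize(value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism: value + Laplace(sensitivity / epsilon) is epsilon-differentially private
    for a statistic whose value changes by at most `sensitivity` between neighbouring datasets."""
    return value + laplace_noise(sensitivity / epsilon, rng)

# Usage (hypothetical numbers): an agent shares its empirical reward sum for one arm.
rng = random.Random(0)
shared = privatize(value=12.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

The noise scale grows as sensitivity/epsilon, so halving epsilon doubles the expected absolute error of every shared statistic.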

Translated: 2023-05-31 03:58:58; Published: 2023-05-28

# LR-Net: A Block-based Convolutional Neural Network for Low-Resolution Image Classification

arXiv: http://arxiv.org/abs/2207.09531v5

License: see link

Ashkan Ganj, Mohsen Ebadpour, Mahdi Darvish, Hamid Bahador

The success of CNN-based architectures in learning and extracting features for image classification has made them very popular, but image classification becomes more challenging when state-of-the-art models are applied to noisy and low-quality images. It is still difficult for models to extract meaningful features from this type of image because of its low resolution and the lack of meaningful global features. Moreover, high-resolution images need more layers to train, which means they take more time and computational power to train. Our method also addresses the problem of vanishing gradients that arises as deep neural networks become deeper. In order to address all these issues, we developed a novel image classification architecture composed of blocks that are designed to learn both low-level and global features from blurred and noisy low-resolution images. Our design of the blocks was heavily influenced by Residual Connections and Inception modules in order to increase performance and reduce parameter sizes. We also assess our work using the MNIST family datasets, with a particular emphasis on the Oracle-MNIST dataset, which is the most difficult of them to classify due to its low-quality and noisy images. We have performed in-depth tests that demonstrate the presented architecture is faster and more accurate than existing cutting-edge convolutional neural networks. Furthermore, due to the unique properties of our model, it can produce a better result with fewer parameters.

Translated: 2023-05-31 03:49:12; Published: 2023-05-28

# MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms

arXiv: http://arxiv.org/abs/2209.06800v2

License: see link

Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, and Yufei Ding

The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the computation and communication operations for high-performance delivery. To this end, we propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve the execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41X, 4.81X, and 10.83X faster than DGL, MGG-UVM, and ROC, respectively.

Translated: 2023-05-31 03:42:36; Published: 2023-05-28

# PaLI: A Jointly-Scaled Multilingual Language-Image Model

arXiv: http://arxiv.org/abs/2209.06794v3

License: see link

Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

Effective scaling and a flexible task interface enable large language models to excel at many tasks. We present PaLI (Pathways Language and Image model), a model that extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaLI, we make use of large pre-trained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and leverage the substantial investment already made in training them. We find that joint scaling of the vision and language components is important. Since existing Transformers for language are much larger than their vision counterparts, we train a large, 4-billion-parameter ViT (ViT-e) to quantify the benefits from even larger-capacity vision models. To train PaLI, we create a large multilingual mix of pretraining tasks, based on a new image-text training set containing 10B images and texts in over 100 languages. PaLI achieves state-of-the-art results in multiple vision and language tasks (such as captioning, visual question answering, and scene-text understanding), while retaining a simple, modular, and scalable design.

Translated: 2023-05-31 03:42:14; Published: 2023-05-28

# Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model

arXiv: http://arxiv.org/abs/2208.06164v2

License: see link

Xiang-Rong Sheng, Jingyue Gao, Yueyao Cheng, Siran Yang, Shuguang Han, Hongbo Deng, Yuning Jiang, Jian Xu, Bo Zheng

Despite the development of ranking optimization techniques, pointwise loss remains the dominant approach for click-through rate (CTR) prediction. This can be attributed to the calibration ability of the pointwise loss, since the prediction can be viewed as the click probability. In practice, a CTR prediction model is also commonly assessed by its ranking ability. To optimize the ranking ability, ranking losses (e.g., pairwise or listwise losses) can be adopted, as they usually achieve better rankings than pointwise loss. Previous studies have experimented with a direct combination of the two losses to obtain the benefits of both and observed improved performance. However, such combinations break the interpretation of the output logit as the click-through rate, which may lead to sub-optimal solutions. To address this issue, we propose an approach that can Jointly optimize the Ranking and Calibration abilities (JRC for short). JRC improves the ranking ability by contrasting the logit values for the sample with different labels, and constrains the predicted probability to be a function of the logit subtraction. We further show that JRC consolidates the interpretation of logits, where the logits model the joint distribution. With such an interpretation, we prove that JRC approximately optimizes the contextualized hybrid discriminative-generative objective. Experiments on public and industrial datasets and online A/B testing show that our approach improves both ranking and calibration abilities. Since May 2022, JRC has been deployed on the display advertising platform of Alibaba and has obtained significant performance improvements.
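
Reading only the abstract, JRC's two ideas can be sketched under stated assumptions:

- Each sample gets two logits [z0, z1], one per label. The calibrated click probability is the softmax over these two, which equals sigmoid(z1 - z0), i.e., a "function of the logit subtraction".
- A ranking term contrasts samples by normalizing each label's logit across the samples of one context (here, the whole list passed in).

The mixing weight `alpha` and the exact form of the ranking term are assumptions, not the paper's definition.

```python
import math
from typing import List, Sequence

def log_softmax(values: Sequence[float]) -> List[float]:
    m = max(values)
    log_z = m + math.log(sum(math.exp(v - m) for v in values))  # stable log-sum-exp
    return [v - log_z for v in values]

def click_probability(z: Sequence[float]) -> float:
    """p(click) = softmax([z0, z1])[1] = sigmoid(z1 - z0)."""
    return 1.0 / (1.0 + math.exp(-(z[1] - z[0])))

def jrc_style_loss(logits: Sequence[Sequence[float]], labels: Sequence[int],
                   alpha: float = 0.5) -> float:
    """alpha * calibration loss + (1 - alpha) * ranking loss, all samples in one context."""
    n = len(labels)
    # Calibration: per-sample cross-entropy over its own two logits.
    calib = -sum(log_softmax(z)[y] for z, y in zip(logits, labels)) / n
    # Ranking: each sample's logit for its own label, normalized across all samples in the context.
    rank = 0.0
    for i, y in enumerate(labels):
        rank -= log_softmax([z[y] for z in logits])[i]
    rank /= n
    return alpha * calib + (1.0 - alpha) * rank
```

Because the ranking term only re-normalizes the same logits across samples, `click_probability` keeps its meaning as a calibrated probability. This is the property the abstract says naive loss combinations lose.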

Translated: 2023-05-31 03:40:54; Published: 2023-05-28

# Single-qubit gate teleportation provides a quantum advantage

arXiv: http://arxiv.org/abs/2209.14158v2

License: see link

Libor Caha, Xavier Coiteux-Roy, Robert Koenig

Gate-teleportation circuits are arguably among the most basic examples of computations believed to provide a quantum computational advantage: in seminal work [Quantum Inf. Comput., 4(2):134-145], Terhal and DiVincenzo showed that these circuits elude simulation by efficient classical algorithms under plausible complexity-theoretic assumptions. Here we consider possibilistic simulation [Phys. Rev. A 106, 062430 (2022)], a particularly weak form of this task where the goal is to output any string appearing with non-zero probability in the output distribution of the circuit. We show that even for single-qubit Clifford-gate-teleportation circuits, this simulation problem cannot be solved by constant-depth classical circuits with bounded fan-in gates. Our results are unconditional and are obtained by a reduction to the problem of computing parity, a well-studied problem in classical circuit complexity.

Translated: 2023-05-31 03:32:16; Published: 2023-05-28

# A Spatial-channel-temporal-fused Attention for Spiking Neural Networks

arXiv: http://arxiv.org/abs/2209.10837v3

License: see link

Wuque Cai, Hongze Sun, Rui Liu, Yan Cui, Jun Wang, Yang Xia, Dezhong Yao, and Daqing Guo

Spiking neural networks (SNNs) mimic brain computational strategies and exhibit substantial capabilities in spatiotemporal information processing. As an essential factor for human perception, visual attention refers to the dynamic process of selecting salient regions in biological vision systems. Although visual attention mechanisms have achieved great success in computer vision applications, they are rarely introduced into SNNs. Inspired by experimental observations on predictive attentional remapping, in the present study we propose a new spatial-channel-temporal-fused attention (SCTFA) module that can guide SNNs to efficiently capture underlying target regions by utilizing accumulated historical spatial-channel information. Through a systematic evaluation on three event stream datasets (DVS Gesture, SL-Animals-DVS and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and two other SNN models with degenerated attention modules, but also achieves competitive accuracy with existing state-of-the-art methods. Additionally, our detailed analysis shows that the proposed SCTFA-SNN model has strong robustness to noise and outstanding stability when faced with incomplete data, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that incorporating appropriate cognitive mechanisms of the brain may provide a promising approach to elevate the capabilities of SNNs.

翻訳日:2023-05-31 03:30:58 公開日:2023-05-28

# Dense-ATOMIC:高い知識カバレッジと大規模マルチホップパスを備えた高密度接続ATOMICを目指して

Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths ( http://arxiv.org/abs/2210.07621v2 )

ライセンス: Link先を確認

Xiangqing Shen, Siwei Wu, and Rui Xia

(参考訳) ATOMICは大規模なコモンセンス知識グラフ(CSKG)であり、日常的なif-then知識の三つ組、すなわち{head event, relation, tail event}を含んでいる。ワンホップのアノテーション方式のため、ATOMICは互いに独立した二部グラフの集合となっており、異なる二部グラフに属するイベント間の多数のリンクが無視され、その結果、知識カバレッジとマルチホップパスが不足している。本研究では、高い知識カバレッジと大量のマルチホップパスを備えたDense-ATOMICを構築することを目指す。まず、ATOMICのイベントを一貫したパターンに正規化する。次に、三つ組のヘッドイベントとテールイベントが与えられたときにその関係を予測するCSKG補完手法Rel-CSKGCを提案し、ATOMICの既存の三つ組に基づいてCSKG補完モデルを訓練する。最後に、このモデルを用いてATOMICの欠落リンクを補完し、Dense-ATOMICを構築する。ATOMICの注釈付きサブグラフに対する自動評価と人手評価の両方で、強力なベースラインに対するRel-CSKGCの優位性が示された。さらに、統計、人手評価、簡単な下流タスクの観点からDense-ATOMICを広範に評価し、知識カバレッジとマルチホップパスにおけるDense-ATOMICの利点を確認した。Rel-CSKGCのソースコードとDense-ATOMICはhttps://github.com/NUSTM/Dense-ATOMICで公開されている。

ATOMIC is a large-scale commonsense knowledge graph (CSKG) containing everyday if-then knowledge triplets, i.e., {head event, relation, tail event}. The one-hop annotation manner made ATOMIC a set of independent bipartite graphs, which ignored the numerous links between events in different bipartite graphs and consequently caused shortages in knowledge coverage and multi-hop paths. In this work, we aim to construct Dense-ATOMIC with high knowledge coverage and massive multi-hop paths. First, the events in ATOMIC are normalized to a consistent pattern. We then propose a CSKG completion method called Rel-CSKGC to predict the relation given the head event and the tail event of a triplet, and train a CSKG completion model based on existing triplets in ATOMIC. Finally, we utilize the model to complete the missing links in ATOMIC and accordingly construct Dense-ATOMIC. Both automatic and human evaluation on an annotated subgraph of ATOMIC demonstrate the advantage of Rel-CSKGC over strong baselines. We further conduct extensive evaluations on Dense-ATOMIC in terms of statistics, human evaluation, and simple downstream tasks, all proving Dense-ATOMIC's advantages in knowledge coverage and multi-hop paths. Both the Rel-CSKGC source code and Dense-ATOMIC are publicly available at https://github.com/NUSTM/Dense-ATOMIC.

翻訳日:2023-05-31 03:24:31 公開日:2023-05-28

# VIMA:マルチモーダルプロンプトによる汎用ロボット操作

VIMA: General Robot Manipulation with Multimodal Prompts ( http://arxiv.org/abs/2210.03094v2 )

ライセンス: Link先を確認

Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan

(参考訳) プロンプトベースの学習は自然言語処理において成功したパラダイムとして台頭しており、単一の汎用言語モデルに、入力プロンプトで指定された任意のタスクを実行させることができる。しかしロボティクスにおけるタスク指定は、ワンショットのデモンストレーションの模倣、言語指示への追従、視覚的な目標への到達など、さまざまな形をとる。これらはしばしば別々のタスクとみなされ、専用のモデルで取り組まれている。本研究では、幅広いロボット操作タスクが、テキストトークンと視覚トークンを交互に並べたマルチモーダルプロンプトで表現できることを示す。そこで、マルチモーダルプロンプトを持つ数千の手続き的に生成されたテーブルトップタスク、模倣学習のための60万件以上の専門家軌跡、および体系的汎化のための4段階の評価プロトコルからなる新しいシミュレーションベンチマークを開発した。さらに、これらのプロンプトを処理し、モーター動作を自己回帰的に出力するトランスフォーマーベースのロボットエージェントVIMAを設計した。VIMAは強力なモデルスケーラビリティとデータ効率を実現するレシピを備えている。最も難しいゼロショット汎化設定において、VIMAは同じ訓練データのもとで代替設計を最大$2.9\times$のタスク成功率で上回る。訓練データが$10\times$少なくても、VIMAは最良の競合バリアントより$2.7\times$優れている。コードとビデオデモはhttps://vimalabs.github.io/で見ることができる。

Prompt-based learning has emerged as a successful paradigm in natural language processing, where a single general-purpose language model can be instructed to perform any task specified by input prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot demonstrations, following language instructions, and reaching visual goals. They are often considered different tasks and tackled by specialized models. We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization. We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. VIMA features a recipe that achieves strong model scalability and data efficiency. It outperforms alternative designs in the hardest zero-shot generalization setting by up to $2.9\times$ task success rate given the same training data. With $10\times$ less training data, VIMA still performs $2.7\times$ better than the best competing variant. Code and video demos are available at https://vimalabs.github.io/.

翻訳日:2023-05-31 03:22:30 公開日:2023-05-28

# 生成検索のための非パラメトリックデコーディング

Nonparametric Decoding for Generative Retrieval ( http://arxiv.org/abs/2210.02068v3 )

ライセンス: Link先を確認

Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vlad Karpukhin, Yi Lu, Minjoon Seo

(参考訳) 生成検索モデルは、外部メモリを持たずモデルパラメータに符号化された情報のみに依存するため、その情報容量は限られており、固定されている。この制限を克服するため、既存の生成検索モデルに適用可能な非パラメトリックデコーディング(Npデコーディング)を提案する。Npデコーディングでは、デコーダの語彙埋め込みとして、通常の語彙埋め込みではなく非パラメトリックな文脈化語彙埋め込み(外部メモリ)を用いる。文脈化語彙埋め込みを活用することで、生成検索モデルはパラメトリック空間と非パラメトリック空間の両方を利用できる。文書検索タスクにおける9つのデータセット(シングルホップ8個、マルチホップ1個)での評価により、生成検索モデルにNpデコーディングを適用すると性能が大幅に向上することが示された。また、Npデコーディングはデータ効率とパラメータ効率が高く、ゼロショット設定でも高い性能を示す。

Because the generative retrieval model depends solely on the information encoded in its model parameters without external memory, its information capacity is limited and fixed. To overcome this limitation, we propose Nonparametric Decoding (Np Decoding), which can be applied to existing generative retrieval models. Np Decoding uses nonparametric contextualized vocab embeddings (external memory) rather than vanilla vocab embeddings as decoder vocab embeddings. By leveraging the contextualized vocab embeddings, the generative retrieval model is able to utilize both the parametric and nonparametric space. Evaluation over 9 datasets (8 single-hop and 1 multi-hop) in the document retrieval task shows that applying Np Decoding to generative retrieval models significantly improves the performance. We also show that Np Decoding is data- and parameter-efficient and shows high performance in the zero-shot setting.

翻訳日:2023-05-31 03:22:11 公開日:2023-05-28

# 神経崩壊の摂動解析

Perturbation Analysis of Neural Collapse ( http://arxiv.org/abs/2210.16658v2 )

ライセンス: Link先を確認

Tom Tirer, Haoxiang Huang, Jonathan Niles-Weed

(参考訳) 分類のための深層ニューラルネットワークの訓練では、訓練誤差がゼロになった時点を超えて訓練損失を最小化し続けることが多い。この訓練段階では、クラス内サンプルの特徴(最後から2番目の層の出力)のばらつきが減少し、異なるクラスの平均特徴が特定のタイトフレーム構造に近づくという「神経崩壊」の挙動が観察されている。最近の研究は、すべての最小化点が厳密な崩壊を示す、理想化された制約なし特徴モデルを通じてこの挙動を分析している。しかし、実際のネットワークやデータセットでは、例えば深い層は崩壊から程遠い中間特徴を任意に変更できないため、特徴は通常、厳密な崩壊には至らない。本稿では、特徴が事前に定義された特徴行列(例えば中間特徴)の近傍にとどまるよう強制することで、この現象を捉えられるより豊かなモデルを提案する。近傍が小さい場合のモデルを摂動解析によって調べ、従来研究されてきたモデルでは得られない結果を確立する。例えば、(最小限の仮定のもとで「中心経路」上の勾配流を解析することにより)最適化された特徴のクラス内変動が事前定義された入力特徴に比べて減少することを証明し、崩壊に近い領域での最小化点を解析し、正則化ハイパーパラメータが崩壊への近さに与える影響について知見を与える。実際の深層学習の設定で実験を行い、理論を裏付ける。

Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point. In this phase of training, a "neural collapse" behavior has been observed: the variability of features (outputs of the penultimate layer) of within-class samples decreases and the mean features of different classes approach a certain tight frame structure. Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse. However, with practical networks and datasets, the features typically do not reach exact collapse, e.g., because deep layers cannot arbitrarily modify intermediate features that are far from being collapsed. In this paper, we propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix (e.g., intermediate features). We explore the model in the small vicinity case via perturbation analysis and establish results that cannot be obtained by the previously studied models. For example, we prove reduction in the within-class variability of the optimized features compared to the predefined input features (via analyzing gradient flow on the "central-path" with minimal assumptions), analyze the minimizers in the near-collapse regime, and provide insights on the effect of regularization hyperparameters on the closeness to collapse. We support our theory with experiments in practical deep learning settings.
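The within-class variability that this abstract tracks can be made concrete with a scatter ratio. The following is a minimal sketch, assuming the simple trace ratio tr(Sigma_W)/tr(Sigma_B) rather than the pseudo-inverse-based NC1 metric that is common in the neural-collapse literature; `collapse_ratio` is a hypothetical helper, not the authors' code. A value of 0 means exact within-class collapse.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def _mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def collapse_ratio(features: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """tr(Sigma_W) / tr(Sigma_B) for penultimate-layer features.

    Sigma_W: scatter of samples around their own class mean, averaged over all N samples.
    Sigma_B: scatter of class means around the global mean, averaged over the K classes.
    The ratio is 0 under exact within-class collapse and grows as features spread out.
    """
    by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for f, y in zip(features, labels):
        by_class[y].append(f)
    global_mean = _mean(features)
    class_means = {y: _mean(samples) for y, samples in by_class.items()}
    tr_within = sum(_sq_dist(f, class_means[y]) for f, y in zip(features, labels)) / len(features)
    tr_between = sum(_sq_dist(mu, global_mean) for mu in class_means.values()) / len(class_means)
    if tr_between == 0.0:
        raise ValueError("class means coincide; between-class scatter is zero")
    return tr_within / tr_between
```

Tracking this ratio over training epochs is one way to see whether features approach, but do not reach, exact collapse.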

翻訳日:2023-05-31 03:15:13 公開日:2023-05-28

# 協調推論誘導言語モデルによる数学文章題の解法

Solving Math Word Problems via Cooperative Reasoning induced Language Models ( http://arxiv.org/abs/2210.16257v4 )

ライセンス: Link先を確認

Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Yongfeng Huang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang

(参考訳) 大規模事前学習言語モデル(PLM)は、数学の文章題(MWP)のように高度な知能を必要とする難しい問題に新たな機会をもたらしている。しかし、既存のPLMをMWPにそのまま適用すると、生成過程に十分な監督がなく、人間のような素早い適応性に欠けるため、失敗することがある。人間の推論には、即時反応システム(システム1)と熟慮的推論システム(システム2)からなる二重推論の枠組みがあり、推論全体はそれらの相互作用によって決まることに着目する。これに着想を得て、MWPを解くための協調推論誘導型PLMである協調推論(Cooperative Reasoning, CoRe)を開発し、システム1を生成器、システム2を検証器とする人間のような推論アーキテクチャを実現した。本手法では、生成器が推論経路の生成を担い、検証器が評価を監督して生成器に信頼性の高いフィードバックを与える。CoReフレームワークをいくつかの数学的推論データセットで評価し、最先端手法に対して、最良のベースラインから最大9.6%の向上という十分な改善を達成した。コードはhttps://github.com/TianHongZXY/CoReで公開している。

Large-scale pre-trained language models (PLMs) bring new opportunities to challenging problems, especially those that need high-level intelligence, such as math word problems (MWPs). However, directly applying existing PLMs to MWPs can fail, as the generation process lacks sufficient supervision and thus lacks the fast adaptivity of humans. We notice that human reasoning has a dual reasoning framework that consists of an immediate reaction system (system 1) and a delicate reasoning system (system 2), where the entire reasoning is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for generating reasoning paths, and the verifiers are used to supervise the evaluation in order to obtain reliable feedback for the generator. We evaluate our CoRe framework on several mathematical reasoning datasets and achieve decent improvement over state-of-the-art methods, up to a 9.6% increase over the best baselines. Our code is available at https://github.com/TianHongZXY/CoRe.

翻訳日:2023-05-31 03:14:33 公開日:2023-05-28

# WaveBound: 安定した時系列予測のための動的誤差境界

WaveBound: Dynamic Error Bounds for Stable Time Series Forecasting ( http://arxiv.org/abs/2210.14303v2 )

ライセンス: Link先を確認

Youngin Cho, Daejin Kim, Dongmin Kim, Mohammad Azam Khan, Jaegul Choo

(参考訳) 時系列予測は、交通、エネルギー消費、経済・金融、疾病分析といった実世界の応用における実用性の高さから、重要なタスクとなっている。近年の深層学習ベースの手法は時系列予測で顕著な成功を収めている。それでも、時系列データの動的な性質のため、深層ネットワークは依然として不安定な訓練と過学習に悩まされている。実世界のデータに現れる一貫性のないパターンはモデルを特定のパターンに偏らせ、汎化を制限する。本研究では、時系列予測における過学習問題に対処するため、訓練損失に対する動的な誤差境界を導入する。具体的には、各反復において時間ステップと特徴ごとに訓練損失の適切な誤差境界を推定する、WaveBoundと呼ばれる正則化手法を提案する。予測不可能なデータにモデルが過度に集中しないようにすることで、WaveBoundは訓練過程を安定させ、汎化を大幅に改善する。大規模な実験により、WaveBoundが最先端モデルを含む既存モデルを一貫して大きく改善することを示す。

Time series forecasting has become a critical task due to its high practicality in real-world applications such as traffic, energy consumption, economics and finance, and disease analysis. Recent deep-learning-based approaches have shown remarkable success in time series forecasting. Nonetheless, due to the dynamics of time series data, deep networks still suffer from unstable training and overfitting. Inconsistent patterns appearing in real-world data bias the model toward a particular pattern, thus limiting generalization. In this work, we introduce dynamic error bounds on the training loss to address the overfitting issue in time series forecasting. To this end, we propose a regularization method called WaveBound, which estimates adequate error bounds of the training loss for each time step and feature at each iteration. By allowing the model to focus less on unpredictable data, WaveBound stabilizes the training process, thus significantly improving generalization. Through extensive experiments, we show that WaveBound consistently improves upon existing models, including the state-of-the-art model, by large margins.
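The abstract does not spell out WaveBound's objective. The sketch below is one plausible reading, assuming the per-element bound comes from the loss of an exponential-moving-average (EMA) copy of the model minus a small margin `eps`, applied flooding-style to each (time step, feature) entry; `wavebound_style_loss`, `ema_update`, and `eps` are illustrative names. Plain Python lists stand in for tensors; in practice the same expression would be written in an autodiff framework so that the gradient flows through `source_loss`.

```python
from typing import List, Sequence


def ema_update(target: List[float], source: Sequence[float], decay: float = 0.99) -> None:
    """In-place EMA of parameters: target <- decay * target + (1 - decay) * source."""
    for i, s in enumerate(source):
        target[i] = decay * target[i] + (1.0 - decay) * s


def wavebound_style_loss(source_loss: Sequence[Sequence[float]],
                         target_loss: Sequence[Sequence[float]],
                         eps: float = 0.01) -> float:
    """Average of |L - b| + b with a separate bound b per (time step, feature).

    source_loss[t][f]: loss of the model being trained at time step t, feature f.
    target_loss[t][f]: loss of its slowly-updated EMA copy on the same element.
    The bound is b = target_loss - eps. Above the bound the term equals L itself;
    below it, the term equals 2b - L, so gradient descent pushes L back up toward b
    instead of fitting that element ever more tightly.
    """
    total, count = 0.0, 0
    for row, target_row in zip(source_loss, target_loss):
        for loss, target in zip(row, target_row):
            bound = target - eps
            total += abs(loss - bound) + bound
            count += 1
    return total / count
```

In this reading, elements whose loss is still above their bound are trained normally, while only the elements already fit below the EMA model's level are held back.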

翻訳日:2023-05-31 03:13:53 公開日:2023-05-28

# 大規模交通速度推定のためのスパースセンシングの相関付け:ラプラシアン強化低ランクテンソルクリギング手法

Correlating sparse sensing for large-scale traffic speed estimation: A Laplacian-enhanced low-rank tensor kriging approach ( http://arxiv.org/abs/2210.11780v3 )

ライセンス: Link先を確認

Tong Nie, Guoyang Qin, Yunpeng Wang, Jian Sun

(参考訳) 交通速度は道路網の流動性を特徴づける中心的な指標である。リアルタイムナビゲーション、動的経路計画、渋滞管理など、多くの交通アプリケーションがこれに依存している。センシングおよび通信技術の急速な進歩により、交通速度の検出はかつてないほど容易になった。しかし、静的センサの配置がまばらであることや移動センサの普及率が低いことから、検出される速度は不完全であり、ネットワーク全体での利用には程遠い。さらに、センサはさまざまな理由で誤差やデータ欠損を生じやすく、これらのセンサから得られる速度には大きなノイズが含まれうる。こうした欠点から、不完全なデータから信頼できる推定値を復元する効果的な手法が求められている。本研究では、まずこの問題を時空間クリギング問題として捉え、限られた観測のもとでの大規模交通速度クリギングのために、低ランク性と多次元相関の両方を備えたラプラシアン強化低ランクテンソル補完(LETC)フレームワークを提案する。具体的には、時間的連続性、時間的周期性、空間的近接性という3種類の速度相関を慎重に選び、それぞれ時間グラフフーリエ変換、一般化時間整合性正則化、拡散グラフ正則化と名付けた3つの異なる形式のグラフラプラシアンによって同時にモデル化する。次に、提案モデルをネットワーク全体のクリギングへスケールアップするため、いくつかの有効な数値手法を用いた効率的な解法アルゴリズムを設計する。2つの公開された100万レベルの交通速度データセットで実験を行った結果、提案するLETCは低い観測率でも最先端のクリギング性能を達成し、同時にベースライン手法と比べて計算時間を半分以上削減できることがわかった。ネットワークレベルでの時空間交通データのモデリングとクリギングに関する知見も提供する。

Traffic speed is central to characterizing the fluidity of the road network. Many transportation applications rely on it, such as real-time navigation, dynamic route planning, and congestion management. Rapid advances in sensing and communication techniques make traffic speed detection easier than ever. However, due to sparse deployment of static sensors or low penetration of mobile sensors, the detected speeds are incomplete and far from network-wide use. In addition, sensors are prone to errors or missing data for various reasons, so the speeds they report can become highly noisy. These drawbacks call for effective techniques to recover credible estimates from the incomplete data. In this work, we first identify the issue as a spatiotemporal kriging problem and propose a Laplacian-enhanced low-rank tensor completion (LETC) framework featuring both low-rankness and multi-dimensional correlations for large-scale traffic speed kriging under limited observations. To be specific, three types of speed correlation, namely temporal continuity, temporal periodicity, and spatial proximity, are carefully chosen and simultaneously modeled by three different forms of graph Laplacian, named temporal graph Fourier transform, generalized temporal consistency regularization, and diffusion graph regularization. We then design an efficient solution algorithm via several effective numeric techniques to scale up the proposed model to network-wide kriging. Experiments on two public million-level traffic speed datasets show that our proposed LETC achieves state-of-the-art kriging performance even under low observation rates, while saving more than half of the computing time compared with baseline methods. Some insights into spatiotemporal traffic data modeling and kriging at the network level are provided as well.
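Of LETC's three correlations, temporal continuity is the simplest to illustrate with a graph Laplacian. The sketch below assumes a chain graph that links each time step to the next and shows only the generic penalty x^T L x; it does not implement the paper's temporal graph Fourier transform, generalized temporal consistency, or diffusion graph regularizers. `path_laplacian` and `laplacian_penalty` are hypothetical names.

```python
from typing import List, Sequence


def path_laplacian(n: int) -> List[List[float]]:
    """Graph Laplacian L = D - W of a chain linking each time step to the next."""
    lap = [[0.0] * n for _ in range(n)]
    for t in range(n - 1):
        lap[t][t] += 1.0
        lap[t + 1][t + 1] += 1.0
        lap[t][t + 1] -= 1.0
        lap[t + 1][t] -= 1.0
    return lap


def laplacian_penalty(series: Sequence[Sequence[float]], lap: Sequence[Sequence[float]]) -> float:
    """Sum over sensors of x^T L x, where series[s] is sensor s's speed over time.

    For the chain Laplacian this equals the sum of squared changes between
    consecutive time steps, so minimizing it favors temporally smooth estimates.
    """
    total = 0.0
    for x in series:
        lx = [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in lap]
        total += sum(x_i * lx_i for x_i, lx_i in zip(x, lx))
    return total
```

For a single sensor reading 50, 52, 51 the penalty is 2^2 + 1^2 = 5, while a constant series scores 0; adding such a term to a completion objective pulls imputed entries toward their temporal neighbors.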

翻訳日:2023-05-31 03:12:30 公開日:2023-05-28

# SketchySGD:ランダム化曲率推定による信頼性の高い確率的最適化

SketchySGD: Reliable Stochastic Optimization via Randomized Curvature Estimates ( http://arxiv.org/abs/2211.08597v4 )

ライセンス: Link先を確認

Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell

(参考訳) SketchySGDは、サブサンプリングしたヘッセ行列に対するランダム化低ランク近似を用い、さらに幅広い凸機械学習問題でうまく機能する自動ステップサイズを導入することで、機械学習における既存の確率的勾配法を改善する。固定ステップサイズのSketchySGDが最適解の周りの小さな球へ線形収束することを理論的に示す。さらに、悪条件の設定では、最小二乗問題に対してSketchySGDがSGDより速い速度で収束することを示す。この改善を実データでのリッジ回帰実験により実証的に検証する。密データおよび疎データを用いたリッジ回帰とロジスティック回帰の数値実験により、デフォルトのハイパーパラメータを用いたSketchySGDが、最高の性能が出るよう調整された一般的な確率的勾配法と同等以上の結果を達成できることを示す。特にSketchySGDは、格納に840GBを超えるRAMを要するデータ行列を持つ悪条件のロジスティック回帰問題を解くことができるが、競合手法は調整してもまったく進展しない。デフォルトのハイパーパラメータですぐに使え、悪条件の問題で優れた性能を発揮するSketchySGDの能力は、他の確率的勾配法に対する利点である。他の手法の多くは、良い性能を得るために(特に学習率の)慎重なハイパーパラメータ調整を必要とし、悪条件の存在下では性能が劣化する。

SketchySGD improves upon existing stochastic gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian and by introducing an automated stepsize that works well across a wide range of convex machine learning problems. We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum. Further, in the ill-conditioned setting we show SketchySGD converges at a faster rate than SGD for least-squares problems. We validate this improvement empirically with ridge regression experiments on real data. Numerical experiments on both ridge and logistic regression problems with dense and sparse data show that SketchySGD equipped with its default hyperparameters can achieve comparable or better results than popular stochastic gradient methods, even when they have been tuned to yield their best performance. In particular, SketchySGD is able to solve an ill-conditioned logistic regression problem with a data matrix that takes more than $840$ GB of RAM to store, while its competitors, even when tuned, are unable to make any progress. SketchySGD's ability to work out of the box with its default hyperparameters and excel on ill-conditioned problems is an advantage over other stochastic gradient methods, most of which require careful hyperparameter tuning (especially of the learning rate) to obtain good performance and degrade in the presence of ill-conditioning.
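To show what "randomized low-rank approximations to the subsampled Hessian" can look like inside a preconditioned SGD loop, here is a minimal NumPy sketch for ridge regression. It is not the authors' implementation: the Nyström construction follows the standard stabilized recipe, the preconditioner form (retained eigenvalues plus `mu`, with the smallest retained eigenvalue standing in for the unsketched directions) is borrowed from Nyström-preconditioned conjugate gradient, and a fixed learning rate `lr` replaces SketchySGD's automated stepsize. All function names and defaults are illustrative assumptions.

```python
import numpy as np


def nystrom(hvp, dim, rank, rng):
    """Randomized Nystrom approximation H ~= U diag(lam) U^T from `rank` Hessian-vector products."""
    omega, _ = np.linalg.qr(rng.standard_normal((dim, rank)))      # orthonormal test matrix
    Y = np.column_stack([hvp(omega[:, j]) for j in range(rank)])   # Y = H @ omega
    shift = np.sqrt(dim) * np.finfo(float).eps * np.linalg.norm(Y)
    Y_shift = Y + shift * omega                                    # small shift for numerical stability
    C = np.linalg.cholesky(omega.T @ Y_shift)                      # omega^T Y_shift = C C^T
    B = np.linalg.solve(C, Y_shift.T).T                            # B = Y_shift C^{-T}, so H ~= B B^T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s ** 2 - shift, 0.0)                          # undo the shift; sorted descending
    return U, lam


def precond_solve(U, lam, mu, g):
    """Apply P^{-1} g for P = U diag(lam + mu) U^T + (lam[-1] + mu) (I - U U^T)."""
    Ug = U.T @ g
    rho = lam[-1] + mu  # curvature assumed for directions the sketch did not capture
    return U @ (Ug / (lam + mu)) + (g - U @ Ug) / rho


def sketchy_sgd_ridge(X, y, mu=1e-6, rank=5, batch=64, hess_batch=200,
                      lr=0.5, epochs=20, seed=0):
    """Minimize (1/2n)||Xw - y||^2 + (mu/2)||w||^2 with Nystrom-preconditioned minibatch SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Refresh the preconditioner from a new Hessian subsample once per epoch.
        XS = X[rng.choice(n, size=min(hess_batch, n), replace=False)]

        def hvp(v, XS=XS):
            return XS.T @ (XS @ v) / len(XS)  # subsampled data Hessian times v

        U, lam = nystrom(hvp, d, min(rank, d), rng)
        for idx in np.array_split(rng.permutation(n), max(n // batch, 1)):
            g = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx) + mu * w
            w = w - lr * precond_solve(U, lam, mu, g)
    return w
```

When `rank` equals the dimension, the preconditioner captures the whole subsampled Hessian, so a fixed `lr=0.5` should stay stable even when the columns of `X` differ in scale by orders of magnitude; with a smaller `rank`, only the top eigen-directions are corrected.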

翻訳日:2023-05-31 03:06:10 公開日:2023-05-28

# mOKB6: 多言語オープン知識ベース補完ベンチマーク

mOKB6: A Multilingual Open Knowledge Base Completion Benchmark ( http://arxiv.org/abs/2211.06959v2 )

ライセンス: Link先を確認

Shubham Mittal, Keshav Kolluru, Soumen Chakrabarti, Mausam

(参考訳) オープン情報抽出(Open IE)システムによって得られる(主語句、関係句、目的語句)形式の三つ組から構築されるオープン知識ベース(Open KB)の自動補完は、テキストに直接現れない可能性のある新しい事実の発見に有用である。しかし、Open KB補完(Open KBC)の研究は、これまで英語のような資源の豊富な言語に限られてきた。多言語Open IEの最新の進歩を活用して、6言語(英語を含む)のウィキペディアの事実を含む初の多言語Open KBCデータセットmOKB6を構築する。従来のOpen KB構築パイプラインを、多言語の共参照解析を行い、エンティティリンクされた三つ組のみを残すように改良することで、密なOpen KBを作成する。このタスクに対していくつかのモデルを試し、共有埋め込み空間および事実の翻訳を利用して言語を組み合わせることに一貫した利点があることを観察する。また、現在の多言語モデルは、異なる文字体系の言語で見た事実を記憶するのに苦労することも観察した。

Automated completion of open knowledge bases (Open KBs), which are constructed from triples of the form (subject phrase, relation phrase, object phrase) obtained via open information extraction (Open IE) systems, is useful for discovering novel facts that may not be directly present in the text. However, research in Open KB completion (Open KBC) has so far been limited to resource-rich languages like English. Using the latest advances in multilingual Open IE, we construct the first multilingual Open KBC dataset, called mOKB6, containing facts from Wikipedia in six languages (including English). By improving the previous Open KB construction pipeline with multilingual coreference resolution and keeping only entity-linked triples, we create a dense Open KB. We experiment with several models for the task and observe a consistent benefit of combining languages with the help of a shared embedding space as well as translations of facts. We also observe that current multilingual models struggle to remember facts seen in languages of different scripts.

翻訳日:2023-05-31 03:05:20 公開日:2023-05-28

# 拡散モデルに基づく雑音線形逆問題に対する後方サンプリング

Diffusion Model Based Posterior Sampling for Noisy Linear Inverse Problems ( http://arxiv.org/abs/2211.12343v2 )

ライセンス: Link先を確認

Xiangming Meng and Yoshiyuki Kabashima

(参考訳) 加法的ガウス雑音を伴う、広く見られる線形逆問題を考え、雑音を含む線形測定から未知の信号を再構成するための、拡散モデルに基づく事後サンプリング(DMPS)と呼ばれる教師なしサンプリング手法を提案する。具体的には、1つの拡散モデル(DM)を暗黙の事前分布として用いる場合、事後サンプリングを行う上での根本的な困難は、雑音で摂動された尤度スコア、すなわちアニールされた尤度関数の勾配が計算困難であることにある。この問題を回避するため、無情報事前分布の仮定を用いて、その単純かつ効果的な閉形式近似を導入する。雑音付き超解像、ノイズ除去、ぼけ除去、カラー化など、さまざまな雑音付き線形逆問題に対して広範な実験を行った。すべてのタスクにおいて、提案するDMPSは、最先端の競合手法である拡散事後サンプリング(DPS)より3倍高速でありながら、非常に競争力のある、あるいはより優れた性能を示す。結果を再現するコードはhttps://github.com/mengxiangming/dmpsで入手できる。

We consider the ubiquitous linear inverse problems with additive Gaussian noise and propose an unsupervised sampling approach called diffusion model based posterior sampling (DMPS) to reconstruct the unknown signal from noisy linear measurements. Specifically, using one diffusion model (DM) as an implicit prior, the fundamental difficulty in performing posterior sampling is that the noise-perturbed likelihood score, i.e., gradient of an annealed likelihood function, is intractable. To circumvent this problem, we introduce a simple yet effective closed-form approximation of it using an uninformative prior assumption. Extensive experiments are conducted on a variety of noisy linear inverse problems such as noisy super-resolution, denoising, deblurring, and colorization. In all tasks, the proposed DMPS demonstrates highly competitive or even better performance while being 3 times faster than the state-of-the-art competitor, diffusion posterior sampling (DPS). The code to reproduce the results is available at https://github.com/mengxiangming/dmps.
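The closed-form approximation mentioned above can be written out under explicit assumptions. This is a minimal sketch, assuming a DDPM-style variance-preserving forward process with cumulative coefficient `alpha_bar` and a flat prior on x_0; `approx_likelihood_score` is a hypothetical name, and the formula in the docstring is derived from those assumptions rather than copied from the paper.

```python
import numpy as np


def approx_likelihood_score(x_t, y, A, sigma_y, alpha_bar):
    """Closed-form approximation of grad_{x_t} log p(y | x_t).

    Assumed model: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * z and
    y = A x_0 + n with n ~ N(0, sigma_y^2 I). With a flat (uninformative) prior on x_0,
    x_0 | x_t ~ N(x_t / sqrt(alpha_bar), (1 - alpha_bar) / alpha_bar * I), hence
    y | x_t ~ N(A x_t / sqrt(alpha_bar), sigma_y^2 I + (1 - alpha_bar) / alpha_bar * A A^T).
    """
    scale = np.sqrt(alpha_bar)
    cov = sigma_y ** 2 * np.eye(A.shape[0]) + (1.0 - alpha_bar) / alpha_bar * (A @ A.T)
    residual = y - A @ x_t / scale
    # Gradient of -0.5 * residual^T cov^{-1} residual with respect to x_t.
    return A.T @ np.linalg.solve(cov, residual) / scale
```

In a reverse-diffusion sampler, this term would be added (typically with a step-size weight) to the prior score supplied by the diffusion model. Note that the covariance grows as `alpha_bar` shrinks, so the measurement pulls weakly at high noise levels and more strongly near the end of sampling.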

翻訳日:2023-05-31 02:54:55 公開日:2023-05-28

# マルチエージェントリーグトレーニングによる異種エージェント協調学習

Learning Heterogeneous Agent Cooperation via Multiagent League Training ( http://arxiv.org/abs/2211.11616v2 )

ライセンス: Link先を確認

Qingxu Fu, Xiaolin Ai, Jianqiang Yi, Tenghai Qiu, Wanmai Yuan, Zhiqiang Pu

(参考訳) 現実世界の多くのマルチエージェントシステムには、異なる能力と機能を持つ複数の種類のエージェントが含まれている。このような異種マルチエージェントシステムには大きな実用上の利点がある。しかし、マルチエージェント強化学習においては、同種システムと比べて、非定常性の問題やポリシーバージョン反復の問題といった課題も伴う。本研究では、異種マルチエージェント問題に対処するため、異種リーグ訓練(Heterogeneous League Training, HLT)と呼ばれる汎用の強化学習アルゴリズムを提案する。HLTは、訓練中にエージェントが探索したポリシーのプールを追跡し、異種ポリシーのリーグを集めて将来のポリシー最適化を促進する。さらに、協調スキルのレベルが異なるチームメイトと協働する際のエージェント行動の多様性を高めるため、ハイパーネットワークを導入する。異種ベンチマークタスクを用いて、(1)HLTが協調的な異種タスクの成功率を向上させること、(2)HLTがポリシーバージョン反復問題を解決する効果的な手法であること、(3)HLTが異種チームにおける各役割の学習の難しさを評価する実用的な方法を提供することを示す。

Many multiagent systems in the real world include multiple types of agents with different abilities and functionality. Such heterogeneous multiagent systems have significant practical advantages. However, they also come with challenges compared with homogeneous systems for multiagent reinforcement learning, such as the non-stationary problem and the policy version iteration issue. This work proposes a general-purpose reinforcement learning algorithm named Heterogeneous League Training (HLT) to address heterogeneous multiagent problems. HLT keeps track of a pool of policies that agents have explored during training, gathering a league of heterogeneous policies to facilitate future policy optimization. Moreover, a hyper-network is introduced to increase the diversity of agent behaviors when collaborating with teammates having different levels of cooperation skills. We use heterogeneous benchmark tasks to demonstrate that (1) HLT promotes the success rate in cooperative heterogeneous tasks; (2) HLT is an effective approach to solving the policy version iteration problem; (3) HLT provides a practical way to assess the difficulty of learning each role in a heterogeneous team.

翻訳日:2023-05-31 02:54:21 公開日:2023-05-28

# サンプル選択とバランス損失を用いたロングテールかつノイズを含むデータからの学習

Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss ( http://arxiv.org/abs/2211.10906v3 )

ライセンス: Link先を確認

Lefan Zhang, Zhang-Hao Tian, Wujun Zhou, Wei Wang

(参考訳) 深層学習の成功は大規模でよく整備された訓練データに依存するが、実世界の応用におけるデータは一般にロングテールでノイズを含む。ロングテールデータやノイズを含むデータに対処する手法は数多く提案されているが、ロングテールかつノイズを含むデータに取り組む手法はわずかしか開発されていない。この問題を解決するため、サンプル選択とバランス損失を用いて、ロングテールでノイズを含むデータから学習する頑健な手法を提案する。具体的には、サンプル選択によってノイズを含む訓練データをクリーンなラベル付き集合とラベルなし集合に分離し、モデルバイアスに基づくバランス損失を用いて、深層ニューラルネットワークを半教師あり方式で訓練する。ベンチマークでの大規模な実験により、本手法が既存の最先端手法を上回ることを示す。

The success of deep learning depends on large-scale and well-curated training data, while data in real-world applications are commonly long-tailed and noisy. Many methods have been proposed to deal with long-tailed data or noisy data, but only a few methods have been developed to tackle long-tailed noisy data. To solve this, we propose a robust method for learning from long-tailed noisy data with sample selection and balanced loss. Specifically, we separate the noisy training data into a clean labeled set and an unlabeled set with sample selection, and train the deep neural network in a semi-supervised manner with a balanced loss based on model bias. Extensive experiments on benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
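The abstract does not say how the clean labeled set is chosen. A common criterion in noisy-label learning is to keep the samples with the smallest training loss; the sketch below assumes that criterion, applied per class so that tail classes are not emptied, with a hypothetical `keep_ratio`. The balanced loss based on model bias is not modeled here, and `split_by_small_loss` is an illustrative name.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def split_by_small_loss(losses: Sequence[float], labels: Sequence[int],
                        keep_ratio: float = 0.5) -> Tuple[List[int], List[int]]:
    """Split sample indices into a 'clean' labeled set and an unlabeled set.

    Within each (noisy) class, the keep_ratio fraction with the smallest loss keeps
    its label; the rest are treated as unlabeled. Selecting per class, rather than
    globally, stops tail classes, whose losses tend to be higher, from being
    discarded wholesale. At least one sample per class stays labeled.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    labeled, unlabeled = [], []
    for idx in by_class.values():
        idx.sort(key=lambda j: losses[j])
        k = max(1, round(keep_ratio * len(idx)))
        labeled.extend(idx[:k])
        unlabeled.extend(idx[k:])
    return sorted(labeled), sorted(unlabeled)
```

The labeled indices would then feed the supervised term of a semi-supervised objective, and the unlabeled ones a consistency or pseudo-labeling term.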

翻訳日:2023-05-31 02:53:35 公開日:2023-05-28

# ニューラル高次条件ランダム場を用いた共同情報抽出のためのインスタンス間相互作用のモデル化

Modeling Instance Interactions for Joint Information Extraction with Neural High-Order Conditional Random Field ( http://arxiv.org/abs/2212.08929v2 )

ライセンス: Link先を確認

Zixia Jia, Zhaohui Yan, Wenjuan Han, Zilong Zheng, Kewei Tu

(参考訳) 共同情報抽出(IE)に関する先行研究は、通常、表現の強化、型依存性のスコアリング、または大域的デコーディングによって、インスタンス(例えばイベントトリガー、エンティティ、役割、関係)間の相互作用をモデル化する。我々は、従来のモデルが一般に一対のインスタンスの二項型依存性のスコアリングを考慮し、ビーム探索などの局所探索を用いて大域解を近似していることを見出した。インスタンス間の相互作用をより良く統合するため、本研究では、共同IEを高次条件付き確率場として定式化する共同IEフレームワーク(CRFIE)を導入する。具体的には、インスタンスのペアだけでなく三つ組の間の相互作用も直接モデル化するため、二項因子と三項因子を設計する。そして、これらの因子を用いてすべてのインスタンスのラベルを共同で予測する。厳密な高次推論が計算困難であるという問題に対処するため、平均場変分推論法を展開して得られる高次ニューラルデコーダを取り入れ、一貫した学習と推論を実現する。実験の結果、本手法は3つのIEタスクにおいて、ベースラインおよび先行研究と比べて一貫した改善を達成した。

Prior works on joint Information Extraction (IE) typically model instance (e.g., event triggers, entities, roles, relations) interactions by representation enhancement, type dependencies scoring, or global decoding. We find that the previous models generally consider binary type dependency scoring of a pair of instances, and leverage local search such as beam search to approximate global solutions. To better integrate cross-instance interactions, in this work, we introduce a joint IE framework (CRFIE) that formulates joint IE as a high-order Conditional Random Field. Specifically, we design binary factors and ternary factors to directly model interactions between not only a pair of instances but also triplets. Then, these factors are utilized to jointly predict labels of all instances. To address the intractability problem of exact high-order inference, we incorporate a high-order neural decoder that is unfolded from a mean-field variational inference method, which achieves consistent learning and inference. The experimental results show that our approach achieves consistent improvements on three IE tasks compared with our baseline and prior work.

翻訳日:2023-05-31 02:47:18 公開日:2023-05-28

# 離散ウェーブレット変換と敵対的生成ネットワークに基づくカラー文書画像の3段階二値化

Three-stage binarization of color document images based on discrete wavelet transform and generative adversarial networks ( http://arxiv.org/abs/2211.16098v3 )

ライセンス: Link先を確認

Yu-Shian Lin, Rui-Yang Ju, Chih-Chia Chen, Ting-Yu Lin, Jen-Shiun Chiang

(参考訳) 劣化したカラー文書画像において、背景から前景のテキスト情報を効率的にセグメンテーションすることは、活発な研究テーマである。古文書は長期間にわたり不完全な状態で保存されてきたため、染み、黄ばみ、インクのにじみなど、さまざまな種類の劣化が画像二値化の結果に深刻な影響を与えている。本稿では、離散ウェーブレット変換(DWT)と敵対的生成ネットワーク(GAN)を用いて、劣化したカラー文書画像の画像強調と二値化を行う3段階の手法を提案する。ステージ1では、DWTを用いてLLサブバンド画像を保持することで画像強調を実現する。ステージ2では、元の入力画像を4つの単一チャネル画像(赤、緑、青、グレー)に分割し、それぞれで独立した敵対的ネットワークを訓練する。訓練された敵対的ネットワークモデルを用いて、画像から色付きの前景情報を抽出する。ステージ3では、大域的特徴と局所的特徴を組み合わせるために、ステージ2の出力画像と元の入力画像を用いて、文書二値化のための独立した敵対的ネットワークを訓練する。実験の結果、提案手法は文書画像二値化コンテスト(DIBCO)データセットにおいて、多くの古典的手法および最先端(SOTA)手法を上回ることが示された。実装コードはhttps://github.com/abcpp12383/ThreeStageBinarizationで公開している。

The efficient segmentation of foreground text information from the background in degraded color document images is a hot research topic. Due to the imperfect preservation of ancient documents over a long period of time, various types of degradation, including staining, yellowing, and ink seepage, have seriously affected the results of image binarization. In this paper, a three-stage method is proposed for image enhancement and binarization of degraded color document images by using discrete wavelet transform (DWT) and generative adversarial network (GAN). In Stage-1, we use DWT and retain the LL subband images to achieve the image enhancement. In Stage-2, the original input image is split into four (Red, Green, Blue and Gray) single-channel images, each of which is used to train an independent adversarial network. The trained adversarial network models are used to extract the color foreground information from the images. In Stage-3, in order to combine global and local features, the output image from Stage-2 and the original input image are used to train the independent adversarial networks for document binarization. The experimental results demonstrate that our proposed method outperforms many classical and state-of-the-art (SOTA) methods on the Document Image Binarization Contest (DIBCO) dataset. We release our implementation code at https://github.com/abcpp12383/ThreeStageBinarization.
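Stage-1 keeps only the LL subband of the DWT. The abstract does not name the wavelet; assuming a single-level orthonormal Haar transform, the LL band reduces to a scaled 2x2 block sum, as in this pure-Python sketch (`haar_ll` is a hypothetical name). For even-sized inputs, PyWavelets' `pywt.dwt2(image, 'haar')` should return the same band as the first element of its result.

```python
from typing import List, Sequence


def haar_ll(image: Sequence[Sequence[float]]) -> List[List[float]]:
    """One level of the 2-D orthonormal Haar DWT, keeping only the LL (approximation) subband.

    Each output pixel is (a + b + c + d) / 2 over a non-overlapping 2x2 block, i.e. the
    1-D low-pass filter (x0 + x1) / sqrt(2) applied along rows and then along columns.
    Odd trailing rows/columns are dropped for simplicity.
    """
    h, w = len(image) // 2 * 2, len(image[0]) // 2 * 2
    return [[(image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]
```

The LL band halves each spatial dimension and discards the high-frequency detail bands, which is one reason it can suppress fine-grained stains and bleed-through before the GAN stages.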

翻訳日:2023-05-31 02:45:50 公開日:2023-05-28

# 敵対的ロバスト性が精度格差に及ぼす影響の理解

Understanding the Impact of Adversarial Robustness on Accuracy Disparity ( http://arxiv.org/abs/2211.15762v2 )

ライセンス: Link先を確認

Yuzheng Hu, Fan Wu, Hongyang Zhang, Han Zhao

(参考訳) 敵対的ロバスト性は標準的な精度と両立しない可能性があり、さらに異なるクラスに不均等な影響を与えうることは長らく経験的に観察されてきたが、そうした観察がどの程度成り立つのか、またその中でクラス不均衡がどのような役割を果たすのかは未解決の問題である。本稿では、ガウス混合モデルのもとでの線形分類器を詳しく調べることで、この精度格差の問題を理解することを試みる。敵対的ロバスト性の影響を2つの部分に分解する。1つはロバスト性制約によってすべてのクラスで標準精度を低下させる本質的な効果であり、もう1つはクラス不均衡比によって生じ、標準的な訓練と比べて精度格差を増大させる効果である。さらに、データモデルを安定分布の一般族に一般化することで、こうした効果がガウス混合モデルを超えて成り立つことを示す。より具体的には、敵対的ロバスト性の制約はクラスが均衡した設定で標準精度を一貫して低下させる一方、安定分布の裾の重さのため、クラス不均衡比は精度格差においてガウスの場合とは根本的に異なる役割を果たすことを示す。加えて、合成データと実世界データの両方で実験を行い、理論的知見を裏付ける。実験結果は、これらの含意が実世界データセット上の非線形モデルにも及ぶ可能性を示唆している。コードはGitHubのhttps://github.com/Accuracy-Disparity/AT-on-ADで公開されている。

While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and how the class imbalance plays a role within. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear classifiers under a Gaussian mixture model. We decompose the impact of adversarial robustness into two parts: an inherent effect that will degrade the standard accuracy on all classes due to the robustness constraint, and the other caused by the class imbalance ratio, which will increase the accuracy disparity compared to standard training. Furthermore, we also show that such effects extend beyond the Gaussian mixture model, by generalizing our data model to the general family of stable distributions. More specifically, we demonstrate that while the constraint of adversarial robustness consistently degrades the standard accuracy in the balanced class setting, the class imbalance ratio plays a fundamentally different role in accuracy disparity compared to the Gaussian case, due to the heavy tail of the stable distribution. We additionally perform experiments on both synthetic and real-world datasets to corroborate our theoretical findings. Our empirical results also suggest that the implications may extend to nonlinear models over real-world datasets. Our code is publicly available on GitHub at https://github.com/Accuracy-Disparity/AT-on-AD.

翻訳日:2023-05-31 02:45:26 公開日:2023-05-28

# 宇宙デブリのための量子重力センサ

Quantum Gravitational Sensor for Space Debris ( http://arxiv.org/abs/2211.15695v2 )

ライセンス: Link先を確認

Meng-Zhi Wu, Marko Toro\v{s}, Sougato Bose, Anupam Mazumdar

(参考訳) 物質波干渉計は、等価原理の検証や重力の量子性の検証など、重力実験における基本的な応用を持つ。さらに、物質波干渉計を量子センサとして用い、質量の大きい外部の移動物体によって生じる局所的な重力加速度を測定することで、技術的な応用にも役立てることができる。本稿では、外部の移動物体による重力勾配信号を記述する3次元モデルを構築し、Stern-Gerlach セットアップに基づく物質波干渉計で達成可能な感度を理論的に調べる。応用として、計量と曲率のためのメソスコピック干渉(MIMAC)と重力波検出スキーム[New J. Phys. 22, 083012 (2020)]を取り上げ、周波数空間解析を用いて重力勾配に対する感度を定量化する。地上実験の近くにある物体や衛星の近くにある宇宙デブリを考え、物体の距離、速度、向きの関数として検出可能な最小質量を推定する。

Matter-wave interferometers have fundamental applications for gravity experiments such as testing the equivalence principle and the quantum nature of gravity. In addition, matter-wave interferometers can be used as quantum sensors to measure the local gravitational acceleration caused by external massive moving objects, thus lending itself for technological applications. In this paper, we will establish a three dimensional model to describe the gravity gradient signal from an external moving object, and theoretically investigate the achievable sensitivities using the matter-wave interferometer based on the Stern-Gerlach set-up. As an application we will consider the Mesoscopic Interference for Metric and Curvature (MIMAC) and Gravitational wave detection scheme [New J. Phys. 22, 083012 (2020)] and quantify its sensitivity to gravity gradients using frequency-space analysis. We will consider objects near Earth-based experiments and space debris in proximity of satellites and estimate the minimum detectable mass of the object as a function of their distance, velocity, and orientation.
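The signal in this paper is the gravity gradient produced by a nearby mass. As a static, single-position illustration only (it omits the object's motion, orientation, and the paper's frequency-space analysis), the Newtonian gradient tensor of a point mass can be computed as follows; `gravity_gradient` is a hypothetical helper, not part of the authors' model.

```python
import math
from typing import List, Sequence

G = 6.674e-11  # Newton's gravitational constant, m^3 kg^-1 s^-2


def gravity_gradient(mass: float, r: Sequence[float]) -> List[List[float]]:
    """Gradient tensor d a_i / d x_j (units s^-2) of a point mass's Newtonian field.

    r is the vector from the mass to the sensor, in meters. The acceleration is
    a = -G M r / |r|^3, so d a_i / d x_j = G M (3 r_i r_j - |r|^2 delta_ij) / |r|^5.
    """
    d2 = sum(c * c for c in r)
    d5 = math.sqrt(d2) ** 5
    return [[mass * G * (3.0 * r[i] * r[j] - (d2 if i == j else 0.0)) / d5
             for j in range(3)] for i in range(3)]
```

For example, a 1000 kg object 10 m away along the x-axis gives a radial component of 2GM/r^3, about 1.3e-10 s^-2, and two transverse components of -GM/r^3, so the tensor is traceless, as expected for a field in vacuum. The 1/r^3 falloff is why the minimum detectable mass grows quickly with distance.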

翻訳日:2023-05-31 02:44:59 公開日:2023-05-28

# 初期特徴の摂動:半教師付きノード分類におけるスパース特徴下でのニューラルネットワークの汎化

Perturb Initial Features: Generalization of Neural Networks Under Sparse Features for Semi-supervised Node Classification ( http://arxiv.org/abs/2211.15081v7 )

ライセンス: Link先を確認

Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim

(参考訳) グラフニューラルネットワーク(GNN)は、半教師付き設定で一般的に使用される。これまでの研究は主に、同類性(ホモフィリー)グラフと異類性(ヘテロフィリー)グラフの両方でよく機能する適切なグラフフィルタ(例えば集約手法)を見つけることに重点を置いてきた。これらの手法は有効であるが、初期データに非ゼロ要素がほとんど含まれないというノード特徴のスパース性に依然として悩まされることがある。訓練サンプルがグラフフィルタ(超平面)の全範囲をカバーしないため、最初の射影行列の特定の次元で過学習が生じうる。そこで本研究では、新しいデータ拡張戦略を提案する。具体的には、初期特徴と超平面の両方を反転させることで訓練のための追加の空間を作り出し、学習可能なパラメータをより正確に更新し、推論時の未知の特徴に対するロバスト性を向上させる。我々の知る限り、これは初期特徴によって引き起こされる過学習を軽減する最初の試みである。実世界のデータセットを用いた大規模な実験により、提案手法がノード分類精度を相対的に最大46.5%向上させることが示された。

Graph neural networks (GNNs) are commonly used in semi-supervised settings. Previous research has primarily focused on finding appropriate graph filters (e.g., aggregation methods) that perform well on both homophilic and heterophilic graphs. While these methods are effective, they can still suffer from the sparsity of node features, where the initial data contain few non-zero elements. This can lead to overfitting in certain dimensions of the first projection matrix, as training samples may not cover the entire range of graph filters (hyperplanes). To address this, we propose a novel data augmentation strategy. Specifically, by flipping both the initial features and the hyperplane, we create additional space for training, which leads to more precise updates of the learnable parameters and improved robustness to unseen features during inference. To the best of our knowledge, this is the first attempt to mitigate the overfitting caused by the initial features. Extensive experiments on real-world datasets show that our proposed technique increases node classification accuracy by up to 46.5% (relative).

翻訳日:2023-05-31 02:44:41 公開日:2023-05-28

# 言語モデルを信頼すべきでないとき:パラメトリックおよび非パラメトリック記憶の有効性と限界を探る

When Not to Trust Language Models: Investigating Effectiveness and Limitations of Parametric and Non-Parametric Memories ( http://arxiv.org/abs/2212.10511v2 )

ライセンス: Link先を確認

Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi

(参考訳) 大規模言語モデル(LM)は、多種多様なタスクで印象的な性能を示すにもかかわらず、豊かな世界知識を必要とするタスクには依然として苦戦しており、膨大な世界知識の符号化をパラメータのみに頼ることの限界を示唆している。本稿では、14k問からなる新たなオープンドメインQAデータセットPopQA上で、10個のモデルと4つの拡張手法を用いた大規模な知識探索実験を行い、事実知識の記憶におけるLMの強みと限界を理解することを目指す。LMはあまり知られていない事実知識に苦戦し、スケーリングしてもロングテールの事実知識の記憶はほとんど改善しないことがわかった。次に、検索拡張LMは桁違いに大きなLMを大きく上回る一方、高人気エンティティに関する質問では検索の支援を受けないLMも競争力を保つことを示す。これらの知見に基づき、必要なときにのみ非パラメトリック記憶を検索する、強力かつ効率的な検索拡張LMのためのシンプルで効果的な手法を考案した。実験結果から、この手法はモデルの性能を大幅に向上させつつ推論コストを削減することが示された。

Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments of 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail. We then show that retrieval-augmented LMs largely outperform orders of magnitude larger LMs, while unassisted LMs remain competitive in questions about high-popularity entities. Based on those findings, we devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing the inference costs.

翻訳日:2023-05-31 02:35:57 公開日:2023-05-28
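
A minimal Python sketch of the "retrieve only when necessary" idea from the abstract above, assuming entity popularity is the signal that decides whether to retrieve. The function name, the `threshold` value, and the `generate` and `retrieve` callables are hypothetical stand-ins for an LM call and a retriever, not the authors' code.

```python
from typing import Callable, List


def adaptive_answer(
    question: str,
    entity_popularity: int,
    threshold: int,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Answer from parametric memory for popular entities; retrieve otherwise.

    `generate`, `retrieve`, and the popularity signal are hypothetical stand-ins.
    """
    if entity_popularity >= threshold:
        # Head entities: the unassisted LM is assumed to be competitive.
        return generate(question)
    # Long-tail entities: prepend retrieved passages as non-parametric memory.
    passages = retrieve(question)
    prompt = "\n".join(passages) + "\n\nQuestion: " + question
    return generate(prompt)
```

Choosing the threshold (for example on held-out questions) trades accuracy on long-tail questions against the cost of calling the retriever.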

# Naamapadam: インド系言語のための大規模固有表現アノテーション付きデータ

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages ( http://arxiv.org/abs/2212.10168v2 )

ライセンス: Link先を確認

Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M. Khapra, Pratyush Kumar, Rudra Murthy V, Anoop Kunchukuttan

(参考訳) 本稿では、2つの語族にまたがるインドの主要11言語を対象とした、公開されている中で最大の固有表現認識(NER)データセットであるNaamapadamを紹介する。このデータセットは40万以上の文を含み、11言語中9言語について、3つの標準的なエンティティカテゴリ(Person、Location、Organization)から合計で少なくとも10万のエンティティがアノテーションされている。訓練データセットは、英語文で自動的にタグ付けされたエンティティを対応するインド系言語の翻訳文に射影することで、Samanantar並列コーパスから自動的に作成した。また、9言語について人手でアノテーションしたテストセットも作成した。得られたデータセットの有用性をNaamapadam-testデータセット上で示す。さらに、Naamapadamの訓練セットでファインチューニングした多言語IndicBERTモデルであるIndicNERも公開する。IndicNERは、9つのテスト言語のうち7つでF1スコア80以上を達成している。データセットとモデルは https://ai4bharat.iitm.ac.in/naamapadam でオープンソースライセンスのもと公開されている。

We present Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. The dataset contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization) for 9 out of the 11 languages. The training dataset has been automatically created from the Samanantar parallel corpus by projecting automatically tagged entities from an English sentence to the corresponding Indian language translation. We also create manually annotated test sets for 9 languages. We demonstrate the utility of the obtained dataset on the Naamapadam-test dataset. We also release IndicNER, a multilingual IndicBERT model fine-tuned on the Naamapadam training set. IndicNER achieves an F1 score of more than $80$ for $7$ out of $9$ test languages. The dataset and models are available under open-source licences at https://ai4bharat.iitm.ac.in/naamapadam.

翻訳日:2023-05-31 02:35:38 公開日:2023-05-28
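
The Naamapadam training data is built by projecting entity tags across a translation. Below is a minimal sketch of that projection, assuming a word alignment between each English sentence and its translation is already available. The abstract does not say how alignments are computed, so the alignment and the English tags are both plain inputs here. The function name and the example sentence are hypothetical.

```python
from typing import Dict, List


def project_tags(src_tags: List[str], alignment: Dict[int, int], tgt_len: int) -> List[str]:
    """Project BIO entity tags from a source sentence onto its translation.

    `alignment` maps source token index -> target token index (one-to-one here).
    Unaligned target tokens get "O". BIO prefixes are rebuilt on the target side,
    so adjacent same-type tokens are merged into one entity (a simplification).
    """
    tgt_types = ["O"] * tgt_len
    for s, t in alignment.items():
        if src_tags[s] != "O":
            tgt_types[t] = src_tags[s].split("-", 1)[1]  # drop "B-"/"I-", keep type
    tgt_tags = []
    for i, typ in enumerate(tgt_types):
        if typ == "O":
            tgt_tags.append("O")
        elif i > 0 and tgt_types[i - 1] == typ:
            tgt_tags.append("I-" + typ)
        else:
            tgt_tags.append("B-" + typ)
    return tgt_tags


# English: "Sachin lives in Mumbai"; romanized Hindi: "Sachin Mumbai mein rehta hai".
src = ["B-PER", "O", "O", "B-LOC"]
align = {0: 0, 1: 3, 2: 2, 3: 1}  # hypothetical word alignment
print(project_tags(src, align, tgt_len=5))
# ['B-PER', 'B-LOC', 'O', 'O', 'O']
```

Rebuilding the B-/I- prefixes on the target side keeps the output valid BIO even when the word order changes, as it does in the Hindi example.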

# マルチアスペクト制御可能なテキスト生成のための拡張可能なプラグアンドプレイ法

An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation ( http://arxiv.org/abs/2212.09387v2 )

ライセンス: Link先を確認

Xuancheng Huang, Zijun Liu, Peng Li, Tao Li, Maosong Sun, Yang Liu

(参考訳) 近年、複数の側面(感情、話題、キーワードなど)で生成されたテキストを制御するマルチアスペクト制御可能なテキスト生成が注目されている。プレフィックスチューニングのようなパラメータ効率のよいチューニングに基づく手法は、プラグ・アンド・プレイ方式でマルチアスペクト制御を実現することができるが、複数のプレフィックスの相互干渉は、制約を著しく劣化させ、トレーニング時に見えないアスペクトの組み合わせに拡張性を制限する。本研究は, 干渉の理論的下限を提供し, プレフィックスが挿入される層数に応じて干渉が増加することを実証的に見出した。これらの分析に基づいて,プレフィックスの介入を正規化するためにトレーニング可能なゲートを用いることを提案する。その結果、新しい制約を低コストで拡張できるように、対応するプラグインを単に結合することで、アスペクトのトレーニング時間未認識の組み合わせを制御することができる。さらに,分類的制約と自由形式制約の両方を統一的に処理する方法を提案する。テキスト生成と機械翻訳の実験は、制約精度、テキスト品質、拡張性に基づくベースラインよりも、我々のアプローチの方が優れていることを示す。

Recently, multi-aspect controllable text generation, which controls the generated text in multiple aspects (e.g., sentiment, topic, and keywords), has attracted increasing attention. Although methods based on parameter-efficient tuning such as prefix-tuning can achieve multi-aspect control in a plug-and-play way, the mutual interference of multiple prefixes leads to significant degeneration of constraints and limits their extensibility to aspect combinations unseen during training. In this work, we provide a theoretical lower bound for the interference and empirically find that the interference grows with the number of layers where prefixes are inserted. Based on these analyses, we propose using trainable gates to normalize the intervention of prefixes to restrain the growing interference. As a result, controlling training-time unseen combinations of aspects can be realized by simply concatenating the corresponding plugins, so that new constraints can be added at a lower cost. In addition, we propose a unified way to process both categorical and free-form constraints. Experiments on text generation and machine translation demonstrate the superiority of our approach over baselines in constraint accuracy, text quality, and extensibility.

翻訳日:2023-05-31 02:34:12 公開日:2023-05-28

# 有限結果空間上の相対確率:その公理化、性質および応用に関する体系的検討

Relative Probability on Finite Outcome Spaces: A Systematic Examination of its Axiomatization, Properties, and Applications ( http://arxiv.org/abs/2212.14555v3 )

ライセンス: Link先を確認

Max Sklar

(参考訳) この研究は、確率を絶対測度ではなく相対測度として捉えることを提案する。この概念を実証するために, 有限結果空間に着目し, 相対確率関数の要件を定める3つの基本公理を考案する。次に、これらの関数の例のライブラリとそれらを構成するシステムを提供します。さらに、ベイズ推論の相対版とそのデジタル実装について議論する。最後に、相対確率空間の位相閉包を証明し、限界の下で情報を保存する能力を強調した。

This work proposes a view of probability as a relative measure rather than an absolute one. To demonstrate this concept, we focus on finite outcome spaces and develop three fundamental axioms that establish requirements for relative probability functions. We then provide a library of examples of these functions and a system for composing them. Additionally, we discuss a relative version of Bayesian inference and its digital implementation. Finally, we prove the topological closure of the relative probability space, highlighting its ability to preserve information under limits.

翻訳日:2023-05-31 02:27:16 公開日:2023-05-28
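
One standard way to see why ratios can suffice for inference on a finite outcome space is the odds form of Bayes' rule: the posterior ratio P(a|e)/P(b|e) equals the prior ratio times the likelihood ratio, and P(e) cancels. The sketch below uses that textbook identity with exact fractions. It is not the paper's axiomatization, the function names are hypothetical, and unlike a full relative-probability framework it does not handle zero or infinite ratios.

```python
from fractions import Fraction
from typing import Dict


def update_ratio(prior_ratio: Fraction, lik_a: Fraction, lik_b: Fraction) -> Fraction:
    """Odds-form Bayes: posterior P(a|e)/P(b|e) = prior ratio * likelihood ratio.

    P(e) cancels, so no absolute (normalized) probabilities are needed.
    This sketch assumes lik_b > 0; it does not model infinite ratios.
    """
    return prior_ratio * lik_a / lik_b


def normalize(weights: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """Recover absolute probabilities from relative weights (finite, nonzero total)."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypotheses a and b start at 1:1; the evidence is 3x as likely under a.
posterior_ratio = update_ratio(Fraction(1), Fraction(3, 4), Fraction(1, 4))
print(posterior_ratio)  # 3
print(normalize({"a": posterior_ratio, "b": Fraction(1)}))
```

Normalization is only a final, optional step here; every update before it works on ratios alone.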

# スピン1/2等方性ハイゼンベルククラスターにおける量子スカー

Quantum scars in spin-1/2 isotropic Heisenberg clusters ( http://arxiv.org/abs/2212.12362v2 )

ライセンス: Link先を確認

G. Zhang and Z. Song

(参考訳) 鎖、梯子、正方格子、三角格子を含むスピン1/2等方性ハイゼンベルククラスターにおいて、外場がエネルギー準位の統計と固有状態の塔に及ぼす影響を調べる。一方向に一様な磁場が存在する場合、系のSU(2)対称性により、ほぼ全スペクトルが等しい準位間隔を持つ多数の塔から構成される。有限クラスター上の厳密対角化により、残り2方向のランダムな横磁場が準位統計をポアソン分布からウィグナー・ダイソン分布へと変化させ、平均準位間隔比が異なる値をとることが示される。これは可積分性から非可積分性への転移を示唆する。しかし3種類のクラスターについて、対称性が破れても最大の塔は近似的に保たれ、量子スカーが生じることがわかった。注目すべきことに、非熱化状態にはグリーンバーガー・ホーン・ツァイリンガー(GHZ)状態とW状態が含まれ、これらはダイナミクスにおいて回復(リバイバル)の特徴を保つ一方、ネール状態は速やかに減衰する。さらに、実験的検出のためのいくつかの動的スキームも提案する。本研究の結果は、有限サイズの量子スピンクラスターにおいて熱化の影響を受けない量子情報処理の可能性を明らかにする。

We investigate the influence of external fields on the statistics of energy levels and towers of eigenstates in spin-1/2 isotropic Heisenberg clusters, including chain, ladder, square, and triangular lattices. In the presence of a uniform field in one direction, the SU(2) symmetry of the system allows almost the whole spectrum to consist of a large number of towers with identical level spacing. Exact diagonalization on finite clusters shows that random transverse fields in the other two directions drive the level statistics from Poisson to Wigner-Dyson distributions with different values of the mean level spacing ratio, indicating a transition from integrability to non-integrability. However, for the three types of clusters, we find that the largest tower still holds approximately even when the symmetry is broken, resulting in a quantum scar. Remarkably, the non-thermalized states include the Greenberger-Horne-Zeilinger and W states, which maintain the feature of revival, while a Néel state decays fast in the dynamic processes. In addition, some dynamic schemes for experimental detection are proposed. Our findings reveal the possibility of quantum information processing that is immune to thermalization in finite-size quantum spin clusters.

翻訳日:2023-05-31 02:26:07 公開日:2023-05-28
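
The mean level spacing ratio mentioned in the abstract above is a standard diagnostic that needs no spectral unfolding. A minimal sketch follows. The reference values used as checks are the standard results for uncorrelated (Poisson) levels, 2 ln 2 − 1 ≈ 0.386, and for the Gaussian orthogonal ensemble, about 0.53. The random spectra are generic test inputs, not the Heisenberg clusters from the paper. In tower-structured spectra, levels must first be resolved into a single symmetry sector, because exact degeneracies make the ratio undefined.

```python
import numpy as np


def mean_spacing_ratio(energies) -> float:
    """Mean ratio <r> of consecutive level spacings.

    r_n = min(s_n, s_{n-1}) / max(s_n, s_{n-1}) needs no spectral unfolding.
    Levels should come from one symmetry sector: exact degeneracies give 0/0.
    """
    e = np.sort(np.asarray(energies, dtype=float))
    s = np.diff(e)
    r = np.minimum(s[1:], s[:-1]) / np.maximum(s[1:], s[:-1])
    return float(np.mean(r))


rng = np.random.default_rng(0)
# Uncorrelated (Poisson) levels: exponential spacings, <r> = 2 ln 2 - 1 ~ 0.386.
poisson_levels = np.cumsum(rng.exponential(size=20_000))
# GOE levels: eigenvalues of a random real symmetric matrix, <r> ~ 0.53.
m = rng.normal(size=(1000, 1000))
goe_levels = np.linalg.eigvalsh((m + m.T) / 2)
print(mean_spacing_ratio(poisson_levels), mean_spacing_ratio(goe_levels))
```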

# Parsel: 分解による言語モデルとのアルゴリズム推論

Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions ( http://arxiv.org/abs/2212.10561v3 )

ライセンス: Link先を確認

Eric Zelikman, Qian Huang, Gabriel Poesia, Noah D. Goodman, Nick Haber

(参考訳) 最近の大規模言語モデル(LLM)の推論における成功にもかかわらず、LLMは複雑なプログラムの生成のような階層的な多段階推論タスクに苦戦する。こうしたタスクでは、人間は多くの場合、高レベルなアルゴリズム設計から始めて各部分を徐々に実装する。コードLLMによる複雑なアルゴリズムの自動実装と検証を可能にするフレームワークであるParselを紹介する。Parselでは、アルゴリズムタスクを階層的な自然言語の関数記述に自動的に分解し、テストを用いて可能な関数実装の組み合わせを探索する。プログラム合成やロボット計画など、階層的推論を必要とする領域でParselを使用できることを示す。Parselを使用することで、LLMはAPPSデータセットの競技レベルの問題をより多く解決し、AlphaCodeとCodexから直接サンプリングした従来の結果よりも75%以上高いパス率を、多くの場合より小さなサンプル予算で達成することがわかった。さらに、自動生成されたテストを用いることで、ParselはHumanEvalにおける最先端のpass@1性能を67%から85%に改善できる。また、Parselを用いてLLMが生成したロボット計画は、直接生成した計画と比べて正確と判断される割合が2倍以上であることもわかった。最後に、ParselがLLMの制限にどう対処するかを検討し、Parselが人間のプログラマにとってどのように役立ちうるかについて議論する。コードは https://github.com/ezelikman/parsel で公開している。

Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs. With Parsel, we automatically decompose algorithmic tasks into hierarchical natural language function descriptions and then search over combinations of possible function implementations using tests. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis and robotic planning. We find that, using Parsel, LLMs solve more competition-level problems in the APPS dataset, resulting in pass rates over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. Moreover, with automatically generated tests, we find that Parsel can improve the state-of-the-art pass@1 performance on HumanEval from 67% to 85%. We also find that LLM-generated robotic plans using Parsel are more than twice as likely to be considered accurate as directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers. We release our code at https://github.com/ezelikman/parsel

翻訳日:2023-05-31 02:25:06 公開日:2023-05-28
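
The "search over combinations of possible function implementations using tests" step from the Parsel abstract can be pictured as a search over the Cartesian product of candidate implementations. The sketch below is a minimal illustration under that assumption. It enumerates combinations exhaustively, which grows exponentially with the number of functions, so it does not reproduce Parsel's actual search strategy. The function names and the toy decomposition are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Optional

Impl = Callable[..., object]
Env = Dict[str, Impl]


def search_implementations(
    candidates: Dict[str, List[Impl]],
    tests: List[Callable[[Env], bool]],
) -> Optional[Env]:
    """Return the first combination of per-function candidates that passes all tests."""
    names = list(candidates)
    for combo in itertools.product(*(candidates[name] for name in names)):
        env = dict(zip(names, combo))
        try:
            if all(test(env) for test in tests):
                return env
        except Exception:
            # A candidate that crashes just fails this combination.
            continue
    return None


# Hypothetical decomposition of "sum of squares of even numbers" into two
# sub-functions, each with one buggy and one correct candidate.
candidates = {
    "is_even": [lambda x: x % 2 == 1, lambda x: x % 2 == 0],
    "square": [lambda x: x * 2, lambda x: x * x],
}


def solve(env: Env, xs: List[int]) -> int:
    return sum(env["square"](x) for x in xs if env["is_even"](x))


tests = [lambda env: solve(env, [1, 2, 3, 4]) == 20]
found = search_implementations(candidates, tests)
print(found["is_even"](2), found["square"](3))  # True 9
```

Only the combination of the correct `is_even` and the correct `square` gives 2² + 4² = 20, so the tests single it out even though each sub-function was sampled independently.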

# 数保存型散逸量子状態生成の反応拡散ダイナミクス

Reaction-diffusive dynamics of number-conserving dissipative quantum state preparation ( http://arxiv.org/abs/2301.05258v3 )

ライセンス: Link先を確認

P. A. Nosov, D. S. Shapiro, M. Goldstein, I. S. Burmistrov

(参考訳) 非自明な量子多体相関状態を制御して生成するための散逸の利用は、基礎的にも実用的にも大きな関心事である。閉じた系では拡散的な広がりをもたらす粒子数保存を課すと、どのような結果になるのだろうか? 本研究では、一方のバンドを空にし他方のバンドを占有させることを目指す散逸ダイナミクスを持つ2バンド系のパラダイム的モデル(以前にトポロジカル状態の散逸的安定化のために導入されたもの)について、この問いを調べる。散逸ダイナミクスの平均場的な扱いを超えて、中間的な長さ・時間スケールにおいて粒子密度モードとホール密度モードの拡散領域が現れること、そして興味深いことに、それが外場に対する非線形応答でのみ励起されうることを示す。また、最長の長さ・時間スケールにおいてこのモードの拡散的振る舞いを制限する過程も特定する。驚くべきことに、これらの過程はフィッシャー-コルモゴロフ-ペトロフスキー-ピスクノフ方程式によって支配される反応拡散ダイナミクスをもたらし、設計された暗状態が有限の粒子密度とホール密度を持つ状態に向かって不安定になることがわかった。

The use of dissipation for the controlled creation of nontrivial quantum many-body correlated states is of great fundamental and practical interest. What is the result of imposing number conservation, which, in a closed system, gives rise to diffusive spreading? We investigate this question for a paradigmatic model of a two-band system, with dissipative dynamics aiming to empty one band and populate the other, which was previously introduced for the dissipative stabilization of topological states. Going beyond the mean-field treatment of the dissipative dynamics, we demonstrate the emergence of a diffusive regime for the particle and hole density modes at intermediate length- and time-scales, which, interestingly, can only be excited in nonlinear response to external fields. We also identify processes that limit the diffusive behavior of this mode at the longest length- and time-scales. Strikingly, we find that these processes lead to reaction-diffusion dynamics governed by the Fisher-Kolmogorov-Petrovsky-Piskunov equation, making the designed dark state unstable towards a state with finite particle and hole densities.

翻訳日:2023-05-31 02:15:30 公開日:2023-05-28
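
The Fisher-Kolmogorov-Petrovsky-Piskunov (FKPP) equation named above has the standard one-dimensional form ∂t n = D ∂x² n + r n(1 − n). For this equation, fronts invade the unstable state n = 0 at the asymptotic speed 2√(Dr), which mirrors how the designed dark state is destabilized. The sketch below integrates this generic textbook form with explicit finite differences. D, r, and the logistic reaction term are generic assumptions, not the coefficients or fields derived in the paper.

```python
import numpy as np


def fisher_kpp_1d(n0, D: float, r: float, dx: float, dt: float, steps: int) -> np.ndarray:
    """Explicit finite differences for dn/dt = D d2n/dx2 + r n (1 - n).

    Zero-gradient (Neumann) boundaries. Forward Euler is stable only for
    dt <= dx**2 / (2 D), which is checked up front.
    """
    if dt > dx**2 / (2 * D):
        raise ValueError("dt too large for the explicit scheme")
    n = np.array(n0, dtype=float)
    for _ in range(steps):
        padded = np.pad(n, 1, mode="edge")  # copies edge values -> zero gradient
        laplacian = (padded[2:] - 2 * n + padded[:-2]) / dx**2
        n = n + dt * (D * laplacian + r * n * (1 - n))
    return n


# Invade n = 0 from the left; the front speed should approach 2*sqrt(D r) = 2.
dx, dt = 0.5, 0.02
x = np.arange(0.0, 200.0 + dx, dx)
n0 = (x < 10).astype(float)
n20 = fisher_kpp_1d(n0, D=1.0, r=1.0, dx=dx, dt=dt, steps=1000)   # t = 20
n40 = fisher_kpp_1d(n20, D=1.0, r=1.0, dx=dx, dt=dt, steps=1000)  # t = 40


def front(n: np.ndarray) -> float:
    """Position of the first grid point where n drops below 1/2."""
    return float(x[np.argmax(n < 0.5)])


speed = (front(n40) - front(n20)) / 20.0
print(f"measured front speed ~ {speed:.2f} (asymptotic value 2)")
```

The measured speed comes out slightly below 2, as expected. Pulled FKPP fronts approach their asymptotic speed slowly from below, and forward Euler slightly underestimates the growth rate.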

# 命令生成モデルのためのタスク指向認知能力の定義、評価、改善

Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models ( http://arxiv.org/abs/2301.05149v2 )

ライセンス: Link先を確認

Lingjun Zhao and Khanh Nguyen and Hal Daum\'e III

(参考訳) 最近の研究では、人間向けに設計された心理テストを通じて言語モデルの認知能力を調べている。これらの研究はモデルの一般的な能力を理解するのに役立つが、テストに合格するのに十分な能力を持つモデルが、実際のタスクを実行する際にその能力を実際に使うという保証はない。本研究では、言語モデルがタスクを実行するために活用する人間のような認知能力である、タスク指向の認知能力を定式化する。これらの能力は、(i)良い候補発話を素早く生成する能力(探索能力)と、(ii)聞き手がそれらの発話をどのように解釈するかを予測し、最も適切なものを選ぶ能力(語用論的能力)である。言語モデルのこれらの能力を人間のものと比較するための評価スキームを設計する。このスキームをナビゲーション指示生成問題における様々なモデルの検証に適用したところ、それらの語用論的能力が著しく不足していることがわかった。この知見に基づき、より良い聞き手モデルでモデルを拡張したところ、実際の人間を誘導する成功率が11%と大きく向上した。我々の研究は、(i)タスク指向の能力を定式化し、(ii)その不足を定量化する方法を考案し、(iii)それらを反復的に改善する、という言語モデルを人間と整合させるための原則的な手順を持つことを提唱する。

Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability) and (ii) the ability to predict how a listener interprets those utterances and choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener and obtain a significant boost of 11% in success rate in guiding real humans. Our work advocates for having a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.

翻訳日:2023-05-31 02:15:11 公開日:2023-05-28
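
The two capabilities in the abstract above map naturally onto a generate-then-rerank loop. A candidate list stands in for the search capability, and a listener model that scores how each utterance will be interpreted stands in for the pragmatic capability. The sketch below assumes exactly that structure. The listener, the route labels, and the function names are hypothetical and are not the paper's models.

```python
from typing import Callable, List, Tuple

# A listener model maps an instruction to (interpreted route, probability) pairs.
Listener = Callable[[str], List[Tuple[str, float]]]


def pragmatic_select(candidates: List[str], intended_route: str, listener: Listener) -> str:
    """Return the candidate the listener is most likely to interpret as intended.

    `candidates` plays the role of the search capability's output; the listener
    re-ranking plays the role of the pragmatic capability.
    """
    def score(utterance: str) -> float:
        return dict(listener(utterance)).get(intended_route, 0.0)

    return max(candidates, key=score)


def toy_listener(utterance: str) -> List[Tuple[str, float]]:
    # Hypothetical listener: "left" strongly suggests route_A.
    if "left" in utterance:
        return [("route_A", 0.9), ("route_B", 0.1)]
    return [("route_A", 0.4), ("route_B", 0.6)]


print(pragmatic_select(["go straight", "turn left at the door"], "route_A", toy_listener))
# turn left at the door
```

Under this framing, improving the listener model improves instruction quality without changing the candidate generator.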

# 量子軌道に沿った幾何学的位相

Geometric phases along quantum trajectories ( http://arxiv.org/abs/2301.04222v4 )

ライセンス: Link先を確認

Ludmila Viotti, Ana Laura Gramajo, Paula I. Villar, Fernando C. Lombardo, Rosario Fazio

(参考訳) ハミルトニアンを統治するパラメータの循環的進化を行う監視量子系は、量子軌道に依存する幾何学的位相を蓄積し、それに続く系は進化する。フェーズ値は、ユニタリダイナミクスと、システムと環境の相互作用の両方によって決定されます。したがって、幾何学的位相はランダムな量子ジャンプの発生により確率的特性を得る。本稿では,観測量子系における幾何位相の分布関数について検討し,開量子系における幾何位相を測定するために,いつ,何が異なるかについて議論する。また,監視されたエコープロトコルについて検討し,実験で抽出された干渉パターンの分布が幾何位相と関連している場合について議論する。さらに, 量子ジャンプを伴わない単一軌道に対して, サイクル後に得られた位相の位相遷移を示し, この臨界挙動がエコープロトコルでどのように観測されるかを示す。同じパラメータに対して、密度行列は特異点を示さない。外部環境下での時間変化磁場に浸漬したスピン1/2のパラダイムケースを考慮し,本研究の主な成果を概説する。しかしながら、我々の分析の主な結果は非常に一般的であり、その定性的特徴において、研究されたモデルの選択に依存しない。

A monitored quantum system undergoing a cyclic evolution of the parameters governing its Hamiltonian accumulates a geometric phase that depends on the quantum trajectory followed by the system during its evolution. The phase value will be determined both by the unitary dynamics and by the interaction of the system with the environment. Consequently, the geometric phase will acquire a stochastic character due to the occurrence of random quantum jumps. Here we study the distribution function of geometric phases in monitored quantum systems and discuss whether and when different quantities, proposed to measure geometric phases in open quantum systems, are representative of the distribution. We also consider a monitored echo protocol and discuss in which cases the distribution of the interference pattern extracted in the experiment is linked to the geometric phase. Furthermore, we unveil, for the single trajectory exhibiting no quantum jumps, a topological transition in the phase acquired after a cycle and show how this critical behavior can be observed in an echo protocol. For the same parameters, the density matrix does not show any singularity. We illustrate all our main results by considering a paradigmatic case, a spin-1/2 immersed in a time-varying magnetic field in the presence of an external environment. The major outcomes of our analysis are, however, quite general and do not depend, in their qualitative features, on the choice of the model studied.

翻訳日:2023-05-31 02:14:50 公開日:2023-05-28
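
As a reference point for the spin-1/2 example above, consider the closed, jump-free unitary limit with no environment. A state aligned with a field that precesses once at polar angle θ acquires the textbook geometric phase γ = −π(1 − cos θ), which is minus half the enclosed solid angle. The sketch below computes this as a gauge-invariant discrete product of overlaps (the standard Pancharatnam construction). It is not the paper's stochastic-trajectory analysis, and the function names are hypothetical.

```python
import numpy as np


def aligned_state(theta: float, phi: float) -> np.ndarray:
    """Spin-1/2 state pointing along the unit vector with polar angle theta, azimuth phi."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])


def discrete_berry_phase(theta: float, n_steps: int = 2000) -> float:
    """Geometric phase for one loop of the field at fixed theta (no jumps, no environment).

    gamma = -arg prod_k <psi_k | psi_{k+1}>, closing the loop on psi_0 itself,
    which makes the result independent of the phase chosen for each psi_k.
    """
    phis = np.linspace(0.0, 2 * np.pi, n_steps, endpoint=False)
    states = [aligned_state(theta, p) for p in phis]
    states.append(states[0])
    product = 1.0 + 0.0j
    for a, b in zip(states[:-1], states[1:]):
        product *= np.vdot(a, b)  # np.vdot conjugates its first argument
    return float(-np.angle(product))


theta = np.pi / 3
print(discrete_berry_phase(theta), -np.pi * (1 - np.cos(theta)))  # both ~ -pi/2
```

Because the result is taken modulo 2π, the comparison is cleanest away from θ = π/2, where the phase sits at the branch cut ±π.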

# 最大最適性マージン:文脈線形計画法と逆線形計画法の統一的アプローチ

Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming ( http://arxiv.org/abs/2301.11260v2 )

ライセンス: Link先を確認

Chunlin Sun, Shang Liu, Xiaocheng Li

(参考訳) 本稿では、機械学習の予測タスクの出力を下流の最適化問題の入力(例えば線形計画の目的係数ベクトル)として用いる、予測してから最適化する(predict-then-optimize)問題を研究する。この問題は予測分析や文脈線形計画としても知られている。既存のアプローチの多くは、(i)最適化の困難さ(非凸な目的関数)や統計的な非効率性(準最適な汎化境界)、または(ii)制約がないことや損失の較正といった強い条件を必要とすること、のいずれかに悩まされている。我々は、下流最適化の最適性条件によって機械学習の損失関数を設計する、最大最適性マージン(maximum optimality margin)と呼ばれる新しいアプローチを開発する。最大マージン定式化は、計算効率と学習手順に関する良好な理論的性質の両方を備える。さらに重要なことに、本手法は学習データにおいて目的関数ではなく最適解の観測のみを必要とするため、文脈あり・文脈なしの両設定における逆線形計画問題に対する新しく自然なアプローチとなる。また、提案手法をオフライン・オンライン両方の設定で解析し、数値実験によりその性能を示す。

In this paper, we study the predict-then-optimize problem where the output of a machine learning prediction task is used as the input of some downstream optimization problem, say, the objective coefficient vector of a linear program. The problem is also known as predictive analytics or contextual linear programming. The existing approaches largely suffer from either (i) optimization intractability (a non-convex objective function)/statistical inefficiency (a suboptimal generalization bound) or (ii) requiring strong condition(s) such as no constraint or loss calibration. We develop a new approach to the problem called maximum optimality margin, which designs the machine learning loss function by the optimality condition of the downstream optimization. The max-margin formulation enjoys both computational efficiency and good theoretical properties for the learning procedure. More importantly, our new approach only needs the observations of the optimal solution in the training data rather than the objective function, which makes it a new and natural approach to the inverse linear programming problem under both contextual and context-free settings; we also analyze the proposed method under both offline and online settings, and demonstrate its performance using numerical experiments.

翻訳日:2023-05-31 02:08:14 公開日:2023-05-28
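
One way to turn the downstream optimality condition into a training loss is a hinge penalty on LP reduced costs. The observed optimal solution fixes a basis B, and a predicted cost vector ĉ is consistent with it when every non-basic reduced cost is nonnegative. Requiring a margin makes the condition strict. The sketch below uses the textbook reduced-cost condition for standard-form LPs with a unit margin. We believe it reflects the spirit of the max-margin formulation, but it is not the authors' exact objective, which would also involve the prediction model and regularization. The function name is hypothetical.

```python
import numpy as np


def optimality_margin_loss(c_hat, A, basis, margin: float = 1.0) -> float:
    """Hinge penalty on the reduced costs of an observed optimal basis.

    LP: min c^T x  s.t.  A x = b, x >= 0. A basis B is optimal for cost c when
    every non-basic reduced cost c_N - A_N^T y, with A_B^T y = c_B, is >= 0.
    Asking each reduced cost to exceed `margin` gives a loss that is convex
    (piecewise linear) in the predicted cost vector c_hat.
    """
    c_hat = np.asarray(c_hat, dtype=float)
    basic = list(basis)
    basic_set = set(basic)
    nonbasic = [j for j in range(A.shape[1]) if j not in basic_set]
    y = np.linalg.solve(A[:, basic].T, c_hat[basic])  # simplex multipliers
    reduced = c_hat[nonbasic] - A[:, nonbasic].T @ y
    return float(np.sum(np.maximum(0.0, margin - reduced)))


# Toy LP: min c1 x1 + c2 x2  s.t.  x1 + x2 = 1, x >= 0; observed optimum uses x1 (basis [0]).
A = np.array([[1.0, 1.0]])
print(optimality_margin_loss([0.0, 2.0], A, basis=[0]))  # 0.0: x1 optimal with margin
print(optimality_margin_loss([1.0, 0.0], A, basis=[0]))  # 2.0: prediction favors x2
```

If ĉ is produced by a linear model of the context, the loss stays convex in the model parameters. Only the optimal solution, and not the true cost vector, is needed to evaluate it.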

# DIFFormer:エネルギー制約拡散によるスケーラブル(グラフ)トランス

DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion ( http://arxiv.org/abs/2301.09474v4 )

ライセンス: Link先を確認

Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf, Junchi Yan

(参考訳) 現実世界のデータ生成には、しばしばインスタンス間の複雑な相互依存が伴う。これは標準的な学習パラダイムのIIDデータ仮説に反し、望ましいインスタンス表現を学習するための幾何学的構造を明らかにするうえで課題となる。この目的のために、データセットのインスタンスのバッチを、相互作用を通じて他のインスタンスの情報を漸進的に取り込む進化状態へとエンコードする、エネルギー制約拡散モデルを導入する。拡散過程は、潜在構造上のインスタンス表現の大域的一貫性を特徴づける原理的なエネルギー関数に関する降下条件によって制約される。任意のインスタンス対間の拡散強度について閉形式の最適推定を導く厳密な理論を示し、これによりDIFFormer(拡散ベースのTransformer)と呼ぶ新しいクラスのニューラルエンコーダが得られる。DIFFormerには2つの具体化があり、膨大なインスタンス数に対応する線形計算量の単純版と、複雑な構造を学習するための高度版である。実験では、大規模グラフのノード分類、半教師付き画像/テキスト分類、時空間ダイナミクス予測など、様々なタスクで優れた性能を示す汎用エンコーダバックボーンとして、本モデルの幅広い適用性が示された。

Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures for learning desired instance representations. To this end, we introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information by their interactions. The diffusion process is constrained by descent criteria with respect to a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitively large numbers of instances, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction.

翻訳日:2023-05-31 02:06:09 公開日:2023-05-28
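
How can all-pair propagation be linear in the number of instances? The sketch below shows one way, assuming the simple version uses pairwise weights of the form w_ij = 1 + cos(q_i, k_j). We believe this is close to the paper's linear-complexity instantiation but have not verified the exact form. With such weights, the sums over j can be reordered so the N × N weight matrix is never built. The names `q`, `k`, and `tau` are hypothetical.

```python
import numpy as np


def linear_diffusion_step(z, q, k, tau: float = 0.5):
    """One all-pair propagation step with weights w_ij = 1 + cos(q_i, k_j) >= 0.

    Reordering the sums avoids the N x N weight matrix:
      sum_j w_ij z_j = sum_j z_j + q_i (K^T Z),   sum_j w_ij = N + q_i . sum_j k_j,
    so the cost is linear in the number of instances N.
    """
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=1, keepdims=True)
    n = z.shape[0]
    numerator = z.sum(axis=0, keepdims=True) + qn @ (kn.T @ z)  # (N, d)
    denominator = n + qn @ kn.sum(axis=0)                        # (N,)
    propagated = numerator / denominator[:, None]
    return (1 - tau) * z + tau * propagated


rng = np.random.default_rng(0)
z, q, k = rng.normal(size=(6, 3)), rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
print(linear_diffusion_step(z, q, k).shape)  # (6, 3)
```

The result matches the explicit quadratic computation (1 − τ) z + τ · (W z) / rowsum(W) with W = 1 + Q̂ K̂ᵀ, while only ever forming d × d sized intermediates.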

# 1次元長距離量子球面モデルにおける絡み合いギャップ

Entanglement gap in 1D long-range quantum spherical models ( http://arxiv.org/abs/2301.09143v2 )

ライセンス: Link先を確認

Sascha Wald, Raul Arias, Vincenzo Alba

(参考訳) 本研究では、1次元長距離量子球面モデル(QSM)における絡み合いギャップの有限サイズスケーリングを調べる。熱力学極限が明確に定義される弱い長距離QSMに焦点を当てる。このモデルは連続相転移を示し、常磁性相と強磁性相を分離する。転移の普遍性クラスは長距離指数$\alpha$に依存する。熱力学極限において、絡み合いギャップは常磁性相では有限であり、強磁性相では消失することを示す。強磁性相では、絡み合いギャップは標準的な磁気相関関数によって理解される。絡み合いギャップは$\delta\xi\simeq C_\alpha L^{-(1/2-\alpha/4)}$のように減衰し、定数$C_\alpha$はモデルの低エネルギー特性に依存する。これは、分散関係の低エネルギー部分が長距離物理の影響を受けていることを反映している。最後に、高次元の場合とは対照的に、絡み合いギャップのスケーリングには乗法的な対数補正が現れない。

We investigate the finite-size scaling of the entanglement gap in the one-dimensional long-range quantum spherical model (QSM). We focus on the weak long-range QSM, for which the thermodynamic limit is well-defined. This model exhibits a continuous phase transition, separating a paramagnetic from a ferromagnetic phase. The universality class of the transition depends on the long-range exponent $\alpha$. We show that in the thermodynamic limit the entanglement gap is finite in the paramagnetic phase, and it vanishes in the ferromagnetic phase. In the ferromagnetic phase the entanglement gap is understood in terms of standard magnetic correlation functions. The entanglement gap decays as $\delta\xi\simeq C_\alpha L^{-(1/2-\alpha/4)}$, where the constant $C_\alpha$ depends on the low-energy properties of the model. This reflects the fact that the lower part of the dispersion is affected by the long-range physics. Finally, multiplicative logarithmic corrections are absent in the scaling of the entanglement gap, in contrast with the higher-dimensional case.

翻訳日:2023-05-31 02:05:30 公開日:2023-05-28

# 階層的蒸留による事前学習言語モデルからCifに基づく音声認識への知識伝達

Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation ( http://arxiv.org/abs/2301.13003v2 )

ライセンス: Link先を確認

Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu

(参考訳) 大規模事前学習言語モデル(PLM)は自然言語処理タスクにおいて大きな可能性を示している。PLMの能力を活用して自動音声認識(ASR)システムを強化することも、有望な研究方向として現れている。しかし、従来の研究は、PLMの柔軟性に欠ける構造やPLMの不十分な活用によって制限されている可能性がある。これらの問題を緩和するため、continuous integrate-and-fire(CIF)ベースのASRモデルに対する階層的知識蒸留(HKD)を提案する。PLMからASRモデルに知識を転移するため、HKDは音響レベルでは対照損失を用いたクロスモーダル知識蒸留を、言語レベルでは回帰損失を用いた知識蒸留を採用する。元のCIFベースのモデルと比較して、本手法はAISHELL-1とLibriSpeechのデータセットでそれぞれ15%と9%の相対誤差率削減を達成した。

Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks. Leveraging the capabilities of PLMs to enhance automatic speech recognition (ASR) systems has also emerged as a promising research direction. However, previous works may be limited by the inflexible structures of PLMs and the insufficient utilization of PLMs. To alleviate these problems, we propose hierarchical knowledge distillation (HKD) for continuous integrate-and-fire (CIF) based ASR models. To transfer knowledge from PLMs to the ASR models, HKD employs cross-modal knowledge distillation with a contrastive loss at the acoustic level and knowledge distillation with a regression loss at the linguistic level. Compared with the original CIF-based model, our method achieves 15% and 9% relative error rate reduction on the AISHELL-1 and LibriSpeech datasets, respectively.

翻訳日:2023-05-31 01:57:46 公開日:2023-05-28

# テーマ駆動型キーフレーズ抽出によるソーシャルメディア談話の分析

Theme-driven Keyphrase Extraction to Analyze Social Media Discourse ( http://arxiv.org/abs/2301.11508v2 )

ライセンス: Link先を確認

William Romano, Omar Sharif, Madhusudan Basak, Joseph Gatto, and Sarah Preum

(参考訳) ソーシャルメディアプラットフォームは、自己報告された健康体験を共有する上で重要なリソースであり、さまざまな健康トピックに関する豊富なデータを提供する。大規模ソーシャルメディアデータ分析を可能にする自然言語処理(nlp)の進歩にもかかわらず、健康関連コンテンツにキーフレーズ抽出を適用することにはギャップがある。キーワード抽出は、定義済みのエンティティクラスに制約されることなく、ソーシャルメディアの会話における健全な概念を特定するために使用される。本稿では,ユーザが生成した健康テキストから臨床に関連のあるキーフレーズを捉えるための先駆的アプローチとして,ソーシャルメディア用にカスタマイズされたテーマ駆動キーフレーズ抽出フレームワークを提案する。テーマは抽出タスクの目的によって決定される広いカテゴリとして定義される。テーマ駆動型キーフレーズ抽出の新たな課題を定式化し,オピオイド使用障害の治療にソーシャルメディアテキストを効率的にマイニングする可能性を示す。本稿では,ソーシャルメディアデータから実行可能な洞察を抽出し,最小教師付きNLPモデルを用いてキーフレーズを効率的に抽出する可能性を示す。我々の貢献は、テーマ駆動型キーフレーズ抽出のための新しいデータ収集とキュレーションフレームワークの開発と、Redditコミュニティから人間注釈付きキーフレーズからなるMOUD-キーフレーズの作成である。また、ソーシャルメディアデータからキーフレーズを効率的に抽出するための最小教師付きNLPモデルのスコープも同定する。最後に,大規模言語モデル(chatgpt)が教師なしキーフレーズ抽出モデルよりも優れており,その効果を評価した。

Social media platforms are vital resources for sharing self-reported health experiences, offering rich data on various health topics. Despite advancements in Natural Language Processing (NLP) enabling large-scale social media data analysis, a gap remains in applying keyphrase extraction to health-related content. Keyphrase extraction is used to identify salient concepts in social media discourse without being constrained by predefined entity classes. This paper introduces a theme-driven keyphrase extraction framework tailored for social media, a pioneering approach designed to capture clinically relevant keyphrases from user-generated health texts. Themes are defined as broad categories determined by the objectives of the extraction task. We formulate this novel task of theme-driven keyphrase extraction and demonstrate its potential for efficiently mining social media text for the use case of treatment for opioid use disorder. This paper leverages qualitative and quantitative analysis to demonstrate the feasibility of extracting actionable insights from social media data and efficiently extracting keyphrases using minimally supervised NLP models. Our contributions include the development of a novel data collection and curation framework for theme-driven keyphrase extraction and the creation of MOUD-Keyphrase, the first dataset of its kind comprising human-annotated keyphrases from a Reddit community. We also identify the scope of minimally supervised NLP models to extract keyphrases from social media data efficiently. Lastly, we found that a large language model (ChatGPT) outperforms unsupervised keyphrase extraction models, and we evaluate its efficacy in this task.

翻訳日:2023-05-31 01:55:16 公開日:2023-05-28

# 経済深層学習モデルを用いたIoTボットネットの検出

IoT Botnet Detection Using an Economic Deep Learning Model ( http://arxiv.org/abs/2302.02013v4 )

ライセンス: Link先を確認

Nelly Elsayed, Zag ElSayed, Magdy Bayoumi

(参考訳) 技術革新の活用と普及はこの10年間で急速に進展している。世界中のIoT(Internet of Things)システムの急速な成長により、悪意のある第三者が引き起こすネットワークセキュリティ上の課題が増大している。したがって、セキュリティ上の懸念やIoTシステムの制約を考慮した信頼性の高い侵入検知・ネットワークフォレンジックシステムが、こうしたシステムを保護するうえで不可欠である。IoTボットネット攻撃は、企業や個人にとって重大な脅威の一つである。そこで本稿では、IoTボットネット攻撃をさまざまな種類の攻撃とともに検知する、経済的な深層学習ベースのモデルを提案する。提案モデルは、より小さな実装予算で訓練と検知のプロセスを高速化しつつ、最先端の検知モデルよりも高い精度を達成した。

Technology innovation, usage, and distribution have progressed rapidly over the last decade. The rapid growth of Internet of Things (IoT) systems worldwide has increased the network security challenges created by malicious third parties. Thus, reliable intrusion detection and network forensics systems that account for security concerns and the limitations of IoT systems are essential to protect such systems. IoT botnet attacks are among the most significant threats to enterprises and individuals. This paper therefore proposes an economic deep learning-based model for detecting IoT botnet attacks along with different types of attacks. The proposed model achieves higher accuracy than state-of-the-art detection models while using a smaller implementation budget and accelerating the training and detection processes.

翻訳日:2023-05-31 01:50:23 公開日:2023-05-28

# 敵対的摂動に対するランダム化アンサンブルのロバスト性について

On the Robustness of Randomized Ensembles to Adversarial Perturbations ( http://arxiv.org/abs/2302.01375v3 )

ライセンス: Link先を確認

Hassan Dbouk, Naresh R. Shanbhag

(参考訳) 推論時に1つの分類器をランダムに選択するランダム化アンサンブル分類器(REC)は、限られた計算量で敵対的にロバストな分類器を実現するための、従来のアンサンブル手法に代わる魅力的な選択肢として登場した。しかし最近の研究により、既存のREC構築手法は当初主張されていたよりも脆弱であることが示され、その有効性に大きな疑問が投げかけられるとともに、「RECはいつ有用か?」「その限界は何か?」「どのように訓練すればよいか?」といった根本的な問いが提起されている。本研究では、まずRECの理論的限界、RECが有用であるための必要十分条件などに関する基礎的な結果を導出し、RECの正体を明らかにする。この新たな理解を活用して、ロバストなRECを訓練するための新しいブースティングアルゴリズム(BARRE)を提案し、様々なネットワークアーキテクチャとデータセットにわたって、強力な$\ell_\infty$ノルム有界の敵に対する防御効果を実証的に示す。コードは https://github.com/hsndbk4/BARRE で公開している。

Randomized ensemble classifiers (RECs), where one classifier is randomly selected during inference, have emerged as an attractive alternative to traditional ensembling methods for realizing adversarially robust classifiers with limited compute requirements. However, recent works have shown that existing methods for constructing RECs are more vulnerable than initially claimed, casting major doubts on their efficacy and prompting fundamental questions such as: "When are RECs useful?", "What are their limits?", and "How do we train them?". In this work, we first demystify RECs as we derive fundamental results regarding their theoretical limits, necessary and sufficient conditions for them to be useful, and more. Leveraging this new understanding, we propose a new boosting algorithm (BARRE) for training robust RECs, and empirically demonstrate its effectiveness at defending against strong $\ell_\infty$ norm-bounded adversaries across various network architectures and datasets. Our code can be found at https://github.com/hsndbk4/BARRE.

Translated: 2023-05-31 01:49:50 Published: 2023-05-28

# Inductive Link Prediction for Both New Nodes and New Relation Types via Double Equivariance ( http://arxiv.org/abs/2302.01313v5 )

License: see link

Jianfei Gao, Yangze Zhou, Jincheng Zhou, Bruno Ribeiro

Despite recent advances in relational learning, inductive link prediction in discrete attributed multigraphs with both new nodes and new relation types at test time remains an open problem. In this work, we tackle this task by defining the concept of double exchangeability and its associated double-permutation equivariant graph neural networks, which are equivariant to permutations of both node identities and edge relations. Our neural architecture imposes a structural representation of relations that can inductively generalize from training nodes and relations to arbitrarily new test nodes and relations, without the need for adaptation or retraining, thus enabling a new direction in relational learning. Finally, we introduce a general blueprint for such double equivariant representations and empirically showcase its capability on two proposed real-world benchmarks on which no existing work performs accurately.
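
Double equivariance means that relabelling nodes and relabelling relation types both commute with the model. A toy Python check, using a hypothetical hand-written layer over an adjacency tensor A[r][u][v] (not the paper's learned architecture; it only illustrates the symmetry). The layer treats every relation type identically and pools statistics over all relations, so applying it and then permuting gives the same result as permuting and then applying it:

```python
import random

def permute(A, rel_perm, node_perm):
    """Relabel relation types and nodes of an adjacency tensor A[r][u][v]."""
    R, N = len(A), len(A[0])
    return [[[A[rel_perm[r]][node_perm[u]][node_perm[v]] for v in range(N)]
             for u in range(N)] for r in range(R)]

def toy_double_equivariant_layer(A):
    """Hypothetical layer: mixes each relation's edges with statistics that
    are pooled over ALL relations, so no relation type is special."""
    R, N = len(A), len(A[0])
    # Edge multiplicity of (u, v) summed over relation types.
    total = [[sum(A[r][u][v] for r in range(R)) for v in range(N)] for u in range(N)]
    # Out-degree of u summed over relation types and targets.
    deg = [sum(total[u]) for u in range(N)]
    return [[[A[r][u][v] * (1 + total[u][v]) + deg[u] for v in range(N)]
             for u in range(N)] for r in range(R)]

rng = random.Random(1)
A = [[[rng.randint(0, 1) for _ in range(4)] for _ in range(4)] for _ in range(3)]
sigma, pi = [2, 0, 1], [3, 1, 0, 2]  # relation and node permutations
lhs = toy_double_equivariant_layer(permute(A, sigma, pi))
rhs = permute(toy_double_equivariant_layer(A), sigma, pi)
print(lhs == rhs)  # True: relabel-then-apply equals apply-then-relabel
```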

Translated: 2023-05-31 01:49:28 Published: 2023-05-28

# Understanding Expertise through Demonstrations: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning ( http://arxiv.org/abs/2302.07457v2 )

License: see link

Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong

Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving. However, the structure of an expert's preferences implicit in observed actions is closely linked to the expert's model of the environment dynamics (i.e., the "world"). Thus, inaccurate models of the world obtained from finite data with limited coverage could compound inaccuracies in the estimated rewards. To address this issue, we propose a bi-level optimization formulation of the estimation task wherein the upper level is likelihood maximization based upon a conservative model of the expert's policy (lower level). The policy model is conservative in that it maximizes reward subject to a penalty that increases with the uncertainty of the estimated world model. We propose a new algorithmic framework to solve the bi-level optimization problem formulation and provide statistical and computational guarantees of performance for the associated reward estimator. Finally, we demonstrate that the proposed algorithm outperforms state-of-the-art offline IRL and imitation learning baselines by a large margin on the continuous control tasks in MuJoCo and on different datasets in the D4RL benchmark.
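
The conservative lower-level policy trades reward against world-model uncertainty. A minimal Python sketch of such a penalty, assuming (as is common in uncertainty-penalized offline RL, though not necessarily the authors' choice) that uncertainty is measured as disagreement across an ensemble of learned dynamics models; the penalty weight `lam` and the candidate actions are hypothetical:

```python
import math

def ensemble_disagreement(predictions):
    """Uncertainty of a learned world model, measured here as the largest
    distance of any ensemble member's next-state prediction from the mean."""
    k, d = len(predictions), len(predictions[0])
    mean = [sum(p[i] for p in predictions) / k for i in range(d)]
    return max(math.dist(p, mean) for p in predictions)

def conservative_reward(reward, predictions, lam):
    """Reward minus a penalty that grows with model uncertainty."""
    return reward - lam * ensemble_disagreement(predictions)

def conservative_action(candidates, lam):
    """Pick the action with the highest penalized reward.
    candidates: {action: (estimated_reward, ensemble_predictions)}"""
    return max(candidates, key=lambda a: conservative_reward(*candidates[a], lam))

# A well-covered action versus a slightly better-looking but poorly covered one.
candidates = {"familiar": (1.0, [[1.0], [1.0]]), "novel": (1.5, [[0.0], [4.0]])}
print(conservative_action(candidates, lam=0.5))  # familiar
print(conservative_action(candidates, lam=0.0))  # novel
```

With no penalty the policy chases the optimistic estimate; with a penalty it prefers actions the data actually supports.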

Translated: 2023-05-31 01:38:23 Published: 2023-05-28

# Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection ( http://arxiv.org/abs/2302.07106v3 )

License: see link

Nishant Kumar, Siniša Šegvić, Abouzar Eslami, Stefan Gumhold

Real-world deployment of reliable object detectors is crucial for applications such as autonomous driving. However, general-purpose object detectors like Faster R-CNN are prone to providing overconfident predictions for outlier objects. Recent outlier-aware object detection approaches estimate the density of instance-wide features with class-conditional Gaussians and train on synthesized outlier features from their low-likelihood regions. However, this strategy does not guarantee that the synthesized outlier features will have a low likelihood according to the other class-conditional Gaussians. We propose a novel outlier-aware object detection framework that distinguishes outliers from inlier objects by learning the joint data distribution of all inlier classes with an invertible normalizing flow. The appropriate sampling of the flow model ensures that the synthesized outliers have a lower likelihood than inliers of all object classes, thereby modeling a better decision boundary between inlier and outlier objects. Our approach significantly outperforms the state-of-the-art for outlier-aware object detection on both image and video datasets. Code available at https://github.com/nish03/FFS
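
To make the sampling argument concrete, here is a toy Python sketch that uses the simplest possible normalizing flow, an element-wise affine map (effectively a diagonal Gaussian fit by moments), as a stand-in for the paper's learned invertible flow over all inlier classes. Candidates are drawn through the flow and kept only if their log-likelihood falls below a low quantile of the inlier log-likelihoods, so every synthesized outlier is, by construction, less likely than most inliers under the joint model. The quantile and data are hypothetical:

```python
import math
import random

class AffineFlow:
    """Simplest normalizing flow: x = mu + sigma * z (element-wise), z ~ N(0, I)."""

    def fit(self, data):
        n, d = len(data), len(data[0])
        self.mu = [sum(x[i] for x in data) / n for i in range(d)]
        self.sigma = [math.sqrt(sum((x[i] - self.mu[i]) ** 2 for x in data) / n) + 1e-6
                      for i in range(d)]
        return self

    def log_prob(self, x):
        # Change of variables: log N(z; 0, I) - sum_i log sigma_i
        lp = 0.0
        for xi, m, s in zip(x, self.mu, self.sigma):
            z = (xi - m) / s
            lp += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(s)
        return lp

    def inverse(self, z):
        return [m + s * zi for zi, m, s in zip(z, self.mu, self.sigma)]

def synthesize_outliers(flow, inliers, n, quantile=0.05, rng=random):
    """Sample through the flow; keep samples whose log-likelihood is below the
    given quantile of inlier log-likelihoods (rejection sampling)."""
    scores = sorted(flow.log_prob(x) for x in inliers)
    threshold = scores[int(quantile * (len(scores) - 1))]
    outliers = []
    while len(outliers) < n:
        x = flow.inverse([rng.gauss(0.0, 1.0) for _ in flow.mu])
        if flow.log_prob(x) < threshold:
            outliers.append(x)
    return outliers

rng = random.Random(0)
inliers = [[rng.gauss(2.0, 0.5), rng.gauss(-1.0, 1.0)] for _ in range(200)]
flow = AffineFlow().fit(inliers)
outs = synthesize_outliers(flow, inliers, 20, rng=rng)
print(len(outs))  # 20
```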

Translated: 2023-05-31 01:38:00 Published: 2023-05-28

# Optimal Transport for Change Detection on LiDAR Point Clouds ( http://arxiv.org/abs/2302.07025v2 )

License: see link

Marco Fiorucci, Peter Naylor, Makoto Yamada

The detection of changes occurring in multi-temporal remote sensing data plays a crucial role in monitoring several aspects of real life, such as disasters, deforestation, and urban planning. In the latter context, identifying both newly built and demolished buildings is essential to help landscape and city managers promote sustainable development. While the use of airborne LiDAR point clouds has become widespread in urban change detection, the most common approaches require transforming a point cloud into a regular grid of interpolated height measurements, i.e., a Digital Elevation Model (DEM). However, the DEM's interpolation step causes a loss of information related to the height of objects, degrading the detection of building changes precisely where the high resolution of LiDAR point clouds in the third dimension would be most beneficial. Notwithstanding recent attempts to detect changes directly on point clouds using either a distance-based computation method or a semantic segmentation pre-processing step, only the M3C2 distance computation-based approach can identify both positive and negative changes, which is of paramount importance in urban planning. Motivated by these arguments, we introduce a principled change detection pipeline, based on optimal transport, capable of distinguishing between newly built buildings (positive changes) and demolished ones (negative changes). In this work, we propose to use unbalanced optimal transport to cope with the creation and destruction of mass related to building changes occurring in a bi-temporal pair of LiDAR point clouds. We demonstrate the efficacy of our approach on the only publicly available airborne LiDAR dataset for change detection by showing superior performance over M3C2 and the previous optimal transport-based method presented by Nicolas Courty et al. at IGARSS 2016.
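
The key property of unbalanced optimal transport is that mass need not be conserved: a point with no nearby counterpart is left (mostly) unmatched instead of being forced onto a distant one. A minimal Python sketch using the generalized Sinkhorn scaling iterations for KL-relaxed marginals (Chizat et al.), where the 1-D coordinates are hypothetical stand-ins for LiDAR points; the paper's actual pipeline, costs, and thresholds are not reproduced:

```python
import math

def unbalanced_sinkhorn(a, b, C, eps=1.0, rho=1.0, iters=500):
    """Entropic unbalanced OT with KL-penalized marginals.
    a, b: source/target masses; C: cost matrix. Returns the transport plan P."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    fi = rho / (rho + eps)  # exponent < 1 lets the marginals deviate from a and b
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(a[i] / sum(K[i][j] * v[j] for j in range(m))) ** fi for i in range(n)]
        v = [(b[j] / sum(K[i][j] * u[i] for i in range(n))) ** fi for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def mass_changes(a, b, P):
    """Unreceived target mass ~ created (new structure);
    unsent source mass ~ destroyed (removed structure)."""
    created = [b[j] - sum(P[i][j] for i in range(len(a))) for j in range(len(b))]
    destroyed = [a[i] - sum(P[i]) for i in range(len(a))]
    return created, destroyed

before = [0.0, 1.0]        # points at time 1
after = [0.0, 1.0, 10.0]   # time 2: an extra point far away ("new building")
C = [[abs(x - y) for y in after] for x in before]
P = unbalanced_sinkhorn([1.0, 1.0], [1.0, 1.0, 1.0], C)
created, destroyed = mass_changes([1.0, 1.0], [1.0, 1.0, 1.0], P)
print([round(c, 3) for c in created])  # the far point keeps almost all of its mass unmatched
```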

Translated: 2023-05-31 01:37:41 Published: 2023-05-28

# Efficient Parametric Approximations of Neural Network Function Space Distance ( http://arxiv.org/abs/2302.03519v2 )

License: see link

Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse

It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.
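
The quantity being approximated is easy to state. A minimal Python sketch, assuming a one-hidden-layer ReLU network stored as plain lists: it computes the exact FSD (which needs a full pass over the data) and shows the identity ReLU(h) = 1[h > 0] * h that lets a ReLU network be viewed as a gated linear network. How LAFTR replaces the input-dependent gates with stochastic ones summarized by one parameter per unit is not reproduced here; the weights and inputs are hypothetical:

```python
def mlp(params, x, gates=None):
    """One-hidden-layer ReLU network f(x) = w2 . relu(W1 x).
    With gates=None the gate of unit k is 1[h_k > 0], which reproduces ReLU
    exactly; passing explicit gates turns it into a gated *linear* network."""
    W1, w2 = params
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    if gates is None:
        gates = [1.0 if hk > 0 else 0.0 for hk in h]  # relu(h) == 1[h > 0] * h
    return sum(wk * gk * hk for wk, gk, hk in zip(w2, gates, h))

def fsd(params_a, params_b, data):
    """Exact function space distance: mean squared output discrepancy.
    Needs a full pass over the data, which compact approximations avoid."""
    return sum((mlp(params_a, x) - mlp(params_b, x)) ** 2 for x in data) / len(data)

net_a = ([[1.0, -1.0], [0.5, 0.5]], [1.0, 2.0])
net_b = ([[1.0, -1.0], [0.5, 0.5]], [1.0, 0.0])
data = [[1.0, 0.0], [0.0, 1.0]]
print(fsd(net_a, net_b, data))  # 1.0
```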

Translated: 2023-05-31 01:35:48 Published: 2023-05-28

# MDF-Net for Abnormality Detection by Fusing X-Rays with Clinical Data ( http://arxiv.org/abs/2302.13390v2 )

License: see link

Chihcheng Hsieh and Isabel Blanco Nobre and Sandra Costa Sousa and Chun Ouyang and Margot Brereton and Jacinto C. Nascimento and Joaquim Jorge and Catarina Moreira

This study investigates the effects of including patients' clinical information on the performance of deep learning (DL) classifiers for disease localization in chest X-ray images. Although current classifiers achieve high performance using chest X-ray images alone, our interviews with radiologists indicate that clinical data is highly informative and essential for interpreting images and making proper diagnoses. In this work, we propose a novel architecture consisting of two fusion methods that enable the model to simultaneously process patients' clinical data (structured data) and chest X-rays (image data). Since these data modalities lie in different dimensional spaces, we propose a spatial arrangement strategy, spatialization, to facilitate the multimodal learning process in a Mask R-CNN model. We performed an extensive experimental evaluation using MIMIC-Eye, a dataset comprising three modalities: MIMIC-CXR (chest X-ray images), MIMIC IV-ED (patients' clinical data), and REFLACX (annotations of disease locations in chest X-rays). Results show that incorporating patients' clinical data in a DL model together with the proposed fusion methods improves disease localization in chest X-rays by 12% in terms of Average Precision compared to a standard Mask R-CNN using only chest X-rays. Further ablation studies also emphasize the importance of multimodal DL architectures and of incorporating patients' clinical data in disease localization. The architecture proposed in this work is publicly available to promote the scientific reproducibility of our study (https://github.com/ChihchengHsieh/multimodal-abnormalities-detection).
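
The core difficulty is that a clinical feature vector has no spatial layout while image features do. One simple way to give tabular data a spatial form is to broadcast each scalar into a constant map and stack it with the image channels. The Python sketch below shows only that idea; the paper's spatialization module may differ (for example, it may be learned), and the clinical values are hypothetical:

```python
def spatialize(clinical, height, width):
    """Broadcast each clinical scalar (e.g. age, temperature) to a constant
    height x width channel so it can be stacked with image feature maps."""
    return [[[float(value)] * width for _ in range(height)] for value in clinical]

def fuse(image_channels, clinical):
    """Channel-wise concatenation of image channels and spatialized clinical data.
    image_channels: list of C channels, each a height x width list of lists."""
    height, width = len(image_channels[0]), len(image_channels[0][0])
    return image_channels + spatialize(clinical, height, width)

image = [[[0.1, 0.2, 0.3, 0.4] for _ in range(3)]]  # 1 channel, 3 x 4 pixels
fused = fuse(image, [67, 37.5, 1])                  # hypothetical clinical values
print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 3 4
```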

Translated: 2023-05-31 01:29:16 Published: 2023-05-28

# Tucker Bilinear Attention Network for Multi-scale Remote Sensing Object Detection ( http://arxiv.org/abs/2303.05329v2 )

License: see link

Tao Chen, Ruirui Li, Jiafeng Fu, and Daguang Jiang

Object detection on very-high-resolution (VHR) remote sensing images plays a vital role in applications such as urban planning, land resource management, and rescue missions. The large-scale variation of remote-sensing targets is one of the main challenges in VHR remote-sensing object detection. Existing methods improve the detection accuracy of high-resolution remote sensing objects by improving the structure of feature pyramids and adopting different attention modules. However, small targets are still frequently missed because key detail features are lost, and there is still room for improvement in how multiscale features are fused and balanced. To address this issue, this paper proposes two novel modules, Guided Attention and Tucker Bilinear Attention, which are applied to the early-fusion and late-fusion stages, respectively. The former effectively retains clean key detail features, and the latter better balances features through semantic-level correlation mining. Based on these two modules, we build a new multi-scale remote sensing object detection framework. Without bells and whistles, the proposed method substantially improves the average precision on small objects and achieves the highest mean average precision compared with 9 state-of-the-art methods on DOTA, DIOR, and NWPU VHR-10. Code and models are available at https://github.com/Shinichict/GTNet.
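
Tucker-decomposed bilinear pooling (used, for example, in MUTAN for visual question answering) models every pairwise interaction between two projected feature vectors through a small core tensor. A hypothetical Python sketch of such a fusion followed by a softmax to produce attention weights; the shapes, projection matrices, and how GTNet wires this into its late-fusion stage are assumptions:

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tucker_bilinear(x, y, Wx, Wy, core):
    """z_k = sum_{i,j} core[i][j][k] * (Wx x)_i * (Wy y)_j:
    every pairwise interaction of the projected inputs, mixed by a small core."""
    q, v = matvec(Wx, x), matvec(Wy, y)
    a, b, c = len(core), len(core[0]), len(core[0][0])
    return [sum(core[i][j][k] * q[i] * v[j] for i in range(a) for j in range(b))
            for k in range(c)]

def softmax(z):
    m = max(z)
    e = [math.exp(zk - m) for zk in z]
    s = sum(e)
    return [ek / s for ek in e]

def tucker_attention(x, y, Wx, Wy, core):
    """Hypothetical attention: softmax over the fused bilinear scores."""
    return softmax(tucker_bilinear(x, y, Wx, Wy, core))

I2 = [[1.0, 0.0], [0.0, 1.0]]                                  # identity projections
core = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]]]   # 2 x 2 x 2 core tensor
print(tucker_bilinear([1.0, 2.0], [3.0, 4.0], I2, I2, core))  # [4.0, 12.0]
print(tucker_attention([1.0, 2.0], [3.0, 4.0], I2, I2, core))
```

The core keeps the number of interaction parameters small compared with a full bilinear tensor over the raw feature dimensions.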

Translated: 2023-05-31 01:19:30 Published: 2023-05-28

# Mastering Strategy Card Game (Hearthstone) with Improved Techniques ( http://arxiv.org/abs/2303.05197v2 )

License: see link

Changnan Xiao, Yongxin Zhang, Xuefeng Huang, Qinhan Huang, Jie Chen, Peng Sun

Strategy card games are a well-known genre that demands intelligent gameplay and can serve as an ideal test bench for AI. Previous work combined an end-to-end policy function with optimistic smooth fictitious play, showing promising performance on the strategy card game Legend of Code and Magic. In this work, we apply such algorithms to Hearthstone, a famous commercial game with more complicated rules and mechanisms. We further propose several improved techniques and consequently achieve significant progress. For a machine-vs-human test, we invited a Hearthstone streamer whose best rank was in the top 10 of the official league in the China region, which is estimated to have millions of players. Our models defeated the human player in all Best-of-5 tournaments of full games (including both deck building and battle), showing a strong decision-making capability.
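
Fictitious play has each player repeatedly best-respond to the opponent's empirical average strategy; smooth fictitious play replaces the best response with a softmax. A toy Python sketch on rock-paper-scissors, which is neither the optimistic variant nor the end-to-end policy network the work builds on (the temperature and iteration count are hypothetical). In this symmetric zero-sum game, both average strategies move toward the uniform mixed equilibrium:

```python
import math

def softmax(payoffs, tau):
    m = max(payoffs)
    e = [math.exp((p - m) / tau) for p in payoffs]
    s = sum(e)
    return [v / s for v in e]

def smooth_fictitious_play(A, iters=5000, tau=1.0):
    """Two-player zero-sum game with row-player payoff matrix A.
    Each round, both players play a softmax (smoothed) best response to the
    opponent's AVERAGE strategy, then fold it into their own average."""
    n, m = len(A), len(A[0])
    p = [1.0] + [0.0] * (n - 1)  # row player's average strategy (starts pure)
    q = [0.0] * (m - 1) + [1.0]  # column player's average strategy (starts pure)
    for t in range(1, iters + 1):
        row_payoffs = [sum(A[i][j] * q[j] for j in range(m)) for i in range(n)]
        col_payoffs = [-sum(p[i] * A[i][j] for i in range(n)) for j in range(m)]
        br_p, br_q = softmax(row_payoffs, tau), softmax(col_payoffs, tau)
        step = 1.0 / (t + 1)
        p = [pi + step * (bi - pi) for pi, bi in zip(p, br_p)]
        q = [qj + step * (bj - qj) for qj, bj in zip(q, br_q)]
    return p, q

RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock, paper, scissors
p, q = smooth_fictitious_play(RPS)
print([round(x, 3) for x in p], [round(x, 3) for x in q])
```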

Translated: 2023-05-31 01:19:07 Published: 2023-05-28

# The Stochastic Toolbox User's Guide -- xSPDE3: extensible software for stochastic ordinary and partial differential equations ( http://arxiv.org/abs/2303.04448v2 )

License: see link

Simon Kiesewetter, Ria R. Joseph, Peter D. Drummond

The xSPDE toolbox treats stochastic partial and ordinary differential equations, with applications in biology, chemistry, engineering, medicine, physics, and quantum technologies. It computes statistical averages, including time-step and/or sampling error estimation. xSPDE can provide higher-order convergence, Fourier spectra, and probability densities. The toolbox has graphical output and $\chi^{2}$ statistics, and it supports weighted, projected, or forward-backward equations. It can generate input-output quantum spectra. All equations may have independent periodic, Dirichlet, Neumann, or Robin boundary conditions in any dimension, for any vector field component, and at either end of any interval.
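
The two error estimates mentioned above can be illustrated independently of the toolbox. A minimal Python sketch (not xSPDE's API) for the Ornstein-Uhlenbeck equation dx = -gamma x dt + sigma dW, integrated with the Euler-Maruyama method: the sampling error is the standard error of the ensemble mean, and the time-step error is estimated by rerunning at twice the step size with the same Brownian increments. The equation and all parameter values are hypothetical choices whose exact mean, x0 exp(-gamma T), is known:

```python
import math
import random
import statistics

def ou_final_values(x0, gamma, sigma, T, steps, samples, rng):
    """Euler-Maruyama for dx = -gamma*x dt + sigma dW at step dt = T/steps,
    plus a coarse run at 2*dt driven by the SAME Brownian increments."""
    assert steps % 2 == 0
    dt = T / steps
    fine, coarse = [], []
    for _sample in range(samples):
        xf = xc = x0
        for _step in range(steps // 2):
            dw1 = rng.gauss(0.0, math.sqrt(dt))
            dw2 = rng.gauss(0.0, math.sqrt(dt))
            xf += -gamma * xf * dt + sigma * dw1
            xf += -gamma * xf * dt + sigma * dw2
            xc += -gamma * xc * 2 * dt + sigma * (dw1 + dw2)
        fine.append(xf)
        coarse.append(xc)
    return fine, coarse

def mean_with_errors(fine, coarse):
    """Ensemble mean, sampling error (standard error of the mean), and a
    time-step error estimate (difference between fine and coarse means)."""
    mean = statistics.mean(fine)
    sampling_error = statistics.stdev(fine) / math.sqrt(len(fine))
    step_error = abs(mean - statistics.mean(coarse))
    return mean, sampling_error, step_error

fine, coarse = ou_final_values(1.0, 1.0, 0.5, 1.0, 50, 2000, random.Random(0))
print(mean_with_errors(fine, coarse), "exact mean:", math.exp(-1.0))
```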

Translated: 2023-05-31 01:18:51 Published: 2023-05-28

# Learning to Backdoor Federated Learning ( http://arxiv.org/abs/2303.03320v3 )

License: see link

Henger Li, Chen Wu, Sencun Zhu, Zizhan Zheng

In a federated learning (FL) system, malicious participants can easily embed backdoors into the aggregated model while maintaining the model's performance on the main task. To counter this, various defenses, including training-stage aggregation-based defenses and post-training mitigation defenses, have been proposed recently. While these defenses obtain reasonable performance against existing backdoor attacks, which are mainly heuristics-based, we show that they are insufficient in the face of more advanced attacks. In particular, we propose a general reinforcement learning-based backdoor attack framework in which the attacker first trains a (non-myopic) attack policy using a simulator built upon its local data and common knowledge of the FL system, and then applies the policy during actual FL training. Our attack framework is both adaptive and flexible, and it achieves strong attack performance and durability even under state-of-the-art defenses.
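
For readers unfamiliar with aggregation-based defenses, a classic example is replacing the mean of client updates with a coordinate-wise median. The Python sketch below shows only the defense side against a naive, heavily scaled malicious update (the update values are hypothetical); the point of the work above is that an adaptive, RL-trained attacker can craft updates that evade such rules:

```python
import statistics

def mean_aggregate(updates):
    """FedAvg-style aggregation with equal weights: coordinate-wise mean."""
    return [statistics.mean(col) for col in zip(*updates)]

def median_aggregate(updates):
    """Robust aggregation: coordinate-wise median of client updates."""
    return [statistics.median(col) for col in zip(*updates)]

updates = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [100.0, -100.0]]  # last one is malicious
print(mean_aggregate(updates))    # pulled far toward the malicious update
print(median_aggregate(updates))  # stays near the honest updates
```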

Translated: 2023-05-31 01:18:20 Published: 2023-05-28

# Molecular Electronic Structure Calculation via a Quantum Computer ( http://arxiv.org/abs/2303.09911v3 )

License: see link

Hamid Reza Naeij, Erfan Mahmoudi, Hossein Davoodi Yeganeh and Mohsen Akbari

Quantum computers can be used to calculate the electronic structure and estimate the ground state energy of many-electron molecular systems. In the present study, we implement the Variational Quantum Eigensolver (VQE) algorithm, a hybrid quantum-classical algorithm, to calculate the ground state energies of molecules such as H3+, OH-, HF, and BH3, which require increasing numbers of qubits. We use the parity transformation for fermion-to-qubit encoding and the Unitary Coupled Cluster for Single and Double excitations (UCCSD) to construct an ansatz. We compare our quantum simulation results with computational chemistry approaches, including Full Configuration Interaction (FCI) as the benchmark energy and Unrestricted Hartree-Fock (UHF) as a common computational method. Our results show good agreement between the molecular ground state energies obtained from VQE and FCI. Moreover, the ground state energies obtained from VQE in our work are more accurate than previously reported values.
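
The variational principle behind VQE can be shown on a one-qubit toy problem: a parameterized trial state is tuned to minimize the energy expectation, while exact diagonalization plays the benchmark role that FCI plays above. A minimal pure-Python sketch, assuming a hypothetical real 2x2 Hamiltonian and the single-parameter ansatz (cos theta, sin theta); on hardware the energy would be estimated from measurements, and no parity mapping or UCCSD is involved here:

```python
import math

def energy(theta, H):
    """<psi(theta)|H|psi(theta)> for |psi> = (cos theta, sin theta), H real symmetric."""
    (a, b), (_, c) = H
    cs, sn = math.cos(theta), math.sin(theta)
    return a * cs * cs + 2 * b * sn * cs + c * sn * sn

def vqe(H, lr=0.1, steps=500, theta=0.1):
    """Classical outer loop: gradient descent on the energy (finite-difference gradient)."""
    for _ in range(steps):
        grad = (energy(theta + 1e-6, H) - energy(theta - 1e-6, H)) / 2e-6
        theta -= lr * grad
    return theta, energy(theta, H)

def exact_ground_energy(H):
    """Smallest eigenvalue of a real symmetric 2x2 matrix (the exact benchmark)."""
    (a, b), (_, c) = H
    return (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b * b)

H = [[1.0, 0.5], [0.5, -1.0]]
theta, e = vqe(H)
print(e, exact_ground_energy(H))  # both about -1.118
```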

Translated: 2023-05-31 01:08:46 Published: 2023-05-28

# Controlled-NOT gate based on the Rydberg states of surface electrons ( http://arxiv.org/abs/2303.08650v3 )

License: see link

Jun Wang, Wan-Ting He, Cong-Wei Lu, Yang-Yang Wang, Qing Ai, Hai-Bo Wang

Owing to its long coherence time and efficient manipulation, the surface electron (SE) provides an ideal two-dimensional platform for quantum computation and quantum simulation. In this work, we propose a theoretical scheme to realize the controlled-NOT (CNOT) gate, in which the two-qubit system is encoded on the four-level Rydberg structure of the SE. The state transfer is achieved by a three-level structure with an intermediate level. By simultaneously driving the SE with two external electromagnetic fields, the dark state in the electromagnetically induced transparency (EIT) effect is exploited to suppress the population of the most dissipative state and increase the robustness against dissipation. The fidelity of the scheme is 0.9989 with experimentally achievable parameters.
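
Encoding two qubits on four levels means the CNOT gate is simply an exchange of the two levels labelled |10> and |11>. A minimal Python sketch of that mapping, plus one common trace-overlap fidelity measure for an imperfect exchange; which physical Rydberg level carries which label is an illustrative assumption, and the paper's fidelity definition may differ:

```python
import math

# Basis order |00>, |01>, |10>, |11>: one level per two-qubit basis state
# (the assignment of physical Rydberg levels to labels is illustrative).
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def apply(U, state):
    return [sum(U[i][j] * state[j] for j in range(len(state))) for i in range(len(U))]

def imperfect_cnot(eps):
    """CNOT whose |10> <-> |11> exchange is incomplete by an angle eps."""
    a = math.pi / 2 - eps
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, math.cos(a), math.sin(a)],
            [0, 0, math.sin(a), -math.cos(a)]]

def trace_fidelity(U, V):
    """|Tr(U^T V)| / d for real matrices: equals 1 only when V = U up to sign."""
    d = len(U)
    return abs(sum(U[i][j] * V[i][j] for i in range(d) for j in range(d))) / d

print(apply(CNOT, [0, 0, 1, 0]))                     # [0, 0, 0, 1]: |10> -> |11>
print(trace_fidelity(CNOT, imperfect_cnot(0.1)))     # slightly below 1
```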

Translated: 2023-05-31 01:08:09 Published: 2023-05-28

# Neural Operator Learning for Ultrasound Tomography Inversion ( http://arxiv.org/abs/2304.03297v2 )

License: see link

Haocheng Dai, Michael Penwarden, Robert M. Kirby, Sarang Joshi

Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field, using a full-wave solver to generate the training data. This novel application of operator learning circumvents the need to solve the computationally intensive iterative inverse problem. The operator learns the non-linear mapping offline and predicts the heterogeneous sound field with a single forward pass through the model. This is the first time operator learning has been used for ultrasound tomography, and it is the first step toward potential real-time predictions of soft tissue distribution for tumor identification in breast imaging.
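
The kind of data the operator consumes can be illustrated with a straight-ray forward model: each time of flight is the line integral of slowness 1/c along a transmitter-receiver path. The paper generates training data with a full-wave solver, which also captures refraction and diffraction; the straight-ray model and the grid below are simplifying assumptions for illustration only:

```python
import math

def time_of_flight(speed, cell, start, end, samples=1000):
    """Straight-ray travel time: integral of 1/c along the segment start -> end,
    approximated with the midpoint rule and nearest-cell lookup of the speed map.
    speed[i][j]: sound speed (m/s) of the cell in row i, column j; cell: size (m)."""
    (x0, y0), (x1, y1) = start, end
    ds = math.hypot(x1 - x0, y1 - y0) / samples
    t = 0.0
    for k in range(samples):
        s = (k + 0.5) / samples
        x, y = x0 + s * (x1 - x0), y0 + s * (y1 - y0)
        i, j = int(y // cell), int(x // cell)
        t += ds / speed[i][j]
    return t

speed = [[1500.0] * 5 + [1600.0] * 5]  # one row of ten 1 mm cells, hypothetical speeds
t = time_of_flight(speed, 0.001, (0.0, 0.0005), (0.01, 0.0005))
print(t)  # close to 0.005/1500 + 0.005/1600 seconds
```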

Translated: 2023-05-31 00:59:31 Published: 2023-05-28

# Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild ( http://arxiv.org/abs/2304.00451v2 )

License: see link

Avinab Saha, Sandeep Mishra, Alan C. Bovik

Automatic Perceptual Image Quality Assessment is a challenging problem that impacts billions of internet and social media users daily. To advance research in this field, we propose a Mixture of Experts approach to train two separate encoders to learn high-level content and low-level image quality features in an unsupervised setting. The key novelty of our approach is its ability to generate low-level representations of image quality that are complementary to high-level features representing image content. We refer to the framework used to train the two encoders as Re-IQA. For Image Quality Assessment in the Wild, we deploy the complementary low- and high-level image representations obtained from the Re-IQA framework to train a linear regression model that maps the image representations to the ground truth quality scores (see Figure 1). Our method achieves state-of-the-art performance on multiple large-scale image quality assessment databases containing both real and synthetic distortions, demonstrating how deep neural networks can be trained in an unsupervised setting to produce perceptually relevant representations. We conclude from our experiments that the low- and high-level features obtained are indeed complementary and positively impact the performance of the linear regressor. All code associated with this work will be publicly released on GitHub.
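
The final stage described above is an ordinary linear regressor on concatenated features. A minimal Python sketch, assuming the two encoders' outputs are already available as plain vectors (the tiny feature values and scores are hypothetical, and the encoders themselves are not modeled); the regressor is fit by full-batch gradient descent on the mean squared error:

```python
def concat_features(content, quality):
    """Join high-level content features with low-level quality features."""
    return [c + q for c, q in zip(content, quality)]

def fit_linear(features, scores, lr=0.05, epochs=2000):
    """Least-squares linear regression by full-batch gradient descent."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, scores):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

content = [[0.0], [0.0], [1.0], [1.0]]   # hypothetical high-level features
quality = [[0.0], [1.0], [0.0], [1.0]]   # hypothetical low-level features
X = concat_features(content, quality)
y = [2 * q[0] - c[0] + 0.5 for c, q in zip(content, quality)]  # synthetic scores
w, b = fit_linear(X, y)
print([round(v, 3) for v in w], round(b, 3))  # close to [-1.0, 2.0] 0.5
```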

Translated: 2023-05-31 00:59:03 Published: 2023-05-28

# Diffusion Schrödinger Bridge Matching ( http://arxiv.org/abs/2303.16852v2 )

License: see link

Yuyang Shi, Valentin De Bortoli, Andrew Campbell, Arnaud Doucet

Solving transport problems, i.e., finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed; for example, Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation (SDE) or an Ordinary Differential Equation (ODE). However, while it is desirable in many applications to approximate the deterministic dynamic Optimal Transport (OT) map, which admits attractive properties, DDMs and FMMs are not guaranteed to provide transports close to the OT map. In contrast, Schrödinger bridges (SBs) compute stochastic dynamic mappings that recover entropy-regularized versions of OT. Unfortunately, existing numerical methods approximating SBs either scale poorly with dimension or accumulate errors across iterations. In this work, we introduce Iterative Markovian Fitting (IMF), a new methodology for solving SB problems, and Diffusion Schrödinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. DSBM significantly improves over previous SB numerics and recovers various recent transport methods as special or limiting cases. We demonstrate the performance of DSBM on a variety of problems.
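
Bridge-matching methods train a drift by regression on samples from a Brownian bridge pinned at a coupled pair (x0, x1). A minimal one-dimensional Python sketch of the sampling step and the regression target, assuming a reference Brownian motion dX = sigma dW; the coupled pairs are hypothetical, and DSBM's outer IMF loop, which refits the coupling between iterations, is not shown:

```python
import math
import random

def sample_bridge(x0, x1, t, sigma, rng=random):
    """Brownian bridge from x0 (time 0) to x1 (time 1) for dX = sigma dW:
    mean (1 - t) x0 + t x1, variance sigma^2 t (1 - t)."""
    mean = (1.0 - t) * x0 + t * x1
    return mean + sigma * math.sqrt(t * (1.0 - t)) * rng.gauss(0.0, 1.0)

def bridge_drift_target(xt, x1, t):
    """Drift of the bridge pinned at x1: the regression target for a drift network."""
    return (x1 - xt) / (1.0 - t)

def bridge_matching_batch(pairs, sigma, rng=random):
    """Build (t, x_t, target) training triples from coupled pairs (x0, x1)."""
    batch = []
    for x0, x1 in pairs:
        t = rng.uniform(0.0, 0.999)  # stay away from t = 1, where the target blows up
        xt = sample_bridge(x0, x1, t, sigma, rng)
        batch.append((t, xt, bridge_drift_target(xt, x1, t)))
    return batch

rng = random.Random(0)
print(bridge_matching_batch([(0.0, 1.0), (1.0, -1.0)], sigma=1.0, rng=rng))
```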

Translated: 2023-05-31 00:57:27 Published: 2023-05-28

# Which Factors Predict the Chat Experience of a Natural Language Generation Dialogue Service? ( http://arxiv.org/abs/2304.10785v2 )

License: see link

Eason Chen

In this paper, we propose a conceptual model to predict the chat experience in a natural language generation dialog system. We evaluated the model with 120 participants using Partial Least Squares Structural Equation Modeling (PLS-SEM) and obtained an R-square (R²) of 0.541. The model considers various factors, including the prompts used for generation; coherence, sentiment, and similarity in the conversation; and users' perceived favorability of the dialog agents. We then further explore the effectiveness of subsets of our proposed model. The results show that users' favorability and the coherence, sentiment, and similarity of the dialogue are positive predictors of users' chat experience. Moreover, we found that users may prefer dialog agents with characteristics of Extroversion, Openness, Conscientiousness, Agreeableness, and Non-Neuroticism. Based on our research, an adaptive dialog system could use collected data to infer the factors in our model, predict users' chat experience through these factors, and optimize it by adjusting prompts.
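
For reference, the R² reported above is the share of variance in the outcome (here, chat experience) explained by its predictors. A minimal Python sketch of the standard formula, R² = 1 - SS_res / SS_tot, on hypothetical scores; in PLS-SEM it is reported for endogenous constructs, and the path-model estimation itself is not shown:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1, 2, 3, 5]))  # about 0.8
```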

Translated: 2023-05-31 00:50:47 Published: 2023-05-28

# Unsupervised Object-Centric Voxelization for Dynamic Scene Understanding ( http://arxiv.org/abs/2305.00393v2 )

License: see link

Siyu Gao, Yanpeng Zhao, Yunbo Wang, Xiaokang Yang

Understanding the compositional dynamics of the world in unsupervised 3D scenarios is challenging. Existing approaches either fail to make effective use of time cues or ignore the multi-view consistency of scene decomposition. In this paper, we propose DynaVol, an inverse neural rendering framework that provides a pilot study for learning time-varying volumetric representations for dynamic scenes with multiple entities (such as objects). It has two main contributions. First, it maintains a time-dependent 3D grid that dynamically and flexibly binds spatial locations to different entities, thus encouraging the separation of information at the representational level. Second, our approach jointly learns grid-level local dynamics, object-level global dynamics, and compositional neural radiance fields in an end-to-end architecture, thereby enhancing the spatiotemporal consistency of object-centric scene voxelization. We present a two-stage training scheme for DynaVol and validate its effectiveness on various benchmarks with multiple objects, diverse dynamics, and real-world shapes and textures. Visualizations are available at https://sites.google.com/view/dynavol-visual.
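
The voxel-to-entity binding can be pictured as a per-voxel distribution over K entity slots. A minimal Python sketch for a single time step, assuming flattened voxels and softmax-normalized slot logits (a hypothetical parameterization; DynaVol's time-dependent grid, dynamics modules, and rendering are not modeled). Each voxel's density is split among entities in proportion to its binding, and an argmax gives a hard segmentation:

```python
import math

def slot_probabilities(logits):
    """logits[v][k]: score that voxel v is bound to entity slot k.
    Softmax over slots gives a soft binding for each voxel."""
    probs = []
    for row in logits:
        m = max(row)
        e = [math.exp(s - m) for s in row]
        total = sum(e)
        probs.append([x / total for x in e])
    return probs

def entity_densities(density, logits):
    """Split each voxel's density among entities in proportion to its binding."""
    return [[d * p for p in ps] for d, ps in zip(density, slot_probabilities(logits))]

def hard_segmentation(logits):
    """Entity index with the highest score per voxel (first one on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

logits = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # 3 voxels, 2 entity slots
density = [1.0, 0.5, 2.0]
print(hard_segmentation(logits))          # [0, 1, 0]
print(entity_densities(density, logits))
```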

Translated: 2023-05-31 00:41:24 / Published: 2023-05-28

# FedVS: 分割モデルのためのストラグラー耐性とプライバシ保護による垂直的フェデレーション学習

FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models ( http://arxiv.org/abs/2304.13407v2 )

License: see link

Songze Li, Duanyi Yao, Jin Liu

(参考訳) 中央サーバと多数の分散クライアントからなる垂直連合学習（VFL）システムでは、学習データが垂直に分割され、異なる特徴量が異なるクライアントにプライベートに格納される。分割VFLの問題は、サーバとクライアントの間で分割されたモデルを学習することである。本稿では、分割VFLにおける2つの主要な課題に取り組む。1) 学習中のストラグラー（処理の遅いクライアント）による性能低下、2) クライアントがアップロードするデータ埋め込みからのデータとモデルのプライバシー漏洩である。これら2つの課題に同時に対処するためにFedVSを提案する。FedVSの鍵となるアイデアは、ローカルデータとモデルのための秘密分散スキームを設計することである。これにより、結託するクライアントと好奇心旺盛なサーバに対する情報理論的プライバシーが保証され、かつストラグラーでないクライアントからの計算シェアを復号することで、全クライアントの埋め込みの集約が損失なく再構成される。様々な種類のVFLデータセット（表形式、CV、マルチビューを含む）に対する大規模な実験により、ベースラインプロトコルに対するストラグラー緩和とプライバシー保護におけるFedVSの普遍的な優位性が示された。

In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients' uploaded data embeddings. We propose FedVS to simultaneously address these two challenges. The key idea of FedVS is to design secret sharing schemes for the local data and models, such that information-theoretic privacy against colluding clients and a curious server is guaranteed, and the aggregation of all clients' embeddings is reconstructed losslessly by decrypting computation shares from the non-straggling clients. Extensive experiments on various types of VFL datasets (including tabular, CV, and multi-view) demonstrate the universal advantages of FedVS in straggler mitigation and privacy protection over baseline protocols.

Translated: 2023-05-31 00:38:52 / Published: 2023-05-28
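
The reconstruction idea in the FedVS abstract can be illustrated with plain Shamir secret sharing over a prime field. The idea is to recover the sum of all clients' embeddings from the shares returned by the non-straggling parties only. This is a minimal sketch, not the FedVS scheme. It assumes scalar integer "embeddings", a toy prime modulus, five hypothetical share-holding parties with a threshold of three, and helper names of our own choosing. Real embeddings would be quantized and shared coordinate by coordinate.

```python
import random

PRIME = 2_147_483_647  # toy prime modulus (2^31 - 1); an assumption for illustration


def make_shares(secret, threshold, n_parties, rng):
    """Split `secret` into n_parties Shamir shares; any `threshold` of them recover it."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = {}
    for x in range(1, n_parties + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the random polynomial at x
            y = (y * x + c) % PRIME
        shares[x] = y
    return shares


def reconstruct(points):
    """Lagrange interpolation at x = 0 over GF(PRIME); `points` maps x -> share."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


rng = random.Random(0)
embeddings = [11, 22, 33]  # one toy scalar "embedding" per client
n_parties, threshold = 5, 3
all_shares = [make_shares(e, threshold, n_parties, rng) for e in embeddings]

# Shares are additive: each party sums the shares it holds from every client.
summed = {x: sum(s[x] for s in all_shares) % PRIME for x in range(1, n_parties + 1)}

# Parties 2 and 4 straggle; the three responses that do arrive suffice.
aggregate = reconstruct({x: summed[x] for x in (1, 3, 5)})
print(aggregate)  # 66
```

Because the polynomials add term by term, the summed shares lie on a polynomial whose constant term is the aggregate. Any `threshold` responses therefore pin down the sum, while fewer than `threshold` shares reveal nothing about any individual value.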

# GPT-NAS:生成事前学習モデルによる進化的ニューラルアーキテクチャ探索

GPT-NAS: Evolutionary Neural Architecture Search with the Generative Pre-Trained Model ( http://arxiv.org/abs/2305.05351v2 )

License: see link

Caiyang Yu, Xianggen Liu, Wentao Feng, Chenwei Tang, Jiancheng Lv

(参考訳) ニューラルアーキテクチャ探索（NAS）は、最適なニューラルネットワークアーキテクチャを自動設計する有効な手法の1つとして登場した。ニューラルアーキテクチャはいくつかのタスクで人間レベルの性能を達成しているが、NAS手法によって得られたものはほとんどない。その主な理由は、ニューラルアーキテクチャの探索空間が膨大であり、NASアルゴリズムが非効率になることである。本研究では、進化的アルゴリズム（EA）を探索戦略とし、生成事前学習（GPT）モデルを用いてニューラルアーキテクチャを最適化する新しいアーキテクチャ探索アルゴリズムGPT-NASを提案する。GPT-NASでは、大規模コーパスで事前学習した生成モデルがニューラルアーキテクチャ構築の基本法則を学習できると仮定する。そこでGPT-NASは、GPTモデルを利用して基本となるアーキテクチャから妥当な構成要素を提案し、EAを用いて最適解を探索する。このアプローチは、探索プロセスに事前知識を導入することで探索空間を大幅に削減できる。大規模な実験の結果、GPT-NASは手作業で設計された7つのニューラルアーキテクチャと、競合するNAS手法による13のアーキテクチャを大きく上回った。さらに、提案アルゴリズムは、GPTを用いない場合と比べて入念に調整されたニューラルアーキテクチャの性能を最大約12%向上させ、ニューラルアーキテクチャ探索における有効性をさらに示した。

Neural Architecture Search (NAS) has emerged as one of the effective methods to automatically design the optimal neural network architecture. Although neural architectures have achieved human-level performance in several tasks, few of them were obtained by NAS methods. The main reason is the huge search space of neural architectures, which makes NAS algorithms inefficient. This work presents a novel architecture search algorithm, called GPT-NAS, that optimizes neural architectures using a Generative Pre-Trained (GPT) model, with an evolutionary algorithm (EA) as the search strategy. In GPT-NAS, we assume that a generative model pre-trained on a large-scale corpus could learn the fundamental laws of building neural architectures. Therefore, GPT-NAS leverages the GPT model to propose reasonable architecture components given a basic one, and then uses EAs to search for the optimal solution. Such an approach can greatly reduce the search space by introducing prior knowledge into the search process. Extensive experimental results show that our GPT-NAS method significantly outperforms seven manually designed neural architectures and thirteen architectures provided by competing NAS methods. In addition, our experiments indicate that the proposed algorithm improves the performance of finely tuned neural architectures by up to about 12% compared to those without GPT, further demonstrating its effectiveness in searching neural architectures.

Translated: 2023-05-31 00:31:56 / Published: 2023-05-28
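
The GPT-NAS search loop pairs a proposal model with evolutionary selection. That structure can be sketched as a small elitist evolutionary algorithm. The sketch rests on several assumptions. `prior_proposal` is a hypothetical uniform-random stub standing in for the pre-trained GPT model, which in GPT-NAS would bias the proposals using learned prior knowledge. `toy_fitness` replaces real training and validation. An architecture is reduced to a tuple of layer widths.

```python
import random

CHOICES = (16, 32, 64, 128)  # toy search space: candidate widths for each layer


def toy_fitness(arch):
    """Stand-in for validation accuracy (illustrative assumption): prefers width 64."""
    return -sum(abs(width - 64) for width in arch)


def prior_proposal(arch, rng):
    """Hypothetical placeholder for the GPT model: propose a new width for one layer.
    Here it samples uniformly; a pre-trained model would bias this choice."""
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] = rng.choice(CHOICES)
    return tuple(child)


def evolve(generations=30, pop_size=8, n_layers=4, seed=0):
    """Elitist EA: keep the fitter half, refill with proposals derived from it."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(CHOICES) for _ in range(n_layers)) for _ in range(pop_size)]
    history = []  # best fitness among the surviving parents, per generation
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [prior_proposal(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        population = parents + children
        history.append(toy_fitness(parents[0]))
    best = max(population, key=toy_fitness)
    return best, history


best, history = evolve()
print(best, history[-1])
```

Elitism guarantees that the best fitness never decreases across generations. Swapping the uniform stub for a learned proposal distribution is where a model such as GPT would inject prior knowledge and shrink the effective search space.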

# FACTIFY-5WQA: 質問応答による5Wアスペクトベースの事実検証

FACTIFY-5WQA: 5W Aspect-based Fact Verification through Question Answering ( http://arxiv.org/abs/2305.04329v2 )

License: see link

Anku Rani, S.M Towhidul Islam Tonmoy, Dwip Dalal, Shreya Gautam, Megha Chakraborty, Aman Chadha, Amit Sheth, Amitava Das

(参考訳) 自動事実検証は近年大きな注目を集めている。現代の自動ファクトチェックシステムは、人間には解釈できない数値スコアを用いて真実性を推定することに注力している。人間のファクトチェッカーは通常、もっともらしい主張を検証し、それが真実か単なる見せかけかを判断するために、いくつかの論理的なステップを踏む。人気のあるファクトチェックWebサイトは、half true、half false、false、pants on fire など、事実分類に共通の構造を用いている。したがって、事実に関連する質問を行う人間のファクトチェッカーを支援し、その質問を個別に検証して最終判断に到達できるような、アスペクトベース（どの部分が真でどの部分が偽かを明示する）で説明可能なシステムが必要である。本稿では、質問応答に基づく事実の説明可能性のための5Wフレームワーク（who、what、when、where、why）を提案する。そのために、391,041件の事実と関連する5WのQAからなる半自動生成データセットFACTIFY-5WQAを提示し、これが本稿の主要な貢献である。意味役割付与システムを用いて5Wを特定し、マスク付き言語モデルを用いて主張に対するQAペアを生成する。次に、証拠文書からこれらの回答を自動的に見つけるベースラインQAシステムを報告し、この分野の今後の研究のベースラインとする。最後に、言い換えられた主張を受け取り自動的に検証する頑健な事実検証システムを提案する。データセットとベースラインモデルは https://github.com/ankuranii/acl-5W-QA で公開されている。

Automatic fact verification has received significant attention recently. Contemporary automatic fact-checking systems focus on estimating truthfulness using numerical scores that are not human-interpretable. A human fact-checker generally follows several logical steps to verify a plausible-sounding claim and conclude whether it is truthful or a mere masquerade. Popular fact-checking websites follow a common structure for fact categorization, such as half true, half false, false, pants on fire, etc. Therefore, it is necessary to have an aspect-based (delineating which part(s) are true and which are false) explainable system that can assist human fact-checkers in asking relevant questions related to a fact, which can then be validated separately to reach a final verdict. In this paper, we propose a 5W framework (who, what, when, where, and why) for question-answer-based fact explainability. To that end, we present a semi-automatically generated dataset called FACTIFY-5WQA, which consists of 391,041 facts along with relevant 5W QAs; this dataset is the major contribution of this paper. A semantic role labeling system is used to locate the 5Ws, and QA pairs for the claims are generated using a masked language model. Next, we report a baseline QA system that automatically locates those answers in evidence documents, which can serve as a baseline for future research in the field. Finally, we propose a robust fact verification system that takes paraphrased claims and automatically validates them. The dataset and the baseline model are available at https://github.com/ankuranii/acl-5W-QA

Translated: 2023-05-31 00:30:55 / Published: 2023-05-28

# バイアスノイズ量子ビットに対するスケーラブルノイズ量子回路

Scalable noisy quantum circuits for biased-noise qubits ( http://arxiv.org/abs/2305.02045v3 )

License: see link

Marco Fellous-Asiani, Moein Naseri, Chandan Datta, Alexander Streltsov, Micha{\l} Oszmaniec

(参考訳) 量子誤り緩和は、量子アルゴリズムに対するノイズの影響を低減できる。しかし、回路サイズに対して指数関数的に増大するリソースを必要とするため、スケーラブルではない。本研究では、安定化キャット量子ビットの既存システムに動機づけられ、ビットフリップ誤りのみの影響を受けるバイアスノイズ量子ビットを考える。この性質により、エンタングリングゲートと特定の非クリフォードゲートを含むノイズのあるアダマールテストのクラスを設計でき、それらはアルゴリズムの繰り返し回数に関して多項式のオーバーヘッドのみで確実に実行できる。一方で、我々のアダマールテストの特定の変種を効率的にシミュレートできる古典アルゴリズムも見出した。このアルゴリズムを、大規模かつ複雑な量子回路のスケールにおけるノイズのバイアスの単純なベンチマークとして用いることを提案する。我々の回路の強いノイズ耐性は、高度に特殊化された、しかしノイズのある回路で量子計算の優位性に到達できるかを確かめるさらなる研究の動機となりうる。

Quantum error mitigation can reduce the impact of noise on quantum algorithms. Yet it is not scalable, as it requires resources that scale exponentially with the circuit size. In this work, we consider biased-noise qubits affected only by bit-flip errors, a setting motivated by existing systems of stabilized cat qubits. This property allows us to design a class of noisy Hadamard tests involving entangling and certain non-Clifford gates, which can be conducted reliably with only a polynomial overhead in algorithm repetitions. On the other hand, we also found a classical algorithm able to efficiently simulate our specific variants of the Hadamard test. We propose to use this algorithm as a simple benchmark of the bias of the noise at the scale of large and complicated quantum circuits. The strong noise resilience of our circuits could motivate further research into whether a quantum computational advantage could be reached for highly specific, yet noisy, circuits.

Translated: 2023-05-31 00:29:46 / Published: 2023-05-28
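
A small exact simulation shows why bit-flip-only noise is comparatively benign for a Hadamard test. It simulates the ideal single-qubit Hadamard test, whose ancilla satisfies P(0) = (1 + Re<psi|U|psi>)/2, and then applies a bit flip with a known probability p to the ancilla just before readout. That flip only rescales the ancilla's <Z> by (1 - 2p), so the signal can be recovered by dividing. This is a simplified illustration under our own assumptions: one target qubit, noise only at readout, and p known. It is not the paper's circuit class, which places entangling and non-Clifford gates inside the noisy circuit.

```python
import math


def hadamard_test_p0(psi, U):
    """Exact probability of reading the ancilla as 0 in an ideal Hadamard test
    on a single-qubit target state psi (2 complex amplitudes) and 2x2 unitary U."""
    s = 1 / math.sqrt(2)
    # After H and controlled-U: ancilla |0> branch keeps psi, ancilla |1> branch holds U psi.
    branch0 = [s * a for a in psi]
    branch1 = [s * sum(U[i][j] * psi[j] for j in range(2)) for i in range(2)]
    # Second H on the ancilla: the |0> component is (branch0 + branch1) / sqrt(2).
    amp0 = [s * (x + y) for x, y in zip(branch0, branch1)]
    return sum(abs(a) ** 2 for a in amp0)


def noisy_z_expectation(psi, U, p_flip):
    """<Z> on the ancilla when it suffers a bit flip with probability p_flip before readout."""
    p0 = hadamard_test_p0(psi, U)
    p0_noisy = (1 - p_flip) * p0 + p_flip * (1 - p0)
    return 2 * p0_noisy - 1


def mitigated_estimate(psi, U, p_flip):
    """Undo the known (1 - 2p) damping; requires p_flip < 0.5."""
    return noisy_z_expectation(psi, U, p_flip) / (1 - 2 * p_flip)


s = 1 / math.sqrt(2)
psi = [s, s]                # |+>
U = [[1, 0], [0, 1j]]       # phase gate diag(1, i): Re<+|U|+> = 0.5
print(noisy_z_expectation(psi, U, 0.1), mitigated_estimate(psi, U, 0.1))
```

With a 10% readout bit flip, the raw <Z> drops from 0.5 to 0.4, and rescaling by 1/(1 - 2p) = 1.25 restores 0.5. The same rescaling also inflates the statistical error of a finite-shot estimate by 1/(1 - 2p).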

# Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information

Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information ( http://arxiv.org/abs/2305.01788v2 )

License: see link

Sunjae Kwon, Rishabh Garodia, Minhwa Lee, Zhichao Yang, Hong Yu

(参考訳) Visual Word Sense Disambiguation（VWSD）は、与えられた文脈における対象単語の正しい語義を最も正確に表す画像を見つけるタスクである。従来、画像テキストマッチングモデルは多義語の認識に苦労することが多かった。本稿では、外部の語彙知識ベースの語釈（gloss）情報、特に語義定義を用いる教師なしVWSD手法を提案する。具体的には、正解の語義情報が与えられない場合に、ベイズ推論を用いて語義定義を取り入れることを提案する。さらに、辞書外（OOD）問題を改善するために、GPT-3を用いた文脈を考慮した定義生成を提案する。実験の結果、ベイズ推論に基づく手法によりVWSDの性能が大幅に向上した。さらに、文脈を考慮した定義生成は、OODの例において既存の定義生成手法よりも優れた性能を示し、顕著な性能向上を達成した。ソースコードはできるだけ早く公開する予定である。

Visual Word Sense Disambiguation (VWSD) is the task of finding the image that most accurately depicts the correct sense of the target word in the given context. Previously, image-text matching models often struggled to recognize polysemous words. This paper introduces an unsupervised VWSD approach that uses gloss information from an external lexical knowledge base, especially the sense definitions. Specifically, we suggest employing Bayesian inference to incorporate the sense definitions when sense information of the answer is not provided. In addition, to ameliorate the out-of-dictionary (OOD) issue, we propose context-aware definition generation with GPT-3. Experimental results show that VWSD performance increased significantly with our Bayesian inference-based approach. In addition, our context-aware definition generation achieved prominent performance improvements on OOD examples, outperforming the existing definition generation method. We will publish the source code as soon as possible.

Translated: 2023-05-31 00:29:07 / Published: 2023-05-28
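
The Bayesian use of sense definitions described above can be sketched as marginalizing over candidate definitions: P(image | context) = sum over d of P(image | d) * P(d | context). This is a minimal sketch under the following assumptions. The similarity scores would come from an image-text model such as CLIP (not computed here). Both conditionals are obtained by a plain softmax over those scores. The numbers in the usage lines are toy inputs, not results.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def image_posterior(sim_image_def, sim_context_def):
    """sim_image_def[i][d]: similarity of image i to definition d.
    sim_context_def[d]: similarity of the context phrase to definition d.
    Returns P(image | context) = sum_d P(image | d) * P(d | context)."""
    p_def = softmax(sim_context_def)  # P(d | context)
    n_images = len(sim_image_def)
    # For each definition, normalize the image scores into P(image | d).
    p_img_given_def = [
        softmax([sim_image_def[i][d] for i in range(n_images)]) for d in range(len(p_def))
    ]
    return [
        sum(p_img_given_def[d][i] * p_def[d] for d in range(len(p_def)))
        for i in range(n_images)
    ]


# Toy example: "bank" in the context "river bank"; definition 0 = riverside, 1 = financial.
probs = image_posterior(
    sim_image_def=[[4.0, 0.0],   # image 0 (a riverside) matches definition 0
                   [0.0, 4.0]],  # image 1 (a bank building) matches definition 1
    sim_context_def=[5.0, 0.0],  # the context strongly favors definition 0
)
print(probs)
```

Because each P(image | d) sums to one over images, the result is again a proper distribution over candidate images. This holds even when no gold sense label is available, which matches the unsupervised setting in the abstract.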

# 人が話しAIが聞く：EHRノートにおけるスティグマ化言語がAIの性能に与える影響

People Talking and AI Listening: How Stigmatizing Language in EHR Notes Affect AI Performance ( http://arxiv.org/abs/2305.10201v2 )

License: see link

Yizhi Liu, Weiguang Wang, Guodong Gordon Gao, Ritu Agarwal

(参考訳) 電子健康記録（EHR）は、医療における人工知能（AI）主導の変革に不可欠なデータソースである。しかし、EHRノートに反映された臨床医のバイアスは、AIモデルがそれを継承・増幅し、健康格差を固定化することにつながりうる。本研究では、Transformerベースの深層学習モデルと説明可能なAI（XAI）技術を用いて、EHRノート中のスティグマ化言語（SL）が死亡予測に与える影響を調べた。その結果、臨床医が書いたSLは、特に黒人患者においてAIの性能に悪影響を及ぼすことが示され、SLがAIモデル開発における人種格差の一因であることが明らかになった。SLの影響を緩和する運用上効率的な方法を探るため、臨床医の協働ネットワークを通じてSLの生成パターンを調べ、中心的な臨床医がAIモデルの人種格差により強い影響を与えることを特定した。中心的な臨床医が書いたSLを除去することは、データ全体からすべてのSLを除去するよりも効率的なバイアス低減戦略であることがわかった。本研究は、責任あるAI開発のための実践的な知見を提供し、医療における臨床医の行動とEHRノート記述の理解に貢献する。

Electronic health records (EHRs) serve as an essential data source for the envisioned artificial intelligence (AI)-driven transformation in healthcare. However, clinician biases reflected in EHR notes can lead to AI models inheriting and amplifying these biases, perpetuating health disparities. This study investigates the impact of stigmatizing language (SL) in EHR notes on mortality prediction using a Transformer-based deep learning model and explainable AI (XAI) techniques. Our findings demonstrate that SL written by clinicians adversely affects AI performance, particularly so for black patients, highlighting SL as a source of racial disparity in AI model development. To explore an operationally efficient way to mitigate SL's impact, we investigate patterns in the generation of SL through a clinicians' collaborative network, identifying central clinicians as having a stronger impact on racial disparity in the AI model. We find that removing SL written by central clinicians is a more efficient bias reduction strategy than eliminating all SL in the entire corpus of data. This study provides actionable insights for responsible AI development and contributes to understanding clinician behavior and EHR note writing in healthcare.

Translated: 2023-05-31 00:12:41 / Published: 2023-05-28

# グラフ上のロングテールカテゴリの特徴付け

Characterizing Long-Tail Categories on Graphs ( http://arxiv.org/abs/2305.09938v2 )

License: see link

Haohui Wang, Baoyu Jing, Kaize Ding, Yada Zhu, Dawei Zhou

(参考訳) ロングテールのデータ分布は、金融取引ネットワーク、eコマースネットワーク、共同研究ネットワークなど、多くの実世界のネットワークで一般的である。近年の発展の成功にもかかわらず、既存研究は主にグラフ拡張や目的関数の再重み付けによって機械学習モデルのバイアスを取り除くことに注力している。しかし、グラフ上のロングテールカテゴリの挙動を特徴づけ、実際のシナリオにおける汎化性能を理解するための理論的ツールを提供する文献は限られている。このギャップを埋めるため、問題をマルチタスク学習の形で定式化し（各タスクは1つの特定カテゴリの予測に対応する）、グラフ上のロングテール分類に対する初の汎化誤差上界を提案する。理論的結果は、ロングテール分類の汎化性能が、全タスクにわたる損失の範囲とタスクの総数によって支配されることを示している。この理論的知見に基づき、グラフ上のロングテールカテゴリの性能を向上させる新しい汎用フレームワークTail2Learnを提案する。具体的には、ラベルの少ないクラスが他のクラスと共有される関連情報の恩恵を受けられるようにする階層的タスクグループ化モジュールから始め、さらにヘッドクラスとテールクラスの勾配寄与のバランスをとるバランス型対照学習モジュールを設計する。最後に、様々な実世界データセットでの大規模な実験により、グラフ上のロングテールカテゴリを捉えるTail2Learnの有効性を示す。

Long-tail data distributions are prevalent in many real-world networks, including financial transaction networks, e-commerce networks, and collaboration networks. Despite the success of recent developments, existing works mainly focus on debiasing machine learning models via graph augmentation or objective reweighting. However, there is limited literature that provides a theoretical tool to characterize the behaviors of long-tail categories on graphs and understand the generalization performance in real scenarios. To bridge this gap, we propose the first generalization bound for long-tail classification on graphs by formulating the problem as multi-task learning, i.e., each task corresponds to the prediction of one particular category. Our theoretical results show that the generalization performance of long-tail classification is dominated by the range of losses across all tasks and the total number of tasks. Building upon these theoretical findings, we propose a novel generic framework, Tail2Learn, to improve the performance of long-tail categories on graphs. In particular, we start with a hierarchical task grouping module that allows label-limited classes to benefit from the relevant information shared by other classes; we then design a balanced contrastive learning module to balance the gradient contributions of head and tail classes. Finally, extensive experiments on various real-world datasets demonstrate the effectiveness of Tail2Learn in capturing long-tail categories on graphs.

Translated: 2023-05-31 00:12:19 / Published: 2023-05-28

# グラフセグメントトレーニングによる大規模グラフ特性予測の学習

Learning Large Graph Property Prediction via Graph Segment Training ( http://arxiv.org/abs/2305.12322v2 )

License: see link

Kaidi Cao, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi

(参考訳) 大規模グラフの特性を予測する学習は難しい。各予測にはグラフ全体の知識が必要である一方、学習中に利用できるメモリ量には上限があるためである。本稿では、分割統治アプローチにより、一定のメモリフットプリントで大規模グラフの特性予測を学習できる汎用フレームワークGraph Segment Training（GST）を提案する。GSTはまず大きなグラフをセグメントに分割し、各学習イテレーションでサンプリングした少数のセグメントのみを通して逆伝播を行う。逆伝播用にサンプリングされなかったセグメントの埋め込みを効率的に得るため、履歴埋め込みテーブルを導入してGSTパラダイムを改良する。履歴埋め込みの陳腐化（staleness）を緩和するため、2つの新しい手法を設計する。第1に、入力分布のシフトを補正するため予測ヘッドを微調整する。第2に、学習中に古い埋め込みの一部をドロップしてバイアスを減らすStale Embedding Dropoutを導入する。すべての手法を組み合わせた完全な手法GST-EFDを、MalNetとTpuGraphsという2つの大規模グラフ特性予測ベンチマークで評価する。実験の結果、GST-EFDはメモリ効率が高く高速であると同時に、一般的なフルグラフ学習方式よりもテスト精度がわずかに向上することがわかった。

Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first divides a large graph into segments and then backpropagates through only a few segments sampled per training iteration. We refine the GST paradigm by introducing a historical embedding table to efficiently obtain embeddings for segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques. First, we finetune the prediction head to fix the input distribution shift. Second, we introduce Stale Embedding Dropout to drop some stale embeddings during training to reduce bias. We evaluate our complete method GST-EFD (with all the techniques together) on two large graph property prediction benchmarks: MalNet and TpuGraphs. Our experiments show that GST-EFD is both memory-efficient and fast, while offering a slight boost on test accuracy over a typical full graph training regime.

Translated: 2023-05-31 00:01:50 / Published: 2023-05-28
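
The forward pass of Graph Segment Training, with its historical embedding table and stale embedding dropout, can be sketched in a few lines. This is a simplified illustration, not the authors' implementation. Node features are scalars. A scaled mean stands in for the GNN segment encoder. Segment embeddings are combined by averaging. The prediction-head finetuning step is omitted. In a real framework, only the freshly encoded segments would carry gradients, and table entries would be detached constants.

```python
import random


def make_segments(node_features, seg_size):
    """Split a (toy) graph's node feature list into contiguous segments."""
    return [node_features[i:i + seg_size] for i in range(0, len(node_features), seg_size)]


def encode_segment(segment, weight):
    """Toy segment encoder standing in for a GNN: scaled mean of node features."""
    return weight * sum(segment) / len(segment)


def graph_embedding(segments, weight, table, n_sampled, p_drop, rng):
    """Fresh embeddings for sampled segments, cached ones for the rest.
    Cached (stale) entries are skipped with probability p_drop."""
    sampled = set(rng.sample(range(len(segments)), n_sampled))
    parts = []
    for k, seg in enumerate(segments):
        if k in sampled:
            emb = encode_segment(seg, weight)  # in a real model, gradients flow only through these
            table[k] = emb                     # refresh the historical embedding table
            parts.append(emb)
        elif k in table and rng.random() >= p_drop:
            parts.append(table[k])             # stale embedding, treated as a constant
    return sum(parts) / len(parts), sampled


segments = make_segments([float(v) for v in range(12)], seg_size=3)  # 4 segments
table = {k: encode_segment(seg, 1.0) for k, seg in enumerate(segments)}
emb, sampled = graph_embedding(segments, 1.0, table, n_sampled=2, p_drop=0.0, rng=random.Random(0))
print(emb, sorted(sampled))
```

Memory per step is bounded by the `n_sampled` segments that are actually encoded, independent of graph size. Raising `p_drop` trades coverage of the whole graph for less reliance on outdated cached values.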

# 確率的アンサンブルニューラルネットワークダイナミクスを用いた能動的探索と不確実性を考慮した展開の橋渡し

Bridging Active Exploration and Uncertainty-Aware Deployment Using Probabilistic Ensemble Neural Network Dynamics ( http://arxiv.org/abs/2305.12240v2 )

License: see link

Taekyung Kim, Jungwi Mun, Junwon Seo, Beomsu Kim, Seongil Hong

(参考訳) 近年,ロボット工学における学習に基づく制御は,実環境における複雑なタスクに対処する能力から注目されている。機械学習アルゴリズムと計算能力の進歩により、このアプローチは未知または部分的に知られているロボットのダイナミクスを学習することでロボットの制御問題を解くためにますます重要になっている。効率的なデータ収集と人間の監督の最小化のためには、ロボットが最高の情報を得る状態へ自身を誘導する能動的探査が不可欠である。同様に、不確実性を認識したデプロイメントは、ロボット制御において、学習されたモデルから情報を得た不確実なアクションが不安定な動きや失敗に繋がる可能性がある、という懸念が高まっている。しかし、活発な探索と不確実性を認識した展開は独立に研究されており、それらをシームレスに統合する文献は限られている。本稿では,ロボット制御領域におけるこれらの2つのタスクをブリッジするモデルベース強化学習フレームワークを提案する。本フレームワークは,確率的アンサンブルニューラルネットワークを用いてダイナミクス学習を行い,jensen-renyiダイバージェンスによる認識的不確かさの定量化を可能にする。調査と展開の対立する2つのタスクは、最先端のサンプリングベースのMPCによって最適化され、トレーニングデータの効率的な収集と、不確実な状態アクション空間の回避に成功した。自動運転車と車輪付きロボットの両方で実験を行い、探索と展開の両方に有望な結果を示す。

In recent years, learning-based control in robotics has gained significant attention due to its capability to address complex tasks in real-world environments. With the advances in machine learning algorithms and computational capabilities, this approach is becoming increasingly important for solving challenging control problems in robotics by learning unknown or partially known robot dynamics. Active exploration, in which a robot directs itself to states that yield the highest information gain, is essential for efficient data collection and minimizing human supervision. Similarly, uncertainty-aware deployment has been a growing concern in robotic control, as uncertain actions informed by the learned model can lead to unstable motions or failure. However, active exploration and uncertainty-aware deployment have been studied independently, and there is limited literature that seamlessly integrates them. This paper presents a unified model-based reinforcement learning framework that bridges these two tasks in the robotics control domain. Our framework uses a probabilistic ensemble neural network for dynamics learning, allowing the quantification of epistemic uncertainty via Jensen-Renyi Divergence. The two opposing tasks of exploration and deployment are optimized through state-of-the-art sampling-based MPC, resulting in efficient collection of training data and successful avoidance of uncertain state-action spaces. We conduct experiments on both autonomous vehicles and wheeled robots, showing promising results for both exploration and deployment.

Translated: 2023-05-31 00:01:31 / Published: 2023-05-28
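
Epistemic uncertainty from a probabilistic ensemble can be quantified with a Jensen-Rényi divergence: the Rényi entropy of the ensemble mixture minus the average entropy of its members. For order 2 and Gaussian members, this has a closed form, because the integral of the product of two Gaussians is itself a Gaussian density. This is a minimal sketch assuming scalar predictions, order alpha = 2, and equal member weights by default. The paper's multivariate setting and exact estimator may differ.

```python
import math


def gaussian_overlap(mu1, var1, mu2, var2):
    """Integral over x of N(x; mu1, var1) * N(x; mu2, var2) = N(mu1; mu2, var1 + var2)."""
    v = var1 + var2
    return math.exp(-((mu1 - mu2) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)


def jensen_renyi_2(means, variances, weights=None):
    """Order-2 Jensen-Renyi divergence of a 1-D Gaussian ensemble:
    H2(sum_i w_i p_i) - sum_i w_i H2(p_i), where H2(p) = -log(integral of p^2)."""
    n = len(means)
    w = weights or [1.0 / n] * n
    mixture_power = sum(
        w[i] * w[j] * gaussian_overlap(means[i], variances[i], means[j], variances[j])
        for i in range(n)
        for j in range(n)
    )
    h_mixture = -math.log(mixture_power)
    h_members = sum(
        w[i] * -math.log(gaussian_overlap(means[i], variances[i], means[i], variances[i]))
        for i in range(n)
    )
    return h_mixture - h_members


agree = jensen_renyi_2([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])      # members agree: ~0
disagree = jensen_renyi_2([-2.0, 0.0, 2.0], [1.0, 1.0, 1.0])  # members disagree: > 0
print(agree, disagree)
```

When all members predict the same distribution, the divergence vanishes. As their means spread apart, it grows. This is the behavior a controller can exploit: seek such states while exploring, and avoid them when deployed.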

# 不確定入力を用いたガウス過程回帰に対するベイズ的アプローチ

Bayesian approach to Gaussian process regression with uncertain inputs ( http://arxiv.org/abs/2305.11586v2 )

License: see link

Dongwei Ye, Mengwu Guo

(参考訳) 従来のガウス過程回帰は、モデル観測の出力データにのみノイズが存在することを前提としている。しかし多くの科学・工学的応用では、モデリングの仮定や測定誤差などに起因する不確実性によって、観測データの入力位置も損なわれうる。本研究では、入力データの変動性をガウス過程回帰に統合するベイズ手法を提案する。固定入力を持つノイズの乗った出力と、事前分布で定義された不確定入力を持つ出力という2種類の観測量を考え、ベイズの枠組みで事後分布を推定して不確かなデータ位置を推論する。その後、このように定量化された入力の不確実性を周辺化によってガウス過程の予測に組み込む。この新しい回帰手法の有効性をいくつかの数値例で示す。これらの例では一貫して良好な汎化性能が観察される一方、不確定入力のベイズ推論によって予測の不確実性が大幅に減少する。

Conventional Gaussian process regression assumes noise only in the output data of model observations. In many scientific and engineering applications, however, the input locations of observational data may also be corrupted by uncertainties owing to modeling assumptions, measurement errors, etc. In this work, we propose a Bayesian method that integrates the variability of input data into Gaussian process regression. Considering two types of observables, noise-corrupted outputs with fixed inputs and those with uncertain inputs defined by prior distributions, a posterior distribution is estimated via a Bayesian framework to infer the uncertain data locations. Thereafter, the quantified uncertainties of the inputs are incorporated into Gaussian process predictions by means of marginalization. The effectiveness of this new regression technique is demonstrated through several numerical examples, in which consistently good generalization performance is observed, while the Bayesian inference of uncertain inputs substantially reduces the predictive uncertainties.

Translated: 2023-05-31 00:00:00 / Published: 2023-05-28
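
The marginalization step, which folds input uncertainty into GP predictions, can be approximated by Monte Carlo. Training inputs are drawn from their uncertainty distribution, a standard GP is fit to each draw, and the predictions are combined with the law of total variance. This sketch skips the paper's Bayesian inference of the input locations. It simply samples from an assumed Gaussian with mean `x_mean` and standard deviation `x_std`, which plays the role of the inferred posterior. It also uses a fixed RBF kernel with hand-picked hyperparameters.

```python
import numpy as np


def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Standard GP posterior mean and variance with fixed hyperparameters."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, Ks.T)
    var = np.diag(rbf(x_test, x_test)) - np.sum(Ks * v.T, axis=1)
    return mean, var


def marginalized_predict(x_mean, x_std, y_train, x_test, n_samples=200, seed=0):
    """Monte Carlo marginalization over uncertain training inputs.
    Total variance = mean of per-draw variances + variance of per-draw means."""
    rng = np.random.default_rng(seed)
    means, variances = [], []
    for _ in range(n_samples):
        x_draw = x_mean + x_std * rng.standard_normal(len(x_mean))
        m, v = gp_predict(x_draw, y_train, x_test)
        means.append(m)
        variances.append(v)
    means, variances = np.array(means), np.array(variances)
    return means.mean(axis=0), variances.mean(axis=0) + means.var(axis=0)


x = np.linspace(0.0, 5.0, 8)
y = np.sin(x)
x_test = np.array([1.0, 2.5])
mean, var = marginalized_predict(x, 0.1 * np.ones_like(x), y, x_test)
print(mean, var)
```

With `x_std` set to zero, the result reduces to the ordinary GP prediction. Larger input uncertainty shows up as extra predictive variance through the spread of the per-draw means.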

# 言語モデルのポストホック説明は言語モデルを改善することができる

Post Hoc Explanations of Language Models Can Improve Language Models ( http://arxiv.org/abs/2305.11426v2 )

License: see link

Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju

(参考訳) 大規模言語モデル（LLM）は、複雑なタスクの実行において顕著な能力を示している。さらに最近の研究では、インコンテキスト学習中に人手で注釈した根拠（例：Chain-of-Thoughtプロンプト）を取り入れることで、特に推論能力を要するタスクにおいてモデルの性能が大幅に向上することが示されている。しかし、このような根拠の導入は人手による関与を多く必要とするため、スケーラビリティの面で課題がある。本研究では、根拠生成のプロセスを自動化することで上記の課題に取り組む新しいフレームワークAMPLIFY（Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations）を提案する。そのために、各入力特徴がモデル予測に与える影響を捉えた帰属スコア（説明）を出力するポストホック説明手法を活用する。より具体的には、ポストホック説明から得た洞察を埋め込んだ自然言語の根拠を自動構築し、LLMに修正信号を与える。実世界データセットを用いた大規模な実験により、AMPLIFYは、Chain-of-Thoughtプロンプトのような人手注釈の根拠に依存する従来手法が及ばないタスクを含む幅広いタスクで、約10～25%の予測精度向上をもたらすことが示された。本研究は、LLMの有効性を高める有用なツールとしてのポストホック説明の可能性に注目した最初の試みの一つである。さらに、AMPLIFYの各構成要素の影響を示す追加の実証分析とアブレーション研究を行い、インコンテキスト学習を改良するための重要な知見を導く。

Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning capabilities. However, incorporating such rationales poses challenges in terms of scalability as this requires a high degree of human involvement. In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses the aforementioned challenges by automating the process of rationale generation. To this end, we leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions. More specifically, we construct automated natural language rationales that embed insights from post hoc explanations to provide corrective signals to LLMs. Extensive experimentation with real-world datasets demonstrates that our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks, including those where prior approaches which rely on human-annotated rationales such as Chain-of-Thought prompting fall short. Our work makes one of the first attempts at highlighting the potential of post hoc explanations as valuable tools for enhancing the effectiveness of LLMs. Furthermore, we conduct additional empirical analyses and ablation studies to demonstrate the impact of each of the components of AMPLIFY, which, in turn, leads to critical insights for refining in-context learning.

Translated: 2023-05-30 23:59:44 / Published: 2023-05-28
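
The core step of AMPLIFY, turning post hoc attribution scores into natural-language rationales for in-context examples, can be sketched as follows. The attribution scores are taken as given; they could come from a method such as gradient x input, which is not computed here. The rationale template, the top-k cut-off, and the prompt layout are our own assumptions rather than the paper's exact format.

```python
def build_rationale(tokens, attributions, label, k=3):
    """Turn per-token attribution scores into a short natural-language rationale."""
    ranked = sorted(zip(tokens, attributions), key=lambda pair: pair[1], reverse=True)
    top = [tok for tok, _ in ranked[:k]]
    keywords = ", ".join(f'"{t}"' for t in top)
    return f"The key words {keywords} are important clues to predict '{label}' as the correct answer."


def few_shot_prompt(examples, query, k=3):
    """Assemble an in-context prompt whose demonstrations carry generated rationales."""
    blocks = []
    for ex in examples:
        rationale = build_rationale(ex["tokens"], ex["attributions"], ex["label"], k)
        blocks.append(f"Input: {' '.join(ex['tokens'])}\nRationale: {rationale}\nAnswer: {ex['label']}")
    blocks.append(f"Input: {query}\nRationale:")
    return "\n\n".join(blocks)


example = {
    "tokens": ["the", "movie", "was", "brilliant"],
    "attributions": [0.01, 0.2, 0.05, 0.9],  # toy scores, e.g. from a small proxy model
    "label": "positive",
}
rationale = build_rationale(example["tokens"], example["attributions"], example["label"], k=2)
prompt = few_shot_prompt([example], "a dull plot", k=2)
print(prompt)
```

No human annotation is needed: any attribution method that scores input tokens can feed this template, which is what lets the approach scale where hand-written Chain-of-Thought rationales do not.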

# VisorGPT: 生成的事前学習による視覚的事前分布の学習

VisorGPT: Learning Visual Prior via Generative Pre-Training ( http://arxiv.org/abs/2305.13777v3 )

License: see link

Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou

(参考訳) 視覚データ中の様々なstuffやthingsは固有の特性を持つ。それらは深層ニューラルネットワークによって学習され、物体の位置や形状などの視覚的事前分布（visual prior）としてモデル内に暗黙的に表現される。このような事前分布は多くの視覚タスクに影響しうる。例えば条件付き画像合成では、事前分布に従わない空間条件は視覚的に不正確な合成結果をもたらしうる。本研究は、視覚的事前分布を明示的に学習し、サンプリングのカスタマイズを可能にすることを目的とする。言語モデリングの進歩に触発され、生成的事前学習（Generative Pre-Training）によって視覚的事前分布を学習するVisorGPTを提案する。バウンディングボックス、人物の姿勢、インスタンスマスクなどの物体の視覚的位置を離散化して系列に変換することで、VisorGPTは尤度最大化により視覚的事前分布をモデル化できる。さらに、様々な視覚的位置を統一し、学習した事前分布から系列出力をカスタマイズしてサンプリングできるようにするためのプロンプトエンジニアリングを検討する。実験の結果、VisorGPTは視覚的事前分布を効果的にモデル化でき、ControlNetのような条件付き画像合成モデルのために正確な人物姿勢をカスタマイズするなど、多くの視覚タスクに利用できることが示された。コードは https://github.com/Sierkinhane/VisorGPT で公開予定である。

Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such a prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions failing to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VisorGPT. By discretizing visual locations of objects, e.g., bounding boxes, human pose, and instance masks, into sequences, VisorGPT can model the visual prior through likelihood maximization. In addition, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate that VisorGPT can effectively model the visual prior, which can be employed for many vision tasks, such as customizing accurate human poses for conditional image synthesis models like ControlNet. Code will be released at https://github.com/Sierkinhane/VisorGPT.

Translated: 2023-05-30 23:53:08 / Published: 2023-05-28
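
The key preprocessing step in VisorGPT turns continuous visual locations into discrete token sequences that a language model can learn by likelihood maximization. A minimal sketch for bounding boxes follows. The bin count of 512, the "category x1 y1 x2 y2" text layout, and the "; " separator are assumptions for illustration, not VisorGPT's actual sequence format.

```python
def box_to_tokens(box, image_w, image_h, bins=512):
    """Quantize an (x1, y1, x2, y2) box in pixels into integer coordinate bins."""
    x1, y1, x2, y2 = box

    def quantize(value, size):
        return min(bins - 1, max(0, int(value / size * bins)))

    return [quantize(x1, image_w), quantize(y1, image_h), quantize(x2, image_w), quantize(y2, image_h)]


def tokens_to_box(tokens, image_w, image_h, bins=512):
    """Lossy inverse: map each bin index back to the centre of its bin, in pixels."""
    sizes = (image_w, image_h, image_w, image_h)
    return [(t + 0.5) / bins * s for t, s in zip(tokens, sizes)]


def objects_to_sequence(objects, image_w, image_h, bins=512):
    """Serialize (category, box) pairs into one flat text sequence for a language model."""
    parts = []
    for category, box in objects:
        coords = " ".join(str(t) for t in box_to_tokens(box, image_w, image_h, bins))
        parts.append(f"{category} {coords}")
    return "; ".join(parts)


seq = objects_to_sequence([("person", (100, 50, 300, 200)), ("dog", (0, 0, 640, 480))], 640, 480)
print(seq)  # person 80 53 240 213; dog 0 0 511 511
```

Quantization bounds the round-trip error by one bin width (here 640/512 = 1.25 pixels horizontally). This is the price paid for letting an ordinary next-token model represent spatial layouts.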

# DiffProtect: 顔のプライバシー保護のための拡散モデルを用いた逆例の生成

DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection ( http://arxiv.org/abs/2305.13625v2 )

License: see link

Jiang Liu, Chun Pong Lau, Rama Chellappa

(参考訳) ますます広まりつつある顔認識(FR)システムは、特にソーシャルメディアで写真を公開している何十億ものユーザーにとって、個人のプライバシーに対する深刻な懸念を引き起こしている。いくつかの試みは、暗号化された顔画像を生成するために敵対的攻撃を利用する不正なFRシステムによって個人が識別されるのを防ぐために行われた。しかし、既存の手法は視覚品質の低下や攻撃成功率の低下に苦しむため、実用性が制限される。近年,拡散モデルが画像生成に多大な成功を収めている。拡散モデルは、視覚品質と攻撃性能の両方を改善するために、逆の例を生成するために使用できますか? 本稿では拡散オートエンコーダを用いてFRシステム上で意味論的に意味のある摂動を生成するDiffProtectを提案する。大規模な実験では、DiffProtectは最先端の手法よりも自然に見える暗号化画像を生成する一方で、CelebA-HQとFFHQのデータセットに対する24.5%と25.1%の絶対的な改善など、攻撃の成功率を大きく向上している。

The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. Several attempts have been made to protect individuals from being identified by unauthorized FR systems utilizing adversarial attacks to generate encrypted face images. However, existing methods suffer from poor visual quality or low attack success rates, which limit their utility. Recently, diffusion models have achieved tremendous success in image generation. In this work, we ask: can diffusion models be used to generate adversarial examples to improve both visual quality and attack performance? We propose DiffProtect, which utilizes a diffusion autoencoder to generate semantically meaningful perturbations on FR systems. Extensive experiments demonstrate that DiffProtect produces more natural-looking encrypted images than state-of-the-art methods while achieving significantly higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the CelebA-HQ and FFHQ datasets.

Translated: 2023-05-30 23:52:46 / Published: 2023-05-28

# ケイ酸塩の導電率予測のための非線形方程式の開発

Development of Non-Linear Equations for Predicting Electrical Conductivity in Silicates ( http://arxiv.org/abs/2305.13519v2 )

License: see link

Patrick dos Anjos, Lucas A. Quaresma, Marcelo L. P. Machado

(参考訳) 電気伝導率は電気アーク炉（EAF）において基本的に重要であり、この現象とプロセススラグとの相互作用はエネルギー損失と最適化の不足をもたらす。数学的モデリングは現象の挙動の理解に役立つため、人工ニューラルネットワークを用いてEAFスラグの電気伝導率を予測した。最良の人工ニューラルネットワークは隠れ層に100個のニューロンを持ち、6つの予測変数と、被予測変数である電気伝導率を持つ。絶対誤差の平均と標準偏差を算出し、各予測変数が被予測変数に与える影響を関連付けるために感度解析を行った。

Electrical conductivity is of fundamental importance in electric arc furnaces (EAF), and the interaction of this phenomenon with the process slag results in energy losses and poor optimization. Since mathematical modeling helps in understanding the behavior of phenomena, it was used here to predict the electrical conductivity of EAF slags through artificial neural networks. The best artificial neural network had 100 neurons in the hidden layer, with 6 predictor variables and one predicted variable, electrical conductivity. The mean absolute error and the standard deviation of the absolute error were calculated, and a sensitivity analysis was performed to relate the effect of each predictor variable to the predicted variable.

Translated: 2023-05-30 23:52:24 / Published: 2023-05-28
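
The evaluation quantities in the EAF abstract can be sketched as follows: the mean and standard deviation of the absolute error, plus a sensitivity analysis. The abstract does not specify the sensitivity method. The one-at-a-time perturbation below, which reports the relative output change per relative input change, is one common option and is our assumption. `model` is any callable; it could wrap the trained 6-input network, but a toy linear function is used here. `statistics.stdev` gives the sample standard deviation, and the paper may have used the population version.

```python
import statistics


def error_summary(y_true, y_pred):
    """Mean and (sample) standard deviation of the absolute prediction error."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return statistics.mean(abs_err), statistics.stdev(abs_err)


def one_at_a_time_sensitivity(model, x_ref, delta=0.01):
    """Elasticity of the prediction w.r.t. each input: relative output change
    divided by the relative input change `delta`, one input at a time."""
    base = model(x_ref)
    effects = []
    for i in range(len(x_ref)):
        x = list(x_ref)
        x[i] *= 1 + delta
        effects.append((model(x) - base) / base / delta)
    return effects


mae, sd = error_summary([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
toy_model = lambda x: 2 * x[0] + 3 * x[1]  # hypothetical stand-in for the trained network
effects = one_at_a_time_sensitivity(toy_model, [1.0, 1.0])
print(mae, sd, effects)  # effects are approximately [0.4, 0.6]
```

Reporting the spread of the absolute error alongside its mean shows whether a low MAE hides occasional large misses. Normalized sensitivities make predictors measured in different units directly comparable.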

# LLMを用いたLLM推論パイプラインの応答長知覚とシーケンススケジューリング

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline ( http://arxiv.org/abs/2305.13144v2 )

License: see link

Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

(参考訳) 大規模言語モデル(LLM)はAIの分野に革命をもたらし、様々なタスクで前例のない能力を示している。しかし、LLMの推論プロセスにはかなりの計算コストが伴う。本稿では,LLMのパワーを利用する効率的なLLM推論パイプラインを提案する。我々のアプローチは、LLMのポテンシャルをタップして、最小限のオーバーヘッドで応答長を正確に知覚し、予測することから始まります。この情報を活用することで、類似の応答長を持つクエリをマイクロバッチにグループ化する効率的なシーケンススケジューリング手法を導入する。 LLaMAモデルを用いて実世界の命令データセットに対するアプローチを評価し,提案手法の有効性を損なうことなく,推論スループットが86%向上したことを示す。特に,本手法は他の推論高速化手法と直交しており,LLM推論のための多くの既存のツールキット(例えば,FlashAttention, Quantization)に付加価値がある。

Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. However, the inference process for LLMs comes with significant computational costs. In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs. Our approach begins by tapping into the potential of LLMs to accurately perceive and predict the response length with minimal overhead. By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches. We evaluate our approach on real-world instruction datasets using the LLaMA-based model, and our results demonstrate an impressive 86% improvement in inference throughput without compromising effectiveness. Notably, our method is orthogonal to other inference acceleration techniques, making it a valuable addition to many existing toolkits (e.g., FlashAttention, Quantization) for LLM inference.

Translated: 2023-05-30 23:51:59 / Published: 2023-05-28
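
The sequence-scheduling idea, grouping queries with similar predicted response lengths into the same micro-batch, can be sketched independently of the length predictor. Here the predicted lengths are simply given; in the paper they come from the LLM itself. The cost model is a simplifying assumption: each batch runs until its longest member finishes, so short requests padded alongside long ones waste generation steps.

```python
def schedule(queries, predicted_lengths, batch_size):
    """Sort queries by predicted response length and cut the order into micro-batches."""
    order = sorted(range(len(queries)), key=lambda i: predicted_lengths[i])
    return [[queries[i] for i in order[k:k + batch_size]] for k in range(0, len(order), batch_size)]


def padded_tokens(batches, lengths_by_query):
    """Generation steps spent if every batch runs until its longest member finishes."""
    return sum(len(b) * max(lengths_by_query[q] for q in b) for b in batches)


queries = ["a", "b", "c", "d"]
lengths = {"a": 10, "b": 200, "c": 12, "d": 190}

arrival_order = [["a", "b"], ["c", "d"]]              # first-come, first-served batching
length_aware = schedule(queries, [lengths[q] for q in queries], batch_size=2)
print(length_aware)                                    # [['a', 'c'], ['d', 'b']]
print(padded_tokens(arrival_order, lengths), padded_tokens(length_aware, lengths))  # 780 424
```

In this toy case, grouping by length cuts padded generation work from 780 to 424 steps. Mispredicted lengths erode the gain, which is why an accurate, low-overhead length predictor matters.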

# コード言語モデルを用いたテキストからsqlへの誤り訂正

Text-to-SQL Error Correction with Language Models of Code ( http://arxiv.org/abs/2305.13073v2 )

License: see link

Ziru Chen, Shijie Chen, Michael White, Raymond Mooney, Ali Payani, Jayanth Srinivasa, Yu Su, Huan Sun

(参考訳) テキストからsqlへの構文解析の最近の進歩にもかかわらず、現在のセマンティックパーサは実用上十分正確ではない。本稿では,テキストからSQLへの自動誤り訂正モデルの構築方法について検討する。トークンレベルの編集は文脈外であり、時には曖昧であることに気付き、代わりに節レベルの編集モデルを構築することを提案する。また、ほとんどのコードの言語モデルはSQL用に事前訓練されていないが、一般的なデータ構造とPythonのようなプログラミング言語での操作を知っている。そこで本研究では,言語モデルの事前学習コーパスに係わる,SQLクエリとその編集のための新しい表現を提案する。誤差補正モデルは、異なるパーサーの正確なセットマッチング精度を2.4-6.5改善し、2つの強いベースラインに対して最大4.3ポイントの絶対改善を得る。私たちのコードとデータはhttps://github.com/OSU-NLP-Group/Auto-SQL-Correctionで公開されています。

Despite recent progress in text-to-SQL parsing, current semantic parsers are still not accurate enough for practical use. In this paper, we investigate how to build automatic text-to-SQL error correction models. Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead. Moreover, while most language models of code are not specifically pre-trained for SQL, they know common data structures and their operations in programming languages such as Python. Thus, we propose a novel representation for SQL queries and their edits that adheres more closely to the pre-training corpora of language models of code. Our error correction model improves the exact set match accuracy of different parsers by 2.4 to 6.5 points and obtains up to a 4.3-point absolute improvement over two strong baselines. Our code and data are available at https://github.com/OSU-NLP-Group/Auto-SQL-Correction.

Translation date: 2023-05-30 23:51:42; publication date: 2023-05-28

# Road Planning for Slums via Deep Reinforcement Learning ( http://arxiv.org/abs/2305.13060v2 )

License: see link

Yu Zheng, Hongyuan Su, Jingtao Ding, Depeng Jin, Yong Li

Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming and unable to generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically lay out roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which significantly improves accessibility by 14.3% over existing baseline methods. Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading. The code and data are available at https://github.com/tsinghua-fib-lab/road-planning-for-slums.
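The abstract names masked policy optimization without detailing it. A common way to realize action masking, assumed here rather than taken from the paper, is to set the logits of invalid road locations to negative infinity before the softmax so the policy assigns them exactly zero probability. The logits and validity mask below are hypothetical.

```python
import numpy as np

def masked_policy(logits: np.ndarray, valid: np.ndarray) -> np.ndarray:
    # Invalid actions (e.g., locations where a road cannot be placed) get -inf,
    # so their probability after the softmax is exactly zero.
    masked = np.where(valid, logits, -np.inf)
    masked = masked - masked.max()          # numerical stability
    probs = np.exp(masked)
    return probs / probs.sum()

logits = np.array([2.0, 0.5, 1.0, 3.0])     # hypothetical scores from a GNN
valid = np.array([True, True, False, False])
probs = masked_policy(logits, valid)
print(probs.round(3))
```

Note that the highest raw logit (index 3) receives no probability because it is masked out, so sampled actions always respect the constraint.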

Translation date: 2023-05-30 23:51:29; publication date: 2023-05-28

# Experimental test of the Rosenzweig-Porter model for the transition from Poisson to Gaussian unitary ensemble statistics ( http://arxiv.org/abs/2305.12840v2 )

License: see link

Xiaodong Zhang, Weihua Zhang, Jiongning Che, and Barbara Dietz

We report on an experimental investigation of the transition of a quantum system with integrable classical dynamics to one with violated time-reversal (T) invariance and a chaotic classical counterpart. High-precision experiments are performed with a flat superconducting microwave resonator with circular shape, in which T-invariance violation and chaotic dynamics are induced by magnetizing a ferrite disk placed at its center. We determine a complete sequence of approximately 1000 eigenfrequencies and verify analytical predictions for the spectral properties of the Rosenzweig-Porter (RP) model, which is currently under intensive study in the context of many-body quantum chaos because it exhibits ergodic, fractal and localized phases. Furthermore, based on this RP model and the Heidelberg approach, we introduce a random-matrix model for the scattering (S) matrix of the corresponding open quantum system and show that it perfectly reproduces the fluctuation properties of the measured S matrix of the microwave resonator.

Translation date: 2023-05-30 23:50:27; publication date: 2023-05-28

# Graph Analysis Using a GPU-based Parallel Algorithm: Quantum Clustering ( http://arxiv.org/abs/2305.14641v2 )

License: see link

Zhe Wang, ZhiJie He, Ding Liu

The article introduces a new method for applying Quantum Clustering to graph structures. Quantum Clustering (QC) is a novel density-based unsupervised learning method that determines cluster centers by constructing a potential function. In this method, we use the Graph Gradient Descent algorithm to find the centers of clusters. GPU parallelization is used to compute potential values. We also conducted experiments on five widely used datasets and evaluated the method using four metrics. The results show the superior performance of the method. Finally, we discuss the influence of $\sigma$ on the experimental results.
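For readers unfamiliar with quantum clustering, the sketch below shows the classic Euclidean form of the idea, not the paper's graph-based variant. A Parzen "wavefunction" $\psi(x) = \sum_i e^{-\lVert x - x_i\rVert^2 / 2\sigma^2}$ defines, up to an additive constant, the potential $V(x) = \frac{1}{2\sigma^2 \psi(x)} \sum_i \lVert x - x_i\rVert^2 e^{-\lVert x - x_i\rVert^2 / 2\sigma^2}$. Plain gradient descent then moves every point toward a minimum of $V$, and points that end at the same minimum form one cluster. NumPy broadcasting evaluates all points at once, which is the kind of work a GPU implementation would parallelize. The function names, step size, and toy data are assumptions.

```python
import numpy as np

def qc_potential_and_grad(x, data, sigma):
    """Quantum-clustering potential V (up to a constant) and its gradient.

    x: (m, d) points where V is evaluated; data: (n, d) samples.
    """
    diff = x[:, None, :] - data[None, :, :]             # (m, n, d)
    r2 = (diff ** 2).sum(axis=-1)                         # squared distances (m, n)
    w = np.exp(-r2 / (2 * sigma ** 2))                    # Gaussian (Parzen) weights
    psi = w.sum(axis=1)                                   # wavefunction psi(x)
    a = (r2 * w).sum(axis=1)
    v = a / (2 * sigma ** 2 * psi)
    # Gradients of the numerator a(x) and of psi(x), then the quotient rule.
    grad_a = ((w * (2 - r2 / sigma ** 2))[..., None] * diff).sum(axis=1)
    grad_psi = (-(w / sigma ** 2)[..., None] * diff).sum(axis=1)
    grad_v = (grad_a * psi[:, None] - a[:, None] * grad_psi) / (
        2 * sigma ** 2 * psi[:, None] ** 2)
    return v, grad_v

def qc_descend(data, sigma=0.5, lr=0.1, steps=200):
    # Move every point downhill on V; points sharing a minimum share a cluster.
    x = data.astype(float).copy()
    for _ in range(steps):
        _, g = qc_potential_and_grad(x, data, sigma)
        x -= lr * g
    return x

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.2, size=(20, 2)),
                  rng.normal(3.0, 0.2, size=(20, 2))])
centers = qc_descend(data)
print(np.round(centers[[0, 20]], 2))
```

A larger $\sigma$ tends to merge nearby minima into fewer clusters and a smaller one tends to split them, which is one reason the choice of $\sigma$ affects the results.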

Translation date: 2023-05-30 23:42:39; publication date: 2023-05-28

# WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning ( http://arxiv.org/abs/2305.13901v2 )

License: see link

Guotao Wang, Chenglizhao Chen, Aimin Hao, Hong Qin, Deng-Ping Fan

To date, the widely adopted way to perform fixation collection in panoptic video is based on a head-mounted display (HMD), where participants' fixations are collected while they wear an HMD and freely explore the given panoptic scene. However, this widely used data collection method is insufficient for training deep models to accurately predict which regions in a given panoptic video are most important when it contains intermittent salient events. The main reason is that "blind zooms" always exist when an HMD is used to collect fixations, since participants cannot keep turning their heads to explore the entire panoptic scene all the time. Consequently, the collected fixations tend to be trapped in some local views, leaving the remaining areas as "blind zooms". Therefore, fixation data collected using HMD-based methods that accumulate local views cannot accurately represent the overall global importance of complex panoramic scenes. This paper introduces the auxiliary Window with a Dynamic Blurring (WinDB) fixation collection approach for panoptic video, which does not require an HMD and is blind-zoom-free. Thus, the collected fixations can well reflect region-wise importance. Using our WinDB approach, we have released a new PanopticVideo-300 dataset, containing 300 panoptic clips covering over 225 categories. In addition, we present a simple baseline design that takes full advantage of PanopticVideo-300 to handle the fixation shifting problem induced by the blind-zoom-free attribute.

Translation date: 2023-05-30 23:40:05; publication date: 2023-05-28

# AI Coach Assist: An Automated Approach for Call Recommendation in Contact Centers for Agent Coaching ( http://arxiv.org/abs/2305.17619v1 )

License: see link

Md Tahmid Rahman Laskar, Cheng Chen, Xue-Yong Fu, Mahsa Azizi, Shashi Bhushan, Simon Corston-Oliver

In recent years, the use of Artificial Intelligence (AI) in the contact center industry has been on the rise. One area where AI can have a significant impact is the coaching of contact center agents. By analyzing call transcripts using Natural Language Processing (NLP) techniques, it would be possible to quickly determine which calls are most relevant for coaching purposes. In this paper, we present AI Coach Assist, which leverages pre-trained transformer-based language models to determine whether a given call is coachable based on the quality assurance (QA) questions asked by contact center managers or supervisors. The system was trained and evaluated on a large dataset collected from real-world contact centers and provides an effective way to recommend to contact center managers the calls that are more likely to contain coachable moments. Our experimental findings demonstrate the potential of AI Coach Assist to improve the coaching process, thereby enhancing the performance of contact center agents.

Translation date: 2023-05-30 17:59:01; publication date: 2023-05-28

# Bayesian Decision Making to Localize Visual Queries in 2D ( http://arxiv.org/abs/2305.17611v1 )

License: see link

Syed Asjad, Aniket Gupta, Hanumant Singh

This report describes our approach for the EGO4D 2023 Visual Query 2D Localization Challenge. Our method aims to reduce the number of False Positives (FP) that occur because of high similarity between the visual crop and the proposed bounding boxes from the baseline's Region Proposal Network (RPN). Our method uses a transformer to determine similarity in higher dimensions, which serves as our prior belief. The results are then combined with the similarity in lower dimensions from the Siamese Head, acting as our measurement, to generate a posterior, which is then used to determine the final similarity of the visual crop with the proposed bounding box. Our code is publicly available at https://github.com/s-m-asjad/EGO4D_VQ2D.
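The report does not spell out the fusion rule. One simple reading, assumed here purely for illustration, treats both similarity scores as probabilities that the visual crop matches a proposal: the transformer score is the prior, the Siamese-head score is read as a likelihood, and Bayes' rule for a binary hypothesis gives the posterior. The function name and scores are hypothetical.

```python
def posterior_match(prior: float, measurement: float) -> float:
    # Binary hypothesis "crop matches box" with P(match) = prior.
    # Assumption: the measurement is P(observation | match), and
    # P(observation | no match) = 1 - measurement.
    num = prior * measurement
    den = num + (1.0 - prior) * (1.0 - measurement)
    return num / den

# Hypothetical (transformer prior, Siamese-head measurement) per proposal.
boxes = {"box_a": (0.9, 0.8), "box_b": (0.2, 0.85), "box_c": (0.7, 0.3)}
for name, (prior, meas) in boxes.items():
    print(name, round(posterior_match(prior, meas), 3))
```

Under this reading, a proposal with a high Siamese score but a low transformer prior (`box_b`) ends with a posterior well below its raw measurement, which is the false-positive suppression the report aims for.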

Translation date: 2023-05-30 17:58:45; publication date: 2023-05-28

# Composite Biased Rotations for Precise Raman Control of Spinor Matterwaves ( http://arxiv.org/abs/2305.17610v1 )

License: see link

Liyang Qiu, Haidong Yuan and Saijun Wu

Precise control of hyperfine matterwaves via Raman excitations is instrumental to a class of atom-based quantum technology. We investigate the Raman spinor control technique for alkali atoms in an intermediate regime of single-photon detuning where a choice can be made to balance the Raman excitation power efficiency with the control speed, excited-state adiabatic elimination, and spontaneous emission suppression requirements. Within the regime, rotations of atomic spinors by the Raman coupling are biased by substantial light shifts. Taking advantage of the fixed bias angle, we show that composite biased rotations can be optimized to enable precise ensemble spinor matterwave control within nanoseconds, even for multiple Zeeman pseudo-spins defined on the hyperfine ground states and when the laser illumination is strongly inhomogeneous. Our scheme fills a technical gap in light pulse atom interferometry, for achieving high speed Raman spinor matterwave control with moderate laser power.

Translation date: 2023-05-30 17:58:32; publication date: 2023-05-28

# Reward Collapse in Aligning Large Language Models ( http://arxiv.org/abs/2305.17608v1 )

License: see link

Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su

The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences, which are often represented as rankings of responses to prompts. In this paper, we document the phenomenon of *reward collapse*, an empirical observation where the prevailing ranking-based approach results in an *identical* reward distribution *regardless* of the prompts during the terminal phase of training. This outcome is undesirable as open-ended prompts like "write a short story about your best friend" should yield a continuous range of rewards for their completions, while specific prompts like "what is the capital of New Zealand" should generate either high or low rewards. Our theoretical investigation reveals that reward collapse is primarily due to the insufficiency of the ranking-based objective function to incorporate prompt-related information during optimization. This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic regime. To overcome reward collapse, we introduce a prompt-aware optimization scheme that provably admits a prompt-dependent reward distribution within the interpolating regime. Our experimental results suggest that our proposed prompt-aware utility functions significantly alleviate reward collapse during the training of reward models.

Translation date: 2023-05-30 17:58:15; publication date: 2023-05-28

# More than Classification: A Unified Framework for Event Temporal Relation Extraction ( http://arxiv.org/abs/2305.17607v1 )

License: see link

Quzhe Huang, Yutong Hu, Shengqi Zhu, Yansong Feng, Chang Liu, Dongyan Zhao

Event temporal relation extraction (ETRE) is usually formulated as a multi-label classification task, where each type of relation is simply treated as a one-hot label. This formulation ignores the meaning of relations and wipes out their intrinsic dependency. After examining the relation definitions in various ETRE tasks, we observe that all relations can be interpreted using the start and end time points of events. For example, relation *Includes* could be interpreted as event 1 starting no later than event 2 and ending no earlier than event 2. In this paper, we propose a unified event temporal relation extraction framework, which transforms temporal relations into logical expressions of time points and completes the ETRE by predicting the relations between certain time point pairs. Experiments on TB-Dense and MATRES show significant improvements over a strong baseline and outperform the state-of-the-art model by 0.3% on both datasets. By representing all relations in a unified framework, we can leverage the relations with sufficient data to assist the learning of other relations, thus achieving stable improvement in low-data scenarios. When the relation definitions are changed, our method can quickly adapt to the new ones by simply modifying the logic expressions that map time points to new event relations. The code is released at https://github.com/AndrewZhe/A-Unified-Framework-for-ETRE.
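The idea of expressing relations as logical expressions over start and end points can be sketched in a few lines. Only the *Includes* rule comes from the abstract. The other rules, the label names, and the use of numeric time points (the model itself predicts comparisons between points rather than their values) are assumptions for illustration.

```python
def relation(s1, e1, s2, e2):
    # Both endpoints coincide (checked first so it wins over Includes).
    if s1 == s2 and e1 == e2:
        return "SIMULTANEOUS"
    # From the abstract: event 1 starts no later than event 2 and ends no earlier.
    if s1 <= s2 and e1 >= e2:
        return "INCLUDES"
    if s2 <= s1 and e2 >= e1:
        return "IS_INCLUDED"
    # Assumed definitions for the remaining labels.
    if e1 <= s2:
        return "BEFORE"
    if e2 <= s1:
        return "AFTER"
    return "VAGUE"  # partial overlap: no label fits in this toy scheme

pairs = [(0, 10, 2, 5), (0, 2, 3, 5), (2, 5, 0, 10), (0, 4, 2, 6)]
for p in pairs:
    print(p, relation(*p))
```

Redefining a label then amounts to editing one boolean expression, which mirrors the adaptability the abstract describes.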

Translation date: 2023-05-30 17:57:48; publication date: 2023-05-28

# A Machine-Designed Optical Lattice Atom Interferometer ( http://arxiv.org/abs/2305.17603v1 )

License: see link

Catie LeDesma, Kendall Mehling, Jieqiu Shao, John Drew Wilson, Penina Axelrad, Marco M. Nicotra, Murray Holland, and Dana Z. Anderson

Performing interferometry in an optical lattice formed by standing waves of light offers potential advantages over its free-space equivalents since the atoms can be confined and manipulated by the optical potential. We demonstrate such an interferometer in a one-dimensional lattice and show the ability to control the atoms by imaging and reconstructing the wavefunction at many stages during its cycle. An acceleration signal is applied and the resulting performance is seen to be close to the optimum possible for the time-space area enclosed according to quantum theory. Our methodology of machine design enables the sensor to be reconfigurable on the fly, and when scaled up, offers the potential to make state-of-the-art inertial and gravitational sensors that will have a wide range of potential applications.

Translation date: 2023-05-30 17:57:25; publication date: 2023-05-28

# Incentivizing honest performative predictions with proper scoring rules ( http://arxiv.org/abs/2305.17601v1 )

License: see link

Caspar Oesterheld, Johannes Treutlein, Emery Cooper, Rubi Hudson

Proper scoring rules incentivize experts to accurately report beliefs, assuming predictions cannot influence outcomes. We relax this assumption and investigate incentives when predictions are performative, i.e., when they can influence the outcome of the prediction, such as when making public predictions about the stock market. We say a prediction is a fixed point if it accurately reflects the expert's beliefs after that prediction has been made. We show that in this setting, reports maximizing expected score generally do not reflect an expert's beliefs, and we give bounds on the inaccuracy of such reports. We show that, for binary predictions, if the influence of the expert's prediction on outcomes is bounded, it is possible to define scoring rules under which optimal reports are arbitrarily close to fixed points. However, this is impossible for predictions over more than two outcomes. We also perform numerical simulations in a toy setting, showing that our bounds are tight in some situations and that prediction error is often substantial (greater than 5-10%). Lastly, we discuss alternative notions of optimality, including performative stability, and show that they incentivize reporting fixed points.
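To make the gap between score maximization and fixed points concrete, the toy below lets a binary prediction $p$ shift the true event probability to $f(p) = 0.2 + 0.4p$, a bounded influence. All numbers are assumptions, and the quadratic (Brier) score is used only as an example of a proper scoring rule. The report that maximizes expected score is found by grid search and compared with the fixed point $p = f(p)$, found by bisection.

```python
def outcome_prob(p):
    # Hypothetical performative influence: the report p shifts the true
    # probability of the event (bounded influence, slope 0.4).
    return 0.2 + 0.4 * p

def expected_brier_score(p):
    # Brier score (higher is better) is -(y - p)^2, averaged over y ~ Bernoulli(q).
    q = outcome_prob(p)
    return -(q * (1 - p) ** 2 + (1 - q) * p ** 2)

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    # Bisection on f(p) - p, which is decreasing here because f has slope < 1.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

grid = [i / 1000 for i in range(1001)]
best_report = max(grid, key=expected_brier_score)
fp = fixed_point(outcome_prob)
print(f"score-maximizing report: {best_report:.3f}")
print(f"fixed point:             {fp:.3f}")
print(f"true probability at best report: {outcome_prob(best_report):.3f}")
```

In this toy the expected Brier loss simplifies to $0.2 + 0.2p^2$, so the score-maximizing report is $0$ even though the resulting event probability is $0.2$ and the fixed point is $1/3$.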

Translation date: 2023-05-30 17:57:12; publication date: 2023-05-28

# GAME-UP: Game-Aware Mode Enumeration and Understanding for Trajectory Prediction ( http://arxiv.org/abs/2305.17600v1 )

License: see link

Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman

Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose GAME-UP, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic numerical analysis as an auxiliary loss, resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive subset of the Waymo Open Motion Dataset, including three subsets involving scenarios with high interaction complexity. Experimental results show that our predictor produces accurate predictions while covering twice as many possible interactions as a baseline model.

Translation date: 2023-05-30 17:56:53; publication date: 2023-05-28

# Using Caterpillar to Nibble Small-Scale Images ( http://arxiv.org/abs/2305.17644v1 )

License: see link

Jin Sun, Xiaoshuang Shi, Zhiyuan Weng, Kaidi Xu, Heng Tao Shen and Xiaofeng Zhu

Recently, MLP-based models have become popular and achieved strong performance on medium-scale datasets (e.g., ImageNet-1k). However, their direct application to small-scale images remains limited. To address this issue, we design a new MLP-based network, namely Caterpillar, by proposing a key module of Shifted-Pillars-Concatenation (SPC) for exploiting the inductive bias of locality. SPC consists of two processes: (1) Pillars-Shift, which shifts all pillars within an image along different directions to generate copies, and (2) Pillars-Concatenation, which captures the local information from discrete shift neighborhoods of the shifted copies. Extensive experiments demonstrate its strong scalability and superior performance on popular small-scale datasets, as well as performance on ImageNet-1K that is competitive with recent state-of-the-art methods.
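A minimal NumPy sketch of the two SPC steps as the abstract describes them. The details are assumptions: each pillar (the channel vector at one spatial position) is shifted by one position in four directions with zero fill, the four copies are concatenated along channels, and a per-pillar linear map fuses them. The paper's shift amounts, padding, and fusion layer may differ, and `shift`/`spc` are hypothetical names.

```python
import numpy as np

def shift(x, dy, dx):
    # Shift an (H, W, C) feature map by (dy, dx) positions; vacated pillars become 0.
    H, W, _ = x.shape
    out = np.zeros_like(x)
    ys_src = slice(max(0, -dy), H - max(0, dy))
    ys_dst = slice(max(0, dy), H - max(0, -dy))
    xs_src = slice(max(0, -dx), W - max(0, dx))
    xs_dst = slice(max(0, dx), W - max(0, -dx))
    out[ys_dst, xs_dst] = x[ys_src, xs_src]
    return out

def spc(x, weight):
    # Pillars-Shift: four shifted copies (down, up, right, left).
    copies = [shift(x, 1, 0), shift(x, -1, 0), shift(x, 0, 1), shift(x, 0, -1)]
    # Pillars-Concatenation: each pillar now sees its 4-neighborhood.
    cat = np.concatenate(copies, axis=-1)    # (H, W, 4C)
    return cat @ weight                       # per-pillar linear fusion -> (H, W, C_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
w = 0.1 * rng.standard_normal((64, 16))
y = spc(x, w)
print(y.shape)
```

Because each output pillar mixes only its immediate neighbors, the module injects locality while remaining a stack of shifts and linear layers.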

Translation date: 2023-05-30 17:48:15; publication date: 2023-05-28

# Entangling capacity of operators ( http://arxiv.org/abs/2305.17636v1 )

License: see link

Manas K Patra

Given a unitary operator $U$ acting on a composite quantum system, what is the entangling capacity of $U$? This question is investigated using a geometric approach. The entangling capacity, defined via metrics on the unitary groups, leads to a *minimax* problem. The dual, a *maximin* problem, is investigated in parallel and yields some familiar entanglement measures. A class of entangling operators, called generalized control operators, is defined. The entangling capacities and other properties of this class of operators are studied.
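The metric-based (minimax) definition in the abstract is not reproduced here. As a simpler point of comparison, the sketch below assumes the familiar controlled-unitary form $\sum_i |i\rangle\langle i| \otimes U_i$ (which may differ from the paper's generalized control operators) and computes the entanglement entropy such an operator creates from a single product input. The function names are hypothetical.

```python
import numpy as np

def controlled_operator(unitaries):
    # Block-diagonal operator sum_i |i><i| (x) U_i on C^k (x) C^d.
    k, d = len(unitaries), unitaries[0].shape[0]
    op = np.zeros((k * d, k * d), dtype=complex)
    for i, u in enumerate(unitaries):
        op[i * d:(i + 1) * d, i * d:(i + 1) * d] = u
    return op

def entanglement_entropy(state, k, d):
    # Schmidt coefficients are the singular values of the k x d coefficient matrix.
    s = np.linalg.svd(state.reshape(k, d), compute_uv=False)
    p = s ** 2
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
cnot = controlled_operator([I, X])            # CNOT as a control operator
plus = np.array([1, 1]) / np.sqrt(2)
zero = np.array([1, 0])
out = cnot @ np.kron(plus, zero)
print(entanglement_entropy(out, 2, 2))
```

Here the CNOT turns the product state $|+\rangle|0\rangle$ into a Bell state, one bit of entanglement, whereas choosing all $U_i$ equal yields a product operator that creates none.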

Translation date: 2023-05-30 17:47:58; publication date: 2023-05-28

# DPFormer: Learning Differentially Private Transformer on Long-Tailed Data ( http://arxiv.org/abs/2305.17633v1 )

License: see link

Youlong Ding, Xueyang Wu, Hao Wang and Weike Pan

The Transformer has emerged as a versatile and effective architecture with broad applications. However, it still remains an open problem how to efficiently train a Transformer model of high utility with differential privacy guarantees. In this paper, we identify two key challenges in learning differentially private Transformers, i.e., heavy computation overhead due to per-sample gradient clipping and unintentional attention distraction within the attention mechanism. In response, we propose DPFormer, equipped with Phantom Clipping and Re-Attention Mechanism, to address these challenges. Our theoretical analysis shows that DPFormer can reduce computational costs during gradient clipping and effectively mitigate attention distraction (which could obstruct the training process and lead to a significant performance drop, especially in the presence of long-tailed data). Such analysis is further corroborated by empirical results on two real-world datasets, demonstrating the efficiency and effectiveness of the proposed DPFormer.
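Per-sample gradient clipping, which the abstract identifies as a source of heavy computation, is the core step of standard DP-SGD. The sketch below shows that conventional step for a linear least-squares model. It is not Phantom Clipping, whose details the abstract does not give, and the model, clipping norm, learning rate, and noise multiplier are illustrative assumptions.

```python
import numpy as np

def clip_per_sample(grads, clip):
    # Rescale each row (one example's gradient) to L2 norm at most `clip`.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads / np.maximum(1.0, norms / clip)

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    residual = X @ w - y                          # (n,)
    # Per-example gradients of 0.5 * (x.w - y)^2, materialized as an (n, d)
    # array; for a Transformer this is n copies of the parameter gradient.
    per_sample = residual[:, None] * X
    clipped = clip_per_sample(per_sample, clip)
    # Gaussian noise calibrated to the clipping norm, then average.
    noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
    return w - lr * (clipped.sum(axis=0) + noise) / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
for _ in range(300):
    w = dp_sgd_step(w, X, y, rng=rng)
print(w.round(2))
```

Materializing one gradient per example is what makes this step expensive at Transformer scale, which is the overhead the paper targets.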

Translation date: 2023-05-30 17:47:49; publication date: 2023-05-28

# Programmable photonic time circuits for highly scalable universal unitaries ( http://arxiv.org/abs/2305.17632v1 )

License: see link

Xianji Piao, Sunkyu Yu, and Namkyoo Park

Programmable photonic circuits (PPCs) have garnered substantial interest in achieving deep learning accelerations and universal quantum computations. Although photonic computation using PPCs offers critical advantages, including ultrafast operation, energy-efficient matrix calculation and room-temperature quantum states, its poor scalability impedes the integration required for industrial applications. This challenge arises from the temporally one-shot operation using propagating light in conventional PPCs, which leads to the light-speed increase of device footprints. Here we propose a concept of programmable photonic time circuits, which employ time-cycle-based computations analogous to the gate cycling in the von Neumann architecture and quantum computation. As a building block, we develop a reconfigurable SU(2) time gate composed of two resonators, which have tunable resonances and are coupled through time-coded dual-channel gauge fields. We demonstrate universal U(N) operations with high fidelity using the systematic assembly of the SU(2) time gates, achieving improved scalability from O(N^2) to O(N) in both the footprint and gate number. This result opens a pathway to industrial-level PPC implementation in very large-scale integration.

Translation date: 2023-05-30 17:47:32; publication date: 2023-05-28

# How to solve Quantum Optimal Control Problems using Projection Operator-based Newton Steps ( http://arxiv.org/abs/2305.17630v1 )

License: see link

Jieqiu Shao, Mantas Naris, John Hauser and Marco M. Nicotra

The Quantum PRojection Operator-based Newton method for Trajectory Optimization, a.k.a. Q-PRONTO, is a numerical method for solving quantum optimal control problems. This paper significantly improves prior versions of the quantum projection operator by introducing a regulator that stabilizes the solution estimate at every iteration. This modification is shown not only to improve the convergence rate of the algorithm but also to steer the solver towards better local minima compared to the unregulated case. Numerical examples show that Q-PRONTO can be used to solve multi-input quantum optimal control problems featuring time-varying costs and undesirable populations that ought to be avoided during the transient.

Translation date: 2023-05-30 17:47:11; publication date: 2023-05-28

# Robust Natural Language Understanding with Residual Attention Debiasing ( http://arxiv.org/abs/2305.17627v1 )

License: see link

Fei Wang, James Y. Huang, Tianyi Yan, Wenxuan Zhou, Muhao Chen

Natural language understanding (NLU) models often suffer from unintended dataset biases. Among bias mitigation methods, ensemble-based debiasing methods, especially product-of-experts (PoE), have stood out for their impressive empirical success. However, previous ensemble-based debiasing methods typically apply debiasing on top-level logits without directly addressing biased attention patterns. Attention serves as the main medium of feature interaction and aggregation in PLMs and plays a crucial role in providing robust prediction. In this paper, we propose REsidual Attention Debiasing (READ), an end-to-end debiasing method that mitigates unintended biases from attention. Experiments on three NLU tasks show that READ significantly improves the performance of BERT-based models on OOD data with shortcuts removed, including +12.9% accuracy on HANS, +11.0% accuracy on FEVER-Symmetric, and +2.7% F1 on PAWS. Detailed analyses demonstrate the crucial role of unbiased attention in robust NLU models and that READ effectively mitigates biases in attention. Code is available at https://github.com/luka-group/READ.
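For contrast with READ, the logit-level product-of-experts baseline that the abstract mentions can be sketched directly: during training the log-probabilities of the main model and a frozen bias-only model are added and renormalized, and cross-entropy is taken on the combination. This is the standard PoE formulation, not READ's attention-level residual, and the toy logits are hypothetical.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def poe_loss(main_logits, bias_logits, labels):
    # Add the two experts' log-probabilities, renormalize, take cross-entropy.
    # The bias expert is frozen; only the main model would receive gradients.
    combined = log_softmax(log_softmax(main_logits) + log_softmax(bias_logits))
    return float(-combined[np.arange(len(labels)), labels].mean())

main = np.zeros((1, 3))                  # main model undecided
bias = np.array([[5.0, 0.0, 0.0]])       # bias model confident on label 0
labels = np.array([0])
plain_ce = float(-log_softmax(main)[0, 0])
print(poe_loss(main, bias, labels), plain_ce)
```

When the bias-only model already predicts the gold label, the combined loss is small, so the main model receives little training signal from such shortcut examples and relies more on the remaining ones.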

Translation date: 2023-05-30 17:46:59; publication date: 2023-05-28

# In-Context Analogical Reasoning with Pre-Trained Language Models ( http://arxiv.org/abs/2305.17626v1 )

License: see link

Xiaoyang Hu, Shane Storks, Richard L. Lewis, Joyce Chai

Analogical reasoning is a fundamental capacity of human cognition that allows us to reason abstractly about novel situations by relating them to past experiences. While it is thought to be essential for robust reasoning in AI systems, conventional approaches require significant training and/or hard-coding of domain knowledge to be applied to benchmark tasks. Inspired by cognitive science research that has found connections between human language and analogy-making, we explore the use of intuitive language-based abstractions to support analogy in AI systems. Specifically, we apply large pre-trained language models (PLMs) to visual Raven's Progressive Matrices (RPM), a common relational reasoning test. By simply encoding the perceptual features of the problem into language form, we find that PLMs exhibit a striking capacity for zero-shot relational reasoning, exceeding human performance and nearing supervised vision-based methods. We explore different encodings that vary the level of abstraction over task features, finding that higher-level abstractions further strengthen PLMs' analogical reasoning. Our detailed analysis reveals insights on the role of model complexity, in-context learning, and prior knowledge in solving RPM tasks.

Translated: 2023-05-30 17:46:34, Published: 2023-05-28

# Cross-Domain Policy Adaptation via Value-Guided Data Filtering

Cross-Domain Policy Adaptation via Value-Guided Data Filtering ( http://arxiv.org/abs/2305.17625v1 )

License: see link

Kang Xu, Chenjia Bai, Xiaoteng Ma, Dong Wang, Bin Zhao, Zhen Wang, Xuelong Li, Wei Li

Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning. For example, a robot learns the policy in a simulator, but when it is deployed in the real world, the dynamics of the environment may be different. Given source and target domains with a dynamics mismatch, we consider the online dynamics adaptation problem, in which the agent can access sufficient source-domain data while online interactions with the target domain are limited. Existing research has attempted to solve the problem from the dynamics discrepancy perspective. In this work, we reveal the limitations of these methods and explore the problem from the value difference perspective via a novel insight on the value consistency across domains. Specifically, we present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets across the two domains. Empirical results on various environments with kinematic and morphology shifts demonstrate that our method achieves superior performance compared to prior approaches.
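The filtering idea in the VGDF abstract can be sketched as follows. This is a hypothetical illustration that assumes the paired value targets have already been computed: one target uses the source-domain next state, and the other is an estimate of the same target under target-domain dynamics. It keeps the fraction of source transitions whose two targets agree most closely. How VGDF actually estimates the target-domain values, and its selection threshold, are not reproduced here, and `keep_ratio` is an assumed parameter.

```python
def filter_source_transitions(transitions, src_targets, tgt_targets, keep_ratio=0.25):
    """Share only the source transitions whose paired value targets agree most.

    transitions:  source-domain transitions (any objects).
    src_targets:  value target of each transition using the source next state.
    tgt_targets:  estimated value target of the same (s, a) in the target domain.
    keep_ratio:   hypothetical fraction of transitions to share.
    """
    if not (len(transitions) == len(src_targets) == len(tgt_targets)):
        raise ValueError("inputs must have equal length")
    if not transitions:
        return []
    gaps = [abs(s - t) for s, t in zip(src_targets, tgt_targets)]
    n_keep = max(1, int(len(transitions) * keep_ratio))
    # Rank by value-target gap and keep the closest ones, in their original order.
    closest = sorted(range(len(transitions)), key=lambda i: gaps[i])[:n_keep]
    return [transitions[i] for i in sorted(closest)]
```

Ranking by the gap, rather than by a fixed absolute threshold, keeps the number of shared transitions stable even when the scale of the value function changes during training.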

Translated: 2023-05-30 17:46:13, Published: 2023-05-28

# SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network

SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network ( http://arxiv.org/abs/2305.17624v1 )

License: see link

Chuong Huynh, Yuqian Zhou, Zhe Lin, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Abhinav Shrivastava

In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject. However, manually selecting and removing these small and dense distracting regions can be a laborious and time-consuming task. In this paper, we propose an interactive distractor selection method that is optimized to achieve the task with just a single click. Our method surpasses the precision and recall achieved by the traditional method of running panoptic segmentation and then selecting the segments containing the clicks. We also showcase how a transformer-based module can be used to identify more distracting regions similar to the user's click position. Our experiments demonstrate that the model can effectively and accurately segment unknown distracting objects interactively and in groups. By significantly simplifying the photo cleaning and retouching process, our proposed model provides inspiration for exploring rare object segmentation and group selection with a single click.

Translated: 2023-05-30 17:45:54, Published: 2023-05-28

# On the Value of Myopic Behavior in Policy Reuse

On the Value of Myopic Behavior in Policy Reuse ( http://arxiv.org/abs/2305.17623v1 )

License: see link

Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.

Translated: 2023-05-30 17:45:35, Published: 2023-05-28

# Lexical Retrieval Hypothesis in Multimodal Context

Lexical Retrieval Hypothesis in Multimodal Context ( http://arxiv.org/abs/2305.17663v1 )

License: see link

Po-Ya Angela Wang, Pin-Er Chen, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

Multimodal corpora have become an essential language resource for language science and grounded natural language processing (NLP) systems due to the growing need to understand and interpret human communication across various channels. In this paper, we first present our efforts in building the first Multimodal Corpus for Languages in Taiwan (MultiMoco). Based on the corpus, we conduct a case study investigating the Lexical Retrieval Hypothesis (LRH), specifically examining whether the hand gestures co-occurring with speech constants facilitate lexical retrieval or serve other discourse functions. With detailed annotations on eight parliamentary interpellations in Taiwan Mandarin, we explore the co-occurrence between speech constants and non-verbal features (i.e., head movement, face movement, hand gesture, and function of hand gesture). Our findings suggest that while hand gestures do serve as facilitators for lexical retrieval in some cases, they also serve the purpose of information emphasis. This study highlights the potential of the MultiMoco Corpus to provide an important resource for in-depth analysis and further research in multimodal communication studies.

Translated: 2023-05-30 17:40:03, Published: 2023-05-28

# Plug-and-Play Document Modules for Pre-trained Models

Plug-and-Play Document Modules for Pre-trained Models ( http://arxiv.org/abs/2305.17660v1 )

License: see link

Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun

Large-scale pre-trained models (PTMs) have been widely used in document-oriented NLP tasks, such as question answering. However, the encoding-task coupling requirement results in the repeated encoding of the same documents for different tasks and queries, which is highly computationally inefficient. To this end, we aim to decouple document encoding from downstream tasks, and propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD). By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document once to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders. Extensive experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios. In particular, PlugD can save 69% of computational costs while achieving performance comparable to state-of-the-art encoding-task coupling methods. Additionally, we show that PlugD can serve as an effective post-processing method to inject knowledge into task-specific models, improving model performance without any additional model training.
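The efficiency argument in the PlugD abstract (encode a document once, reuse it for every task) can be illustrated with a simple cache. This is a hypothetical sketch: `encode_fn` stands in for whatever produces a document plugin and is just any callable here. It shows only the counting argument, not how PlugD builds plugins or inserts them into the backbone PTM.

```python
from typing import Callable, Dict, List


class DocumentPluginCache:
    """Encode each document at most once and reuse the result across tasks.

    `encode_fn` is a hypothetical stand-in for a plugin encoder: any callable
    that maps document text to a representation.
    """

    def __init__(self, encode_fn: Callable[[str], List[float]]):
        self.encode_fn = encode_fn
        self._cache: Dict[str, List[float]] = {}
        self.encode_calls = 0  # how many times the (expensive) encoder ran

    def plugin(self, doc_id: str, text: str) -> List[float]:
        if doc_id not in self._cache:
            self.encode_calls += 1
            self._cache[doc_id] = self.encode_fn(text)
        return self._cache[doc_id]
```

If three tasks query the same document, this setup triggers a single encode. An encoding-task coupled setup would instead re-encode the document for every (task, query) pair.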

Translated: 2023-05-30 17:39:42, Published: 2023-05-28

# Dynamical critical quantum sensing with a single parametrically-driven nonlinear resonator

Dynamical critical quantum sensing with a single parametrically-driven nonlinear resonator ( http://arxiv.org/abs/2305.17656v1 )

License: see link

Ken Chen, Jia-Hao Lü, Xin Zhu, Hao-Long Zhang, Wen Ning, Zhen-Biao Yang, and Shi-Biao Zheng

Critical phenomena of quantum systems are useful for enhancing quantum sensing. Here we investigate the performance of a sensing scheme in which the signal is encoded in the dynamically evolving state of an oscillator that features a competition between the Kerr nonlinearity and parametric driving. We calculate the quantum Fisher information and perform a simulation, which confirms the criticality-enabled enhancement. We further detail the response of one of the quadratures to the variation of the control parameter. The numerical results reveal that its inverted variance exhibits diverging behavior at the critical point.

Translated: 2023-05-30 17:39:18, Published: 2023-05-28

# MixDehazeNet : Mix Structure Block For Image Dehazing Network

MixDehazeNet : Mix Structure Block For Image Dehazing Network ( http://arxiv.org/abs/2305.17654v1 )

License: see link

LiPing Lu, Qian Xiong, DuanFeng Chu, BingRong Xu

Image dehazing is a typical task in the low-level vision field. Previous studies verified the effectiveness of large convolutional kernels and attention mechanisms in dehazing. However, there are two drawbacks: the multi-scale properties of an image are readily ignored when a large convolutional kernel is introduced, and the standard series connection of an attention module does not sufficiently account for an uneven haze distribution. In this paper, we propose a novel framework named Mix Structure Image Dehazing Network (MixDehazeNet), which addresses the two issues mentioned above. Specifically, it mainly consists of two parts: the multi-scale parallel large convolution kernel module and the enhanced parallel attention module. Compared with a single large kernel, multi-scale parallel large kernels are better able to take partial texture into account during the dehazing phase. In addition, we develop an enhanced parallel attention module, in which parallel attention connections perform better at removing unevenly distributed haze. Extensive experiments on three benchmarks demonstrate the effectiveness of our proposed methods. For example, compared with previous state-of-the-art methods, MixDehazeNet achieves a significant improvement (42.62dB PSNR) on the SOTS indoor dataset. The code is released at https://github.com/AmeryXiong/MixDehazeNet.
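The MixDehazeNet result is reported as peak signal-to-noise ratio (PSNR). For reference, a minimal sketch of the standard definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, computed over flat pixel sequences. Benchmark code usually also fixes details such as color space and border cropping, which are not modeled here.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel intensities in [0, max_value].
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

On this scale, 42.62 dB corresponds to a mean squared error of about $10^{-4.262} \approx 5.5 \times 10^{-5}$, or an RMS error of roughly 0.0074 for intensities in [0, 1].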

Translated: 2023-05-30 17:39:09, Published: 2023-05-28

# Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks

Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks ( http://arxiv.org/abs/2305.17653v1 )

License: see link

Zhicheng Guo, Sijie Cheng, Yile Wang, Peng Li, Yang Liu

Retrieval-augmented methods, which leverage useful information from external resources to support downstream tasks, have received increasing attention. Recent studies mainly focus on exploring retrieval to solve knowledge-intensive (KI) tasks. However, the potential of retrieval for most non-knowledge-intensive (NKI) tasks remains under-explored. There are two main challenges to leveraging retrieval-augmented methods for NKI tasks: 1) the demand for diverse relevance score functions and 2) the dilemma between training cost and task performance. To address these challenges, we propose a two-stage framework for NKI tasks, named PGRA. In the first stage, we adopt a task-agnostic retriever to build a shared static index and select candidate evidence efficiently. In the second stage, we design a prompt-guided reranker to rerank the nearest evidence according to task-specific relevance for the reader. Experimental results show that PGRA outperforms other state-of-the-art retrieval-augmented methods. Our analyses further investigate the factors that influence model performance and demonstrate the generality of PGRA. Code is available at https://github.com/THUNLP-MT/PGRA.
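The two-stage structure described for PGRA can be sketched as "cheap shared retrieval, then task-specific reranking of the candidates only". This is a hypothetical illustration under simple assumptions: the index is a list of (text, embedding) pairs scored by dot product, and `rerank_score` is any callable standing in for the prompt-guided reranker. Neither is PGRA's actual retriever or reranker.

```python
from typing import Callable, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve_then_rerank(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    rerank_score: Callable[[str], float],
    k_candidates: int = 10,
    k_final: int = 3,
) -> List[str]:
    """Two-stage retrieval sketch.

    Stage 1: task-agnostic dense scoring against a shared, static index.
    Stage 2: a task-specific scorer reorders only the stage-1 candidates.
    """
    candidates = sorted(index, key=lambda item: dot(query_vec, item[1]), reverse=True)[:k_candidates]
    reranked = sorted(candidates, key=lambda item: rerank_score(item[0]), reverse=True)
    return [text for text, _ in reranked[:k_final]]
```

Because the stage-1 index is shared and static, only the reranker's scoring has to change from task to task. This is one way to read the abstract's trade-off between training cost and task performance.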

Translated: 2023-05-30 17:38:47, Published: 2023-05-28

# ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval

ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval ( http://arxiv.org/abs/2305.17652v1 )

License: see link

Jiapeng Wang, Chengyu Wang, Xiaodan Wang, Jun Huang, Lianwen Jin

Large-scale pre-trained text-image models with dual-encoder architectures (such as CLIP) are typically adopted for various vision-language applications, including text-image retrieval. However, these models are still less practical on edge devices or in real-time situations, due to their substantial indexing and inference time and their large consumption of computational resources. Although knowledge distillation techniques have been widely used for uni-modal model compression, how to extend them to settings where the numbers of modalities and teachers/students are doubled has rarely been studied. In this paper, we conduct comprehensive experiments on this topic and propose the fully-Connected knowledge interaction graph (Cona) technique for cross-modal pre-training distillation. Based on our findings, the resulting ConaCLIP achieves state-of-the-art (SOTA) performance on the widely used Flickr30K and MSCOCO benchmarks under the lightweight setting. An industrial application of our method on an e-commerce platform further demonstrates the effectiveness of ConaCLIP.

Translated: 2023-05-30 17:38:32, Published: 2023-05-28

# DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models ( http://arxiv.org/abs/2305.17651v1 )

License: see link

Yifan Peng, Yui Sudo, Shakeel Muhammad, Shinji Watanabe

Self-supervised learning (SSL) has achieved notable success in many speech processing tasks, but the large model size and heavy computational cost hinder deployment. Knowledge distillation trains a small student model to mimic the behavior of a large teacher model. However, the student architecture usually needs to be manually designed and remains fixed during training, which requires prior knowledge and can lead to suboptimal performance. Inspired by the recent success of task-specific structured pruning, we propose DPHuBERT, a novel task-agnostic compression method for speech SSL based on joint distillation and pruning. Experiments on SUPERB show that DPHuBERT outperforms pure distillation methods in almost all tasks. Moreover, DPHuBERT requires little training time and performs well with limited training data, making it suitable for resource-constrained applications. Our method can also be applied to various speech SSL models. Our code and models will be publicly available.

Translated: 2023-05-30 17:38:14, Published: 2023-05-28

# Evolving Connectivity for Recurrent Spiking Neural Networks

Evolving Connectivity for Recurrent Spiking Neural Networks ( http://arxiv.org/abs/2305.17650v1 )

License: see link

Guan Wang, Yuhao Sun, Sijie Cheng, Sen Song

Recurrent spiking neural networks (RSNNs) hold great potential for advancing artificial general intelligence, as they draw inspiration from the biological nervous system and show promise in modeling complex dynamics. However, the widely used surrogate gradient-based training methods for RSNNs are inherently inaccurate and unfriendly to neuromorphic hardware. To address these limitations, we propose the evolving connectivity (EC) framework, an inference-only method for training RSNNs. The EC framework reformulates weight-tuning as a search over parameterized connection probability distributions, and employs Natural Evolution Strategies (NES) to optimize these distributions. Our EC framework circumvents the need for gradients and features hardware-friendly characteristics, including sparse boolean connections and high scalability. We evaluate EC on a series of standard robotic locomotion tasks, where it achieves performance comparable to deep neural networks and outperforms gradient-trained RSNNs, even solving the complex 17-DoF humanoid task. Additionally, the EC framework demonstrates a two- to three-fold speedup in efficiency compared to directly evolving parameters. By providing a performant and hardware-friendly alternative, the EC framework lays the groundwork for further energy-efficient applications of RSNNs and advances the development of neuromorphic devices.
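The EC abstract describes optimizing connection *probabilities* with Natural Evolution Strategies rather than tuning weights directly. Here is a minimal sketch of that idea for independent Bernoulli connections, assuming a black-box fitness function over binary masks. For a Bernoulli parameter $p$ the score is $(m - p)/(p(1-p))$ and the Fisher information is $1/(p(1-p))$, so the natural gradient simplifies to $\mathbb{E}[f(m)\,(m - p)]$. Population size, learning rate, fitness normalization, and clipping are illustrative assumptions, not the paper's settings.

```python
import random


def nes_bernoulli_step(probs, fitness_fn, pop_size=64, lr=0.1, rng=None, eps=1e-3):
    """One NES update of independent Bernoulli connection probabilities.

    Samples binary connection masks, scores them with a black-box fitness,
    and moves each probability along the natural-gradient estimate
    E[f(m) * (m_i - p_i)]. No gradients of the network itself are needed.
    """
    rng = rng or random.Random()
    masks = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
    scores = [fitness_fn(m) for m in masks]
    # Centre and scale fitness so the step ignores its offset and scale.
    mean = sum(scores) / pop_size
    std = (sum((s - mean) ** 2 for s in scores) / pop_size) ** 0.5 or 1.0
    normed = [(s - mean) / std for s in scores]
    new_probs = []
    for i, p in enumerate(probs):
        grad = sum(f * (m[i] - p) for f, m in zip(normed, masks)) / pop_size
        # Keep probabilities strictly inside (0, 1) so sampling stays stochastic.
        new_probs.append(min(1 - eps, max(eps, p + lr * grad)))
    return new_probs
```

On a toy fitness that counts how many connections match a target pattern, repeated steps drive each probability toward 0 or 1 according to the pattern. Only sampled binary masks and scalar fitness values are ever needed.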

Translated: 2023-05-30 17:37:58, Published: 2023-05-28

# Z-GMOT: Zero-shot Generic Multiple Object Tracking

Z-GMOT: Zero-shot Generic Multiple Object Tracking ( http://arxiv.org/abs/2305.17648v1 )

License: see link

Kim Hoang Tran, Tien-Phat Nguyen, Anh Duy Le Dinh, Pha Nguyen, Thinh Phan, Khoa Luu, Donald Adjeroh, Ngan Hoang Le

Despite the significant progress made in recent years, Multi-Object Tracking (MOT) approaches still suffer from several limitations, including their reliance on prior knowledge of tracking targets, which necessitates the costly annotation of large labeled datasets. As a result, existing MOT methods are limited to a small set of predefined categories, and they struggle with unseen objects in the real world. To address these issues, Generic Multiple Object Tracking (GMOT) has been proposed, which requires less prior information about the targets. However, all existing GMOT approaches follow a one-shot paradigm, relying mainly on the initial bounding box and thus struggling to handle variations such as viewpoint, lighting, occlusion, and scale. In this paper, we introduce a novel approach to address the limitations of existing MOT and GMOT methods. Specifically, we propose a zero-shot GMOT (Z-GMOT) algorithm that can track never-seen object categories with zero training examples, without the need for predefined categories or an initial bounding box. To achieve this, we propose iGLIP, an improved version of Grounded language-image pretraining (GLIP), which can detect unseen objects while minimizing false positives. We evaluate Z-GMOT thoroughly on the GMOT-40 dataset, the AnimalTrack test set, and the DanceTrack test set. The results of these evaluations demonstrate a significant improvement over existing methods. For instance, on the GMOT-40 dataset, Z-GMOT outperforms one-shot GMOT with OC-SORT by 27.79 HOTA points and 44.37 MOTA points. On the AnimalTrack dataset, it surpasses fully-supervised methods with DeepSORT by 12.55 HOTA points and 8.97 MOTA points. To facilitate further research, we will make our code and models publicly available upon acceptance of this paper.
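Z-GMOT's gains are reported in HOTA and MOTA points. MOTA (from the CLEAR MOT metrics) has a simple closed form, $1 - (\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}) / \mathrm{GT}$, sketched below with counts totalled over a sequence. "Points" are presumably percentage points of this quantity times 100. HOTA is considerably more involved and is not sketched here.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR-MOT multiple object tracking accuracy.

    All counts are totals over the whole sequence; num_gt is the total
    number of ground-truth object instances across all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

MOTA has no lower bound. When a tracker produces more errors than there are ground-truth instances, the score goes negative.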

Translated: 2023-05-30 17:37:36, Published: 2023-05-28

# The fate of the "vacuum point" and of grey solitons in dispersive quantum shock waves in a one-dimensional Bose gas

The fate of the "vacuum point" and of grey solitons in dispersive quantum shock waves in a one-dimensional Bose gas ( http://arxiv.org/abs/2305.17647v1 )

License: see link

S. A. Simmons, J. C. Pillay, and K. V. Kheruntsyan

We continue the study of dispersive quantum shock waves in a one-dimensional Bose gas beyond the mean-field approximation. In a recent work by Simmons et al. [Phys. Rev. Lett. 125, 180401 (2020)], the oscillatory shock wave train developing in this system from an initial localized density bump on a uniform background was interpreted as a result of quantum mechanical self-interference, wherein the interference contrast would diminish with the loss of matter-wave phase coherence. Such loss of coherence, relative to the mean-field Gross-Pitaevskii description, occurs due to either quantum or thermal fluctuations, as well as in the strongly interacting regime. In this work, we extend the analysis of dispersive quantum shock waves in this context to other dynamical scenarios. More specifically, the scenarios studied include the evolution of a sufficiently high density bump, known to lead to the so-called "vacuum point" in the mean-field description, and the evolution of an initial density dip, known to shed a train of grey solitons in the same mean-field approximation. We study the fate of these nonlinear wave structures in the presence of quantum and thermal fluctuations, as well as at intermediate and strong interactions, and show that both the vacuum point and grey solitons cease to manifest themselves beyond the mean-field approach. On the other hand, we find that a vacuum point can occur in an ideal (noninteracting) Bose gas evolving from a ground state of a localized dimple potential. Due to the ubiquity of dispersive shock waves in nature, our results should provide useful insights and perspectives for a variety of other physical systems known to display nonlinear wave phenomena.

Translated: 2023-05-30 17:37:06, Published: 2023-05-28

# Kondo regime of the impurity spectral function and the current noise spectrum in the double impurity Anderson model

Kondo regime of the impurity spectral function and the current noise spectrum in the double impurity Anderson model ( http://arxiv.org/abs/2305.17686v1 )

License: see link

Zi-Hao Chen and YiJing Yan

The dissipaton equations of motion (DEOM) method is one of the most popular methods for simulating quantum impurity systems. In this article, we use DEOM theory to treat the Kondo problem of the double quantum dots (DQDs) impurity system. We focus on the impurity spectral function and the total noise spectral function, which we use to describe the Kondo effect in this system. We study how the interaction, the hopping, and the chemical potential difference between the two dots influence the Kondo effect of the system. We find that the interaction between the two dots strongly influences the Kondo effect of the system.

Translated: 2023-05-30 17:28:37, Published: 2023-05-28

# General treatment of Gaussian trusted noise in continuous variable quantum key distribution

General treatment of Gaussian trusted noise in continuous variable quantum key distribution ( http://arxiv.org/abs/2305.17684v1 )

License: see link

Shinichiro Yamano, Takaya Matsuura, Yui Kuramochi, Toshihiko Sasaki, Masato Koashi

Continuous-variable (CV) quantum key distribution (QKD) is a promising candidate for practical implementations owing to its compatibility with existing communication technology. A trusted-device scenario, which assumes that an adversary has no access to imperfections such as electronic noise in the detector, is expected to significantly improve the key rate, but so far such efforts have been made separately for specific protocols and specific proof techniques. Here, we develop a simple and general treatment that can incorporate the effects of Gaussian trusted noise for any protocol that uses homodyne/heterodyne measurements. In our method, a rescaling of the outcome of a noisy homodyne/heterodyne detector renders it equivalent to the outcome of a noiseless detector with a tiny additional loss, thanks to a noise-loss equivalence well known in quantum optics. Since this method is independent of protocols and security proofs, it is applicable to Gaussian-modulation and discrete-modulation protocols, to the finite-size regime, and to any proof techniques developed so far or yet to be discovered.
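The noise-loss equivalence that the abstract relies on can be checked for a single homodyne quadrature. Here is a minimal derivation, assuming a trusted detector with efficiency $\eta$ and Gaussian electronic noise of variance $v_{\mathrm{el}}$ in shot-noise units (vacuum variance 1), with all noise independent of the input. The paper's own rescaling, and its treatment of heterodyne detection, may differ in detail.

```latex
\begin{aligned}
x_{\text{out}} &= \sqrt{\eta}\,x_{\text{in}} + \sqrt{1-\eta}\,x_{\text{vac}} + x_{\text{el}},
  \qquad \operatorname{Var}(x_{\text{vac}}) = 1,\quad \operatorname{Var}(x_{\text{el}}) = v_{\text{el}},\\
x' &= \frac{x_{\text{out}}}{\sqrt{1+v_{\text{el}}}}
    = \sqrt{\eta'}\,x_{\text{in}}
    + \underbrace{\frac{\sqrt{1-\eta}\,x_{\text{vac}} + x_{\text{el}}}{\sqrt{1+v_{\text{el}}}}}_{\text{Gaussian, variance } (1-\eta+v_{\text{el}})/(1+v_{\text{el}}) \,=\, 1-\eta'},
  \qquad \eta' = \frac{\eta}{1+v_{\text{el}}}.
\end{aligned}
```

The rescaled noise term is statistically identical to vacuum noise entering through a beam splitter of transmissivity $\eta'$. The noisy detector therefore behaves like a noiseless one with efficiency $\eta' = \eta/(1+v_{\mathrm{el}})$, that is, with an extra loss factor $1/(1+v_{\mathrm{el}}) \approx 1 - v_{\mathrm{el}}$ when $v_{\mathrm{el}} \ll 1$.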

Translated: 2023-05-30 17:28:27, Published: 2023-05-28

# One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning

One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning ( http://arxiv.org/abs/2305.17682v1 )

License: see link

Guangtao Zeng, Peiyuan Zhang, Wei Lu

Fine-tuning pre-trained language models for multiple tasks tends to be expensive in terms of storage. To mitigate this, parameter-efficient transfer learning (PETL) methods have been proposed, but they still require a significant number of parameters and storage when applied to broader ranges of tasks. To achieve even greater storage reduction, we propose PROPETL, a novel method that enables efficient sharing of a single PETL module, which we call a prototype network (e.g., adapter, LoRA, and prefix-tuning), across layers and tasks. We then learn binary masks to select different sub-networks from the shared prototype network and apply them as PETL modules in different layers. We find that the binary masks can determine crucial information from the network, which is often ignored in previous studies. Our work can also be seen as a type of pruning method, where we find that overparameterization also exists in the seemingly small PETL modules. We evaluate PROPETL on various downstream tasks and show that it can outperform other PETL methods while using approximately 10% of the parameter storage they require.
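The storage argument behind PROPETL (one shared float "prototype" plus cheap binary masks) can be illustrated with a small sketch. It assumes a single shared weight matrix and one 1-bit-per-parameter mask for every (layer, task) pair. It is a hypothetical simplification: how PROPETL learns the masks and combines them with adapters, LoRA, or prefix-tuning is not modeled.

```python
from typing import List

Matrix = List[List[float]]


def apply_mask(shared: Matrix, mask: List[List[int]]) -> Matrix:
    """Element-wise product W * M: the sub-network one layer/task actually uses."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(shared, mask)]


def storage_bits(num_params: int, num_layers: int, num_tasks: int,
                 shared: bool, float_bits: int = 32) -> int:
    """Bits needed to store PETL modules for every (layer, task) pair."""
    pairs = num_layers * num_tasks
    if shared:
        # One float prototype, plus one binary mask (1 bit per parameter) per pair.
        return num_params * float_bits + pairs * num_params
    # A separate float module for every pair.
    return pairs * num_params * float_bits
```

With illustrative numbers (1,000 parameters per module, 12 layers, 8 tasks), the shared layout needs 128,000 bits versus 3,072,000 bits for separate modules, about 4%. These numbers are not the paper's accounting; they only show why masks are cheap relative to float modules.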

Translated: 2023-05-30 17:28:07, Published: 2023-05-28

# Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

arXiv: http://arxiv.org/abs/2305.17680v1

License: see the linked page

Han Wang, Ming Shan Hee, Md Rabiul Awal, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, the effectiveness and potential limitations of these generated explanations remain poorly understood. A key concern is that these LLM-generated explanations may lead both users and content moderators to make erroneous judgments about the nature of flagged content. For instance, an LLM-generated explanation might inaccurately convince a content moderator that a benign piece of content is hateful. In light of this, we propose an analytical framework for examining hate speech explanations and conduct an extensive survey to evaluate such explanations. Specifically, we prompted GPT-3 to generate explanations for both hateful and non-hateful content and surveyed 2,400 unique respondents to evaluate the generated explanations. Our findings reveal that (1) human evaluators rated the GPT-generated explanations as high quality in terms of linguistic fluency, informativeness, persuasiveness, and logical soundness; (2) the persuasiveness of these explanations, however, varied depending on the prompting strategy employed; and (3) this persuasiveness may result in incorrect judgments about the hatefulness of the content. Our study underscores the need for caution when applying LLM-generated explanations to content moderation. Code and results are available at https://github.com/Social-AI-Studio/GPT3-HateEval.

Translated: 2023-05-30 17:27:48, Published: 2023-05-28

# RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts

arXiv: http://arxiv.org/abs/2305.17679v1

License: see the linked page

Anton Golubev, Nicolay Rusnachenko, Natalia Loukachevitch

The paper describes the RuSentNE-2023 evaluation devoted to targeted sentiment analysis in Russian news texts. The task is to predict the sentiment towards a named entity in a single sentence. The dataset for the RuSentNE-2023 evaluation is based on the Russian news corpus RuSentNE, which has rich sentiment-related annotation. The corpus is annotated with named entities and sentiments towards these entities, along with related effects and emotional states. The evaluation was organized using the CodaLab competition framework. The main evaluation measure was the macro-averaged F-measure over the positive and negative classes. The best result achieved was a macro F-measure of 66% (positive and negative classes). We also tested ChatGPT on the test set from our evaluation and found that the zero-shot answers provided by ChatGPT reached an F-measure of 60%, which corresponds to 4th place in the evaluation and can be considered quite high for a zero-shot application. ChatGPT also provided detailed explanations of its conclusions.
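
The main measure, a macro-averaged F over the positive and negative classes only, can be sketched as below. This assumes the standard per-class F1 definition; whether the official CodaLab scorer treats edge cases (such as a class that is never predicted) the same way is not stated in the abstract. Neutral examples are not averaged over, but they still count as false positives or false negatives for the two scored classes.

```python
def f1_for_class(gold, pred, label):
    """Standard per-class F1 computed from true positives, false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f_pos_neg(gold, pred, classes=("positive", "negative")):
    """Average F1 over the positive and negative classes; neutral is not one of the averaged classes."""
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)
```

For example, with gold labels `["positive", "negative", "neutral", "positive"]` and predictions `["positive", "neutral", "neutral", "negative"]`, the positive class scores F1 = 2/3, the negative class scores 0, and the macro score is 1/3.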

Translated: 2023-05-30 17:27:25, Published: 2023-05-28

# Decoding the Underlying Meaning of Multimodal Hateful Memes

arXiv: http://arxiv.org/abs/2305.17678v1

License: see the linked page

Ming Shan Hee, Wen-Haw Chong and Roy Ka-Wei Lee

Recent studies have proposed models that yield promising performance on the hateful meme classification task. Nevertheless, these models do not generate interpretable explanations that uncover the underlying meaning and support the classification output. A major reason for the lack of explainable hateful meme methods is the absence of a hateful meme dataset that contains ground-truth explanations for benchmarking or training. Intuitively, such explanations can educate and assist content moderators in interpreting and removing flagged hateful memes. This paper addresses this research gap by introducing the Hateful meme with Reasons Dataset (HatReD), a new multimodal hateful meme dataset annotated with the underlying hateful contextual reasons. We also define a new conditional generation task that aims to automatically generate the underlying reasons that explain hateful memes, and we establish the baseline performance of state-of-the-art pre-trained language models on this task. We further demonstrate the usefulness of HatReD by analyzing the challenges of the new conditional generation task in explaining memes in seen and unseen domains. The dataset and benchmark models are available at https://github.com/Social-AI-Studio/HatRed.

Translated: 2023-05-30 17:27:11, Published: 2023-05-28

# OSPC: Online Sequential Photometric Calibration

arXiv: http://arxiv.org/abs/2305.17673v1

License: see the linked page

Jawad Haidar, Douaa Khalil, Daniel Asmar

Photometric calibration is essential to many computer vision applications. One of its key benefits is enhancing the performance of Visual SLAM, especially when it depends on a direct method for tracking, such as the standard KLT algorithm. Another benefit is retrieving the sensor irradiance values from measured intensities, as a pre-processing step for vision algorithms such as shape-from-shading. Current photometric calibration systems rely on a joint optimization problem and encounter an ambiguity in the estimates that can only be resolved using ground-truth information. We propose a novel method that solves for the photometric parameters using a sequential estimation approach. Our method estimates all parameters with high accuracy; furthermore, the formulations are linear and convex, which makes the solution fast and suitable for online applications. Experiments on a Visual Odometry system validate the proposed method and demonstrate its advantages.
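
A widely used image formation model writes the observed intensity as I(x) = f(e · V(x) · B(x)), with camera response f, vignetting V, exposure e and scene irradiance B. The sketch below assumes a gamma-curve response and a quadratic radial vignetting purely for illustration; it is not the paper's sequential estimator. It shows irradiance recovery once f, V and e are known, and one ambiguity a joint optimization faces: scaling irradiance up and exposure down by the same factor leaves every intensity unchanged.

```python
def response(x, gamma=2.2):
    """Hypothetical camera response f: maps normalized exposure (clipped to [0, 1]) to intensity."""
    return min(max(x, 0.0), 1.0) ** (1.0 / gamma)

def inverse_response(intensity, gamma=2.2):
    """Inverse of the hypothetical response above."""
    return intensity ** gamma

def vignetting(r, v1=-0.3):
    """Hypothetical radial vignetting V(r) = 1 + v1 * r^2 for normalized radius r."""
    return 1.0 + v1 * r * r

def observe(irradiance, exposure, r):
    """Forward model: I = f(e * V(r) * B)."""
    return response(exposure * vignetting(r) * irradiance)

def recover_irradiance(intensity, exposure, r):
    """Invert the model once f, V and e are known: B = f^-1(I) / (e * V(r))."""
    return inverse_response(intensity) / (exposure * vignetting(r))
```

Because `observe(0.8, 0.4, r)` and `observe(0.4, 0.8, r)` give the same intensity, intensities alone cannot separate exposure from irradiance scale without extra information.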

Translated: 2023-05-30 17:26:49, Published: 2023-05-28

# Stochastic Bridges as Effective Regularizers for Parameter-Efficient Tuning

arXiv: http://arxiv.org/abs/2305.17670v1

License: see the linked page

Weize Chen, Xu Han, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie Zhou

Parameter-efficient tuning methods (PETs) have achieved promising results in tuning large pre-trained language models (PLMs). By formalizing frozen PLMs and additional tunable parameters as systems and controls, respectively, PETs can be theoretically grounded in optimal control and further viewed as optimizing the terminal cost and running cost in the optimal control literature. Despite the elegance of this theoretical grounding, in practice existing PETs often ignore the running cost and only optimize the terminal cost, i.e., they focus on optimizing the loss function of the output state while disregarding the running cost that depends on the intermediate states. Since it is non-trivial to directly model the intermediate states and design a running cost function, we propose to use latent stochastic bridges to regularize the intermediate states and use the regularization as the running cost of PETs. As the first work to propose regularized PETs that use stochastic bridges as the regularizers (running costs) for the intermediate states, we show the effectiveness and generality of this regularization across different tasks, PLMs and PETs. Given its potential and capacity, we believe more sophisticated regularizers can be designed for PETs and better performance can be achieved in the future. The code is released at https://github.com/thunlp/stochastic-bridge-pet/tree/main.
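
In optimal-control terms the objective is J = Φ(x_T) + Σ_t ℓ(x_t), and existing PETs keep only the terminal term Φ. The sketch below adds a running cost by scoring intermediate states under a Brownian bridge pinned at the first and last states. It is an illustration under assumptions: the states are plain vectors rather than mapped latent states, the bridge is a standard Brownian bridge with a fixed `sigma`, and `weight` is a hypothetical coefficient; the paper's latent bridges and their parameterization may differ.

```python
import math

def brownian_bridge_nll(states, sigma=1.0):
    """Negative log-likelihood of intermediate states under a Brownian bridge from states[0] to states[-1].

    At step t of T, the bridge marginal is Gaussian with mean x0 + (t/T)(xT - x0)
    and per-dimension variance sigma^2 * t * (T - t) / T.
    """
    T = len(states) - 1
    x0, xT = states[0], states[-1]
    total = 0.0
    for t in range(1, T):  # endpoints are pinned, so only interior states contribute
        var = sigma ** 2 * t * (T - t) / T
        for a, b, x in zip(x0, xT, states[t]):
            mean = a + (t / T) * (b - a)
            total += 0.5 * ((x - mean) ** 2 / var + math.log(2 * math.pi * var))
    return total

def regularized_loss(terminal_loss, states, weight=0.1):
    """Terminal cost (task loss on the output state) plus a weighted running cost."""
    return terminal_loss + weight * brownian_bridge_nll(states)
```

States that follow the straight path between the endpoints get a lower running cost than states that wander off it, which is the kind of pressure on intermediate states that the terminal loss alone does not provide.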

Translated: 2023-05-30 17:26:30, Published: 2023-05-28

# Choose your Data Wisely: A Framework for Semantic Counterfactuals

arXiv: http://arxiv.org/abs/2305.17667v1

License: see the linked page

Edmund Dervakos, Konstantinos Thomas, Giorgos Filandrianos, Giorgos Stamou

Counterfactual explanations have been argued to be one of the most intuitive forms of explanation. They are typically defined as a minimal set of edits to a given data sample that, when applied, changes the output of a model on that sample. However, a minimal set of edits is not always clear and understandable to an end user, as it could, for instance, constitute an adversarial example (which is indistinguishable from the original data sample to an end user). Instead, recent work suggests that the notion of minimality in the context of counterfactuals should refer to the semantics of the data sample rather than to the feature space. In this work, we build on these ideas and propose a framework that provides counterfactual explanations in terms of knowledge graphs. We provide an algorithm for computing such explanations (given some assumptions about the underlying knowledge) and quantitatively evaluate the framework with a user study.
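
The difference between feature-space and semantic minimality can be illustrated by describing each sample with a set of concepts and looking for the closest sample, measured in concept edits, that the model classifies differently. The sketch below is a simplification: it treats concepts as flat sets and ignores the knowledge-graph structure (such as concept hierarchies) that the paper relies on, and `semantic_counterfactual` and the toy data are hypothetical.

```python
def semantic_distance(a, b):
    """Number of concept insertions plus deletions needed to turn set a into set b."""
    return len(a ^ b)

def semantic_counterfactual(query_concepts, query_label, dataset):
    """Return the closest differently-classified sample and the concept edits leading to it.

    dataset: list of (concept_set, predicted_label) pairs.
    """
    candidates = [(c, label) for c, label in dataset if label != query_label]
    if not candidates:
        return None
    concepts, label = min(candidates, key=lambda item: semantic_distance(query_concepts, item[0]))
    return {
        "label": label,
        "remove": query_concepts - concepts,
        "add": concepts - query_concepts,
    }

dataset = [
    ({"dog", "sofa", "ball"}, "indoor"),
    ({"cat", "sofa", "lamp"}, "indoor"),
    ({"dog", "grass"}, "park"),
]
explanation = semantic_counterfactual({"dog", "grass", "ball"}, "park", dataset)
```

Here the explanation reads as "replace grass with sofa and the prediction becomes indoor", which is phrased in concepts a user can understand rather than in pixel-level edits.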

Translated: 2023-05-30 17:26:04, Published: 2023-05-28

# Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality

arXiv: http://arxiv.org/abs/2305.17665v1

License: see the linked page

Kejie Tang, Weidong Liu and Yichen Zhang

Stochastic gradient descent with momentum (SGDM) has been widely used in many machine learning and statistical applications. Despite the observed empirical benefits of SGDM over traditional SGD, the theoretical understanding of the role of momentum for different learning rates in the optimization process remains largely open. We analyze the finite-sample convergence rate of SGDM in the strongly convex setting and show that, with a large batch size, mini-batch SGDM converges faster than mini-batch SGD to a neighborhood of the optimal value. Furthermore, we analyze the Polyak-averaged version of the SGDM estimator, establish its asymptotic normality, and justify its asymptotic equivalence to averaged SGD.
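
For reference, one common form of SGDM updates a velocity v ← βv + g and then the parameter x ← x − ηv, and the Polyak-averaged estimator is the running mean of the iterates. The sketch below runs this on a one-dimensional quadratic with Gaussian gradient noise. The objective, noise model, step size and momentum are illustrative assumptions, and the code does not reproduce the paper's rates or batch-size comparison.

```python
import random

def sgdm_with_polyak_average(grad, x0, lr=0.1, beta=0.9, steps=2000, seed=0):
    """Run SGD with momentum and return (last iterate, Polyak average of iterates)."""
    rng = random.Random(seed)
    x, v, avg = x0, 0.0, 0.0
    for k in range(1, steps + 1):
        g = grad(x, rng)
        v = beta * v + g          # momentum buffer
        x = x - lr * v            # parameter update
        avg += (x - avg) / k      # running mean of x_1, ..., x_k
    return x, avg

def noisy_quadratic_grad(x, rng, a=2.0, x_star=3.0, noise=1.0, batch=1):
    """Stochastic gradient of f(x) = a/2 * (x - x_star)^2, with noise averaged over a mini-batch."""
    noise_mean = sum(rng.gauss(0.0, noise) for _ in range(batch)) / batch
    return a * (x - x_star) + noise_mean
```

The last iterate keeps fluctuating around the optimum because of the gradient noise, whereas the averaged iterate settles much closer to `x_star = 3.0`; increasing `batch` shrinks the noise seen by each step.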

Translated: 2023-05-30 17:25:49, Published: 2023-05-28

# Towards Autonomous and Safe Last-mile Deliveries with AI-augmented Self-driving Delivery Robots

arXiv: http://arxiv.org/abs/2305.17705v1

License: see the linked page

Eyad Shaklab, Areg Karapetyan, Arjun Sharma, Murad Mebrahtu, Mustofa Basri, Mohamed Nagy, Majid Khonji, and Jorge Dias

In addition to its crucial impact on customer satisfaction, last-mile delivery (LMD) is notorious for being the most time-consuming and costly stage of the shipping process. Pressing environmental concerns combined with the recent surge in e-commerce sales have sparked renewed interest in the automation and electrification of last-mile logistics. To address the hurdles faced by existing robotic couriers, this paper introduces a customer-centric and safety-conscious LMD system for small urban communities based on AI-assisted autonomous delivery robots. The presented framework enables end-to-end automation and optimization of the logistic process while accounting for real-world operational uncertainties, clients' preferred time schedules, and the safety of pedestrians. To this end, the integrated optimization component is modeled as a robust variant of the Cumulative Capacitated Vehicle Routing Problem with Time Windows, where routes are constructed under uncertain travel times with the objective of minimizing the total latency of deliveries (i.e., the overall waiting time of customers, which can negatively affect their satisfaction). We demonstrate the proposed LMD system's utility through real-world trials on a university campus with a single robotic courier. Implementation aspects, as well as the findings and practical insights gained from the deployment, are discussed in detail. Lastly, we round off the contributions with numerical simulations that investigate the scalability of the developed mathematical formulation with respect to the number of robotic vehicles and customers.
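
The latency objective differs from the usual route-length objective: it sums the arrival time at each customer, so customers served late in a route weigh more. The sketch below evaluates that objective and picks the visiting order with the smallest worst-case latency over a finite set of travel-time scenarios, by brute force for a single robot. Capacities, time windows, multiple robots and the paper's actual uncertainty model are all left out, and the scenario-based min-max formulation is an assumption made for illustration.

```python
from itertools import permutations

def route_latency(route, travel):
    """Sum of arrival times at each customer; node 0 is the depot and travel[i][j] is a travel time."""
    t, total, prev = 0.0, 0.0, 0
    for customer in route:
        t += travel[prev][customer]
        total += t
        prev = customer
    return total

def robust_best_route(customers, scenarios):
    """Brute-force the visiting order minimizing the worst-case total latency over travel-time scenarios."""
    best = None
    for order in permutations(customers):
        worst = max(route_latency(order, travel) for travel in scenarios)
        if best is None or worst < best[0]:
            best = (worst, order)
    return best

scenario_a = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
scenario_b = [[0, 4, 2], [4, 0, 1], [2, 1, 0]]
```

Brute force is only viable for a handful of customers; it is shown here to make the objective explicit, not as a substitute for the mathematical formulation whose scalability the paper studies.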

Translated: 2023-05-30 17:18:33, Published: 2023-05-28

# KoSBI: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application

arXiv: http://arxiv.org/abs/2305.17701v1

License: see the linked page

Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Gunhee Kim and Jung-Woo Ha

Large language models (LLMs) learn from real-world data not only natural text generation abilities but also social biases against different demographic groups. This poses a critical risk when deploying LLM-based applications. Existing research and resources are not readily applicable in South Korea due to differences in language and culture, both of which significantly affect the biases and the targeted demographic groups. This limitation calls for localized social bias datasets to ensure the safe and effective deployment of LLMs. To this end, we present KoSBI, a new social bias dataset of 34k pairs of contexts and sentences in Korean, covering 72 demographic groups in 15 categories. We find that, through filtering-based moderation, social biases in generated content can be reduced by 16.47 percentage points on average for HyperCLOVA (30B and 82B) and GPT-3.

Translated: 2023-05-30 17:18:07, Published: 2023-05-28

# Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery

arXiv: http://arxiv.org/abs/2305.17699v1

License: see the linked page

Yutao Mou, Xiaoshuai Song, Keqing He, Chen Zeng, Pei Wang, Jingang Wang, Yunsen Xian and Weiran Xu

Generalized intent discovery aims to extend a closed-set in-domain intent classifier to an open-world intent set including in-domain and out-of-domain intents. The key challenges lie in pseudo label disambiguation and representation learning. Previous methods suffer from a coupling of pseudo label disambiguation and representation learning: the reliability of pseudo labels relies on representation learning, and representation learning is in turn restricted by pseudo labels. In this paper, we propose a decoupled prototype learning framework (DPL) to decouple pseudo label disambiguation and representation learning. Specifically, we first introduce prototypical contrastive representation learning (PCL) to obtain discriminative representations, and then adopt a prototype-based label disambiguation method (PLD) to obtain pseudo labels. We theoretically prove that PCL and PLD work in a collaborative fashion and facilitate pseudo label disambiguation. Experiments and analysis on three benchmark datasets show the effectiveness of our method.
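
Prototype-based pseudo labeling, in its simplest form, assigns each embedding to the most similar class prototype and then moves prototypes toward the embeddings assigned to them. The sketch below shows that loop with cosine similarity and a momentum update. It stands in for PLD only loosely (the paper's exact method and its interaction with PCL are not reproduced), and the helper names and the `momentum` value are hypothetical.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def assign_pseudo_labels(embeddings, prototypes):
    """Pseudo label = index of the most cosine-similar prototype."""
    return [max(range(len(prototypes)), key=lambda k: cosine(e, prototypes[k])) for e in embeddings]

def update_prototypes(embeddings, labels, prototypes, momentum=0.9):
    """Move each prototype toward its assigned embeddings with an exponential moving average."""
    new = [normalize(p) for p in prototypes]
    for e, k in zip(embeddings, labels):
        e = normalize(e)
        new[k] = normalize([momentum * p + (1 - momentum) * x for p, x in zip(new[k], e)])
    return new
```

In a full training loop, the embeddings would come from the encoder trained with the contrastive objective, and the two steps above would alternate with encoder updates.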

Translated: 2023-05-30 17:17:55, Published: 2023-05-28

# Neural Machine Translation with Dynamic Graph Convolutional Decoder

arXiv: http://arxiv.org/abs/2305.17698v1

License: see the linked page

Lei Li, Kai Fan, Lingyu Yang, Hongjia Li, Chun Yuan

Prior work has demonstrated the importance of syntactic knowledge for improving neural machine translation models. However, most previous works focus only on leveraging the source syntax in the well-known encoder-decoder framework. In sharp contrast, this paper proposes an end-to-end translation architecture from (graph & sequence) structural inputs to (graph & sequence) outputs, where the target translation and its corresponding syntactic graph are jointly modeled and generated. We propose a customized Dynamic Spatial-Temporal Graph Convolutional Decoder (Dyn-STGCD), which is designed to consume source feature representations and their syntactic graph and to auto-regressively generate the target syntactic graph and tokens simultaneously. We conduct extensive experiments on five widely used translation benchmarks, verifying that our proposal achieves consistent improvements over baselines and other syntax-aware variants.
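
The basic operation inside a graph convolutional decoder is a graph convolution over the tokens produced so far: each node averages its neighbors' features (including its own) and applies a linear map and a nonlinearity. The sketch below is a generic mean-aggregation graph convolution layer, not Dyn-STGCD, whose spatial-temporal design is not described in the abstract; the dependency arc and the weights are made up for illustration.

```python
def gcn_layer(features, edges, weight):
    """One mean-aggregation graph convolution with self-loops and ReLU.

    features: n x d_in node features; edges: undirected (u, v) pairs; weight: d_in x d_out.
    """
    n = len(features)
    neighbors = [{i} for i in range(n)]  # self-loops
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    out = []
    for i in range(n):
        agg = [sum(features[j][d] for j in neighbors[i]) / len(neighbors[i]) for d in range(len(features[0]))]
        out.append([max(0.0, sum(a * w for a, w in zip(agg, column))) for column in zip(*weight)])
    return out

token_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
arcs = [(0, 1)]  # hypothetical dependency arc between the first two target tokens
identity = [[1.0, 0.0], [0.0, 1.0]]
```

During decoding the target graph grows as tokens and their arcs are generated; calling `gcn_layer` on a prefix of the tokens with the arcs known so far corresponds to an earlier snapshot of that graph.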

Translated: 2023-05-30 17:17:40, Published: 2023-05-28

# SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

arXiv: http://arxiv.org/abs/2305.17696v1

License: see the linked page

Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Meeyoung Cha, Yejin Choi, Byoung Pil Kim, Gunhee Kim, Eun-Ju Lee, Yong Lim, Alice Oh, Sangchul Park and Jung-Woo Ha

The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are rising steeply. Existing work focuses on coping with this concern when interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed by leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation improves significantly for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.

Translated: 2023-05-30 17:17:22, Published: 2023-05-28

# k-NNN: Nearest Neighbors of Neighbors for Anomaly Detection

arXiv: http://arxiv.org/abs/2305.17695v1

License: see the linked page

Ori Nizan, Ayellet Tal

Anomaly detection aims at identifying images that deviate significantly from the norm. We focus on algorithms that embed the normal training examples in a feature space and, given a test image, detect anomalies based on the feature distance to its k nearest training neighbors. We propose a new operator that takes into account the varying structure and importance of the features in the embedding space. Interestingly, this is done by considering not only the nearest neighbors but also the neighbors of these neighbors (k-NNN). We show that by simply replacing the nearest-neighbor component in existing algorithms with our k-NNN operator, while leaving the rest of the algorithms untouched, each algorithm's own results are improved. This holds both for common homogeneous datasets, such as flowers or nuts of a specific type, and for more diverse datasets.
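
To contrast with the standard k-NN score, the sketch below divides a test point's mean distance to its k nearest training neighbors by the average k-NN distance of those neighbors themselves. This LOF-style normalization is chosen only as a simple illustration of using neighbors of neighbors; it is an assumption, not the paper's k-NNN operator, which also accounts for feature structure and importance. In the toy data, the raw k-NN score ranks a point inside a sparse cluster as more anomalous than a point just outside a dense cluster, while the normalized score reverses that ranking.

```python
import math

def knn(point, data, k, exclude=None):
    """(distance, index) pairs of the k nearest points in data, optionally skipping one index."""
    pairs = sorted((math.dist(point, x), i) for i, x in enumerate(data) if i != exclude)
    return pairs[:k]

def knn_score(point, train, k):
    """Standard score: mean distance to the k nearest training neighbors."""
    return sum(d for d, _ in knn(point, train, k)) / k

def neighbors_of_neighbors_score(point, train, k):
    """k-NN score divided by the average k-NN distance of the point's own neighbors."""
    neighbors = knn(point, train, k)
    local = [sum(d for d, _ in knn(train[i], train, k, exclude=i)) / k for _, i in neighbors]
    scale = sum(local) / len(local)
    if scale == 0:
        return float("inf")  # neighbors are exact duplicates; any distance is maximally surprising
    return knn_score(point, train, k) / scale

train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 4), (14, 0), (14, 4)]
near_dense = (0.5, 3.0)
inside_sparse = (12.0, 2.0)
```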

Translated: 2023-05-30 17:16:55, Published: 2023-05-28

# Communication Over Entanglement-Breaking Channels With Unreliable Entanglement Assistance

arXiv: http://arxiv.org/abs/2305.17692v1

License: see the linked page

Uzi Pereg

Entanglement assistance can improve communication rates significantly. Yet, its generation can easily fail. The recently introduced model of unreliable assistance accounts for these challenges. Previous work provided an asymptotic formula for the tradeoff between the unassisted rate and the excess rate from entanglement assistance. We derive a full characterization for entanglement-breaking channels and show that combining entanglement-assisted and unassisted coding is suboptimal. From a networking perspective, this finding is nontrivial and highlights a quantum behavior arising from superposition.

Translated: 2023-05-30 17:16:28, Published: 2023-05-28

# Plug-and-Play Knowledge Injection for Pre-trained Language Models

arXiv: http://arxiv.org/abs/2305.17691v1

License: see the linked page

Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks. However, massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks. In this work, we are the first to study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models. To this end, we explore a new paradigm, plug-and-play knowledge injection, where knowledge bases are injected into frozen existing downstream models by a knowledge plugin. Correspondingly, we propose a plug-and-play injection method, map-tuning, which trains a mapping of knowledge embeddings and enriches model inputs with the mapped embeddings while keeping model parameters frozen. Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models. Moreover, we show that a frozen downstream model can be well adapted to different domains with different mapping networks of domain knowledge. Our code and models are available at https://github.com/THUNLP/Knowledge-Plugin.
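
The input side of map-tuning can be sketched under stated assumptions: a linear `mapping` matrix projects a knowledge-base entity embedding into the PLM's input embedding space, and the mapped vector is inserted right after the token that mentions the entity. Only `mapping` would be trained, while the PLM and the entity embeddings stay frozen. The insertion position, the linear form and all names here (including the entity id `"Q1"`) are illustrative assumptions rather than the released implementation.

```python
def matvec(matrix, vector):
    """Multiply a (d_model x d_kb) matrix by a d_kb vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def enrich_inputs(token_embeddings, mentions, entity_embeddings, mapping):
    """Insert a mapped entity embedding after each mention position.

    mentions: dict from token position to entity id; entity_embeddings: dict from id to KB vector.
    """
    enriched = []
    for position, embedding in enumerate(token_embeddings):
        enriched.append(embedding)
        if position in mentions:
            enriched.append(matvec(mapping, entity_embeddings[mentions[position]]))
    return enriched
```

Because the downstream model only sees extra input vectors, swapping in a different knowledge base or domain amounts to swapping `mapping` and `entity_embeddings`, without touching the frozen model.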

Translated: 2023-05-30 17:16:09, Published: 2023-05-28

# HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

arXiv: http://arxiv.org/abs/2305.17690v1

License: see the linked page

Shantipriya Parida, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, Aneesh Bose, Guneet Singh Kohli, Ibrahim Said Ahmad, Ketan Kotwal, Sayan Deb Sarkar, Ondřej Bojar, Habeebah Adamu Kakudi

This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold-standard English-Hausa parallel sentences, translated in a way that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, and text-only and multimodal machine translation.

Translated: 2023-05-30 17:15:29, Published: 2023-05-28

# Amplification trojan network: Attack deep neural networks by amplifying their inherent weakness

arXiv: http://arxiv.org/abs/2305.17688v1

License: see the linked page

Zhanhao Hu, Jun Zhu, Bo Zhang, Xiaolin Hu

Recent works found that deep neural networks (DNNs) can be fooled by adversarial examples, which are crafted by adding adversarial noise to clean inputs. The accuracy of DNNs on adversarial examples decreases as the magnitude of the adversarial noise increases. In this study, we show that DNNs can also be fooled when the noise is very small under certain circumstances. This new type of attack is called the Amplification Trojan Attack (ATAttack). Specifically, we use a trojan network to transform the inputs before sending them to the target DNN. This trojan network serves as an amplifier that amplifies the inherent weakness of the target DNN. The target DNN, once infected by the trojan network, performs normally on clean data while being more vulnerable to adversarial examples. Since it only transforms the inputs, the trojan network can hide in DNN-based pipelines, e.g., by infecting the pre-processing procedure of the inputs before they are sent to the DNNs. This new type of threat should be considered when developing safe DNNs.

Translated: 2023-05-30 17:15:17, Published: 2023-05-28

# Predictability and Fairness in Load Aggregation with Deadband

arXiv: http://arxiv.org/abs/2305.17725v1

License: see the linked page

F. V. Difonzo and M. Roubalik and J. Marecek

Virtual power plants and load aggregation are becoming increasingly common. There, one regulates the aggregate power output of an ensemble of distributed energy resources (DERs). Marecek et al. [Automatica, Volume 147, January 2023, 110743, arXiv:2110.03001] recently suggested that long-term averages of the prices or incentives offered should exist and be independent of the initial states of the operators of the DERs, the aggregator, and the power grid. This can be seen as predictability, which underlies fairness. Unfortunately, the existence of such averages cannot be guaranteed with many traditional regulators, including the proportional-integral (PI) regulator with or without a deadband. Here, we consider the effects of losses in the alternating-current model and of the deadband in the controller. This yields a non-linear dynamical system (due to the non-linear losses) exhibiting discontinuities (due to the deadband). We show that Filippov invariant measures enable reasoning about predictability and fairness while accounting for the non-linearity of the alternating-current model and the deadband.
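
The discontinuity mentioned above is easy to see in a discrete-time PI regulator with a deadband: errors whose magnitude stays within the band are treated as zero, so the effective error jumps from 0 to ±width at the band edge. That jump makes the closed-loop vector field discontinuous, which is the setting where Filippov solutions are needed. The sketch below assumes this zeroing form of deadband (a shifted, continuous form also exists) and made-up gains; it is not the authors' model.

```python
def deadband(error, width):
    """Zero the error inside the band; pass it through unchanged outside (discontinuous at |error| == width)."""
    return 0.0 if abs(error) <= width else error

class DeadbandPI:
    """Discrete-time proportional-integral regulator acting on the deadbanded error."""

    def __init__(self, kp, ki, width):
        self.kp, self.ki, self.width = kp, ki, width
        self.integral = 0.0

    def step(self, reference, measurement, dt=1.0):
        e = deadband(reference - measurement, self.width)
        self.integral += e * dt
        return self.kp * e + self.ki * self.integral
```

With `kp=0.5`, `ki=0.1` and `width=0.2`, an error of about 0.1 produces no control action and leaves the integral untouched, while an error of 0.5 produces an output of 0.3.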

Translated: 2023-05-30 17:07:54, Published: 2023-05-28

# Rethinking Masked Language Modeling for Chinese Spelling Correction

arXiv: http://arxiv.org/abs/2305.17721v1

License: see the linked page

Hongqiu Wu and Shaohua Zhang and Yuchen Zhang and Hai Zhao

In this paper, we study Chinese Spelling Correction (CSC) as a joint decision made by two separate models: a language model and an error model. Through empirical analysis, we find that fine-tuning BERT tends to over-fit the error model while under-fitting the language model, resulting in poor generalization to out-of-distribution error patterns. Given that BERT is the backbone of most CSC models, this phenomenon has a significant negative impact. To address this issue, we release LEMON, a multi-domain benchmark with higher quality and diversity than existing benchmarks, to allow a comprehensive assessment of the open-domain generalization of CSC models. We then demonstrate that a very simple strategy, randomly masking 20% of the non-error tokens in the input sequence during fine-tuning, is sufficient for learning a much better language model without sacrificing the error model. This technique can be applied to any model architecture and achieves new state-of-the-art results on SIGHAN, ECSpell, and LEMON.
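The masking strategy can be sketched as follows, assuming length-aligned source/target pairs (as is usual in CSC) and a BERT-style `[MASK]` token. The function name and seed handling are hypothetical; in training, the target sentence would still serve as the label, so error positions keep supervising the error model while masked non-error positions force the model to rely on context.

```python
import random
from typing import List, Sequence

MASK = "[MASK]"  # BERT-style mask token (assumed vocabulary)


def mask_non_error_tokens(source: Sequence[str], target: Sequence[str],
                          rate: float = 0.2, seed: int = 0) -> List[str]:
    """Randomly mask a fraction `rate` of the non-error source tokens.

    Positions where source != target are spelling errors and are never
    masked, so the error-model signal is left untouched.
    """
    if len(source) != len(target):
        raise ValueError("CSC pairs are assumed to be length-aligned")
    non_error = [i for i, (s, t) in enumerate(zip(source, target)) if s == t]
    k = round(rate * len(non_error))
    rng = random.Random(seed)
    masked = list(source)
    for i in rng.sample(non_error, k):
        masked[i] = MASK
    return masked
```

A fresh random subset would normally be drawn for every batch, so the same sentence is masked differently across epochs.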

Translated: 2023-05-30 17:07:33 | Published: 2023-05-28

# FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

arXiv: http://arxiv.org/abs/2305.17718v1

License: see the linked page

Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel

Image captioning is a central task in computer vision that has experienced substantial progress following the advent of vision-language pre-training techniques. In this paper, we highlight a frequently overlooked limitation: captioning models often fail to capture semantically significant elements. This drawback can be traced back to the text-image datasets; while their captions typically offer a general depiction of image content, they frequently omit salient details. To mitigate this limitation, we propose FuseCap, a novel method for enriching captions with additional visual information obtained from vision experts such as object detectors, attribute recognizers, and Optical Character Recognizers (OCR). Our approach fuses the outputs of such vision experts with the original caption using a large language model (LLM), yielding enriched captions that present a comprehensive image description. We validate the effectiveness of the proposed caption enrichment method through both quantitative and qualitative analysis. We then use our method to curate the training set of a BLIP-based captioning model, which surpasses current state-of-the-art approaches in generating accurate and detailed captions while using significantly fewer parameters and less training data. As additional contributions, we provide a dataset comprising 12M image-enriched caption pairs and show that the proposed method substantially improves image-text retrieval.

Translated: 2023-05-30 17:07:16 | Published: 2023-05-28

# InDL: A New Datasets and Benchmark for In-Diagram Logic Interpreting based on Visual Illusion

arXiv: http://arxiv.org/abs/2305.17716v1

License: see the linked page

Haobo Yang, Wenyu Wang, Ze Cao, Zhekai Duan, Xuchen Liu

This paper introduces a novel approach to evaluating deep learning models' capacity for in-diagram logic interpretation. Leveraging the intriguing realm of visual illusions, we establish a unique dataset, InDL, designed to rigorously test and benchmark these models. Deep learning has witnessed remarkable progress in domains such as computer vision and natural language processing. However, models often stumble in tasks requiring logical reasoning due to their inherent 'black box' characteristics, which obscure the decision-making process. Our work presents a new lens to understand these models better by focusing on their handling of visual illusions -- a complex interplay of perception and logic. We utilize six classic geometric optical illusions to create a comparative framework between human and machine visual perception. This methodology offers a quantifiable measure to rank models, elucidating potential weaknesses and providing actionable insights for model improvements. Our experimental results affirm the efficacy of our benchmarking strategy, demonstrating its ability to effectively rank models based on their logic interpretation ability. As part of our commitment to reproducible research, the source code and datasets will be made publicly available here: (TODO GitHub repo).

Translated: 2023-05-30 17:06:44 | Published: 2023-05-28

# An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation

arXiv: http://arxiv.org/abs/2305.17714v1

License: see the linked page

Amit Moryossef, Mathias Müller, Anne Göhring, Zifan Jiang, Yoav Goldberg, and Sarah Ebling

Sign language translation systems are complex and require many components. As a result, it is very hard to compare methods across publications. We present an open-source implementation of a text-to-gloss-to-pose-to-video pipeline approach, demonstrating conversion from German to Swiss German Sign Language, French to French Sign Language of Switzerland, and Italian to Italian Sign Language of Switzerland. We propose three different components for the text-to-gloss translation: a lemmatizer, a rule-based word reordering and dropping component, and a neural machine translation system. Gloss-to-pose conversion occurs using data from a lexicon for three different signed languages, with skeletal poses extracted from videos. To generate a sentence, the text-to-gloss system is first run, and the pose representations of the resulting signs are stitched together.
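The shape of the text-to-gloss-to-pose pipeline can be sketched with toy resources. Everything here is hypothetical: the tiny lemma table and drop list stand in for a real lemmatizer and rule set, the reordering rules are omitted, and plain concatenation stands in for whatever smoothing the real stitching step applies between signs.

```python
from typing import Dict, List

# Hypothetical toy resources; a real system uses a proper lemmatizer and
# a sign lexicon whose poses were extracted from videos.
LEMMAS = {"ging": "gehen", "kinder": "kind"}
DROP = {"der", "die", "das", "ist", "zur"}  # function words without a sign

Pose = List[List[float]]  # frames x flattened keypoints (simplified)


def text_to_gloss(sentence: str) -> List[str]:
    """Lemmatize, drop function words, and upper-case lemmas as glosses."""
    tokens = sentence.lower().strip(".").split()
    lemmas = [LEMMAS.get(tok, tok) for tok in tokens if tok not in DROP]
    return [lemma.upper() for lemma in lemmas]


def gloss_to_pose(glosses: List[str], lexicon: Dict[str, Pose]) -> Pose:
    """Stitch per-sign pose sequences; glosses missing from the lexicon are skipped."""
    frames: Pose = []
    for gloss in glosses:
        frames.extend([list(frame) for frame in lexicon.get(gloss, [])])
    return frames
```

With these toy tables, `text_to_gloss("Das Kind ging zur Schule.")` yields `["KIND", "GEHEN", "SCHULE"]`; the final pose-to-video step is not sketched.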

Translated: 2023-05-30 17:06:25 | Published: 2023-05-28

# Variational Quantum Algorithms for Gibbs State Preparation

arXiv: http://arxiv.org/abs/2305.17713v1

License: see the linked page

Mirko Consiglio

Preparing the Gibbs state of an interacting quantum many-body system on noisy intermediate-scale quantum (NISQ) devices is a crucial task for exploring thermodynamic properties in the quantum regime. It encompasses understanding protocols such as thermalization and out-of-equilibrium thermodynamics, and sampling from faithfully prepared Gibbs states could pave the way to providing useful resources for quantum algorithms. Variational quantum algorithms (VQAs) show the most promise for efficiently preparing Gibbs states; however, there are many different approaches that could be applied to effectively determine and prepare Gibbs states on a NISQ computer. In this paper, we provide a concise overview of the algorithms capable of preparing Gibbs states, including joint Hamiltonian evolution of a system-environment coupling, quantum imaginary time evolution, and modern VQAs utilizing the Helmholtz free energy as a cost function, among others. Furthermore, we benchmark one of the latest variational Gibbs state preparation algorithms, developed by Consiglio et al. (arXiv:2303.11276), by applying it to the spin-1/2 one-dimensional $XY$ model.
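Why the Helmholtz free energy works as a cost function can be checked classically: for a state diagonal in the energy eigenbasis, $F = \langle E \rangle - T S$ is minimized exactly by the Boltzmann weights, with minimum $-T \ln Z$ (Gibbs variational principle, $k_B = 1$). The sketch below only illustrates this principle on a two-level spectrum chosen by us; it is not the Consiglio et al. algorithm, which estimates energy and entropy on quantum hardware.

```python
import math
from typing import List, Sequence


def free_energy(probs: Sequence[float], energies: Sequence[float], temp: float) -> float:
    """F = <E> - T*S for a state diagonal in the energy basis (k_B = 1)."""
    energy = sum(p * e for p, e in zip(probs, energies))
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return energy - temp * entropy


def gibbs_probs(energies: Sequence[float], temp: float) -> List[float]:
    """Boltzmann weights exp(-E/T) / Z."""
    weights = [math.exp(-e / temp) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]
```

Any other distribution over the same levels, e.g. `[0.5, 0.5]` for energies `[-1.0, 1.0]`, gives a strictly larger `free_energy` than `gibbs_probs(energies, temp)`.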

Translated: 2023-05-30 17:06:09 | Published: 2023-05-28

# OccCasNet: Occlusion-aware Cascade Cost Volume for Light Field Depth Estimation

arXiv: http://arxiv.org/abs/2305.17710v1

License: see the linked page

Wentao Chao, Fuqing Duan, Xuechun Wang, Yingqian Wang, Guanghui Wang

Light field (LF) depth estimation is a crucial task with numerous practical applications. However, mainstream methods based on the multi-view stereo (MVS) are resource-intensive and time-consuming as they need to construct a finer cost volume. To address this issue and achieve a better trade-off between accuracy and efficiency, we propose an occlusion-aware cascade cost volume for LF depth (disparity) estimation. Our cascaded strategy reduces the sampling number while keeping the sampling interval constant during the construction of a finer cost volume. We also introduce occlusion maps to enhance accuracy in constructing the occlusion-aware cost volume. Specifically, we first obtain the coarse disparity map through the coarse disparity estimation network. Then, the sub-aperture images (SAIs) of side views are warped to the center view based on the initial disparity map. Next, we propose photo-consistency constraints between the warped SAIs and the center SAI to generate occlusion maps for each SAI. Finally, we introduce the coarse disparity map and occlusion maps to construct an occlusion-aware refined cost volume, enabling the refined disparity estimation network to yield a more precise disparity map. Extensive experiments demonstrate the effectiveness of our method. Compared with state-of-the-art methods, our method achieves a superior balance between accuracy and efficiency and ranks first in terms of MSE and Q25 metrics among published methods on the HCI 4D benchmark. The code and model of the proposed method are available at https://github.com/chaowentao/OccCasNet.
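The cascade idea of fewer samples at an unchanged interval can be sketched for a single pixel: a full fine volume samples the whole disparity range, while the refinement stage samples only a narrow window around the coarse estimate with the same spacing. The disparity range [-4, 4], the 0.1 interval, the 9 refinement samples and the threshold-based occlusion test are our assumptions for illustration, not values or rules from the paper.

```python
from typing import List


def uniform_candidates(d_min: float, d_max: float, interval: float) -> List[float]:
    """Full-range sampling: one disparity hypothesis every `interval` over [d_min, d_max]."""
    n = int(round((d_max - d_min) / interval)) + 1
    return [d_min + i * interval for i in range(n)]


def cascade_candidates(d_coarse: float, interval: float, num: int) -> List[float]:
    """Refinement stage: `num` hypotheses centred on the coarse estimate,
    spaced with the same `interval` a full fine-resolution volume would use."""
    half = (num - 1) / 2
    return [d_coarse + (i - half) * interval for i in range(num)]


def occlusion_mask(warped: List[List[float]], center: List[List[float]],
                   tau: float) -> List[List[bool]]:
    """Flag pixels whose warped side-view intensity differs from the centre
    view by more than `tau` (a simple photo-consistency test)."""
    return [[abs(w - c) > tau for w, c in zip(row_w, row_c)]
            for row_w, row_c in zip(warped, center)]
```

Under these assumed numbers, the refined cost volume holds 9 hypotheses per pixel instead of the 81 that `uniform_candidates(-4.0, 4.0, 0.1)` would produce.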

Translated: 2023-05-30 17:05:50 | Published: 2023-05-28

# Parallel Data Helps Neural Entity Coreference Resolution

arXiv: http://arxiv.org/abs/2305.17709v1

License: see the linked page

Gongbo Tang, Christian Hardmeier

Coreference resolution is the task of finding expressions that refer to the same entity in a text. Coreference models are generally trained on monolingual annotated data, but annotating coreference is expensive and challenging. Hardmeier et al. (2013) have shown that parallel data contains latent anaphoric knowledge, but this has not yet been explored in end-to-end neural models. In this paper, we propose a simple yet effective model to exploit coreference knowledge from parallel data. In addition to the conventional modules that learn coreference from annotations, we introduce an unsupervised module to capture cross-lingual coreference knowledge. Our proposed cross-lingual model achieves consistent improvements, of up to 1.74 percentage points, on the OntoNotes 5.0 English dataset using 9 different synthetic parallel datasets. These experimental results confirm that parallel data can provide additional coreference knowledge that is beneficial to coreference resolution tasks.

Translated: 2023-05-30 17:05:25 | Published: 2023-05-28

# Quantum-Classical Multiple Kernel Learning

arXiv: http://arxiv.org/abs/2305.17707v1

License: see the linked page

Ara Ghukasyan and Jack S. Baker and Oktay Goktas and Juan Carrasquilla and Santosh Kumar Radha

As quantum computers become increasingly practical, so does the prospect of using quantum computation to improve upon traditional algorithms. Kernel methods in machine learning are one area where such improvements could be realized in the near future. Paired with kernel methods like support-vector machines, small and noisy quantum computers can evaluate classically hard quantum kernels that capture unique notions of similarity in data. Taking inspiration from techniques in classical machine learning, this work investigates simulated quantum kernels in the context of multiple kernel learning (MKL). We consider pairwise combinations of several classical-classical, quantum-quantum, and quantum-classical kernels in an empirical investigation of their classification performance with support-vector machines. We also introduce a novel approach, which we call QCC-net (quantum-classical-convex neural network), for optimizing the weights of base kernels together with any kernel parameters. We show this approach to be effective for enhancing various performance metrics in an MKL setting. Looking at data with an increasing number of features (up to 13 dimensions), we find parameter training to be important for successfully weighting kernels in some combinations. Using the optimal kernel weights as indicators of relative utility, we find growing contributions from trainable quantum kernels in quantum-classical kernel combinations as the number of features increases. We observe the opposite trend for combinations containing simpler, non-parametric quantum kernels.
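A convex kernel combination of the kind QCC-net optimizes can be sketched as $K = \sum_i w_i K_i$ with weights on the simplex. Here the weights come from a softmax over free parameters (our assumption; the paper learns them with a neural network), the classical kernel is an RBF, and the "quantum" kernel is a toy fidelity kernel for single-qubit $R_Y$ angle encoding, $\prod_i \cos^2((x_i - y_i)/2)$, which is classically simulable and merely stands in for the paper's kernels. No training loop is shown.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def rbf(gamma: float) -> Kernel:
    """Classical Gaussian (RBF) kernel."""
    def k(x: Sequence[float], y: Sequence[float]) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def fidelity_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """|<psi(x)|psi(y)>|^2 for the product state prod_i RY(x_i)|0> (toy quantum kernel)."""
    return math.prod(math.cos((a - b) / 2) ** 2 for a, b in zip(x, y))


def softmax(params: Sequence[float]) -> List[float]:
    m = max(params)
    exps = [math.exp(p - m) for p in params]
    s = sum(exps)
    return [e / s for e in exps]


def combined_gram(X: List[Sequence[float]], kernels: List[Kernel],
                  params: Sequence[float]) -> List[List[float]]:
    """Gram matrix of K = sum_i w_i K_i with w = softmax(params).

    A convex combination of positive semi-definite kernels is again PSD,
    so the result can be passed to an SVM as a precomputed kernel.
    """
    w = softmax(params)
    return [[sum(wi * k(x, y) for wi, k in zip(w, kernels)) for y in X] for x in X]
```

After training, the learned weights `softmax(params)` can be read as the relative utility of each base kernel, which is how the abstract interprets them.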

Translated: 2023-05-30 17:05:10 | Published: 2023-05-28

# Spot keywords from very noisy and mixed speech

arXiv: http://arxiv.org/abs/2305.17706v1

License: see the linked page

Ying Shi, Dong Wang, Lantian Li, Jiqing Han and Shi Yin

Most existing keyword spotting research focuses on conditions with slight or moderate noise. In this paper, we try to tackle a more challenging task: detecting keywords buried under strong interfering speech (10 times higher than the keyword in amplitude) and, even worse, mixed with other keywords. We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords in noisy and mixed speech. Experiments were conducted with a vanilla CNN and two EfficientNet (B0/B2) architectures. Results on the Google Speech Commands dataset demonstrate that the proposed mix training approach is highly effective and outperforms standard data augmentation and mixup training.
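A "10 times higher in amplitude" interferer corresponds to a keyword-to-interference ratio of $20\log_{10}(1/10) = -20$ dB. The sketch below builds such a mixture by peak-normalizing the interferer, and labels the mixture with both keywords. The peak-based scaling, the multi-label target and all names are our assumptions for illustration; the paper's exact MT recipe may differ.

```python
from typing import List, Sequence, Set, Tuple


def peak(signal: Sequence[float]) -> float:
    return max(abs(v) for v in signal)


def mix(keyword: Sequence[float], interference: Sequence[float],
        amplitude_ratio: float = 10.0) -> List[float]:
    """Overlay `interference` scaled so its peak is `amplitude_ratio` times the
    keyword's peak (10x amplitude = -20 dB keyword-to-interference ratio).
    The shorter signal is zero-padded; both signals are assumed non-silent."""
    gain = amplitude_ratio * peak(keyword) / peak(interference)
    n = max(len(keyword), len(interference))
    kw = list(keyword) + [0.0] * (n - len(keyword))
    it = list(interference) + [0.0] * (n - len(interference))
    return [a + gain * b for a, b in zip(kw, it)]


def mixed_example(kw_a: Sequence[float], label_a: str,
                  kw_b: Sequence[float], label_b: str,
                  amplitude_ratio: float) -> Tuple[List[float], Set[str]]:
    """Hypothetical MT-style sample: a quiet keyword buried under a loud one,
    labelled with both keywords (multi-label target)."""
    return mix(kw_a, kw_b, amplitude_ratio), {label_a, label_b}
```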

Translated: 2023-05-30 17:04:47 | Published: 2023-05-28

# Reliable and Interpretable Drift Detection in Streams of Short Texts

arXiv: http://arxiv.org/abs/2305.17750v1

License: see the linked page

Ella Rabinovich, Matan Vetzler, Samuel Ackerman, Ateret Anaby-Tavor

Data drift, a change in the model input data, is one of the key factors leading to the degradation of machine learning models' performance over time. Monitoring drift helps detect these issues and prevent their harmful consequences. Meaningful drift interpretation is a fundamental step towards effective re-training of the model. In this study, we propose an end-to-end framework for reliable model-agnostic change-point detection and interpretation in large task-oriented dialog systems, proven effective in multiple customer deployments. We evaluate our approach and demonstrate its benefits with a novel variant of an intent classification training dataset that simulates customer requests to a dialog system. We make the data publicly available.
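To illustrate what model-agnostic drift detection with an interpretation step can look like on short texts, here is a deliberately simple stand-in, not the authors' framework: it compares the token distributions of a reference window and a current window with the Jensen-Shannon divergence, and explains a detected shift by listing the tokens whose probability grew most. Tokenization by whitespace and any alert threshold are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def token_dist(texts: Iterable[str]) -> Dict[str, float]:
    """Unigram distribution over lower-cased, whitespace-split tokens."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits (bounded in [0, 1])."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(a: Dict[str, float]) -> float:
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def explain_drift(ref: Dict[str, float], cur: Dict[str, float],
                  top: int = 3) -> List[Tuple[str, float]]:
    """Tokens whose probability increased most: a crude drift explanation."""
    vocab = set(ref) | set(cur)
    deltas = {w: cur.get(w, 0.0) - ref.get(w, 0.0) for w in vocab}
    return sorted(deltas.items(), key=lambda kv: -kv[1])[:top]
```

In practice one would compare sliding windows of requests and raise an alert when the divergence exceeds a calibrated threshold; the paper's change-point method is more sophisticated than this.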

Translated: 2023-05-30 16:58:31 | Published: 2023-05-28

# Bayesian inference and neural estimation of acoustic wave propagation

arXiv: http://arxiv.org/abs/2305.17749v1

License: see the linked page

Yongchao Huang, Yuhang He, Hong Ge

In this work, we introduce a novel framework that combines physics and machine learning methods to analyse acoustic signals. Three methods are developed for this task: a Bayesian inference approach for inferring the spectral acoustic characteristics, a neural-physical model that equips a neural network with forward and backward physical losses, and a non-linear least squares approach that serves as a benchmark. The inferred propagation coefficient leads to the room impulse response (RIR) quantity, which can be used for relocalisation with uncertainty. The simplicity and efficiency of this framework are empirically validated on simulated data.

Translated: 2023-05-30 16:58:21 | Published: 2023-05-28

# Image Hash Minimization for Tamper Detection

arXiv: http://arxiv.org/abs/2305.17748v1

License: see the linked page

Subhajit Maity, Ram Kumar Karsh

Tamper detection using image hashing is a very common problem today. Considerable research has already been done to address it. However, most existing methods lose tamper-detection accuracy when the tampered area is small, and they also require long image hashes. In this paper, we propose a novel method whose objective is to minimize the hash length while improving performance when the tampered area is small.
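The tension between hash length and small tampered areas can be seen with a toy block-average hash (a swapped-in textbook technique, not the authors' method): one bit per block records whether the block mean exceeds the global mean, and tampering is flagged by the Hamming distance between hashes. Larger blocks give shorter hashes, but a small edit may not move its block mean across the threshold, so no bit flips.

```python
from typing import List

Image = List[List[float]]  # grayscale, row-major


def block_hash(img: Image, block: int) -> List[int]:
    """One bit per block: is the block mean above the global mean? (toy average hash)"""
    h, w = len(img), len(img[0])
    global_mean = sum(map(sum, img)) / (h * w)
    bits = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            vals = [img[i][j]
                    for i in range(r, min(r + block, h))
                    for j in range(c, min(c + block, w))]
            bits.append(1 if sum(vals) / len(vals) > global_mean else 0)
    return bits


def hamming(a: List[int], b: List[int]) -> int:
    """Number of differing bits; a non-zero value suggests tampering."""
    return sum(x != y for x, y in zip(a, b))
```

The hash length here is simply the number of blocks, so the block size directly trades hash length against sensitivity to small tampered regions.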

Translated: 2023-05-30 16:58:11 | Published: 2023-05-28

# Whitening-based Contrastive Learning of Sentence Embeddings

arXiv: http://arxiv.org/abs/2305.17746v1

License: see the linked page

Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang

This paper presents a whitening-based contrastive learning method for sentence embedding learning (WhitenedCSE), which combines contrastive learning with a novel shuffled group whitening. Generally, contrastive learning pulls distortions of a single sample (i.e., positive samples) close together and pushes negative samples far away, correspondingly facilitating alignment and uniformity in the feature space. A popular alternative to the "pushing" operation is whitening the feature space, which scatters all the samples for uniformity. Since whitening and contrastive learning are largely redundant with respect to uniformity, they are usually used separately and do not easily work together. For the first time, this paper integrates whitening into the contrastive learning scheme and obtains two benefits. 1) Better uniformity. We find that these two approaches are not totally redundant but actually have some complementarity due to their different uniformity mechanisms. 2) Better alignment. We randomly divide the feature into multiple groups along the channel axis and perform whitening independently within each group. By shuffling the group division, we derive multiple distortions of a single sample and thus increase the positive sample diversity. Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment. Extensive experiments on seven semantic textual similarity tasks show that our method achieves consistent improvement over the contrastive learning baseline and sets a new state of the art, e.g., 78.78% (+2.53% with BERT-base) Spearman correlation on STS tasks.
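Shuffled group whitening can be sketched in pure Python: permute the channels with a seed, split them into equal groups, and whiten each group so its batch covariance is close to the identity. We use Cholesky whitening for brevity (the paper's whitening transform may differ, e.g. ZCA); `groups`, `eps` and the seeding scheme are assumptions. Each different seed yields a different grouping and hence a different whitened "view" of the same batch, which is the source of the extra positives described above.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def whiten(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Cholesky whitening of a batch (rows = samples): output covariance is ~identity."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    xc = [[row[j] - mean[j] for j in range(d)] for row in x]
    cov = [[sum(r[i] * r[j] for r in xc) / n + (eps if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    L = cholesky(cov)
    out = []
    for r in xc:  # forward substitution solves L z = r, i.e. z = L^{-1} r
        z = [0.0] * d
        for i in range(d):
            z[i] = (r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        out.append(z)
    return out


def shuffled_group_whiten(x: Matrix, groups: int, seed: int) -> Matrix:
    """Randomly permute channels, split them into `groups`, whiten each group independently."""
    d = len(x[0])
    perm = list(range(d))
    random.Random(seed).shuffle(perm)
    size = d // groups  # assumes d is divisible by groups
    out = [[0.0] * d for _ in x]
    for g in range(groups):
        idx = perm[g * size:(g + 1) * size]
        sub = whiten([[row[j] for j in idx] for row in x])
        for r, zrow in enumerate(sub):
            for k, j in enumerate(idx):
                out[r][j] = zrow[k]
    return out
```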

Translated: 2023-05-30 16:58:04 | Published: 2023-05-28

# Breaking Language Barriers with a LEAP: Learning Strategies for Polyglot LLMs

arXiv: http://arxiv.org/abs/2305.17740v1

License: see the linked page

Akshay Nambi, Vaibhav Balloli, Mercy Ranjit, Tanuja Ganu, Kabir Ahuja, Sunayana Sitaram, Kalika Bali

Large language models (LLMs) are at the forefront of transforming numerous domains globally. However, their inclusivity and effectiveness remain limited for non-Latin scripts and low-resource languages. This paper tackles the imperative challenge of enhancing the multilingual performance of LLMs, focusing specifically on generative models. Through systematic investigation and evaluation of diverse languages using popular question-answering (QA) datasets, we present novel techniques that unlock the true potential of LLMs in a polyglot landscape. Our approach encompasses three key strategies that yield remarkable improvements in multilingual proficiency. First, by meticulously optimizing prompts tailored for polyglot LLMs, we unlock their latent capabilities, resulting in substantial performance boosts across languages. Second, we introduce a new hybrid approach that synergizes GPT generation with multilingual embeddings and achieves significant multilingual performance improvements on critical tasks like QA and retrieval. Finally, to further propel the performance of polyglot LLMs, we introduce a novel learning algorithm that dynamically selects the optimal prompt strategy, LLM model, and embeddings per query. This dynamic adaptation maximizes the efficacy of LLMs across languages, outperforming the best static and random strategies. Our results show substantial advancements in multilingual understanding and generation across a diverse range of languages.

Translated: 2023-05-30 16:57:34 | Published: 2023-05-28

# Range-Based Equal Error Rate for Spoof Localization

arXiv: http://arxiv.org/abs/2305.17739v1

License: see the linked page

Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi

Spoof localization, also called segment-level detection, is a crucial task that aims to locate spoofs in partially spoofed audio. The equal error rate (EER) is widely used to measure performance in such biometric scenarios. Although EER is the only threshold-free metric, it is usually calculated in a point-based way that uses scores and references with a pre-defined temporal resolution and counts the number of misclassified segments. Such point-based measurement relies heavily on this resolution and may not accurately measure misclassified ranges. To properly measure misclassified ranges and better evaluate spoof localization performance, we upgrade point-based EER to range-based EER. We then adapt the binary search algorithm to calculate range-based EER and compare it with the classical point-based EER. Our analyses suggest that either range-based EER or point-based EER with a proper temporal resolution can fairly and properly evaluate the performance of spoof localization.
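A minimal sketch of a duration-weighted (range-based) EER found by binary search over the threshold, assuming each scored range carries one score and one ground-truth label, and that higher scores mean "more likely spoofed". This is our illustration of the idea, not the authors' algorithm. Because miss and false-alarm rates are step functions of the threshold, the search returns the average at whichever bracket end has the smaller gap between the two rates.

```python
from typing import List, Tuple

# (start_s, end_s, score, is_spoof); a higher score means "more likely spoofed".
Segment = Tuple[float, float, float, bool]


def error_rates(segments: List[Segment], threshold: float) -> Tuple[float, float]:
    """Duration-weighted miss rate (spoof kept as bona fide) and false-alarm rate."""
    spoof_total = sum(e - s for s, e, _, y in segments if y)
    bona_total = sum(e - s for s, e, _, y in segments if not y)
    miss = sum(e - s for s, e, sc, y in segments if y and sc < threshold)
    false_alarm = sum(e - s for s, e, sc, y in segments if not y and sc >= threshold)
    return miss / spoof_total, false_alarm / bona_total


def range_eer(segments: List[Segment], iters: int = 60) -> float:
    """Binary-search the threshold where miss and false-alarm durations cross."""
    lo = min(sc for _, _, sc, _ in segments)
    hi = max(sc for _, _, sc, _ in segments)
    for _ in range(iters):
        mid = (lo + hi) / 2
        miss, fa = error_rates(segments, mid)
        if miss < fa:
            lo = mid  # too many false alarms: raise the threshold
        else:
            hi = mid  # too many misses: lower the threshold
    # Rates are piecewise constant, so pick the bracket end closest to equality.
    best = min((error_rates(segments, t) for t in (lo, hi)),
               key=lambda r: abs(r[0] - r[1]))
    return (best[0] + best[1]) / 2
```

Unlike a point-based EER, no fixed frame resolution is involved: a misclassified range contributes exactly its duration.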

Translated: 2023-05-30 16:57:13 | Published: 2023-05-28

# Entanglement harvesting for different gravitational wave burst profiles with and without memory

arXiv: http://arxiv.org/abs/2305.17735v1

License: see the linked page

Subhajit Barman, Indranil Chakraborty, Sajal Mukherjee

The possibility of entanglement harvesting is a fascinating phenomenon, which is affected by the background geometry, the motion of the detectors, etc. In the present article, we study how different gravitational wave (GW) burst profiles in linearized gravity, with and without asymptotic memory, may influence the harvesting between two static Unruh-DeWitt detectors. To this end, we investigate the following burst profiles: Gaussian, sech-squared, and tanh. Of these, the first two contain no memory, while the last exhibits a non-vanishing memory effect. We find that in all of these cases entanglement harvesting is possible, and that it decreases as the distance between the detectors increases. Moreover, the harvesting differs qualitatively depending on the presence or absence of memory. For the two burst profiles without memory, longer bursts correspond to greater harvesting in the low detector transition energy regime, and this characteristic is reversed for larger transition energies. Meanwhile, for the tanh-type profile with memory, harvesting is always greater for shorter bursts. We briefly discuss some of the consequences of our findings.
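The memory distinction can be made explicit with typical parametrizations of the three profiles in terms of an amplitude $A$, a width $\sigma$ and a retarded-time coordinate $u$. These specific forms are our assumption; the paper may normalize them differently. The memory is the net change $\Delta h = h(+\infty) - h(-\infty)$:

$$
\begin{aligned}
h_{\mathrm{G}}(u) &= A\, e^{-u^{2}/\sigma^{2}}, & \Delta h_{\mathrm{G}} &= 0,\\
h_{\mathrm{sech}}(u) &= A\, \operatorname{sech}^{2}(u/\sigma), & \Delta h_{\mathrm{sech}} &= 0,\\
h_{\tanh}(u) &= A\, \tanh(u/\sigma), & \Delta h_{\tanh} &= 2A.
\end{aligned}
$$

The first two return to zero after the burst, whereas the tanh profile leaves a permanent offset, which is what "non-vanishing memory" refers to.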

Translated: 2023-05-30 16:56:54 | Published: 2023-05-28

# Investigating Pre-trained Audio Encoders in the Low-Resource Condition

arXiv: http://arxiv.org/abs/2305.17733v1

License: see the linked page

Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi

Pre-trained speech encoders have been central to pushing state-of-the-art results across various speech understanding and generation tasks. Nonetheless, the capabilities of these encoders in low-resource settings have yet to be thoroughly explored. To address this, we conduct a comprehensive set of experiments using a representative set of 3 state-of-the-art encoders (Wav2vec2, WavLM, Whisper) in the low-resource setting across 7 speech understanding and generation tasks. We provide various quantitative and qualitative analyses of task performance, convergence speed, and representational properties of the encoders. We observe a connection between the pre-training protocols of these encoders and the way in which they capture information in their internal layers. In particular, we observe that the Whisper encoder exhibits the greatest low-resource capabilities on content-driven tasks in terms of performance and convergence speed.

Translated: 2023-05-30 16:56:36 | Published: 2023-05-28

# Tri-level Joint Natural Language Understanding for Multi-turn Conversational Datasets

arXiv: http://arxiv.org/abs/2305.17729v1

License: see the linked page

Henry Weld, Sijia Hu, Siqu Long, Josiah Poon, Soyeon Caren Han

Natural language understanding typically maps single utterances to a dual-level semantic frame: a sentence-level intent and word-level slot labels. The best-performing models force explicit interaction between intent detection and slot filling. We present a novel tri-level joint natural language understanding approach that adds a domain level and explicitly exchanges semantic information between all levels. This approach enables the use of multi-turn datasets, which are a more natural conversational environment than single utterances. We evaluate our model on two multi-turn datasets, on which we are the first to conduct joint slot filling and intent detection. Our model outperforms state-of-the-art joint models in slot filling and intent detection on multi-turn datasets. We provide an analysis of explicit interaction locations between the layers. We conclude that including domain information improves model performance.

翻訳日:2023-05-30 16:56:19 公開日:2023-05-28

# Learning a Structural Causal Model for Intuition Reasoning in Conversation

http://arxiv.org/abs/2305.17727v1

License: see link

Hang Chen, Bingyu Liao, Jing Luo, Wenjing Zhu, Xinyu Yang

Reasoning, a crucial aspect of NLP research, has not been adequately addressed by prevailing models, including large language models. Conversation reasoning, a critical component of it, remains largely unexplored due to the absence of a well-designed cognitive model. In this paper, inspired by intuition theory on conversation cognition, we develop a conversation cognitive model (CCM) that explains how each utterance receives and activates channels of information recursively. Besides, we algebraically transform CCM into a structural causal model (SCM) under some mild assumptions, rendering it compatible with various causal discovery methods. We further propose a probabilistic implementation of the SCM for utterance-level relation reasoning. By leveraging variational inference, it explores substitutes for implicit causes, addresses the issue of their unobservability, and reconstructs the causal representations of utterances through the evidence lower bounds. Moreover, we construct synthetic and simulated datasets incorporating implicit causes and complete cause labels, alleviating the current situation where all available datasets are implicit-causes-agnostic. Extensive experiments demonstrate that our proposed method significantly outperforms existing methods on synthetic, simulated, and real-world datasets. Finally, we analyze the performance of CCM under latent confounders and propose theoretical ideas for addressing this currently unresolved issue.

Translated: 2023-05-30 16:56:07, published: 2023-05-28

# ConvGenVisMo: Evaluation of Conversational Generative Vision Models

http://arxiv.org/abs/2305.17784v1

License: see link

Narjes Nikzad Khasmakhi, Meysam Asgari-Chenaghlu, Nabiha Asghar, Philipp Schaer, Dietlind Zühlke

Conversational generative vision models (CGVMs) like Visual ChatGPT (Wu et al., 2023) have recently emerged from the synthesis of computer vision and natural language processing techniques. These models enable more natural and interactive communication between humans and machines, because they can understand verbal inputs from users and generate responses in natural language along with visual outputs. To make informed decisions about the usage and deployment of these models, it is important to analyze their performance through a suitable evaluation framework on realistic datasets. In this paper, we present ConvGenVisMo, a framework for the novel task of evaluating CGVMs. ConvGenVisMo introduces a new benchmark evaluation dataset for this task, and also provides a suite of existing and new automated evaluation metrics to evaluate the outputs. All ConvGenVisMo assets, including the dataset and the evaluation code, will be made available publicly on GitHub.

Translated: 2023-05-30 16:48:01, published: 2023-05-28

# Visual Affordance Prediction for Guiding Robot Exploration

http://arxiv.org/abs/2305.17783v1

License: see link

Homanga Bharadhwaj, Abhinav Gupta, Shubham Tulsiani

Motivated by the intuitive understanding humans have about the space of possible interactions, and the ease with which they can generalize this understanding to previously unseen scenes, we develop an approach for learning visual affordances for guiding robot exploration. Given an input image of a scene, we infer a distribution over plausible future states that can be achieved via interactions with it. We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE and show that these models can be trained using large-scale and diverse passive data, and that the learned models exhibit compositional generalization to diverse objects beyond the training distribution. We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.

Translated: 2023-05-30 16:47:48, published: 2023-05-28

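The abstract learns a distribution in the latent space of a VQ-VAE. A VQ-VAE's latents are discrete because each encoder output is snapped to its nearest codebook vector, and that quantization step can be sketched in a few lines. This is a minimal sketch, assuming NumPy. The codebook size and dimension are arbitrary toy values, not the paper's configuration, and the encoder, decoder and Transformer prior are omitted.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector in z (N, D) to its nearest entry in codebook (K, D).

    Returns the discrete code indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance between every latent and every code: shape (N, K)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

# Toy 2-D codebook with three codes (hypothetical sizes)
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
z = np.array([[0.9, 0.1], [0.1, 0.2], [0.2, 0.8]])
idx, zq = quantize(z, codebook)
print(idx)  # [1 0 2]
```

The resulting index sequence is what an autoregressive Transformer prior over VQ-VAE latents would typically model.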
# RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

http://arxiv.org/abs/2305.17782v1

License: see link

Wei Zhou, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney

Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but only simple decoding support, limited to open-vocabulary scenarios. For closed-vocabulary scenarios, public tools supporting lexical-constrained decoding are usually only for classical ASR, or do not support all S2S models. To eliminate this restriction on research possibilities such as modeling unit choice, we present RASR2 in this work, a research-oriented generic S2S decoder implemented in C++. It offers strong flexibility and compatibility for various S2S models, language models, label units/topologies and neural network architectures. It provides efficient decoding for both open- and closed-vocabulary scenarios based on a generalized search framework with rich support for different search modes and settings. We evaluate RASR2 with a wide range of experiments on both the Switchboard and LibriSpeech corpora. Our source code is publicly available online.

Translated: 2023-05-30 16:47:33, published: 2023-05-28

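Closed-vocabulary (lexical-constrained) decoding is commonly implemented with a lexicon prefix tree. The tree restricts each partial hypothesis to label continuations that can still spell a word in the lexicon. The sketch below shows that general idea in Python. It is not RASR2's C++ interface, and the toy lexicon and character-level units are assumptions.

```python
class PrefixTree:
    """Lexicon prefix tree: restricts which labels may follow a partial word."""

    WORD_END = "<word>"  # key marking a complete lexicon entry at a node

    def __init__(self, lexicon):
        self.root = {}
        for word, units in lexicon.items():
            node = self.root
            for u in units:
                node = node.setdefault(u, {})
            node[self.WORD_END] = word

    def _walk(self, prefix):
        """Return the node reached by following `prefix`, or None if it leaves the tree."""
        node = self.root
        for u in prefix:
            if u not in node:
                return None
            node = node[u]
        return node

    def allowed_next(self, prefix):
        """Labels that can extend `prefix` toward some lexicon word."""
        node = self._walk(prefix)
        if node is None:
            return set()
        return {k for k in node if k != self.WORD_END}

    def word_end(self, prefix):
        """The word spelled by `prefix` if it is a complete lexicon entry, else None."""
        node = self._walk(prefix)
        return None if node is None else node.get(self.WORD_END)

# Hypothetical toy lexicon with character units
lexicon = {"cat": list("cat"), "car": list("car"), "dog": list("dog")}
tree = PrefixTree(lexicon)
print(sorted(tree.allowed_next(["c", "a"])))  # ['r', 't']
print(tree.word_end(["d", "o", "g"]))          # dog
```

During beam search, a decoder of this kind would mask out every label not returned by `allowed_next` for each hypothesis.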
# Generating EDU Extracts for Plan-Guided Summary Re-Ranking

http://arxiv.org/abs/2305.17779v1

License: see link

Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Kathleen McKeown, Noémie Elhadad

Two-step approaches, in which summary candidates are generated and then re-ranked to return a single summary, can improve ROUGE scores over the standard single-step approach. Yet, standard decoding methods (i.e., beam search, nucleus sampling, and diverse beam search) produce candidates with redundant and often low-quality content. In this paper, we design a novel method to generate candidates for re-ranking that addresses these issues. We ground each candidate abstract on its own unique content plan and generate distinct plan-guided abstracts using a model's top beam. More concretely, a standard language model (a BART LM) auto-regressively generates elemental discourse unit (EDU) content plans with an extractive copy mechanism. The top K beams from the content plan generator are then used to guide a separate LM, which produces a single abstractive candidate for each distinct plan. We apply an existing re-ranker (BRIO) to abstractive candidates generated from our method, as well as baseline decoding methods. We show large relevance improvements over previously published methods on widely used single-document news article corpora, with ROUGE-2 F1 gains of 0.88, 2.01, and 0.38 on CNN/DailyMail, NYT, and XSum, respectively. A human evaluation on CNN/DM validates these results. Similarly, on 1k samples from CNN/DM, we show that prompting GPT-3 to follow EDU plans outperforms sampling-based methods by 1.05 ROUGE-2 F1 points. Code to generate and realize plans is available at https://github.com/griff4692/edu-sum.

Translated: 2023-05-30 16:47:13, published: 2023-05-28

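The gains above are reported in ROUGE-2 F1, which is the harmonic mean of bigram precision and bigram recall against a reference summary. Below is a simplified sketch of that metric. It assumes lowercased whitespace tokenization and no stemming. The official ROUGE scripts apply their own tokenization and optional stemming, so scores from this sketch will not match published numbers exactly.

```python
from collections import Counter

def bigrams(text):
    """Multiset of adjacent token pairs after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def rouge2_f1(candidate, reference):
    """Simplified ROUGE-2 F1: clipped bigram overlap between candidate and reference."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per bigram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# 3 of the 5 bigrams match ("the cat", "on the", "the mat"), so F1 is about 0.6
print(rouge2_f1("the cat sat on the mat", "the cat lay on the mat"))
```

A reported gain of 0.88 ROUGE-2 F1 points is the difference between two such scores, averaged over the test set and expressed on a 0 to 100 scale.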
# Analyzing Geospatial Distribution in Blockchains

http://arxiv.org/abs/2305.17771v1

License: see link

Shashank Motepalli and Hans-Arno Jacobsen

Blockchains are decentralized, but are they genuinely so? We analyze an often-overlooked but quantifiable dimension of blockchain decentralization: the geospatial distribution of transaction processing. Blockchains bring with them the potential for geospatially distributed transaction processing. They enable validators from geospatially distant locations to partake in consensus protocols; we refer to them as minority validators. Based on our observations, in practice, most validators are geographically concentrated in close proximity. Furthermore, we observed that minority validators tend not to meet the performance requirements and are often misidentified as crash failures. Consequently, they are subject to punishment by jailing (removal from the validator set) and/or slashing (penalty in native tokens). Our emulations, under controlled conditions, demonstrate the same results, raising serious concerns about the potential for the geospatial centralization of validators. To address this, we developed a solution that easily integrates with consensus protocols, and we demonstrated its effectiveness.

Translated: 2023-05-30 16:46:45, published: 2023-05-28

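One simple way to make "geographically concentrated" measurable is to compute each validator's great-circle distance from a reference location and flag those beyond a radius. The haversine formula used below is standard. The coordinate-wise median as reference point, the 3000 km radius and the sample cities are hypothetical choices, not the paper's definition of minority validators.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distant_validators(locations, radius_km=3000.0):
    """Names of validators farther than radius_km from the median location.

    `locations` maps a validator name to (lat, lon). The coordinate-wise median is a
    hypothetical simplification and ignores longitude wrap-around at +/-180 degrees.
    """
    ref_lat = median(lat for lat, _ in locations.values())
    ref_lon = median(lon for _, lon in locations.values())
    return sorted(
        name for name, (lat, lon) in locations.items()
        if haversine_km(lat, lon, ref_lat, ref_lon) > radius_km
    )

validators = {
    "frankfurt": (50.1, 8.7),
    "amsterdam": (52.4, 4.9),
    "paris": (48.9, 2.4),
    "london": (51.5, -0.1),
    "sydney": (-33.9, 151.2),
}
print(distant_validators(validators))  # ['sydney']
```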
# Point-PC: Point Cloud Completion Guided by Prior Knowledge via Causal Inference

http://arxiv.org/abs/2305.17770v1

License: see link

Weizhi Nie, Chuanqi Jiao, Ruidong Chen, Weijie Wang, Bruno Lepri, Nicu Sebe and Anan Liu

Point cloud completion aims to recover raw point clouds captured by scanners from partial observations caused by occlusion and limited view angles. Many approaches utilize a partial-complete paradigm in which missing parts are directly predicted by a global feature learned from partial inputs. This makes it hard to recover details because the global feature is unlikely to capture the full details of all missing parts. In this paper, we propose a novel approach to point cloud completion called Point-PC, which uses a memory network to retrieve shape priors and designs an effective causal inference model to choose missing shape information as additional geometric information to aid point cloud completion. Specifically, we propose a memory operating mechanism where the complete shape features and the corresponding shapes are stored in the form of "key-value" pairs. To retrieve similar shapes from the partial input, we also apply a contrastive learning-based pre-training scheme to transfer features of incomplete shapes into the domain of complete shape features. Moreover, we use backdoor adjustment to get rid of the confounder, which is a part of the shape prior that has the same semantic structure as the partial input. Experimental results on the ShapeNet-55, PCN, and KITTI datasets demonstrate that Point-PC performs favorably against the state-of-the-art methods.

Translated: 2023-05-30 16:46:30, published: 2023-05-28

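The memory described above stores complete-shape features as keys and the corresponding shapes as values, and retrieves shape priors for a partial input. The sketch below illustrates that idea with cosine-similarity top-k retrieval in NumPy. The feature dimension, point count and `k` are assumptions. The contrastive pre-training that maps partial-shape features into the complete-shape feature domain is omitted; the query is assumed to be already aligned.

```python
import numpy as np

class ShapeMemory:
    """Key-value memory: keys are complete-shape features, values are point clouds."""

    def __init__(self, keys, values):
        # L2-normalise keys once so that a dot product equals cosine similarity
        self.keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)
        self.values = values  # (M, P, 3): M stored shapes with P points each

    def retrieve(self, query, k=2):
        """Indices and values of the k stored keys most similar to `query`."""
        q = query / np.linalg.norm(query)
        sims = self.keys @ q
        top = np.argsort(-sims)[:k]
        return top, self.values[top]

keys = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # toy 2-D features
values = np.zeros((3, 128, 3))                          # placeholder point clouds
mem = ShapeMemory(keys, values)
top, priors = mem.retrieve(np.array([0.9, 0.2]), k=2)
print(top)  # [0 2]
```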
# AIMS: All-Inclusive Multi-Level Segmentation

http://arxiv.org/abs/2305.17768v1

License: see link

Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang

Despite the progress of image segmentation toward accurate visual entity segmentation, meeting the diverse requirements of image editing applications for region-of-interest selection at different levels remains unsolved. In this paper, we propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation (two entities with some semantic relationships). We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation. Specifically, we propose task complementarity, association, and prompt mask encoder for three-level predictions. Extensive experiments demonstrate the effectiveness and generalization capacity of our method compared to other state-of-the-art methods on a single dataset or the concurrent work on segmenting anything. We will make our code and trained models publicly available.

Translated: 2023-05-30 16:46:06, published: 2023-05-28

# NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization

http://arxiv.org/abs/2305.17763v1

License: see link

Zhixiang Min, Bingbing Zhuang, Samuel Schulter, Buyu Liu, Enrique Dunn, Manmohan Chandraker

Monocular 3D object localization in driving scenes is a crucial task, but challenging due to its ill-posed nature. Estimating 3D coordinates for each pixel on the object surface holds great potential as it provides dense 2D-3D geometric constraints for the underlying PnP problem. However, high-quality ground truth supervision is not available in driving scenes due to the sparsity and various artifacts of LiDAR data, as well as the practical infeasibility of collecting per-instance CAD models. In this work, we present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering, which further serves as supervision for learning dense object coordinates. Our approach rests on insights into learning a category-level shape prior directly from real driving scenes, while properly handling single-view ambiguities. Furthermore, we study and make critical design choices to learn object coordinates more effectively from an object-centric view. Altogether, our framework leads to a new state of the art in monocular 3D localization, ranking 1st on the KITTI-Object benchmark among published monocular methods.

Translated: 2023-05-30 16:45:51, published: 2023-05-28

# Towards Confidential Computing: A Secure Cloud Architecture for Big Data Analytics and AI

http://arxiv.org/abs/2305.17761v1

License: see link

Naweiluo Zhou, Florent Dufour, Vinzent Bode, Peter Zinterhof, Nicolay J Hammer, Dieter Kranzlmüller

Cloud computing provisions computer resources in a cost-effective way based on demand. It has therefore become a viable solution for big data analytics and artificial intelligence, which have been widely adopted across domain sciences. Data security in certain fields, such as biomedical research, remains a major concern when moving workflows to the cloud, because cloud environments are generally outsourced and therefore more exposed to risks. We present a secure cloud architecture and describe how it enables workflow packaging and scheduling while keeping data, logic and computation secure in transit, in use and at rest.

Translated: 2023-05-30 16:45:28, published: 2023-05-28

# Language Models are Pragmatic Speakers

http://arxiv.org/abs/2305.17760v1

License: see link

Khanh Nguyen

How do language models "think"? This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker, which can characterize the operation of different variants of language models. In particular, we show that large language models fine-tuned with reinforcement learning from human feedback (Ouyang et al., 2022) implement a model of thought that conceptually resembles a fast-and-slow model (Kahneman, 2011). We discuss the limitations of reinforcement learning from human feedback as a fast-and-slow model of thought and propose directions for extending this framework. Overall, our work demonstrates that viewing language models through the lens of cognitive probabilistic modeling can offer valuable insights for understanding, evaluating, and developing them.

Translated: 2023-05-30 16:45:18, published: 2023-05-28

# Tab-CoT: Zero-shot Tabular Chain of Thought

http://arxiv.org/abs/2305.17812v1

License: see link

Ziqi Jin and Wei Lu

Chain-of-thought (CoT) prompting methods have been successful in various natural language processing (NLP) tasks thanks to their ability to unveil the underlying complex reasoning processes. Such reasoning processes typically exhibit implicitly structured steps. Recent efforts have also started investigating methods that encourage more explicitly structured reasoning procedures to be captured. In this work, we propose Tab-CoT, a novel tabular-format CoT prompting method, which allows the complex reasoning process to be explicitly modelled in a highly structured manner. Despite its simplicity, we show that our approach is capable of performing reasoning across multiple dimensions (i.e., both rows and columns). We demonstrate our approach's strong zero-shot and few-shot capabilities through extensive experiments on a range of reasoning tasks.

Translated: 2023-05-30 16:38:48, published: 2023-05-28

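In a tabular CoT prompt, the model receives a table header after the question and fills in one row per reasoning step. The sketch below builds such a zero-shot prompt and parses a generated table. The header `|step|subquestion|process|result|` is recalled from the Tab-CoT paper and should be treated as an assumption. Both helpers and the sample completion are hypothetical, and no language model is called.

```python
# Assumed column header; the exact wording used in the paper may differ.
TAB_COT_HEADER = "|step|subquestion|process|result|"

def build_prompt(question, header=TAB_COT_HEADER):
    """Zero-shot tabular CoT prompt: the question, then a table header for the model to continue."""
    return f"{question}\n{header}\n"

def parse_rows(completion):
    """Split a generated markdown-style table into rows of cells, skipping separator lines."""
    rows = []
    for line in completion.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # separator such as |---|---|
        rows.append(cells)
    return rows

prompt = build_prompt("Tom has 3 apples and buys 2 more. How many does he have?")
# A hypothetical model completion, one table row per reasoning step
completion = "|1|How many apples at first?|given|3|\n|2|How many after buying?|3 + 2|5|"
rows = parse_rows(completion)
print(rows[-1][-1])  # 5
```

Reading the answer from the last cell of the final row is one simple extraction heuristic. It is not necessarily the answer-extraction step used in the paper.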
# On Entanglement Measures: Discrete Phase Space and Inverter-Chain Link Viewpoint

http://arxiv.org/abs/2305.17806v1

License: see link

Felix A. Buot

In contrast to abstract statistical analyses in the literature, we present a concrete physical diagrammatic model of entanglement characterization and measure with its underlying discrete phase-space physics. This paper serves as a pedagogical treatment of the complex subject of entanglement measures. We review the important inherent concurrence property of entangled qubits and underscore its emergent qubit behavior. From the discrete phase-space point of view, concurrence translates to translation symmetry of entangled binary systems in some quantitative measure of entanglement. Although the focus is on bipartite systems, the notion is readily extendable to multipartite systems of qubits, as can easily be deduced from the physical inverter-chain link model. A diagrammatic analysis of the entanglement of formation for any multipartite qubit system is given.

Translated: 2023-05-30 16:38:36, published: 2023-05-28

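Consider a two-qubit pure state $a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle$. For such a state, the concurrence discussed above reduces to $C = 2|ad - bc|$. It is 0 for product states and 1 for maximally entangled (Bell) states. The sketch below checks this numerically. It covers pure states only; mixed states require Wootters' full formula with the spin-flipped density matrix, which is not shown here.

```python
import math

def concurrence_pure(a, b, c, d):
    """Concurrence of the two-qubit pure state a|00> + b|01> + c|10> + d|11> (normalised here)."""
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2)
    a, b, c, d = (x / norm for x in (a, b, c, d))
    return 2 * abs(a * d - b * c)

s = 1 / math.sqrt(2)
print(concurrence_pure(s, 0, 0, s))          # Bell state (|00> + |11>)/sqrt(2): ~1.0
print(concurrence_pure(1, 0, 0, 0))          # product state |00>: 0.0
print(concurrence_pure(0.5, 0.5, 0.5, 0.5))  # |+>|+>, also a product state: 0.0
```

Complex amplitudes also work, because `abs` returns the modulus of a complex number.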
# The Computational Complexity of Single-Player Imperfect-Recall Games

http://arxiv.org/abs/2305.17805v1

License: see link

Emanuel Tewolde, Caspar Oesterheld, Vincent Conitzer, Paul W. Goldberg

We study single-player extensive-form games with imperfect recall, such as the Sleeping Beauty problem or the Absentminded Driver game. For such games, two natural equilibrium concepts have been proposed as alternative solution concepts to ex-ante optimality. One equilibrium concept uses generalized double halving (GDH) as a belief system and evidential decision theory (EDT), and another one uses generalized thirding (GT) as a belief system and causal decision theory (CDT). Our findings relate those three solution concepts of a game to solution concepts of a polynomial maximization problem: global optima, optimal points with respect to subsets of variables and Karush-Kuhn-Tucker (KKT) points. Based on these correspondences, we are able to settle various complexity-theoretic questions on the computation of such strategies. For ex-ante optimality and (EDT,GDH)-equilibria, we obtain NP-hardness and inapproximability, and for (CDT,GT)-equilibria we obtain CLS-completeness results.

Translated: 2023-05-30 16:38:25, published: 2023-05-28

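The Absentminded Driver game shows why ex-ante optimality reduces to maximizing a polynomial. The payoffs below are the standard textbook version: exiting at the first intersection pays 0, exiting at the second pays 4, and continuing past both pays 1. These numbers are assumptions and not necessarily the instances used in the paper. The driver cannot tell the intersections apart, so a strategy is a single continue probability $p$. The ex-ante expected payoff is then $4p(1-p) + p^2 = 4p - 3p^2$, which is maximized at $p = 2/3$ with value $4/3$.

```python
from fractions import Fraction

# Standard Absentminded Driver payoffs (assumed)
EXIT_FIRST, EXIT_SECOND, CONTINUE_BOTH = 0, 4, 1

def ex_ante_payoff(p):
    """Expected payoff when the driver continues with probability p at every intersection."""
    return (1 - p) * EXIT_FIRST + p * (1 - p) * EXIT_SECOND + p * p * CONTINUE_BOTH

# The payoff is the concave quadratic 4p - 3p^2; its derivative 4 - 6p vanishes at p = 2/3.
p_star = Fraction(2, 3)
print(p_star, ex_ante_payoff(p_star))  # 2/3 4/3

# Sanity check with an exact grid search over p in [0, 1]
best = max((Fraction(i, 300) for i in range(301)), key=ex_ante_payoff)
print(best)  # 2/3
```

Exact `Fraction` arithmetic avoids floating-point ties in the grid search. With more decision points or players, the objective becomes a multivariate polynomial, and this is the setting the complexity results above address.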
# Targeted Data Generation: Finding and Fixing Model Weaknesses

http://arxiv.org/abs/2305.17804v1

License: see link

Zexue He, Marco Tulio Ribeiro, Fereshte Khani

Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data, resulting in unfair outcomes and eroding user trust. Additional data collection may not help in addressing these weaknesses, as such challenging subgroups may be unknown to users, and underrepresented in the existing and new data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups, and generates new data for those subgroups using large language models (LLMs) with a human in the loop. TDG estimates the expected benefit and potential harm of data augmentation for each subgroup, and selects the ones most likely to improve within-group performance without hurting overall performance. In our experiments, TDG significantly improves the accuracy on challenging subgroups for state-of-the-art sentiment analysis and natural language inference models, while also improving overall test accuracy.

Translated: 2023-05-30 16:38:02, published: 2023-05-28

# Universal shortcuts to adiabaticity of finite-time and weak processes

http://arxiv.org/abs/2305.17802v1

License: see link

Pierre Nazé

The analytical expression for shortcuts to adiabaticity for any switching time of finite-time and weak processes is presented. It is based on the universal solution of the optimal protocols of weak processes, extended to adiabatic processes by means of the concept of waiting time. Two examples are solved to find such shortcuts: the typical case of an oscillatory relaxation function and the transverse-field quantum Ising chain. Finally, the limitations of the applicability of these shortcuts in quantum annealing are discussed.

Translated: 2023-05-30 16:37:45, published: 2023-05-28

# T2FNorm: Extremely Simple Scaled Train-time Feature Normalization for OOD Detection

http://arxiv.org/abs/2305.17797v1

License: see link

Sudarshan Regmi, Bibek Panthi, Sakar Dotel, Prashnna K. Gyawali, Danail Stoynov, Binod Bhattarai

Neural networks are notorious for being overconfident predictors, posing a significant challenge to their safe deployment in real-world applications. While feature normalization has garnered considerable attention within the deep learning literature, current train-time regularization methods for out-of-distribution (OOD) detection are yet to fully exploit this potential. Indeed, the naive incorporation of feature normalization within neural networks does not guarantee an improvement in OOD detection performance. In this work, we introduce T2FNorm, a novel approach to training neural networks that transforms features to hyperspherical space through normalization, while employing the non-transformed space for OOD-scoring purposes. This method yields a surprising enhancement in OOD detection capabilities without compromising in-distribution (ID) accuracy. Our investigation demonstrates that the proposed technique substantially diminishes the norm of the features of all samples, more so in the case of out-of-distribution samples, thereby addressing the prevalent concern of overconfidence in neural networks. The proposed method also significantly improves various post-hoc OOD detection methods.

Translated: 2023-05-30 16:37:37, published: 2023-05-28

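The abstract separates two feature spaces. During training, features are normalized onto a hypersphere. During OOD scoring, the untransformed features are used. The NumPy sketch below shows that split. Several details are assumptions rather than taken from the paper: the scaling by $1/\tau$ with $\tau = 0.1$, the linear classifier, and maximum softmax probability (MSP) as the OOD score. Training itself is not shown.

```python
import numpy as np

def t2f_normalize(features, tau=0.1):
    """Train-time transform: project features onto a hypersphere of radius 1/tau."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / (tau * np.maximum(norms, 1e-12))

def logits(features, W, b):
    """Linear classifier head."""
    return features @ W + b

def msp_score(features, W, b):
    """Maximum softmax probability on *untransformed* features (higher = more in-distribution)."""
    z = logits(features, W, b)
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                 # toy penultimate-layer features
W, b = rng.normal(size=(8, 3)), np.zeros(3)     # toy 3-class head

train_logits = logits(t2f_normalize(feats), W, b)  # would feed cross-entropy during training
scores = msp_score(feats, W, b)                     # OOD scores computed without the transform
print(np.linalg.norm(t2f_normalize(feats), axis=1))  # every row has norm ~10 (= 1/tau)
```

Because training only ever sees normalized features, the raw feature norm stays free to differ between ID and OOD inputs at test time. The abstract reports that this difference is exploited.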
# Kohn-Sham computation and the bivariate view of density functional theory

http://arxiv.org/abs/2305.17795v1

License: see link

Paul E. Lammert

Informed by an abstraction of Kohn-Sham computation called a KS machine, a functional analytic perspective is developed on mathematical aspects of density functional theory. A natural semantics for the machine is bivariate, consisting of a sequence of potentials paired with a ground density. Although the question of when the KS machine can converge to a solution (where the potential component matches a designated target) is not resolved here, a number of related ones are. For instance: Can the machine progress toward a solution? Barring presumably exceptional circumstances, yes in an energetic sense, but using a potential-mixing scheme rather than the usual density-mixing variety. Are energetic and function space distance notions of proximity-to-solution commensurate? Yes, to a significant degree. If the potential components of a sequence of ground pairs converge to a target density, do the density components cluster on ground densities thereof? Yes, barring particle number drifting to infinity.

Translated: 2023-05-30 16:37:18, published: 2023-05-28

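The abstract contrasts potential mixing with the usual density mixing. The simplest potential-mixing scheme is linear mixing, $v_{n+1} = (1-\alpha)\,v_n + \alpha\,F(v_n)$, where $F$ maps an input potential to the output potential of one self-consistency cycle. The sketch below applies linear mixing to a hypothetical linear toy map standing in for $F$. It is a generic illustration, not the paper's scheme. The mixing parameter $\alpha = 0.3$ is an arbitrary assumption, and it is chosen so that mixing converges here even though plain iteration $v \leftarrow F(v)$ diverges.

```python
import numpy as np

def linear_potential_mixing(F, v0, alpha=0.3, tol=1e-10, max_iter=500):
    """Iterate v <- (1 - alpha) v + alpha F(v) until the residual F(v) - v is small."""
    v = np.asarray(v0, dtype=float)
    for n in range(max_iter):
        residual = F(v) - v
        if np.linalg.norm(residual) < tol:
            return v, n
        v = v + alpha * residual  # identical to (1 - alpha) v + alpha F(v)
    raise RuntimeError("mixing did not converge")

# Hypothetical toy self-consistency map with fixed point v_star
v_star = np.array([1.0, -2.0])
A = np.array([[-1.5, 0.2], [0.0, 0.5]])  # eigenvalue -1.5: plain iteration v <- F(v) diverges

def toy_ks_step(v):
    """Stand-in for one Kohn-Sham cycle (potential -> ground density -> new potential)."""
    return v_star + A @ (v - v_star)

v, iters = linear_potential_mixing(toy_ks_step, np.zeros(2))
print(v, iters)  # v is close to [1, -2]
```

With $\alpha = 0.3$, the error is multiplied by $(1-\alpha)I + \alpha A$ at each step. That matrix has eigenvalues 0.25 and 0.85, both inside the unit circle, which is why the damped iteration converges.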
# LowDINO -- A Low Parameter Self Supervised Learning Model

http://arxiv.org/abs/2305.17791v1

License: see link

Sai Krishna Prathapaneni, Shvejan Shashank and Srikar Reddy K

This research aims to explore the possibility of designing a neural network architecture that allows for small networks to adopt the properties of huge networks, which have shown success in self-supervised learning (SSL), for all the downstream tasks like image classification, segmentation, etc. Previous studies have shown that using convolutional neural networks (ConvNets) can provide inherent inductive bias, which is crucial for learning representations in deep learning models. To reduce the number of parameters, attention mechanisms are utilized through the usage of MobileViT blocks, resulting in a model with less than 5 million parameters. The model is trained using self-distillation with momentum encoder and a student-teacher architecture is also employed, where the teacher weights use vision transformers (ViTs) from recent SOTA SSL models. The model is trained on the ImageNet1k dataset. This research provides an approach for designing smaller, more efficient neural network architectures that can perform SSL tasks comparable to heavy models

Translated: 2023-05-30 16:37:02 Published: 2023-05-28
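
The LowDINO abstract mentions self-distillation with a momentum encoder. As a minimal sketch, assuming the common DINO-style convention in which teacher parameters track an exponential moving average (EMA) of student parameters, the update looks like this. The function name `ema_update`, the dict-of-arrays parameter representation, and the momentum value 0.996 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.996):
    """Momentum-encoder update: teacher <- m * teacher + (1 - m) * student.

    `teacher` and `student` are dicts mapping parameter names to arrays
    (a hypothetical stand-in for real model parameters). Updated in place.
    """
    for name, s_param in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * s_param
    return teacher

# Toy usage: a single scalar "weight".
teacher = {"w": np.array([1.0])}
student = {"w": np.array([0.0])}
ema_update(teacher, student, momentum=0.9)
print(teacher["w"])  # [0.9]
```

A momentum close to 1 makes the teacher change slowly, which is what gives the momentum encoder its stabilizing effect on the distillation targets.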

# Real-time Object Detection: YOLOv1 Re-Implementation in PyTorch

Real-time Object Detection: YOLOv1 Re-Implementation in PyTorch ( http://arxiv.org/abs/2305.17786v1 )

License: see link

Michael Shenoda

Real-time object detection is a crucial problem to solve when it comes to computer vision systems that need to make appropriate decisions based on detections in a timely manner. I chose the YOLO v1 architecture and implemented it using the PyTorch framework, with the goal of becoming familiar with the entire object detection pipeline. I attempted different techniques to modify the original architecture to improve the results. Finally, I compare the metrics of my implementation to the original.

Translated: 2023-05-30 16:36:44 Published: 2023-05-28
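
Object detectors in the YOLO family, including YOLOv1, rely on intersection over union (IoU) between predicted and ground-truth boxes, both when assigning which predictor is responsible for an object during training and in evaluation metrics such as mAP. A minimal sketch of IoU follows. It assumes boxes in `(x_min, y_min, x_max, y_max)` corner format; YOLOv1 predicts a box center plus width and height, so a conversion helper is included. Neither function is taken from the author's implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes have zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def center_to_corners(cx, cy, w, h):
    """Convert a YOLO-style (center x, center y, width, height) box to corner format."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7 -> 0.142857...
```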

# Lighting- and Rotation-Invariant Real-time Vehicle Wheel Detector Based on YOLOv5

Lighting and Rotation Invariant Real-time Vehicle Wheel Detector based on YOLOv5 ( http://arxiv.org/abs/2305.17785v1 )

License: see link

Michael Shenoda

Creating an object detector in computer vision poses some common challenges when it is initially developed based on a Convolutional Neural Network (CNN) architecture. These challenges are more apparent when creating a model that needs to adapt to images captured under various camera orientations, lighting conditions, and environmental changes. Obtaining initial training samples that cover all these conditions can be an enormous challenge, with a time and cost burden. While the problem can exist when creating any type of object detector, some object types are less common and have no publicly available pre-labeled image datasets. Sometimes public datasets are neither reliable nor comprehensive for a rare object type. The vehicle wheel is one such example, chosen to demonstrate an approach for creating a lighting- and rotation-invariant real-time detector based on the YOLOv5 architecture. The objective is to provide a simple approach that could be used as a reference for developing other types of real-time object detectors.

Translated: 2023-05-30 16:36:37 Published: 2023-05-28

# Counter-Hypothetical Particle Filters for Single Object Pose Tracking

Counter-Hypothetical Particle Filters for Single Object Pose Tracking ( http://arxiv.org/abs/2305.17828v1 )

License: see link

Elizabeth A. Olson, Jana Pavlasek, Jasmine A. Berry, Odest Chadwicke Jenkins

Particle filtering is a common technique for six degree of freedom (6D) pose estimation due to its ability to tractably represent belief over object pose. However, the particle filter is prone to particle deprivation due to the high-dimensional nature of 6D pose. When particle deprivation occurs, it can cause mode collapse of the underlying belief distribution during importance sampling. If the region surrounding the true state suffers from mode collapse, recovering its belief is challenging since the area is no longer represented in the probability mass formed by the particles. Previous methods mitigate this problem by randomizing and resetting particles in the belief distribution, but determining the frequency of reinvigoration has relied on hand-tuning abstract heuristics. In this paper, we estimate the necessary reinvigoration rate at each time step by introducing a Counter-Hypothetical likelihood function, which is used alongside the standard likelihood. Inspired by the notions of plausibility and implausibility from Evidential Reasoning, the addition of our Counter-Hypothetical likelihood function assigns a level of doubt to each particle. The competing cumulative values of confidence and doubt across the particle set are used to estimate the level of failure within the filter, in order to determine the portion of particles to be reinvigorated. We demonstrate the effectiveness of our method on the rigid body object 6D pose tracking task.

Translated: 2023-05-30 16:29:12 Published: 2023-05-28
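
The abstract above describes using competing cumulative confidence and doubt across the particle set to decide what fraction of particles to reinvigorate. The sketch below illustrates only that control flow, under loudly stated assumptions: the "doubt" score (one minus the Gaussian likelihood) and the reinvigoration fraction doubt / (confidence + doubt) are hypothetical stand-ins, not the paper's Counter-Hypothetical likelihood or its failure estimate, and a 1D state replaces the 6D pose for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood(particles, observation, sigma=0.5):
    # Standard Gaussian measurement likelihood (1D stand-in for a 6D pose).
    return np.exp(-0.5 * ((particles - observation) / sigma) ** 2)

def counter_likelihood(particles, observation, sigma=0.5):
    # HYPOTHETICAL "doubt" score: high where the standard likelihood is low.
    # The paper's Counter-Hypothetical likelihood is not reproduced here.
    return 1.0 - likelihood(particles, observation, sigma)

def filter_step(particles, observation, low, high):
    conf = likelihood(particles, observation)
    doubt = counter_likelihood(particles, observation)
    # ASSUMED failure estimate: share of doubt in the total confidence + doubt.
    reinvig_frac = doubt.sum() / (conf.sum() + doubt.sum())
    # Importance resampling with the standard likelihood (uniform if all weights vanish).
    n = len(particles)
    total = conf.sum()
    weights = conf / total if total > 0 else np.full(n, 1.0 / n)
    particles = particles[rng.choice(n, size=n, p=weights)]
    # Reinvigorate: replace that fraction of particles with uniform random samples.
    n_new = int(round(reinvig_frac * n))
    if n_new > 0:
        idx = rng.choice(n, size=n_new, replace=False)
        particles[idx] = rng.uniform(low, high, size=n_new)
    return particles, reinvig_frac

particles = rng.uniform(-10.0, 10.0, size=200)
particles, frac = filter_step(particles, observation=3.0, low=-10.0, high=10.0)
```

The point of the design is that reinvigoration is driven by a per-step estimate rather than a hand-tuned fixed rate: when the particles agree with the observation, almost none are replaced, and when they have all drifted away, almost all are.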

# NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models ( http://arxiv.org/abs/2305.17826v1 )

License: see link

Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma

Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks. In this work, we propose transferable backdoor attacks against prompt-based models, called NOTABLE, which is independent of downstream tasks and prompting strategies. Specifically, NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words (i.e., anchors). It activates the backdoor by pasting input with triggers to reach adversary-desired anchors, achieving independence from downstream tasks and prompting strategies. We conduct experiments on six NLP tasks, three popular models, and three prompting strategies. Empirical results show that NOTABLE achieves superior attack performance (i.e., attack success rate over 90% on all the datasets), and outperforms two state-of-the-art baselines. Evaluations on three defenses show the robustness of NOTABLE. Our code can be found at https://github.com/RU-System-Software-and-Security/Notable.

Translated: 2023-05-30 16:28:50 Published: 2023-05-28

# Multinomial Logistic Regression: Asymptotic Normality on Null Covariates in High Dimensions

Multinomial Logistic Regression: Asymptotic Normality on Null Covariates in High-Dimensions ( http://arxiv.org/abs/2305.17825v1 )

License: see link

Kai Tan and Pierre C. Bellec

This paper investigates the asymptotic distribution of the maximum-likelihood estimate (MLE) in multinomial logistic models in the high-dimensional regime where dimension and sample size are of the same order. While classical large-sample theory provides asymptotic normality of the MLE under certain conditions, such classical results are expected to fail in high dimensions, as documented for the binary logistic case in the seminal work of Sur and Candès [2019]. We address this issue in classification problems with 3 or more classes by developing asymptotic normality and asymptotic chi-square results for the multinomial logistic MLE (also known as the cross-entropy minimizer) on null covariates. Our theory leads to a new methodology to test the significance of a given feature. Extensive simulation studies on synthetic data corroborate these asymptotic results and confirm the validity of the proposed p-values for testing the significance of a given feature.

Translated: 2023-05-30 16:28:28 Published: 2023-05-28

# Analysis of ROC for Edge Detectors

Analysis of ROC for Edge Detectors ( http://arxiv.org/abs/2305.17820v1 )

License: see link

Kai Yi Ji

This paper presents an evaluation of edge detectors using receiver operating characteristic (ROC) analysis on the BIPED dataset. Our study examines the benefits and drawbacks of applying this technique in Matlab. We observed that while ROC analysis is suitable for certain edge filters, filters such as the Laplacian, the Laplacian of Gaussian, and Canny present challenges when their performance is measured using ROC metrics. To address this issue, we introduce customization techniques that enhance the performance of these filters and enable a more accurate evaluation. Through these customizations, we achieved improved results, ultimately facilitating a comprehensive assessment of the edge detectors.

Translated: 2023-05-30 16:28:09 Published: 2023-05-28
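
ROC analysis of an edge detector, as described in the abstract above, thresholds an edge-strength map at many levels and compares each binary result with a ground-truth edge map (such as those in BIPED), recording the true-positive rate (TPR) and false-positive rate (FPR) at each threshold. The study itself used Matlab; the NumPy sketch below is only an illustrative equivalent. It assumes pixel-exact matching, whereas edge benchmarks often allow a small localization tolerance.

```python
import numpy as np

def roc_points(edge_strength, ground_truth, thresholds):
    """Return (FPR, TPR) arrays for an edge-strength map against a binary ground-truth map."""
    gt = ground_truth.astype(bool)
    positives = gt.sum()
    negatives = (~gt).sum()
    fpr, tpr = [], []
    for t in thresholds:
        pred = edge_strength >= t  # pixels declared "edge" at this threshold
        tpr.append(np.logical_and(pred, gt).sum() / positives)
        fpr.append(np.logical_and(pred, ~gt).sum() / negatives)
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Trapezoidal area under the ROC curve, sorting by FPR and breaking ties by TPR."""
    order = np.lexsort((tpr, fpr))  # last key (fpr) is the primary sort key
    x, y = fpr[order], tpr[order]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

# Toy 2x2 "image": the left column is edge, the right column is background.
strength = np.array([[0.9, 0.1], [0.8, 0.2]])
truth = np.array([[1, 0], [1, 0]])
fpr, tpr = roc_points(strength, truth, thresholds=[0.0, 0.5, 1.0])
print(fpr, tpr, auc(fpr, tpr))
```

Choosing thresholds that span the full range of edge strengths ensures the curve reaches both (0, 0) and (1, 1). Detectors that output an already-binarized map (for example, Canny after hysteresis) yield only a single ROC point, which is one reason such filters are awkward to evaluate this way.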

# Large Language Models, Scientific Knowledge and Factuality: A Systematic Analysis in Antibiotic Discovery

Large Language Models, scientific knowledge and factuality: A systematic analysis in antibiotic discovery ( http://arxiv.org/abs/2305.17819v1 )

License: see link

Magdalena Wysocka, Oskar Wysocki, Maxime Delmas, Vincent Mutel, Andre Freitas

Inferring over and extracting information from Large Language Models (LLMs) trained on a large corpus of scientific literature can potentially drive a new era in biomedical research, reducing the barriers to accessing existing medical evidence. This work examines the potential of LLMs for dialoguing with biomedical background knowledge, using the context of antibiotic discovery as an exemplar motivational scenario. The context of biomedical discovery from natural products entails understanding the relational evidence between an organism, an associated chemical, and its associated antibiotic properties. We provide a systematic assessment of the ability of LLMs to encode and express these relations, evaluating the fluency, prompt alignment, semantic coherence, factual knowledge, and specificity of the generated responses. The systematic analysis is applied to nine state-of-the-art models (including ChatGPT and GPT-4) in two prompting-based tasks: chemical compound definition generation and chemical compound-fungus relation determination. Results show that while recent models have improved in fluency, factual accuracy is still low and models are biased towards over-represented entities. The ability of LLMs to serve as biomedical knowledge bases is questioned, and the need for additional systematic evaluation frameworks is highlighted. The best-performing model, GPT-4, produced a factual definition for 70% of chemical compounds and 43.6% factual relations to fungi, whereas the best open-source model, BioGPT-large, did so for 30% of the compounds and 30% of the relations with the best-performing prompt. The results show that while LLMs are currently not fit for purpose as biomedical factual knowledge bases, there is a promising emerging property in the direction of factuality as models become domain-specialised and scale up in size and level of human feedback.

Translated: 2023-05-30 16:27:50 Published: 2023-05-28

# Transfer Learning for Power Outage Detection Task with Limited Training Data

Transfer Learning for Power Outage Detection Task with Limited Training Data ( http://arxiv.org/abs/2305.17817v1 )

License: see link

Olukunle Owolabi

Early detection of power outages is crucial for maintaining a reliable power distribution system. This research investigates the use of transfer learning and language models in detecting outages with limited labeled data. By leveraging pretraining and transfer learning, models can generalize to unseen classes. Using a curated balanced dataset of social media tweets related to power outages, we conducted experiments using zero-shot and few-shot learning. Our hypothesis is that Language Models pretrained with limited data could achieve high performance in outage detection tasks over baseline models. Results show that while classical models outperform zero-shot Language Models, few-shot fine-tuning significantly improves their performance. For example, with 10% fine-tuning, BERT achieves 81.3% accuracy (+15.3%), and GPT achieves 74.5% accuracy (+8.5%). This has practical implications for analyzing and localizing outages in scenarios with limited data availability. Our evaluation provides insights into the potential of few-shot fine-tuning with Language Models for power outage detection, highlighting their strengths and limitations. This research contributes to the knowledge base of leveraging advanced natural language processing techniques for managing critical infrastructure.

Translated: 2023-05-30 16:26:49 Published: 2023-05-28

# Josephson Parametric Amplifier with Chebyshev Gain Profile and High Saturation

Josephson parametric amplifier with Chebyshev gain profile and high saturation ( http://arxiv.org/abs/2305.17816v1 )

License: see link

Ryan Kaufman, Theodore White, Mark I. Dykman, Andrea Iorio, George Stirling, Sabrina Hong, Alex Opremcak, Andreas Bengtsson, Lara Faoro, Joseph C. Bardin, Tim Burger, Robert Gasca, Ofer Naaman

We demonstrate a Josephson parametric amplifier design with a band-pass impedance matching network based on a third-order Chebyshev prototype. We measured eight amplifiers operating at 4.6 GHz that exhibit gains of 20 dB with less than 1 dB gain ripple and up to 500 MHz bandwidth. The amplifiers further achieve high output saturation powers around -73 dBm, enabled by the use of rf-SQUID arrays as their nonlinear element. We characterize the system readout efficiency and its signal-to-noise ratio near saturation using a Sycamore processor, finding the data consistent with near-quantum-limited noise performance of the amplifiers. In addition, we measure the amplifiers' intermodulation distortion in two-tone experiments as a function of input power and inter-tone detuning, and observe excess distortion at small detuning with a pronounced dip as a function of signal power, which we interpret in terms of power-dependent dielectric losses.

Translated: 2023-05-30 16:26:20 Published: 2023-05-28
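
The amplifier's matching network is described above as based on a third-order Chebyshev prototype. For reference, the sketch below computes the standard low-pass Chebyshev prototype element values g_1 ... g_{n+1} using the textbook closed-form recursion found in classic filter-design references. The 0.5 dB ripple in the example is an arbitrary assumption (the prototype's ripple parameter is not the same quantity as the amplifier's measured gain ripple), and the band-pass transformation and JPA-specific design from the paper are not reproduced.

```python
import math

def chebyshev_prototype(n, ripple_db):
    """Element values g_1..g_{n+1} of an n-th order Chebyshev low-pass prototype.

    Assumes a normalized source g_0 = 1; ripple_db is the passband ripple in dB.
    """
    # beta = ln(coth(L_Ar / 17.37)), where 17.37 ~= 40 / ln(10).
    beta = math.log(1.0 / math.tanh(ripple_db / (40.0 / math.log(10.0))))
    gamma = math.sinh(beta / (2 * n))
    a = [math.sin((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]
    b = [gamma ** 2 + math.sin(k * math.pi / n) ** 2 for k in range(1, n + 1)]
    g = [2 * a[0] / gamma]  # g_1
    for k in range(2, n + 1):
        # g_k = 4 a_{k-1} a_k / (b_{k-1} g_{k-1}), with 0-based list indices.
        g.append(4 * a[k - 2] * a[k - 1] / (b[k - 2] * g[k - 2]))
    # Load termination: 1 for odd n, coth^2(beta / 4) for even n.
    g.append(1.0 if n % 2 == 1 else (1.0 / math.tanh(beta / 4)) ** 2)
    return g

print([round(x, 4) for x in chebyshev_prototype(3, 0.5)])  # [1.5963, 1.0967, 1.5963, 1.0]
```

Odd-order Chebyshev prototypes are symmetric (g_1 = g_3 here) and terminate in a load equal to the source, which is one practical reason third-order designs are convenient.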

# Efficient Quantum Work Reservoirs at the Nanoscale

Efficient Quantum Work Reservoirs at the Nanoscale ( http://arxiv.org/abs/2305.17815v1 )

License: see link

Jinghao Lyu and Alexander B. Boyd and James P. Crutchfield

When reformulated as a resource theory, thermodynamics can analyze system behaviors in the single-shot regime. In this regime, the work required to implement state transitions is bounded by α-Rényi divergences, so the single-shot framework identifies efficient operations differently from stochastic thermodynamics. Thus, a detailed understanding of the difference between stochastic thermodynamics and resource-theoretic thermodynamics is needed. To this end, we study reversibility in the single-shot regime, generalizing the two-level work reservoirs used there to multi-level work reservoirs. This achieves reversibility in any transition in the single-shot regime. Building on this, we systematically explore multi-level work reservoirs in the nondissipation regime with and without catalysts. The resource-theoretic results show that two-level work reservoirs undershoot Landauer's bound, misleadingly implying energy dissipation during computation. In contrast, we demonstrate that multi-level work reservoirs achieve Landauer's bound and produce zero entropy.

Translated: 2023-05-30 16:26:03 Published: 2023-05-28
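
The abstract above refers to α-Rényi divergences bounding single-shot work and to Landauer's bound. As a minimal sketch restricted to classical (diagonal) distributions, the code below evaluates D_α(p‖q) = (1/(α−1)) log Σ_i p_i^α q_i^(1−α), including the α → 1 (Kullback-Leibler) and α → ∞ (log max_i p_i/q_i) limits, and the Landauer cost k_B T ln 2 of erasing one bit. It does not reproduce the paper's multi-level work-reservoir construction or its specific bounds.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence D_alpha(p || q) in nats.

    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 are skipped.
    """
    pairs = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    if alpha == 1:  # Kullback-Leibler limit
        return sum(pi * math.log(pi / qi) for pi, qi in pairs)
    if math.isinf(alpha):  # max-divergence limit
        return math.log(max(pi / qi for pi, qi in pairs))
    return math.log(sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in pairs)) / (alpha - 1)

def landauer_bound(temperature_k):
    """Minimum average work (J) to erase one bit at temperature T."""
    return K_B * temperature_k * math.log(2)

# Degenerate qubit: target pure state [1, 0] vs. the uniform (Gibbs) state.
for a in (0.5, 1, 2, float("inf")):
    print(a, renyi_divergence([1.0, 0.0], [0.5, 0.5], a))  # ln 2 for every alpha
print(landauer_bound(300.0))  # about 2.87e-21 J
```

For a degenerate qubit every D_α between the pure state and the uniform state equals ln 2, so multiplying by k_B T recovers Landauer's bound. For non-degenerate energy levels the different α values generally disagree, which is where single-shot and average-case accounting part ways.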

# Knowledge-Design: Pushing the Limit of Protein Design via Knowledge Refinement

Knowledge-Design: Pushing the Limit of Protein Design via Knowledge Refinement ( http://arxiv.org/abs/2305.15151v3 )

License: see link

Zhangyang Gao, Cheng Tan, Stan Z. Li

Recent studies have shown competitive performance in protein design that aims to find the amino acid sequence folding into the desired structure. However, most of them disregard the importance of predictive confidence, fail to cover the vast protein space, and do not incorporate common protein knowledge. After witnessing the great success of pretrained models on diverse protein-related tasks and the fact that recovery is highly correlated with confidence, we wonder whether this knowledge can push the limits of protein design further. As a solution, we propose a knowledge-aware module that refines low-quality residues. We also introduce a memory-retrieval mechanism to save more than 50% of the training time. We extensively evaluate our proposed method on the CATH, TS50, and TS500 datasets and our results show that our Knowledge-Design method outperforms the previous PiFold method by approximately 9% on the CATH dataset. Specifically, Knowledge-Design is the first method that achieves 60+% recovery on CATH, TS50 and TS500 benchmarks. We also provide additional analysis to demonstrate the effectiveness of our proposed method. The code will be publicly available.

Translated: 2023-05-30 11:18:06 Published: 2023-05-28
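
Sequence recovery, the headline metric in the Knowledge-Design abstract above, is commonly computed as the fraction of positions where the designed residue matches the native one. The following is a minimal sketch under that common definition. Aggregating per-protein scores with a median is one convention used in the protein-design literature and is an assumption here; the paper may aggregate differently.

```python
from statistics import median

def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native residue."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty, aligned, and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

def benchmark_recovery(pairs):
    """Median per-protein recovery over (designed, native) pairs (assumed convention)."""
    return median(sequence_recovery(d, n) for d, n in pairs)

print(sequence_recovery("MKTAYIAK", "MKTAHIAR"))  # 6 of 8 positions match -> 0.75
```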

# Multi-State RNA Design with Geometric Multi-Graph Neural Networks

Multi-State RNA Design with Geometric Multi-Graph Neural Networks ( http://arxiv.org/abs/2305.14749v3 )

License: see link

Chaitanya K. Joshi, Arian R. Jamasb, Ramon Viñas, Charles Harris, Simon Mathis, Pietro Liò

Computational RNA design has broad applications across synthetic biology and therapeutic development. Fundamental to the diverse biological functions of RNA is its conformational flexibility, enabling single sequences to adopt a variety of distinct 3D states. Currently, computational biomolecule design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired structural conformation. In this work, we propose gRNAde, a geometric RNA design pipeline that operates on sets of 3D RNA backbone structures to explicitly account for and reflect RNA conformational diversity in its designs. We demonstrate the utility of gRNAde for improving native sequence recovery over single-state approaches on a new large-scale 3D RNA design dataset, especially for multi-state and structurally diverse RNAs. Our code is available at https://github.com/chaitjo/geometric-rna-design

Translated: 2023-05-30 11:17:47 Published: 2023-05-28

# Fast Online Node Labeling for Very Large Graphs

Fast Online Node Labeling for Very Large Graphs ( http://arxiv.org/abs/2305.16257v2 )

License: see link

Baojian Zhou, Yifan Sun, Reza Babanezhad

This paper studies the online node classification problem under a transductive learning setting. Current methods either invert a graph kernel matrix with $\mathcal{O}(n^3)$ runtime and $\mathcal{O}(n^2)$ space complexity or sample a large volume of random spanning trees, and are thus difficult to scale to large graphs. In this work, we propose an improvement based on the \textit{online relaxation} technique introduced by a series of works (Rakhlin et al., 2012; Rakhlin and Sridharan, 2015; 2017). We first prove an effective regret of $\mathcal{O}(\sqrt{n^{1+\gamma}})$ when suitable parameterized graph kernels are chosen, and then, based on this relaxation, propose an approximate algorithm, FastONL, enjoying $\mathcal{O}(k\sqrt{n^{1+\gamma}})$ regret. The key to FastONL is a \textit{generalized local push} method that effectively approximates inverse matrix columns and applies to a series of popular kernels. Furthermore, the per-prediction cost is $\mathcal{O}(\mathrm{vol}(\mathcal{S})\log(1/\epsilon))$, which depends only locally on the graph, with linear memory cost. Experiments show that our scalable method enjoys a better tradeoff between local and global consistency.

Translated: 2023-05-30 11:09:39 Published: 2023-05-28
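
The key ingredient of FastONL is described above as a generalized local push method that approximates columns of an inverse matrix. As background only, the sketch below implements the classic local push for approximate personalized PageRank, which approximates the column α(I − (1−α)AD⁻¹)⁻¹e_s while touching only nodes near the seed s. This is the well-known non-generalized variant, not the paper's method, and the graph, α, and ε values are illustrative assumptions.

```python
from collections import deque

def local_push_ppr(adj, seed, alpha=0.15, eps=1e-6):
    """Approximate personalized PageRank by local push.

    adj: dict node -> list of neighbours (undirected graph, no isolated nodes).
    Returns a sparse dict p approximating alpha * (I - (1 - alpha) A D^{-1})^{-1} e_seed.
    Invariant: exact vector = p + alpha * (I - (1 - alpha) A D^{-1})^{-1} r.
    """
    p = {}
    r = {seed: 1.0}  # residual mass still to be distributed
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg = len(adj[u])
        ru = r.get(u, 0.0)
        if ru <= eps * deg:
            continue  # residual already below threshold
        p[u] = p.get(u, 0.0) + alpha * ru  # settle an alpha share at u
        r[u] = 0.0
        share = (1 - alpha) * ru / deg  # spread the rest evenly to neighbours
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            # Enqueue v only when its residual first crosses the threshold.
            if old <= eps * len(adj[v]) < r[v]:
                queue.append(v)
    return p

path = {0: [1], 1: [0, 2], 2: [1]}
print(local_push_ppr(path, seed=0, alpha=0.15, eps=1e-10))
```

Because every push settles an α fraction of the residual it touches, the total work is bounded independently of the graph size, and on termination each residual r_u is at most ε·d_u. This locality is the property that makes push-style approximations attractive for very large graphs.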

# ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition ( http://arxiv.org/abs/2305.16065v2 )

License: see link

Yuanchao Li, Zeyu Zhao, Ondrej Klejch, Peter Bell, Catherine Lai

In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to address their inherent variability. However, the reliance on human-annotated text in most research hinders the development of practical SER systems. To overcome this challenge, we investigate how Automatic Speech Recognition (ASR) performs on emotional speech by analyzing the ASR performance on emotion corpora and examining the distribution of word errors and confidence scores in ASR transcripts to gain insight into how emotion affects ASR. We utilize four ASR systems, namely Kaldi ASR, wav2vec2, Conformer, and Whisper, and three corpora: IEMOCAP, MOSI, and MELD to ensure generalizability. Additionally, we conduct text-based SER on ASR transcripts with increasing word error rates to investigate how ASR affects SER. The objective of this study is to uncover the relationship and mutual impact of ASR and SER, in order to facilitate ASR adaptation to emotional speech and the use of SER in the real world.

Translated: 2023-05-30 11:08:43 Published: 2023-05-28
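
The study above runs text-based SER on ASR transcripts with increasing word error rates (WER). WER is the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and a reference, divided by the number of reference words. The sketch below uses plain whitespace tokenization, which is an assumption: real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("i am so happy today", "i am happy to day"))  # 3 edits / 5 words = 0.6
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, since the denominator counts only reference words.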

# Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers ( http://arxiv.org/abs/2305.15805v2 )

License: see link

Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann

Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational requirements during inference. Our method employs a learnable mechanism that determines which uninformative tokens can be dropped from the context at any point across the generation process. By doing so, our approach not only addresses performance concerns but also enhances interpretability, providing valuable insight into the model's decision-making process. Our technique can be applied to existing pre-trained models through a straightforward fine-tuning process, and the pruning strength can be specified by a sparsity parameter. Notably, our empirical findings demonstrate that we can effectively prune up to 80% of the context without significant performance degradation on downstream tasks, offering a valuable tool for mitigating inference costs. Our reference implementation achieves up to $2\times$ increase in inference throughput and even greater memory savings.

Translated: 2023-05-30 11:06:48 Published: 2023-05-28
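
The abstract above describes a learnable mechanism that decides which tokens can be dropped from the context during generation. The NumPy sketch below shows only the mechanical consequence of such a decision: once tokens are dropped from a key/value cache, later queries attend over the surviving tokens, so memory and attention cost shrink. The drop rule here (a fixed threshold on externally supplied per-token scores) is a hypothetical stand-in for the paper's learned mechanism, and the single-head class `PrunedKVCache` is invented for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class PrunedKVCache:
    """Toy single-head key/value cache that permanently discards dropped tokens."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def append(self, key, value):
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def prune(self, drop_scores, threshold=0.5):
        # HYPOTHETICAL rule: drop tokens whose score exceeds the threshold.
        keep = drop_scores <= threshold
        self.keys, self.values = self.keys[keep], self.values[keep]
        return int((~keep).sum())

    def attend(self, query):
        # Scaled dot-product attention over the surviving tokens only.
        if len(self.keys) == 0:
            raise ValueError("cache is empty")
        scores = self.keys @ query / np.sqrt(query.shape[0])
        return softmax(scores) @ self.values

rng = np.random.default_rng(0)
cache = PrunedKVCache(dim=4)
for _ in range(10):
    cache.append(rng.normal(size=4), rng.normal(size=4))
dropped = cache.prune(np.array([0.9, 0.1] * 5))  # drop every other token
print(dropped, cache.keys.shape)  # 5 (5, 4)
out = cache.attend(rng.normal(size=4))
```

Dropping tokens permanently, rather than masking them, is what turns sparsity into actual memory savings, because the pruned keys and values no longer occupy the cache.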

PDF registration status (published: 20230528)